THREE VARIATIONS ON JOHNSON-LINDENSTRAUSS MAPS FOR SUBMANIFOLDS OF
                      EUCLIDEAN SPACE VIA REACH
                                        By
                                 Arman Tavakoli
                                A DISSERTATION
                                    Submitted to
                            Michigan State University
                    in partial fulfillment of the requirements
                                 for the degree of
                      Mathematics – Doctor of Philosophy
                                       2021


                                             ABSTRACT
THREE VARIATIONS ON JOHNSON-LINDENSTRAUSS MAPS FOR SUBMANIFOLDS OF
                                 EUCLIDEAN SPACE VIA REACH
                                                  By
                                            Arman Tavakoli
In this thesis we investigate 3 variations of the classical Johnson-Lindenstrauss (JL) maps. In one
direction we build on the earlier work of Wakin and Eftekhari (2015), by considering generalizations
to manifolds with boundary. In a second direction we extend the work of Noga Alon (2003) for
lower bounds for the final embedding dimension in JL maps. In the third direction, we consider
matrices with fast matrix-vector multiply and improve the run-time in the earlier work of Oymak,
Recht and Soltanolkotabi (2018), and Ailon and Liberty (2009).
    This thesis is organized into 6 chapters. The three variations are discussed in chapters 4, 5 and
6. The variation for manifolds with boundary is presented in chapter 4. The lower bound problem
is discussed in chapter 5, and chapter 6 is regarding the run-time improvements. The first chapter is
an introduction to Johnson-Lindenstrauss maps. The second chapter is about a regularity parameter
called reach and geometrical estimates for manifolds. The third chapter is regarding two geometry
questions about reach that arise from the discussions in chapter 2.


                                     ACKNOWLEDGEMENTS
I would like to thank my advisers Profs. Mark Iwen and Ben Schmidt for their kind support and
generous spirit. They listened to me express my own ideas, and helped me make progress with my
research. They shared their experience and knowledge with me and I am grateful for that. I thank
Profs. Rongrong Wang and Willie Wong for being on my dissertation committee and providing
feedback about my work.
    I am indebted to many people at MSU for providing a supportive community that improved
my experience. Sami Mehri generously hosted me in my first visit at MSU; Emmet Harrington
and Gorapada Bera were my amazing office mates who shared many mathematical discussions
with me. Keshav Sutrave was there for me on numerous occasions. I am thankful to Tingting
Jian, Lijie Han, Xiangyu (Orange) Wang, Xiaoying Yuan for many cakes, pastor John Bell, the
Greek squad (Michalis Paparizos, Georgios Psaromiligkos, Christos Grigoriadis, Ioannis Zachos,
Dimitris Vardakis), Zhe Zhang, Dan Normand, Timothy Szocinski, Minh Hoang Le, Yesfun Yeh,
Anand Chandrasekhar, Henry Manoj D’Souza, Nopphachai Sirinakbumrung, Liwei Deng, Farzan
Masrour, Hossein Mazaheri, Vojtěch Kejzlar, and Nayeli Rodriguez Briones for hosting me on
multiple occasions.
    My family always supported me and helped me recharge during the holidays. To my parents
Fara and Mohammad Hossein, my sisters Aylin and Iren, and little (at the time of writing) Ethan, I
love you very much.
    During the first year of the Covid-19 pandemic, I could stay home and work remotely. Yet
many front line workers specially in health care services, provided service to public everyday while
risking their lives. This was at a time where limited information was available about Covid-19 and
there were no vaccines. I am grateful to the numerous people in the front lines who made life safer
for others.
                                                  iii


                                 TABLE OF CONTENTS
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    v
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .             1
   1.1 Johnson-Lindenstrauss Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      1
   1.2 Fast Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   7
CHAPTER 2 GEOMETRIC PROPERTIES OF REACH AND COVERING ESTIMATES                                  11
   2.1 Reach and its Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
   2.2 Covering Estimate from Above via Gunther’s Theorem . . . . . . . . . . . . . . .         17
   2.3 Covering Estimate from Above Without Radius Restriction via Tubes . . . . . . . .        22
   2.4 Covering Estimate from Below via Bishop’s Theorem . . . . . . . . . . . . . . . .        26
   2.5 Covering Estimate from Above for the Unit Secants of a Submanifold . . . . . . . .       28
   2.6 Gaussian Width Estimate for Unit Secants From Above . . . . . . . . . . . . . . .        33
CHAPTER 3 MORE ESTIMATES WITH REACH . . . . . . . .                   . . . . . . . . . . . . . 35
   3.1 Closed Submanifolds with Reach 1 and Minimum Volume            . . . . . . . . . . . . . 35
       3.1.1 Codimension 1 . . . . . . . . . . . . . . . . . . .      . . . . . . . . . . . . . 36
   3.2 Torus in R3 with Reach 1 and Minimum Area . . . . . . .        . . . . . . . . . . . . . 37
   3.3 Doing Better than Reach in Covering Estimates . . . . . .      . . . . . . . . . . . . . 40
CHAPTER 4 SUFFICIENT CONDITIONS FOR JL MAPS OF MANIFOLDS . . . . . . 49
   4.1 With sub-Gaussian Matrices (Variation 1) . . . . . . . . . . . . . . . . . . . . . . 49
   4.2 With SORS Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
CHAPTER 5    NECESSARY CONDITIONS FOR JL MAPS OF MANIFOLDS (VARI-
             ATION 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
CHAPTER 6 A JL ALGORITHM WITH 𝑁 log(log(𝑁)) RUN-TIME (VARIATION 3)                          . . 69
   6.1 Case of Finite Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  . . 71
       6.1.1 JL Maps for Finite Points via sub-Gaussian Matrices . . . . . . . . . .        . . 71
       6.1.2 JL Maps for Finite Points via SORS Matrices . . . . . . . . . . . . . .        . . 72
       6.1.3 Combining the Two Matrix Types . . . . . . . . . . . . . . . . . . . .         . . 73
   6.2 Upgrade for Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . 74
       6.2.1 SORS Matrices for Subspaces . . . . . . . . . . . . . . . . . . . . . .        . . 75
       6.2.2 Sub-Gaussian Matrices for Subspaces . . . . . . . . . . . . . . . . . .        . . 75
       6.2.3 Mixing the Matrix Types . . . . . . . . . . . . . . . . . . . . . . . . .      . . 76
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
                                               iv


                                        LIST OF FIGURES
Figure 1.1: Two parallel hyper-planes containing a set determine its width in that direction.
            From [35]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   5
Figure 2.1: Two points with the geodesic and secant line between them. We bound the
            angle between 𝑑 and 𝑉0 using 𝜏. . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Figure 3.1: Reach is a global parameter. A bump on the surface of a manifold can change
            its reach, and therefore reach is sensitive to local perturbations. . . . . . . . . . 40
Figure 3.2: Figure-a shows a packing with varying density. Figure-b shows the Veronoi
            cells associated with a packing. Figure-c shows a Veronoi cell contained by
            a doubled packing ball. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Figure 6.1: A vector is split into pieces, each part is processed with a fast JL map, outputs
            are combined and fed into a Gaussian JL map. The Gaussian maps have
            optimal dimensional reduction while fast JL maps have better run-time. In
            this mixed approach, we combine the benefits of both methods. . . . . . . . . . 70
                                                  v


                                               CHAPTER 1
                                            INTRODUCTION
In this thesis we study different variations of Johnson-Lindenstrauss (JL) maps. A great source
for learning about such maps is the textbook of Roman Vershynin, High Dimensional Probability,
[35, Chapter 5]. The variations we consider are based on the theme of improving existing bounds
for JL maps, but they come in different flavors such as sufficient conditions, necessary conditions
or runtime time complexity bounds.
    In this chapter we give an introduction to JL maps and discuss some of the tools that are used
in their construction.
1.1     Johnson-Lindenstrauss Maps
JL maps belong to the general field of dimension reduction (also known as dimensionality reduc-
tion). Informally, given 𝑇 as a subset of R𝑁 , a JL map associated to 𝑇 is a map that reduces 𝑁
while preserving the “geometry of 𝑇”, i.e. 𝐿 : R𝑁 → R𝑚 with 𝑚 < 𝑁 such that 𝑇 and 𝐿 (𝑇)
have the “same geometry”. There are many possible interpretations for a rigorous replacement for
preserving the geometry. Perhaps the most strict requirement would be to ask for preservation of
all pairwise Euclidean distances; that would mean for 𝑥, 𝑦 ∈ 𝑇,
                                         ||𝐿 (𝑥 − 𝑦)|| 2 = ||𝑥 − 𝑦|| 2                               (1.1)
where ||.|| 2 is the Euclidean norm. For an example one can consider 𝐿 to be the projection onto
the span of 𝑇. If 𝑇 is a finite set, then its span is at most a |𝑇 |-dimensional subspace. If |𝑇 | < 𝑁, 𝐿
would satisfy the above requirements and it would be the simplest example of a map that “preserves
the geometry” of 𝑇 while reducing its ambient dimension. In this example the final embedding
dimension is at most the number of points. With condition (1.1), this number is also necessary;
consider 𝑛 points with all the pairwise distances equal to 1. Such an arrangement can be embedded
into R𝑁 for 𝑁 large enough, but it can not be found when the dimension is strictly less than 𝑛−1. For
                                                      1


𝑛 = 3, 4 this is the familiar fact that an equilateral triangle doesn’t fit in R1 or a regular tetrahedron
doesn’t fit in R2 . Relaxing (1.1) allows us to reduce the ambient dimension more.
    One could be less strict and allow the map to distort the pairwise distances, but in a controlled
manner. In this way, given 0 < 𝜖 < 1 and 𝑥, 𝑦 ∈ 𝑇 we require
                           (1 − 𝜖)||𝑥 − 𝑦|| ≤ ||𝐿(𝑥 − 𝑦)|| ≤ (1 + 𝜖)||𝑥 − 𝑦||.                        (1.2)
Inequality (1.2) is known as the JL condition. In this sense, if we consider 𝑇 as a metric space
with the chordal metric, the JL map is a metric space embedding. With condition (1.2), we have a
celebrated theorem known as the Johnson-Lindenstrauss lemma.
Theorem 1.1.1. [35, Theorem 5.3.1] Let 𝑇 be a set of 𝑛 points in R𝑁 and 𝜖 > 0. Let 𝑃 be an
orthogonal projection onto a random 𝑚-dimensional subspace in R𝑁 selected uniformly in the
Grassmanian 𝐺 𝑁,𝑚 . Then there are universal constants 𝑐 1 , 𝑐 2 such that if
                                                      log(𝑛)
                                             𝑚 ≥ 𝑐1          .                                        (1.3)
                                                         𝜖2
then with probability at least 1 − 2 exp(−𝑐 2 𝜖 2 𝑚), the scaled projection
                                                     √︂
                                                        𝑁
                                                𝑄=         𝑃
                                                        𝑚
satisfies
                           (1 − 𝜖)||𝑥 − 𝑦|| ≤ ||𝑄(𝑥 − 𝑦)|| ≤ (1 + 𝜖)||𝑥 − 𝑦||.                        (1.4)
    The main tool for proving this theorem is the concentration of measure. In this proof technique
one estimates the expected length of a vector under all projections. One calculates the averaged
length of that vector when it is projected onto all possible directions. Then by the concentration of
measure, most projections do almost the same as the average, with a small probability for deviation.
                                                                                
Having obtained a failure probability for a single vector, we apply it 𝑛2 times to get an estimate
for all pairwise secants. We use the union bound to combine the the failure probabilities over the
different vectors and we get the claimed result.
                                                     2


    The bound (1.3) is logarithmic in the number of points. This is a improvement compared
to the discussion for (1.1) where we saw a linear bound. The price to pay for this logarithmic
improvement is to tolerate the 𝜖12 factor coming from the distortion 𝜖. The break even point would
        √︃
           𝑐 1 log(𝑛)
be 𝜖 =          𝑛     . As long as we allow 𝜖 to be larger that this threshold, the JL lemma offers a great
improvement.
    The bound (1.3) is also independent of 𝑁. This property makes the JL maps robust for applica-
tions in dimension reduction since it allows them to handle models that encode their data in spaces
with large 𝑁.
    A remarkable property of the above theorem is that it is possible to design the JL map before
knowing the data (the 𝑛 points). This property is known as being oblivious, and is essentially the
result of introducing probability and allowing for a controllable chance of failure. Since the expres-
sions for the probability only depend on the number of points, and not where they are, one can pick
a map from all the possible projections before we know the data, and with high probability (1.2)
would still be satisfied. The obliviousness makes this technique very useful and robust in appli-
cations since the dimension reduction technique becomes decoupled from the intricacies of the data.
    The next step is to construct a JL map for infinite points. The theorem mentioned above depends
explicitly on the number of points. However it is possible to give a different variation, known as
the matrix deviation inequality [35, theorem 9.1.1], where the embedding dimension depends on
how the vectors are spread out in different directions, through a quantity called Gaussian width (we
will describe it below), or a variation of it called Gaussian complexity. The deviation inequality
also differs from the JL lemma discussed above because it works with Gaussian random matrices
instead of random projections. Informally given 𝑇 ⊂ R𝑁 , the matrix deviation inequality says that
                                                       3


with high probability
                            || 𝐴𝑥|| = E|| 𝐴𝑥|| + O (𝛾(𝑇)),          for all 𝑥 ∈ 𝑇                    (1.5)
where 𝐴 is a Gaussian random matrix, E is the expectation (average) operator and 𝛾(𝑇) is the
Gaussian complexity of 𝑇. Conceptually (1.5) tells us that the length of a vector in a set 𝑇 after
its is mapped by a random a matrix A is, with a high chance, closer than O (𝛾(𝑇)) to the average
length. Compared to the earlier discussion about pairwise distances, this perspective is different
because we only consider individual vectors. To transition from one to the other, we form a set of
all pairwise secants and work with that set of vectors.
     A quantitative version of the matrix deviation inequality, with tail bounds, is as follows.
Theorem 1.1.2. [35, theorem 9.1.1 and 9.1.8] Let 𝐴 be a 𝑚 × 𝑁 matrix whose rows 𝐴𝑖 are
independent isotropic and sub-Gaussian random vectors in R𝑁 . Let 𝐾 = max𝑖 || 𝐴𝑖 || 𝜓2 where 𝜓2 is
the sub-Gaussian norm. Let 𝑇 ⊂ R𝑁 with Gaussian width 𝜔(𝑇). Then there is a universal constant
𝐶 such that for 𝑢 ≥ 0, the probability of event
                                           √
                            sup || 𝐴𝑥|| − 𝑚||𝑥|| ≤ 𝐶𝐾 2 [𝜔(𝑇) + 𝑢 rad(𝑇)]                            (1.6)
                            𝑥∈𝑇
is at least 1 − 2 exp(−𝑢 2 ).
     Here rad(𝑇) is the radius of the smallest ball that can contain 𝑇. Again in this theorem we
transition from earlier pairwise distances 𝑥 − 𝑦 in theorem 1.1.1 to individual vectors with 𝑥 only.
                                           𝑥−𝑦
We note that all vectors of the form     ||𝑥−𝑦|| are put into a set 𝑇 and theorem 1.1.2 is applied to this
set. The proof of this theorem is based on the generic chaining technique [35, chapter 9]. The
       √
term 𝑚||𝑥|| acts as the average E|| 𝐴𝑥||. The right hand sides is the deviation from the mean. To
achieve a higher success probability, we must allow for a larger deviation.
                                                                         √
     If we have a choice over 𝑚, by dividing both sides of (1.6) by 𝑚 and choosing 𝑚 large enough
we can bound the deviation to an error 𝜖 of our choosing. In this sense, a large enough 𝑚 allows us
to construct an oblivious 𝐽 𝐿 map for any set in terms of its Gaussian width.
                                                      4


     Now we discuss the Gaussian width. We present two ways to think about it. One way is
using probability and Gaussian random variables. The second way is geometrical, and doesn’t use
probability. Having the different view points allows us to switch from one to the other depending
on the application. The probabilistic definition is as follows.
Definition 1.1.3. [35, Definiiton 7.5.1] Let 𝑔 be a standard Gaussian random vector, 𝑔 ∼ 𝑁 (0, 𝐼 𝑁 ).
The Gaussian width of a subset 𝑇 ⊂ R𝑁 is defined as
                                         𝜔(𝑇) = E sup ⟨𝑔, 𝑥⟩.                                    (1.7)
                                                    𝑥∈𝑇
     With an absolute value E sup𝑥∈𝑇 |⟨𝑔, 𝑥⟩|, one gets the Gaussian complexity of T, denoted by
𝛾(𝑇). In general 𝜔(𝑇) ≤ 𝛾(𝑇) but if 𝑇 is origin-symmetric then 𝜔(𝑇) = 𝛾(𝑇).
     To get an interpretation of how Gaussian width measures the width of a set, first we need the
concept of spherical width. Then we show how it relates to Gaussian width.
Definition 1.1.4. [35, Definition 7.5.5] The spherical width of a subset 𝑇 ⊂ R𝑁 is defined as follows
where 𝜃 is a direction uniformly distributed in S𝑁−1 , 𝜃 ∼ Unif(𝑆 𝑁−1 ).
                                         𝜔 𝑠 (𝑇) = E sup ⟨𝜃, 𝑥⟩                                  (1.8)
                                                     𝑥∈𝑇
     Figure 1.1 shows how width of a set in a particular direction is measured. When this directional
width is averaged over all possible directions we get the spherical width.
Figure 1.1: Two parallel hyper-planes containing a set determine its width in that direction. From
[35].
     For a Gaussian random vector, one can consider its direction 𝜃 and length 𝑟 separately. Direction
𝜃 is uniformly distributed and decouples from 𝑟. Next we recall that the expected length of Gaussian
                                                   5


                                √                                        √
random variable differs from 𝑁 by a universal constant, |E||𝑔|| − 𝑁 | ≤ 𝐶. This lets us relate the
Gaussian width to spherical width as follows.
                              √                               √
                             ( 𝑁 − 𝐶)𝜔 𝑠 (𝑇) ≤ 𝜔(𝑇) ≤ ( 𝑁 + 𝐶)𝜔 𝑠 (𝑇)                                  (1.9)
                                                                             √
In this sense Gaussian width is the spherical width of a set scaled by          𝑁, and spherical width has
a clear geometrical interpretation.
    We give 4 examples of Gaussian widths that are often encountered in high dimensional proba-
bility, see [35, 7.5.8-7.5.10]. They are the 𝐿 1 ball 𝐵1𝑁 , the 𝐿 2 ball 𝐵2𝑁 , the 𝐿 ∞ ball 𝐵∞
                                                                                             𝑁 , and a set 𝑇
with 𝑛 points. We use 𝐶 for any universal constant.
                                 √︁                        √︁
                             𝐶1 log(𝑁) ≤ 𝜔(𝐵1𝑛 ) ≤ 𝐶2 log(𝑁)                                          (1.10)
                                                √
                                    |𝜔(𝐵2𝑁 ) − 𝑁 | ≤ 𝐶                                                (1.11)
                                                       √︂
                                                   𝑛       2
                                            𝜔(𝐵∞ ) =         𝑁                                        (1.12)
                                                           𝜋
                                                          √︁
                                               𝜔(𝑇) ≤ 𝐶 log(𝑛) diam(𝑇)                                (1.13)
                                                       √
    Dividing the first three above expressions by        𝑁 gives us spherical width estimates for the
corresponding 𝐿 𝑝 balls. One sees that the width of 𝐿 2 ball is a constant as expected, while the
                1
                                            √︃
                                               log(𝑁)                                              √
                                                                                   ∞ ball grows as 𝑁.
width of the 𝐿 ball decreases to zero as          𝑁   while  the width  of the   𝐿
    With Gaussian width, theorem 1.1.2 lets us get JL maps for infinite sets. However one still
needs techniques for estimating the Gaussian width. In this thesis, we restrict our discussion to
compact smooth manifolds. To get a JL map for a compact manifold we need estimates for the
Gaussian width of its set of unit secants. The regularity provided by being a manifold allows us to
find explicit estimates and thus close the argument for JL maps of this class of objects. More about
this estimate is in section 2.6.
    The key inequality that allows us to relate geometrical properties of a manifold to its Gaussian
width is called Dudley’s inequality [18, page 226]. For a universal constant 𝑐, one can relate
                                                     6


covering number of set 𝑇 to its Gaussian width as follows.
                                            ∫ ∞ √︁
                                  𝜔(𝑇) ≤ 𝑐          log(𝐶 (𝑇, 𝜖))𝑑𝜖 .                            (1.14)
                                              0
     Motivated by this inequality we calculate various covering numbers in chapter 2.
1.2      Fast Matrices
Having discussed random projections and Gaussian matrices for dimension reduction, the next step
would be to speed up the computation. For a 𝑚 × 𝑁 matrix, the general matrix-vector multiplication
takes O (𝑚𝑁) operations. Since Gaussian matrices are densely populated with random variables
this is difficult to improve.
     If instead we specialize to discrete Fourier matrices, one can use the Fast Fourier Transform
(FFT) algorithm. FFT allows for a O (𝑁 log(𝑁)) matrix-vector multiplication, for square matrices.
Therefore the Discrete Fourier Transform (DFT) matrices are ideal candidates for fast JL maps.
Other examples in this class are also the Welsh-Hadamard matrices. However for JL maps we need
short and wide matrices with 𝑚 ≤ 𝑁. This will come from sub-sampling the Fourier matrix. Also
since Fourier matrices have a fast matrix-vector multiplication, one can do the full multiplication
and throw away the unwanted rows. Constructing JL maps from Fourier matrices requires a process
that we briefly describe below. Such an approach has been used before in earlier works. One
example is the work in [29] where the authors give a JL construction via Fourier matrices for an
arbitrary object in terms of its Gaussian width.
     We use an indirect approach from the field of the Compressive Sensing (CS). This topic was
first discussed by Terence Tao and Emmanuel Candes in their seminal papers [11], [12] and [13].
([11] is the highest cited paper of Tao according to Google scholar at the time of writing). CS is a
large field; for our purposes we need three tools from it. First is the concept of Restricted Isometry
Property (RIP), second is an RIP estimate for discrete Fourier transform matrices (one can think
of RIP estimates as the fundamental theorems of CS), and third is a theorem called Krahmer-Ward
that transforms an RIP estimate to a JL theorem. Below we describe each step separately. If 𝐹 is a
DFT matrix, then the construction scheme is as follows.
                                                  7


                        √︂                                           ©±1               ª
                           𝑁                                         ­                 ®
                              (Sample m rows from 𝐹𝑁×𝑁 )
                                                                     ­      . .        ®
                                                                     ­          .      ®
                           𝑚                                         ­                 ®
                        |                   {z                  }    ­                 ®
                               Restricted Isometry Property                          ±1
                                                                     «                 ¬
                                                                     |      {z         }
                                                                  Random Diagonal Matrix
    The random diagonal matrix is used in the Krahmer-Ward theorem. We also note that there are
multiple conventions for normalizing a DFT matrix. Here we adopt the convention that the matrix
is unitary. If 𝜔 is the nth root of unity, 𝜔 𝑁 = 1, then the matrix 𝐹 would be
                                         ©1      1          1      ...       1 ª
                                         ­                                           ®
                                                              2
                                    1 ­­1
                                         ­
                                                 𝜔          𝜔      ... 𝜔    𝑁−1      ®
                                                                                     ®
                             𝐹 = √ ­.                                                ®
                                                                                     ®                  (1.15)
                                     𝑁 ­ ..                                          ®
                                         ­                                           ®
                                         ­                                         2
                                                                                     ®
                                           1  𝜔  𝑁−1 𝜔2(𝑁−1) ... 𝜔 (𝑁−1) .
                                         «                                           ¬
The first step is the Restricted Isometry Property. RIP is about how much a matrix distorts sparse
vectors. It is similar to the JL condition but it is restricted to sparse vectors. This property is
reported through pairs of numbers. In each pair, there is a sparsity level, 𝑠, and a deviation value 𝛿.
In short sometimes it is written as 𝛿 𝑠 . As a reminder, sparsity of a vector is the number of non-zero
components of that vectors, denoted by ||𝑥|| 0 where the zero comes from the lim 𝑝→0 ||𝑥|| 𝑝 . So if a
matrix 𝐴 possess the 𝛿 𝑠 RIP then for all vectors 𝑥 that are 𝑠 or less sparse we simultaneously have
                                (1 − 𝛿 𝑠 )||𝑥|| 2 ≤ || 𝐴𝑥|| 2 ≤ (1 + 𝛿 𝑠 )||𝑥|| 2 .                     (1.16)
Some authors put squares in the RIP condition as in (1 − 𝛿 𝑠 )||𝑥|| 2 ≤ || 𝐴𝑥|| 2 ≤ (1 + 𝛿 𝑠 )||𝑥|| 2 . These
definitions are equivalent up to scaling the 𝛿 𝑠 by an absolute constant.
    The second step is an RIP estimate for the discrete Fourier transform matrices. The estimate we
provide below is for the general family of orthonormal matrices with 𝑚 rows sampled uniformly in
an i.i.d fashion.
                                                         8


Lemma 1.2.1. [29, Lemma 4.2], [18, theorem 12.32] Let 𝐹 ∈ R𝑁×𝑁 be an orthonormal matrix
obeying
                                                                          𝐾
                                   𝐹 ∗ 𝐹 = 𝐼,          max𝑖, 𝑗 |𝐹𝑖 𝑗 | ≤ √ .                          (1.17)
                                                                           𝑁
Define the random subsampled matrix 𝐻 ∈ R𝑚×𝑁 with i.i.d rows chosen uniformly at random from
                                                 √︃
the rows of 𝐹. Then the RIP(𝑠, 𝛿) holds for 𝑚        𝑁
                                                       𝐻 with probability at least 1 − 𝑒 −𝜂 for all 𝛿 > 0 as
long as
                                                            𝑠 log4 (𝑁)
                                       𝑚 ≥ 𝐶𝐾 2 (1 + 𝜂)                  .                            (1.18)
                                                                 𝛿2
    Some remarkable characteristics for the bound in (1.18) are that it is linear in 𝑠 and logarithmic
(to 4th power) in 𝑁. It has an explicit dependence on the ambient dimension 𝑁; since 𝑁 could be
prohibitively large, improving the depence on 𝑁 in this bound has been an active area of research
in CS. For a recent improvement see [10].
    The third step is a theorem that allows us to get a JL theorem for finite points from an RIP
estimate. This theorem is known as the Krahmer-Ward theorem; it takes a matrix with a RIP
estimate, 𝑛 points as data and applies a sequence of random reflections through the coordinate axes.
The result is a probabilistic theorem for a JL map with a controlled failure probability.
Theorem 1.2.2. (Krahmer-Ward) [18, theorem 9.36] Let 𝑥𝑖 be 𝑛 points in R𝑛 . Let 𝜌, 𝜖 ∈ (0, 1).
Let 𝐴 ∈ R𝑚×𝑁 be a matrix where its restricted isometry constant for sparsity 2𝑠, i.e. 𝛿2𝑠 , satisfies
𝛿2𝑠 ≤   𝜖
        4 for 𝑠 ≥ 16 log( 4𝑛
                           𝜌 ). Let 𝐷 be a diagonal matrix with Rademacher random variables, i.e.
with uniform ±1 random variables, on the diagonal. Then with probability exceeding 1 − 𝜌 the
following holds simultaneously for all 𝑥𝑖 .
                               (1 − 𝜖)||𝑥𝑖 || 2 ≤ || 𝐴𝐷𝑥𝑖 || 2 ≤ (1 + 𝜖)||𝑥𝑖 || 2 .                   (1.19)
    Combining the above 3 steps we get a JL map for finite points using DFT matrices. The benefit
of this approach is the computational speed. The cost to pay is a higher final dimension compared
                                                        9


to Gaussian or random projection matrices. In chapter 6, we show how one can combine the two
approaches and get the best of both worlds.
                                              10


                                            CHAPTER 2
         GEOMETRIC PROPERTIES OF REACH AND COVERING ESTIMATES
In this chapter we discuss geometrical estimates based on a regularity parameter known as reach.
The reader interested in only the JL applications can skip to further chapters and refer back as
necessary.
    We provide 4 main theorems. These theorems provide us with the necessary geometrical tools
to construct our desired JL maps for manifolds. In theorem 2.1.7 we give a comprehensive list of
properties of reach. In theorem 2.2.6 we give an upper bound for covering numbers of a compact
manifold with boundary. We do so by first covering the boundary as an independent manifold. This
covers a collar of the boundary, after which we cover the interior. The method of proof is based on
Gunther’s volume comparison theorem and is restricted by the injectivity radius of the exponential
map in the interior of 𝑀 away from the boundary. In theorem 2.4.1 we provide a lower bound for
the covering numbers of a compact manifold. In theorem 2.5.2 we give a covering number bound
from above for the unit secants of a submanifold.
2.1     Reach and its Properties
Here we review the concept of reach for a submanifold of the Euclidean space and in theorem 2.1.7
provide a comprehensive list of its properties. Theorem 2.1.7 provides the main properties of reach
in a convenient and ready-to-use manner for application to geometric problems, and we hope it is
helpful for other researchers. We also discuss the case when the submanifold possibly possesses a
boundary, as is expected to be the case for manifold models in applications.
    Reach is an extrinsic parameter that is defined based on how far one can move away from an
embedded submanifold while maintaining a unique closest point property. This parameter also
controls the extrinsic acceleration of unit speed geodesics. It has been used extensively as a regu-
larity parameter since 1959 when it was defined by Federer in [16]. A historical viewpoint of its
                                                 11


development can be found in [34]. Some earlier results regarding reach can be found in [1], [8] and
[15]. In [1], the authors offer a probabilistic method for estimating the reach of a manifold. In [8]
the authors consider the intersection of a ball of small radius with a set of positive reach, and for
manifolds they estimate the angle between tangent spaces at different points in terms of reach. In
[15, theorem 2], the authors give a JL theorem for a closed manifold via reach.
    Here, we begin by recalling the definition of reach and then we summarize its properties.
Definition 2.1.1. (Reach [16, definition 4.1]) For a closed subset of Euclidean space 𝐴 ⊂ R𝑛 , the
reach 𝜏 is defined as
       𝜏( 𝐴) = sup{𝑡 ≥ 0 ∀𝑥 ∈ R𝑛 such that 𝑑 (𝑥, 𝐴) < 𝑡, 𝑥 has a unique closest point in 𝐴}.
    The above definition is for closed subsets of R𝑛 . When restricted to submanifolds, reach
depends in part on the second fundamental form of the embedding. However control of the second
fundamental form does not fully control reach as two sheets of the submanifold may come close
to each other. One can obtain an equivalent characterization of reach in terms of the injectivity
radius of the normal exponential map, defined below following [22, Section 8.1]. First we need the
concept of cut points for the normal exponential map.
Definition 2.1.2. (Cut point) Let 𝜉 be a line segment in R𝑛 meeting 𝑀 orthogonally at 𝑚 ∈ 𝑀.
We say 𝑥 ∈ 𝜉 is cut point (cut-focal point) along 𝜉 provided distance from 𝑥 to 𝑀 is no longer
minimized along 𝜉 past 𝑥.
Definition 2.1.3. (Cut distance) Define the function 𝑒 𝑐 : {( 𝑝, 𝑢) | 𝑝 ∈ 𝑀, 𝑢 ∈ 𝑁 𝑝 𝑀, ∥𝑢∥ = 1} → R
such that
                             𝑒 𝑐 ( 𝑝, 𝑢) = sup{𝑡 > 0 | distance( 𝑝 + 𝑡𝑢, 𝑀) = 𝑡}.
In words, 𝑒 𝑐 ( 𝑝, 𝑢) is the distance along direction 𝑢 starting orthogonal at 𝑝, past which the line
segment along 𝑢 stops being distance minimizing to 𝑀.
                                                       12


Definition 2.1.4. ( [22, Section 8.1]) The injectivity radius of the normal exponential map
𝑟 inj (exp𝑁 ) (also known as minimal focal-cut distance) is inf{𝑒 𝑐 ( 𝑝, 𝑢) | 𝑝 ∈ 𝑀, 𝑢 ∈ 𝑁 𝑝 𝑀, ∥𝑢∥ = 1}
       When boundary is present, one must specify the criteria for normal vectors at the boundary. They
are defined as the ones that make an obtuse angle with inward pointing tangent vectors, following
[16, definition 4.4]. These vectors are normal to the boundary, when boundary is viewed as an
independent manifold, but only make obtuse angles with the manifold itself. Let 𝑝 ∈ 𝜕 (𝑀) ⊂ R𝑛
and 𝑣 ∈ 𝑇𝑝 𝑀 that is inward pointing. Then 𝑢 ∈ R𝑛 is normal to 𝑀 at 𝑝 if 𝑢 • 𝑣 ≤ 0 for all such 𝑣.
       We also need the concept of focal points; this helps us describe the behavior where the distance
to the submanifold is equal to reach.
Definition 2.1.5. (Focal point, [22, section 8.1]) Let 𝑀 be a submanifold of R𝑛 . A focal point of
𝑀 is a point 𝑥 ∈ R𝑛 such that the exponential map exp𝑁 of the normal bundle of 𝑀, is singular
somewhere on exp−1    𝑁 (𝑥).
Definition 2.1.6. [22, section 8.1] Let 𝑒 𝑓 ( 𝑝, 𝑢) be the distance from 𝑝 ∈ 𝑀 to its first focal point
along the geodesic 𝑡 → exp𝑁 ( 𝑝, 𝑡𝑢). In other words, let 𝑒 𝑓 := {( 𝑝, 𝑢) | 𝑝 ∈ 𝑀, 𝑢 ∈ 𝑀 𝑝⊥ , ||𝑢|| =
1} → R be the function defined by
                          𝑒 𝑓 ( 𝑝, 𝑢) = inf{𝑡 > 0 | kernel(((exp𝑁 )∗ ) ( 𝑝,𝑡𝑢) ) ≠ {0}}.
       We are now ready to state the key properties of reach. In this approach we are in part following
the work of [1, proposition A.1]. We provide a survey of results that are scattered in the literature
and provide proofs where they can not be easily located elsewhere.
Theorem 2.1.7 (Properties of Reach). Let 𝑀 be a compact, smooth submanifold of R𝑛 possibly
with boundary. Let 𝜏 be the reach of 𝑀. Let 𝑝, 𝑞 ∈ 𝑀 and 𝑥 ∈ R𝑁 . Then the following properties
hold.
     1. All geodesics in 𝑀 are 𝐶 1 . In the interior of 𝑀, they are smooth. At all points unit speed
         geodesics have one-sided second derivatives. When switching from interior to boundary,
         geodesics may bifurcate.
                                                       13


   2. The one sided second derivatives for unit speed geodesics are bounded above in norm by 1𝜏 .
      In particular in the interior of 𝑀 the norm of the second fundamental form of 𝑀 is bounded
      by 1𝜏 .
                                                                 −2         1
   3. In the interior of 𝑀, any sectional curvature 𝑘 satisfies  𝜏2
                                                                    ≤𝑘≤    𝜏2
                                                                              .
   4. (Federer Tubular Neighborhood Theorem) If the line segment 𝑝𝑥 is normal to 𝑀 at 𝑝, and
      ∥ 𝑝 − 𝑥∥ < 𝜏 then 𝑝 is the closest point to 𝑥 in 𝑀.
   5. If line segment 𝑝𝑞 is normal to 𝑀 at 𝑝 then ∥ 𝑝 − 𝑞∥ ≥ 2𝜏.
   6. At distance exactly 𝜏 away from 𝑀, either there is a focal point or there are two points
       𝑝 1 , 𝑝 2 ∈ 𝑀 and 𝑦 ∈ R𝑛 such that ∥ 𝑝 1 − 𝑦∥ = ∥ 𝑝 2 − 𝑦∥ = 𝜏 and the line segments 𝑝 1 𝑦 and
       𝑝 2 𝑦 meet at 𝑦 at an angle of 𝜋; or both.
   7. 𝜏 = 𝑟 inj (exp𝑁 ).
   8. 𝜏 > 0.
   9. For 𝑝 ∈ 𝑀, the injectivity radius at 𝑝 is at least min{𝑑 𝑀 ( 𝑝, 𝜕 𝑀), 𝜋𝜏} where 𝑑 𝑀 ( 𝑝, 𝜕 𝑀) is
      the geodesic distance of 𝑝 to the boundary of 𝑀. If there is no boundary, injectivity radius
      is at least 𝜋𝜏.
For the next two properties, let 𝑑 and 𝑙 be the Euclidean and geodesic distances between 𝑝 and 𝑞
respectively.
             𝑙2
  10. 𝑙 −    2𝜏  ≤ 𝑑.
                                                            2𝑑 2
  11. When restricted to 𝑑 ≤ 2𝜏 , we further have 𝑙 ≤ 𝑑 +    𝜏 .
    For the last two properties, let 𝛾(𝑡) be a unit speed geodesic connecting 𝑝 to 𝑞. Let 𝑣 ∈ 𝑇𝑝 𝑀
   ¤
be 𝛾(0).    Let 𝜙 be the angle between 𝑣 and the secant line connecting 𝑝 to 𝑞. Let 𝑤 ∈ 𝑇𝑝 𝑀 be any
unit vector and 𝑤 ∗ ∈ 𝑇𝑞 𝑀 its parallel transport via the connection on 𝑀 along 𝛾. Let 𝜃 be the
angle between 𝑤 and 𝑤 ∗ after they are parallel transported to the origin in R𝑛 .
                                                  14


  12. 𝜃 ≤ 𝜏𝑙 .
                                              𝑑       2𝑑 2
  13. When restricted to 𝑑 ≤ 2𝜏 , sin(𝜙) ≤    2𝜏 (1 +  𝜏 ) .
    Proof. 1. See [3, theorem 1].
    2. See [8, lemma 4] or [27, proposition 6.1]. In the case where a unit speed geodesic 𝛾(𝑡) is
part of the boundary, the only modification is that 𝛾(𝑡)  ¥  is normal to the boundary and the same
argument applies.
    3. This follows from item 2 above and Gauss’s equation [28, page 100, theorem 5].
    4. See [16, theorem 4.8(12)] or [8, theorem 2].
    5. By Federer’s tubular neighborhood theorem [8, section 3.1], for points on line segment 𝑝𝑞
with distance to 𝑝 less than 𝜏, 𝑝 is the closest point to them. Therefore 𝑞 is at least 𝜏 away from
the point that is 𝜏 away from 𝑝. This gives the claimed ∥ 𝑝 − 𝑞∥ ≥ 2𝜏.
    6. See [33].
    7. Let 𝑟 = 𝑟 inj (exp𝑁 ). First we show 𝜏 ≤ 𝑟. For the sake of contradiction assume 𝑟 < 𝜏. Then
by definition of 𝑟, there is a normal line segment to 𝑀 that stops being distance minimizing pass
distance 𝑟 away from 𝑀. But this contradicts Federer’s tubular neighborhood theorem, as such
normal line segments must remain distance minimizing up to distance 𝜏.
    Next we show 𝜏 < 𝑟 is impossible and that is enough for obtaining 𝜏 = 𝑟. For contradiction,
assume 𝜏 < 𝑟. By item 6, at distance exactly 𝜏 away from 𝑀 there must be either a focal point
or two equal length line segments normal to 𝑀 and intersecting each other at angle of 𝜋. If there
is a focal point, it is known that past a focal point, geodesics stop being distance minimizing, see
[31, lemma 2.11]. Hence past a distance 𝜏, there is a normal line segment to 𝑀 that stops being
distance minimizing; but this contradicts the definition of 𝑟 because it guarantees they remain
distance minimizing up to distance 𝑟 away from 𝑀.
    In the case of two equal line segments, call them 𝑝 1 𝑦 and 𝑝 2 𝑦 with 𝑝 1 , 𝑝 2 ∈ 𝑀. Since they
meet at an angle of 𝜋, 𝑝 1 , 𝑦 and 𝑝 2 are collinear; and since ∥ 𝑝 1 − 𝑦∥ = ∥ 𝑝 2 − 𝑦∥, 𝑝 1 𝑦 stops being
                                                    15


distance minimizing past 𝑦. Again this contradicts the guarantee of being distance minimizing up
to distance 𝑟, showing 𝜏 < 𝑟 is impossible.
    8. See [7, page 151, theorem 4].
    9. By [4, theorem 3], in a manifold with boundary and reach 𝜏, two geodesics with the same
starting point must travel a distance of 𝜋𝜏 before they meet. Separately all sectional curvatures are
                      1
bounded above by      𝜏2
                         . Then by Klingenberg’s theorem [6, theorem 89], we get that the injectivity
radius is at least 𝜋𝜏. Since we take a minimum with the distance to the boundary, we are guaranteed
not to intersect it.
    10. See [27, lemma 6.3].
    11. For partial and complementary proofs see [27, lemma 6.3] and [15, lemma 7]. Here we
                                                      𝑙2
streamline the above arguments. Reversing 𝑙 −         2𝜏  ≤ 𝑑, one of
                                        √︂                          √︂
                                                2𝑑                         2𝑑
                               𝑙 ≤ 𝜏−𝜏 1−              or 𝑙 ≥ 𝜏 + 𝜏 1 −
                                                 𝜏                          𝜏
                                             𝜏
must hold. If the strict inequality 𝑑 <      2 holds then 𝑙 = 𝜏 is impossible. We use this fact to
                                                                              √︃
show when 𝑑 ≤ 2𝜏      we must have 𝑙 ≤ 𝜏. From 𝑙 ≤ 𝜏, only 𝑙 ≤ 𝜏 − 𝜏 1 − 2𝑑         𝜏 remains, and using
    √               2                              2𝑑 2
1 − 1 − 𝑥 ≤ 𝑥+𝑥   2 we get the claimed 𝑙 ≤ 𝑑 +       𝜏 .
                                                    𝜏
    For the sake of contradiction assume 𝑑 ≤        2  and 𝜏 < 𝑙. Then 𝑆1 = {𝑎 ∈ 𝑀 | 𝑑 𝑀 ( 𝑝, 𝑎) = 𝜏} is
not empty. Let 𝑞 ∗ be a minimizer for the Euclidean distance to 𝑝 in 𝑆1 :
                                         𝑞 ∗ = arg inf ∥ 𝑝 − 𝑎∥.
                                                     𝑎∈𝑆1
    Then 𝑞 ∗ is also a minimizer for Euclidean distance to 𝑝 over 𝑆2 = {𝑎 ∈ 𝑀 | 𝑑 𝑀 ( 𝑝, 𝑎) ≥ 𝜏}.
Because if there is a minimizer in the interior of 𝑆2 such as 𝑎, then line segment 𝑝𝑎 must be normal
                                                                                                           𝜏
to 𝑀 at 𝑎. Then ∥ 𝑝 − 𝑎∥ ≥ 2𝜏. But since 𝑎 is a minimizer, it holds that ∥ 𝑝 − 𝑎∥ ≤ ∥ 𝑝 − 𝑞∥ ≤             2
and that contradicts ∥ 𝑝 − 𝑎∥ ≥ 2𝜏.
    Similarly 𝑞 can’t be a minimizer because it is in the interior of 𝑆2 . So there is a strict inequality
∥ 𝑝 − 𝑞 ∗ ∥ < ∥ 𝑝 − 𝑞∥. But for ∥ 𝑝 − 𝑞 ∗ ∥ <      𝜏
                                                   2,   it is impossible to have 𝑑 𝑀 ( 𝑝, 𝑞 ∗ ) = 𝜏 and this
                                                      16


contradicts how 𝑞 ∗ was constructed.
    12. See [8, lemma 6].
                                              ℎ
    13. See figure 2.1. We have sin(𝜙) =      𝑑 where ℎ is the distance of 𝑞 to the line through 𝑝
                                                     𝜏
with direction 𝑣. Since we are restricted to 𝑑 ≤     2,  using item 11, we have that 𝑙 ≤ 𝜏. Now
we bound ℎ. Let 𝜃 (𝑠) be the angle between 𝑣 and the tangent vector to the geodesic at distance
                                                                                                      𝑠
𝑠 along the geodesic, when viewed as vectors in R𝑁 . From item 12 we get that 𝜃 (𝑠) ≤                𝜏.
The incremental height gain in the ℎ direction is at most sin(𝜃 (𝑠))𝑑𝑠. Therefore we obtain
                                                                        2
                                                         𝑙2   𝑑2
     ∫𝑙                ∫𝑙
                                                                    1 + 2𝑑
                               
ℎ ≤ 0 sin(𝜃 (𝑠))𝑑𝑠 ≤ 0 sin 𝜏𝑠 𝑑𝑠 = 𝜏 − 𝜏 cos 𝜏𝑙 ≤ 2𝜏        ≤ 2𝜏         𝜏    . Dividing by 𝑑 gives the
claimed result.                                                                                       ■
Figure 2.1: Two points with the geodesic and secant line between them. We bound the angle
between 𝑑 and 𝑉0 using 𝜏.
2.2    Covering Estimate from Above via Gunther’s Theorem
We first present a covering bound for closed manifolds. Since the boundary of a manifold is itself
a closed manifold, this argument is useful in the case the manifold has boundary.
    First we need a lemma relating covering and packing numbers. For definitions of packing and
covering see [35, lemma 4.2.8].
Lemma 2.2.1. Let 𝑋 be a metric space with 𝐴 a subset of it. Then for any 0 < 𝜖, the 𝜖-covering
number of 𝐴 is not greater than the 2𝜖 -packing number of 𝐴.
                                                       𝜖
                                           𝐶 (𝜖) ≤ 𝑃( )                                           (2.1)
                                                       2
                                                 17


    Proof. Start with a packing with balls of radius 2𝜖 . Then enlargen the balls to radius 𝜖. We
claim this is a covering. If not there is a point 𝑦 ∈ 𝐴 such that it is more than 𝜖 away from all the
                                          𝜖
centers of balls. Then a ball of radius   2 centered at 𝑦 would be disjoint with the starting packing
balls and that contradicts the assumption that any packing is maximal.
Theorem 2.2.2. (Covering a Closed Manifold) Let 𝑀 ↩→ R𝑛 be a closed (compact, no boundary)
submanifold of R𝑛 of dimension 𝑑 ≥ 0. Let 𝜏 > 0 be the reach of 𝑀, and 𝑉 denote the volume of
𝑀, as induced by the embedding. When 𝑑 = 0, we use the counting measure for volume.
Then if 𝑑 = 0, 𝑉 balls are sufficient to cover 𝑀.
If 𝑑 > 0, let 𝜔 𝑑 denote the volume of the 𝑑-dimensional Euclidean ball of radius 1. Then for
           √                                                 𝑉
0 < 𝜖 < 2 6𝜏, one can cover 𝑀 with at most                 𝜖 2 𝑑−1 𝜖 𝑑
                                                                       𝑛-dimensional Euclidean balls of
                                                  𝜔 𝑑 (1−       ) (2)
                                                          24𝜏 2
radius 𝜖 with centers on 𝑀.
    Proof. The case of 𝑑 = 0 corresponds to isolated points and 1 ball per point is sufficient. So
assume 𝑑 > 0. The covering number with balls of radius 𝜖 is bounded above by the packing number
with radius 2𝜖 , [35, lemma 4.2.8]. We claim each such packing ball contains a geodesic ball of
        𝜖                                                                𝜖 2 𝑑−1 𝜖 𝑑
radius  2 and the volume of the geodesic ball is at least 𝜔 𝑑 (1 −      24𝜏 2
                                                                              ) ( 2) . Since the packing
balls are disjoint, the claim implies the theorem immediately. For 𝑥 ∈ 𝑀, 𝐵R𝑛 (𝑥, 2𝜖 ) contains the
geodesic ball 𝐵 𝑀 (𝑥, 2𝜖 ) because the geodesic distance between two points of 𝑀 is not smaller than
the Euclidean distance between them. We will work with this geodesic ball.
    To bound the volume of the geodesic ball from below, we use G¥unther’s volume comparison
theorem, see [19, page 169, theorem 3.101, part ii]. To use it, we need to bound the sectional
curvatures of 𝑀 from above, and ensure that the geodesic balls don’t touch the cut locus of their
center. From properties of reach 𝜏, theorem 2.1.7, all sectional curvatures of 𝑀 are bounded above
    1
by  𝜏2
       and the injectivity radius of 𝑀 is at least 𝜋𝜏. To avoid intersecting the cut locus, we need
𝜖
                                                          √
2  < 𝜋𝜏; this is satisfied by the assumption that 𝜖 < 2 6𝜏 (one could allow for a larger range of 𝜖,
see remark 2.2.3). By G¥unther’s theorem, the volume of the geodesic ball is at least the volume
                                                   18


                                 𝜖
of a geodesic ball of radius     2 in the simply connected model space of constant sectional curvature
 1
𝜏2
   , which is a sphere with radius 𝜏. Let 𝑉 1 ( 2𝜖 ) denote this volume. It is explicitly given by the
                                                      𝜏2
formula (see [25]):
                                                             ∫ 𝜖
                                              𝜖     2𝜋 𝑑/2         2
                                      𝑉1 ( )=                         𝑆 1 (𝑥) 𝑑−1 𝑑𝑥                                    (2.2)
                                         𝜏2 2
                                                         𝑑
                                                     Γ( 2 ) 0           𝜏2
                                              √
     where for 𝑘 > 0, 𝑆 𝑘 (𝑥) = √1 sin( 𝑘𝑥). It follows that
                                     𝑘
                                 ∫   𝜖                                 ∫    𝜖
                         2𝜋 𝑑/2      2
                                                  𝑑−1        2𝜋 𝑑/2         2        𝑥
                                        𝑆 1 (𝑥)       𝑑𝑥 =                    (𝜏 sin( )) 𝑑−1 𝑑𝑥.                        (2.3)
                         Γ( 𝑑2 )   0       𝜏2                 Γ( 𝑑2 )     0          𝜏
       sin(𝑥)                                            √
Since     𝑥   is decreasing on [0, 𝜋], and 𝜖 < 2 6𝜏 < 2𝜋𝜏,
                                                                     𝜖
                                            𝜖      2𝜋 𝑑/2 sin( 2𝜏 ) 𝑑−1 1 𝜖 𝑑
                                    𝑉1 ( ) ≥               ( 𝜖 )                 ( ) .                                  (2.4)
                                       𝜏2 2        Γ( 𝑑2 )                     𝑑 2
                                                                  2𝜏
                  𝑥2   sin(𝑥)                   √             𝜋 𝑑/2
Using 0 < 1 −     6  <    𝑥   for 0 < 𝑥 <        6, 𝜔 𝑑 =    Γ( 𝑑2 +1)
                                                                       and Γ( 𝑑2 + 1) = Γ( 𝑑2 ) 𝑑2 we get
                                              𝜖                     𝜖 2 𝑑−1 𝜖 𝑑
                                       𝑉 1 ( ) ≥ 𝜔 𝑑 (1 −                ) ( ) .                                        (2.5)
                                         𝜏2 2                    24𝜏 2          2
                                                                                                                           ■
                                                                                                                        √
Remark 2.2.3. In the proof of the theorem 2.2.2, we restricted the range of 𝜖 as 0 < 𝜖 < 2 6𝜏.
                            2
The reason is that 1 − 𝑥6 , which is the second order Taylor expansion of sin(𝑥)/𝑥, remains positive
in this range. One could work with the sinc(𝑥) function to have the larger range 0 < 𝜖 < 2𝜋𝜏, or
one could work with the 6th order Taylor expansion of sin(𝑥)/𝑥, which remains positive slightly
beyond 3 and we have 0 ≤ 𝑇6 (𝑥) ≤ sinc(𝑥) on [0, 3]. However for the purposes of JL applications
such improvements only offer marginal benefits and hence for simplicity we state our bounds for
the second order expansion.
     Next as an example we apply our theorem to S𝑑 . A standard estimate for covering S𝑑 with balls
                                                                                                  √        𝑑
centered on the sphere is ( 3𝜖 ) 𝑑 , see [35, corollary 4.2.13]. Here we get (3.4 𝑑) 2.1                𝜖 𝑑  . In the special
                                                                                      𝑑 1.5 log(𝑑)
case of S𝑑 , there are more economical bounds on the order of O (                           𝜖𝑑
                                                                                                   ) ([9, theorem 6.8.1]),
but they apply only to S𝑑 , and not arbitrary manifolds.
                                                            19


                                                                                                       √       𝑑
Corollary 2.2.4. For 0 < 𝜖 < 1 the standard S𝑑 ↩→ R𝑛 can be covered with at most (3.4 𝑑) 2.1                 𝜖𝑑
Euclidean 𝑛-dimensional balls of radius 𝜖.
                                                                                                           𝑑+1
                                                                𝑉𝑀                                       2𝜋 2
    Proof. By theorem 2.2.6, we need at most                     2                . We have 𝜏 = 1, 𝑉𝑀 = Γ( 𝑑+1
                                                                                                                ,
                                                       𝜔 𝑑 (1− 𝜖 2 ) 𝑑−1 ( 𝜖2 ) 𝑑                           2 )
                                                              24𝜏
                                                                  𝑀
        𝜋2
           𝑑                  √ Γ( 𝑑 +1)      √
𝜔𝑑 =  Γ( 𝑑2 +1)
                , and 𝑉𝜔𝑀𝑑 = 2 𝜋 Γ( 2𝑑+1 ) < 2 𝜋𝑑. In the last equation we used the Gautschi’s inequality
                                      2
                                              √    24 𝑑−1 2𝑑
                                                                       √
[20]. This leads to an upper bound of 2 𝜋𝑑 ( 23      ) 𝜖 𝑑 ≤ 3.4 𝑑 ( 2.1          𝜖 ) .
                                                                                     𝑑
                                                                                                               ■
    Next we focus on a covering estimate for a compact manifold with boundary. First we present
a lemma showing that the number of connected components of the boundary must be finite.
Lemma 2.2.5. Let 𝑀 be a compact connected embedded submanifold of R𝑛 with boundary 𝜕 𝑀.
Then 𝜕 𝑀 has a finite number of connected components.
    Proof. Assume not. Let 𝑝𝑖 ∈ 𝑀 be a sequence of points where each 𝑝𝑖 belongs to a different
component of 𝜕 𝑀. Since 𝑀 is compact, 𝑝𝑖 converge to a point 𝑝 ∈ 𝑀, after passing to a
subsequence. Either 𝑝 is in interior of 𝑀 or it belongs to the boundary of 𝑀. If 𝑝 is in the interior
there is an open neighborhood of 𝑝 in R𝑛 intersecting 𝑀 only in interior points. Since 𝑝𝑖 are
boundary points, this contradicts their convergence to 𝑝. If 𝑝 is in the 𝜕 𝑀, then it is in one of the
connected components such as 𝑗. Again there is an open neighborhood of 𝑝 in R𝑛 intersecting 𝑀
only in points in component 𝑗 of boundary or interior points; this contradicts 𝑝𝑖 converging to 𝑝. ■
    Using theorem 2.2.2 and lemma 2.2.5 we present a covering estimate for a manifold with
boundary.
Theorem 2.2.6. (Covering a Manifold with Boundary) Let 𝑀 ↩→ R𝑛 be a compact 𝑑-dimensional,
𝑑 ≥ 1, submanifold of R𝑛 with boundary. Let 𝜏𝑀 > 0 and 𝑉𝑀 be the reach and volume of 𝑀. Let
𝜕 𝑀 denote the boundary of 𝑀 as an independent submanifold of R𝑛 with 𝑉𝜕 𝑀 as the volume of
𝜕 𝑀. Let 𝜏𝜕𝑖 𝑀 > 0 be the reach of the 𝑖-th component of boundary of 𝑀. Let 𝜏𝜕 𝑀 = inf 𝑖 {𝜏𝜕𝑖 𝑀 }. Fix
                   √          √
0 < 𝜖 ≤ min{4 6𝜏𝜕 𝑀 , 2 6𝜏𝑀 }. Then for 𝑑 = 1, 𝑀 can be covered with 𝑉𝜖𝑀 + 𝑉𝜕 𝑀 n-dimensional
                                                      20


Euclidean balls of radius 𝜖, where 𝑉𝜕 𝑀 corresponds to the counting measure. For 𝑑 ≥ 2, 𝑀 can
                                𝑉𝑀                               𝑉𝜕𝑀
be covered with                𝜖2
                                                  +             𝜖2
                                                                                         many balls.
                    𝜔 𝑑 (1−        ) 𝑑−1 ( 𝜖2 ) 𝑑   𝜔 𝑑−1 (1−         ) 𝑑−2 ( 𝜖4 ) 𝑑−1
                             24𝜏 2                            96𝜏 2
                                 𝑀                                𝜕𝑀
    Proof. Consider two subsets of M, 𝑆1 = {𝑥 ∈ 𝑀 |𝑑 𝑀 (𝑥, 𝜕 𝑀) < 2𝜖 } and 𝑆2 = 𝑀\𝑆1 . We will
cover each region separately.
To cover 𝑆1 we cover 𝜕 𝑀 with balls of radius 2𝜖 , and then enlarge them to balls of radius 𝜖. We
claim such balls cover 𝑆1 . For each point 𝑥 ∈ 𝑆1 , there is a point 𝑦 ∈ 𝜕 𝑀 such that 𝑑 𝑀 (𝑥, 𝑦) < 2𝜖 .
                                                                          𝜖
Also 𝑑R𝑛 (𝑥, 𝑦) ≤ 𝑑 𝑀 (𝑥, 𝑦) < 2𝜖 . Since we have a                       2  covering of 𝜕 𝑀, center of one of the balls, 𝑜𝑖 ,
satisfies 𝑑R𝑛 (𝑜𝑖 , 𝑦) < 2𝜖 . Therefore 𝑑R𝑛 (𝑜𝑖 , 𝑥) ≤ 𝑑R𝑛 (𝑜𝑖 , 𝑦) + 𝑑R𝑛 (𝑦, 𝑥) < 𝜖 as claimed.
    When 𝑑 = 1, the boundary is a collections of points, so 𝑉𝜕 𝑀 is enough to cover the region 𝑆1 .
For region 𝑆2 , we pack it with balls of radius 2𝜖 . Each Euclidean ball will overlap with the geodesic
ball of the same radius; for 𝑑 = 1 that is a curve of length 𝜖. Since volume of 𝑆2 is at most 𝑉𝑀
                                                                                      𝑉
and the packing balls are disjoint we can pack at most                                 𝜖 balls. This gives our covering claim of
𝑉𝑀
 𝜖  + 𝑉𝜕 𝑀 .
                                                                   𝑉𝜕𝑀                                    𝜖
    Now assume 𝑑 ≥ 2. We claim                                     𝜖2
                                                                                          balls of radius 2 are enough to cover
                                                     𝜔 𝑑−1 (1−           ) 𝑑−2 ( 𝜖4 ) 𝑑−1
                                                                96𝜏 2
                                                                    𝜕𝑀
𝜕 𝑀. To show this, we use theorem 2.2.2 which requires 𝜕 𝑀 to be compact and without boundary.
Boundary of 𝑀 is compact since it is a closed subset of 𝑀; furthermore boundary of boundary of
                                                                                                √
a manifold is empty. Therefore by theorem 2.2.2 for 0 < 2𝜖 ≤ 2 6𝜏𝜕 𝑀 , 𝜕 𝑀 can be covered with
           𝑉𝜕𝑀
           𝜖2
                             n-dimensional balls of radius 2𝜖 .
𝜔 𝑑−1 (1− 2 ) 𝑑−2 ( 𝜖4 ) 𝑑−1
         96𝜏
             𝜕𝑀
                                                                                                  𝜖
    To cover region 𝑆2 , we instead pack it with balls of radius                                  2 and centers in 𝑆2 to get an
upper bound. Since the centers are in 𝑆2 , each ball will contain the geodesic ball of the same
radius, 𝐵 𝑀 (𝑜𝑖 , 2𝜖 ). The open geodesic ball 𝐵 𝑀 (𝑜𝑖 , 2𝜖 ) can not touch the boundary 𝜕 𝑀 since its
                        𝜖
center is at least      2  geodesic distance away. The cut-locus of 𝑜𝑖 is either at least 𝜋𝜏𝑀 away in
the geodesic distance or at least the distance of 𝑜𝑖 to the boundary in geodesic distance; and for
           √
0 < 2𝜖 ≤ 6𝜏𝑀 < 𝜋𝜏𝑀 the geodesic ball 𝐵 𝑀 (𝑜𝑖 , 2𝜖 ) does not touch the cut locus of 𝑜𝑖 . By the volume
                                                              √
comparison in the (2.5), we have 0 < 𝜖 ≤ 2 6𝜏𝑀 the volume of the the geodesic ball 𝐵 𝑀 (𝑜𝑖 , 2𝜖 ) is
                      𝜖 2 𝑑−1 𝜖 𝑑
at least 𝜔 𝑑 (1 −   24𝜏 2
                           ) ( 2) .         Since in a packing such balls are disjoint, the total volume of such
                                                                      21


                                                                          𝑉𝑀
balls can not exceed 𝑉𝑀 and their number is at most                      𝜖2
                                                                                            as claimed.                           ■
                                                              𝜔 𝑑 (1−        ) 𝑑−1 ( 𝜖2 ) 𝑑
                                                                      24𝜏 2
                                                                           𝑀
    We apply our estimate to the standard d-dimensional closed ball 𝐵 𝑑 as a manifold with boundary.
Corollary 2.2.7. Let 𝐵 𝑑 ↩→ R𝑛 for 𝑛 ≥ 2 be the standard 𝑑 dimensional closed ball of radius 1.
Then to cover it with 𝑛-dimensional balls of radius 𝜖 for 0 < 𝜖 < 1, ( 2𝜖 ) 𝑑 + 2𝜋( 4.05                    𝜖 )
                                                                                                                 𝑑−1 balls are
sufficient.
                                                                      𝑉𝑀                                𝑉𝜕𝑀
    Proof. From theorem 2.2.6, we need at most                       𝜖2
                                                                                         +              𝜖2
                                                                                                                              balls.
                                                           𝜔 𝑑 (1−        ) 𝑑−1 ( 𝜖2 ) 𝑑    𝜔 𝑑−1 (1−        ) 𝑑−2 ( 𝜖4 ) 𝑑−1
                                                                   24𝜏 2                              96𝜏 2
                                                                       𝑀                                 𝜕𝑀
The parameters are
                      𝑀     𝑉𝐵 𝑑      𝜏𝐵 𝑑    𝜕𝐵 𝑑     𝑉𝑆 𝑑−1         𝜏𝑆 𝑑−1         𝜔𝑑       𝜔 𝑑−1
                                 𝑑                           𝑑+1
                              𝜋2                           𝜋 2
                      𝐵𝑑    Γ( 𝑑2 +1)
                                      ∞       𝑆 𝑑−1     2 Γ( 𝑑+1
                                                                      1              𝑉𝐵 𝑑     𝑉𝐵 𝑑−1
                                                              2 )
    Using 0 < 𝜖 < 1 we get, ( 2𝜖 ) 𝑑 + 2𝜋 95    96 4 𝑑−1
                                           96 ( 95 𝜖 )    < ( 2𝜖 ) 𝑑 + 2𝜋( 4.05     𝜖 )
                                                                                          𝑑−1 .
                                                                                                                                  ■
2.3     Covering Estimate from Above Without Radius Restriction via Tubes
In this section we show it is possible to remove the restriction imposed on the covering numbers
by the injectivity radius in theorem 2.2.2. This is at the cost of an explicit dependence on the
ambient dimension. This dependence could be prohibitive when one is in spaces with extremely
high dimensions. For a covering radius larger than the diameter of a manifold, one expects the
covering number to be 1. Therefore one encounters 3 regimes for the covering radius: less than
the injectivity radius, larger than the injectivity radius but up to the diameter, and larger than the
diameter. Interpolating between these regimes is an interesting direction for further research. After
we derive our formula in this section, we will discuss this issue again in remark 2.3.3.
Proposition 2.3.1. Let 𝑀 ↩→ R𝑛 be a closed (compact, no boundary) submanifold of R𝑛 of
dimension 𝑑. Let 𝜏 > 0 be the reach of 𝑀, and 𝑉 denote the volume of 𝑀. Let 𝜔 𝑑 denote the
                                                       22


volume of the 𝑑-dimensional Euclidean ball of radius 1. Then for any 0 < 𝜖, one can cover 𝑀 with
              𝜖 𝑑
𝑉𝜔𝑛−𝑑 (1 + 2𝜏  )
          𝜖 𝑑     𝑛-dimensional Euclidean balls of radius 𝜖 with centers on 𝑀.
     𝜔𝑛 ( 2 )
    Remark. Compared to theorem 2.2.2, there is no upper bound on 𝜖.
                                                                                 𝜖
Proof. Instead of covering 𝑀, we pack it with 𝑛-Euclidean ball of radius         2 with centers on 𝑀. All
                                          𝜖
such balls are within the tube of radius  2 around 𝑀. Since the balls are disjoint the number of balls
is bounded above by
                                                                  𝜖
                               Volume of Tube of radius           2   around 𝑀
                                                                                                    (2.6)
                                                  𝜔𝑛 ( 2𝜖 ) 𝑛
To establish the proposition, we need to show that
                                            𝜖                                𝜖       𝜖
               Volume of Tube of radius       around 𝑀 ≤ 𝑉𝑀 𝜔𝑛−𝑑 ( ) 𝑛−𝑑 (1 + ) 𝑑                   (2.7)
                                            2                                2      2𝜏
                                                                               𝜖        𝜖
                                                              ≤ 𝑉𝑀 Vol(𝐵𝑛−𝑑 ( ))(1 + ) 𝑑 .          (2.8)
                                                                               2       2𝜏
    For 𝜖 < 𝜏, the exact volume of this tube was found by Herman Weyl in [36] (see the textbook
of Gray [22] for a more detailed explanation). Lotz [26, theorem 3.1] gives an overestimate for the
volume of the tube for all values of 𝜖 as follows.
    Let 𝑆 be the shape operator of 𝑀 defined via a local orthonormal frame 𝐸 1 , .., 𝐸 𝑑 on 𝑀 and 𝑣
a unit normal vector to 𝑀
                                  𝑆(𝑣)𝑖 𝑗 := 𝑆 𝑣 (𝐸𝑖 , 𝐸 𝑗 ) = ⟨∇𝐸𝑖 𝐸 𝑗 , 𝑣⟩                        (2.9)
    Let 𝜓𝑖 (𝑣) be defined using the characteristic polynomials of the shape operator 𝑆 as
                                                               𝑑
                                                              ∑︁
                                    det(Id − 𝑡𝑆(𝑣)) =             𝑡 𝑖 𝜓𝑖 (𝑣)                       (2.10)
                                                              𝑖=0
    Let 𝑆(𝑁 𝑀) be the unit sphere bundle of the normal bundle of 𝑀 and define the volume form
𝜔 𝑆(𝑁 𝑀) via the Sasaki metric [32]. Define 𝐾𝑖 as
                                            ∫
                                     𝐾𝑖 :=             |𝜓𝑖 (𝑣)|𝜔 𝑆(𝑁 𝑀)                            (2.11)
                                              𝑆(𝑁 𝑀)
                                                     23


Then Lotz’s overestimate is give by
                                                                                𝑑
                                                   𝜖                   𝜖      ∑︁     𝐾𝑖     𝜖
                  (Volume of Tube of radius          around 𝑀) ≤ ( ) 𝑛−𝑑                   ( )𝑖        (2.12)
                                                   2                   2      𝑖=0
                                                                                  𝑛−𝑑 +𝑖 2
                 1
We have that     𝜏   is an upper bound for all eigenvalues of the shape operator. Since 𝜓𝑖 (𝑣) are the
coefficients of the characteristic polynomial of S(v), we have
                                                     
                                                     𝑑 1 𝑖
                                        |𝜓𝑖 (𝑣)| ≤      ( )                                            (2.13)
                                                     𝑖 𝜏
                                                    ∫
                                             𝐾𝑖 :=           |𝜓𝑖 (𝑣)|𝜔 𝑆(𝑁 𝑀)                          (2.14)
                                                      𝑆(𝑁 𝑀)
                                                                    
                                                                    𝑑 1 𝑖
                                                 ≤ vol(𝑆(𝑁 𝑀))         ( ).                            (2.15)
                                                                    𝑖 𝜏
where vol(𝑆(𝑁 𝑀)) = vol(𝑀)vol(𝑆 𝑛−𝑑−1 ). Therefore using (2.12) we get
                                                                                      𝑑   𝑑 1 𝑖
                                    𝜖                                 𝑛−𝑑−1 𝜖 𝑛−𝑑         𝑖 (𝜏)   𝜖
                                                                                     ∑︁
   (Volume of Tube of radius around 𝑀) ≤ vol(𝑀)vol(𝑆                        )( )                 ( ) 𝑖 (2.16)
                                    2                                          2     𝑖=0
                                                                                         𝑛−𝑑 +𝑖 2
    Using vol(𝑆 𝑛−𝑑−1 ) = (𝑛 − 𝑑)vol(𝐵𝑛−𝑑 )
                                                                               𝑑           
                                      𝜖                             𝑛−𝑑 𝜖
                                                                              ∑︁   𝑛−𝑑 𝑑 𝜖 𝑖
     (Volume of Tube of radius around 𝑀) ≤ 𝑉𝑀 Vol(𝐵 ( ))                                      ( )      (2.17)
                                      2                                  2 𝑖=0 𝑛 − 𝑑 + 𝑖 𝑖 2𝜏
                                                                         𝜖         𝜖
                                                       ≤ 𝑉𝑀 Vol(𝐵𝑛−𝑑 ( ))(1 + ) 𝑑                      (2.18)
                                                                         2        2𝜏
as claimed.                                                                                                ■
    Now we present a similar argument for when a boundary is present.
Proposition 2.3.2. (Via Tube Formula) Let 𝑀 ↩→ R𝑛 be a compact 𝑑-dimensional submanifold of
R𝑛 with boundary. Let 𝜏𝑀 > 0 be the reach of 𝑀\𝜕 𝑀, and 𝑉𝑀 denote the volume of 𝑀. Let 𝜕 𝑀
denote the boundary of 𝑀 as an independent submanifold of R𝑛 . Let 𝜏𝜕 𝑀 > 0 be the injectivity
radius of the normal exponential map of 𝜕 𝑀 and let 𝑉𝜕 𝑀 be the volume of 𝜕 𝑀. Let 𝜔𝑛 be volume
of the unit Euclidean 𝑛-dimensional ball. Then for all 0 < 𝜖, 𝑀 can be covered with
𝑉𝑀 𝜔𝜔𝑛−𝑑
      𝑛
         ( 𝜏1𝑀 + 2𝜖 ) 𝑑 + 𝑉𝜕 𝑀 𝜔𝑛−𝑑+1    1    2 𝑑−1
                                𝜔 𝑛 ( 𝜏𝜕𝑀 + 𝜖 )      n-dimensional Euclidean balls of radius 𝜖.
    Proof. Instead of covering 𝑀 with Euclidean balls of radius 𝜖, we pack it with balls of radius
𝜖                                                                                                           𝜖
2.  We restrict the centers to be on 𝑀. Any such ball is contained in the set of points of distance         2
                                                         24


from 𝑀. We will bound this volume from above. Let 𝐴1 be the image of normal exponential map
of 𝑀\𝜕 𝑀 with radius 2𝜖 . Let 𝐴2 be the image of normal exponential map of 𝜕 𝑀 with radius 2𝜖 .
                                                                     𝜖 𝑑
Volume of 𝐴 is bounded above by 𝑉𝑀 Vol(𝐵𝑛−𝑑 ( 2𝜖 ))(1 +             𝜏𝑀 ) .    Volume of 𝐵 is bounded above by
                               𝜖 𝑑−1
𝑉𝜕 𝑀 Vol(𝐵𝑛−(𝑑−1) ( 2𝜖 ))(1 + 𝜏𝜕𝑀 ) . Therefore the number of disjoint packing balls can be bounded
by dividing by the volume of a single ball.
                         𝑉𝑀 Vol(𝐵𝑛−𝑑 ( 2𝜖 ))(1 +   𝜖 𝑑
                                                  𝜏𝑀 )  + 𝑉𝜕 𝑀 Vol(𝐵𝑛−(𝑑−1) ( 2𝜖 ))(1 +             𝜖 𝑑−1
                                                                                                  𝜏𝜕𝑀 )
       num of balls ≤                                                                                                (2.19)
                                                         𝜔𝑛 ( 2𝜖 ) 𝑛
                             𝜔𝑛−𝑑         𝜖 𝑑 1              𝜔𝑛−𝑑+1                  𝜖 𝑑−1 1
                      ≤ 𝑉𝑀         (1 +      ) 𝜖 𝑑 + 𝑉𝜕 𝑀                  (1 +         )                            (2.20)
                              𝜔𝑛         2𝜏𝑀 ( 2 )                𝜔𝑛              2𝜏𝜕 𝑀       ( 2𝜖 ) 𝑑−1
                            𝜔𝑛−𝑑 1         2           𝜔𝑛−𝑑+1 1               2
                      = 𝑉𝑀        (     + ) 𝑑 + 𝑉𝜕 𝑀             (        + ) 𝑑−1                                    (2.21)
                              𝜔𝑛 𝜏𝑀 𝜖                    𝜔𝑛        𝜏𝜕 𝑀 𝜖
Remark 2.3.3. The bound in proposition 2.3.2, which is based on overestimating the volume of a
tube around the manifold, is valid for all radii of covering balls. This is in contrast to the approach
using Gunther’s theorem where one is restricted by the injectivity radius of the manifold. However
the tube bounds depend on the ambient dimension 𝑛 while Gunther’s bound only depends on the
dimension of the manifold. This is because for small enough radii, one is close to the surface of the
manifold. However for large radii, one must account for the dimension of the ambient space. This
dependence on the ambient dimension is prohibitive specially when one is in spaces with extremely
large dimension.
    We show the dependence on the ambient dimension with a calculation. Using the formula for
                                                                       𝑑
                                                                     𝜋2                    𝜔 𝑛−𝑑         1   Γ( 𝑛2 +1)
the volume of a Euclidean ball in dimension 𝑑, 𝜔 𝑑 =              Γ( 𝑑2 +1)
                                                                            .  One gets             =     𝑑
                                                                                                            Γ( 𝑛−𝑑
                                                                                                                       . In
                                                                                                                 2 +1)
                                                                                            𝜔𝑛
                                                                                                       𝜋2
particular if both 𝑛, 𝑑 are even and 𝑑 < 𝑛2 ,
                      𝜔𝑛−𝑑      1 𝑛−𝑑               𝑛      1 𝑛−𝑑                   𝑑      𝑛 𝑑
                             = 𝑑(           + 1)...( ) ≥ 𝑑 (                 + 1) 2 ≥ ( ) 2 .
                        𝜔𝑛     𝜋2      2            2     𝜋2          2                  4𝜋
                                                                                 𝜖 𝑑
                                                            𝑉𝜔𝑛−𝑑 (1 +          2𝜏 )
    One can consider the limit 𝜖 → ∞ in the formula                                   from proposition 2.3.1. The
                                                                    𝜔𝑛 ( 2𝜖 ) 𝑑
                  𝜔𝑛−𝑑 𝑉
limit scales as (       ) . For a compact manifold, the covering number is 1 once 𝜖 > diameter(𝑀).
                   𝜔𝑛 𝜏 𝑑
This difference is in part because in the overestimate we ignore the overlaps as the tube gets large.
Incorporating the overlaps in the covering bound is an interesting direction for future research.
                                                      25


2.4     Covering Estimate from Below via Bishop’s Theorem
In this section we give lower bounds for covering numbers of a manifold. These bounds are used
in chapter 5 for discussing necessary conditions for JL maps.
Theorem 2.4.1. Let 𝑀 be a d-dimensional smooth submanifold of R𝑛 possibly with boundary, with
                                           𝜏                        2𝜖                  𝜋 𝑑/2
volume 𝑉 and reach 𝜏. Let 0 < 𝜖 ≤          2,  and 𝑟 = 𝜖 (1 +        𝜏 ).   Let 𝜔 𝑑 =  Γ( 𝑑2 +1)
                                                                                                 be the volume of
the standard unit 𝑑-ball. Then the covering number of 𝑀 with balls centered on 𝑀 and radius 𝜖
satisfies
                                              𝑉
                                              √              ≤ 𝐶 (𝑀, 𝜖) .
                                               2𝑟 𝑑−1 𝑑
                                    𝜔 𝑑 (1 +  3𝜏 )      𝑟
                                                                                                        √
                                                                                                          2𝑟 𝑑−1 𝑑
     Proof. We claim a single ball can cover at most a subset of 𝑀 with volume 𝜔 𝑑 (1 +                  3𝜏 )    𝑟 .
This claim immediately implies the theorem.
                                                                                                       𝜏
For two points 𝑝, 𝑞 ∈ 𝑀 with Euclidean distance 𝛿, geodesic distance 𝑙, and 𝛿 ≤                        2,  one has
              2𝛿
𝑙 ≤ 𝛿(1 +     𝜏 ).  Hence the intersection of 𝑀 and a Euclidean ball of radius 𝜖 centered on 𝑀 is
contained in a geodesic balls of radius 𝑟 = 𝜖 (1+ 2𝜖𝜏 ) with the same center provided 𝜖 ≤ 2𝜏 . Therefore
                                                                                                   √
                                                                                                     2𝑟 𝑑−1 𝑑
it is sufficient to bound the volume of such a geodesic ball from above with 𝜔 𝑑 (1 +               3𝜏 )     𝑟 .
     We use Bishop’s volume comparison theorem [See [19], page 169, theorem 3.101, part i]. By
properties of reach, theorem 2.1.7, the injectivity radius of each point of 𝑀 is at least 𝜋𝜏, and
                                                        −2
all sectional curves of 𝑀 are bounded below by           𝜏2
                                                            . Hence by Bishop’s theorem the volume of a
geodesic ball of radius 𝑟 in 𝑀 is bounded above by the volume of the geodesic ball of radius 𝑟 in
                                                                          −2
the simply connected space with constant sectional curvature              𝜏2
                                                                             . Denote this volume by 𝑉 −2 (𝑟); it
                                                                                                            𝜏2
is given by the formula
                                                        ∫   𝑟
                                               2𝜋 𝑑/2
                                   𝑉 −2 (𝑟) =                 𝑆 −2 (𝑥) 𝑑−1 𝑑𝑥
                                     𝜏2        Γ( 𝑑2 )    0     𝜏2
                                                       26


                           √                  sinh(𝑥)
where 𝑆 𝑘 (𝑥) = √1   sinh( −𝑘𝑥). Since                  is increasing on 0 < 𝑥,
                 −𝑘                               𝑥
                                                    ∫    𝑟
                                           2𝜋 𝑑/2
                             𝑉 −2 (𝑟) =                    𝑆 −2 (𝑥) 𝑑−1 𝑑𝑥
                               𝜏2          Γ( 𝑑2 )    0       𝜏2
                                                                      √
                                                                        2𝑥
                                                             sinh(      𝜏 ) 𝑑−1 𝑑−1
                                                    ∫    𝑟
                                           2𝜋 𝑑/2
                                       ≤                   (      √          ) 𝑥 𝑑𝑥
                                            Γ( 𝑑2 ) 0               2𝑥
                                                                    𝜏
                                                               √
                                                                 2𝑟
                                           2𝜋  𝑑/2    sinh( 𝜏 ) 𝑑−1 𝑟 𝑑
                                       ≤            (      √          )         .
                                            Γ( 𝑑2 )          2𝑟               𝑑
                                                             𝜏
                                                                                                          √     √
Here we use an approximation for sinh(𝑥)                                  𝜏
                                          𝑥 . Since 0 < 𝜖 ≤ 2 , then 𝑟 = 𝜖 (1 +
                                                                                            2𝜖
                                                                                             𝜏 ) ≤ 𝜏, and  2𝑟
                                                                                                           𝜏  ≤   2.
                                     √
Using sinh(𝑥)
         𝑥    ≤ 1 + 3𝑥 for 0 < 𝑥 ≤ 2,
                                                               √
                                             2𝜋 𝑑/2               2𝑟 𝑑−1 𝑟 𝑑
                                          ≤            (1  +          )
                                              Γ( 𝑑2 )            3𝜏           𝑑
                                                          √
                                                            2𝑟 𝑑−1 𝑑
                                          = 𝜔 𝑑 (1 +             ) 𝑟
                                                           3𝜏
                                                                                                                  ■
Corollary 2.4.2. Let 𝑀 be a d-dimensional smooth submanifold of R𝑛 possibly with boundary,
                                                                       𝜋 𝑑/2
with volume 𝑉 and reach 𝜏. Let 0 < 𝜖 ≤ 2𝜏 . Let 𝜔 𝑑 =                Γ( 𝑑2 +1)
                                                                               be the volume of the standard 𝑑-ball.
Then the covering number of 𝑀 with balls centered on 𝑀 and radius 𝜖 satisfies
                                                 𝑉
                                                           ≤ 𝐶 (𝑀, 𝜖) .
                                            𝜔 𝑑 (3𝜖) 𝑑
    Proof. We use the bound            √
                                        𝑉
                                        2𝑟 𝑑−1 𝑑
                                                    ≤ 𝐶 (𝑀, 𝜖). From 0 < 𝜖 ≤ 2𝜏 , we simplify 𝑟 = 𝜖 (1 + 2𝜖𝜏 )
                               𝜔 𝑑 (1+ 3𝜏 )    𝑟
                                                                    √
                                                                       2
to 𝑟 ≤ 2𝜖 and 𝑟 ≤ 𝜏. We get the factor of 3 from (1 +                 3 )2   < 3.                                 ■
                                                           27


2.5     Covering Estimate from Above for the Unit Secants of a Submanifold
In this section we calculate a covering number for the unit secants generated by a compact subman-
ifold of the Euclidean space with boundary in terms of the covering numbers for the manifold. The
secants of a manifold have been studied previously in [24, section 3], [30, page 1323], [37, section
1] and [38, section 3]. In our approach here, we use the covering numbers to estimate the Gaussian
width of the set of unit secants.
     We give a list of secants that form an 𝜖-net for the set of unit secants. The distance between
two unit secants is calculated by their distance on the unit sphere. The secants of 𝑀 are separated
into long and short secants; the long secants are compared to the secants generated by a net on the
manifold and the short secants are compared to tangent vectors anchored at the net points. The
technique of separating the cords into long and short have been previously used in [14, page 5],
[15, lemma 9], and [23, section 3].
     For long secants, we have the following lemma.
Lemma 2.5.1. [14, lemma 4.1] Let 𝑝, 𝑝 ∗ , 𝑞 and 𝑞 ∗ be 4 points in R𝑛 . Let 0 < 𝑙 = || 𝑝 − 𝑞|| and
                                                              4𝑑
|| 𝑝 − 𝑝 ∗ ||, ||𝑞 − 𝑞 ∗ || < 𝑑. Let 0 < 𝜖 < 1 and assume      𝑙 ≤ 𝜖. Then
                                               𝑝−𝑞        𝑝∗ − 𝑞∗
                                         ||           − ∗            || ≤ 𝜖 .                          (2.22)
                                            || 𝑝 − 𝑞|| || 𝑝 − 𝑞 ∗ ||
     We now present the covering argument for the unit secants. In the proof, we develop a lemma
for short secants similar to long ones.
Theorem 2.5.2. (Covering the unit secants) Let 𝑀 ↩→ R𝑛 be a compact 𝑑-dimensional submanifold
of R𝑛 with boundary 𝜕 𝑀. Let 𝜏𝑀 be the reach of 𝑀. Let 𝜏𝑖 be the reach of the 𝑖-th connected
component of 𝜕 𝑀 as a submanifold of R𝑛 . Let 𝜏 = inf 𝑀,𝑖 {𝜏𝑀 , 𝜏𝑖 }. Let 𝑉𝑀 be the volume of 𝑀 and
                                                          𝑝−𝑞
𝑉𝜕 𝑀 be the volume of 𝜕 𝑀. Let 𝑈 (𝑀 − 𝑀) = { || 𝑝−𝑞||         | 𝑝 ≠ 𝑞, 𝑝, 𝑞 ∈ 𝑀 } be the set of unit secants
of 𝑀, and let 𝑈 (𝑀 − 𝑀) be its closure. Fix 0 < 𝜖 < 1. Define 𝛼 = 𝑉𝜔𝑀𝑑 ( 41              𝑉𝜕𝑀 81 𝑑−1
                                                                                  𝜏 ) + 𝜔 𝑑−1 ( 𝜏 )
                                                                                     𝑑                . Then
                                         2                
𝑈 (𝑀 − 𝑀) can be covered with 𝛼2 + 𝛼(2𝑑3𝑑−1 ) 𝜖 14𝑑 n-dimensional Euclidean balls of radius 𝜖.
                                                        28


       Proof. We give a list of unit vectors in R𝑛 , count its elements, and prove that every element
of 𝑈 (𝑀 − 𝑀) is within 𝜖 distance of one of them. In this approach we follow [15]. The list is as
follows.
                               2
     1. Consider a ( 𝜏𝜖      20 )-net of points in 𝑀. Include all the unit secant lines generated by pair of
            these points in the list. If 𝜏 = ∞ then 1 point is sufficient for the net. (That is because infinite
            reach corresponds to affine spaces and all secants can be compared to tangent vectors at one
            point).
     2. For each point in the above net, consider a ( 3𝜖 )-net for its unit tangent sphere. Include all
            such unit vectors in the list. If an anchor point is in the boundary, there is a half unit sphere
            for inward pointing vectors and we consider the unit 𝑆 𝑑−1 as its extension.
       We must show that the list above is an 𝜖-net for 𝑈 (𝑀 − 𝑀). To do so divide the secant lines
                                                                                                           𝜏𝜖
into a long and short set based on their length. The cut-off between long and short secants is             5.  If
𝜏 = ∞ then all secants count as short.
       Long Secants: For a long secant between 𝑝, 𝑞 ∈ 𝑀, let 𝑝 ∗ and 𝑞 ∗ be the closest point to them
                                        2
in the net on 𝑀. Since 2( 𝜏𝜖          20 ) <
                                                𝜏𝜖
                                                5  ≤ || 𝑝 − 𝑞||, we are guaranteed that 𝑝 ∗ and 𝑞 ∗ are distinct
points. By lemma 2.5.1 the distance between the unit secants of 𝑝 − 𝑞 and 𝑝 ∗ − 𝑞 ∗ is bounded by
        2             2
4( 𝜏20𝜖
          )     ( 𝜏 5𝜖 )
 || 𝑝−𝑞||    ≤   ( 𝜏5𝜖 ) ≤ 𝜖. Therefore all long secants are covered by the secants in item 1 above.
       Short Secants: For a short secant, we first exchange it for the starting tangent direction for the
distance minimizing geodesic connecting its base points. Second, we exchange that tangent vector
for its parallel transport along the distance minimizing geodesic to the tangent space of the closest
                                            2
point to the base point in the ( 𝜏𝜖       20 )-net of 𝑀. Third, we exchange the parallel transported vector for
one of the vectors in the item 2 above. In this process, we will incur 3 errors which we denote by
𝜖1 , 𝜖2 and 𝜖3 . We must show that 𝜖1 + 𝜖2 + 𝜖3 ≤ 𝜖.
       Exchanging a short secant for a tangent vector. Here we bound 𝜖1 by 3𝜖 . Using theorem
                                                                                                   𝑑       2𝑑 2
2.1.7, the angle between the initial tangent vector and the secant satisfies sin(𝜙) ≤              2𝜏 (1 +  𝜏 ) .
                                                            29


                                                                                       49𝜖
Because the cutoff for short secants is       𝜏𝜖
                                              5,    substituting for 𝑑, sin(𝜙) ≤       250 .   Using 2 sin( 𝜙2 ) as the
distance on a unit sphere for 2 vectors with angle 𝜙,
                                               𝜙        3              147𝜖      𝜖
                                  𝜖1 = 2 sin( ) < sin(𝜙) ≤                     < .
                                               2        2               500      3
      Parallel transport to an anchor point. Here we bound 𝜖2 by 3𝜖 . Here we parallel transport an
initial geodesic tangent vector from the base point of the secant to an anchor point. Using theorem
                                                                                                   2
2.1.7, 𝜃 ≤ 𝜏𝑙 . The Euclidean distance to the closest net point is at most 𝑑 ≤ ( 𝜏𝜖             20 ). We translate to
                                                               2𝑑 2
the geodesic distance using theorem 2.1.7, 𝑙 ≤ 𝑑 +              𝜏 . It follows that using 𝜖 ≤ 1
                                    𝜃           𝑙      𝑑        2𝑑        𝜖2       𝜖2         𝜖
                       𝜖2 = 2 sin( ) ≤ 𝜃 ≤ ≤ (1 + ) 2 ≤                       (1 + ) 2 < .
                                    2          𝜏       𝜏         𝜏        20       10         3
                                                                                                                𝜏𝜖 2
      Here we had room to spare for obtaining the bound 3𝜖 . However the cut-off of                     𝜏𝜖
                                                                                                        3  and   20  are
needed for the exchanging a short secant with a tangent vector and for handling the long secants.
      Exchange with a preselected tangent vector: At the anchor point, we exchange the parallel
transported tangent vector with a vector in a net for the unit sphere at the anchor point. By choice
a ( 3𝜖 )-net is considered at the anchor point hence, 𝜖3 ≤ 3𝜖 . Therefore together 𝜖1 + 𝜖2 + 𝜖3 ≤ 𝜖.
      It remains to count the number of unit vectors in the above list. For item 1, by theorem 2.2.6,
                        2                           2
one can create a ( 𝜏𝜖                            𝜏𝜖
                      20 )-net for 𝑀 with 𝐶 ( 20 ) when 𝐶 (𝜖) is defined as
                                          𝑉𝑀                                 𝑉𝜕 𝑀
                       𝐶 (𝜖) =                               +                                   .
                                          𝜖2                                  𝜖2
                                𝜔 𝑑 (1 −    2
                                         24𝜏𝑀
                                              ) 𝑑−1 ( 2𝜖 ) 𝑑   𝜔 𝑑−1 (1 −      2 )
                                                                            96𝜏𝜕𝑀
                                                                                   𝑑−2 ( 𝜖 ) 𝑑−1
                                                                                         4
                                                                                                   2
The number of unit secants generated by such points is bounded above by 12 𝐶 ( 𝜏𝜖                    2
                                                                                                 20 ) . For directions
in the list 2, the anchor points could be in the interior or in the boundary of 𝑀. We consider the
tangent 𝑆 𝑑−1 for the unit vectors for all of them; if an anchor point is in the boundary, there is a
half unit sphere for inward pointing vectors and we consider the unit 𝑆 𝑑−1 as its extension. By [35,
corollary 4.2.13], each 𝑆 𝑑−1 can be covered by ( 3𝜖 ) 𝑑 unit vectors. Since there is a unit sphere for
                                                           30


                                                                  2   3 𝑑
each point in the net from item 1, there are 𝐶 ( 𝜏𝜖             20 )( 𝜖 ) . Adding the two estimates together there
                      2           2
are at most 21 𝐶 ( 𝜏𝜖   2      𝜏𝜖     3 𝑑
                   20 ) + 𝐶 ( 20 )( 𝜖 ) unit vectors. This number is rather inconvenient and we will
                             2
simplify it. One has 𝐶 ( 𝜏𝜖20 )
                  𝜏𝜖 2                      𝑉𝑀                                          𝑉𝜕 𝑀
              𝐶(        )=                𝜖2 2             𝜏𝜖2
                                                                    +                   2               2
                                                                                                                .
                   20                 ( 𝜏20  )           (  20 )
                                                                                       𝜖 2
                                                                                   ( 𝜏20   ) 𝑑−2 ( 𝜏20𝜖
                                                                                                          ) 𝑑−1
                           𝜔 𝑑 (1 −    24𝜏𝑀 2   ) 𝑑−1 (     2    )𝑑    𝜔 𝑑−1 (1 −  96𝜏𝜕𝑀 2    )  (   4 )
    We note 0 < 𝜖 ≤ 1, 𝜏 = min{𝜏𝑀 , 𝜏𝜕 𝑀 }, and
                                           (2 × 20) 𝑑
                                                    4
                                                                  ≤ 40.005𝑑 < 41𝑑
                                     (1  − (24)𝜖(202 ) ) 𝑑−1
                                           (4 × 20) 𝑑
                                                                  ≤ 80.003𝑑 < 81𝑑 .
                                                  𝜖4
                                     (1 −     (96) (202 )
                                                          ) 𝑑−1
                 𝑉𝑀 41 𝑑       𝑉𝜕𝑀 81 𝑑−1
    Define 𝛼 =    𝜔𝑑 ( 𝜏 ) +   𝜔 𝑑−1 ( 𝜏 )       .  Then
                                                           𝜏𝜖 2         𝛼
                                                      𝐶(         ) ≤ 2𝑑
                                                            20        𝜖
Therefore we get the overall upper bound as
                                  𝛼2 1
                                                                      2          
                                                     𝛼 3 𝑑              𝛼       𝑑      1
                                        4𝑑
                                              + ( 2𝑑 )( ) ≤                + 𝛼3             .
                                    2 𝜖            𝜖         𝜖           2           𝜖 4𝑑
                                                                                                                           ■
    For an example we apply our covering estimate to the standard 𝑆 𝑑 as a corollary. We note that
the set of the unit secants of 𝑆 𝑑 is equal to 𝑆 𝑑 itself, hence there is redundancy as many pairs of
points achieve the same unit secant. It is an interesting geometry question to find submanifolds that
their secants avoid being parallel. In this direction there is the work on totally skew embeddings
[21]. Such submanifolds would be great candidates for benchmarking JL maps as their unit secants
is expected to be large.
                                                                                 √      2𝑑
Corollary 2.5.3. For covering the unit secants of 𝑆 𝑑 ↩→ R𝑛 , 14𝑑 𝜖 4𝑑             𝑑41
                                                                                            balls of radius 𝜖 are sufficient.
                                                                31


                                                                     
                                                             𝛼2           1                𝑉𝑀 41 𝑑
    Proof. We calculate using theorem 2.5.2. We need         2  + 𝛼3𝑑   𝜖 4𝑑
                                                                             balls for 𝛼 = 𝜔𝑑 ( 𝜏 )   +
                                                                                                   𝑑+1
𝑉𝜕𝑀 81 𝑑−1                                                                                       𝜋 2
𝜔 𝑑−1 ( 𝜏 )   . We drop the boundary term since 𝑆 𝑑 has no boundary. We have 𝜏 = 1, 𝑉𝑆 𝑑 = 2    Γ( 𝑑+1
                                                                                                    2 )
                         𝑑
                                   𝑉 𝑑      √ Γ( 𝑑 +1)       √                          √
and 𝑉𝐵 𝑑 = 𝜔 𝑑 = Γ(𝜋𝑑 2+1) . Then 𝜔𝑆𝑑 = 2 𝜋 Γ( 2𝑑+1 ) ≤ 2 𝜋𝑑. Therefore 𝛼 ≤ 2 𝜋𝑑41𝑑 , and
                       2                            2
𝛼2
                 √
          𝑑 ≤ 14𝑑 𝑑412𝑑 .
 2  + 𝛼3                                                                                              ■
                                                32


2.6    Gaussian Width Estimate for Unit Secants From Above
Definition 2.6.1. Let 𝑔 be a standard Gaussian random variable in R𝑛 . Define the Gaussian Width
of 𝑆 ⊂ R𝑛 , 𝜔(𝑆), as follows.
                                          𝜔(𝑆) = E sup ⟨𝑔, 𝑥⟩                                    (2.23)
                                                     𝑥∈𝑆
Theorem 2.6.2. (Via Gunther) Let 𝑀 ↩→ R𝑛 be a compact 𝑑-dimensional submanifold of R𝑛 with
boundary 𝜕 𝑀. Let 𝜏𝑀 be the reach of 𝑀. Let 𝜏𝑖 be the reach of the 𝑖-th connected component of
𝜕 𝑀 as a submanifold of R𝑛 . Let 𝜏 = inf 𝑀,𝑖 {𝜏𝑀 , 𝜏𝑖 }. Let 𝑉𝑀 be the volume of 𝑀 and 𝑉𝜕 𝑀 be the
                                      𝑝−𝑞
volume of 𝜕 𝑀. Let 𝑈 (𝑀 − 𝑀) = { || 𝑝−𝑞||  | 𝑝 ≠ 𝑞, 𝑝, 𝑞 ∈ 𝑀 } be the set of unit secants of 𝑀, and let
𝑈 (𝑀 − 𝑀) be its closure. Let Gaussian Width of 𝑈 (𝑀 − 𝑀) be 𝜔(𝑈 (𝑀 − 𝑀)).
    Define
                                        𝑉𝑀 41 𝑑 𝑉𝜕 𝑀 81 𝑑−1
                                  𝛼=         ( ) +           ( )                                 (2.24)
                                        𝜔𝑑 𝜏           𝜔 𝑑−1 𝜏
                                          𝛼2
                                   𝑐 = ( + 3𝑑 𝛼)                                                 (2.25)
                                          2
Then
                                                      √ √︁
                                𝜔(𝑈 (𝑀 − 𝑀)) ≤ 4 2 log(𝑐) + 4𝑑                                   (2.26)
    Proof. We use the covering number bounds in theorem 2.5.2 and Dudley’s inequality [18, page
226]. Let 𝐶 (𝜖) denote the covering number of 𝑈 (𝑀 − 𝑀) with n-dimensional Euclidean balls of
                                    2
radius 𝜖. We have that 𝐶 (𝜖) ≤ ( 𝛼2 + 2𝑑3𝑑 𝛼) 𝜖 14𝑑 for 0 < 𝜖 ≤ 1 and 𝐶 (𝜖) = 1 for 𝜖 > 1 because
𝑈 (𝑀 − 𝑀) is a subset of the 𝑆 𝑛−1 . Dudley’s inequality gives us
                                                  √ ∫     ∞ √︁
                            𝜔(𝑈 (𝑀 − 𝑀)) ≤ 4 2                log(𝐶 (𝜖))𝑑𝜖 .                     (2.27)
                                                        0
Therefore
                                                    √︂
                                           √ ∫    1
                                                             𝛼2             1
                      𝜔(𝑈 (𝑀 − 𝑀)) ≤ 4 2               log((    + 2𝑑3𝑑 𝛼) 4𝑑 )𝑑𝜖 .               (2.28)
                                                0             2           𝜖
                                                   33


            2
Let 𝑐 = ( 𝛼2 + 2𝑑3𝑑 𝛼), then by Cauchy-Schwartz
                                                    √︂
                                           √ ∫    1
                                                                        1
                     𝜔(𝑈 (𝑀 − 𝑀)) ≤ 4 2               log(𝑐) + 4𝑑 log( )          (2.29)
                                                0                       𝜖
                                                             √︄
                                           √ √︁         ∫  1
                                                                      4𝑑       1
                                        ≤ 4 2 log(𝑐)           1+          log( ) (2.30)
                                                         0         log(𝑐)      𝜖
                                                       √︄
                                           √ √︁           ∫ 1
                                                                     4𝑑        1
                                        ≤ 4 2 log(𝑐)           1+         log( ). (2.31)
                                                           0      log(𝑐)       𝜖
With the identity
                        ∫   1
                                        1                             1
                              1 + 𝑘 log( )𝑑𝑥 = 1 − 𝑘 (𝑥 log(𝑥) − 𝑥) 0 = 1 + 𝑘.    (2.32)
                          0             𝑥
We continue
                                                              √︄
                                                  √ √︁                 4𝑑
                              𝜔(𝑈 (𝑀 − 𝑀)) ≤ 4 2 log(𝑐) 1 +                       (2.33)
                                                                     log(𝑐)
                                                  √ √︁
                                             ≤ 4 2 log(𝑐) + 4𝑑                    (2.34)
as required.                                                                          ■
                                                   34


                                            CHAPTER 3
                               MORE ESTIMATES WITH REACH
3.1     Closed Submanifolds with Reach 1 and Minimum Volume
                                                                                   𝑉
In the earlier covering estimates such as theorem 2.6.2 we encounter the ratio     𝜏𝑑
                                                                                      for 𝑑 dimensional
manifolds. This ratio is scaling invariant and hence it is natural to ask if there is a minimum value
for it over closed manifolds. We will also encounter this ratio again in theorem 5.0.5 for a necessary
condition for JL maps. Therefore here we investigate if it has a lower bound.
    In particular if we consider the case when 𝑑 = 1, and the manifold is topologically equivalent
to S1 , and 𝜏 = 1, then the minimum of the ratio is just the minimum length of a closed curve with
reach one. Here we can show that for a closed 𝑑 dimensional manifold the minimum value of this
ratio is given by the volume of the standard S𝑑 . In section 3.2 we also consider the minimum value
of this ratio for a torus embedded in 3 dimensions, which has a topological constraint.
Proposition 3.1.1. Let 𝑀 ⊂ R𝑁 be a closed smooth d dimensional manifold of reach 1. Then the
volume of 𝑀 is at least Vol(S𝑑 ) which is the volume of the standard unit S𝑑 .
    Proof. We present the proof in 4 steps.
    1. The injectivity radius of a closed manifold of reach 𝜏 is at least 𝜋𝜏. So for 𝜏 = 1, the injectivy
radius is at least 𝜋. See theorem 2.1.7
                                                                1
    2. Sectional curvatures of 𝑀 are all abounded above by      𝜏2
                                                                   . For 𝜏 = 1, they are all bounded by
1.
    3. Consider the geodesic ball of radius 𝜋 around an arbitrary point. Because the injectivity
radius is at least 𝜋, then we can apply Gunther’s volume comparison theorem. Then the volume
of the geodesic ball is at least the volume of the geodesic ball of radius 𝜋 in the space of uniform
                                                  35


curvature one. The general formula is
                                                             ∫   𝑟
                                                    2𝜋 𝑑/2
                                       𝑉 1 (𝑟) =                    𝑆 1 (𝑥) 𝑑−1 𝑑𝑥
                                         𝜏2         Γ( 𝑑2 )   0       𝜏2
                                √
    where 𝑆 𝑘 (𝑥) =   √1  sin( 𝑘𝑥) for 𝑘 > 0. So we can rewrite
                        𝑘
                                  ∫  𝑟                               ∫   𝑟
                         2𝜋 𝑑/2                𝑑−1          2𝜋 𝑑/2                 𝑥  𝑑−1
                                       𝑆 1 (𝑥)     𝑑𝑥 =                      𝜏 sin            𝑑𝑥.
                          Γ( 𝑑2 )  0     𝜏2                 Γ( 𝑑2 )    0             𝜏
    Here we want 𝑟 = 𝜋 and 𝜏 = 1 so we get
                                                      ∫    𝜋
                                              2𝜋 𝑑/2
                                        𝑉=                   (sin (𝑥)) 𝑑−1 𝑑𝑥.
                                              Γ( 𝑑2 )   0
This is the formula for the volume of the 𝑑 sphere, as
                                                      𝑑       2𝜋 (𝑑+1)/2
                                              Vol(S ) =                    .                             (3.1)
                                                               Γ( 𝑑+12 )
    4. The volume of 𝑀 is at least the volume of the geodesic ball we found in the previous item. ■
    By scaling we get the following more general bound.
Corollary 3.1.2. Let 𝑀 ⊂ R𝑁 be a closed smooth d dimensional manifold of reach 𝜏. Then the we
have
                                                   𝑉
                                                        ≥ Vol(S𝑑 ).                                      (3.2)
                                                   𝜏𝑑
                                            1
    Proof. We scale by a factor of          𝜏 so that 𝜏 becomes 1. However under the scaling the ratio
 𝑉                                                        𝑉new
𝜏𝑑
    doesn’t change. After the scaling we have               1    ≥ Vol(S𝑑 ). Therefore the same is true before
scaling.                                                                                                    ■
3.1.1    Codimension 1
In this subsection we give an alternative proof with the isoperimetric inequality for the lower bound
              𝑉
on the ratio  𝜏𝑑
                  in the special case where 𝑑 = 𝑛 − 1 (codimension 1) and the manifold is closed.
In the case of equality in our bound, we get a rigidity statement coming from the isoperimetric
inequality. The difference with the previous discuss is in having a different method of proof and a
rigidity result in the case of equality.
                                                           36


Theorem 3.1.3. Let 𝑀 ⊂ R𝑛 be a codimension 1 closed submanifold of R𝑛 . Assume the reach of
𝑀 is 1. Then the 𝑛 − 1-dimensional volume of 𝑀 is at least vol(S𝑛−1 ) where S𝑛−1 is the unit 𝑛 − 1
dimensional sphere. In the case of volume equality, 𝑀 is a rigid transformation of S𝑛−1 .
    Proof. Since 𝑀 is codimension 1 and is closed, its complement will have two connected
components. One of them is the inward component. Call it 𝐴. We give a lower bound for the
volume of 𝐴.
    Since the reach of 𝑀 is 1, the principle curvatures of 𝑀 in all normal directions are bounded
above by 1, [27, Proposition 6.1]. Consider the parallel surface of distance 0 < 𝑡 < 1 in the inwards
normal direction. Let the shape operator of 𝑀 be 𝑆, then the change of volume formula is given by
det(𝐼 − 𝑡𝑆) and
                                        det(𝐼 − 𝑡𝑆) ≥ (1 − 𝑡) 𝑛−1 .                                   (3.3)
Therefore if 𝑉 is the (n-1)-dimensional volume of 𝑀, the n-dimensional volume of 𝐴 is at least
                                           ∫   1
                                                                     𝑉
                                        𝑉        (1 − 𝑡) 𝑛−1 𝑑𝑡 =                                     (3.4)
                                             𝑡=0                     𝑛
Now we apply the isoperimetric inequality to 𝐴 with 𝜕 𝐴 = 𝑀. Let 𝜔𝑛 be the volume of the unit
𝑛-ball.
                                           1           𝑛−1
                                    𝑛(𝜔𝑛 ) 𝑛 vol𝑛 ( 𝐴)  𝑛  ≤ vol𝑛−1 (𝑀)                               (3.5)
                                                1 𝑉 𝑛−1
                                        𝑛(𝜔𝑛 ) 𝑛 ( ) 𝑛 ≤ 𝑉                                            (3.6)
                                                   𝑛
                                      vol(S𝑛−1 ) = 𝑛𝜔𝑛 ≤ 𝑉 .                                          (3.7)
                                                              𝑑
In the last line we used the fact that vol(S𝑛−1 ) =           𝑑𝑟 | 𝑟=1 vol(𝑟 𝐵𝑛 ). The equality case of the
isoperimetric inequality gives that 𝑀 is a rigid motion of S𝑛−1 .                                         ■
3.2     Torus in R3 with Reach 1 and Minimum Area
                                                               𝑉
Here we consider the problem of minimizing the ratio           𝜏𝑑
                                                                    when 𝑑 = 2, 𝜏 = 1 for a smooth torus in
R3 . In this special case there is a topological constraint of having genus 1 and therefore we expect
                                                     37


the minimum to be more than the value 4𝜋 ≈ 13 given in section 3.1.
    Our candidate torus for minimizing the ratio is the torus of revolution with larger radius 2 and
smaller radius 1. The reach of this torus is 1 and its surface area is 8𝜋 2 = (2𝜋(2))(2𝜋(1)) ≈ 79.
    Here we offer two proofs for smaller lower bounds. The first proof is based on Loewner’s
                                                                   √
systolic inequality [6, page 325] and gives a lower bound of 2 3𝜋 2 ≈ 34. The second proof is
based on the isoperimetric inequality and gives a lower bound of 27/3 32/3 𝜋 5/3 ≈ 70.
Theorem 3.2.1. (Via systolic inequality) Let 𝑇 ⊂ R3 be a smooth embedded torus with reach 1.
                                        √
Then the surface area of 𝑇 is at least 2 3𝜋 2 .
    Proof. Consider the systole of 𝑇 (shortest non-contractible loop). It is necessarily a periodic
geodesic [6, page 326]. Since the reach of 𝑇 is 1, the external acceleration of the systole as a unit
speed geodesic is at most 1 in norm. Any closed curve with acceleration bounded by 1 has length
at least 2𝜋 [17]. Therefore our theorem follows from Loewner’s systolic inequality [6, page 325]
                                                   √
                                                 2 3
                                             sys         ≤𝐴                                     (3.8)
                                                   √2
                                                 2 3
                                            (2𝜋)         ≤𝐴                                     (3.9)
                                                    2
                                                √
                                              2 3𝜋 2 ≤ 𝐴.                                     (3.10)
                                                                                                   ■
Theorem 3.2.2. (Via isoperimetric inequality) Let 𝑇 ⊂ R3 be a smooth embedded torus with reach
1. Then the surface area of 𝑇 is at least 27/3 32/3 𝜋 5/3 ≈ 70.
    Proof. Since torus is an orientable 2 dimensional surface there is a well-defined outward and
inward normal direction.
    Consider the parallel surface 𝑇𝑜𝑢𝑡 defined by moving in the outward direction by 1 − 𝜖. 𝑇𝑜𝑢𝑡
is another smooth torus because reach of 𝑇 is 1. We get our bound by applying the isoperimetric
inequality to this surface.
                                                   38


     First we show the volume contained inside 𝑇𝑜𝑢𝑡 (i.e. the volume of the bounded component of
the complement of 𝑇𝑜𝑢𝑡 ) is at least 16𝜋 2 . Second we show that the surface area of 𝑇𝑜𝑢𝑡 is at most
twice the surface area of 𝑇. Then we apply the isoperimetric inequality.
     First we discuss the issue of volume of 𝑇𝑜𝑢𝑡 . Let 𝜖 ∈ (0, 1). Consider the surface Σ𝜖 , that is
obtained from 𝑇 by moving in the inward normal direction by a distance of 1 − 𝜖. Then Σ𝜖 is a
smooth surface because the reach of 𝑇 is 1.
     Consider the solid torus given by the filling the bounded component of the complement of Σ𝜖 .
This solid torus is a 3-manifold with boundary and its systole is a periodic geodesic [6, page 326].
Call this curve 𝑙 𝜖 . We claim the reach of 𝑙 𝜖 is at least 2 − 𝜖. Consider 𝑙 𝜖 (𝑡) as a unit speed geodesic;
for a 𝑡0 if 𝑙 𝜖 (𝑡 0 ) is in the interior of solid bounded by Σ𝜖 , then the acceleration of 𝑙 𝜖 (𝑡0 ) is zero
because R3 is flat. If 𝑙 𝜖 (𝑡 0 ) is on Σ𝜖 , the boundary of the solid torus, then the acceleration of 𝑙 𝜖 (𝑡0 )
is outward pointing, and since reach of 𝑇 is 1 and we moved another 1 − 𝜖 inside, we get that reach
of 𝑙 𝜖 is at least 2 − 𝜖. By [17, page 50], the length of 𝑙 𝜖 is at least 2𝜋(2 − 𝜖).
     The tube of radius 2 − 2𝜖 around 𝑙 𝜖 is contained in the inside of 𝑇𝑜𝑢𝑡 . The volume of this tube
by [22, page 7] is at least 2(2 − 𝜖)(2 − 2𝜖) 2 𝜋 2 . Therefore the volume inside of 𝑇𝑜𝑢𝑡 is at least
2(2 − 2𝜖) 3 𝜋 2 .
     Now we discuss the surface area of 𝑇𝑜𝑢𝑡 . Let 𝐻 and 𝐾 denote the mean and Gaussian curvature
for 𝑇 respectively. By the parallel surface area formula [22, page 8], we have
                                                              ∫                  ∫
                                                                               2
                            Area(𝑇𝑜𝑢𝑡 ) = Area(T) + (1 − 𝜖)      2𝐻 + (1 − 𝜖)        𝐾                 (3.11)
                                                               𝑇                   𝑇
                                       ∫
By the Gauss–Bonnet theorem 𝑇 𝐾 = 0.
                   ∫
     We claim 𝑇 2𝐻 ≤ Area(𝑇). This is because the surface area of the parallel surface in the
inward direction, obtained by moving from 𝑇 in the inward direction by distance 𝑡, is given by
                ∫
Area(T) − 𝑡 𝑇 2𝐻. Since the reach of 𝑇 is 1, this area must remain positive up to 𝑡 = 1 which
           ∫
implies 𝑇 2𝐻 ≤ Area(𝑇). Returning to (3.11), we get that
                                          Area(𝑇𝑜𝑢𝑡 ) ≤ (2 − 𝜖)Area(𝑇)                                 (3.12)
                                                       39


    Now we apply the isoperimetric inequality to 𝑇𝑜𝑢𝑡 .
                                                    4
                                  3Vol(𝑇𝑜𝑢𝑡 ) 2/3 ( 𝜋) 1/3 ≤ Area(𝑇𝑜𝑢𝑡 )                         (3.13)
                                                    3
                                                    4
                           3(2(2 − 2𝜖) 3 𝜋 2 ) 2/3 ( 𝜋) 1/3 ≤ (2 − 𝜖)Area(𝑇)                     (3.14)
                                                    3
Letting 𝜖 → 0 and simplifying we get
                                    70 ≈ 27/3 32/3 𝜋 5/3 ≤ Area(𝑇).                              (3.15)
                                                                                                     ■
3.3    Doing Better than Reach in Covering Estimates
Our earlier covering estimate for a compact manifold was based on reach. We used Gunther’s
theorem for volume comparison and the main ingredient in that theorem is an upper bound on
                                                                  1
sectional curvatures. For a compact manifold of reach 𝜏,          𝜏2
                                                                     is a global bound on all sectional
curvatures. Since this bound is based on reach, it is sensitive to perturbations. A small bump on
the manifold could change the value of 𝜏 substantially; see figure 3.1. However for a covering
argument, one should be able to find a global estimate which is based on averaging a local estimate.
Figure 3.1: Reach is a global parameter. A bump on the surface of a manifold can change its reach,
and therefore reach is sensitive to local perturbations.
    Consider the function 𝑘 : 𝑀 → R
                             𝑘 ( 𝑝) = max sectional curvature of 𝑀 at 𝑝                          (3.16)
Since sectional curvatures at a point are defined in the space of all planes in 𝑇𝑝 𝑀, Grassmanian
                                                                          ⨏
𝐺𝑟 (2, 𝑑), and this space is compact then the max exists. Let 𝑘,     ˜ 𝑘 , 𝑘 be the max, min and the
                                                                        ∼
                                                     40


                                                 ⨏                                          ⨏                    ⨏
average of 𝑘 over 𝑀. In this notation                 takes precedence, hence “ 𝑘 + 𝑘”               ˜ means “( 𝑘) + 𝑘”.
                                                                                                                      ˜
Again since 𝑀 is compact, all three exist. With respect to the covering estimates, the approach
with reach is based on 𝑘˜ ≤        1
                                  𝜏2
                                     .  Here we can     ⨏
                                                             show
                                                                ⨏
                                                                      that in the asymptotic regime as the radius of
                                                  2𝑑 𝑘˜     𝑘−(   𝑘)𝑘 −(2𝑑 −1) 𝑘˜ 𝑘
the covering goes to zero, one can use                              ⨏∼            ∼
                                                                                    . When 𝑘 is not a constant function,
                                                         ˜
                                                        𝑘+(2  𝑑 −1)    𝑘−2𝑑 𝑘
                                           ⨏                                  ∼
this quantity is strictly bounded by 𝑘 and 𝑘˜ and therefore it is an improvement over 𝑘˜ and 𝜏12 . We
show this with a calculation below:
                                                                               ⨏
                                                                0 < ( 𝑘˜ −        𝑘)( 𝑘˜ − 𝑘 )                     (3.17)
                            ⨏         ⨏                                              ⨏ ∼
                     2𝑑 𝑘˜ 𝑘 − 𝑘 𝑘 − (2𝑑 − 1)𝑘 𝑘˜ < (2𝑑 − 1) 𝑘˜                           𝑘 + 𝑘˜ 2 − 2𝑑 𝑘˜ 𝑘       (3.18)
                                   ∼                          ∼                                            ∼
                           ⨏         ⨏
                     2 𝑘˜ 𝑘 − 𝑘 𝑘 − (2 − 1) 𝑘˜ 𝑘
                      𝑑                            𝑑
                                   ∼                            ∼
                                          ⨏                       < 𝑘, ˜                                           (3.19)
                          𝑘˜ + (2 − 1) 𝑘 − 2 𝑘
                                 𝑑                      𝑑
                                                           ∼
and
                                   ⨏              ⨏
                                 (          ˜
                                       𝑘 − 𝑘)(        𝑘 − 𝑘) < 0                                                   (3.20)
                                                             ∼
                             ⨏2          ⨏                 ⨏                ⨏           ⨏
                 (2𝑑 − 1)       𝑘 + 𝑘˜      𝑘 − 2𝑑 𝑘          𝑘 < 2𝑑 𝑘˜         𝑘−𝑘        𝑘 − (2𝑑 − 1)𝑘 𝑘˜        (3.21)
                                                        ∼                            ∼                     ∼
                                                                            ⨏           ⨏
                                                           ⨏         2𝑑 𝑘˜      𝑘−𝑘        𝑘 − (2𝑑 − 1) 𝑘˜ 𝑘
                                                                                      ∼                      ∼
                                                              𝑘<                            ⨏                  ,   (3.22)
                                                                           𝑘˜ + (2𝑑 − 1) 𝑘 − 2𝑑 𝑘
                                                                                                       ∼
and therefore we can conclude
                                                 ⨏             ⨏
                                ⨏         2𝑑 𝑘˜      𝑘−𝑘           𝑘 − (2𝑑 − 1) 𝑘˜ 𝑘
                                                             ∼                          ∼
                                    𝑘<                              ⨏                        ˜
                                                                                          < 𝑘.                     (3.23)
                                                ˜𝑘 + (2𝑑 − 1) 𝑘 − 2𝑑 𝑘
                                                                                  ∼
    The averaged quantity comes from treating each packing ball separately. As usual we use the
packing-covering relationship to switch to a packing estimate. For each packing ball, we showed it
isolates a geodesic ball of some radius with a minimum volume. This minimum volume gives the
desired bound. Consider the supremum of all sectional curvatures only in that geodesic ball, then
the Gunther’s theorem still holds with this sectional curvature. We can average the values of max
sectional curvatures with one value per packing ball to get an averaged value. Now we show this
calculation. First we need a volume estimate lemma.
                                                                41


Lemma 3.3.1. Let 𝑀 be a simply connected d-dimensional manifold of constant sectional curvature
𝑘 ∈ R. Consider a geodesic ball of radius 𝑟 and volume 𝑉𝑘 (𝑟). Then for 𝑟 small enough we have
                                   2𝜋 𝑑/2 𝑟 𝑑 𝑘 (𝑑 − 1) 𝑑+2
                                           ( −                    𝑟 ) ≤ 𝑉𝑘 (𝑟).                     (3.24)
                                   Γ( 𝑑2 ) 𝑑         6(𝑑 + 2)
   Proof. The volume of a ball of radius 𝑟 in a space form is
                                                            ∫   𝑟
                                                   2𝜋 𝑑/2
                                       𝑉𝑘 (𝑟) =                    𝑆 𝑘 (𝑥) 𝑑−1 𝑑𝑥
                                                   Γ( 𝑑2 )    0
where for 𝑘 ∈ R
                                                           √
                                                  √1 sin( 𝑘𝑟)
                                               
                                                                             𝑘>0
                                               
                                               
                                               
                                                   𝑘
                                               
                                               
                                               
                                               
                                    𝑆 𝑘 (𝑟) = 𝑟                              𝑘=0                    (3.25)
                                               
                                               
                                                                √
                                               
                                               
                                                    1
                                               
                                                √−𝑘 sinh( −𝑘𝑟)              𝑘<0
                                               
                                               
                                               
   For 𝑘 > 0
                                    1        √                  𝑘𝑥 3 𝑘 2 𝑥 5
                                   √ sin( 𝑘𝑥) = 𝑥 −                    +         + ...              (3.26)
                                     𝑘                            6         5!
   and 𝑘 < 0
                               1            √                   (−𝑘)𝑥 3 𝑘 2 𝑥 5
                              √ sinh( −𝑘𝑥) = 𝑥 +                            +         + ...         (3.27)
                               −𝑘                                    6           5!
   Therefore
                                                          𝑘𝑥 3
                   ∫   𝑟                    ∫   𝑟
                                 𝑑−1
                         𝑆 𝑘 (𝑥)     𝑑𝑥 =         (𝑥 −          + ...) 𝑑−1 𝑑𝑥                       (3.28)
                     0                        0             6
                                                               (𝑑 − 1) 𝑑−2 −𝑘𝑥 3
                                            ∫ 𝑟                          
                                                     𝑑−1
                                          =        𝑥      +                 𝑥 (           ) + ...𝑑𝑥 (3.29)
                                              0                     1                6
                                            𝑟 𝑑 𝑘 (𝑑 − 1) 𝑑+2
                                          =      −                𝑟      + ...                      (3.30)
                                             𝑑        6(𝑑 + 2)
The next term in the expansion is always positive so for a small enough 𝑟 we have
                                                                 ∫    𝑟
                                  𝑟 𝑑 𝑘 (𝑑 − 1) 𝑑+2
                                       −               𝑟      ≤         𝑆 𝑘 (𝑥) 𝑑−1 𝑑𝑥              (3.31)
                                   𝑑      6(𝑑 + 2)                  0
                                                                                                        ■
   Next we give a lemma about averaging.
                                                           42


Lemma 3.3.2. Let 𝑀 be a closed orientable smooth Riemannian manifold of dimension d with
volume 𝑉. Let 𝑘 : 𝑀 → R be the map that gives the maximum sectional curvature at each point.
Let 𝐵𝑖 (𝑥𝑖 , 𝑟), 1 ≤ 𝑖 ≤ 𝑁 be disjoint closed geodesic balls of radius 𝑟 and centers 𝑥𝑖 . Let 𝑘 𝑖 be the
                                                                        Í
supremum of all sectional curvatures in 𝐵𝑖 . Let 𝑘 ∗ =                   𝑁
                                                                          𝑘𝑖
                                                                               be the average of 𝑘 𝑖 . Then for 𝑟 small
enough we have
                                            "                                             #
                                              2𝜋 𝑑/2 𝑟 𝑑 𝑘 ∗ (𝑑 − 1) 𝑑+2
                                 𝑁 ≤ 𝑉/                  ( −                       𝑟 )                             (3.32)
                                              Γ( 𝑑2 ) 𝑑            6(𝑑 + 2)
                                                                    Í
     Proof. Let 𝑉𝑖 (𝑟) be the volume of ball 𝑖. Then                   𝑖 𝑉𝑖 (𝑟)    ≤ 𝑉. By the Gunther’s theorem we
have
                                   2𝜋 𝑑/2 𝑟 𝑑 𝑘 𝑖 (𝑑 − 1) 𝑑+2
                                           ( −                       𝑟 ) ≤ 𝑉𝑖 (𝑟)                                  (3.33)
                                    Γ( 𝑑2 ) 𝑑           6(𝑑 + 2)
     Summing over the balls we get
                                         𝑁
                                      ∑︁    2𝜋 𝑑/2 𝑟 𝑑 𝑘 𝑖 (𝑑 − 1) 𝑑+2
                                                       ( −                      𝑟 ) ≤𝑉                             (3.34)
                                       𝑖=1 Γ( 2 )
                                                 𝑑        𝑑      6(𝑑 + 2)
                                  "                                            #
                                    2𝜋 𝑑/2   𝑟 𝑑          ∗
                                                        𝑘 (𝑑 − 1) 𝑑+2
                               𝑁           ( −                        𝑟 ) ≤𝑉                                       (3.35)
                                        𝑑
                                    Γ( 2 ) 𝑑             6(𝑑 + 2)
which is our claim.                                                                                                    ■
     So based on the above lemma if we find an estimate for 𝑘 ∗ =                         𝑘𝑖
                                                                                       Í
                                                                                        𝑖 𝑁  we can use sharper bound in
the packing estimates compared to the ones given by reach. In the asymptotic regime, as the radius
                                                                                 ∫
of packing balls goes to zero, a natural candidates for 𝑘 ∗ is 𝑉1 𝑀 𝑘. But because of varying packing
density this is not necessary the case.
                                                                                        1
                                                                                           ∫
                                                                                              𝑘 deviates from 𝑘 ∗ =
                                                                                                                    Í 𝑘𝑖
     Because of varying density as in figure 3.2, it is possible that                   𝑉   𝑀                          𝑁.
However, if 𝐵𝑖 (𝑥𝑖 , 𝑟) is a packing ball with center 𝑥𝑖 and 𝑣 𝑖 is the Voronoi cell corresponding to the
𝐵𝑖 , then
                                          𝐵𝑖 (𝑥𝑖 , 𝑟) ⊂ 𝑣 𝑖 ⊂ 𝐵𝑖 (𝑥𝑖 , 2𝑟)                                         (3.36)
This property restricts the deviation of our average   ⨏       ⨏
                                                                 from the continuous average over the manifold.
                                                 2𝑑 𝑘˜     𝑘−(   𝑘)𝑘 −(2𝑑 −1) 𝑘˜ 𝑘
We use this relation to obtain our bound                           ⨏∼              ∼
                                                                                     .
                                                        ˜
                                                        𝑘+(2 𝑑 −1)    𝑘−2𝑑 𝑘
                                                                             ∼
                                                            43


         (a) A packing with varying density                            (b) Veronoi cells associated to a packing
                                    (c) Voronoi cell contained in a doubled ball
Figure 3.2: Figure-a shows a packing with varying density. Figure-b shows the Veronoi cells
associated with a packing. Figure-c shows a Veronoi cell contained by a doubled packing ball.
Theorem 3.3.3. Let 𝑀 be a closed smooth Riemannian manifold with volume 𝑉. Let 𝐵𝑖𝑗 (𝑥 𝑖𝑗 , 𝑟), 1 ≤
                                                                                                         𝑗
𝑗 ≤ 𝑁𝑖 be a packing of 𝑀 with geodesic balls of radius 𝑟𝑖 and centers 𝑥𝑖 . Let 𝑝𝑖𝑗 ∈ 𝐵𝑖 be points
that are arbitrarily selected from each ball. Let 𝜎𝑖 be a probability measure obtained by placing a
mass of 1/𝑁𝑖 weight at 𝑝𝑖𝑗 . Assume 𝑟𝑖 → 0 and 𝜎𝑖 converges weakly to 𝜎.
    Let 𝑘 : 𝑀 → [𝑘 , 𝑘]    ˜ be the map that gives the maximum sectional curvature at each point.
                        ∼
                                                             ⨏
Assume 𝑘 is a Lipschitz function. Let 𝑘,         ˜ 𝑘 and 𝑘 be the max, min and the average of 𝑘 over 𝑀.
                                                    ∼
Let 𝑘 𝑖𝑗  := 𝑘 ( 𝑝𝑖𝑗 ).
    Let 𝜇 and 𝜇𝜎 be the push forward of measure induced by the volume form and 𝜎 on [𝑘 , 𝑘]                   ˜ via
                                                                                                           ∼
𝑘. Assume 𝜇 and 𝜇𝜎 are absolutely continuous with respect to the Lebesgue measure.
                                                       ⨏
    If 𝑘 is not a constant function, then 𝑘 < 𝑘 < 𝑘˜ and
                                                  ∼
                                                            ⨏       ⨏
                                                      2𝑑 𝑘˜    𝑘−(     𝑘)𝑘 − (2𝑑 − 1) 𝑘˜ 𝑘
                                      Í 𝑁𝑖    𝑖
                                         𝑗=1 𝑘 𝑗                         ∼               ∼
                           lim sup                ≤                      ⨏                 .                  (3.37)
                             𝑟 𝑖 →0       𝑁𝑖                𝑘˜ + (2𝑑 − 1) 𝑘 − 2𝑑 𝑘
                                                                                   ∼
                                                              44


     Proof. We first fix 𝑖 and work with a single packing. Later we consider 𝑖 → ∞. We partition
𝑀 into Voronoi cells based on their distance to the center of the balls 𝑥 𝑖𝑗 . Define
                                 𝑣 𝑖𝑗 = {𝑝 ∈ 𝑀 |𝑑𝑔 ( 𝑝, 𝑥 𝑖𝑗 ) < 𝑑𝑔 ( 𝑝, 𝑥 𝑖𝑘 ) 𝑗 ≠ 𝑘 }                   (3.38)
where 𝑑𝑔 is the geodesic distance. We assume the measure of the points that are equ-distant to two
of 𝑥 𝑖𝑗 is zero and we have
                                                                 𝑁𝑖
                                                                ∑︁
                                                           𝑉=         |𝑣 𝑖𝑗 |                             (3.39)
                                                                 𝑗=1
Since balls 𝐵𝑖𝑗 are a packing of 𝑀 we have
                                             𝐵𝑖𝑗 (𝑥 𝑖𝑗 , 𝑟𝑖 ) ⊂ 𝑣 𝑖𝑗 ⊂ 𝐵𝑖𝑗 (𝑥 𝑖𝑗 , 2𝑟𝑖 )                  (3.40)
                       ˜ we count the number 𝑘 𝑖 ∈ [𝑎, 𝑏] from above and below using the 𝑣 𝑖 and 𝜇.
     For 𝑎, 𝑏 ∈ [𝑘 , 𝑘],
                    ∼                                         𝑗                                         𝑗
Let 𝑐 be the Lipschitz constant of 𝑘.
     For an upper bound, if 𝑘 𝑖𝑗 ∈ [𝑎, 𝑏], then ∀𝑥 ∈ 𝑣 𝑖𝑗 , 𝑎 − 4𝑟𝑖 𝑐 ≤ 𝑘 (𝑥) ≤ 𝑏 + 4𝑟𝑖 𝑐. Then
                                      ∪      𝑣 𝑖𝑗 ⊂ {𝑝|𝑎 − 4𝑟𝑖 𝑐 ≤ 𝑘 ( 𝑝) ≤ 𝑏 + 4𝑟𝑖 𝑐}                    (3.41)
                               𝑖:𝑘 𝑖𝑗 ∈[𝑎,𝑏]
     Let 𝑉𝛼 (𝑟) be the volume of geodesic ball of radius 𝑟 in simply connected space of constant
sectional curvature 𝛼. By Gunther’s volume comparison theorem and relation ((3.40)), for 𝑟 small
enough we have
                                                       𝑉𝑏+4𝑟𝑐 (𝑟𝑖 ) ≤ |𝑣 𝑖𝑗 |                             (3.42)
Since the Voronoi cells are disjoint, from equation ((3.41))
                                                                𝜇(𝑎 − 4𝑟𝑖 𝑐, 𝑏 + 4𝑟𝑖 𝑐)
                                      (#𝑘 𝑖𝑗 ∈ [𝑎, 𝑏]) ≤                                                  (3.43)
                                                                        𝑉𝑏+4𝑟𝑖 𝑐 (𝑟)
where 𝜇(𝑎 − 4𝑟𝑖 𝑐, 𝑏 + 4𝑟𝑖 𝑐) = |{𝑝 : 𝑎 − 4𝑟𝑖 𝑐 ≤ 𝑘 ( 𝑝) ≤ 𝑏 + 4𝑟𝑖 𝑐}|.
     For a lower bound, consider all Voronoi cells that intersect the set {𝑎 + 4𝑟𝑖 𝑐 ≤ 𝑘 ( 𝑝) ≤ 𝑏 − 4𝑟𝑖 𝑐}.
           𝑏−𝑎
If 𝑟𝑖 <     8𝑐 this set is non-empty. For such Voronoi cells we have ∀𝑥 ∈ 𝑣 𝑖𝑗 , 𝑎 ≤ 𝑘 (𝑥) ≤ 𝑏 since 𝑣 𝑖𝑗
is contained in a ball of radius 2𝑟𝑖 and the 𝑘 𝑖𝑗 from 𝑣 𝑖𝑗 satisfies 𝑘 𝑖𝑗 ∈ [𝑎, 𝑏]. Thus
                               {𝑝|𝑎 + 4𝑟𝑖 𝑐 ≤ 𝑘 ( 𝑝) ≤ 𝑏 − 4𝑟𝑖 𝑐} ⊂                         ∪      𝑣 𝑖𝑗   (3.44)
                                                                                     𝑗:𝑘 𝑖𝑗 ∈[𝑎,𝑏]
                                                                45


Again let 𝑉𝛼 (𝑟) be the volume of geodesic ball of radius 𝑟 in simply connected space of constant
sectional curvature 𝛼. Using the Bishop-Gromov’s volume comparison theorem
                                                    |𝑣 𝑖𝑗 | ≤ 𝑉𝑎 (2𝑟𝑖 )                                      (3.45)
Since the Voronoi cells are disjoint, from equation ((3.44))
                                   𝜇(𝑎 + 4𝑟𝑖 𝑐, 𝑏 − 4𝑟𝑖 𝑐)
                                                                   ≤ (#𝑘 𝑖𝑗 ∈ [𝑎, 𝑏])                        (3.46)
                                            𝑉𝑎 (2𝑟𝑖 )
    Combining ((3.43)) and ((3.46)), we have
                     𝜇(𝑎 + 4𝑟𝑖 𝑐, 𝑏 − 4𝑟𝑖 𝑐)                                𝜇(𝑎 − 4𝑟𝑖 𝑐, 𝑏 + 4𝑟𝑖 𝑐)
                                                 ≤ (#𝑘 𝑖𝑗 ∈ [𝑎, 𝑏]) ≤                                        (3.47)
                            𝑉𝑎 (2𝑟𝑖 )                                             𝑉𝑏+4𝑟𝑖 𝑐 (𝑟𝑖 )
To get the proportion of 𝑘 𝑖 ∈ [𝑎, 𝑏] we must divide by 𝑁𝑖 . We divide the ((3.47)) for [𝑎, 𝑏] and
[𝑑, 𝑒] ⊂ [𝑘 , 𝑘]˜ to cancel the 𝑁𝑖 and get
            ∼
   𝑉𝑒+4𝑟𝑖 𝑐 (𝑟𝑖 ) 𝜇(𝑎 + 4𝑟𝑖 𝑐, 𝑏 − 4𝑟𝑖 𝑐)       (#𝑘 𝑖𝑗 ∈ [𝑎, 𝑏])          𝑉𝑑 (2𝑟𝑖 ) 𝜇(𝑎 − 4𝑟𝑖 𝑐, 𝑏 + 4𝑟𝑖 𝑐)
                                           ≤                         ≤                                       (3.48)
     𝑉𝑎 (2𝑟𝑖 ) 𝜇(𝑑 − 4𝑟𝑖 𝑐, 𝑒 + 4𝑟𝑖 𝑐)          (#𝑘 𝑗 ∈ [𝑑, 𝑒]) 𝑉𝑏+4𝑟𝑖 𝑐 (𝑟𝑖 ) 𝜇(𝑑 + 4𝑟𝑖 𝑐, 𝑒 − 4𝑟𝑖 𝑐)
                                                     𝑖
    Note that since curvature terms are second order in the volume expansion
                                                          𝑉𝛼 (2𝑟)
                                                  lim               = 2𝑑                                     (3.49)
                                                  𝑟→0 𝑉𝛽 (𝑟)
Letting 𝑟𝑖 → 0 and using ((3.49)) in ((3.48)) we get
                                  1 𝜇(𝑎, 𝑏)         (#𝑘 𝑖𝑗 ∈ [𝑎, 𝑏])
                                                                              𝑑 𝜇(𝑎, 𝑏)
                                                 ≤                       ≤  2                                (3.50)
                                 2𝑑 𝜇(𝑑, 𝑒)         (#𝑘 𝑗 ∈ [𝑑, 𝑒])
                                                           𝑖                    𝜇(𝑑, 𝑒)
                                         𝜇′ (𝑘)                                                           ˜ Letting
    Let 𝑓 (𝑎) = 𝜎′ (𝑎) and 𝑔(𝑘) =          𝑉 .    Note that 𝑔(𝑘) is a probability density on [𝑘 , 𝑘].
                                                                                                      ∼
𝑏 → 𝑎 and 𝑒 → 𝑑 and using convergence of 𝜎𝑖 → 𝜎, absolute continuity of 𝜎 and 𝜇,
                                          1 𝑔(𝑎)             𝑓 (𝑎)       𝑔(𝑎)
                                                        ≤           ≤ 2𝑑                                     (3.51)
                                          2 𝑔(𝑑)
                                            𝑑                𝑓 (𝑑)       𝑔(𝑑)
By assumption both 𝑓 (𝑘) and 𝑔(𝑘) are positive and continuous on [𝑘 , 𝑘].                     ˜ Therefore there are
                                                                                          ∼
constants 𝑐 1 < 1 < 𝑐 2 such that
                                            𝑐 1 𝑔(𝑘) ≤ 𝑓 (𝑘) ≤ 𝑐 2 𝑔(𝑘)                                      (3.52)
                                                              46


Suppose 𝑓 (𝑘 1 ) = 𝑐 1 𝑔(𝑘 1 ) and 𝑓 (𝑘 2 ) = 𝑐 2 𝑔(𝑘 2 ) for some 𝑘 1 , 𝑘 2 . Using (3.51) with 𝑎 = 𝑘 2 and
𝑑 = 𝑘 1 we get
                                          1 𝑔(𝑘 2 )      𝑐 2 𝑔(𝑘 2 )        𝑑 𝑔(𝑘 2 )
                                                      ≤               ≤   2                                         (3.53)
                                         2𝑑 𝑔(𝑘 1 )      𝑐 1 𝑔(𝑘 1 )            𝑔(𝑘 1 )
     which gives
                                                    𝑐1
                                                         ≤ 𝑐 2 ≤ 2𝑑 𝑐 1                                             (3.54)
                                                   2𝑑
     Since 𝑐 1 < 1 < 𝑐 2
                                             𝑐2
                                                𝑔(𝑘) ≤ 𝑓 (𝑘) ≤ 𝑐 2 𝑔(𝑘)                                             (3.55)
                                             2𝑑
Now consider that both 𝑓 (𝑘) and 𝑔(𝑘) are probability densities on [𝑘 , 𝑘].                        ˜      Given the linear
                                                                                                 ∼
                                                                       ∫                 ∫
constraints on 𝑓 (𝑘) given by condition (3.55), and 𝑓 (𝑘) = 𝑔(𝑘) = 1, from the simplex
                                        ∫
algorithm, we can conclude that 𝑘 𝑘 𝑓 (𝑘) is maximized when the inequalities in (3.55) are saturated
except possibility at one point such as 𝑘 0 to transition between the two sides of (3.55). Therefore
consider
                                                      
                                                       𝑐
                                                       2𝑑2 𝑔(𝑘)      if 𝑘 ≤ 𝑘 0
                                                      
                                                      
                                                      
                                           𝑓 (𝑘) =                                                                  (3.56)
                                                      
                                                      
                                                       𝑐 2 𝑔(𝑘)
                                                                     if 𝑘 0 < 𝑘
                                                      
          ∫   𝑘0                 ∫ 𝑘˜                                                                ∫ 𝑘˜
     Let    𝑘
                 𝑔(𝑘) = 𝑎 and     𝑘0
                                      𝑔(𝑘)  =  𝑏 with    𝑎,  𝑏  >  0,  𝑎  +  𝑏   =  1.   Then   from  𝑘
                                                                                                          𝑓 (𝑘) = 1 we get
            ∼                                                ∫                                        ∼
𝑐2                               2𝑑
2 𝑑 𝑎 + 𝑐 2 𝑏  =  1, and 𝑐 2 =     𝑑
                               𝑎+2 𝑏
                                      . Now   we  bound       𝑘
                                                                𝑘 𝑓 (𝑘) from above in two ways.
                              ∫                  ∫   𝑘0                  ∫   𝑘˜
                                             𝑐2
                                 𝑘 𝑓 (𝑘) = 𝑑             𝑘𝑔(𝑘) + 𝑐 2             𝑘𝑔(𝑘)                              (3.57)
                               𝑘             2    𝑘                       𝑘0
                                                  ∼
                                                 ∫   𝑘˜                              ∫    𝑘˜
                                             𝑐2                                  1
                                          = 𝑑           𝑘𝑔(𝑘) + 𝑐 2 (1 − 𝑑 )                 𝑘𝑔(𝑘)                  (3.58)
                                             2    𝑘                             2       𝑘0
                                                  ∼
                                             𝑐2 ⨏
                                          ≤ 𝑑 𝑘 + 𝑐 2 (1 − 21𝑑 )𝑏 𝑘˜                                                (3.59)
                                             2
                                             𝑐2 ⨏
                                          = 𝑑 𝑘 + (1 − 2𝑐𝑑2 ) 𝑘˜                   A                                (3.60)
                                             2
                                                             47


                     1                 𝑐2
where we use (1 −      )𝑐 𝑏
                    2𝑑 2
                             = (1 −    2𝑑
                                          ).  In another way,
                        ∫                    ∫   𝑘0                   ∫    𝑘˜
                                        𝑐2
                            𝑘 𝑓 (𝑘) = 𝑑             𝑘𝑔(𝑘) + 𝑐 2               𝑘𝑔(𝑘)                         (3.61)
                          𝑘             2      𝑘                         𝑘0
                                               ∼
                                                            ∫    𝑘0                  ∫  𝑘˜
                                                      1
                                     = −𝑐 2 (1 − 𝑑 )                𝑘𝑔(𝑘) + 𝑐 2            𝑘𝑔(𝑘)            (3.62)
                                                     2        𝑘                       𝑘
                                                              ∼                       ∼
                                                      1                 ⨏
                                     ≤ −𝑐 2 (1 −          )𝑎𝑘 + 𝑐 2          𝑘                              (3.63)
                                                     2𝑑 ∼
                                                                 ⨏
                                     = −(𝑐 2 − 1)𝑘 + 𝑐 2 𝑘                        B                         (3.64)
                                                      ∼
                                                     ∫
where we used 𝑐 2 (1 − 21𝑑 )𝑎 = 𝑐 2 − 1. Hence        𝑘
                                                          𝑘 𝑓 (𝑘) ≤ min{𝐴, 𝐵}. If we allow 𝑐 2 to be a variable
then 𝐴 and 𝐵 are lines with positive and negative slopes respectively. Hence their intersection is
                    ∫
an upper bound for 𝑘 𝑘 𝑓 (𝑘).
                                 𝑐 ⨏                 𝑐 ˜
                                                                                      ⨏
                                       𝑘 + (1 −     2𝑑
                                                         )𝑘   = −(𝑐 − 1)𝑘 + 𝑐            𝑘                  (3.65)
                                 2𝑑                                            ∼
                                                                      𝑘˜ − 𝑘
                                                 𝑐=               1
                                                                      ⨏ ∼         1 ˜
                                                                                                            (3.66)
                                                         (1 − 2𝑑 ) 𝑘 +           2𝑑
                                                                                    𝑘 −𝑘
                                                                                         ∼
Substituting back into A or B, we get
                                                    ⨏            ⨏
                             ∫                2𝑑 𝑘˜      𝑘−(         𝑘)𝑘 − (2𝑑 − 1) 𝑘˜ 𝑘
                                                                        ∼                   ∼
                                 𝑘 𝑓 (𝑘) ≤                               ⨏                                  (3.67)
                               𝑘                    𝑘˜ + (2𝑑 − 1) 𝑘 − 2𝑑 𝑘
                                                                                      ∼
                                                                                                                ■
Corollary 3.3.4. Let 𝑇 ⊂ R3 be a smooth torus. Let 𝐵𝑖 be packing balls of radius 𝑟 and let 𝑝𝑖 be
arbitrary points selected from 𝐵𝑖 , and 𝑘 ( 𝑝𝑖 ) be the sectional curvature at 𝑝𝑖 . Then
                                                    Í
                                                          𝑘 ( 𝑝𝑖 )      −3 𝑘˜ 𝑘
                                                       𝑖                       ∼
                                        lim sup                     ≤                                       (3.68)
                                           𝑟→0        #{𝑘 𝑖 }           ˜
                                                                        𝑘 − 4𝑘
                                                                                ∼
                                                                                                 ⨏
    Proof. We use the previous theorem with 𝑑 = 2. By Gauss-Bonnet theorem                         𝑘 = 2 − 2𝑔 = 0,
and the claim follows. Note that in a torus in R3 , the max and min of sectional curvatures are
respectively positive and negative. Hence our bound is a positive quantity.                                     ■
                                                           48


                                                   CHAPTER 4
                   SUFFICIENT CONDITIONS FOR JL MAPS OF MANIFOLDS
4.1      With sub-Gaussian Matrices (Variation 1)
In this section we improve on the work in [15, theorem 2]. In that work, a JL map is constructed for
a compact manifold via Gaussian matrices. The method is based on first finding an overestimate
for covering for the unit secants of a compact manifold based on its reach and volume, and second
using a chaining argument to get a JL map. The chaining method is based on using a sequence of
progressively denser 𝜖-nets and representing each point in the manifold as a branch in the tree of
𝜖-nets. [35, Chapter 8] provides a detailed explanation of this technique.
    Here we improve on the previous work in two ways. First we allow the manifold to have
boundary. This generalization allows the JL map to apply to data models where the data terminates
at a boundary before it can wrap around and form a close manifold. Second we use the covering
numbers to calculate the Gaussian width of the unit secants of the manifold via Dudley’s inequality,
and then we apply the matrix deviation inequality to get JL maps for sub-Gaussian matrices. This
calculation streamlines and simplifies the generic chaining calculation in the earlier work. Our
main theorem in this section is theorem 4.1.7.
    First we state the result in [15, theorem 2].
Theorem 4.1.1. [15, theorem 2] Let 𝑀 be a compact 𝑑-dimensional Riemannian submanifold of
R𝑁 with reach 𝜏 1 and volume 𝑉𝑀 . Conveniently assume that
                                                  𝑉𝑀         21
                                                     𝑑
                                                       ≥ ( √ )𝑑                                (4.1)
                                                   𝜏       2 𝑑
                  1
For 0 < 𝜖 ≤       3 and 0 < 𝜌 < 1. Let Φ be a random 𝑚 × 𝑁 matrix populated with i.i.d zero mean
    1 In [15] the concept of condition number is used which is just the reciprocal of reach.
                                                         49


Gaussian random variables with variance of 1/𝑚 with
                                                           √
                               −2                            𝑑            2        8
                      𝑚 ≥ 18𝜖 max(24𝑑 + 2𝑑 log( 2 ) + log(2𝑉𝑀               ), log( )).       (4.2)
                                                           𝜏𝜖                      𝜌
Then with probability at least 1 − 𝜌, for every pair of points 𝑥 1 , 𝑥2 ∈ 𝑀
                       (1 − 𝜖)||𝑥 1 − 𝑥 2 || ≤ ||Φ𝑥 1 − Φ𝑥2 || ≤ (1 + 𝜖)||𝑥1 − 𝑥 2 ||.        (4.3)
    Next we state the definitions for sub-Gaussian random variables, vectors and matrices, and then
we state the matrix deviation inequality.
Definition 4.1.2. [35, section 2.5](sub-Gaussian Random Variable) A random variable 𝑋 is sub-
Gaussian if it satisfies
                                              E exp(𝑋 2 /𝑎 2 ) ≤ 2                            (4.4)
for some positive constant 𝑎. The sub-Gaussian norm of 𝑋 denoted by ||𝑋 || 𝜓2 is defined as
                               ||𝑋 || 𝜓2 = inf{𝑡 > 0 : E exp(𝑋 2 /𝑡 2 ) ≤ 2}                  (4.5)
Alternatively, a random variable is called sub-Gaussian if it satisfies
                                   P{|𝑋 | ≥ 𝑡} ≤ 2 exp(−𝑐𝑡 2 /||𝑋 || 2𝜓2 )                    (4.6)
for all 𝑡 ≥ 0 where 𝑐 > 0 is an absolute constant.
Definition 4.1.3. [35, definition 3.4.1](sub-Gaussian Random Vector) A random vector 𝑋 ∈ R𝑛 is
called sub-Gaussian if the one-dimensional marginals ⟨𝑋, 𝑥⟩ are sub-Gaussian random variables
for all 𝑥 ∈ R𝑛 . The sub-Gaussian norm of 𝑋 is defined as
                                        ||𝑋 || 𝜓2 = sup ||⟨𝑋, 𝑥⟩|| 𝜓2 .                       (4.7)
                                                   𝑥∈𝑆 𝑛−1
Definition 4.1.4. [35, definition 3.4.1](sub-Gaussian Random Matrix) A random 𝑚 × 𝑛 matrix 𝐴
such that its rows 𝐴𝑖 are independent, isotropic and sub-Gaussian random vectors in R𝑛 is called
a sub-Gaussian random matrix.
                                                      50


Theorem 4.1.5. [35, theorem 9.1.1 and 9.1.8] Let 𝐴 be 𝑚 × 𝑛 sub-Gaussian random matrix. Let 𝐴𝑖
be the rows of 𝐴 with 𝐾 = max || 𝐴𝑖 || 𝜓2 . Let 𝑇 ⊂ R𝑛 with Gaussian width 𝜔(𝑇). Then there exists
an absolute constant 𝐶 such that for any 𝑢 ≥ 0 the following statement holds with probability at
least 1 − 2 exp(−𝑢 2 ).
                                        √
                        sup || 𝐴𝑥|| 2 − 𝑚||𝑥|| 2 ≤ 𝐶𝐾 2 [𝜔(𝑇) + 𝑢 · rad(𝑇)].
                        𝑥∈𝑇
                                                                                                  ■
    We rewrite the matrix deviation inequality as lower bound for the number of rows of a sub-
Gaussian matrix.
Theorem 4.1.6. Let 𝑇 ⊂ S𝑛−1 ⊂ R𝑛 with Gaussian width 𝜔(𝑇). Let 0 < 𝜖, 𝜌 < 1. Let 𝐴 be a 𝑚 × 𝑛
sub-Gaussian random matrix with 𝐾 = max || 𝐴𝑖 || 𝜓2 . Then there exists a universal constant 𝐶 such
that if
                                                   √︃         2
                                                           2
                                 𝐶 2 𝐾 4 𝜔(𝑇) + log( 1−𝜌     )
                                                                  ≤𝑚                           (4.8)
                                                 𝜖2
then with probably at least 𝜌 the following expressions
                                                   1
                               (1 − 𝜖)||𝑥|| ≤ || √ 𝐴𝑥|| ≤ (1 + 𝜖)||𝑥||
                                                    𝑚
holds simultaneously for all 𝑥 ∈ 𝑇.
    Proof. We use theorem 4.1.5. Since 𝑇 ⊂ S𝑛−1 , rad(𝑇) = 1. Inverting the probability expression
                                      √︃
                                              2
𝜌 = 1 − 2 exp(−𝑢 2 ), we obtain 𝑢 = log( 1−𝜌     ). To control the error, we need
                                                𝐶𝐾 2
                                                √ [𝜔(𝑇) + 𝑢] ≤ 𝜖
                                                   𝑚
                                                  √︃         2
                                    2  4                  2
                                 𝐶 𝐾 𝜔(𝑇) + log( 1−𝜌 )
                                                                  ≤ 𝑚.
                                                𝜖2
This completes the proof.                                                                         ■
    Now we put our matrix deviation inequality and the Gaussian width upper bound calculation
together to a JL map for a compact manifold with boundary.
                                                    51


Theorem 4.1.7. (JL Map of a Compact Submanifold of R𝑛 with Boundary via sub-Gaussian
Random Matrices) Let 𝑀 ↩→ R𝑛 be a compact 𝑑-dimensional submanifold of R𝑛 with boundary
𝜕 𝑀. Let 𝜏𝑀 be the reach of 𝑀. Let 𝜏𝑖 be the reach of the 𝑖-th connected component of 𝜕 𝑀 as a
submanifold of R𝑛 . Let 𝜏 = inf 𝑀,𝑖 {𝜏𝑀 , 𝜏𝑖 }. Let 𝑉𝑀 be the volume of 𝑀 and 𝑉𝜕 𝑀 be the volume of
𝜕 𝑀. Define
                                                        𝑉𝑀 41 𝑑 𝑉𝜕 𝑀 81 𝑑−1
                                                  𝛼=          ( ) +            ( )                                    (4.9)
                                                        𝜔𝑑 𝜏            𝜔 𝑑−1 𝜏
                                                         𝛼2
                                                  𝑐 = ( + 𝑑3𝑑 𝛼)                                                    (4.10)
                                                           2
                                                         √     √︁
                                               𝜔∗ = 4 2 log(𝑐) + 4𝑑.                                                (4.11)
      Let 𝐴 be a 𝑚 × 𝑛 sub-Gaussian random matrix such that 𝐾 = max𝑖 || 𝐴𝑖 || 𝜓2 .
      Let 0 < 𝜖, 𝜌 < 1. There exists a universal constant 𝐶 such that if 𝑚 satisfies
                                                                  √︃           2
                                                                            2
                                                 𝐶 2 𝐾 4 𝜔∗ + log( 1−𝜌        )
                                                                                   ≤𝑚                               (4.12)
                                                                  𝜖2
then with probably at least 𝜌, the following bounds hold simultaneously for all 𝑥1 , 𝑥2 ∈ 𝑀,
                              (1 − 𝜖)||𝑥 1 − 𝑥 2 || ≤ || 𝐴(𝑥 1 − 𝑥2 )|| ≤ (1 + 𝜖)||𝑥1 − 𝑥 2 ||.                     (4.13)
      Proof. We use theorem 4.1.6. We know that 𝜔∗ is an upper bound for the Gaussian width of
𝑈 (𝑀). Since 𝑈 (𝑀) is a subset of the unit sphere, we can apply theorem 4.1.6 and this immediately
gives the result.                                                                                                         ■
Remark 4.1.8. We analyze the dependence of 𝑚 on 𝑑 while keeping the other variables fixed. If
                                 √︃             2
                                           2
                   𝐶 2 𝐾 4 𝜔(𝑇)+ log( 1−𝜌      )
one puts 𝑚 =                     𝜖2
                                                    as the least sufficient value of 𝑚, then 𝑚 depends on 𝑑 with
                                                1      Γ( 𝑑2 +1)
order of 𝑂 (𝑑 log(𝑑)). We have                𝜔𝑑    =       𝑑     = 𝑂 (𝑑 𝑑 ), 𝛼 = 𝑉𝜔𝑀𝑑 ( 41  𝑑  𝑉𝜕𝑀 81 𝑑−1
                                                                                          𝜏 ) + 𝜔 𝑑−1 ( 𝜏 ) = 𝑂 (𝑑 𝑑 ), 𝑐 =
                                                          𝜋2
    2                                                        √ √︁                   √︁                       √︁
( 𝛼2  + 2𝑑3𝑑−1 𝛼) = 𝑂 (𝑑 2𝑑 ), and 𝜔(𝑀) = 4 2 log(𝑐) + 4𝑑 = log(𝑂 (𝑑 2𝑑 ) + 4𝑑 = 𝑂 ( 𝑑 log(𝑑)).
                            √︃          2
                                     2
             𝐶 2 𝐾 4 𝜔(𝑇)+ log( 1−𝜌    )
Since 𝑚 =                   𝜖2
                                            , then 𝑚 = 𝑂 (𝑑 log(𝑑)).
Remark 4.1.9. Comparing our result to the work in [15, theorem 2], we have removed the mild
                                             𝑉           21
geometric condition on reach                𝜏𝑑
                                                   ≥ (   √ )𝑑     and can accommodate the presence of a boundary.
                                                        2 𝑑
                                                                     52


However the dependence of the final dimension, 𝑚, on the dimension of the manifold, 𝑑, remains
𝑂 (𝑑 log(𝑑)).
4.2     With SORS Matrices
In this section we improve the work in [29], where a sufficient condition is given for finding JL
maps via SORS matrices in terms of Gaussian width of the target set. The method is base on
the Multi-Resolution Restricted Isometry Property (MRIP). Separately in the work of [10], an
improvement in RIP estimates for SORS matrices is presented. Here we combine the improved RIP
estimate with the MRIP framework and obtain an improved JL map for compact manifolds based
on the Gaussian width of their unit secants. Theorem 4.2.16 is our main theorem in this section.
     We start with the definition of SORS matrices and some properties.
Definition 4.2.1. (SORS Matrix) Let 𝑀 denote a R𝑛×𝑛 orthogonal matrix (or C𝑛×𝑛 where some
authors use unitary matrices) obeying
                                                                              𝐾
                             𝑀∗𝑀 = 𝐼           and          max |𝑀𝑖, 𝑗 | ≤ √                   (4.14)
                                                             𝑖, 𝑗              𝑛
Let 𝐻 ∈ R𝑚×𝑛 be a random matrix created from selecting rows of 𝑀 in an i.i.d fashion. Let
𝐷 ∈ R𝑛×𝑛 be a random diagonal matrix with diagonal entries of ±1 of equal probability. Then
     √
𝐴 = 𝑛𝐻𝐷 is a Subsampled Orthogonal with Random Sign (SORS) matrix with constant 𝐾. Note
that 𝐾 ≥ 1 since 1 = 𝑖 𝑀𝑖2𝑗 ≤ 𝑛( √𝐾𝑛 ) 2 = 𝐾 2 .
                       Í
Definition 4.2.2. (RIP)[18, Prop 6.1] The sth restricted isometry constant 𝜖 𝑠 of a matrix 𝐴 ∈ R𝑚×𝑁
is the smallest 𝜖 ≥ 0 such that for all 𝑠-sparse 𝑥 ∈ R𝑁
                                (1 − 𝜖)||𝑥|| 2 ≤ || 𝐴𝑥|| 2 ≤ (1 + 𝜖)||𝑥|| 2 .
Proposition 4.2.3. [18, Prop 6.6] For matrix 𝐴, let 𝜖 𝑠 be the 𝑠th restricted isometry constant of 𝐴.
Then for integers 1 ≤ 𝑠 ≤ 𝑡
                                     𝑡−𝑑         𝑑
                               𝜖𝑡 ≤       𝜖2𝑠 + 𝜖 𝑠 ,         𝑑 = gcd(𝑠, 𝑡).
                                      𝑠          𝑠
                                                    53


In particular since 𝜖 𝑠 ≤ 𝜖2𝑠
                                                            𝑡
                                                       𝜖𝑡 ≤ 𝜖2𝑠 .
                                                            𝑠
Notation 4.2.4. For a matrix 𝐴, if 𝜖 𝑠 ≤ 𝜖, then we say 𝐴 satisfies the RIP of order (𝑠, 𝜖).
Proposition 4.2.5. Let 𝑠 ∈ N, and 1 ≤ 𝑘 be a real number. Then
                                                   𝜖 𝑠 ≤ 𝑘𝜖 (2 ⌈ 𝑠 ⌉) .
                                                                 𝑘
     Proof. From proposition 4.2.3, or [18, Prop 6.6], for 1 ≤ 𝑠 ≤ 𝑡, we have 𝜖𝑡 ≤ 𝑠𝑡 𝜖2𝑠 .
                                                                            𝑠
Since 1 ≤ 𝑘 and 𝑠 is an integer, 1 ≤ ⌈ 𝑘𝑠 ⌉ ≤ 𝑠, and hence 𝜖 𝑠 ≤                𝜖 (2 ⌈ 𝑠 ⌉) ≤ 𝑘𝜖 (2 ⌈ 𝑠 ⌉) .
                                                                         ⌈ 𝑘𝑠 ⌉        𝑘              𝑘
Definition 4.2.6. [29, definition 2.1](Extended Restricted Isometry Property (ERIP)) A matrix
𝐴 ∈ R𝑚×𝑁 satisfies the ERIP of order (𝑠, 𝜖) if every 𝑥 ∈ R𝑁 with ||𝑥|| 0 = 𝑠 satisfies
                                      || 𝐴𝑥|| 2 − ||𝑥|| 2 ≤ max{𝜖, 𝜖 2 }||𝑥|| 2
Remark 4.2.7. The above definition differs from the RIP when 1 < 𝜖.
Definition 4.2.8. [29, definition 2.2] (Multiresolution Restricted Isometry Property (MRIP)) A
matrix 𝐴 ∈ R𝑚×𝑁 satisfies the MRIP with (𝑠, 𝜖) if it possesses the extended RIP for (2𝑙 𝑠, 2𝑙/2 𝜖) for
0 ≤ 𝑙 ≤ ⌊log2 ( 𝑁𝑠 )⌋.
     Next we state an existing RIP estimate for SORS matrices from [18], followed by an improvement
from [10]. The improvement has one fewer log factor.
Theorem 4.2.9. (RIP for SORS matrices) [18, theorem 12.31]Let 𝐴 ∈ C𝑚×𝑁 be a SORS matrix
with constant 𝐾 ≥ 1. For 𝜖 ∈ (0, 1) if
                                             𝑚 ≥ 𝐶𝐾 2 𝜖 −2 𝑠 log4 (𝑁)                                            (4.15)
                                                3
then with probability at least 1 − 𝑁 − log        (𝑁) the restricted isometry constant of        √1 𝐴    for sparsity 𝑠,
                                                                                                   𝑚
i.e 𝜖 𝑠 , satisfies 𝜖 𝑠 ≤ 𝜖. The constant 𝐶 > 0 is universal.
     Now we present the improvement from [10].
                                                           54


Theorem 4.2.10. [10, Thm 1.1] There exist absolute constants 𝜅, 𝑐 0 , 𝑐 1 > 0 such that the following
holds. Let 𝑋1 , ..., 𝑋𝑚 be independent copies of a random vector 𝑋 ∈ C𝑁 with bounded coordinates,
i.e. max1≤𝑖≤𝑁 |⟨𝑋, e𝑖 ⟩| ≤ 𝐾 for some 𝐾 > 0, where {𝑒𝑖 }𝑖=1                𝑁
                                                                               is the standard basis of      C𝑁 . Let
                              √
𝑇 ⊆ {𝑥 ∈ C𝑁 : ∥𝑥∥ 1 ≤ 𝑠}, 𝜖 ∈ (0, 𝜅), and assume that
                                                                                  
                                     𝑚 ≥ 𝑐 0 𝐾 2 𝜖 −2 𝑠 log(𝑒𝑁) log2 𝑠𝐾 2 /𝜖 .
Then with probability exceeding 1 − 2 exp −𝜖 2 𝑚/(𝑠𝐾 2 ) ,
                                                                     
                                 𝑚
                                                                                              !
                             1 ∑︁
                      sup           |⟨𝑦, 𝑋𝑖 ⟩| 2 − E|⟨𝑦, 𝑋⟩| 2 ≤ 𝑐 1 𝜖 1 + sup E|⟨𝑦, 𝑋⟩| 2 .
                      𝑦∈𝑇 𝑚 𝑖=1                                                𝑦∈𝑇
Approximate values of constants are 𝜅 ≈ 0.306, 𝑐 0 ≈ 316792 and for 𝑐 1 we have 𝑐 1 = 492.
Corollary 4.2.11. (Adapted from [10, Thm 1.1]) There exists absolute constants, 𝑎 0 , 𝑎 1 , 𝑎 2 > 0
such that the following holds for 𝜖 ∈ (0, 𝑎 2 ]. Assume 𝐴 is a 𝑚 × 𝑁 SORS matrix with constant 𝐾
such that
                                                                                2
                                                                                 
                                                  2  𝑠              2 𝑎 1 𝑠𝐾
                                      𝑚 ≥ 𝑎 0 𝐾 2 log(𝑒𝑁) log                        ,
                                                    𝜖                        𝜖
then, with probability at least 1 − 2 exp(−𝜖 2 𝑚/(𝑎 21 𝑠𝐾 2 )), for all 𝑠-sparse vectors 𝑥 ∈ R𝑛 we have
                                                                2
                                                         1
                                                 2
                                   (1 − 𝜖)∥𝑥∥ ≤ √ 𝐴𝑥               ≤ (1 + 𝜖)∥𝑥∥ 2 .
                                                          𝑚
                                                                                           C
     Proof. Using theorem 4.2.10, we consider unit length vectors 𝑇 := {𝑥 ∈ 𝑁 with ∥𝑥∥ 2 = 1 and
            √
                                                                      C
∥𝑥∥ 1 ≤ 𝑠}. Since 𝐴 is a SORS matrix, there is a unitary 𝑁×𝑁 matrix 𝑈 where 𝐴 is sampled from
√                                                                                      √
   𝑁𝑈. Let 𝑋 be a random vector uniformly distributed in the rows of 𝑁𝑈. Then we can apply
                                                                                  2
theorem 4.2.10 to 𝑋. We have E|⟨𝑥, 𝑋⟩| 2 = ∥𝑥∥ 22 = 1 and √1𝑚 𝐴𝑥 = 𝑚1 𝑖=1                    |⟨𝑥, 𝑋𝑖 ⟩| 2 . Thus
                                                                                        Í𝑚
                                                                                  2
                                                          2
                                                   1
                                         sup √ 𝐴𝑥           − ∥𝑥∥ 22 ≤ 2𝑐 1 𝜖 .
                                         𝑥∈𝑇        𝑚     2
Changing constants to account for the extra 2𝑐 1 factor accompanying the 𝜖 above gives the stated
bounds on the probability and 𝑚. The new constants and their approximate values are 𝑎 0 =
𝑐 0 (2𝑐 1 ) 2 ≈ 3 × 1011 , 𝑎 1 = 2𝑐 1 = 984, 𝑎 2 = 2𝑐 1 𝜅 ≈ 301.                                                   ■
                                                           55


    In the next corollary, we remove the restriction on the size of 𝜖 in corollary 4.2.11. We also
introduce a parameter 𝜌 for more explicit control on the probability of success. Using 𝜌 we relate
the number of rows, 𝑚, to the success probability.
Corollary 4.2.12. Let 𝐴 be a 𝑚 × 𝑁 SORS matrix with constant 𝐾. Let 0 < 𝑠, 𝜖, 𝜌, with 𝑠 an integer
and 𝜖, 𝜌 real numbers. There exists absolute constants, 𝑎 0 , 𝑎 1 , 𝑎 2 > 0 such that the following holds.
    If 𝜖 ≤ 𝑎 2 , and
                                                                                 2
                                                  𝑠                    2 𝑎 1 𝑠𝐾
                                  𝑚 ≥ 𝑎0 𝐾 2           log(𝑒𝑁)     log  (           )(1 + 𝜌)                             (4.16)
                                                 𝜖2                          𝜖
                                                                                                  
                                                                           2 𝑎 1 𝑠𝐾 2
then with probability at least 1 − 2 exp −𝑎             𝑎2
                                                          0
                                                            log(𝑒𝑁)    log   (    𝜖   )(1    + 𝜌)    , for all 𝑠-sparse vectors
                                                         1
𝑥 ∈ R𝑛 we have
                                                               1
                                  (1 − 𝜖)||𝑥|| 2 ≤ || √ 𝐴𝑥|| 2 ≤ (1 + 𝜖)||𝑥|| 2 .                                        (4.17)
                                                               𝑚
    If 𝜖 ≥ 𝑎 2 and
                                       2
                                         2⌈ 𝑎𝜖2 𝑠 ⌉                   2
                                                                         𝑎 1 𝐾 2 2⌈ 𝑎𝜖2 𝑠 ⌉
                           𝑚 ≥ 𝑎0 𝐾                   log(𝑒𝑁) log (                         )(1 + 𝜌)                     (4.18)
                                           𝑎 22                                 𝑎2
                                                                                       𝑎2 𝑠
                                                                                                           
                                                                                               2
                                                          −𝑎 0                 2 𝑎 1 2⌈ 𝜖 ⌉𝐾
then with probability at least 1 − 2 exp 𝑎2 log(𝑒𝑁) log (                               𝑎2       )(1 + 𝜌) , for all 𝑠-sparse
                                                            1
vectors 𝑥 ∈  R𝑛   we have
                                                               1
                                  (1 − 𝜖)||𝑥|| 2 ≤ || √ 𝐴𝑥|| 2 ≤ (1 + 𝜖)||𝑥|| 2 .
                                                               𝑚
    Proof. The case of 𝜖 ≤ 𝑎 2 is the same as before, with a simplified expression for probability. In
the simplification we plug in the expression for 𝑚 into the probability bound and simplify. For the
case of 𝜖 > 𝑎 2 , in the formula for 𝜖 ≤ 𝑎 2 we put 𝜖 = 𝑎 2 and 𝑠 = 2⌈ 𝑎𝜖2 𝑠 ⌉. By proposition 4.2.5, from
RIP of order (2⌈ 𝑎𝜖2 𝑠 ⌉, 𝑎 2 ), putting 𝑘 =       𝜖
                                                  𝑎2 ,  we can get RIP of order (𝑠, 𝜖). In this process we need
                                       2
                                         2⌈ 𝑎𝜖2 𝑠 ⌉                  2
                                                                        𝑎 1 𝐾 2 2⌈ 𝑎𝜖2 𝑠 ⌉
                          𝑚 ≥ 𝑎0 𝐾                   log(𝑒𝑁) log (                          )(1 + 𝜌).
                                           𝑎 22                                 𝑎2
    Substituting the value of 𝑚 in the probability bound, we get the claimed expression.                                     ■
    In the next corollary we combine the two regimes of 𝜖 into one expression.
                                                                 56


Corollary 4.2.13. There exist constants 𝑎 0 , 𝑎 1 , 𝑎 2 such that for a SORS matrix 𝐴 of size 𝑚 × 𝑁
with constant 𝐾, and positive integer 𝑠 and positive reals 𝜖 and 𝜌 the following holds. If
                                                                                          
                                     2 1      2                    2      1 2            2
                     𝑚 ≥ 𝑎 0 𝐾 ( 2 + 2 )𝑠 log(𝑒𝑁) log ( + )𝑎 1 𝑠𝐾 (1 + 𝜌)
                                       𝜖     𝑎2                           𝜖 𝑎2
then the matrix   √1 𝐴    satisfies the RIP condition for (𝑠, 𝜖) with probability at least
                    𝑚
                                                                                       !
                                                −𝑎 0                       𝑎 1
                                    1 − 2 exp          log(𝑒𝑁) log2 ( )(1 + 𝜌) .
                                                 𝑎 21                      𝑎2
    Proof. We combine the two bounds (4.16) and (4.18) for 𝑚 from the cases of 𝜖 ≤ 𝑎 2 and
𝜖 > 𝑎 2 into a single bound. Similarly we combine the probability bounds into one expres-
sions. This final expression is independent of 𝑠 and 𝜖 and it relates to 𝑚 through the variable 𝜌.
                                                         𝑎2 𝑆
                                                      2⌈   𝜖 ⌉     2𝑆
For the 𝑚 bound, when 𝜖 ≥ 𝑎 2 we used                    𝑎 22
                                                                 ≤ 𝑎 22
                                                                        . For the probability bounds first note that
                                                                                         𝑎 1 𝑠𝐾 2     𝑎1
1 − exp(−𝑥) is an increasing function. Then when 𝜖 ≤ 𝑎 2 we have                             𝜖      ≥ 𝑎2 ,  since 𝑠, 𝐾 ≥ 1.
                                 𝑎 𝑠
                       𝑎 1 𝐾 2 2⌈ 𝜖2 ⌉   𝑎1
When 𝜖 ≥ 𝑎 2 , then           𝑎2       ≥ 𝑎2 .                                                                             ■
    The focus of the next lemma is transitioning from RIP bounds to their multi-resolution (MRIP)
counterparts.
Proposition 4.2.14. Let 𝐴 be a 𝑚 × 𝑁 SORS matrix with constant 𝐾. Let 𝑠 be a positive integer,
𝜖 < 1 and 𝜌 be positive reals. There exists absolute constants, 𝑎 0 , 𝑎 1 , 𝑎 2 > 0 such that if
                                                                                                  !
                           1        1    1                                     1     1       1
          𝑚 ≥ 2𝑎 0 𝐾 2 ( 2 + 2 + 3 )𝑠 log(𝑒𝑁) log2 2𝑎 1 𝑠𝐾 2 ( 2 +                      + 2 ) (1 + 𝜌)                 (4.19)
                         𝜖        𝑎2 𝑎2                                        𝜖    𝑎2 𝑎2
then  √1 𝐴  satisfies the MRIP (𝑠, 𝜖) criteria with probability at least
        𝑚
                                                                                                  !
                                                              𝑎0                  𝑎 1
                        1 − 2 exp log(log(𝑁)) − 2 log(𝑒𝑁) log2 ( )(1 + 𝜌) .
                                                              𝑎1                  𝑎2
    Proof. The probability expression is simpler to obtain so we start with it. We take the expression
of probability from corollary 4.2.13, and apply it log 𝑁 times. In the definition of MRIP, given a
                                                                                                          𝑖
sparsity level 𝑠 and distortion 𝜖, we need to get ERIP conditions of the form (2𝑖 𝑠, 2 2 𝜖) or (2𝑖 𝑠, 2𝑖 𝜖 2 )
for 𝑖 starting from 0 and continuing until 2𝑖 𝑠 is larger than 𝑁. Therefore there are at most log 𝑁
steps. Since the probability expression in corollary 4.2.13 is independent of 𝑠 and 𝜖, we can use
                                                               57


the same expression for different steps. Combining the failure probabilities by the union bound we
arrive at the expression
                                                                                             !
                                                        𝑎0                        𝑎 1
                       1 − 2 exp log(log(𝑁)) − 2 log(𝑒𝑁) log2 ( )(1 + 𝜌) .                        (4.20)
                                                        𝑎1                        𝑎 2
    Now with respect to 𝑚, we need to show that the given bound for 𝑚, (4.19), is large enough
for the different ERIP conditions required in the MRIP. Here we find it more convenient to return
to corollary 4.2.12. First we check the condition for the regime 𝜖 ≤ 𝑎 2 , over the different 𝑖 values
                  𝑖           𝑖
(𝑠, 𝜖) → (2𝑖 𝑠, 2 2 𝜖) when 2 2 𝜖 ≤ 1.
                                         𝑠                              2
                                                            2 𝑎 1 𝑠𝐾
                           𝑚 ≥ 𝑎0 𝐾 2       log(𝑒𝑁)     log  (            )(1 + 𝜌) →              (4.21)
                                       𝜖2                           𝜖
                                           2𝑖 𝑠                        𝑎 1 2𝑖 𝑠𝐾 2
                           𝑚 ≥ 𝑎0 𝐾 2 𝑖            log(𝑒𝑁) log2 (            𝑖
                                                                                    )(1 + 𝜌)      (4.22)
                                        (2 2 𝜖) 2                        (2 2 𝜖)
                                         𝑠                       𝑖 𝑎 1 𝑠𝐾
                                                                            2
                           𝑚 ≥ 𝑎 0 𝐾 2 2 log(𝑒𝑁) log2 (2 2                     )(1 + 𝜌)           (4.23)
                                       𝜖                               𝜖
                                  𝑖
    The transition from (2𝑖 𝑠, 2 2 𝜖) to (2𝑖 𝑠, 2𝑖 𝜖 2 ) happens when for 𝜖 < 1,
                                                 𝑖             𝑖       1
                                               22 𝜖 = 1 → 22 =                                    (4.24)
                                                                       𝜖
    So we get the bound for 𝑚 as
                                             𝑠                              2
                                                               2 𝑎 1 𝑠𝐾
                              𝑚 ≥ 𝑎0 𝐾 2        log(𝑒𝑁)    log    (            )(1 + 𝜌).          (4.25)
                                            𝜖2                         𝜖2
    Now we plug-in (2𝑖 𝑠, 2𝑖 𝜖 2 ) for the regime when 2𝑖 𝜖 2 > 1 to get
                                           2𝑖 𝑠                                   2
                                                                   2 𝑎 1 2 𝑠𝐾
                                                                            𝑖
                          𝑚 ≥ 𝑎0 𝐾 2               log(𝑒𝑁)   log     (              )(1 + 𝜌)      (4.26)
                                       (2𝑖 𝜖 2 ) 2                        (2𝑖 𝜖)
                                          𝑠                       𝑎 1 𝑠𝐾 2
                              ≥ 𝑎 0 𝐾 2 𝑖 4 log(𝑒𝑁) log2 (                  )(1 + 𝜌)              (4.27)
                                       2𝜖                             𝜖
and this bound for 𝑚 get smaller as 𝑖 grows so we can just work with the earlier one in (4.25).
                                                                                    𝑖
Now we work with the case for 𝜖 > 𝑎 2 . Again we plug in (2𝑖 𝑠, 2 2 𝜖) or (2𝑖 𝑠, 2𝑖 𝜖 2 ), and look for a
                                                         58


                                                             𝑖
combined upper bound. First substitute (2𝑖 𝑠, 2 2 𝜖) into (4.18)
                                2
                                  2⌈ 𝑎𝜖2 𝑠 ⌉                       2
                                                                      𝑎 1 𝐾 2 2⌈ 𝑎𝜖2 𝑠 ⌉
                      𝑚 ≥ 𝑎0 𝐾                    log(𝑒𝑁) log (                          )(1 + 𝜌) →            (4.28)
                                      𝑎 22                                   𝑎2
                                             𝑖                                           𝑖
                                  2⌈ 𝑎2 𝑖2 𝑠 ⌉                          𝑎 1 𝐾 2 2⌈ 𝑎2 𝑖2 𝑠 ⌉
                                        22 𝜖                                         22 𝜖
                      𝑚 ≥ 𝑎0 𝐾 2                     log(𝑒𝑁) log2 (                             )(1 + 𝜌)       (4.29)
                                        𝑎 22                                    𝑎2
                                              𝑖                                            𝑖
                                  2⌈ 𝑎2 2𝜖 𝑠 ⌉                           𝑎 1 𝐾 2 2⌈ 𝑎2 2𝜖 𝑠 ⌉
                                              2                                            2
                                2
                      𝑚 ≥ 𝑎0 𝐾                         log(𝑒𝑁) log2 (                              )(1 + 𝜌)    (4.30)
                                        𝑎 22                                      𝑎2
                   𝑖             1                   𝑖
Similar to above 2 2 is at most  𝜖  when 2 2 𝜖 ≤ 1 so we arrive at
                                   2
                                      2⌈ 𝑎𝜖22𝑠 ⌉                     2
                                                                         𝑎 1 𝐾 2 2⌈ 𝑎𝜖22𝑠 ⌉
                        𝑚 ≥ 𝑎0 𝐾                       log(𝑒𝑁) log (                          )(1 + 𝜌)
                                           𝑎 22                                 𝑎2
    Next we try plugging in (2𝑖 𝑠, 2𝑖 𝜖 2 ) and we arrive at
                                                𝑖                                              𝑖
                                   2⌈ (2𝑎2𝑖 𝜖22 )𝑠2 ⌉                      𝑎 1 𝐾 2 2⌈ (2𝑎2𝑖 𝜖22 )𝑠2 ⌉
                       𝑚 ≥ 𝑎0 𝐾 2                        log(𝑒𝑁) log2 (                               )(1 + 𝜌) (4.31)
                                          𝑎 22                                      𝑎2
                                       𝑎2 𝑠
                                2⌈ 𝜖 2 (2 𝑖 𝜖 2) ⌉                       𝑎 1 𝐾 2 2⌈ 𝜖 2 (2
                                                                                        𝑎2 𝑠
                                                                                           𝑖 𝜖 2) ⌉
                              2                                      2
                   𝑚≥    𝑎0 𝐾                        log(𝑒𝑁) log (                                   )(1 + 𝜌). (4.32)
                                      𝑎 22                                        𝑎2
We note that in this case by assumption 2𝑖 𝜖 2 ≥ 1 so we can drop it and get a stricter lower bound
for 𝑚 as
                                   2
                                     2⌈ 𝑎𝜖22𝑠 ⌉                      2
                                                                         𝑎 1 𝐾 2 2⌈ 𝑎𝜖22𝑠 ⌉
                        𝑚 ≥ 𝑎0 𝐾                     log(𝑒𝑁) log (                           )(1 + 𝜌).         (4.33)
                                          𝑎 22                                 𝑎2
So overall for 𝜖 ≤ 𝑎 2 and 𝜖 > 𝑎 2 we respectively need
                                                   𝑠                              2
                                                                       2 𝑎 1 𝑠𝐾
                             𝑚 ≥ 𝑎0 𝐾 2                 log(𝑒𝑁)   log   (           )(1 + 𝜌)
                                                  𝜖2                         𝜖2
and
                                   2
                                     2⌈ 𝑎𝜖22𝑠 ⌉                      2
                                                                         𝑎 1 𝐾 2 2⌈ 𝑎𝜖22𝑠 ⌉
                        𝑚 ≥ 𝑎0 𝐾                     log(𝑒𝑁) log (                           )(1 + 𝜌).         (4.34)
                                          𝑎 22                                 𝑎2
We can combine them both into one expression with as claimed in the proposition.
                                                                                                         !
                   2 1      1      1                            2             2 1          1          1
        𝑚 ≥ 2𝑎 0 𝐾 ( 2 + 2 + 3 )𝑠 log(𝑒𝑁) log 2𝑎 1 𝑠𝐾 ( 2 +                                       + ) (1 + 𝜌). (4.35)
                      𝜖     𝑎2 𝑎2                                                𝜖         𝑎 2 𝑎 22
                                                               59


                                                                          𝑎2 𝑠
                                      1                                 ⌈      ⌉
where for 𝜖 > 𝑎 2 , we used  ( 𝜖12 + 𝑎 22
                                           + 1
                                             𝑎 32
                                                  )𝑠 ≥    𝑠
                                                         𝑎 32
                                                              +  1
                                                                𝑎 22
                                                                     ≥    𝜖2
                                                                          𝑎22    .                       ■
     Having established the MRIP estimate for SORS matrices, we use the results of [29] to get a JL
map for compact manifolds. First we state the result in [29].
Theorem 4.2.15. Let 𝑇 ⊂ R𝑁 and suppose the matrix 𝐻 ∈ R𝑚×𝑁 obeys the multiresolution RIP
with sparsity and distortion levels
                                                                            𝜖 rad(𝑇)
                          𝑠 = 150(1 + 𝜂),                    𝛿=                                     (4.36)
                                                                  𝐶 max(rad(𝑇), 𝜔(𝑇))
with 𝐶 > 0 an absolute constant. Then, for a diagonal matrix 𝐷 with an i.i.d random sign pattern
on the diagonal, the matrix 𝐴 = 𝐻𝐷 obeys
                              sup || 𝐴𝑥|| 2 − ||𝑥|| 2 ≤ max(𝜖, 𝜖 2 ).rad(𝑇) 2                       (4.37)
                               𝑥∈𝑇
with probability at least 1 − exp(−𝜂). Here rad(𝑇) = sup𝑣∈𝑇 ||𝑣|| is the maximum of the Euclidean
norm of points inside 𝑇.                                                                                 ■
     Now we use our proposition 4.2.14 with theorem 4.2.15.
Theorem 4.2.16. Let 𝑀 ↩→ R𝑁 be a compact 𝑑-dimensional submanifold of R𝑁 with boundary
𝜕 𝑀. Let 𝜏𝑀 be the reach of 𝑀. Let 𝜏𝑖 be the reach of the 𝑖-th connected component of 𝜕 𝑀 as a
submanifold of R𝑁 . Let 𝜏 = inf 𝑀,𝑖 {𝜏𝑀 , 𝜏𝑖 }. Let 𝑉𝑀 be the volume of 𝑀 and 𝑉𝜕 𝑀 be the volume of
𝜕 𝑀. Let 𝑈 = 𝑈 (𝑀 − 𝑀) = { || 𝑝−𝑞  𝑝−𝑞|| | 𝑝 ≠ 𝑞, 𝑝, 𝑞 ∈ 𝑀 } be the set of unit secants of 𝑀, and let 𝑈 be
its closure. Let 𝜔(𝑈) be the Gaussian width of 𝑈. Define
                                                𝑉𝑀 41 𝑑 𝑉𝜕 𝑀 81 𝑑−1
                                         𝛼=          ( ) +                ( )                       (4.38)
                                                 𝜔𝑑 𝜏               𝜔 𝑑−1 𝜏
                                                  𝛼2
                                          𝑐 = ( + 3𝑑 𝛼)                                             (4.39)
                                                   2
                                                  √   √︁
                                     𝑤 ∗ = 4 2 log(𝑐) + 4𝑑                                          (4.40)
Let 𝜌, 𝜂 and 𝜖 be positive reals. Let 0 < 𝜖 < 1. Let 𝐴 be a SORS matrix of size 𝑚 × 𝑁 with constant
𝐾.
                                                              60


    There exist absolute constants 𝑎 0 , 𝑎 1 , 𝑎 2 and 𝐶 such that if
                                                                                                     !
               (𝐶𝑤    ∗2)     1      1                                         (𝐶𝑤  ∗2)    1      1
  𝑚 ≥ 𝑎0 𝐾 2         2
                           + 2 + 3 )(1 + 𝜂) log(𝑒𝑁) log2 (1 + 𝜂)𝑎 1 𝐾 2                  +    + ) (1 + 𝜌)
                   𝜖         𝑎2 𝑎2                                               𝜖2        𝑎 2 𝑎 22
                                                                                                           (4.41)
then for all 𝑥, 𝑦 ∈ 𝑀 we have
                                       ∥ 𝐴(𝑥 − 𝑦)∥ 2 − ∥𝑥 − 𝑦∥ 2 ≤ 𝜖 ||𝑥 − 𝑦|| 2                           (4.42)
with probability
                                                                                              !
                                                             𝑎0                  𝑎1
                  1 − exp(−𝜂) − 2 exp log(log(𝑁)) −            2
                                                                 log(𝑒𝑁) log2 ( )(1 + 𝜌) .                 (4.43)
                                                             𝑎1                  𝑎2
    Proof. We use proposition 4.2.14 with theorem 4.2.15. We consider 𝑈 (𝑀 − 𝑀) ⊂ 𝑆 𝑁−1 . We
have 𝑤 ∗ as an upper bound for the Gaussian width of 𝑈 (𝑀 − 𝑀). Therefore to get an 𝜖 JL map
for 𝑈 (𝑀 − 𝑀) we need to establish           √1 𝐴 has the MRIP property for (𝑠, 𝛿) where       𝑠 determines the
                                              𝑚
                                   𝜖 rad(𝑈 (𝑀−𝑀))
probability and 𝛿 =       𝐶 max(rad(𝑈 (𝑀−𝑀)),𝜔(𝑈 (𝑀−𝑀))) . We have rad(𝑈 (𝑀 − 𝑀)) = 1,          and 𝑤 ∗ > 1, and
               𝜖
we get 𝛿 =   𝐶𝑤 ∗ .
    To get such an MRIP we need
                                                                                                                  !
               (𝐶𝑤∗)    2     1       1                                                      (𝐶𝑤∗)  2    1     1
𝑚 ≥ 2𝑎 0 𝐾 2 (             + 2 + 3 )150(1 + 𝜂) log(𝑒𝑁) log2 2𝑎 1 (150)(1 + 𝜂)𝐾 2 (                     +     + ) (1 + 𝜌).
                   𝜖2         𝑎2 𝑎2                                                              𝜖2      𝑎 2 𝑎 22
                                                                                                           (4.44)
and the success probability is
                                                                                        !
                                                        𝑎0                𝑎 1
                          1 − 2 exp log(log(𝑁)) − 2 log(𝑒𝑁) log2 ( )(1 + 𝜌) .                              (4.45)
                                                        𝑎1                𝑎2
Going from MRIP to an 𝜖-JL map has a failure probability of its own as exp(−𝜂). Combining the
failure probabilities we arrive at
                                                                                              !
                                                             𝑎0                  𝑎 1
                  1 − exp(−𝜂) − 2 exp log(log(𝑁)) − 2 log(𝑒𝑁) log2 ( )(1 + 𝜌) .                            (4.46)
                                                             𝑎1                  𝑎2
                                                                                                               ■
                                                         61


                                              CHAPTER 5
      NECESSARY CONDITIONS FOR JL MAPS OF MANIFOLDS (VARIATION 2)
In this section we give a necessary condition for existence of JL maps for compact manifolds.
Earlier work of Alon in [5, Theorem 9.3.] provides a necessary condition for a JL map of finite
points. Our work is a partial generalization of that bound to manifolds. Our method is based on
the Sudokov inequality [35, Theorem 7.4.1] which bounds the Gaussian width from below using
covering numbers. Complementing with Sudakov-Fernique’s comparison theorem [35, Def 7.2.11],
we control the reduction in the Gaussian width under the JL map. The image of the manifold under
the map must fit in the target space and comparison of the Gaussian widths of the unit sphere of the
target space with the Gaussian width of the manifold gives us the desired necessary condition. Our
main theorem in this section is 5.0.5. After that we consider a reduction from manifolds to finite
points, and compare our work to the lower bound in [5, Theorem 9.3.].
    We start by reviewing the definitions and give estimates of the Gaussian width.
Definition 5.0.1. [35, Def 7.5.1] Let 𝑔 be a standard Gaussian random variable in R𝑛 . Define the
Gaussian Width of 𝑇 ⊂ R𝑛 , 𝜔(𝑇), as follows.
                                           𝜔(𝑇) = E sup ⟨𝑔, 𝑥⟩
                                                        𝑥∈𝑇
Proposition 5.0.2. Let 𝐿 : R𝑛 → R𝑚 be a linear map, 𝑇 ⊂ R𝑛 with 𝑎 and 𝑏 positive reals such that
for 𝑥, 𝑦 ∈ 𝑇, 𝑎||𝑥 − 𝑦|| ≤ ||𝐿 (𝑥 − 𝑦)|| ≤ 𝑏||𝑥 − 𝑦||. Then 𝑎 𝜔(𝑇) ≤ 𝜔(𝐿(𝑇)) ≤ 𝑏 𝜔(𝑇).
    Proof. We use the Sudakov-Fernique’s comparison theorem [35, Def 7.2.11]. For 𝑡 ∈ 𝑇, define
the Gaussian processes 𝐴𝑡 = 𝑎 ⟨𝑔𝑛 , 𝑡⟩, 𝐵𝑡 = ⟨𝑔𝑚 , 𝐿𝑡⟩, 𝐶𝑡 = 𝑏 ⟨𝑔𝑛 , 𝑡⟩. For their increments we have
the estimates
                      E( 𝐴𝑡1 − 𝐴𝑡2 ) 2 = 𝑎 2 E(⟨𝑔𝑛 , 𝑡1 − 𝑡 2 ⟩) 2 = 𝑎 2 ||𝑡1 − 𝑡2 || 2 ≤
                      E(𝐵𝑡1 − 𝐵𝑡2 ) 2 = E(⟨𝑔𝑚 , 𝐿(𝑡1 − 𝑡2 )⟩) 2 = ||𝐿(𝑡1 − 𝑡 2 )|| 2 ≤
                      E(𝐶𝑡1 − 𝐶𝑡2 ) 2 = 𝑏 2 E(⟨𝑔𝑛 , 𝑡1 − 𝑡2 ⟩) 2 = 𝑏 2 ||𝑡1 − 𝑡 2 || 2 .
                                                      62


By the comparison theorem E sup𝑡 𝐴𝑡 ≤ E sup𝑡 𝐵𝑡 ≤ E sup𝑡 𝐶𝑡 . This implies the result as E sup𝑡 𝐴𝑡 =
𝑎 E sup𝑡 ⟨𝑔𝑛 , 𝑡⟩ = 𝑎𝜔(𝑇) with similar expressions for 𝐵𝑡 and 𝐶𝑡 .                                       ■
    The next proposition gives the necessary condition on the target dimension given the requirement
of the Gaussian widths.
Proposition 5.0.3. Let 𝑀 ⊂ R𝑛 have Gaussian width 𝜔(𝑀), and 𝐿 be a linear map from R𝑛 to R𝑚 ,
and 0 < 𝜖 < 1. Assume for all 𝑥, 𝑦 ∈ 𝑀
                          (1 − 𝜖)||𝑥 − 𝑦|| ≤ ||𝐿(𝑥 − 𝑦)|| ≤ (1 + 𝜖)||𝑥 − 𝑦||.
Then
                                                            2
                                       1 1 − 𝜖 𝜔(𝑀)
                                        (      )                  ≤ 𝑚.
                                       2 1 + 𝜖 diam(𝑀)
    Proof. Consider the set 𝑀 − 𝑀 (defined as 𝑀 + (−𝑀)). By proposition 5.0.2,
                                 (1 − 𝜖)𝜔(𝑀 − 𝑀) ≤ 𝜔(𝐿 (𝑀 − 𝑀))
    We have 𝜔(𝑀 − 𝑀) = 21 𝜔(𝑀) [35, proposition 7.5.2]. Image of 𝐿 (𝑀 − 𝑀) is contained in a
                                                                     √
ball in R𝑚 with radius (1 + 𝜖)( diam(𝑀)). Since 𝜔(𝐵𝑚 (1)) = 𝑚 [35, 7.5.7], then by monotonicity
                                                                         √
of Gaussian width and scaling 𝜔(𝐿 (𝑀 − 𝑀)) ≤ (1 + 𝜖)diam(𝑀) 𝑚. Then
                    1                                                                        √
           (1 − 𝜖) 𝜔(𝑀) = (1 − 𝜖)𝜔(𝑀 − 𝑀) ≤ 𝜔(𝐿 (𝑀 − 𝑀)) ≤ (1 + 𝜖) diam(𝑀) 𝑚.
                    2
                                                                                                         ■
    In the next proposition we combine the Sudakov inequality with the lower covering estimates
in corollary 2.4.2. When we optimize the Sudakov inequality over the choice of covering radii, we
arrive at two possible regimes.
Proposition 5.0.4. Let 𝑀 be a d-dimensional smooth submanifold of R𝑛 possibly with boundary,
                                                           𝜋 𝑑/2
with volume 𝑉 and reach 𝜏. Let 0 < 𝜖 ≤ 2𝜏 . Let 𝜔 𝑑 =     Γ( 𝑑2 +1)
                                                                    be the volume of the unit d-ball. There
                                               2    𝑉 𝑑1
there is a universal constant 𝑐 such that, if  √
                                              3 𝑒
                                                  ( 𝜔 𝑑
                                                        ) ≤ 𝜏 then
                                                    63


                                             √︂
                                           𝑐    𝑑 𝑉 1
                                                   ( ) 𝑑 ≤ 𝜔(𝑀).
                                           3    2𝑒 𝜔 𝑑
              2    𝑉 𝑑1
     If 𝜏 ≤   √
             3 𝑒
                 ( 𝜔 𝑑
                       ) then
                                           √︄
                                         𝜏              𝑉
                                       𝑐      log(               ) ≤ 𝜔(𝑀).
                                         2         𝜔 𝑑 ( 32 𝜏) 𝑑
     Proof. We use Sudakov’s inequality [35, theorem 7.4.1].
                                            √︁
                                         𝑐𝜖 log(𝐶 (𝑀, 𝜖)) ≤ 𝜔(𝑀)
                                                                                         𝜏
Sudakov’s inequality is valid for any radius 𝜖. We further restrict to 𝜖 ≤               2 so we can use
    𝑉
𝜔 𝑑 (3𝜖) 𝑑
           ≤ 𝐶 (𝑀, 𝜖), corollary 2.4.2. We obtain
                                           √︄
                                                        𝑉
                                        𝑐𝜖 log(                  ) ≤ 𝜔(𝑀)                             (5.1)
                                                   𝜔 𝑑 (3𝜖) 𝑑
     Maximizing the left hand side with respect to 𝜖, for the optimal 𝜖 = 𝜖 ∗ , we obtain
                          √︄
                                     𝑉                     𝑑
                            log(            ) − √︃                       =0
                                 𝜔 𝑑 (3𝜖) 𝑑
                                                2 log( 𝑉 )  𝜔 𝑑 (3𝜖) 𝑑
                                                                          1     𝑉    1
                                                           log(3𝜖 ∗ ) =     log( ) −
                                                                          𝑑     𝜔𝑑   2
                                                                            1 𝑉 1
                                                                     𝜖∗ = √ ( ) 𝑑
                                                                          3 𝑒 𝜔𝑑
Using this optimal choice in equation (5.1),we get
                                             √︂
                                           𝑐 𝑑 𝑉 1
                                                   ( ) 𝑑 ≤ 𝜔(𝑀).
                                           3 2𝑒 𝜔 𝑑
     If the optimal 𝜖 ∗ is forbidden because of 𝜖 ≤ 2𝜏 , then we just use 𝜖 = 2𝜏 . It gives the inequality
                                           √︄
                                         𝜏              𝑉
                                       𝑐      log(               ) ≤ 𝜔(𝑀).
                                         2         𝜔 𝑑 ( 32 𝜏) 𝑑
For a valid formula we must have
                                                                𝑉
                                              0 ≤ log(                  )
                                                          𝜔 𝑑 ( 32 𝜏) 𝑑
                                                        𝑉
                                              1≤
                                                   𝜔 𝑑 ( 32 𝜏) 𝑑
                                                   2 𝑉 1
                                              𝜏 ≤ ( )𝑑                                                (5.2)
                                                   3 𝜔
                                                        64


Because we are considering the case where the optimal 𝜖 ∗ is forbidden, we have
                                           𝜏            1 𝑉 1
                                              < 𝜖∗ = √ ( ) 𝑑
                                           2           3 𝑒 𝜔𝑑
                                                        2 𝑉 1
                                                 𝜏 < √ ( )𝑑
                                                       3 𝑒 𝜔𝑑
This is enough for satisfying (5.2).                                                          ■
    Now we put together the requirement on the target dimension of a JL map, with the geometric
properties of the manifold and its Gaussian width.
Theorem 5.0.5. Let 𝑀 ⊂ R𝑛 with Gaussian width 𝜔(𝑀), 𝐿 a linear map from R𝑛 to R𝑚 , and
0 < 𝜖 < 1 and let 𝑐 represent multiple universal constants. Assume for all 𝑥, 𝑦 ∈ 𝑀
                         (1 − 𝜖)||𝑥 − 𝑦|| ≤ ||𝐿(𝑥 − 𝑦)|| ≤ (1 + 𝜖)||𝑥 − 𝑦||.
    2        1
If  √  ( 𝑉 )𝑑
   3 𝑒 𝜔𝑑
               ≤ 𝜏 then
                                      1−𝜖 2          𝑑        𝑉 2
                                   𝑐(        )       2
                                                            ( ) 𝑑 ≤ 𝑚.
                                      1 + 𝜖 diam (𝑀) 𝜔 𝑑
         2       1
If 𝜏 ≤   √  ( 𝑉 )𝑑
        3 𝑒 𝜔𝑑
                   then
                                   1−𝜖 2         1                 𝑉
                            𝑐𝜏 2 (       )        2
                                                         log(               ) ≤ 𝑚.
                                   1 + 𝜖 diam (𝑀)             𝜔 𝑑 ( 32 𝜏) 𝑑
    Proof. We use proposition 5.0.3 and proposition 5.0.4. We have a lower bound for the
embedding dimension 𝑚 using the distortion 𝜖 and the Gaussian width of 𝑀, 𝜔(𝑀). Then we have
a lower bound for the 𝜔(𝑀) using the covering numbers of 𝑀. Putting the two together we get the
following.
    2        1
If  √  ( 𝑉 )𝑑
   3 𝑒 𝜔𝑑
               ≤𝜏
                                                    √︂
                                                2𝑐 𝑑 𝑉 1
                                                          ( ) 𝑑 ≤ 𝜔(𝑀)
                                                 3 2𝑒 𝜔 𝑑
                                                              2
                                         1 1 − 𝜖 𝜔(𝑀)
                                           (       )               ≤𝑚
                                         2 1 + 𝜖 diam(𝑀)
                                    1−𝜖 2          𝑑        𝑉 2
                                𝑐(         )              (    ) 𝑑 ≤ 𝑚.
                                    1 + 𝜖 diam2 (𝑀) 𝜔 𝑑
                                                      65


              2       1
    If 𝜏 ≤    √ ( 𝑉 )𝑑
            3 𝑒 𝜔𝑑
                        then
                                                             √︄
                                                           𝜏                 𝑉
                                                        𝑐        log(                 ) ≤ 𝜔(𝑀)
                                                           2            𝜔 𝑑 ( 32 𝜏) 𝑑
                                                                                   2
                                                      1 1 − 𝜖 𝜔(𝑀)
                                                         (         )                    ≤𝑚
                                                      2 1 + 𝜖 diam(𝑀)
                                    1−𝜖 2             1                      𝑉
                             𝑐𝜏 2 (       )                      log(                 ) ≤ 𝑚.
                                    1 + 𝜖 diam (𝑀)     2
                                                                        𝜔 𝑑 ( 32 𝜏) 𝑑
                                                                                                  ■
Standard Examples: We apply our theorem 5.0.5 to the standard examples of unit sphere 𝑆 𝑑 and
unit ball 𝐵 𝑑 as submanifold of R𝑁 .
Corollary 5.0.6.
    For 𝐵 𝑑 , the data is as follows.
                                     𝑀     𝑉𝐵 𝑑            𝜏𝐵 𝑑      𝜔𝑑         diam(M)
                                                  𝑑
                                               𝜋2
                                     𝐵𝑑         𝑑
                                            Γ( 2 +1)
                                                           ∞         𝑉𝐵 𝑑      2
Since 𝜏 is infinite, we have
                                         1−𝜖 2                𝑑            𝑉 2
                                      𝑐(           )           2
                                                                        ( )𝑑 ≤ 𝑚
                                         1 + 𝜖 diam (𝑀) 𝜔 𝑑
                                                             1−𝜖 2
                                                         𝑐(          ) 𝑑≤𝑚
                                                             1+𝜖
    which agrees with our intuition that for a perfect, no distortion embedding of unit 𝐵 𝑑 , target
dimension grows at least linearly in 𝑑.
                                                                                                  ■
Corollary 5.0.7.
    For the unit S𝑑 the data is as follows.
                                   𝑀     𝑉𝑆 𝑑             𝜏𝑆 𝑑     𝜔𝑑             diam(M)
                                               𝑑+1                      𝑑
                                                2                     𝜋2
                                   𝑆𝑑    2 Γ(𝜋 𝑑+1  )
                                                          1        Γ( 𝑑2 +1)
                                                                                 2
                                                2
                                                               66


    We check the relation between reach and the other parameters to find that 𝜏 is big enough.
                                                                          ! 𝑑1
                                  2 𝑉 1               2      √ Γ( 𝑑2 + 1)
                                 √ ( ) = √ 2 𝜋𝑑
                               3 𝑒 𝜔𝑑               3 𝑒         Γ( 𝑑+1
                                                                     2 )
                               ! 𝑑1
                   √ Γ( 𝑑 + 1)
    Using lim 2 𝜋 2𝑑+1              = 1, for large enough 𝑑 we have
           𝑑→∞          Γ( 2 )
                                             2 𝑉 1
                                             √ ( ) 𝑑 < 𝜏 = 1.
                                           3 𝑒 𝜔𝑑
So we get
                                      1−𝜖 2           𝑑        𝑉 2
                                   𝑐(        )         2
                                                             ( )𝑑 ≤ 𝑚
                                      1 + 𝜖 diam (𝑀) 𝜔 𝑑
                                                            1−𝜖 2
                                                         𝑐(     ) 𝑑 ≤ 𝑚.
                                                            1+𝜖
This gives a linear dependence on 𝑑 which agrees which our intuition that to embed 𝑆 𝑑 without
distortion at least 𝑑 dimensions are needed. The reason is standard 𝑆 𝑑 circumscribes 𝑑 + 2 points
that are equidistant from each other and such points need at least R𝑑+1 for an embedding.
                                                                                                  ■
    Reduction to the case of finite points: Now we show that when we reduce to the zero-
                                                                                                 𝑛
dimensional case we recover the finite point case. Consider 𝑛 points in R𝑁 . There are at most 2
pair-wise secants. Normalizing them we get some points on the unit 𝑆 𝑁−1 . Call this set 𝑀. We
                                                                                               𝜏
apply Sudakov’s inequality to the set 𝑀. The covering number with balls of radius 𝜖 for 𝜖 <    2  is
                                                                                                   𝜏
exactly |𝑀 |. Using the counting measure we put 𝑉 = |𝑀 |. Then by Sudakov’s inequality for 𝜖 =     2
we get
                                                  𝜏 √︁
                                                𝑐      log(𝑉) ≤ 𝜔(𝑀)
                                          √︂      2
                                        𝜏          𝑛(𝑛 − 1)
                                      𝑐     log(             ) ≤ 𝜔(𝑀)
                                        2               2
                                                  𝜏 √︁
                                                𝑐 log(𝑛) ≤ 𝜔(𝑀).
                                                  4
                                                        67


     Using proposition 5.0.3
                                         √︁          !2                        2
                                       𝜏
                          1 1 − 𝜖 𝑐 4 log(𝑛)                    1 1 − 𝜖 𝜔(𝑀)
                             (      )                      ≤     (     )           ≤ 𝑚.                (5.3)
                          2 1 + 𝜖 diam(𝑀)                       2 1 + 𝜖 diam(𝑀)
     For finite points the reach 𝜏 is equal to half of the smallest distance between the points and
diameter is the largest distance hence          𝜏
                                              diam    ≤ 21 . Therefore this term won’t push the lower bound
up enough to contradict the known upper bounds. Simplifying equation (5.3) we arrive at
                                             𝜏 2 1−𝜖 2
                                       𝑐(         ) (           ) log(𝑛) ≤ 𝑚.
                                           diam 1 + 𝜖
     Comparing to the well known upper JL bound 𝑂 ( 𝜖12 ) log(𝑛), [35, Theorem 5.3.1], we see
that we recover the log(𝑛). A known lower bound for finite points from [5, Theorem 9.3.] is
          1
𝑂 ( 𝜖 2 log( 1 ) log(𝑛). Comparing to our bound, we have the same dependence on log(𝑛), however
               )
             𝜖
                         1−𝜖                          1
our 𝜖 dependence of      1+𝜖   is weaker than   𝜖 2 log( 𝜖1 )
                                                              .                                           ■
                                                              68


                                             CHAPTER 6
           A JL ALGORITHM WITH 𝑁 log(log(𝑁)) RUN-TIME (VARIATION 3)
In this chapter we give an algorithm for computing a JL map with a 𝑁 log(log(𝑁)) run-time. This
approach improves on the work of Ailon and Liberty in [2] for fast JL maps. First we present the
algorithm for finite point sets, and after we upgrade it to the case of subspaces. Since we restrict the
discussion to finite points and subspaces, our bounds don’t directly involve the reach as a parameter.
    Our algorithm is based on the divide and conquer approach, where we split a vector into pieces,
apply a common fast JL map to each piece, combine the outputs and apply a sub-Gaussian JL map at
the end. An schematic diagram for the algorithm in presented in figure 6.1. In this way we maintain
the best of two worlds: the run-time speed of fast JL maps and the optimal embedding dimension
of sub-Gaussian matrices. In lemma 6.0.1, we show the general framework for combining fast and
sub-Gaussian JL maps work together. Our main theorems are 6.1.5 for finite points and 6.2.5 for
the case of subspaces.
    Although our theorems are stated for general matrices with fast matrix-vector multiply, Discrete
Fourier Transform (DFT) and Welsh-Hadamard matrices are prime examples of matrices that have
the fast runtimes.
                                      C𝑁 be a set with 𝑛 points. Assume 𝑁 > 𝑚12 for integers 𝑚1 ≥ 𝑚2
                              
Lemma 6.0.1. Let 𝜖 ∈ 0, 13 , 𝑆 ⊂
where 𝑚 12 divides 𝑁. Furthermore
   1. Split every vector in 𝑆 into 𝑁/𝑚 12 pieces each in        C𝑚 2
                                                                   1 and obtain the set 𝑆′ ⊂      C𝑚 2
                                                                                                     1  of
       cardinality 𝑛𝑁/𝑚 12 .
   2. Let A ∈   C𝑚 ×𝑚
                    1
                        2
                        1 be an 𝜖-JL map of 𝑆′ into C𝑚 .1
   3. Suppose A has a fast matrix vector multiply so that Ay can be computed in 𝑚 12 · 𝑓 (𝑚 1 ) = 𝑜(𝑚 13 )-
       time for all y ∈  C𝑚 .
                            2
                            1
                                                   69


Figure 6.1: A vector is split into pieces, each part is processed with a fast JL map, outputs are
combined and fed into a Gaussian JL map. The Gaussian maps have optimal dimensional reduction
while fast JL maps have better run-time. In this mixed approach, we combine the benefits of both
methods.
   4. Let 𝑆′′ ⊂ C𝑁/𝑚1 be a set of 𝑛 points obtained by reshaping the 𝑁/𝑚 12 pieces of each vector in
         𝑆 back into a single vector after they’ve been mapped into C𝑚1 by A, and
   5. Let B ∈ C𝑚2 ×𝑁/𝑚1 be any 𝜖-JL map of 𝑆′′ into            C𝑚 .
                                                                 2
                    ©𝐴            ª
                    ­             ®
Then, 𝐿 (x) := B ­
                    ­     . . .   ® will be a 3𝜖-JL map of 𝑆 ⊂ C𝑁 into C𝑚2 . Furthermore, 𝐿 can be
                                  ®
                    ­             ®
                    ­             ®
                                𝐴
                    «             ¬
applied to any vector in just 𝑁 · 𝑓 (𝑚 1 )-time.
    Proof. For 𝑥 ∈ 𝑆, let 𝑥𝑖 , 1 ≤ 𝑖 ≤ 𝑚𝑁2 be its pieces in 𝑆′. Let 𝑥 ′ ∈ 𝑆′′ be the concatenated vector
                                             1
      
 𝐴𝑥 1 
      
      
 𝐴𝑥  . Then ∥𝑥∥ 2 = Í𝑖 ∥𝑥𝑖 ∥ 2 , ∥𝑥 ′ ∥ 2 = Í𝑖 ∥ 𝐴𝑥𝑖 ∥ 2 , and
 2
      
 .. 
 . 
      
                                   (1 − 𝜖)∥𝑥𝑖 ∥ 2 ≤ ∥ 𝐴𝑥𝑖 ∥ 2 ≤ (1 + 𝜖)∥𝑥𝑖 ∥ 2
                                   (1 − 𝜖)∥𝑥 ′ ∥ 2 ≤ ∥𝐵𝑥 ′ ∥ 2 ≤ (1 + 𝜖)∥𝑥 ′ ∥ 2
                                                        70


Therefore we get the claimed 3𝜖-JL map as follows
              (1 − 3𝜖)∥𝑥∥ 2 ≤ (1 − 𝜖) 2 ∥𝑥∥ 2 ≤ ∥𝐵𝑥 ′ ∥ 2 ≤ (1 + 𝜖) 2 ∥𝑥∥ 2 ≤ (1 + 3𝜖)∥𝑥∥ 2 .
For the time complexity, the number of required operations is
                                 𝑁       2                  𝑁
                                      (𝑚 1 𝑓 (𝑚 1 )) +  𝑚 2    = O (𝑁 𝑓 (𝑚 1 )).
                                 𝑚 12                       𝑚1
                       𝑁
    The first term is 𝑚 12
                           applications of matrix 𝐴, and the second term is a single application of
matrix 𝐵.                                                                                          ■
6.1     Case of Finite Points
In this section we review the the necessary theorems for JL maps for finite points via sub-Gaussian
matrices and SORS matrices. For sub-Gaussian matrices, the main tools are the matrix deviation
inequality and an upper bound on the Gaussian width of 𝑛 points. For the SORS matrices, we don’t
have an equivalent matrix deviation inequality, hence we go through an indirect approach. First
an RIP estimate is derived for the SORS matrices, and then one considers a sequence of random
reflections by the coordinate axes, a theorem known as the Krahmer-Ward, to get a JL map.
6.1.1    JL Maps for Finite Points via sub-Gaussian Matrices
We specialize the matrix deviation inequality to a finite point set and use the Gaussian width bound
for finite points.
Lemma 6.1.1. Let 𝑥𝑖 be 𝑛 points in R𝑁 . Let 0 < 𝜖, 𝜌 < 1. Let 𝐴 be a 𝑚 × 𝑁 sub-Gaussian random
matrix with 𝐾 = max || 𝐴𝑖 || 𝜓2 . Then there exists a universal constant 𝐶 such that if
                                                                          
                                             2  4 −2                    2
                                    𝑚≥𝐶 𝐾 𝜖            log(𝑛) 1 + log( )                        (6.1)
                                                                        𝜌
then with probably at least 1 − 𝜌 the following expressions holds simultaneously for all 𝑥 ∈ 𝑇.
                                                       1
                              (1 − 𝜖)||𝑥|| 2 ≤ || √ 𝐴𝑥|| 2 ≤ (1 + 𝜖)||𝑥|| 2 .                   (6.2)
                                                        𝑚
                                                         71


      Proof. By normalizing the 𝑥𝑖 we can assume they are on the unit sphere. Next we use theorem
4.1.6 and we note that (6.2) is scaling invariant. If we let 𝑇 be the set of normalize 𝑥𝑖 , then by [35,
lemma 7.5.10], we have
                                                           √︁
                                               𝜔(𝑇) ≤ 𝐶 log(𝑛).                                        (6.3)
Then using (6.1) and adjusting the constants we get the desired bound. To get the squares in (6.2),
we note that for 0 < 𝜖 < 1, 1 − 2𝜖 ≤ (1 − 𝜖) 2 ≤ (1 + 𝜖) 2 ≤ 1 + 3𝜖.
6.1.2      JL Maps for Finite Points via SORS Matrices
We first review the Krahmer-Ward theorem that allows one to get a JL map for finite point via an
RIP estimate.
Theorem 6.1.2. (Krahmer-Ward) [18, theorem 9.36] Let 𝑥𝑖 be 𝑛 points in R𝑛 . Let 𝜌, 𝜖 ∈ (0, 1).
Let 𝐴 ∈ R𝑚×𝑁 be a matrix where its restricted isometry constant for sparsity 2𝑠, i.e. 𝛿2𝑠 , satisfies
𝛿2𝑠 ≤    𝜖
         4  for 𝑠 ≥ 16 log( 4𝑛
                             𝜌 ). Let 𝐷 be a diagonal matrix with ±1 variables on the diagonal. Then
with probability exceeding 1 − 𝜌 the following holds simultaneously for all 𝑥𝑖 .
                                  (1 − 𝜖)||𝑥𝑖 || 2 ≤ || 𝐴𝐷𝑥𝑖 || 2 ≤ (1 + 𝜖)||𝑥𝑖 || 2 .                 (6.4)
      Here we present an RIP estimate for SORS matrices that we can pair with the Krahmer-Ward
theorem.
Theorem 6.1.3. (RIP for SORS Matrices) [18, theorem 12.31]Let 𝐴 ∈ C𝑚×𝑁 be a SORS matrix
with constant 𝐾 ≥ 1. Let 𝜖 ∈ (0, 1). There is a universal constant such that if
                                            𝑚 ≥ 𝐶𝐾 2 𝜖 −2 𝑠 log4 (𝑁)                                   (6.5)
                                               3
then with probability at least 1 − 𝑁 − log       (𝑁) the restricted isometry constant of √1 𝐴 for sparsity 𝑠,
                                                                                          𝑚
𝛿 𝑠 , satisfies 𝛿 𝑠 ≤ 𝜖.
      Finally we present a finite point JL map for the SORS matrices.
                                                         72


Proposition 6.1.4. Let 𝑥𝑖 be 𝑛 points in C𝑁 . Let 𝐴 ∈ C𝑚×𝑁 be a SORS matrix with constant 𝐾.
There is a universal constant 𝐶 such that if
                                                              𝑛
                                      𝑚 ≥ 𝐶𝐾 2 𝜖 −2 log( ) log4 (𝑁),                                       (6.6)
                                                              𝜌
                                                   3
then with probability at least 1 − 𝜌 − 𝑁 − log       (𝑁)
                                                      1
                              (1 − 𝜖)||𝑥𝑖 || 2 ≤ || √ 𝐴𝐷𝑥𝑖 || 2 ≤ (1 + 𝜖)||𝑥𝑖 || 2 ,                       (6.7)
                                                      𝑚
where 𝐷 is a random diagonal matrix with ±1 on its diagonal with uniform distribution.
    Proof. We combine theorems 6.1.2 and 6.1.3; using the union bound, we get the claimed success
probability.                                                                                                  ■
6.1.3   Combining the Two Matrix Types
We combine the SORS and sub-Gaussian JL maps as in the scheme in lemma 6.0.1 to get our main
theorem for finite points.
Theorem 6.1.5. Let 𝑥𝑖 be 𝑛 points in R𝑛 . Let 𝜖, 𝜌 ∈ (0, 1). Then with probability at least
                     3
1 − 𝜌1 − 𝜌2 − 𝑁 − log  (𝑁) , one can find a linear map 𝐿 : R𝑁 → R𝑚 such that
                                 (1 − 𝜖)||𝑥𝑖 || 2 ≤ ||𝐿𝑥𝑖 || 2 ≤ (1 + 𝜖)||𝑥𝑖 || 2                          (6.8)
                                                                        
                                             log(𝑛)
                                 𝑚=O                  (1 + log(2/𝜌2 ))                                     (6.9)
                                               𝜖2
                                                                                                   
                                                                                               4
and a single 𝐿𝑥𝑖 can be computed with time complexity O 𝑁 log 𝜖 −2 log( 𝑛𝑁             𝑝1 ) log  (𝑁)    .
    Proof. We follow the in the scheme in lemma 6.0.1. Hence we have 𝐴 and 𝐵 such that
𝐴∈   C𝑚 ×𝑚
         1
             2
             1 and 𝐵 ∈ C𝑚2 ×𝑁/𝑚1 . The task is to find sufficient requirements for 𝑚 1 and 𝑚 2 , where
𝑚 1 will determine the runtime and 𝑚 2 will determine the final embedding dimension.
    First we determine 𝑚 1 . We need a 3𝜖 -JL map for          𝑛𝑁
                                                                𝑚 12
                                                                     points from the SORS matrix 𝐴. For this
by proposition 6.1.4, we need
                                                   9         𝑛𝑁
                                    𝑚 1 ≥ 𝐶𝐾 2      2
                                                       log( 2 ) log4 (𝑁).                                 (6.10)
                                                  𝜖         𝑚 1 𝜌1
                                                         73


Removing the reference of 𝑚 1 on the right hand side makes the required condition on 𝑚 1 stricter,
hence it is sufficient to have
                                                9         𝑛𝑁
                                  𝑚 1 ≥ 𝐶𝐾 2      2
                                                     log(     ) log4 (𝑁).                          (6.11)
                                                𝜖         𝜌 1
                                             3
The success probability is 1 − 𝜌1 − 𝑁 − log    (𝑁) .  For 𝑚 2 , we need 𝐵 to be a 3𝜖 -JL map of 𝑛 points .
Hence by lemma 6.1.1, we need
                                                9                        2
                                 𝑚2 ≥ 𝐶 2 𝐾 4       log(𝑛)(1   +  log(      ))                     (6.12)
                                               𝜖2                       𝜌2
with success probability at least 1 − 𝜌2 . So combining the two steps together and specializing to
DFT matrix that has a 𝑑 log(𝑑) runtime, we see that our algorithm has
                                                                                           
                                                                 log(𝑛)
                    Final embedding dimension: 𝑚 2 = O                     (1 + log(2/𝜌2 ))        (6.13)
                                                                   𝜖2
                                                                                        
                                                                  −2        𝑛𝑁       4
                    Runtime : O (𝑁 log(𝑚 1 )) = O 𝑁 log 𝜖 log(                  ) log (𝑁)          (6.14)
                                                                             𝑝1
                                                                   3
                    Success Probability: 1 − 𝜌1 − 𝜌2 − 𝑁 − log       (𝑁)
                                                                         .                         (6.15)
This finishes the proof.                                                                                ■
6.2     Upgrade for Subspaces
In this section, we repeat the process as in the previous case, but here we change our theorems
so that we can get a JL embedding of a subspace. By the scaling invariance of the JL condition
(1 − 𝜖)||𝑥|| ≤ ||𝐿𝑥|| ≤ (1 + 𝜖)||𝑥𝑖 ||, we can reduce the subspace problem to its corresponding unit
sphere.
    We have a theorem from [29, theorem 3.3] that provides a JL map for an arbitrary set in terms
of its Gaussian width. With this theorem we can get a JL embedding of a unit sphere in a subspace.
This theorem uses SORS matrices.
                                                      74


6.2.1   SORS Matrices for Subspaces
Theorem 6.2.1. [29, theorem 3.3] Let 𝑇 ⊂ R𝑁 and suppose 𝐴 ∈ R𝑚×𝑁 is selected from a SORS
distribution with a constant 𝐾. Then
                             sup ||| 𝐴𝑥|| 2 − ||𝑥|| 2 | ≤ max{𝜖, 𝜖 2 }rad(𝑇) 2                  (6.16)
                              𝑥∈𝑇
holds with probability at least 1 − 𝜌 as long as
                                                                            2
                                                                          𝜔 (𝑇)
                                                 2              max(1, rad(𝑇)  2)
                           𝑚 ≥ 𝐶𝐾 2 (1 + log( )) 2 log4 (𝑁)                       .             (6.17)
                                                 𝜌                     𝜖2
                                                                                                    ■
     Now we specialize to a unit sphere of a subspace using an estimate for the Gaussian width of a
unit sphere.
Corollary 6.2.2. Let 𝑇 be a 𝑑 dimensional affine subspace of R𝑁 . Assume 𝐴 ∈ R𝑚×𝑁 is selected
from a SORS distribution with a constant 𝐾. Suppose 0 < 𝜖, 𝜌 < 1 and
                                                          2             𝑑
                                  𝑚 ≥ 𝐶𝐾 2 (1 + log( )) 2 log4 (𝑁) 2                            (6.18)
                                                          𝜌             𝜖
Then with probability at least 1 − 𝜌, the following holds for all 𝑥 ∈ 𝑇
                                 (1 − 𝜖)||𝑥|| 2 ≤ || 𝐴𝑥|| 2 ≤ (1 + 𝜖)||𝑥|| 2 .                  (6.19)
     Proof. We use theorem 6.2.1. Since the equation (6.19) is scaling invariant, we can scale 𝑇 to
                                                                                               √
be a unit 𝑆 𝑑−1 . We have that the 𝜔(𝑆 𝑑−1 ), the Gaussian width of 𝑆 𝑑−1 , is bounded above by 𝑑 + 𝐶
for a universal constant [35, Example 7.5.7]. Using this fact and adjusting the constants the result
follows.                                                                                            ■
6.2.2   Sub-Gaussian Matrices for Subspaces
Now we state a JL theorem for subspace embeddings using sub-Gaussian matrices. The main tool
is the matrix deviation inequality [35, Chapter 9].
                                                       75


Corollary 6.2.3. Let 𝑇 be a 𝑑-dimensional affine subspace of R𝑁 . Let 𝐴 be a 𝑚 × 𝑁 sub-Gaussian
random matrix with 𝐾 = max || 𝐴𝑖 || 𝜓2 . Suppose
                                                             2     𝑑
                                      𝑚 ≥ 𝐶𝐾 4 (1 + log( )) 2 2 .                              (6.20)
                                                             𝜌 𝜖
Then with probability at least 1 − 𝜌, the following holds for all 𝑥 ∈ 𝑇
                                 (1 − 𝜖)||𝑥|| 2 ≤ || 𝐴𝑥|| 2 ≤ (1 + 𝜖)||𝑥|| 2 .                 (6.21)
     Proof. We use theorem 4.1.6. Again since (6.21) is scaling invariant we can reduce to the unit
                            √
sphere. Since 𝜔(𝑆 𝑑−1 ) ≤ 𝑑 + 𝐶 for a universal constant, we get
                                                √           √︃          2
                                         𝐶2𝐾 4      𝑑 +𝐶 +     log( 2𝜌 )
                                   𝑚≥                                                          (6.22)
                                                        𝜖2
adjusting the constants and using 3(𝑎 2 + 𝑏 2 + 𝑐2 ) ≥ (𝑎 + 𝑏 + 𝑐) 2 gives the desired result.      ■
6.2.3    Mixing the Matrix Types
We use the scheme in lemma 6.0.1 to combine the SORS and Gaussian matrices. We break the
vectors into smaller pieces, apply a fast JL map, recombine the pieces and then apply a sub-Gaussian
JL map.
To the prove the desired properties of our algorithm, we also need the to review the proposition
5.0.2 that lets us control the Gaussian width of a set after a JL map is applied to it.
Proposition 6.2.4. Let 𝐿 : R𝑛 → R𝑚 be a linear map, 𝑇 ⊂ R𝑛 with 𝑎 and 𝑏 positive reals such that
for 𝑥, 𝑦 ∈ 𝑇, 𝑎||𝑥 − 𝑦|| ≤ ||𝐿 (𝑥 − 𝑦)|| ≤ 𝑏||𝑥 − 𝑦||. Then 𝑎 𝜔(𝑇) ≤ 𝜔(𝐿(𝑇)) ≤ 𝑏 𝜔(𝑇).
     Now we are ready to present the proof of algorithm in the case of subspaces. The proof strategy
first uses the properties of SORS matrices for the pieces of an input vector, then we use proposition
6.2.4 to control the Gaussian width of the image of the set after the SORS matrix has been applied.
This in turn allows us to apply the matrix deviation inequality as the final step.
                                                     76


Theorem 6.2.5. Let 𝑇 be a 𝑑-dimensional affine subspace of R𝑁 . Let 0 < 𝜖, 𝜌1 , 𝜌2 < 1 such that
𝜌1 + 𝜌2 ≤ 1 and 𝐾 ≥ 1. Let 𝑚 1 and 𝑚 2 be integers such that
                                                       2𝑁 2 4              𝑑
                               𝑚 1 ≥ 𝐶𝐾 2 (1 + log(       2
                                                            )) log (𝑁) 2                                (6.23)
                                                     𝜌1 𝑚 1                𝜖
                                     𝐶𝐾 4 (1 + 𝜖) 2 𝑑 (1 + log( 𝑝22 ))
                               𝑚2 ≥                                                                     (6.24)
                                                   𝜖2
Then with probability at least 1 − 𝜌1 − 𝜌2 one can find a linear map 𝐿 : R𝑁 → R𝑚2 such that for
all 𝑥, 𝑦 ∈ 𝑇
                         (1 − 𝜖)||𝑥 − 𝑦|| 2 ≤ ||𝐿(𝑥 − 𝑦)|| 2 ≤ (1 + 𝜖)||𝑥 − 𝑦|| 2                       (6.25)
                                                                                                          
Furthermore, for 𝑥 ∈ 𝑇, the 𝐿𝑥 can be computed with a O 𝑁 log 𝐾 2 (1 + log( 𝜌2𝑁𝑚 2 )) 2 log4 (𝑁) 𝜖𝑑2
                                                                                           1  1
run-time.
    Proof. Following the scheme in lemma 6.0.1, we have matrices 𝐴 and 𝐵 such that A ∈                 C𝑚 ×𝑚
                                                                                                          1
                                                                                                             2
                                                                                                             1
and B ∈ C𝑚2 ×𝑁/𝑚1 . Our goal is to find sufficient conditions on 𝑚 1 and 𝑚 2 to satisfy the required JL
properties.
    The projection of 𝑇 onto a 𝑚 12 dimensional subspace is another subspace of dimension at most
                                                                                                 √
𝑑. The Gaussian width of the unit sphere in this subspace is bounded above by 𝑑 + 𝐶 for a
                                                                   𝑁
universal constant C. Now we apply corollary 6.2.2, for 3𝜖 ,      𝑚 12
                                                                       different times since there are that
                                                                                   𝜌1 𝑚 12
many different projections. Applying each with a success probability 1 −            𝑁 ,     and combining the
failure probabilities using the union bound we get the success probability of 1 − 𝜌1 . Therefore we
need
                                                      2𝑁 2 4               𝑑
                              𝑚 1 ≥ 𝐶𝐾 2 (1 + log(        2
                                                            )) log (𝑁) 2 .                              (6.26)
                                                     𝜌1 𝑚 1               𝜖
For a universal constant 𝐶, where 𝐾 is the constant associated to the SORS matrix 𝐴.
                                                                                       𝑁
                                                                                           ×𝑁
    Now we consider the block diagonal matrix 𝐶 = diag( 𝐴) where 𝐶 ∈ R 𝑚1                     . For any 𝑥 ∈ 𝑇,
since matrix 𝐴 distorts each individual projection of 𝑥 at must by 1 ± 𝜖, then same is true for 𝑥.
Now we apply proposition 6.2.4 to the unit sphere of 𝑇, call it 𝑈 (𝑇). Hence we can estimate the
                                                                                       √
Gaussian width of the image of 𝑈 (𝑇) from above as 𝜔(𝐶 (𝑈 (𝑇))) ≤ (1 + 𝜖)( 𝑑 + 𝐶 ′).
                                                   77


   Now we can apply the matrix deviation inequality, theorem 4.1.5. With probability at least
1 − 𝑝 2 , we have
                                                   "                                                     #
                          √                                             √︁
       sup       ∥𝐵𝑥∥ 2 − 𝑚 2 ∥𝑥∥ 2 ≤ 𝐶 ′ 𝐾 2 𝑤(𝐶 (𝑈 (𝑇))) + log(2/𝑝 2 ) ·                  sup     ∥𝑥∥ 2 , (6.27)
   𝑥∈𝐶 (𝑈 (𝑇))                                                                         𝑥∈𝐶 (𝑈 (𝑆))
                             √                         h          √             √︁                       i
           sup      ∥𝐵𝑥∥ 2 − 𝑚 2 ∥𝑥∥ 2 ≤ 𝐶 ′ 𝐾 2 (1 + 𝜖)( 𝑑 + 𝐶 ′) + log(2/𝑝 2 ) · (1 + 𝜖) .                (6.28)
        𝑥∈𝐶 (𝑈 (𝑇))
Rewriting, we get that for 𝑥 ∈ 𝐶 (𝑈 (𝑇))
                                                      h √              √︁            i
                                    𝐶 ′ 𝐾 2 (1                    ′
                                               + 𝜖) ( 𝑑 + 𝐶 ) + log(2/𝑝 2 )
                           ∥𝑥∥ 2 −                         √
                                                             𝑚2
                                  1
                            ≤ ∥ √ 𝐵𝑥∥ 2
                                  𝑚2
                                                         h √               √︁           i
                                        𝐶 ′ 𝐾 2 (1                   ′
                                                    + 𝜖) ( 𝑑 + 𝐶 ) + log(2/𝑝 2 )
                            ≤ ∥𝑥∥ 2 +                          √                          .
                                                                  𝑚2
   Now Since 𝐶 is an 𝜖-JL map of a 𝑈 (𝑇) we have that
                                             1 − 𝜖 ≤ ∥𝑥∥ 2 ≤ 1 + 𝜖 .
Therefore we get
                                              h √               √︁             i
                             𝐶 ′ 𝐾 2 (1 + 𝜖) ( 𝑑 + 𝐶 ′) + log(2/𝑝 2 )                  1
                     1−𝜖 −                           √                            ≤ ∥ √ 𝐵𝑥∥ 2
                                                        𝑚2                             𝑚2
                                                                 h √              √︁            i
                                                  ′   2                      ′
                                                𝐶 𝐾 (1 + 𝜖) ( 𝑑 + 𝐶 ) + log(2/𝑝 2 )
                                  ≤ 1+𝜖 +                              √                          .
                                                                         𝑚2
Finally as long as
                                             𝐶 ′ 𝐾 4 (1 + 𝜖) 2 𝑑 (1 + log( 𝑝22 ))
                                     𝑚2 ≥                                                                   (6.29)
                                                              𝜖2
we get
                                                         1
                                        1 − 2𝜖 ≤ ∥ √ 𝐵𝑥∥ 2 ≤ 1 + 2𝜖
                                                         𝑚2
                                                          78


Adjusting the constants we get the desired bound. To summarize we have 3 estimates.
                                               𝐾 4 (1 + 𝜖) 2 𝑑 (1 + log( 𝑝22 ))
                                                                                !
        Final embedding dimension: 𝑚 2 = O                                        .      (6.30)
                                                              𝜖2
                                                                                      !!
                                                               2𝑁                   𝑑
        Runtime : O (𝑁 log(𝑚 1 )) = O 𝑁 log 𝐾 2 (1 + log(           )) 2 log4 (𝑁) 2 .    (6.31)
                                                              𝜌𝑚 12                 𝜖
        Success Probability: 1 − 𝜌1 − 𝜌2 .                                               (6.32)
                                            79


BIBLIOGRAPHY
      80


                                       BIBLIOGRAPHY
[1] Eddie Aamari, Jisu Kim, Frédéric Chazal, Bertrand Michel, Alessandro Rinaldo, and Larry
     Wasserman. Estimating the reach of a manifold. Electronic Journal of Statistics, 13(1):1359–
     1399, 2019. 12, 13
[2] Nir Ailon and Edo Liberty. Fast Dimension Reduction Using Rademacher Series on Dual
     BCH Codes. Discrete & Computational Geometry, 42(4):615–630, December 2009. 69
[3] Ralph Alexander and Stephanie Alexander. Geodesics in Riemannian manifolds-with-
     boundary. Indiana University Mathematics Journal, 30(4):481–488, 1981. Publisher: JSTOR.
     15
[4] Stephanie B. Alexander, I. David Berg, and Richard L. Bishop. The Riemannian obstacle
     problem. Illinois Journal of Mathematics, 31(1):167–184, March 1987. 16
[5] Noga Alon. Problems and results in extremal combinatorics—i. Discrete Mathematics,
     273(1-3):31–53, 2003. 62, 68
[6] Marcel Berger. A Panoramic View of Riemannian Geometry. Springer Berlin Heidelberg,
     Berlin, Heidelberg, 2003. 16, 38, 39
[7] R.L. Bishop and R.J. Crittenden. Geometry of manifolds. ISSN. Elsevier Science, 2011. 16
[8] Jean-Daniel Boissonnat, André Lieutier, and Mathijs Wintraecken. The reach, metric dis-
     tortion, geodesic convexity and the variation of tangent spaces. Journal of Applied and
     Computational Topology, 3(1-2):29–58, June 2019. 12, 15, 17
[9] Károly Böröczky, Jr. Finite Packing and Covering. Cambridge Tracts in Mathematics.
     Cambridge University Press, 2004. 19
[10] Simone Brugiapaglia, Sjoerd Dirksen, Hans Christian Jung, and Holger Rauhut. Sparse
     recovery in bounded riesz systems with applications to numerical methods for pdes. arXiv
     preprint arXiv:2005.06994, 2020. 9, 53, 54, 55
[11] E.J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction
     from highly incomplete frequency information. IEEE Transactions on Information Theory,
     52(2):489–509, 2006. 7
[12] E.J. Candes and T. Tao. Decoding by linear programming. IEEE Transactions on Information
     Theory, 51(12):4203–4215, 2005. 7
[13] Emmanuel J. Candes and Terence Tao. Near-optimal signal recovery from random projections:
     Universal encoding strategies? IEEE Transactions on Information Theory, 52(12):5406–5425,
     2006. 7
                                               81


[14] Kenneth L. Clarkson. Tighter bounds for random projections of manifolds. In Proceedings of
     the twenty-fourth annual symposium on Computational geometry - SCG ’08, page 39, College
     Park, MD, USA, 2008. ACM Press. 28
[15] Armin Eftekhari and Michael B. Wakin. New analysis of manifold embeddings and signal
     recovery from compressive measurements. Applied and Computational Harmonic Analysis,
     39(1):67–109, July 2015. 12, 16, 28, 29, 49, 52
[16] Herbert Federer. Curvature measures. Transactions of the American Mathematical Society,
     93(3):418–418, March 1959. 11, 12, 13, 15
[17] Werner Fenchel. On the differential geometry of closed space curves. Bulletin of the American
     Mathematical Society, 57(1. P1):44–54, 1951. 38, 39
[18] Simon Foucart and Holger Rauhut. A Mathematical Introduction to Compressive Sensing.
     Applied and Numerical Harmonic Analysis. Springer New York, New York, NY, 2013. 6, 9,
     33, 53, 54, 72
[19] Sylvestre Gallot, Dominique Hulin, and Jacques Lafontaine. Riemannian geometry, volume 2.
     Springer, 1990. 18, 26
[20] Walter Gautschi. Some elementary inequalities relating to the gamma and incomplete gamma
     function. J. Math. Phys, 38(1):77–81, 1959. 20
[21] Mohammad Ghomi and Serge Tabachnikov. Totally skew embeddings of manifolds. Mathe-
     matische Zeitschrift, 258(3):499–512, 2008. 31
[22] Alfred Gray. Tubes, volume 221. Birkhäuser, 2012. 12, 13, 23, 39
[23] Subhaneil Lahiri, Peiran Gao, and Surya Ganguli. Random projections of random manifolds.
     arXiv preprint arXiv:1607.04331, 2016. 28
[24] R. Lashof and S. Smale. On the Immersion of Manifolds in Euclidean Space. The Annals of
     Mathematics, 68(3):562, November 1958. 28
[25] S. Li. Concise Formulas for the Area and Volume of a Hyperspherical Cap. Asian Journal of
     Mathematics & Statistics, 4(1):66–70, December 2010. 19
[26] Martin Lotz. On the volume of tubular neighborhoods of real algebraic varieties. Proceedings
     of the American Mathematical Society, 143(5):1875–1889, December 2014. 23
[27] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the Homology of Subman-
     ifolds with High Confidence from Random Samples. Discrete & Computational Geometry,
     39(1-3):419–441, March 2008. 15, 16, 37
[28] Barrett O’Neill. Semi-Riemannian geometry: with applications to relativity. Number 103 in
     Pure and applied mathematics. Academic Press, New York, 1983. 15
[29] Samet Oymak, Benjamin Recht, and Mahdi Soltanolkotabi. Isometric sketching of any set
     via the Restricted Isometry Property. Information and Inference: A Journal of the IMA,
     7(4):707–726, December 2018. 7, 9, 53, 54, 60, 74, 75
                                               82


[30] William F. Pohl. Some Integral Formulas for Space Curves and Their Generalization. Amer-
     ican Journal of Mathematics, 90(4):1321, October 1968. 28
[31] Takashi Sakai. Riemannian geometry. translations of mathematical monographs, vol. 149.
     American Mathematical Society, Providence, RI, pages 262–272, 1996. 15
[32] Shigeo Sasaki. On the differential geometry of tangent bundles of riemannian manifolds.
     Tohoku Mathematical Journal, Second Series, 10(3):338–354, 1958. 23
[33] Hukum Singh. On the cut locus and the focal locus of a submanifold in a riemannian manifold.
     ii. Publ. Inst. Math.(Beograd)(NS), 41(55):119–124, 1987. 15
[34] Christoph Thäle. 50 years sets with positive reach–a survey. Surveys in Mathematics and its
     Applications, 3:123–165, 2008. Publisher: University Constantin Brancusi. 12
[35] Roman Vershynin. High-dimensional probability: an introduction with applications in data
     science. Number 47 in Cambridge series in statistical and probabilistic mathematics. Cam-
     bridge University Press, Cambridge ; New York, NY, 2018. v, 1, 2, 3, 4, 5, 6, 17, 18, 19, 30,
     49, 50, 51, 62, 63, 64, 68, 72, 75
[36] Hermann Weyl. On the Volume of Tubes. American Journal of Mathematics, 61(2):461,
     April 1939. 23
[37] James H. White. Self-Linking and the Gauss Integral in Higher Dimensions. American
     Journal of Mathematics, 91(3):693, July 1969. 28
[38] James H. White. Self-linking and the directed secant span of a differentiable manifold. Journal
     of Differential Geometry, 5(3-4):357–369, 1971. 28
                                                83