ADVANCES IN NEAR-FIELD TO BLIND FAR-FIELD PTYCHOGRAPHY, AND COMPRESSED CLASSIFICATION FROM PHASELESS MEASUREMENTS

By Mark Philip Roach

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mathematics – Doctor of Philosophy

2023

ABSTRACT

Chapter 1 concerns the introduction of Fourier phase retrieval. In many imaging systems, one can only measure the magnitude-square of the Fourier transform of the underlying signal, known as the power spectral density. At a large enough distance from the imaging plane, the measurements are given by the Fourier transform of the image. Thus, when capturing images at large distances, optical devices essentially measure the Fourier transform magnitude of the object being imaged. The problem of reconstructing a signal from its Fourier magnitude is known as Fourier phase retrieval. This problem arises in many areas of engineering and applied physics, including optics, X-ray crystallography (determining the atomic and molecular structure of a crystal), astronomical imaging, speech processing, and computational biology. Also in Chapter 1, we introduce the concept of dimensionality reduction ([99], [25], [59], [62]), which is a tool used in several disciplines, including statistics, data mining, pattern recognition, machine learning, artificial intelligence, and optimization. Dimensionality reduction refers to the act of transforming data from a high-dimensional space to a low-dimensional space.

Chapter 2 discusses Fourier ptychography, an imaging technique in which a sample is illuminated at different angles of incidence (effectively shifting the sample's Fourier transform), after which a lens acts as a low-pass filter, thereby effectively providing localized Fourier information about the sample around frequencies dictated by each angle of illumination. Near-Field (Fourier) Ptychography (NFP) (see, e.g., [107, 108, 125]) occurs when the sample is placed at a short defocus distance having a large Fresnel number. We prove that certain NFP measurements are robustly invertible (up to an unavoidable global phase ambiguity) for specific Point Spread Functions (PSFs) and physical masks which lead to well-conditioned lifted linear systems. We then apply a block phase retrieval algorithm using weighted angular synchronization and prove that the proposed approach accurately recovers the measured sample for these specific PSF and mask pairs. Finally, we also propose using a Wirtinger Flow for NFP problems and numerically evaluate that alternate approach both against our main proposed approach, as well as with NFP measurements for which our main approach does not apply.

Chapter 3 concerns blind ptychography. Far-field ptychography occurs when there is a large enough defocus distance (when the Fresnel number is $\ll 1$) to obtain magnitude-square Fourier transform measurements. To remove ambiguities, masks are utilized to ensure that the output of any recovery algorithm is unique up to a global phase. In Chapter 3, we assume that both the sample and the mask are unknown, and we apply blind deconvolutional techniques to solve for both. Numerical experiments demonstrate that the technique works well in practice and is robust under noise.

Finally, we have Chapter 4. Let $\mathcal{M}$ be a compact $d$-dimensional submanifold of $\mathbb{R}^N$ with reach $\tau$ and volume $V_{\mathcal{M}}$. Fix $\epsilon \in (0, 1)$.
In this chapter, we prove that a nonlinear function $f : \mathbb{R}^N \rightarrow \mathbb{R}^m$ exists with $m \leq C\,\frac{d}{\epsilon^2}\,\log\!\left(\frac{\sqrt{d}\,V_{\mathcal{M}}}{\tau}\right)$ such that

$$(1 - \epsilon)\|\mathbf{x} - \mathbf{y}\|_2 \leq \|f(\mathbf{x}) - f(\mathbf{y})\|_2 \leq (1 + \epsilon)\|\mathbf{x} - \mathbf{y}\|_2$$

holds for all $\mathbf{x} \in \mathcal{M}$ and $\mathbf{y} \in \mathbb{R}^N$. In effect, $f$ not only serves as a bi-Lipschitz function from $\mathcal{M}$ into $\mathbb{R}^m$ with bi-Lipschitz constants close to one, but also approximately preserves all distances from points not in $\mathcal{M}$ to all points in $\mathcal{M}$ in its image. Furthermore, the proof is constructive and yields an algorithm which works well in practice. In particular, it is empirically demonstrated herein that such nonlinear functions allow for more accurate compressive nearest neighbor classification than standard linear Johnson-Lindenstrauss embeddings do in practice. Furthermore, it is demonstrated that this approach works when the labelled data consists of NFP measurements.

Dedicated to Holly (2004-2018), Gypsy (2007-2021), and Sparkle (2006-2022).

ACKNOWLEDGEMENTS

I would like to thank Mark Iwen for his guidance and support. I would like to thank Guoan Zheng for answering questions about (and providing code for) his work in [125]. I would like to thank Michael Perlmutter for his work and collaboration on the Near-field Ptychography chapter. My work was supported in part by NSF DMS 1912706. Work in Chapter 2 was first published in Sampling Theory, Signal Processing, and Data Analysis [Volume 21, Article 6, 2023] by Springer Nature ([50]).

TABLE OF CONTENTS

LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION TO PHASE RETRIEVAL AND DIMENSIONALITY REDUCTION
1.1 INTRODUCTION
1.2 PHASE RETRIEVAL PROBLEM
1.3 DIMENSIONALITY REDUCTION

CHAPTER 2 TOWARD FAST AND PROVABLY ACCURATE NEAR-FIELD PTYCHOGRAPHIC PHASE RETRIEVAL
2.1 ABSTRACT
2.2 INTRODUCTION
2.3 PRELIMINARIES: PRIOR RESULTS FOR FAR-FIELD PTYCHOGRAPHY USING LOCAL MEASUREMENTS
2.4 NEAR FROM FAR: GUARANTEED NEAR-FIELD PTYCHOGRAPHIC RECOVERY VIA FAR-FIELD RESULTS
2.5 ERROR ANALYSIS FOR ALGORITHM 2.1
2.6 AN ALTERNATE APPROACH: NEAR-FIELD PTYCHOGRAPHY VIA WIRTINGER FLOW
2.7 NUMERICAL SIMULATIONS
2.8 APPLICATION OF ALGORITHM 2.1
2.9 CONCLUSIONS AND FUTURE WORK

CHAPTER 3 BLIND PTYCHOGRAPHY VIA BLIND DECONVOLUTION
3.1 ABSTRACT
3.2 INTRODUCTION
3.3 FAR-FIELD FOURIER PTYCHOGRAPHY
3.4 BLIND DECONVOLUTION
3.5 BLIND PTYCHOGRAPHY
3.6 CONCLUSIONS AND FUTURE WORK
CHAPTER 4 ON OUTER BI-LIPSCHITZ EXTENSIONS OF LINEAR JOHNSON-LINDENSTRAUSS EMBEDDINGS OF LOW-DIMENSIONAL SUBMANIFOLDS OF $\mathbb{R}^N$
4.1 ABSTRACT
4.2 INTRODUCTION
4.3 NOTATION AND PRELIMINARIES
4.4 THE MAIN BI-LIPSCHITZ EXTENSION RESULTS AND THEIR PROOFS
4.5 THE PROOF OF THEOREM 4.2.1
4.6 A NUMERICAL EVALUATION OF TERMINAL EMBEDDINGS
4.7 COMPRESSED CLASSIFICATION FROM PHASELESS MEASUREMENTS

CHAPTER 5 CONTRIBUTIONS AND FUTURE WORK

BIBLIOGRAPHY

APPENDIX A NEAR-FIELD PTYCHOGRAPHY
APPENDIX B FAR-FIELD PTYCHOGRAPHY
APPENDIX C BLIND DECONVOLUTION

LIST OF ABBREVIATIONS

• Let $\mathbf{x}, \mathbf{m} \in \mathbb{C}^d$ denote the specimen and mask
• Far-field Ptychographic Measurements: $|(F_d(\mathbf{x} \circ S_k \mathbf{m}))_\ell|^2$
• Point Spread Function (PSF): $\mathbf{p} \in \mathbb{C}^d$
• Near-field Ptychographic Measurements: $|(\mathbf{p} * (S_k \mathbf{m} \circ \mathbf{x}))_\ell|^2$
• Indexing: $[d] = \{1, 2, \ldots, d\}$, $[d]_0 = \{0, 1, \ldots, d-1\}$
• Support: $supp(\mathbf{x}) := \{n \in [d]_0 \mid x_n \neq 0\}$
• Fourier Transform: $F_d$ - the $d \times d$ DFT matrix
• Reversal: $\tilde{x}_n := x_{-n \bmod d}$, $\forall n \in [d]_0$
• Shift Operator: $(S_\ell \mathbf{x})_n = x_{(\ell+n) \bmod d}$, $\forall n \in [d]_0$
• Circular Convolution: $(\mathbf{x} *_d \mathbf{y})_\ell := \sum_{n=0}^{d-1} x_n y_{(\ell-n) \bmod d}$
• Hadamard Product: $(\mathbf{x} \circ \mathbf{y})_\ell := x_\ell y_\ell$
• Decoupling Lemma: $\left((\mathbf{x} \circ S_{-\ell}\,\mathbf{y}) *_d (\widetilde{\bar{\mathbf{x}}} \circ S_\ell\, \widetilde{\bar{\mathbf{y}}})\right)_k = \left((\mathbf{x} \circ S_{-k}\,\bar{\mathbf{x}}) *_d (\tilde{\mathbf{y}} \circ S_k\, \widetilde{\bar{\mathbf{y}}})\right)_\ell$
• $\bar{\mathbf{A}}$ - complex Gaussian matrix, $\bar{A}_{ij} \sim \mathcal{N}(0, 1/2) + i\,\mathcal{N}(0, 1/2)$
• $\mathbf{a}_\ell$ - $\ell$-th column of $\mathbf{A}^*$
• $\mathbf{B}$ - first $K$ columns of $F_d$, $\mathbf{b}_\ell$ - $\ell$-th column of $\mathbf{B}^*$
• $\mathbf{e}$ - complex Gaussian vector, $\mathbf{e} \sim \mathcal{N}\!\left(0, \frac{\sigma^2 L_0^2}{2d} I_d\right) + i\,\mathcal{N}\!\left(0, \frac{\sigma^2 L_0^2}{2d} I_d\right)$
• $\mathcal{A}$ - linear operator: $\mathcal{A}(Z) := \{\mathbf{b}_\ell^* Z \mathbf{a}_\ell\}_{\ell=1}^{d} \in \mathbb{C}^{d \times 1}$
• $\mathcal{A}^*$ - adjoint linear operator: $\mathcal{A}^*(\mathbf{z}) := \sum_{\ell=1}^{d} z_\ell \mathbf{b}_\ell \mathbf{a}_\ell^* \in \mathbb{C}^{K \times N}$
• $(\mathbf{h}_0, \mathbf{x}_0)$ - underlying truth, $\mathbf{y} = \mathcal{A}(\mathbf{h}_0\mathbf{x}_0^*) + \mathbf{e}$, $\sqrt{L_0} = \|\mathbf{h}_0\|_2 = \|\mathbf{x}_0\|_2$
• $(\mathbf{u}_0, \mathbf{v}_0)$ - initial estimate, $(\mathbf{u}_t, \mathbf{v}_t)$ - estimate during gradient descent
• $(\mathbf{h}, \mathbf{x})$ - obtained estimate after the algorithm, $L = \|\mathbf{h}\| \cdot \|\mathbf{x}\|$
• $\delta = \delta(\mathbf{z}) = \delta(\mathbf{h}, \mathbf{x}) := \frac{\|\mathbf{h}\mathbf{x}^* - \mathbf{h}_0\mathbf{x}_0^*\|_F}{L_0}$; $\mu_h$ - incoherence, $\mu_h^2 = \frac{L\,\|\mathbf{B}\mathbf{h}_0\|_\infty^2}{\|\mathbf{h}_0\|_2^2}$
• $N_{\sqrt{L_0}} := \{(\mathbf{h}, \mathbf{x}) \mid \|\mathbf{h}\| \leq 2\sqrt{L_0},\ \|\mathbf{x}\| \leq 2\sqrt{L_0}\}$
• $N_\mu := \{\mathbf{h} \mid \sqrt{d}\,\|\mathbf{B}\mathbf{h}\|_\infty \leq 4\sqrt{L_0}\,\mu\}$, $\mu_h \leq \mu$
• $N_\epsilon := \{(\mathbf{h}, \mathbf{x}) \mid \|\mathbf{h}\mathbf{x}^* - \mathbf{h}_0\mathbf{x}_0^*\|_F \leq \epsilon L_0\}$, $0 < \epsilon \leq \frac{1}{15}$
• $N_{\tilde{F}} := \{(\mathbf{h}, \mathbf{x}) \mid \tilde{F}(\mathbf{h}, \mathbf{x}) \leq \frac{1}{3}\epsilon^2 L_0^2 + \|\mathbf{e}\|^2\}$
• $F(\mathbf{h}, \mathbf{x}) := \|\mathcal{A}(\mathbf{h}\mathbf{x}^* - \mathbf{h}_0\mathbf{x}_0^*) + \mathbf{e}\|^2$, $F_0(\mathbf{h}, \mathbf{x}) := \|\mathcal{A}(\mathbf{h}\mathbf{x}^* - \mathbf{h}_0\mathbf{x}_0^*)\|^2$
• $F(\mathbf{h}, \mathbf{x}) = \|\mathbf{e}\|^2 + F_0(\mathbf{h}, \mathbf{x}) - 2\,\mathrm{Re}(\langle \mathcal{A}^*(\mathbf{e}),\ \mathbf{h}\mathbf{x}^* - \mathbf{h}_0\mathbf{x}_0^*\rangle)$
• $G_0(z) := \max\{z - 1, 0\}^2 = [z - 1]_+^2$, $\rho \geq d^2 + 2\|\mathbf{e}\|^2$
• $G(\mathbf{h}, \mathbf{x}) := \rho\left[ G_0\!\left(\frac{\|\mathbf{h}\|^2}{2L}\right) + G_0\!\left(\frac{\|\mathbf{x}\|^2}{2L}\right) + \sum_{\ell=1}^{d} G_0\!\left(\frac{d\,|\mathbf{b}_\ell^*\mathbf{h}|^2}{8L\mu^2}\right) \right]$
• $\tilde{F}(\mathbf{h}, \mathbf{x}) := F(\mathbf{h}, \mathbf{x}) + G(\mathbf{h}, \mathbf{x})$
• Tube: $tube(\delta, \mathcal{M}) := \{\mathbf{x} \mid \exists \mathbf{y} \in \mathcal{M} \text{ with } \|\mathbf{x} - \mathbf{y}\|_2 \leq \delta\}$
• Euclidean Ball: $B_{\ell_2^N}(\mathbf{x}, \gamma) := \{\mathbf{y} \in \mathbb{R}^N \mid \|\mathbf{x} - \mathbf{y}\|_2 < \gamma\}$
• $-S := \{-\mathbf{x} \mid \mathbf{x} \in S\}$, $S \pm S := \{\mathbf{x} \pm \mathbf{y} \mid \mathbf{x}, \mathbf{y} \in S\}$, $U(\mathbf{x}) := \mathbf{x}/\|\mathbf{x}\|_2$
• Unit Secants: $S_T := U((T - T) \setminus \{0\}) = \left\{ \frac{\mathbf{x}-\mathbf{y}}{\|\mathbf{x}-\mathbf{y}\|_2} \;\middle|\; \mathbf{x}, \mathbf{y} \in T,\ \mathbf{x} \neq \mathbf{y} \right\}$
• $\epsilon$-JL map: $A \in \mathbb{C}^{m \times N}$ mapping $T \subset \mathbb{R}^N$ into $\mathbb{C}^m$ with $(1-\epsilon)\|\mathbf{x}\|_2^2 \leq \|A\mathbf{x}\|_2^2 \leq (1+\epsilon)\|\mathbf{x}\|_2^2$ for all $\mathbf{x} \in T$
• $\epsilon$-JL embedding: $A \in \mathbb{C}^{m \times n}$ an $\epsilon$-JL map of $T - T := \{\mathbf{x} - \mathbf{y} \mid \mathbf{x}, \mathbf{y} \in T\}$
• Radius: $rad(T) := \sup_{\mathbf{x} \in T} \|\mathbf{x}\|_2$
• Diameter: $diam(T) := rad(T - T) = \sup_{\mathbf{x}, \mathbf{y} \in T} \|\mathbf{x} - \mathbf{y}\|_2$
• $\delta$-cover: $\delta \in \mathbb{R}^+$, $S \subset T$ such that $\forall \mathbf{x} \in T$, $\exists \mathbf{y} \in S$ with $\|\mathbf{x} - \mathbf{y}\|_2 \leq \delta$
• $\delta$-Covering Number: $\mathcal{N}(T, \delta) \in \mathbb{N}$, the smallest achievable cardinality of a $\delta$-cover of $T$
• Gaussian Width: $w(T) := \mathbb{E} \sup_{\mathbf{x} \in T} \langle \mathbf{g}, \mathbf{x} \rangle$, where $\mathbf{g} \in \mathbb{R}^N$ has i.i.d. mean-zero, variance-one Gaussian entries
• Reach: $S \subset \mathbb{R}^N$, $\tau_S := \sup\{t \geq 0 \mid \forall \mathbf{x} \in \mathbb{R}^N \text{ with } d(\mathbf{x}, S) < t,\ \mathbf{x} \text{ has a unique closest point in } S\}$
• Convex Hull: $S \subset \mathbb{C}^N$, $conv(S) := \bigcup_{j=1}^{\infty} \left\{ \sum_{\ell=1}^{j} \alpha_\ell \mathbf{x}_\ell \;\middle|\; \mathbf{x}_\ell \in S,\ \alpha_\ell \in [0,1],\ \sum_{\ell=1}^{j} \alpha_\ell = 1 \right\}$
• $\epsilon$-Convex Hull Distortion: $S \subset \mathbb{R}^N$, $|\|\Phi\mathbf{x}\|_2 - \|\mathbf{x}\|_2| \leq \epsilon$, $\forall \mathbf{x} \in conv(S)$
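Several of the operators above (shift, reversal, circular convolution, and the Hadamard product) recur throughout the later chapters. The following minimal NumPy sketch fixes the indexing conventions used here; the function names are my own, and the Hadamard product is simply the elementwise `*`.

```python
import numpy as np

def shift(x, l):
    """Circulant shift: (S_l x)_n = x_{(n + l) mod d}."""
    return np.roll(x, -l)

def reverse(x):
    """Reversal about the first entry: (x~)_n = x_{-n mod d}."""
    return np.roll(x[::-1], 1)

def circ_conv(x, y):
    """Circular convolution (x *_d y)_l = sum_n x_n y_{(l - n) mod d}, via the DFT."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(y))

# Sanity checks of the indexing conventions on a small example.
x = np.arange(5, dtype=complex)
assert np.allclose(shift(x, 2), [2, 3, 4, 0, 1])
assert np.allclose(reverse(x), [0, 4, 3, 2, 1])
```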
CHAPTER 1 INTRODUCTION TO PHASE RETRIEVAL AND DIMENSIONALITY REDUCTION

Figure 1.1 [100] An illustration of a conventional ptychography setup.

1.1 INTRODUCTION

In many imaging systems, one can only measure the magnitude-square of the Fourier transform of the underlying signal, known as the power spectral density. For example, in an optical setting, detection devices like CCD (charge-coupled device) cameras and photosensitive films cannot measure the phase of a light wave; instead, they measure the photon flux (number of photons per second per unit area). At a large enough distance from the imaging plane, the measurements are given by the Fourier transform of the image. Thus, when capturing images at large distances, optical devices essentially measure the Fourier transform magnitude of the object being imaged. However, structural content about the image is contained in the phase, so this important information is lost. The problem of reconstructing a signal from its Fourier magnitude is known as Fourier phase retrieval. This problem arises in many areas of engineering and applied physics, including optics ([57], [36]), X-ray crystallography ([66], [121], [103], [24]), astronomical imaging ([116], [94], [106]), speech processing ([91], [42]), and computational biology, just to name a few.

Figure 1.2 [93] In X-ray crystallography, the goal is to gain an image of the positions of atoms within a molecule by illuminating a crystallized sample with X-rays. The molecular structure is then deduced from the pattern of the radiation diffracted by the sample.

1.2 PHASE RETRIEVAL PROBLEM

To mathematically solve the phase retrieval problem, we first focus on the discretized one-dimensional setting, which can then be generalized.

Definition 1.2.1. (Classical Phase Retrieval Problem) Let $\mathbf{x} \in \mathbb{C}^d$ be the underlying signal we wish to recover. In Fourier phase retrieval, the measurements are given by

$$y_k = \left| \sum_{n=0}^{d-1} x_n e^{-2\pi i k n / N} \right|^2, \quad k \in [N]_0, \quad N = 2d - 1 \quad (1.1)$$

Here we are over-sampling by roughly a factor of two, collecting a number of measurements equal to about twice the length of the signal. Our goal is to recover $\mathbf{x}$.

There are many challenges involved in solving the phase retrieval problem, as discussed in [7]. First among these is the fact that the true signal $\mathbf{x} \in \mathbb{C}^d$ cannot be recovered uniquely. For instance, a rotation, translation, or conjugate reflection does not modify the Fourier magnitudes. Without additional constraints, the unknown signal will only be determined up to what are called classical ambiguities or unavoidable trivial ambiguities, which may not be of concern depending on the application.

There are also non-trivial ambiguities for the classical phase retrieval problem. For example,

$$\mathbf{x}_1 = (1, 0, -2, 0, -2)^T, \quad \mathbf{x}_2 = ((1 - \sqrt{3}), 0, 1, 0, (1 + \sqrt{3}))^T \quad (1.2)$$

yield the same Fourier magnitudes $y_k$. We wish to categorize the number of non-trivial solutions by exploring the relationship between the Fourier magnitudes and the autocorrelation measurements.

Definition 1.2.2. Let $\mathbf{x} \in \mathbb{C}^d$ be the underlying signal with $supp(\mathbf{x}) \subseteq [d]_0$. We define the autocorrelation measurements by

$$a_m = \sum_{n=0}^{d-1} \bar{x}_n x_{n+m}, \quad -d + 1 \leq m \leq d - 1 \quad (1.3)$$

We consider the product of the polynomial $X(z) = \sum_{n=0}^{d-1} x_n z^n$ and the reversed polynomial $\tilde{X}(z) = z^{d-1} \bar{X}(z^{-1})$, where $\bar{X}$ denotes the polynomial with conjugate coefficients. Assuming that $x_0, x_{d-1} \neq 0$, we have that

$$X(z)\tilde{X}(z) = \sum_{n=0}^{d-1} x_n z^n \cdot z^{d-1} \sum_{\ell=0}^{d-1} \bar{x}_\ell z^{-\ell} = \sum_{n=0}^{2d-2} a_{n-d+1} z^n =: A(z) \quad (1.4)$$

where $A(z)$ is the autocorrelation polynomial of degree $2d - 2$. We can then rewrite the Fourier magnitude measurements as

$$y_k = e^{2\pi i k(d-1)/N}\, X(e^{-2\pi i k/N})\, \tilde{X}(e^{-2\pi i k/N}) = e^{2\pi i k(d-1)/N} A(e^{-2\pi i k/N}) \quad (1.5)$$

so that the autocorrelation polynomial is completely determined by the $2d - 1$ samples $y_k$. The phase retrieval problem is thus equivalent to the recovery of $X(z)$ from $A(z) = X(z)\tilde{X}(z)$. Comparing the roots of $X(z)$ and $\tilde{X}(z)$, we note that the roots of $A(z)$ occur in reflected pairs $(\gamma_j, \bar{\gamma}_j^{-1})$ with respect to the unit circle. The main problem in the recovery of $X(z)$ is deciding whether $\gamma_j$ or $\bar{\gamma}_j^{-1}$ is a root of $X(z)$. In [6], this approach is used to show that the number of non-trivial solutions is therefore bounded by $2^{d-2}$.
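The non-trivial ambiguity in (1.2) is easily checked numerically. The following NumPy sketch computes the oversampled measurements (1.1) for both vectors via a zero-padded FFT and confirms that they agree to machine precision, even though $\mathbf{x}_1$ and $\mathbf{x}_2$ are not trivially related.

```python
import numpy as np

d = 5
N = 2 * d - 1                        # oversampled grid, N = 2d - 1
x1 = np.array([1, 0, -2, 0, -2], dtype=complex)
x2 = np.array([1 - np.sqrt(3), 0, 1, 0, 1 + np.sqrt(3)], dtype=complex)

# y_k = |sum_n x_n e^{-2 pi i k n / N}|^2 for k in [N]_0, via zero-padded FFT.
y1 = np.abs(np.fft.fft(x1, n=N)) ** 2
y2 = np.abs(np.fft.fft(x2, n=N)) ** 2

print(np.max(np.abs(y1 - y2)))       # ~1e-14: identical magnitude measurements
```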
1.2.1 PHASE RETRIEVAL USING MASKS

Figure 1.3 [12] An illustration of a masked phase retrieval setup.

We can adapt the phase retrieval problem by using masks to eliminate some of the trivial and non-trivial ambiguities. There are several methods of applying this, some of which are listed below:

(i) Masking: The phase front after the sample is modified by the use of a mask or a phase plate;
(ii) Diffraction grating: The illuminating beam is modulated by the use of optical gratings;
(iii) Oblique illuminations: The illuminating beam is modulated to hit the sample at specific angles.

Figure 1.4 On the left ([87]) is an example of oblique illuminations, and on the right ([74]) is a diffraction grating which displaces the angle of the beam.

The main area of research involving masked phase retrieval is split into two sectors, looking at random masks or deterministic masks (i.e., masks that have been specifically chosen).

Definition 1.2.3. (Phase Retrieval Using Masks) Let $\mathbf{x} \in \mathbb{C}^d$ denote the signal and let $\{\mathbf{m}_\ell \mid \mathbf{m}_\ell \in \mathbb{C}^d,\ \ell \in [L]_0\}$, $L \geq 2$, denote the collection of masks. The masked phase retrieval measurements will be of the form

$$y_{\ell,k} = \left| \sum_{n=0}^{d-1} x_n [\mathbf{m}_\ell]_n e^{-2\pi i n k / d} \right|^2, \quad k \in [d]_0, \quad \ell \in [L]_0. \quad (1.6)$$

The goal is then to recover $\mathbf{x}$.

A further avenue of research is blind phase retrieval ([98], [1], [16], [17]), in which both the signal and mask are unknown, although some partial information may be known. In Chapters 2 and 3, we look at types of masked phase retrieval, with Chapter 3 in particular looking at the blind variant. In both situations, the set of masks is generated by a shift operator, which we will discuss more in the next section.

There is a constraint on the type of signal that we can recover.

Definition 1.2.4. A signal $\mathbf{x}$ is said to be non-vanishing if $x_n \neq 0$ for each $n \in [d]$.

In [55], deterministic masks are considered instead of random masks, and it is shown that two masks are sufficient for the convex relaxation of the problem to uniquely recover non-vanishing signals up to a global phase when over-sampled by a factor of two.

1.2.2 PHASELIFT

In [12], the classical phase retrieval problem is reformulated as a matrix completion problem. First, we need to define the Fourier transform of a vector.

Definition 1.2.5. The Fourier transform of $\mathbf{x} \in \mathbb{C}^d$, denoted $\hat{\mathbf{x}} \in \mathbb{C}^d$, is defined component-wise via

$$\hat{x}_k := (F_d \mathbf{x})_k = \sum_{n=0}^{d-1} x_n e^{-2\pi i n k / d} \quad (1.7)$$

where $F_d \in \mathbb{C}^{d \times d}$ denotes the $d \times d$ discrete Fourier transform (DFT) matrix with entries

$$(F_d)_{\ell,k} = e^{-2\pi i \ell k / d}, \quad \forall (\ell, k) \in [d]_0 \times [d]_0 \quad (1.8)$$

Remark 1.2.1. With this definition, we can rewrite the masked phase retrieval problem as

$$y_{\ell,k} = |(F_d(\mathbf{x} \circ \mathbf{m}_\ell))_k|^2, \quad k \in [d]_0, \quad \ell \in [L]_0. \quad (1.9)$$

Let $\mathbf{X} = \mathbf{x}\mathbf{x}^*$. For $\ell \in [L]_0$, let $\mathbf{D}_\ell$ be the diagonal matrix with the mask $\mathbf{m}_\ell$ on the diagonal, and let $\mathbf{f}_k^*$ be the rows of the DFT matrix. We then have that the measurements in (1.6) can be written as

$$y_{\ell,k} = |\mathbf{f}_k^* \mathbf{D}_\ell^* \mathbf{x}|^2, \quad k \in [d]_0, \quad \ell \in [L]_0. \quad (1.10)$$

Let $\mathcal{A} : \mathcal{S}^{d \times d} \rightarrow \mathbb{R}^{dL}$ denote the linear operator with entries given by

$$\{\mathcal{A}(\mathbf{X})\}_{\ell,k} = \mathrm{tr}(\mathbf{x}^* \mathbf{D}_\ell \mathbf{f}_k \mathbf{f}_k^* \mathbf{D}_\ell^* \mathbf{x}) = \mathrm{tr}(\mathbf{D}_\ell \mathbf{f}_k \mathbf{f}_k^* \mathbf{D}_\ell^* \mathbf{X}), \quad (1.11)$$

where $\mathcal{S}^{d \times d}$ is the space of self-adjoint matrices. Then the phase retrieval problem can be formulated as

$$\text{Find } \mathbf{X}, \quad \text{subject to } \mathcal{A}(\mathbf{X}) = \mathbf{y},\ \mathbf{X} \succeq 0,\ \mathrm{rank}(\mathbf{X}) = 1. \quad (1.12)$$
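As a quick, hedged illustration of the lifting step (the variable names below are mine), the following sketch checks the identity behind (1.10)-(1.11): each magnitude-square measurement is a linear function of the lifted matrix $\mathbf{X} = \mathbf{x}\mathbf{x}^*$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 8, 3
x = rng.normal(size=d) + 1j * rng.normal(size=d)
masks = rng.normal(size=(L, d)) + 1j * rng.normal(size=(L, d))

F = np.exp(-2j * np.pi * np.outer(np.arange(d), np.arange(d)) / d)  # DFT matrix F_d
X = np.outer(x, x.conj())                                           # lifted matrix x x^*

for l in range(L):
    D = np.diag(masks[l])
    for k in range(d):
        fk = F[k].conj()                 # f_k, since f_k^* is the k-th row of F_d
        y_vec = np.abs(fk.conj() @ (D.conj().T @ x)) ** 2                     # (1.10)
        y_lift = np.trace(D @ np.outer(fk, fk.conj()) @ D.conj().T @ X).real  # (1.11)
        assert np.isclose(y_vec, y_lift)  # the measurements are linear in X
```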
When the measurements in (1.6) are injective, this is equivalent to

$$\text{minimize } \mathrm{rank}(\mathbf{X}), \quad \text{subject to } \mathcal{A}(\mathbf{X}) = \mathbf{y},\ \mathbf{X} \succeq 0 \quad (1.13)$$

Due to the complexity of solving this problem, PhaseLift (Section 2.3, [12]) solves the convex surrogate, giving the semi-definite program

$$\text{minimize } \mathrm{Trace}(\mathbf{X}), \quad \text{subject to } \mathcal{A}(\mathbf{X}) = \mathbf{y},\ \mathbf{X} \succeq 0 \quad (1.14)$$

The result will follow from looking at random masks, where the diagonal matrices $\mathbf{D}_\ell$ for $\ell \in [L]_0$ are i.i.d. copies of a matrix $\mathbf{D}$, whose entries are i.i.d. copies of a random variable $b$. These are known as coded diffraction patterns. It is shown in [13] that the solution to the convex relaxation is exact, with high probability, provided that we have sufficiently many coded diffraction patterns. It is further shown in the theorem below that the feasible set of solutions is given by

$$\{\mathbf{X} : \mathbf{X} \succeq 0,\ \mathcal{A}(\mathbf{X}) = \mathbf{y}\} = \{\mathbf{x}\mathbf{x}^*\} \quad (1.15)$$

Before we get to the result, we need a restriction on the random variable $b$ which will allow us to recover $\mathbf{x}$.

Definition 1.2.6. We say that $b$ is admissible if (i) $b$ is symmetric; (ii) $|b| \leq M$ for some fixed $M > 0$; (iii) $\mathbb{E}b = \mathbb{E}b^2 = 0$; (iv) $\mathbb{E}|b|^4 = 2\,\mathbb{E}|b|^2$.

We can now state the theorem for recovering $\mathbf{x}$.

Theorem 1.2.1. (Theorem 1.1, [13]) Suppose that the modulation is admissible (i.e., the random masks are generated from an admissible random variable) and that the number $L$ of coded diffraction patterns obeys $L \geq c\gamma \log^4 d$ for some fixed numerical constant $c$. Then with probability at least $1 - d^{-\gamma}$, the set of solutions to the convex relaxation reduces to $\{\mathbf{x}\mathbf{x}^*\}$, and we thus recover $\mathbf{x}$ up to a global phase.

In practice, physically generating these random masks with known entries is a hard task. In many settings, deterministic masks are more practical and see more use in real-world situations. The next section deals with such masks, in particular taking one mask and shifting it to generate a set of masks.

1.2.3 STFT PHASE RETRIEVAL

STFT phase retrieval is similar to masked phase retrieval in that it partially blocks the signal measurements so that one can attempt to recover the signal at a more local level. The key idea is to introduce redundancy in the magnitude-only measurements by maintaining a substantial overlap between adjacent short-time sections of shifted masked measurements. In effect, we are taking a window and shifting it over time. This could involve physically shifting the specimen/window itself, or one could shift the beam to focus on a different local area of the specimen. First, we define the shift of a vector.

Definition 1.2.7. Given $\ell \in [d]_0$, denote the circulant shift operator $S_\ell : \mathbb{C}^d \rightarrow \mathbb{C}^d$ component-wise via

$$(S_\ell \mathbf{x})_n = x_{(\ell+n) \bmod d}, \quad \forall n \in [d]_0 \quad (1.16)$$

Now we can introduce the STFT phase retrieval problem.

Definition 1.2.8. (STFT Phase Retrieval Problem) Let $\mathbf{x} \in \mathbb{C}^d$ and $\mathbf{w} \in \mathbb{C}^d$ denote the signal and window, respectively. Let $\mathbf{m}_\ell = S_\ell \mathbf{w}$ denote the $\ell$-shift of the window for $\ell \in [L]_0$. The STFT magnitude measurements will be of the form of the masked measurements from (1.6), where each of the masks is a shift of the original mask or window. Our goal is to recover $\mathbf{x}$.

In a similar manner as before, we say that a window $\mathbf{w}$ is non-vanishing if $w_n \neq 0$ for each $n \in [d]$.
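A hedged sketch of the resulting measurement model (all names below are mine): the masks are circular shifts of one locally supported window, producing redundant, overlapping magnitude-only data.

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta, L = 16, 4, 16
x = rng.normal(size=d) + 1j * rng.normal(size=d)

w = np.zeros(d, dtype=complex)               # window supported on its first delta entries
w[:delta] = rng.normal(size=delta) + 1j * rng.normal(size=delta)

Y = np.empty((L, d))
for l in range(L):
    m_l = np.roll(w, -l)                     # m_l = S_l w, with (S_l w)_n = w_{(n+l) mod d}
    Y[l] = np.abs(np.fft.fft(x * m_l)) ** 2  # measurements (1.6) for the l-th shift
```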
In [56], it is shown that, up to a set of measure zero, non-vanishing signals are uniquely identifiable from their STFT magnitude measurements, up to a global phase, if the support of the signal is contained inside the support of the window, and as long as adjacent short-time sections overlap by any amount. In other research ([82], [28]), it is shown that all non-vanishing signals are uniquely identifiable from their STFT magnitude measurements, up to a global phase, for specific choices of $\mathbf{w}$ and $L$. In Chapters 2 and 3, we will explore a couple of ptychographic phase retrieval problems, which can be modeled as STFT phase retrieval problems.

1.2.4 NOISE

In phase retrieval, noise refers to how a signal can be modified in a way that alters the final result. This occurs at all stages of the system, from capture and storage to processing and transmission. Any real-world system is affected by some level of noise. In analog photography or video capture, noise can come in the form of film grain, which is caused by the developing process of silver halide crystals dispersed in photographic emulsion ([39], [86]). In digital photography, noise can come in the form of compression artifacts that occur when the file is compressed to reduce file size ([23]). Background noise is a common occurrence in audio capture. In astronomy, it may result from cosmic background radiation, which is a faint glow of light occurring as a result of remnants from the Big Bang ([120], [8], [102]).

Figure 1.5 [97] Example of an image (left) in which a replication of film grain has been digitally added (right).

Additive White Gaussian Noise (AWGN), or simply additive noise, is the basic noise model which will be applied in Chapters 2 and 3. This model assumes that, in all our phase retrieval models, unknown additive Gaussian noise forms part of the collected measurements, i.e.,

$$\mathbf{Y} = \mathbf{X} + \mathbf{N} \quad (1.17)$$

where $\mathbf{X}$ are the "true" measurements and $\mathbf{N}$ is the additive Gaussian noise. Both $\mathbf{X}$ and $\mathbf{N}$ are then assumed to be unknown, with $\mathbf{Y}$ being the known collected measurements. For example, in the masked phase retrieval model, we would have that

$$y_{\ell,k} = |(F_d(\mathbf{x} \circ \mathbf{m}_\ell))_k|^2 + n_{\ell,k}, \quad k \in [d]_0, \quad \ell \in [L]_0, \quad (1.18)$$

where $\mathbf{N} \in \mathbb{C}^{d \times L}$ has complex Gaussian entries. Although the noise is unknown, what can be modelled is the recovery of the signal against varying levels of noise, with the noise level being measured relative to the signal. This can be measured via the signal-to-noise ratio (SNR), which is the ratio of the power of the signal, $P_s$, to the power of the noise, $P_n$. That is, $SNR = \frac{P_s}{P_n}$. Thus, the higher the signal-to-noise ratio, the smaller the presence of noise relative to the signal. Typically, it is measured in decibels using a base 10 logarithm, that is,

$$SNR_{dB} := 10 \log_{10}(SNR) = 10 \log_{10}\left(\frac{P_s}{P_n}\right). \quad (1.19)$$

Figure 1.6 [97] Example of an image with varying levels of Gaussian noise applied.
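In the simulations of Chapters 2 and 3, noise is generated at a prescribed SNR. A minimal sketch of that procedure (assuming real-valued intensity measurements for simplicity; the helper name is mine):

```python
import numpy as np

def add_awgn(Y, snr_db, rng):
    """Add white Gaussian noise N so that 10*log10(P_s / P_n) is (about) snr_db."""
    p_signal = np.mean(np.abs(Y) ** 2)               # empirical signal power P_s
    p_noise = p_signal / (10 ** (snr_db / 10))       # target noise power P_n
    N = rng.normal(scale=np.sqrt(p_noise), size=Y.shape)
    return Y + N, N

rng = np.random.default_rng(2)
Y = rng.random((8, 8))                               # stand-in "true" measurements
Y_noisy, N = add_awgn(Y, snr_db=40.0, rng=rng)
print(10 * np.log10(np.mean(Y**2) / np.mean(N**2)))  # realized SNR, close to 40 dB
```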
1.2.5 CONDITION NUMBER OF A MATRIX

In Chapter 2, we demonstrate a method for solving a phase retrieval problem that involves rewriting the measurements as a matrix multiplication, after which we invert the generated matrix. However, the presence of noise can affect this approach. To measure how much the noise can affect the outcome, we require the following definition.

Definition 1.2.9. The condition number of a matrix $\mathbf{A}$ is defined by

$$\kappa = \kappa(\mathbf{A}) := \|\mathbf{A}\| \cdot \|\mathbf{A}^{-1}\| \quad (1.20)$$

where $\|\cdot\|$ is the operator norm. In particular, if $\|\cdot\|$ is the $\ell_2$ norm, then

$$\kappa(\mathbf{A}) := \frac{\sigma_{max}(\mathbf{A})}{\sigma_{min}(\mathbf{A})} \quad (1.21)$$

where $\sigma_{max}(\mathbf{A})$ and $\sigma_{min}(\mathbf{A})$ are the maximal and minimal singular values, respectively.

A matrix with a low condition number is said to be well-conditioned, while a matrix with a high condition number is said to be ill-conditioned. We apply these same definitions to the problem or system involving the matrices, i.e., a system is well-conditioned if the resultant matrix is well-conditioned. Informally, the condition number measures how close a matrix is to being non-invertible. In practice, it measures the effect of perturbations. In our context, this perturbation will be the additive noise. To see where the condition number comes into play, suppose we have the system $\mathbf{y} = \mathbf{A}\mathbf{x}$ which has been affected by noise such that

$$\mathbf{y} + \delta\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{A}\delta\mathbf{x} = \mathbf{A}(\mathbf{x} + \delta\mathbf{x}) \quad (1.22)$$

where $\delta\mathbf{y}$ is the noise, which is relatively small compared to $\mathbf{y}$ (that is, it has a relatively large $SNR = \frac{\|\mathbf{y}\|}{\|\delta\mathbf{y}\|}$). Then we have that

$$\|\mathbf{y}\| = \|\mathbf{A}\mathbf{x}\| \leq \|\mathbf{A}\| \cdot \|\mathbf{x}\| \;\Rightarrow\; \frac{1}{\|\mathbf{x}\|} \leq \frac{\|\mathbf{A}\|}{\|\mathbf{y}\|}, \quad (1.23)$$

and

$$\|\delta\mathbf{x}\| = \|\mathbf{A}^{-1}\delta\mathbf{y}\| \leq \|\mathbf{A}^{-1}\| \cdot \|\delta\mathbf{y}\|. \quad (1.24)$$

Combining these inequalities, we get that the relative error between the noise-distributed part of the signal, $\delta\mathbf{x}$, and the true signal, $\mathbf{x}$, is bounded by

$$\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \|\mathbf{A}\| \cdot \|\mathbf{A}^{-1}\| \cdot \frac{\|\delta\mathbf{y}\|}{\|\mathbf{y}\|} = \frac{\kappa}{SNR}. \quad (1.25)$$

Thus our hope for a successful recovery, at least to a given margin of error, relies on the $SNR$ being relatively large and $\kappa$ being relatively small. In Chapter 2, Lemma 2.4.2, we demonstrate a choice of matrix which allows for a well-conditioned system, and thus successful recovery of the signal. This is ensured by utilizing the bound on the condition number from [54], in which the maximal singular value is bounded from above whilst the minimal singular value is bounded from below.
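The bound (1.25) can be observed directly. The following sketch solves a perturbed linear system and checks that the relative solution error is controlled by $\kappa$ times the relative data error:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 50
A = rng.normal(size=(d, d))
x = rng.normal(size=d)
y = A @ x

kappa = np.linalg.cond(A)                # sigma_max / sigma_min
dy = 1e-6 * rng.normal(size=d)           # small perturbation of the data
dx = np.linalg.solve(A, dy)              # resulting perturbation of the solution

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = kappa * np.linalg.norm(dy) / np.linalg.norm(y)
print(lhs <= rhs + 1e-12, kappa)         # the bound (1.25) holds
```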
1.3 DIMENSIONALITY REDUCTION

Dimensionality reduction ([99], [25], [59], [62]) is a general data analysis tool used in several disciplines, including statistics, data mining, pattern recognition, machine learning, artificial intelligence, and optimization. Dimensionality reduction refers to the act of transforming data from a high-dimensional space to a low-dimensional space. The goal is to remove irrelevant or redundant data and to reduce the computational cost while still retaining meaningful properties of the original data. Dimension in this context can refer to many things, such as attributes, variables, features, pixels, etc. Each application utilizes different dimension reduction techniques. In pattern recognition, for example, the problem of dimensionality reduction is to extract a subset of features that recovers most of the variability of the data. In text mining, the problem is defined as selecting a subset of words or terms.

Figure 1.7 [97] Example of the effect of dimensionality reduction on a 3-dimensional spherical shell manifold. The resulting 2-dimensional embedded data is an attempt to unfold the original data.

1.3.1 K-NEAREST NEIGHBORS CLASSIFICATION

Suppose we vectorize training data in $d$ dimensions in such a way that the concept of distance between two points in $\mathbb{R}^d$ makes sense within the problem, and that the data is classified into multiple different classes. The goal of the $k$-nearest neighbors algorithm (k-NN) is to identify a new image by applying the same vectorization and classifying based on the preset classification of its $k$ nearest neighbors in $\mathbb{R}^d$. Generally, $k$ is chosen by testing and evaluating the results for different values.

Figure 1.8 [48] Example of the k-nearest neighbors algorithm being applied on embedded data. The left figure shows the data that is to be categorized. The central and right figures show the k-NN algorithm being applied with $k = 3$.

1.3.2 JOHNSON-LINDENSTRAUSS LEMMA

Algorithms performing, for example, k-nearest neighbor classification can be computationally expensive in high dimensions. Thus it is often advantageous to reduce the dimension of the data before carrying out such tasks. To ensure that applying dimensionality reduction does not highly distort the data, we will utilize the Johnson-Lindenstrauss lemma.

Lemma 1.3.1 (Johnson-Lindenstrauss Lemma). Let $\epsilon \in (0, 1)$ and $X \subset \mathbb{R}^d$ be arbitrary with $|X| = n > 1$. There exists $f : X \rightarrow \mathbb{R}^m$ with $m = \mathcal{O}(\epsilon^{-2} \log n)$ such that

$$(1 - \epsilon)\|x - y\|_2 \leq \|f(x) - f(y)\|_2 \leq (1 + \epsilon)\|x - y\|_2 \quad \forall x \in X,\ \forall y \in X. \quad (1.26)$$

Let $X$ denote the set of training data in $\mathbb{R}^d$. Then Lemma 1.3.1 states that the distances between the training data will be preserved up to a small distortion. However, in order to successfully apply, for example, k-NN methods, we would prefer a stronger guarantee: that distances from points in the training data to any point in $\mathbb{R}^d$ be approximately preserved. Work by Elkin et al. in [29] demonstrated that $y$ can be taken to be an arbitrary point in $\mathbb{R}^d$ with $m = \mathcal{O}(\log n)$ and distortion $\approx \sqrt{10}$. This embedding is called a terminal embedding, with the multiplicative factor on the right-hand side referred to as the terminal distortion. Further work demonstrated that if $m$ is sufficiently large, one may prove a result of the following type.

Theorem 1.3.1 (Lemma 1.1, [80]). Let $\epsilon \in (0, 1)$ and $X \subset \mathbb{R}^d$ be arbitrary with $|X| = n > 1$. There exists $f : \mathbb{R}^d \rightarrow \mathbb{R}^m$ with $m = \mathcal{O}(\epsilon^{-2} \log n)$ such that

$$(1 - \epsilon)\|x - y\|_2 \leq \|f(x) - f(y)\|_2 \leq (1 + \epsilon)\|x - y\|_2 \quad \forall x \in X,\ \forall y \in \mathbb{R}^d. \quad (1.27)$$

Thus if the points in $X$ are mapped to $\mathbb{R}^m$ well, which occurs with high probability, then our final terminal embedding is guaranteed to have low terminal distortion as a map from all of $\mathbb{R}^d$ to $\mathbb{R}^m$. This terminal embedding is required to be non-linear. To see this, let $X \subset \mathbb{R}^d$ be arbitrary. Suppose for contradiction that $f : \mathbb{R}^d \rightarrow \mathbb{R}^m$, $d > m$, is a linear embedding with constant terminal distortion. By the Rank-Nullity theorem, $\dim(\ker(f)) \geq d - m \geq 1$. This means $\exists y \in \ker(f) \setminus \{0\}$. Let $x \in X$ be arbitrary. Since $f$ is a linear embedding and $x - y \in \mathbb{R}^d$, the terminal distortion bounds give (suppressing the constant distortion factors)

$$0 < \|y\|_2 = \|x - (x - y)\|_2 \leq \|f(x) - f(x - y)\|_2 = \|f(x) - f(x) + f(y)\|_2 = \|f(y)\|_2 = 0.$$

Thus we have arrived at a contradiction.

In Chapter 4, we will explore dimensionality reduction of manifolds. We also numerically demonstrate a compressed classification algorithm for labelled data.
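To preview how these pieces fit together, here is a hedged sketch of compressive nearest-neighbor classification with a plain linear JL map (a Gaussian random matrix); the synthetic data, labels, and names below are mine. Chapter 4 replaces the linear map with a nonlinear terminal embedding.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, m = 200, 1000, 40
X = rng.normal(size=(n, d))                    # training points
labels = (X[:, 0] > 0).astype(int)             # a simple two-class labeling
q = rng.normal(size=d)                         # a query point

A = rng.normal(size=(m, d)) / np.sqrt(m)       # linear JL map

def knn_label(train, lab, query, k=3):
    """Majority label among the k nearest training points to the query."""
    idx = np.argsort(np.linalg.norm(train - query, axis=1))[:k]
    return np.bincount(lab[idx]).argmax()

print(knn_label(X, labels, q, k=3))            # k-NN in the ambient space
print(knn_label(X @ A.T, labels, A @ q, k=3))  # k-NN after compression
```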
CHAPTER 2 TOWARD FAST AND PROVABLY ACCURATE NEAR-FIELD PTYCHOGRAPHIC PHASE RETRIEVAL

2.1 ABSTRACT

Ptychography is an imaging technique that involves a sample being illuminated by a coherent, localized probe of illumination. When the probe interacts with the sample, the light is diffracted and a diffraction pattern is detected. Then the sample (or probe) is shifted laterally in space to illuminate a new area of the sample whilst ensuring sufficient overlap. Similarly, in Fourier ptychography a sample is illuminated at different angles of incidence (effectively shifting the sample's Fourier transform), after which a lens acts as a low-pass filter, thereby effectively providing localized Fourier information about the sample around frequencies dictated by each angle of illumination. Mathematically, one therefore obtains a similar set of overlapping measurements of the sample in both Fourier ptychography and ptychography, except in different domains (Fourier for the former, and physical for the latter). In either case, one is then able to reconstruct an image of the sample from the measurements using similar methods. Near-Field (Fourier) Ptychography (NFP) (see, e.g., [107, 108, 125]) occurs when the sample is placed at a short defocus distance having a large Fresnel number. In this chapter, we prove that certain NFP measurements are robustly invertible (up to an unavoidable global phase ambiguity) for specific Point Spread Functions (PSFs) and physical masks which lead to well-conditioned lifted linear systems. We then apply a block phase retrieval algorithm using weighted angular synchronization and prove that the proposed approach accurately recovers the measured sample for these specific PSF and mask pairs. Finally, we also propose using a Wirtinger Flow for NFP problems and numerically evaluate that alternate approach both against our main proposed approach, as well as with NFP measurements for which our main approach does not apply.

2.2 INTRODUCTION

The task of recovering a complex signal $\mathbf{x} \in \mathbb{C}^d$ from phaseless magnitude measurements is called the phase retrieval problem. These types of problems appear in many applications such as optics [3, 119] and X-ray crystallography [9, 73]. Here, we are interested in phase retrieval problems arising from (Fourier) ptychography [96, 126]. Ptychography is an imaging technique involving a sample illuminated by a coherent and often localized probe of illumination. When the probe interacts with the sample, light is diffracted and a diffraction pattern is detected. The probe, or the sample, is then shifted laterally in space to illuminate a new area of the sample while ensuring there is sufficient overlap between each neighboring shift. The intensity of the diffraction pattern detected at position $\ell$ resulting from the $k^{th}$ shift of the probe along the sample takes the general form of

$$\tilde{Y}_{k,\ell} = |(D(S_k \mathbf{m} \circ \mathbf{x}))_\ell|^2, \quad (2.1)$$

where $\mathbf{x} \in \mathbb{C}^d$ is the sample being imaged, $\mathbf{m} \in \mathbb{C}^d$ is a mask which represents the probe's incident illumination on (a portion of) the sample, $\circ$ denotes the Hadamard (pointwise) product, $S_k$ is a shift operator, and $D : \mathbb{C}^d \rightarrow \mathbb{C}^d$ is a function that describes the diffraction of the probe radiation from the sample to the plane of the detector after possibly passing through, e.g., a lens. Similarly, Fourier ptychography ultimately results in the same type of measurements as in (2.1), except with $\mathbf{m}$ and $\mathbf{x}$ replaced by $\hat{\mathbf{m}}$ and $\hat{\mathbf{x}}$, respectively (see, e.g., [128]).
Prior work in the computational mathematics community related to (Fourier) ptychographic imaging has primarily focused on Far-Field¹ Ptychography (FFP), in which $D$ is the action of a discrete (inverse) Fourier transform matrix (see, e.g., [95, 54, 49, 31, 92, 89]) in (2.1). Here, in contrast, we consider the less well studied setting of near-field ptychography (NFP), which describes situations where, e.g., the masked sample is too close to the source/detector to be well described by the FFP model. See, e.g., [107, 108, 125] for such imaging applications as well as for more detailed related discussions. In all of these NFP applications the acquired measurements can again be written in the form of (2.1), where $D$ is now a convolution operator with a given Point Spread Function (PSF) $\mathbf{p} \in \mathbb{C}^d$.

Let $\mathbf{x} \in \mathbb{C}^d$ denote an unknown sample, $\mathbf{m} \in \mathbb{C}^d$ a known mask, and $\mathbf{p} \in \mathbb{C}^d$ a known PSF, respectively. For the remainder of this chapter we will suppose we have noisy discretized NFP measurements of the form

$$Y_{k,\ell} = Y_{k,\ell}(\mathbf{x}) := |(\mathbf{p} * (S_k \mathbf{m} \circ \mathbf{x}))_\ell|^2 + N_{k,\ell}, \quad (k, \ell) \in \mathcal{S} \subseteq [d]_0 \times [d]_0, \quad (2.2)$$

where $S_k$ is a circular shift operator $(S_k \mathbf{x})_n = x_{(n+k) \bmod d}$, $\mathbf{N} = (N_{k,\ell})$ is an additive noise matrix, and $[d]_0 := \{0, \ldots, d-1\}$. Throughout this chapter we will always index vectors and matrices modulo $d$ unless otherwise stated.

¹ Far-field versus near-field measurements are defined based on the Fresnel number of the imaging system. See, e.g., [57] for details.

2.2.1 RESULTS, CONTRIBUTIONS, AND CONTENTS

Our main theorem guarantees the existence of a PSF $\mathbf{p} \in \mathbb{C}^d$ and a locally supported mask $\mathbf{m} \in \mathbb{C}^d$ with $supp(\mathbf{m}) \subseteq [\delta]_0 := \{0, \ldots, \delta - 1\}$, $\delta \ll d$, for which the measurements (2.2) can be inverted up to a global phase factor by a computationally efficient and noise robust algorithm. In particular, we prove the following result, which we believe to be the first theoretical error guarantee for a recovery algorithm in the setting of NFP.

Theorem 2.2.1 (Inversion of NFP Measurements). Choose $\delta \in [d]_0$ such that $2\delta - 1$ divides $d$. Then, there exists a PSF $\mathbf{p} \in \mathbb{C}^d$ and a mask $\mathbf{m} \in \mathbb{C}^d$ with $supp(\mathbf{m}) \subseteq [\delta]_0$ such that Algorithm 2.1 below, when provided with input measurements (2.2), will return an estimate $\mathbf{x}_{est} \in \mathbb{C}^d$ of $\mathbf{x}$ satisfying

$$\min_{\phi \in [0, 2\pi)} \|\mathbf{x}_{est} - e^{i\phi}\mathbf{x}\|_2 \leq C\|\mathbf{x}\|_\infty \left( \frac{\sqrt{d\delta}\left(\|\mathbf{x}_{est}\|_\infty^2 + \|\mathbf{x}_{est}\|_\infty^3\right)}{|\mathbf{x}_{est}|_{min}^2} \cdot \|\mathbf{N}\|_F + \sqrt{d\delta\,\|\mathbf{N}\|_F} \right).$$

Here $C \in \mathbb{R}^+$ is an absolute constant², and $|\mathbf{x}_{est}|_{min}$ denotes the smallest magnitude of any entry in $\mathbf{x}_{est}$.

Looking at Theorem 2.2.1 we can see, e.g., that in the noiseless setting where $\|\mathbf{N}\|_F = 0$ the output $\mathbf{x}_{est}$ of Algorithm 2.1 is guaranteed to match the measured signal $\mathbf{x}$ up to a global phase factor whenever $\mathbf{x}_{est}$ has no zeros.³ Moreover, the method is also robust to small amounts of additive noise.

² In this chapter we will use $C$ to denote absolute constants which may change from line to line.
³ Note that prior work on far-field ptychography assumed that $\mathbf{x}$ itself was non-vanishing (see e.g. [54, 49]). However, requiring $\mathbf{x}_{est}$ to not vanish is more easily verifiable in practice.
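For concreteness, the measurement model (2.2) is straightforward to simulate, since the convolution with $\mathbf{p}$ can be applied via the FFT. A minimal sketch (noiseless, over the full index grid; the names are mine):

```python
import numpy as np

def nfp_measurements(x, m, p):
    """Noiseless NFP data Y_{k,l} = |(p * (S_k m o x))_l|^2 via FFT convolution."""
    d = len(x)
    Y = np.empty((d, d))
    P = np.fft.fft(p)
    for k in range(d):
        masked = np.roll(m, -k) * x            # S_k m o x
        Y[k] = np.abs(np.fft.ifft(P * np.fft.fft(masked))) ** 2
    return Y

rng = np.random.default_rng(5)
d, delta = 12, 3
x = rng.normal(size=d) + 1j * rng.normal(size=d)
m = np.zeros(d, dtype=complex)
m[:delta] = rng.normal(size=delta) + 1j * rng.normal(size=delta)
p = rng.normal(size=d) + 1j * rng.normal(size=d)
Y = nfp_measurements(x, m, p)                  # full (k, l) grid; S selects a subset
```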
The proof of Theorem 2.2.1 consists of two parts. First, in Section 2.4, we show that a specific PSF and mask choice results in NFP measurements (2.2) which are essentially equivalent to far-field ptychographic measurements (2.4) that are known to be robustly invertible by prior work [54, 49, 92]. This guarantees the existence of PSFs and masks which allow for the robust inversion of (2.2) up to a global phase. However, these prior works all prove error bounds on $\min_{\phi \in [0,2\pi)} \|\mathbf{x}_{est} - e^{i\phi}\mathbf{x}\|_2$ which scale quadratically in $d$ (see, e.g., Corollary 3 in [49] and Theorem 1 in [92]). This motivates the second part of the proof in Section 2.5, where we improve these results so that they only depend linearly on $d$. This is achieved by utilizing weighted angular synchronization error bounds from [33] which require, among other things, updated lower bounds for the second smallest eigenvalue of the unnormalized graph Laplacian of a given weighted graph obtained from $\mathbf{x}$ (derived in Section 2.5 with the help of auxiliary results proven in Appendix A.2). We also note that the improved dependence on $d$ proven in Section 2.5 for the FFP methods previously analyzed in [54, 49, 92] may be of potential independent interest.

Theorem 2.2.1 is proven for a specific $(2\delta - 1)$-periodic PSF $\mathbf{p}$ and locally supported mask $\mathbf{m}$ whose induced lifted linear measurement operator (see (2.5) – (2.7) below together with Lemma 2.4.1) is provably well conditioned. See Lemma 2.4.2 for the definition of this particular $\mathbf{p}, \mathbf{m}$ pair as well as for their measurements' related condition number bound. However, we note that Algorithm 2.1 is guaranteed to work well much more generally for any PSF and mask pair which leads to well-conditioned measurements (up to, at worst, potentially having to change the shift and frequency pairs one samples if, e.g., $\mathbf{p}$ is not periodic – see Remark 2.4.1). Indeed, inspecting the proof of Theorem 2.2.1, we see that Lemma 2.5.1 decomposes the total error of Algorithm 2.1, $\min_{\phi \in [0,2\pi)} \|\mathbf{x} - e^{i\phi}\mathbf{x}_{est}\|_2$, into terms involving the phase error, $\min_{\phi \in [0,2\pi)} \|\mathbf{x}_{est}^{(\theta)} - e^{i\phi}\mathbf{x}^{(\theta)}\|_2$, and the magnitude error, $\|\mathbf{x}^{(mag)} - \mathbf{x}_{est}^{(mag)}\|_2$. The phase error is controlled by Theorem 2.5.2 and Lemma 2.5.3. The proof of these results only depends on the choice of $\mathbf{m}$ and $\mathbf{p}$ through $\sigma_{min}(\check{\mathbf{M}}^{(p,m)})$, the minimal singular value of their induced measurement operator. Similarly, the magnitude error is controlled by Lemma 2.5.2 and Theorem 2.5.1, which also only depend on $\mathbf{m}$ and $\mathbf{p}$ through $\sigma_{min}(\check{\mathbf{M}}^{(p,m)})$. Therefore, variants of these results can be derived for any invertible measurement system. Moreover, numerical experiments demonstrate that the proposed method works well for a wide variety of non-vanishing PSF and locally supported mask pairs.

In order to be able to handle even more general PSFs $\mathbf{p}$ which do, however, e.g., vanish, in Section 2.6 we also propose a Wirtinger Flow based algorithm, Algorithm 2.2, for inverting NFP measurements (2.2). Though slower than Algorithm 2.1 and less well supported by theory for the PSF and mask pairs for which both methods work empirically, Algorithm 2.2 generally appears more flexible and, e.g., also requires fewer shifts than Algorithm 2.1 to work well in practice when a given mask is not locally supported. Similar to Algorithm 2.1, Algorithm 2.2 relies on the observation that the NFP measurements (2.2) are essentially equivalent to FFP measurements, as shown in Section 2.4.
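Algorithm 2.2 itself is developed in Section 2.6. As a hedged illustration of the underlying idea only, the following sketch runs a generic Wirtinger-Flow-style gradient descent on the intensity loss $f(\mathbf{z}) = \frac{1}{2m}\sum_i (|\langle \mathbf{a}_i, \mathbf{z}\rangle|^2 - y_i)^2$ for random measurement vectors and a warm start; the NFP version instead uses the near-field measurement maps and a spectral initialization.

```python
import numpy as np

def wirtinger_flow(A, y, z0, mu=0.2, iters=500):
    """Gradient descent on f(z) = (1/(2m)) sum_i (|<a_i, z>|^2 - y_i)^2.

    A is an m x d matrix whose rows are the measurement vectors a_i^*.
    This is a generic sketch, not the NFP-specific Algorithm 2.2.
    """
    m = len(y)
    z = z0.copy()
    for _ in range(iters):
        Az = A @ z
        grad = A.conj().T @ ((np.abs(Az) ** 2 - y) * Az) / m  # Wirtinger gradient
        z = z - (mu / np.linalg.norm(z0) ** 2) * grad
    return z

rng = np.random.default_rng(6)
d, m = 16, 128
x = rng.normal(size=d) + 1j * rng.normal(size=d)
A = (rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))) / np.sqrt(2)
y = np.abs(A @ x) ** 2
z0 = x + 0.1 * (rng.normal(size=d) + 1j * rng.normal(size=d))  # warm start
z = wirtinger_flow(A, y, z0)
phase = np.exp(-1j * np.angle(np.vdot(x, z)))   # align the global phase
print(np.linalg.norm(phase * z - x) / np.linalg.norm(x))  # relative error
```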
In Section 2.7, we evaluate Algorithm 2.1 and Algorithm 2.2 numerically, both individually and in comparison to one another in the case of locally supported masks. Finally, in Section 2.9, we conclude with a brief discussion of future work.

2.3 PRELIMINARIES: PRIOR RESULTS FOR FAR-FIELD PTYCHOGRAPHY USING LOCAL MEASUREMENTS

Table 2.1 Notational Reference Table

| Notation | Definition | Notes |
|---|---|---|
| $[n]_0$ | $[n]_0 = \{0, 1, 2, \ldots, n-1\}$ | Zero indexing |
| $(\mathbf{x})_n$ | $\mathbf{x} \in \mathbb{C}^d$, $(\mathbf{x})_n = x_{n \bmod d}$ | Vector circular indexing |
| $(\mathbf{A})_{i,j}$ | $\mathbf{A} \in \mathbb{C}^{m \times n}$, $(\mathbf{A})_{i,j} = A_{i \bmod m,\ j \bmod n}$ | Matrix circular indexing |
| $\langle \mathbf{x}, \mathbf{y} \rangle$ | $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{n=0}^{d-1} x_n \bar{y}_n = \mathbf{y}^* \mathbf{x}$ | Complex inner product |
| $supp(\mathbf{x})$ | $supp(\mathbf{x}) = \{n \in [d]_0 \mid x_n \neq 0\}$ | Support |
| $F_d$ | $(F_d)_{j,k} = e^{-2\pi i jk/d}$, $\forall (j,k) \in [d]_0 \times [d]_0$ | Discrete Fourier transform matrix |
| $\hat{\mathbf{x}}$ | $\hat{x}_n = (F_d \mathbf{x})_n = \sum_{k=0}^{d-1} x_k e^{-2\pi i nk/d}$ | Discrete Fourier transform |
| $F_d^{-1}\mathbf{x}$ | $(F_d^{-1}\mathbf{x})_n = \frac{1}{d}\sum_{k=0}^{d-1} x_k e^{2\pi i kn/d}$ | Discrete inverse Fourier transform |
| $S_k(\mathbf{x})$ | $(S_k \mathbf{x})_n = x_{(n+k) \bmod d}$, $\forall n \in [d]_0$ | Circular shift |
| $\tilde{\mathbf{x}}$ | $\tilde{x}_n = x_{-n \bmod d}$, $\forall n \in [d]_0$ | Reversal |
| $\mathbf{x} * \mathbf{y}$ | $(\mathbf{x} * \mathbf{y})_n = \sum_{k=0}^{d-1} x_k y_{n-k}$ | Circular convolution |
| $\mathbf{x} \circ \mathbf{y}$ | $(\mathbf{x} \circ \mathbf{y})_n = x_n y_n$ | Hadamard (pointwise) product |

Our method, described in Algorithm 2.1, is based on relating the near-field ptychographic measurements (2.2) to far-field ptychographic measurements of the form

$$\tilde{Y}_{k,\ell} = \tilde{Y}_{k,\ell}(\mathbf{x}) := \left| \sum_{n=0}^{d-1} m'_n x_{n+k} e^{-2\pi i \ell n / d} \right|^2 + N_{k,\ell}, \quad (2.3)$$

where $\mathbf{m}'$ is a compactly supported mask. If we let $(\check{\mathbf{m}}_\ell)_n := m'_n e^{-2\pi i \ell n / d}$, then these measurements can be written as

$$\tilde{Y}_{k,\ell} = |\langle \check{\mathbf{m}}_\ell, S_k \mathbf{x} \rangle|^2 + N_{k,\ell}, \quad (2.4)$$

where, as above, $S_k$ denotes a circular shift of length $k$, i.e., $(S_k \mathbf{x})_n = x_{(n+k) \bmod d}$. In [54], phase retrieval measurements of this form are studied when $\mathbf{m}'$ is supported in an interval of length $\delta$ for some $\delta \ll d$. The fast phase retrieval (fpr) method used there relies on using a lifted linear system involving a block-circulant matrix to recover a portion of the autocorrelation matrix $\mathbf{x}\mathbf{x}^*$. Specifically, letting $D := d(2\delta - 1)$, the authors define a block-circulant matrix $\check{\mathbf{M}} \in \mathbb{C}^{D \times D}$ by

$$\check{\mathbf{M}} := \begin{pmatrix} \check{\mathbf{M}}_0 & \check{\mathbf{M}}_1 & \cdots & \check{\mathbf{M}}_{\delta-1} & 0 & 0 & \cdots & 0 \\ 0 & \check{\mathbf{M}}_0 & \check{\mathbf{M}}_1 & \cdots & \check{\mathbf{M}}_{\delta-1} & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\ \check{\mathbf{M}}_1 & \cdots & \check{\mathbf{M}}_{\delta-1} & 0 & 0 & 0 & \cdots & \check{\mathbf{M}}_0 \end{pmatrix} \quad (2.5)$$

where the matrices $\check{\mathbf{M}}_k \in \mathbb{C}^{(2\delta-1) \times (2\delta-1)}$ are defined entry-wise by

$$(\check{\mathbf{M}}_k)_{\ell j} := \begin{cases} (\check{\mathbf{m}}_\ell)_k \overline{(\check{\mathbf{m}}_\ell)_{j+k}}, & 0 \leq j \leq \delta - k, \\ (\check{\mathbf{m}}_\ell)_k \overline{(\check{\mathbf{m}}_\ell)_{j+k-2\delta+1}}, & 2\delta - 1 - k \leq j \leq 2\delta - 2 \text{ and } k < \delta, \\ 0, & \text{otherwise.} \end{cases} \quad (2.6)$$

Letting $\mathbf{z} \in \mathbb{C}^D$ be a vector obtained by subsampling appropriate entries of $vec(\mathbf{x}\mathbf{x}^*)$, the authors show that, in the noiseless setting,

$$vec(\tilde{\mathbf{Y}}) = \check{\mathbf{M}}\mathbf{z}, \quad \tilde{\mathbf{Y}} \in \mathbb{C}^{d \times (2\delta-1)}. \quad (2.7)$$

(See Equation (9) of [54] for explicit details on the arrangement of the entries.)
For properly chosen $\mathbf{m}$, the matrix $\check{\mathbf{M}}$ is invertible, and therefore one may solve for $\mathbf{z}$ by multiplying by $\check{\mathbf{M}}^{-1}$, i.e., $\mathbf{z} = \check{\mathbf{M}}^{-1} vec(\tilde{\mathbf{Y}})$. Then, one may reshape $\mathbf{z}$ to recover a $d \times d$ matrix $\hat{\mathbf{X}}$ whose non-zero entries are estimates of the autocorrelation matrix $\mathbf{x}\mathbf{x}^*$. One may then obtain a vector $\mathbf{x}_{est}$ which approximates $\mathbf{x}$ by an angular synchronization procedure, such as the eigenvector-based method which we will discuss in Section 2.3.1.

In [54], it is shown that the exponential masks $\check{\mathbf{m}}_\ell^{(fpr)}$ defined by

$$(\check{\mathbf{m}}_\ell^{(fpr)})_n = \begin{cases} \dfrac{e^{-(n+1)/a}}{\sqrt[4]{2\delta - 1}} \cdot e^{\frac{2\pi i n \ell}{2\delta - 1}}, & n \in [\delta]_0 \\ 0, & \text{otherwise} \end{cases}, \quad a := \max\left\{4, \frac{\delta - 1}{2}\right\}, \quad (2.8)$$

lead to a lifted linear system which is well-conditioned, and thus to provable recovery guarantees for the method described above. In particular, we may obtain the following upper bound on the condition number of the block-circulant matrix $\check{\mathbf{M}}^{(fpr)}$ obtained when one sets $\check{\mathbf{m}}_\ell = \check{\mathbf{m}}_\ell^{(fpr)}$.

Theorem 2.3.1 (Theorem 4 and Equation (33) in [54]). The condition number of $\check{\mathbf{M}}^{(fpr)}$, the matrix obtained by setting $\check{\mathbf{m}}_\ell = \check{\mathbf{m}}_\ell^{(fpr)}$ in (2.6), may be bounded by

$$\kappa\left(\check{\mathbf{M}}^{(fpr)}\right) < \max\left\{144e^2, \frac{9e^2(\delta-1)^2}{4}\right\} \leq C\delta^2, \quad C \in \mathbb{R}^+.$$

Furthermore, $\check{\mathbf{M}}^{(fpr)}$ can be inverted in $\mathcal{O}(\delta \cdot d \log d)$-time, and its smallest singular value $\sigma_{min}\left(\check{\mathbf{M}}^{(fpr)}\right)$ is bounded from below by $C/\delta$.

2.3.1 ANGULAR SYNCHRONIZATION

Inverting $\check{\mathbf{M}}$ as described in the previous subsection allows one to obtain a portion of the autocorrelation matrix $\mathbf{x}\mathbf{x}^*$. This motivates us to consider angular synchronization: the process of recovering a vector $\mathbf{x}$ from (a portion of) its autocorrelation matrix $\mathbf{x}\mathbf{x}^*$ (or an estimate $\hat{\mathbf{X}}$). One popular approach, which we discuss below, is based upon first entry-wise normalizing this matrix and then taking the lead eigenvector. Specifically, we define a truncated autocorrelation matrix $\mathbf{X}$ corresponding to the true signal $\mathbf{x}$ by

$$X_{j,k} = \begin{cases} x_j \bar{x}_k, & |j - k| \bmod d < \delta \\ 0, & \text{otherwise.} \end{cases} \quad (2.9)$$

We also define a truncated autocorrelation matrix $\hat{\mathbf{X}}$ corresponding to our estimate, $\mathbf{x}_{est}$, given by

$$\hat{X}_{j,k} = \begin{cases} (x_{est})_j \overline{(x_{est})_k}, & |j - k| \bmod d < \delta \\ 0, & \text{otherwise.} \end{cases} \quad (2.10)$$

The method from [54] is based upon first solving for $\hat{\mathbf{X}}$ and then solving for $\mathbf{x}_{est}$. If $\hat{\mathbf{X}}$ is a good approximation of $\mathbf{X}$, then the results proved in [118] show that $\mathbf{x}_{est}$ will be a good approximation of $\mathbf{x}$. Moving forward, prior works [54, 118] effectively decomposed $\mathbf{X} = \mathbf{X}^{(\theta)} \circ \mathbf{X}^{(mag)}$ into its phase and magnitude matrices by setting $X^{(mag)}_{j,k} = |X_{j,k}|$ and $X^{(\theta)}_{j,k} = X_{j,k}/|X_{j,k}|$ if $|X_{j,k}| \neq 0$, with $X^{(\theta)}_{j,k} = 0$ otherwise. One may then write $\hat{\mathbf{X}} = \hat{\mathbf{X}}^{(\theta)} \circ \hat{\mathbf{X}}^{(mag)}$. Note that by construction, if $\mathbf{x}$ is nonvanishing, then we have $|X^{(\theta)}_{j,k}| = 1$ and $X^{(mag)}_{j,k} > 0$ whenever $|j - k| \bmod d < \delta$. Letting $\mathbf{u} \in \mathbb{C}^d$ be the leading eigenvector of $\hat{\mathbf{X}}^{(\theta)}$ and letting $diag(\hat{\mathbf{X}}) \in \mathbb{C}^d$ be the main diagonal of $\hat{\mathbf{X}}$, the output of the resulting algorithm is then $\mathbf{x}_{est} := \sqrt{diag(\hat{\mathbf{X}})} \circ \mathbf{u}$.
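Before the worked example below, here is a hedged sketch of this eigenvector method (the names are mine), run on an exactly banded $\hat{\mathbf{X}}$ so that recovery is exact up to a global phase:

```python
import numpy as np

rng = np.random.default_rng(7)
d, delta = 8, 3
x = rng.normal(size=d) + 1j * rng.normal(size=d)

# Banded estimate of x x^*: keep entries with circular distance |j - k| < delta.
J, K = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
dist = np.minimum((J - K) % d, (K - J) % d)
Xhat = np.outer(x, x.conj()) * (dist < delta)

# Entrywise normalization to the phase matrix X^(theta).
T = np.zeros_like(Xhat)
nz = np.abs(Xhat) > 0
T[nz] = Xhat[nz] / np.abs(Xhat[nz])

# The leading eigenvector of the Hermitian matrix T carries the phases e^{i theta_n}.
_, vecs = np.linalg.eigh(T)
u = vecs[:, -1]
u = u / np.abs(u)                                  # normalize each entry to modulus one

x_est = np.sqrt(np.abs(np.diag(Xhat))) * u         # x_est = sqrt(diag(Xhat)) o u
phase = np.exp(-1j * np.angle(np.vdot(x, x_est)))  # align the global phase
print(np.linalg.norm(phase * x_est - x))           # ~0: exact recovery up to phase
```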
Example 2.3.1. Let $d = 4$, $\delta = 2$. Then $\hat{\mathbf{X}}$ defined as in (2.10) is given by

$$\hat{\mathbf{X}} = \begin{pmatrix} |(x_{est})_0|^2 & (x_{est})_0\overline{(x_{est})_1} & 0 & (x_{est})_0\overline{(x_{est})_3} \\ (x_{est})_1\overline{(x_{est})_0} & |(x_{est})_1|^2 & (x_{est})_1\overline{(x_{est})_2} & 0 \\ 0 & (x_{est})_2\overline{(x_{est})_1} & |(x_{est})_2|^2 & (x_{est})_2\overline{(x_{est})_3} \\ (x_{est})_3\overline{(x_{est})_0} & 0 & (x_{est})_3\overline{(x_{est})_2} & |(x_{est})_3|^2 \end{pmatrix}.$$

If we write $(x_{est})_n = |(x_{est})_n| e^{i\theta_n}$, then we may compute

$$\hat{\mathbf{X}}^{(\theta)} = \begin{pmatrix} 1 & e^{i(\theta_0 - \theta_1)} & 0 & e^{i(\theta_0 - \theta_3)} \\ e^{i(\theta_1 - \theta_0)} & 1 & e^{i(\theta_1 - \theta_2)} & 0 \\ 0 & e^{i(\theta_2 - \theta_1)} & 1 & e^{i(\theta_2 - \theta_3)} \\ e^{i(\theta_3 - \theta_0)} & 0 & e^{i(\theta_3 - \theta_2)} & 1 \end{pmatrix}.$$

One may verify that the lead eigenvector is $\mathbf{u} = (e^{i\theta_0}, e^{i\theta_1}, e^{i\theta_2}, e^{i\theta_3})^T$, and therefore

$$\mathbf{x}_{est} = \sqrt{diag(\hat{\mathbf{X}})} \circ \mathbf{u} = (|(x_{est})_0|e^{i\theta_0},\ |(x_{est})_1|e^{i\theta_1},\ |(x_{est})_2|e^{i\theta_2},\ |(x_{est})_3|e^{i\theta_3})^T.$$

In Section 2.5, we will discuss another, slightly more sophisticated way of estimating the phases, based on Algorithm 3 of [93], which involves taking the eigenvector corresponding to the smallest eigenvalue of an appropriately weighted graph Laplacian. Indeed, this new angular synchronization approach is what ultimately allows the NFP error bound in Theorem 2.2.1 to have improved dependence on the signal dimension $d$ over prior FFP error bounds in [54, 49, 92]. The end result will be a more accurate method for computing $\hat{\mathbf{X}}$ in (2.10) from given NFP measurements (2.2).

2.4 NEAR FROM FAR: GUARANTEED NEAR-FIELD PTYCHOGRAPHIC RECOVERY VIA FAR-FIELD RESULTS

In this section, we show how to relate the near-field ptychographic measurements (2.2) to the far-field ptychographic measurements (2.4). This will allow us to recover $\mathbf{x}$ by using methods similar to those introduced in [54]. In order to get nontrivial bounds, we will also need to prove the existence of an admissible PSF and mask pair, $\mathbf{p} \in \mathbb{C}^d$ and $\mathbf{m} \in \mathbb{C}^d$, which lead to a well-conditioned linear system in (2.7). In particular, we will present a PSF and mask pair such that the resulting block-circulant matrix, denoted $\check{\mathbf{M}}^{(p,m)}$, will have the same condition number as the matrix $\check{\mathbf{M}}^{(fpr)}$ constructed from the masks $\check{\mathbf{m}}_\ell^{(fpr)}$ defined in (2.8). Therefore, Theorem 2.3.1 will allow us to obtain convergence guarantees for Algorithm 2.1. Here, we will set the measurement index set $\mathcal{S}$ considered in (2.2) to be $\mathcal{S} = \mathcal{K} \times \mathcal{L}$ where $\mathcal{K} = [d]_0$ and $\mathcal{L} = [2\delta - 1]_0$.

The following lemma proves that we can rewrite NFP measurements from (2.2) as local FFP measurements of the form (2.4), as long as the mask $\mathbf{m}$ has local support and the PSF is periodic. It will be based upon defining masks

$$\check{\mathbf{m}}_\ell^{(p,m)} := S_\ell \tilde{\mathbf{p}} \circ \mathbf{m} \in \mathbb{C}^d, \quad \ell \in [2\delta - 1]_0, \quad (2.11)$$

where $\tilde{\mathbf{p}}$ is the reversal of $\mathbf{p}$ about its first entry modulo $d$, i.e., $\tilde{p}_n = p_{-n \bmod d}$.
Then, we may rearrange the measurements (2.2) into a matrix of FFP-type measurements (๐‘,๐‘š) ๐‘Œe๐‘˜,โ„“ B ๐‘Œโˆ’๐‘˜ ๐‘š๐‘œ๐‘‘ ๐‘‘, ๐‘˜โˆ’โ„“ ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1 = |โŸจmqโ„“ , ๐‘† ๐‘˜ xโŸฉ| 2 , (๐‘˜, โ„“) โˆˆ [๐‘‘]0 ร— [2๐›ฟ โˆ’ 1]0 , (2.12) (๐‘,๐‘š) where m qโ„“ is defined as in (2.11). As a consequence, recovering x is equivalent to inverting a block-circulant matrix as described in (2.5) โ€“ (2.7). Proof. By Lemma A.1.3 part 1 , Lemma A.1.2, Lemma A.1.3 part 2, and Lemma A.1.1 from Appendix A.1, we have that ๐‘Œ๐‘˜,โ„“ = |(p โˆ— (๐‘† ๐‘˜ m โ—ฆ x))โ„“ | 2 = |โŸจ๐‘†โˆ’โ„“e p, ๐‘† ๐‘˜ m โ—ฆ xโŸฉ| 2 = |โŸจ๐‘†โˆ’โ„“e p โ—ฆ ๐‘† ๐‘˜ m, xโŸฉ| 2 = |โŸจ๐‘† ๐‘˜ (๐‘†โˆ’โ„“โˆ’๐‘˜ e p โ—ฆ m), xโŸฉ| 2 = |โŸจ๐‘† ๐‘˜ (๐‘†โˆ’โ„“โˆ’๐‘˜ e p โ—ฆ m), xโŸฉ| 2 = |โŸจ๐‘† ๐‘˜ (๐‘†โˆ’โ„“โˆ’๐‘˜ ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1e pโ—ฆ m), xโŸฉ| 2 , where the last equality uses the fact that p is 2๐›ฟ โˆ’ 1 periodic. We may now apply Lemma A.1.3 part 3 to see that ๐‘Œ๐‘˜,โ„“ = |โŸจ๐‘† ๐‘˜ (๐‘†โˆ’โ„“โˆ’๐‘˜ ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1e pโ—ฆ m), xโŸฉ| 2 = |โŸจ(๐‘†โˆ’โ„“โˆ’๐‘˜ ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1e pโ—ฆ m), ๐‘†โˆ’๐‘˜ xโŸฉ| 2 .. Finally, 26 since m q โ„“(๐‘,๐‘š) = ๐‘†โ„“ep โ—ฆ m, we see that for all ๐‘˜ โˆˆ [๐‘‘]0 and all โ„“ โˆˆ [2๐›ฟ โˆ’ 1]0 , we have ๐‘Œe๐‘˜,โ„“ = ๐‘Œโˆ’๐‘˜ ๐‘š๐‘œ๐‘‘ ๐‘‘, ๐‘˜โˆ’โ„“ ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1 = |โŸจ(๐‘†โˆ’(๐‘˜โˆ’โ„“)โˆ’(โˆ’๐‘˜) ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1e pโ—ฆ m), ๐‘†โˆ’(โˆ’๐‘˜) xโŸฉ| 2 = |โŸจ(๐‘†โ„“ ๐‘š๐‘œ๐‘‘ 2๐›ฟโˆ’1e pโ—ฆ m), ๐‘† ๐‘˜ xโŸฉ| 2 (๐‘,๐‘š) = |โŸจm qโ„“ , ๐‘† ๐‘˜ xโŸฉ| 2 . Remark 2.4.1. If we instead change the pairs S in Lemma 2.4.1 for which we collect NFP measurements (2.2) to be S โ€ฒ := โˆช ๐‘˜ โˆˆ[๐‘‘]0 {๐‘‘ โˆ’ ๐‘˜ } ร— {๐‘˜ โˆ’ 2๐›ฟ + 2, . . . , ๐‘˜ โˆ’ 1, ๐‘˜ } mod ๐‘‘, then we may remove the assumption that p is 2๐›ฟ โˆ’ 1 periodic. In particular, for (๐‘˜, โ„“) โˆˆ S โ€ฒ one may substitute ๐‘˜ = ๐‘‘ โˆ’ ๐‘˜ โ€ฒ and โ„“โ€ฒ = ๐‘˜ โ€ฒ โˆ’ ๐‘– for some 0 โ‰ค ๐‘– โ‰ค 2๐›ฟ โˆ’ 2 to see that then (โˆ’๐‘˜ โ€ฒ โˆ’ โ„“โ€ฒ) mod ๐‘‘ = ๐‘–. Thus, since 0 โ‰ค ๐‘– โ‰ค 2๐›ฟ โˆ’ 2, (โˆ’โ„“โ€ฒ โˆ’ ๐‘˜ โ€ฒ) mod 2๐›ฟ โˆ’ 1 = (โˆ’โ„“โ€ฒ โˆ’ ๐‘˜ โ€ฒ) mod ๐‘‘, and so we may use the same calculation as above without assuming that p is 2๐›ฟ โˆ’ 1 periodic. Note, however, that S has a simple Cartesian product structure whereas S โ€ฒ does not. As a result, the entries of p โˆ— (๐‘† ๐‘˜ m โ—ฆ x) that one must sample varies based on the mask shift ๐‘˜ in the case of S โ€ฒ, potentially complicating the collection of the associated NFP measurements (2.2) in some situations. Next, in Lemma 2.4.2 below, we will show how to choose a mask m and PSF p such that q โ„“(๐‘,๐‘š) defined as in (2.11) and m m q (fpr) 2โ„“ mod 2๐›ฟโˆ’1 defined as in (2.8) will only differ by a global phase for each โ„“ โˆˆ [2๐›ฟ โˆ’ 1]0 . As a consequence, we obtain the desired result that the block-circulant matrix arising from the NFP measurements (2.2) is essentially equivalent (up to a row permutation and global phase shift) to the well-conditioned lifted linear measurement operator M q (fpr) considered in Theorem 2.3.1. Lemma 2.4.2. Let p, m โˆˆ C๐‘‘ have entries given by 2๐œ‹ i๐‘›2 2๐œ‹ i๐‘›2 ๏ฃฑ ๏ฃด ๏ฃด ๐‘’ โˆ’๐‘›+1 /๐‘Ž ยท ๐‘’ 2๐›ฟ โˆ’ 1 , ๏ฃด โˆ’ ๏ฃดโˆš ๏ฃฒ ๏ฃด 4 ๐‘› โˆˆ [๐›ฟ]0 ๐‘ ๐‘› B ๐‘’ 2๐›ฟ โˆ’ 1 , and ๐‘š๐‘› B 2๐›ฟ โˆ’ 1 , ๏ฃด ๏ฃด ๏ฃด ๏ฃด 0, ๏ฃด otherwise ๏ฃณ 27 n ๐›ฟ โˆ’ 1o where ๐‘Ž B max 4, . 
Then for all ℓ ∈ [2δ−1]_0, m̃_ℓ^(p,m) = S_ℓ p̃ ∘ m satisfies

    m̃_ℓ^(p,m) = e^{2πi ℓ²/(2δ−1)} · m̃^(fpr)_{2ℓ mod (2δ−1)},    (2.13)

where m̃_ℓ^(fpr) is defined as in (2.8). As a consequence, if we let M̃^(fpr) and M̃^(p,m) be the lifted linear measurement matrices as per (2.5) obtained by setting each m̃_ℓ in (2.6) equal to m̃_ℓ^(fpr) and m̃_ℓ^(p,m), respectively, then we will have

    M̃^(p,m) = P M̃^(fpr),    (2.14)

where P is a D × D block diagonal permutation matrix. Thus M̃^(p,m) and M̃^(fpr) have the same singular values, and

    κ(M̃^(p,m)) = κ(M̃^(fpr)) ≤ Cδ²,

where κ(·) denotes the condition number of a matrix.

Proof. Using the definition of the Hadamard product ∘, the circulant shift operator S_ℓ, and the reversal operator x ↦ x̃, we see that

    (m̃_ℓ^(p,m))_n = (S_ℓ p̃ ∘ m)_n = (S_ℓ p̃)_n m_n = p̃_{n+ℓ} m_n = p_{−n−ℓ} m_n.

Therefore, inserting the definitions of p and m above shows that for n ∈ [δ]_0,

    (m̃_ℓ^(p,m))_n = e^{2πi(n+ℓ)²/(2δ−1)} · (e^{−(n+1)/a}/(2δ−1)^{1/4}) · e^{−2πi n²/(2δ−1)}
                  = e^{2πi ℓ²/(2δ−1)} · e^{4πi nℓ/(2δ−1)} · (e^{−(n+1)/a}/(2δ−1)^{1/4})
                  = e^{2πi ℓ²/(2δ−1)} · (e^{−(n+1)/a}/(2δ−1)^{1/4}) · e^{2πi n(2ℓ)/(2δ−1)}
                  = e^{2πi ℓ²/(2δ−1)} ( m̃^(fpr)_{2ℓ mod (2δ−1)} )_n.

For n ∉ [δ]_0, we have (m̃_ℓ^(p,m))_n = e^{2πi ℓ²/(2δ−1)} ( m̃^(fpr)_{2ℓ mod (2δ−1)} )_n = 0. Thus (2.13) follows.
To prove (2.14), let M̃^(p,m) and M̃_k^(p,m) be the matrices obtained by using the masks m̃_ℓ^(p,m) in (2.5) and (2.6), and let M̃^(fpr) and M̃_k^(fpr) be the matrices obtained using m̃_ℓ^(fpr) instead. Then combining (2.13) and (2.6) implies that (M̃_k^(p,m))_{i,j} = (M̃_k^(fpr))_{2i mod (2δ−1), j}. For example, when j ∈ [δ−k+1]_0 one may check that

    (M̃_k^(p,m))_{i,j} = e^{2πi i²/(2δ−1)} ( m̃^(fpr)_{2i mod (2δ−1)} )_k · \overline{ e^{2πi i²/(2δ−1)} ( m̃^(fpr)_{2i mod (2δ−1)} )_{j+k} } = (M̃_k^(fpr))_{2i mod (2δ−1), j},    (2.15)

and one may perform similar computations in the other cases. Since each M̃_k^(p,m) and M̃_k^(fpr) has 2δ−1 rows and the mapping i ↦ 2i is a bijection on Z/(2δ−1)Z, we see that each M̃_k^(p,m) may be obtained by permuting the rows of M̃_k^(fpr) (and that the permutation does not depend on k). Therefore, there exists a block diagonal permutation matrix P such that M̃^(p,m) = P M̃^(fpr). Finally, the condition number bound for M̃^(p,m) now follows from Theorem 2.3.1 and the fact that permuting the rows of a matrix does not change its condition number or any of its singular values.  □

Lemma 2.4.1 above demonstrates how to recast NFP problems involving locally supported masks and periodic PSFs as particular types of FFP problems. Lemma 2.4.2 then provides a particular PSF and mask combination for which the resulting FFP problem can be solved by inverting a well-conditioned linear system.
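The phase relation (2.13) is easy to sanity-check numerically. The sketch below builds p and m as in Lemma 2.4.2 and compares m̃_ℓ^(p,m) against the FFP masks, whose entries are assumed here (from the computation in the proof above) to be (m̃_ℓ^(fpr))_n = e^{−(n+1)/a}(2δ−1)^{−1/4} e^{2πinℓ/(2δ−1)} for n ∈ [δ]_0; the constant a is likewise taken from the lemma.

```python
import numpy as np

d, delta = 30, 3                      # 2*delta - 1 = 5 divides d
w = 2 * delta - 1
a = max(4, (delta - 1) / 2)
n = np.arange(d)

p = np.exp(2j * np.pi * n**2 / w)     # (2*delta - 1)-periodic PSF
m = np.zeros(d, dtype=complex)
m[:delta] = np.exp(-(n[:delta] + 1) / a) / w**0.25 \
    * np.exp(-2j * np.pi * n[:delta]**2 / w)

p_rev = np.roll(p[::-1], 1)           # reversal about the first entry
for ell in range(w):
    lhs = np.roll(p_rev, -ell) * m    # S_ell p-tilde o m, with (S_k v)_n = v_{n+k}
    fpr = np.zeros(d, dtype=complex)  # assumed form of (2.8)
    fpr[:delta] = np.exp(-(n[:delta] + 1) / a) / w**0.25 \
        * np.exp(2j * np.pi * n[:delta] * ((2 * ell) % w) / w)
    assert np.allclose(lhs, np.exp(2j * np.pi * ell**2 / w) * fpr)
```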
Together they imply that, for properly chosen m and p, one may robustly invert the measurements given in (2.2) by first recasting the NFP data as modified FFP data, and then using the BlockPR approach from [54, 49, 92]. This is the main idea behind Algorithm 2.1. However, this approach alone leads to theoretical error bounds which scale quadratically in d. To remedy this, the final step of Algorithm 2.1 uses an alternative angular synchronization method (which originally appeared in [93]) based on a weighted graph Laplacian, as opposed to previous works which used methods like those outlined in Section 2.3.1. As we shall see in the next section, this will allow us to obtain bounds in Theorem 2.2.1 which depend linearly on d rather than quadratically.

Algorithm 2.1 NFP-BlockPR
Input: 1) Variables d, δ, D = d(2δ−1). 2) A 2δ−1 periodic PSF p ∈ C^d, and a mask m ∈ C^d with supp(m) ⊆ [δ]_0. 3) A near-field ptychographic measurement matrix Y ∈ C^{d×(2δ−1)}.
Output: x_est with x_est ≈ e^{iθ} x for some θ ∈ [0, 2π].
1) Form masks m̃_ℓ^(p,m) = S_ℓ p̃ ∘ m and matrix M̃^(p,m) as per (2.5) and (2.6).
2) Compute z = (M̃^(p,m))^{−1} vec(Y) ∈ C^D.
3) Reshape z to get X̂ as per Section 2.3.1, containing estimated entries of xx*.
4) Use weighted angular synchronization (Algorithm 3, [93]) to obtain x_est.

2.5 ERROR ANALYSIS FOR ALGORITHM 2.1
In this section, we will prove our main result, Theorem 2.2.1, which provides accuracy and robustness guarantees for Algorithm 2.1. For x ∈ C^d, we write its nth entry as x_n = |x_n| e^{iθ_n} and let x^(mag) := (|x_0|, …, |x_{d−1}|)^T and x^(θ) := (e^{iθ_0}, e^{iθ_1}, …, e^{iθ_{d−1}})^T, so that we may decompose x as

    x = x^(mag) ∘ x^(θ).    (2.16)

The following lemma upper bounds the total estimation error in terms of its phase and magnitude errors. For a proof, please see Appendix A.1.

Lemma 2.5.1. Let x be decomposed as in (2.16), and similarly let x_est be decomposed as x_est = x_est^(mag) ∘ x_est^(θ). Then, we have that

    min_{φ∈[0,2π)} ‖x − e^{iφ} x_est‖_2 ≤ ‖x‖_∞ min_{φ∈[0,2π)} ‖x_est^(θ) − e^{iφ} x^(θ)‖_2 + ‖x^(mag) − x_est^(mag)‖_2.    (2.17)

In light of Lemma 2.5.1, to bound the total error of our algorithm it suffices to consider the phase and magnitude errors separately. In order to bound ‖x^(mag) − x_est^(mag)‖_2, we may utilize the following lemma, which is a restatement of Lemma 3 of [54].

Lemma 2.5.2 (Lemma 3 of [54]). Let σ_min(M̃^(p,m)) denote the smallest singular value of the lifted measurement matrix M̃^(p,m) from line 1 of Algorithm 2.1. Then,

    ‖x^(mag) − x_est^(mag)‖_∞ ≤ C sqrt( ‖N‖_F / σ_min(M̃^(p,m)) ).

Having obtained Lemma 2.5.2, we are now able to prove the following theorem bounding the total estimation error.

Theorem 2.5.1. Let p and m be the admissible PSF and mask pair defined in Lemma 2.4.2. Then, we have that

    min_{φ∈[0,2π)} ‖x − e^{iφ} x_est‖_2 ≤ ‖x‖_∞ min_{φ∈[0,2π)} ‖x_est^(θ) − e^{iφ} x^(θ)‖_2 + C sqrt(dδ ‖N‖_F).

Proof. Combining Lemmas 2.5.1 and 2.5.2 along with the inequality ‖u‖_2 ≤ √d ‖u‖_∞ implies that

    min_{φ∈[0,2π)} ‖x − e^{iφ} x_est‖_2 ≤ ‖x‖_∞ min_{φ∈[0,2π)} ‖x_est^(θ) − e^{iφ} x^(θ)‖_2 + C sqrt( d ‖N‖_F / σ_min(M̃^(p,m)) ).    (2.18)
As noted in Lemma 2.4.2, the singular values of M̃^(p,m) are the same as those of M̃^(fpr). Therefore, applying Theorem 2.3.1 then finishes the proof.  □

Remark 2.5.1. Note that the inequality (2.18) in the proof of Theorem 2.5.1 holds any time m ∈ C^d satisfies supp(m) ⊆ [δ]_0 and either (i) p ∈ C^d is 2δ−1 periodic with 2δ−1 dividing d, or else (ii) one instead collects NFP measurements (2.2) at all (k,ℓ) ∈ S′ as in Remark 2.4.1. Therefore, results analogous to Theorem 2.5.1 may be produced for any such p and m pairs with σ_min(M̃^(p,m)) > 0. Furthermore, the value of this minimal singular value is straightforward to check numerically in practice.

In order to bound ‖x_est^(θ) − e^{iφ} x^(θ)‖_2, we will need a few additional definitions. As in (2.9), let X denote the partial autocorrelation matrix corresponding to the true signal x, and, as in (2.10), let X̂ denote the partial autocorrelation matrix corresponding to x_est, i.e., the matrix obtained in step 3 of Algorithm 2.1. Let G = (V, E, W) be a weighted graph whose vertices are given by V = [d]_0, whose edge set E is taken to be the set of (i,j) such that i ≠ j and |i−j| mod d < δ, and whose weight matrix W is defined entrywise by

    W_{i,j} = { |X̂_{i,j}|²,  0 < |i−j| mod d < δ;  0, otherwise }.    (2.19)

Letting A_G denote the unweighted adjacency matrix of G, we observe that by construction we have X = (I + A_G) ∘ xx* and X̂ = (I + A_G) ∘ x_est x_est*. Letting D denote the weighted degree matrix, we define the unnormalized graph Laplacian by L_G := D − W and the normalized graph Laplacian by L_N := D^{−1/2} L_G D^{−1/2}. It is well known that both L_G and L_N are positive semi-definite with a minimal eigenvalue of zero (see, e.g., Section 3.1, [104]). We will let τ_G denote the spectral gap (second smallest eigenvalue) of L_G. It is well known that if G is connected then τ_G is strictly positive (see, e.g., Lemma 3.1.1, [104]). In [33], the authors used a weighted graph approach to prove the following result, which bounds min_{φ∈[0,2π)} ‖x_est^(θ) − e^{iφ} x^(θ)‖_2.

Theorem 2.5.2 (Corollary 3, [33]). Consider the weighted graph G = (V, E, W) described in the previous paragraph with weight matrix given as in (2.19). Let τ_G denote the spectral gap of the associated unnormalized Laplacian L_G. Then we have that

    min_{φ∈[0,2π)} ‖x_est^(θ) − e^{iφ} x^(θ)‖_2 ≤ C sqrt(1 + ‖x_est‖_∞) · ‖X − X̂‖_F / √τ_G,  C ∈ R^+.

Remark 2.5.2. The sqrt(1 + ‖x_est‖_∞) term is referred to in Theorem 4 of [33] as a tightness penalty, which is incurred when relaxing the non-convex constraint to an eigenvector problem; this relaxation allows us to use the method of angular synchronization involving the weighted Laplacian given in Algorithm 3 of [93].

In order to utilize Theorem 2.5.2, we require both an upper bound on ‖X − X̂‖_F and a lower bound on the spectral gap τ_G. These are provided by the next two lemmas.
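The spectral gap τ_G that controls the phase error is simple to compute directly. The following sketch, under the assumed convention that "|i−j| mod d" denotes the circular distance min((i−j) mod d, (j−i) mod d), builds W from a (noiseless) banded X̂, forms L_G = D − W, and compares its second-smallest eigenvalue against the lower bound of Lemma 2.5.4 below.

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta = 24, 3
x_est = rng.standard_normal(d) + 1j * rng.standard_normal(d)
X_hat = np.outer(x_est, np.conj(x_est))

W = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        dist = min((i - j) % d, (j - i) % d)
        if 0 < dist < delta:
            W[i, j] = np.abs(X_hat[i, j]) ** 2       # weights (2.19)

L_G = np.diag(W.sum(axis=1)) - W                      # unnormalized Laplacian
tau_G = np.linalg.eigvalsh(L_G)[1]                    # spectral gap
lower = (np.min(np.abs(x_est)) ** 4 / np.max(np.abs(x_est)) ** 2
         * 4 * (delta - 1) / d ** 2)                  # Lemma 2.5.4 bound
assert tau_G >= lower
```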
Let ๐‘ฃ๐‘’๐‘ : C๐‘‘ร—๐‘‘ โ†’ C๐ท be the vectorization operator considered in (2.7). It follows from (2.4), (2.7), and Step 2 of Algorithm 2.1, that vec(Y) = Mvec( q b and X) vec(Y โˆ’ N) = Mvec(X).q Therefore,   โˆ’1 โˆฅvec(N)โˆฅ 2 Xโˆ’X b โ‰ค q (๐‘,๐‘š) M vec(N) โ‰ค   โ‰ค ๐ถ๐›ฟโˆฅNโˆฅ ๐น , ๐น 2 ๐œŽ ๐‘š๐‘–๐‘› M q (๐‘,๐‘š) where final inequality again utilizes Lemma 2.4.2 and Theorem 2.3.1. Lemma 2.5.4. For the graph ๐บ considered in Theorem 2.5.2, we have that |xest | 4min 4(๐›ฟ โˆ’ 1) ๐œ๐บ โ‰ฅ . โˆฅxest โˆฅ 2โˆž ๐‘‘2 Proof. Letting ๐‘Šmin and ๐‘Šmax be the minimum and maximum value of any of the (nonzero) entries of W, we have that ๐‘Šmin โ‰ฅ |xest | 2min , ๐‘Šmax โ‰ค โˆฅxest โˆฅ 2โˆž , and diam(๐บ unw ) โ‰ฅ ๐‘‘/(2๐›ฟ โˆ’ 1) (where diam(๐บ unw ) is the diameter of the unweighted version of ๐บ). Therefore, by Theorem A.2.1 in Appendix A.2, we have that |xest | 4min 2 |xest | 4min 4(๐›ฟ โˆ’ 1) ๐œ๐บ โ‰ฅ โ‰ฅ . โˆฅxest โˆฅ 2โˆž (๐‘‘ โˆ’ 1)diam(๐บ) โˆฅxest โˆฅ 2โˆž ๐‘‘ 2 We shall now finally prove our main result. The Proof of Theorem 2.2.1. By Theorem 2.5.1, we have x โˆ’ ๐‘’ i๐œ™ xest x(๐œƒ) i๐œ™ (๐œƒ) โˆš๏ธ min 2 โ‰ค โˆฅxโˆฅ โˆž min est โˆ’ ๐‘’ x + ๐ถ ๐‘‘๐›ฟโˆฅNโˆฅ ๐น . ๐œ™โˆˆ[0,2๐œ‹) ๐œ™โˆˆ[0,2๐œ‹) 2 33 Combining Theorem 2.5.2 with Lemmas 2.5.3 and 2.5.4 yields x(๐œƒ) i๐œƒ (๐œƒ) โˆš๏ธ โˆฅX โˆ’ Xโˆฅ b ๐น ๐‘š๐‘–๐‘› est โˆ’ ๐‘’ x โ‰ค ๐ถ 1 + โˆฅx ๐‘’๐‘ ๐‘ก โˆฅ โˆž ยท โˆš ๐œ™โˆˆ[0,2๐œ‹) 2 ๐œ๐บ โˆš โˆš๏ธ ๐‘‘ ๐›ฟโˆฅxest โˆฅ โˆž โˆฅNโˆฅ ๐น โ‰ค ๐ถ 1 + โˆฅx ๐‘’๐‘ ๐‘ก โˆฅ โˆž ยท . |xest | 2min The result follows. 2.6 AN ALTERNATE APPROACH: NEAR-FIELD PTYCHOGRAPHY VIA WIRTINGER FLOW In the previous sections we have demonstrated a particular point spread function and mask for which NFP measurements are guaranteed to allow image reconstruction via Algorithm 2.1. However, in many real-world scenarios the particular mask and PSF combination considered above are not of the type actually used in practice. For example, in the setting considered in [125] the PSF p ideally behaves like a low-pass filter (so that, e.g., b p is supported in {๐‘˜ โˆˆ Z|โˆ’๐พ < ๐‘˜ mod ๐‘‘ < ๐พ } for some ๐พ โ‰ช ๐‘‘), and the mask ๐‘š is globally supported in [๐‘‘]0 . In contrast, the PSF considered above has its nonzero Discrete Fourier coefficients at frequencies in {๐‘˜ ๐‘‘/(2๐›ฟ โˆ’ 1)} ๐‘˜โˆˆ[2๐›ฟโˆ’1]0 (and thus its Fourier support includes large frequencies), and the mask ๐‘š has small physical support in [๐›ฟ]0 . This motivates us to explore a variant of the well known Wirtinger Flow algorithm [14] in this section. This method, Algorithm 2.2, can be applied to more general set of PSF and mask pairs than Algorithm 2.1 considered in the previous section. Suppose we have noiseless NFP measurements of the form [ ๐‘Œ๐‘˜,โ„“ = |(p โˆ— (๐‘† ๐‘˜ m โ—ฆ x))โ„“ | 2 , (๐‘˜, โ„“) โˆˆ {๐‘‘ โˆ’ ๐‘˜ } ร— {๐พ โˆ’ ๐ฟ + 1, . . . , ๐‘˜ } mod ๐‘‘, 0โ‰ค๐‘˜ โ‰ค๐พโˆ’1 where ๐พ, ๐ฟ โˆˆ [๐‘‘ + 1]0 \ {0}. Then by the same argument used in Lemma 2.4.1 (see also in Remark 2.4.1), we can manipulate the measurements above so that we have ๐‘Œe๐‘˜,โ„“ = |โŸจmq โ„“(๐‘,๐‘š) , ๐‘† ๐‘˜ xโŸฉ| 2 , (๐‘˜, โ„“) โˆˆ [๐พ]0 ร— [๐ฟ]0 , 34 (๐‘,๐‘š) where the masks m qโ„“ are defined as in (2.11). We may then reshape these measurements into a vector y โˆˆ C๐พ ๐ฟ with entries given by (๐‘,๐‘š) ๐‘ฆ ๐‘› := |โŸจmq๐‘› , ๐‘† ๐‘› xโŸฉ| 2 , mod ๐ฟ โŒŠ ๐ฟ โŒ‹ โˆ€๐‘› โˆˆ [๐พ ๐ฟ]0 . 
After this reformulation, we may then apply a standard Wirtinger Flow algorithm with spectral initialization. Full details are given below in Algorithm 2.2.

Algorithm 2.2 NFP Wirtinger Flow
Input: 1) Size d ∈ N, number of iterations T, stepsizes μ_{τ+1} for τ ∈ [T]_0.⁴ 2) PSF p ∈ C^d, mask m ∈ C^d, masks m̃_ℓ^(p,m) = S_ℓ p̃ ∘ m. 3) Noisy measurements Y_{k,ℓ} = |(p ∗ (S_k m ∘ x))_ℓ|² + N_{k,ℓ}.
Output: x_est ∈ C^d with x_est ≈ e^{iθ} x for some θ ∈ [0, 2π].
1) Rearrange the measurement matrix to form the measurement vector y in (2.20).
2) Compute z_0 using the spectral method (Algorithm 1 in [14]).
3) For τ ∈ [T]_0, let z_{τ+1} = z_τ − (μ_{τ+1}/‖z_0‖²) ∇f(z_τ), where

    f(z) := (1/(KL)) Σ_{n∈[KL]_0} ( |(S_{−⌊n/L⌋} m̃^(p,m)_{n mod L})* z|² − y_n )².

4) Return x_est = z_T.

⁴ For our numerical simulations in Section 2.7, we set μ_τ = min(1 − e^{−τ/330}, 0.4) as suggested in [14].

2.7 NUMERICAL SIMULATIONS
In this section, we evaluate Algorithms 2.1 and 2.2 with respect to both noise robustness and runtime. Every data point in the plots below reports an average reconstruction error or runtime over 100 tests. For each test, a new sample x ∈ C^d is randomly generated by choosing each entry to have independent and identically distributed (i.i.d.) mean 0 and variance 1 Gaussian real and imaginary parts. We then attempt to recover this sample from the noisy measurements Y_{k,ℓ}(x) defined as in (2.2), where the additive noise matrices N also have i.i.d. mean 0 Gaussian entries.
In our noise robustness experiments, we plot the reconstruction error as a function of the Signal-to-Noise Ratio (SNR), where we define the reconstruction error by

    Error(x, x_est) := 10 log_10( min_φ ‖x − e^{iφ} x_est‖_2² / ‖x‖_2² ),

and the SNR by

    SNR(Y, N) := 10 log_10( ‖Y − N‖_F / ‖N‖_F ).

In these experiments, we re-scale the noise matrix N in order to achieve each desired SNR level. All simulations were performed using MATLAB R2021b on an Intel desktop with a 2.60GHz i7-10750H CPU and 16GB DDR4 2933MHz memory. All code used to generate the figures below is publicly available at https://github.com/MarkPhilipRoach/NearFieldPtychography.

2.7.1 ALGORITHMS 2.1 AND 2.2 FOR LOCALLY SUPPORTED MASKS AND PERIODIC POINT SPREAD FUNCTIONS
In these experiments, we choose the measurement index set for (2.2) to be S = K × L where K = [d]_0 and L = [2δ−1]_0. As a consequence, we consider all shifts k ∈ [d]_0 of the mask while observing only a portion of each resulting noisy near-field diffraction pattern |p ∗ (S_k m ∘ x)|² for each k. This corresponds to a physical imaging system where, e.g., the sample and (a smaller) detector are fixed while a localized probe with support size δ scans across the sample.
Figure 2.1 evaluates the robustness and runtime of Algorithm 2.1 as a function of the SNR and mask support δ in this setting. Looking at Figure 2.1, one can see that noise robustness increases with the support size of the mask, δ, in exchange for mild increases in runtime.

Figure 2.1 An evaluation of Algorithm 2.1 for the proposed PSF and mask with d = 945. Left: Reconstruction error vs SNR for various δ = |supp(m)|. Right: Runtime as a function of δ.
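For reference, the error and SNR conventions above can be implemented directly; the closed-form phase alignment used below follows from expanding ‖x − e^{iφ}x_est‖₂² and maximizing the cross term. The function names are ours and are not taken from the released MATLAB code.

```python
import numpy as np

def reconstruction_error_db(x, x_est):
    # min over phi of ||x - e^{i phi} x_est||_2 is attained at
    # phi = angle(<x_est, x>), which maximizes the cross term.
    phi = np.angle(np.vdot(x_est, x))
    num = np.linalg.norm(x - np.exp(1j * phi) * x_est) ** 2
    return 10 * np.log10(num / np.linalg.norm(x) ** 2)

def rescale_noise(Y_clean, N, snr_db):
    # Scale N so that 10*log10(||Y - N||_F / ||N||_F) equals snr_db,
    # where Y = Y_clean + (scaled N), so that Y - N is the clean data.
    c = np.linalg.norm(Y_clean) / (np.linalg.norm(N) * 10 ** (snr_db / 10))
    return c * N
```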
Figure 2.2 compares the performance of Algorithm 2.1 and Algorithm 2.2 for the measurements proposed in Lemma 2.4.2. Looking at Figure 2.2, we can see that Algorithm 2.2 takes longer to achieve errors comparable to Algorithm 2.1 for these particular p and m as the SNR increases. More specifically, we see, e.g., that BlockPR achieves a reconstruction error similar to 500 iterations of Wirtinger Flow at an SNR of about 50 in a small fraction of the time. This supports the value of the BlockPR method as a fast initializer for more traditional optimization-based solution approaches.

Figure 2.2 A comparison of Algorithms 2.1 and 2.2 for the proposed PSF and mask with δ = 26 and d = 102. Left: Reconstruction error vs SNR for various numbers of Algorithm 2.2 iterations. Right: The corresponding average runtimes.

2.7.2 ALGORITHM 2.2 FOR GLOBALLY SUPPORTED MASKS
As we saw in the previous section, Algorithm 2.1 is able to invert NFP measurements more efficiently than Algorithm 2.2 in situations where it is applicable. However, Algorithm 2.1 only applies to locally supported masks. In this section, we will show that Algorithm 2.2 remains effective even when the masks are globally supported, such as the masks considered in [125]. In Figure 2.3, we evaluate Algorithm 2.2 using noisy measurements of the form

    Y_{k,ℓ} = |(p ∗ (S_k m ∘ x))_ℓ|² + N_{k,ℓ},  (k,ℓ) ∈ [K]_0 × [d]_0.    (2.21)

Here p ∈ C^d is a low-pass filter with p̂ = S_{−(γ−1)/2} 1_γ, where γ = d/3 + 1 and 1_γ ∈ {0,1}^d is a vector whose first γ entries are 1 and whose last d−γ entries are 0. Here, we choose the mask m to have i.i.d. mean 0, variance 1 Gaussian entries. Thus, the measurements considered in (2.21) differ from those used in Section 2.7.1 in two crucial respects: i) the mask m here has global support; ii) we utilize a small number of mask shifts and observe the entire diffraction pattern resulting from each one (as opposed to observing just a portion of each diffraction pattern from all possible shifts, as above).
Examining Figure 2.3, one can see that Algorithm 2.2 remains effective in this setting. We also observe, as expected, that using more shifts, i.e., collecting more measurements, results in lower reconstruction errors.

Figure 2.3 The reconstruction error of Algorithm 2.2 with d = 102, L = [d]_0, and number of iterations T = 2000. Left: Reconstruction error vs the number of total shifts K for fixed SNR = 80. Right: Reconstruction error vs SNR for various numbers of shifts K.

2.7.3 ALGORITHM 2.1 FOR NON-PERIODIC PSFS VIA REMARK 2.4.1
In these experiments, we choose the measurement index set for (2.2) to be S′ from Remark 2.4.1 and consider two different non-periodic PSFs together with locally supported masks m ∈ C^d having supp(m) ⊆ [δ]_0 for δ = 26 and d = 102. Motivated again by [125], we first consider a PSF given by a low-pass filter (defined as in Section 2.7.2) plus small noise modeling imperfections (here the additive vector has i.i.d. N(0, 10^{−4}) normal entries), in combination with a random symmetric mask. Here the mask's nonzero entries are created by reflecting δ/2 = 13 random entries (chosen via i.i.d. mean 0, variance 1 Gaussians) across the middle of its support.
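A minimal sketch of the globally supported measurement setup (2.21) follows, assuming d divisible by 3 and taking the shift of 1_γ so that the passband is centered at frequency zero; the roll conventions (S_k v)_n = v_{n+k} are assumptions matching the usage above.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 102                                   # divisible by 3, as in Figure 2.3
gamma = d // 3 + 1
one_gamma = np.zeros(d)
one_gamma[:gamma] = 1.0
p_hat = np.roll(one_gamma, -(gamma - 1) // 2)   # passband centered at frequency 0
p = np.fft.ifft(p_hat)                          # low-pass PSF
m = rng.standard_normal(d) + 1j * rng.standard_normal(d)  # globally supported mask
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)

cconv = lambda a, b: np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
K = 6                                     # small number of mask shifts
# Y[k, l] = |(p * (S_k m o x))_l|^2, as in (2.21)
Y = np.stack([np.abs(cconv(p, np.roll(m, -k) * x)) ** 2 for k in range(K)])
```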
The reconstruction error of Algorithm 2.1, as well as of Algorithm 2.2 initialized with the output of Algorithm 2.1, is plotted on the left in Figure 2.4 as a function of the NFP measurements' SNR for this PSF/mask pair.
For our second non-periodic PSF and locally supported mask pair, we let the PSF be a vector with unit magnitude entries having i.i.d. uniformly random phases, and let our locally supported masks have δ nonzero i.i.d. mean 0, variance 1 Gaussian entries. The reconstruction errors of both Algorithm 2.1 and Algorithm 2.2 are plotted on the right in Figure 2.4 as a function of the NFP measurements' SNR in this case.
In both experiments plotted in Figure 2.4, we note that both random initialization and the spectral initialization method from [14] appear insufficiently accurate to allow Algorithm 2.2 to converge. However, when the output of Algorithm 2.1 is used to compute z_0 in step 2 of Algorithm 2.2, Algorithm 2.2 then converges nicely to an accurate estimate of the true signal. This further reinforces the potential value of Algorithm 2.1 as a fast and accurate initializer for more traditional optimization-based solution approaches.

Figure 2.4 A simulation applying Algorithm 2.1 via Remark 2.4.1 and then using its generated estimate as the initial estimate z_0 in Algorithm 2.2. In both plots Algorithm 2.1 is plotted in solid blue, and Algorithm 2.2 is plotted in dashed red. Left: Reconstruction error vs SNR where the PSF is a low-pass filter with small additive noise, and the locally supported mask is symmetric and random. Here T = 5000 iterations are used in Algorithm 2.2. Right: Reconstruction error vs SNR where the PSF is a vector with randomized unit magnitude entries, and the locally supported mask is random. Here T = 1000 iterations are used in Algorithm 2.2.

2.8 APPLICATION OF ALGORITHM 2.1
We now consider a real-world application of Algorithm 2.1, in which we aim to recover an n×n-pixel color image. The image is first broken down into its three color channels on the RGB scale, converting it into three integer-valued matrices with entries from 0 to 255 based on the intensity of the corresponding color at each pixel. Each of these matrices is then separately reshaped into a column vector in R^{n²}, which is then used as the object x in Algorithm 2.1. Once we obtain our three estimates for the three column vectors, they are reshaped back into n × n matrices and combined to form the color estimate of the original image.
Figure 2.5 shows an example of this process in action. The original image is a 128 × 128 pixel color image. This would mean that d = 128² = 16,384 in Algorithm 2.1; however, to ensure that d is divisible by 2δ−1 for any given δ, we let d_ext be the smallest integer such that d_ext ≥ d and d_ext is divisible by 2δ−1. We then extend the reshaped pixel data so that x ∈ R^{d_ext} by adding ones in the extended part, and we disregard this extension once we recover the estimate. We test this recovery for two values of δ, and for each δ we apply varying levels of noise.
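A sketch of this channel-splitting and padding pipeline is given below, assuming a square image; PIL is used here only for image I/O, and the helper names are illustrative.

```python
import numpy as np
from PIL import Image  # assumed available for image I/O

def channels_to_vectors(img_path, delta):
    # Split an n x n color image into three channel vectors, padded with ones
    # so that the extended length d_ext is divisible by 2*delta - 1.
    rgb = np.asarray(Image.open(img_path).convert("RGB"), dtype=float)
    n = rgb.shape[0]
    d = n * n
    block = 2 * delta - 1
    d_ext = -(-d // block) * block            # smallest multiple of block >= d
    vecs = [np.concatenate([rgb[:, :, c].reshape(d), np.ones(d_ext - d)])
            for c in range(3)]
    return vecs, n, d

def vectors_to_image(vecs, n, d):
    # Discard the padding and reassemble the color estimate.
    chans = [np.clip(np.real(v[:d]).reshape(n, n), 0, 255) for v in vecs]
    return np.stack(chans, axis=-1).astype(np.uint8)
```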
2.9 CONCLUSIONS AND FUTURE WORK
We have introduced two new algorithms for recovering a specimen of interest from near-field ptychographic measurements. Both of these algorithms rely on first reformulating and reshaping our measurements so that they resemble widely-studied far-field ptychographic measurements. We then recover our specimen using either Wirtinger Flow or methods based on [54]. Algorithm 2.1 is computationally efficient and, to the best of our knowledge, is the first algorithm with provable recovery guarantees for measurements of this form. Algorithm 2.2, on the other hand, has the advantage of applying to more general masks with global support. Developing more efficient and provably accurate algorithms for this latter class of measurements remains an interesting avenue for future work.

CHAPTER 3
BLIND PTYCHOGRAPHY VIA BLIND DECONVOLUTION

3.1 ABSTRACT
Ptychography involves a sample being illuminated by a coherent, localised probe of illumination. When the probe interacts with the sample, the light is diffracted and a diffraction pattern is detected. Then the probe or sample is shifted laterally in space to illuminate a new area of the sample, while ensuring there is sufficient overlap. Far-field ptychography occurs when there is a large enough distance (when the Fresnel number is ≪ 1) to obtain magnitude-square Fourier transform measurements. In an attempt to remove ambiguities, masks are utilized to ensure that the outputs of any recovery algorithm are unique up to a global phase. In this chapter, we assume that both the sample and the mask are unknown, and we apply blind deconvolutional techniques to solve for both. Numerical experiments demonstrate that the technique works well in practice and is robust under noise.
This chapter is comprised of three sections. Section 3.3 introduces far-field Fourier ptychography, and an algorithm for solving given noisy ptychographic measurements. Of particular use to us will be Theorem 3.3.2, which reformulates the measurements into a convolution. Section 3.4 explores a method for solving a blind deconvolution problem, given certain appropriate real-world assumptions. Section 3.5 combines the previous two sections, taking the reformulated convolutional measurements, assuming that both components are unknown, and then applying the blind deconvolution recovery algorithm. The full algorithm is stated and numerical simulations are summarized, outlining good recovery which is robust under noise.

3.2 INTRODUCTION
Ptychography involves a sample being illuminated by a coherent, localised probe of illumination. When the probe interacts with the sample, the light is diffracted and a diffraction pattern is detected. Then the probe or sample is shifted laterally in space to illuminate a new area of the sample, while ensuring there is sufficient overlap. Far-field ptychography occurs when there is a large enough distance (when the Fresnel number is ≪ 1) to obtain magnitude-square Fourier transform measurements.
Ptychography was initially studied in the late 1960s ([45]), with the problem solidified in 1970 ([44]). The name "ptychography" was coined in 1972 ([43]), after the Greek word for fold, because the process involves an interference pattern in which the scattered waves fold into one another in the (coherent) Fourier diffraction pattern of the object. Initially developed to study crystalline objects under a scanning transmission electron microscope, the field has since widened to setups using visible light ([19], [47], [85]), x-rays ([26], [115], [90]), or electrons ([123], [37], [58]). It benefits from being unaffected by lens-induced aberrations or diffraction effects, unlike conventional lens imaging. Various types of ptychography are studied based on the optical configuration of the experiments.
For instance, Bragg ptychography ([38], [109], [46], [70]) measures strain in crystalline specimens by shifting the surface of the specimen. Fourier ptychography ([127], [114], [88], [128]) consists of taking multiple images at a wide field-of-view and then computationally synthesizing them into a high-resolution image reconstruction in the Fourier domain. This results in an increased resolution compared to a conventional microscope.

Figure 3.1 [47] Experimental setup for fly-scan ptychography.

3.3 FAR-FIELD FOURIER PTYCHOGRAPHY
Let x, m ∈ C^d denote the unknown sample and known mask, respectively. We suppose that we have d² noisy ptychographic measurements of the form

    Y_{ℓ,k} = |(F(x ∘ S_k m))_ℓ|² + N_{ℓ,k},  (ℓ,k) ∈ [d]_0 × [d]_0,    (3.1)

where S_k, ∘, and F := F_d denote the kth circular shift, the Hadamard product, and the d-dimensional discrete Fourier transform, respectively, and N is the matrix of additive noise. In this section, we will define a discrete Wigner distribution deconvolution method for recovering a discrete signal. A modified Wigner distribution deconvolution approach is used to solve for an estimate of x̂x̂* ∈ C^{d×d}, and then angular synchronization is performed to compute an estimate of x̂, and thus of x.
In Section 3.3.1, we introduce definitions and technical lemmas which will be of use. In particular, the decoupling lemma (Lemma 3.3.2) allows us to effectively 'separate' the mask and object inside a convolution. In Section 3.3.2, these technical lemmas are applied to the ptychographic measurements to write the problem as a decoupled deconvolution problem, the blind variant of which will be studied later on. In Section 3.3.3, an additional Fourier transform is applied, and the measurements are rewritten in a form to which a pointwise division approach can be applied. Sub-sampled versions of this theorem are also given. We then state the full algorithm for recovering the sample.

3.3.1 PROPERTIES OF THE DISCRETE FOURIER TRANSFORM
We first define the modulation operator.

Definition 3.3.1. Given k ∈ [d]_0, define the modulation operator W_k : C^d → C^d component-wise via

    (W_k x)_n = x_n e^{2πikn/d},  ∀n ∈ [d]_0.    (3.2)

From this definition, we can develop some useful equalities which we will use in the main proofs of this section.

Lemma 3.3.1 (Technical Equalities) (Lemma 1.3.1, [78]). The following equalities hold for all x ∈ C^d, ℓ ∈ [d]_0:
(i) F_d x̂ = d · x̃;
(ii) F_d(W_ℓ x) = S_{−ℓ} x̂;
(iii) F_d(S_ℓ x) = W_ℓ x̂;
(iv) W_{−ℓ} F_d(S_ℓ \overline{x̃}) = \overline{x̂};
(v) (S_ℓ x)~ = S_{−ℓ} x̃;
(vi) F_d x̄ = \overline{F_d x̃};
(vii) (x̂)~ = F_d x̃.

We wish to be able to convert between the convolution and the Hadamard product, so we will need the following useful theorem.

Theorem 3.3.1 (Discretized Convolution Theorem) (Lemma 1.3.2, [78]). Let x, y ∈ C^d. We have that
(i) F_d^{−1}(x̂ ∘ ŷ) = x ∗_d y;
(ii) (F_d x) ∗_d (F_d y) = d · F_d(x ∘ y).

Currently, the measurements we are dealing with have the specimen and the mask intertwined. We introduce the decoupling lemma to essentially disentangle the two.

Lemma 3.3.2 (Decoupling Lemma) (Lemma 1.3.3, [78]). Let x, y ∈ C^d and ℓ, k ∈ [d]_0. Then

    ( (x ∘ S_{−ℓ} y) ∗_d (\overline{x̃} ∘ S_ℓ \overline{ỹ}) )_k = ( (x ∘ S_{−k} x̄) ∗_d (ỹ ∘ S_k \overline{ỹ}) )_ℓ.    (3.3)
Proof. Let x, y ∈ C^d and ℓ, k ∈ [d]_0. By the definitions of the circular convolution, Hadamard product, and shift operator, we have that

    ( (x ∘ S_{−ℓ} y) ∗_d (\overline{x̃} ∘ S_ℓ \overline{ỹ}) )_k = Σ_{n=0}^{d−1} (x ∘ S_{−ℓ} y)_n (\overline{x̃} ∘ S_ℓ \overline{ỹ})_{k−n}
        = Σ_{n=0}^{d−1} x_n y_{n−ℓ} \overline{x̃}_{k−n} \overline{ỹ}_{ℓ+k−n}
        = Σ_{n=0}^{d−1} x_n \overline{x}_{n−k} ỹ_{ℓ−n} \overline{ỹ}_{k+ℓ−n}    (3.4)
        = Σ_{n=0}^{d−1} (x ∘ S_{−k} x̄)_n (ỹ ∘ S_k \overline{ỹ})_{ℓ−n}
        = ( (x ∘ S_{−k} x̄) ∗_d (ỹ ∘ S_k \overline{ỹ}) )_ℓ.  □

Lastly, before entering the main part of this subsection, we need a lemma relating the Fourier squared-magnitude measurements to a convolution.

Lemma 3.3.3. Let x ∈ C^d. We have that

    |F_d x|² = F_d(x ∗_d \overline{x̃}).    (3.5)

Proof. Let x ∈ C^d. Then we have that

    |F_d x|² = (F_d x) ∘ \overline{(F_d x)} = (F_d x) ∘ (F_d \overline{x̃}) = F_d(x ∗_d \overline{x̃}).    (3.6)  □

3.3.2 DISCRETIZED WIGNER DISTRIBUTION DECONVOLUTION
We now prove the Discretized Wigner Distribution Deconvolution theorem, which will allow us to convert the measurements into a form which we can solve algorithmically.

Theorem 3.3.2 (Lemma 1.3.5, [78]). Let x, m ∈ C^d denote the unknown specimen and known mask, respectively. Suppose we have d² noisy ptychographic measurements of the form

    (y_ℓ)_k = | Σ_{n=0}^{d−1} x_n m_{n−ℓ} e^{−2πink/d} |² + N_{ℓ,k},  (ℓ,k) ∈ [d]_0 × [d]_0.    (3.7)

Let Y ∈ R^{d×d} and N ∈ C^{d×d} be the matrices whose ℓth columns are y_ℓ and N_ℓ, respectively. Then for any k ∈ [d]_0,

    (Yᵀ F_dᵀ)_k = d · (x ∘ S_k x̄) ∗_d (m̃ ∘ S_{−k} \overline{m̃}) + (Nᵀ F_dᵀ)_k.    (3.8)

Proof. Let ℓ ∈ [d]_0. We have that

    y_ℓ = |F_d(x ∘ S_{−ℓ} m)|² + N_ℓ = F_d( (x ∘ S_{−ℓ} m) ∗_d (\overline{x̃} ∘ S_ℓ \overline{m̃}) ) + N_ℓ.    (3.9)

Taking the Fourier transform of both sides at k ∈ [d]_0 and using that F_d x̂ = d · x̃ yields

    (F_d y_ℓ)_k = d · ( (x ∘ S_{−ℓ} m) ∗_d (\overline{x̃} ∘ S_ℓ \overline{m̃}) )_{−k} + (F_d N_ℓ)_k = d · ( (x ∘ S_k x̄) ∗_d (m̃ ∘ S_{−k} \overline{m̃}) )_ℓ + (F_d N_ℓ)_k,    (3.10)

by the previous lemma. For fixed ℓ ∈ [d]_0, the vector F_d y_ℓ ∈ C^d is the ℓth column of the matrix F_d Y; thus its transpose y_ℓᵀ F_dᵀ ∈ C^d is the ℓth row of the matrix (F_d Y)ᵀ. Similarly, N_ℓᵀ F_dᵀ ∈ C^d is the ℓth row of (F_d N)ᵀ. Thus we have that

    ( (Yᵀ F_dᵀ)_k )_ℓ = d · ( (x ∘ S_k x̄) ∗_d (m̃ ∘ S_{−k} \overline{m̃}) )_ℓ + ( (Nᵀ F_dᵀ)_k )_ℓ,    (3.11)

and therefore

    (Yᵀ F_dᵀ)_k = d · (x ∘ S_k x̄) ∗_d (m̃ ∘ S_{−k} \overline{m̃}) + (Nᵀ F_dᵀ)_k.    (3.12)  □

We note that x ∘ S_k x̄ is a diagonal of xx*.
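Both Lemma 3.3.3 and the decoupling lemma are easy to check numerically. The following sketch does so under the conventions assumed here: F_d is the unnormalized DFT with kernel e^{−2πink/d}, (S_k v)_n = v_{n+k}, and x̃ is the reversal about the first entry.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
y = rng.standard_normal(d) + 1j * rng.standard_normal(d)

F = np.fft.fft                                   # F_d, kernel e^{-2 pi i nk/d}
cconv = lambda a, b: np.fft.ifft(F(a) * F(b))    # circular convolution *_d
rev = lambda v: np.roll(v[::-1], 1)              # reversal about first entry
S = lambda v, k: np.roll(v, -k)                  # (S_k v)_n = v_{n+k}

# Lemma 3.3.3: |F_d x|^2 = F_d(x *_d conj(x-tilde))
assert np.allclose(np.abs(F(x)) ** 2, F(cconv(x, np.conj(rev(x)))))

# Decoupling lemma (3.3), checked at one index pair (k, l):
k, l = 3, 5
lhs = cconv(x * S(y, -l), np.conj(rev(x)) * S(np.conj(rev(y)), l))[k]
rhs = cconv(x * S(np.conj(x), -k), rev(y) * S(np.conj(rev(y)), k))[l]
assert np.allclose(lhs, rhs)
```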
3.3.3 WIGNER DISTRIBUTION DECONVOLUTION ALGORITHM
We suppose that the mask is known and the specimen is unknown. By taking an additional Fourier transform and using the discretized convolution theorem, we have the following variants of the previous lemmas.

Theorem 3.3.3 (Discretized Wigner Distribution Deconvolution). Let x, m ∈ C^d denote the unknown specimen and known mask, respectively. Suppose we have d² noisy spectrogram measurements of the form

    (y_ℓ)_k = | Σ_{n=0}^{d−1} x_n m_{n−ℓ} e^{−2πink/d} |² + N_{ℓ,k},  (ℓ,k) ∈ [d]_0 × [d]_0.    (3.13)

Let Y ∈ R^{d×d} be the matrix whose ℓth column is y_ℓ. Then for any k ∈ [d]_0,

    (F_d Yᵀ F_dᵀ)_k = d · F_d(x ∘ S_k x̄) ∘ F_d(m̃ ∘ S_{−k} \overline{m̃}) + (F_d Nᵀ F_dᵀ)_k.    (3.14)

We also have a similar result based on the work in Appendix B.

Lemma 3.3.4 (Sub-Sampling In Frequency). Suppose that the spectrogram measurements are collected on a subset 𝒦 ⊆ [d]_0 of K equally spaced Fourier modes. Then for any ω ∈ [K]_0,

    ( F_d (Y_{K,d})ᵀ F_Kᵀ )_ω = K Σ_{r=0}^{d/K−1} ( F_d(x ∘ S_{ℓL−α} x̄) ∘ F_d(m̃ ∘ S_{α−ℓL} \overline{m̃}) )_{ω−rK} + ( F_d (N_{K,d})ᵀ F_Kᵀ )_ω,

where Y_{K,d} ∈ C^{K×d} is the matrix of sub-sampled noiseless K·d measurements.

Lemma 3.3.5 (Sub-Sampling In Frequency And Space). Suppose we have spectrogram measurements collected on a subset 𝒦 ⊆ [d]_0 of K equally spaced frequencies and a subset ℒ ⊆ [d]_0 of L equally spaced physical shifts. Then for any ω ∈ [K]_0 and α ∈ [L]_0,

    ( F_L (Y_{K,L})ᵀ (F_Kᵀ)_ω )_α = (KL/d³) Σ_{r=0}^{d/K−1} Σ_{ℓ=0}^{d/L−1} ( F_d(x̂ ∘ S_{ℓL−α} \overline{x̂}) )_{ω−rK} ( F_d(m̂ ∘ S_{α−ℓL} \overline{m̂}) )_{ω−rK} + ( F_L (N_{K,L})ᵀ (F_Kᵀ)_ω )_α,

where Y_{K,L} ∈ C^{K×L} is the matrix of sub-sampled noiseless K·L measurements.

Assume that m is band-limited with supp(m̂) = [δ]_0 for some δ ≪ d. Then the algorithm below allows for the recovery of an estimate of x̂ from spectrogram measurements via Wigner distribution deconvolution and angular synchronization.

Algorithm 3.1 (Algorithm 1, [78]) Wigner Distribution Deconvolution Algorithm
Input: 1) Y_{d,L} ∈ C^{d×L}, matrix of noisy measurements. 2) Mask m ∈ C^d with supp(m̂) = [δ]_0. 3) Integer κ ≤ δ, so that 2κ−1 diagonals of x̂x̂* are estimated, and L = δ + κ − 1.
Output: An estimate x_est of x up to a global phase.
1) Perform pointwise division to compute, for each relevant diagonal index k,

    (1/d) · (F_d Yᵀ F_dᵀ)_k ⊘ F_d(m̃ ∘ S_{−k} \overline{m̃}).    (3.15)

2) Invert the (2κ−1) Fourier transforms above.
3) Organize the values from step 2 to form the diagonals of a banded matrix Y_{2κ−1}.
4) Perform angular synchronization on Y_{2κ−1} to obtain x̂_est.
5) Let x_est = F_d^{−1} x̂_est.

When the mask is known with supp(m̂) = [δ]_0, δ ≪ d, maximum error guarantees (Theorem 2.1.1, [78]) are given depending on x, d, κ, L, ‖N_{d,L}‖_F (the matrix formed by the noise), and the mask-dependent constant μ > 0,

    μ := min_{|p| ≤ κ−1, q ∈ [d]_0} | ( F_d(m̂ ∘ S_p \overline{m̂}) )_q |.    (3.16)
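Algorithm 3.1 is stated for a band-limited mask and recovers the diagonals of x̂x̂* via the sub-sampled dual relations. The following simplified sketch instead illustrates the core deconvolution step in the time domain, using a compactly supported mask and the full-measurement relation (3.14): each column of F_d Yᵀ F_dᵀ is divided pointwise by d · F_d(m̃ ∘ S_{−k} m̃̄), and an inverse FFT returns the kth diagonal x ∘ S_k x̄ of xx* (angular synchronization, as sketched earlier, would then complete the recovery). The conventions match the earlier verification snippet.

```python
import numpy as np

rng = np.random.default_rng(2)
d, delta = 60, 6
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
m = np.zeros(d, dtype=complex)
m[:delta] = rng.standard_normal(delta) + 1j * rng.standard_normal(delta)

S = lambda v, k: np.roll(v, -k)              # (S_k v)_n = v_{n+k}
rev = lambda v: np.roll(v[::-1], 1)          # reversal about the first entry

# noiseless spectrogram measurements: column l is |F_d(x o S_{-l} m)|^2
Y = np.stack([np.abs(np.fft.fft(x * S(m, -l))) ** 2 for l in range(d)], axis=1)

T = np.fft.fft(np.fft.fft(Y, axis=0).T, axis=0)   # F_d Y^T F_d^T
for k in range(-delta + 1, delta):                # only overlapping diagonals
    kk = k % d
    denom = d * np.fft.fft(rev(m) * S(np.conj(rev(m)), -kk))
    diag_k = np.fft.ifft(T[:, kk] / denom)        # pointwise division + IFFT
    assert np.allclose(diag_k, x * S(np.conj(x), kk))  # kth diagonal of x x^*
```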
In the next section, we look at the situation in which both the specimen and the mask are unknown. Since we have already shown that we can rewrite the Fourier squared-magnitude measurements as convolutions between shifted autocorrelations, the natural next step is the setting where both convolution factors are unknown. This is the topic of blind deconvolution, which seeks to recover two vectors from their convolution. In particular, we will look at a couple of approaches which involve making assumptions based on real-world applications.

3.4 BLIND DECONVOLUTION
3.4.1 INTRODUCTION
Blind deconvolution is a problem that has been mathematically considered for decades, from earlier work ([4], [61], [112], [34], [79], [65]) to more recent work ([15], [72], [2], [35], [71]), and it is summarized in [69]. The goal is to recover a sharp image from an initial blurry image. The first application to compressive sensing was considered in [2]. We consider one-dimensional, discrete, noisy measurements of the form

    y = f ∗ g + n,

where f is considered to be an object, signal, or image of interest; g is considered to be a blurring, masking, or point-spread function; n is the noise vector; and ∗ refers to circulant convolution.¹ We consider situations in which both f and g are unknown. The process of recovering the object and blurring function can be generalized to two-dimensional measurements. The problem of estimating the unknown blurring function and unknown object simultaneously is known as blind image restoration ([124], [122], [64], [105]). Although, strictly speaking, blind deconvolution refers to the noiseless model of recovering f and g from y = f ∗ g, the noisy model is commonly referred to as blind deconvolution as well, and this convention will be continued in this chapter. As we will show later, the problem is ill-posed, and ambiguities mean that no approach can produce a unique solution pair.
In Section 3.4.2, we consider the underlying measurements and assumptions that we will work with. We then show how, through manipulation, we can rewrite the original problem as the minimization of a non-convex function. In Section 3.4.3, we demonstrate an iterative approach to the minimization problem, in particular by applying Wirtinger gradient descent. In Section 3.4.5, we outline the initial estimate used for this gradient descent and fully lay out the algorithm that we will apply in our numerical simulations. In Section 3.4.6, we look at the recovery guarantees that currently exist for this approach. Finally, in Section 3.4.7, we consider the key conditions used to generate the main recovery theorem, and where further work could be done to generalize these conditions and, ultimately, allow more guarantees of recovery.

¹ Here ∗ should refer to ordinary convolution, but for the g that will be considered in this chapter, circulant convolution will be sufficient.

3.4.2 BLIND DECONVOLUTION MODEL
We now want to approach the blind ptychography problem, in which both the mask and the specimen are unknown. Using the lemmas in the previous section, we can see that this reduces to solving a blind deconvolution problem.

Definition 3.4.1. We consider the blind deconvolution model

    y′ = f ∗ g + n,  y′, f, g, n ∈ C^d,

where y′ are the blind deconvolutional measurements, f is the unknown blurring function (which serves a role similar to that of our phase retrieval masks), n is the noise, and g is the signal (which serves a role similar to that of our phase retrieval object). Here ∗ denotes circular convolution.
We will base our work on the algorithm suggested in [71], along with the assumptions used there. In [71], the authors impose general conditions on f and g that are not restricted to any particular application but allow for flexibility. They also assume that f and g belong to known linear subspaces.
For the blurring function, it is assumed that f is either compactly supported, or that f decays sufficiently fast so that it can be well approximated by a compactly supported function. Therefore, we make the assumption that f ∈ C^d satisfies

    f := [ h ; 0_{d−K} ],

for some K ≪ d and h ∈ C^K. This again reinforces the notion that the blurring function is analogous to our masking function, since both are compactly supported.

Figure 3.2 [35] An example of image deblurring by solving a blind deconvolution problem.

For the signal, it is assumed that g belongs to a linear subspace spanned by the columns of a known matrix C, i.e., g = Cx̄ for some matrix C ∈ C^{d×N}, N ≪ d. This will lead to an additional restriction that we have to place on our blind ptychography, but one for which there are real-world applications where this assumption makes reasonable sense. In [71], the authors take C to be a Gaussian random matrix for the theoretical guarantees, although they demonstrate in numerical simulations that this assumption is not necessary to obtain good results. In particular, they found good results when C represents a wavelet subspace (suitable for images) or when C is a Hadamard-type matrix (suitable for communications).
We assume the noise is complex Gaussian, i.e., n ∼ N(0, (σ²L₀²/2) I_d) + iN(0, (σ²L₀²/2) I_d) is a complex Gaussian noise vector, where L₀ = ‖h₀‖·‖x₀‖ and h₀, x₀ are the true blurring function and signal. Here σ^{−2} represents the SNR.
The goal is to convert the problem into one which can be solved algorithmically via gradient descent.

Proposition 3.4.1 ([71]). Let F_d ∈ C^{d×d} be the DFT matrix, and let B ∈ C^{d×K} denote the first K columns of F_d. Then we have that

    y = Bh ∘ Ax + e,    (3.17)

where y = (1/√d) ŷ′, Ā = F_d C ∈ C^{d×N}, and e = (1/√d) F_d n represents the noise.

Proof. By the unitarity of (1/√d)F_d, we have B*B = I_K. Applying F_d to both sides of the convolution and using Theorem 3.3.1, we have that

    F_d y′ = (F_d f) ∘ (F_d g) + F_d n.

Additionally, we have that

    F_d f = [B  M] [ h ; 0_{d−K} ] = Bh,

and we let Ā = F_d C ∈ C^{d×N}. Since C is Gaussian, Ā = F_d C is also Gaussian; in particular, Ā_{ij} ∼ N(0, 1/2) + iN(0, 1/2). Thus, dividing by √d, the problem converts to

    (1/√d) ŷ′ = Bh ∘ Ax + e,

where e = (1/√d) F_d n ∼ N(0, (σ²L₀²/(2d)) I_d) + iN(0, (σ²L₀²/(2d)) I_d) serves as complex Gaussian noise. Hence, by letting y = (1/√d) ŷ′, we arrive at

    y = Bh ∘ Ax + e.  □

We have thus transformed the original blind deconvolution model into a Hadamard product. This form of the problem is used in the rest of the section, where y ∈ C^d, B ∈ C^{d×K}, and A ∈ C^{d×N} are given. Our goal is to recover h₀ and x₀. There are inherent ambiguities to the problem, however: if (h₀, x₀) is a solution to the blind deconvolution problem, then so is (αh₀, α^{−1}x₀) for any non-zero constant α. For most real-world applications this is not an issue. Thus, for uniformity, it is assumed that ‖h₀‖ = ‖x₀‖ = √L₀.
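The reformulation in Proposition 3.4.1 can be verified directly. In the sketch below we simply take A := F_d C / √d, so that the identity holds exactly as written; the text's conjugation convention (Ā = F_d C) does not affect the structure of the identity.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, N = 64, 8, 4
h = rng.standard_normal(K) + 1j * rng.standard_normal(K)
f = np.concatenate([h, np.zeros(d - K)])        # f = [h; 0_{d-K}]
C = rng.standard_normal((d, N))
xp = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = C @ xp                                      # g lies in a known subspace

Fd = np.fft.fft(np.eye(d))                      # unnormalized DFT matrix
B = Fd[:, :K]                                   # first K columns: F_d f = B h
A = (Fd @ C) / np.sqrt(d)                       # scaled so entries ~ CN(0, 1)

y_conv = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))   # y' = f * g (noiseless)
y = np.fft.fft(y_conv) / np.sqrt(d)                    # y = (1/sqrt(d)) F_d y'
assert np.allclose(y, (B @ h) * (A @ xp))              # y = Bh o Ax
```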
Definition 3.4.2. We define the matrix-valued linear operator 𝒜 : C^{K×N} → C^d by

    𝒜(Z) := { b_ℓ* Z a_ℓ }_{ℓ=1}^d,

where b_ℓ denotes the ℓ-th column of B* and a_ℓ is the ℓ-th column of A*. We also define the corresponding adjoint operator 𝒜* : C^d → C^{K×N}, given by

    𝒜*(z) := Σ_{ℓ=1}^d z_ℓ b_ℓ a_ℓ*.

We see that this translates to a lifting problem, where

    Σ_{ℓ=1}^d b_ℓ b_ℓ* = B*B = I_K,  E(a_ℓ a_ℓ*) = I_N,  ‖b_ℓ‖² = K/d,  ∀ℓ ∈ [d].

Lemma 3.4.1. Let y be defined as in Proposition 3.4.1. Then

    y = 𝒜(h₀x₀*) + e.    (3.18)

This model, equivalent to Proposition 3.4.1, is the one we will work with for the rest of the chapter. We aim to recover (h₀, x₀) by solving the minimization problem

    min_{(h,x)} F(h, x),  F(h, x) := ‖𝒜(hx*) − y‖² = ‖𝒜(hx* − h₀x₀*) − e‖².

We also define

    F₀(h, x) := ‖𝒜(hx* − h₀x₀*)‖²,  δ = δ(h, x) := ‖hx* − h₀x₀*‖_F / L₀.

F(h, x) is highly non-convex, and thus attempts at direct minimization, such as alternating minimization or plain gradient descent, can easily get trapped in local minima.
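A concrete model of Definition 3.4.2 follows. The conjugate on Ax in the rank-one check reflects folding the conjugation convention into the Hadamard form of (3.17); this is a sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, N = 32, 6, 4
B = np.fft.fft(np.eye(d))[:, :K] / np.sqrt(d)   # normalized so B* B = I_K
A = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)

def calA(Z):
    # A(Z)_l = b_l^* Z a_l (b_l^* = l-th row of B, a_l^* = l-th row of A)
    return np.einsum('lk,kn,ln->l', B, Z, np.conj(A))

def calA_star(z):
    # A*(z) = sum_l z_l b_l a_l^*
    return np.einsum('l,lk,ln->kn', z, np.conj(B), A)

h = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
assert np.allclose(B.conj().T @ B, np.eye(K))
# rank-one lifting: A(h x*) = (Bh) o conj(Ax), the Hadamard form of (3.17)
assert np.allclose(calA(np.outer(h, np.conj(x))), (B @ h) * np.conj(A @ x))
# adjoint identity: <A(Z), z> = <Z, A*(z)>
Z = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
z = rng.standard_normal(d) + 1j * rng.standard_normal(d)
assert np.allclose(np.vdot(z, calA(Z)), np.vdot(calA_star(z), Z))
```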
3.4.2.1 MAIN THEOREMS

Theorem 3.4.1 (Existence of a Unique Solution) ([2], Theorem 1). Fix α ≥ 1. Then there exists a constant C_α = O(α) such that if

    max(K · μ²_max, N · μ²_h) ≤ d / (C_α log³ d),

then X₀ = h₀x₀* is the unique solution to our minimization problem with probability 1 − O(d^{−α+1}); thus we can separate y = f ∗ g up to a scalar multiple. When the coherence is low, this is tight within a logarithmic factor, as we always have max(K, N) ≤ d.

Theorem 3.4.2 (Stability Under Noise) ([2], Theorem 2). Let X₀ = h₀x₀* and suppose the condition of the previous theorem holds. We observe y = 𝒜(X₀) + e, where e ∈ R^d is an unknown noise vector with ‖e‖₂ ≤ δ, and estimate X₀ by solving

    min ‖X‖_*  subject to  ‖y − 𝒜(X)‖₂ ≤ δ.

Let λ_min and λ_max be the smallest and largest non-zero eigenvalues of 𝒜𝒜*. Then, with probability 1 − d^{−α+1}, the solution X will obey

    ‖X − X₀‖_F ≤ C (λ_max/λ_min) sqrt(min(K, N)) δ,

for a fixed constant C.

3.4.3 WIRTINGER GRADIENT DESCENT
In [71], the approach is to solve the minimization problem using Wirtinger gradient descent. In this subsection, the algorithm is introduced, as well as the main theorems which establish convergence of the proposed algorithm to the true solution. The algorithm consists of two parts: first an initial guess, and second a variation of gradient descent, starting at the initial guess, that converges to the true solution. Theoretical results are established for avoiding getting stuck in local minima. This is ensured by determining that the iterates remain inside a properly chosen basin of attraction of the true solution.

3.4.4 BASIN OF ATTRACTION
Proposition 3.4.2 (Basin of Attraction) (Section 3.1, [71]). Three neighbourhoods are introduced whose intersection will form the basin of attraction of the solution:
(i) Non-uniqueness: Due to the scale ambiguity, for numerical stability we introduce the following neighbourhood:

    N_{L₀} := {(h, x) | ‖h‖ ≤ 2√L₀, ‖x‖ ≤ 2√L₀},  L₀ = ‖h₀‖·‖x₀‖.

(ii) Incoherence: The number of measurements required for solving the blind deconvolution problem depends on how much h₀ is correlated with the rows of the matrix B; we hope for this correlation to be small. We define the incoherence between the rows of B and h₀ via

    μ²_h = d ‖Bh₀‖²_∞ / ‖h₀‖².

To ensure that the incoherence of the solution is under control, we introduce the neighborhood

    N_μ := {h | √d ‖Bh‖_∞ ≤ 4√L₀ μ},  μ_h ≤ μ.    (3.19)

(iii) Initial guess: A carefully chosen initial guess is required due to the non-convexity of the function we wish to minimize. The distance to the true solution is controlled via the following neighborhood:

    N_ε := {(h, x) | ‖hx* − h₀x₀*‖_F ≤ ε L₀},  0 < ε ≤ 1/15.    (3.20)

Thus the basin of attraction is chosen as N_{L₀} ∩ N_μ ∩ N_ε, where the true solution lies.

Figure 3.3 Basin of attraction: N_{L₀} ∩ N_μ ∩ N_ε.

Our approach consists of two parts: we first construct an initial guess that is inside the basin of attraction N_{L₀} ∩ N_μ ∩ N_ε. We then apply a regularized Wirtinger gradient descent algorithm that ensures all the iterates remain inside N_{L₀} ∩ N_μ ∩ N_ε. To achieve this, we add a regularizing function G(h, x) to the objective function F(h, x) to enforce that the iterates remain inside N_{L₀} ∩ N_μ. Hence, in order to solve the blind deconvolution problem, we aim to minimize the following regularized objective function:

    F̃(h, x) := F(h, x) + G(h, x),

where F(h, x) is defined as before and G(h, x) is the penalty function, of the form

    G(h, x) := ρ [ G₀(‖h‖²/(2L)) + G₀(‖x‖²/(2L)) + Σ_{ℓ=1}^d G₀( d|b_ℓ* h|² / (8Lμ²) ) ],

where G₀(z) := max{z − 1, 0}² and ρ ≥ L² + 2‖e‖². It is assumed that (9/10)L₀ ≤ L ≤ (11/10)L₀ and μ ≥ μ_h.

Remark 3.4.1. The matrix 𝒜*(e) = Σ_{k=1}^d e_k b_k a_k*, as a sum of d rank-1 random matrices, has nice concentration of measure properties. Asymptotically, ‖𝒜*(e)‖ converges to 0 at rate O(d^{−1/2}). Note that

    F(h, x) = ‖e‖² + ‖𝒜(hx* − h₀x₀*)‖²_F − 2Re(⟨𝒜*(e), hx* − h₀x₀*⟩).

If one lets d → ∞, then ‖e‖² ∼ (σ²L₀²/(2d)) χ²_{2d} converges almost surely to σ²L₀² (by the law of large numbers), and the cross term Re(⟨hx* − h₀x₀*, 𝒜*(e)⟩) converges to 0. In other words, asymptotically,

    lim_{d→∞} F(h, x) = F₀(h, x) + σ²L₀²,

for all fixed (h, x). This implies that if the number of measurements is large, then F(h, x) behaves "almost like" F₀(h, x) = ‖𝒜(hx* − h₀x₀*)‖², the noiseless version of F(h, x). So, for large d, we may effectively ignore the noise.

Theorem 3.4.3. For any given Z ∈ C^{K×N}, we have that E(𝒜*(𝒜(Z))) = Z.
Proof. By linearity, and using that Σ_{ℓ=1}^d b_ℓ b_ℓ* = I_K and E(a_ℓ a_ℓ*) = I_N, we have that

    E(𝒜*(𝒜(Z))) = E( Σ_{ℓ=1}^d (b_ℓ* Z a_ℓ) b_ℓ a_ℓ* ) = Σ_{ℓ=1}^d b_ℓ b_ℓ* Z E(a_ℓ a_ℓ*) = Σ_{ℓ=1}^d b_ℓ b_ℓ* Z = Z.  □

Thus we have that

    E(𝒜*(y)) = E(𝒜*(𝒜(h₀x₀*) + e)) = E(𝒜*(𝒜(h₀x₀*))) + E(𝒜*(e)) = h₀x₀*,

since E(𝒜*(e)) = 0 by the definition of e. Hence it makes sense that the leading singular value and singular vectors of 𝒜*(y) would be good approximations of L₀ and (h₀, x₀), respectively.

3.4.5 ALGORITHMS
We can now state the algorithm for generating an initial estimate.

Algorithm 3.2 Blind Deconvolution Initial Estimate
Input: Blind deconvolutional measurements y, K = |supp(f)|.
Output: Initial estimates of the underlying signal and blurring function.
1) Compute 𝒜*(y) and find the leading singular value, left and right singular vectors of 𝒜*(y), denoted by L, h̃₀, and x̃₀, respectively.
2) Solve the following optimization problem:

    u₀ := argmin_z ‖z − √L h̃₀‖₂,  subject to  √d ‖Bz‖_∞ ≤ 2√L μ,

and set v₀ = √L x̃₀.

Since we are dealing with complex variables, Wirtinger derivatives are utilized for the gradient descent. Since F̃ is a real-valued function, we only need to consider the derivatives of F̃ with respect to h̄ and x̄, and the corresponding updates of h and x, since

    ∂F̃/∂h̄ = \overline{∂F̃/∂h},  ∂F̃/∂x̄ = \overline{∂F̃/∂x}.

In particular, we denote

    ∇F̃_h := ∂F̃/∂h̄,  ∇F̃_x := ∂F̃/∂x̄.    (3.21)

We can now state the full algorithm.

Algorithm 3.3 Wirtinger Gradient Descent Blind Deconvolution Algorithm
Input: Blind deconvolutional measurements y, K = |supp(f)|.
Output: Estimates of the underlying signal and blurring function.
1) Compute 𝒜*(y) and find the leading singular value, left and right singular vectors of 𝒜*(y), denoted by L, h̃₀, and x̃₀, respectively.
2) Solve the following optimization problem:

    u₀ := argmin_z ‖z − √L h̃₀‖₂,  subject to  √d ‖Bz‖_∞ ≤ 2√L μ,

and set v₀ = √L x̃₀.
3) Compute Wirtinger gradient descent:
while halting criterion false do
  u_t = u_{t−1} − η ∇F̃_h(u_{t−1}, v_{t−1})
  v_t = v_{t−1} − η ∇F̃_x(u_{t−1}, v_{t−1})
end while
4) Set (h, x) = (u_t, v_t).

In [71], the authors show that, with a carefully chosen initial guess (u₀, v₀), running Wirtinger gradient descent to minimize F̃(h, x) guarantees linear convergence of the sequence (u_t, v_t) to the global minimum (h₀, x₀) in the noiseless case, and also provides robust recovery in the presence of noise. The results are summarized in the following two theorems.

3.4.6 MAIN THEOREMS

Theorem 3.4.4 (Main Theorem 1) ([71], Theorem 3.1). The initialization obtained via Algorithm 3.2 satisfies

    (u₀, v₀) ∈ (1/√3) N_{L₀} ∩ (1/√3) N_μ ∩ N_{(2/5)ε},  and  (9/10)L₀ ≤ L ≤ (11/10)L₀,

with probability at least 1 − d^{−γ} if the number of measurements is sufficiently large, that is,

    d ≥ C_γ (μ²_h + σ²) max{K, N} log² d / ε²,

where ε is a predetermined constant in (0, 1/15], and C_γ is a constant depending only linearly on γ, with γ ≥ 1.
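To illustrate the local convergence behavior that these theorems formalize, the sketch below runs plain (unregularized) Wirtinger gradient descent on the noiseless model, initialized inside the basin of attraction by perturbing the truth; the regularizer G and the spectral initialization of Algorithm 3.2 are omitted for brevity, and the stepsize is an ad hoc choice for this problem size.

```python
import numpy as np

rng = np.random.default_rng(5)
d, K, N = 256, 10, 10
B = np.fft.fft(np.eye(d))[:, :K] / np.sqrt(d)
A = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
h0 = rng.standard_normal(K) + 1j * rng.standard_normal(K); h0 /= np.linalg.norm(h0)
x0 = rng.standard_normal(N) + 1j * rng.standard_normal(N); x0 /= np.linalg.norm(x0)
y = (B @ h0) * np.conj(A @ x0)              # noiseless y = A(h0 x0*)

# initialize inside the basin of attraction: a small perturbation of the truth
h = h0 + 0.1 * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
x = x0 + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

def lifted_err(h, x):   # scale-invariant error via the lifted rank-one matrices
    return np.linalg.norm(np.outer(h, np.conj(x)) - np.outer(h0, np.conj(x0)))

eta, e0 = 0.02, lifted_err(h, x)
for t in range(3000):
    r = (B @ h) * np.conj(A @ x) - y                  # residual
    grad_h = B.conj().T @ (r * (A @ x))               # Wirtinger gradient in h-bar
    grad_x = A.conj().T @ (np.conj(r) * (B @ h))      # Wirtinger gradient in x-bar
    h, x = h - eta * grad_h, x - eta * grad_x
assert lifted_err(h, x) < 1e-6 * max(e0, 1)           # geometric convergence
```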
The following theorem establishes that, as long as the initial guess lies inside the basin of attraction of the true solution, regularized gradient descent will converge to this solution (or to a nearby solution in the case of noisy data).

Theorem 3.4.5 (Main Theorem 2) ([71], Theorem 3.2). Assume that the initialization (u₀, v₀) ∈ (1/√3)N_{L₀} ∩ (1/√3)N_μ ∩ N_{(2/5)ε}, and that d ≥ C_γ(μ² + σ²) max{K, N} log² d / ε². Then Algorithm 3.3 creates a sequence (u_t, v_t) ∈ N_{L₀} ∩ N_μ ∩ N_ε which converges geometrically to (h₀, x₀), in the sense that, with probability at least 1 − 4d^{−γ} − (1/γ)e^{−(K+N)}, we have

    max{ sin∠(u_t, h₀), sin∠(v_t, x₀) } ≤ (1/L_t) ( (2/3)(1 − ηω)^{t/2} ε L₀ + 50‖𝒜*(e)‖ ),

and

    |L_t − L₀| ≤ (2/3)(1 − ηω)^{t/2} ε L₀ + 50‖𝒜*(e)‖,

where L_t := ‖u_t‖·‖v_t‖, ω > 0, and η is the fixed stepsize. Here,

    ‖𝒜*(e)‖ ≤ C₀ σ L₀ max{ sqrt( (γ+1) max{K, N} log d / d ), ( (γ+1) sqrt(KN) log² d ) / d }

holds with probability 1 − d^{−γ}.

It has thus been shown that, with high probability, as long as the initial guess lies inside the basin of attraction of the true solution, Wirtinger gradient descent will converge towards the solution.

3.4.7 KEY CONDITIONS
Theorem 3.4.6 (Four Key Conditions).
(i) (Local RIP Condition) ([71], Condition 5.1) The following local Restricted Isometry Property (RIP) for 𝒜 holds uniformly for all (h, x) in the basin of attraction N_{L₀} ∩ N_μ ∩ N_ε:

    (3/4) ‖hx* − h₀x₀*‖²_F ≤ ‖𝒜(hx* − h₀x₀*)‖² ≤ (5/4) ‖hx* − h₀x₀*‖²_F.

(ii) (Robustness Condition) ([71], Condition 5.2) For the complex Gaussian noise e, with high probability,

    ‖𝒜*(e)‖ ≤ ε L₀ / (10√2),

for d sufficiently large, that is, d ≥ C_γ (σ²/ε² + σ/ε) max{K, N} log d.
(iii) (Local Regularity Condition) ([71], Condition 5.3) There exists a regularity constant ω = L₀/5000 > 0 such that

    ‖∇F̃(h, x)‖² ≥ ω [F̃(h, x) − c]_+,  c = ‖e‖² + 1700‖𝒜*(e)‖²,

for all (h, x) ∈ N_{L₀} ∩ N_μ ∩ N_ε.
(iv) (Local Smoothness Condition) ([71], Condition 5.4) Denote z := (h, x). There exists a constant C_d such that

    ‖∇f(z + tΔz) − ∇f(z)‖ ≤ C_d t ‖Δz‖,  0 ≤ t ≤ 1,

for all {(z, Δz) | z + tΔz ∈ N_ε ∩ N_{F̃}, ∀ 0 ≤ t ≤ 1}, i.e., the whole segment connecting z and z + Δz belongs to the non-convex set N_ε ∩ N_{F̃}.

3.5 BLIND PTYCHOGRAPHY
3.5.1 INTRODUCTION
A more recent area of study is blind ptychography, in which both the object and the mask are considered unknown, up to reasonable assumptions. The first successful recovery was given in [111, 110], with further study of the sufficient overlap in [10, 77, 76]; the area is summarized in [30].
Let x, m ∈ C^d denote the unknown sample and mask, respectively. We suppose that we have d² noisy ptychographic measurements of the form

    Y_{ℓ,k} = |(F(x ∘ S_k m))_ℓ|² + N_{ℓ,k},  (ℓ,k) ∈ [d]_0 × [d]_0,    (3.22)

where S_k, ∘, and F denote the kth circular shift, the Hadamard product, and the d-dimensional discrete Fourier transform, respectively, and N is the matrix of additive noise.
By Theorem 3.3.2, we can rewrite the measurements as

(Y^T F^T)_k = d · (x ∘ S_k x̄) ∗ (m̃ ∘ S_{−k} m̃̄) + (N^T F^T)_k, (3.23)

where the subscript k denotes the k-th column, ∗ denotes the d-dimensional discrete circular convolution, and m̃ denotes the reversal of m about its first entry. This is now a scaled blind deconvolution problem, which has been studied in [2], [71].

3.5.2 MAIN RESULTS

3.5.2.1 RECOVERING THE SAMPLE

To recover the sample, we will need to assume that x belongs to a known subspace. Initially we solve algorithmically for the zero-shift case (k = 0), and then generalize the method to an estimate which utilizes all of the obtained shifts. Our assumptions are as follows:
• x ∈ C^d unknown, with x = Cx′, C ∈ C^{d×N}, N ≪ d, known, and x′ ∈ C^N or R^N unknown;
• m ∈ C^d unknown, with supp(m) ⊆ [δ]₀, δ (= K) known, and ‖m‖₂ known;
• known noisy measurements Y.

Our first goal is to compute an estimate x_est of x, true up to a global phase. We will use this estimate to then produce an estimate m_est of m, again true up to a global phase.

Firstly, we let y be the first column of (1/√d) · F((FY)^T), and set f = m̃ ∘ m̃̄ (so that ‖f‖₂ is known). We next set g = x ∘ x̄, but to fully utilize the blind deconvolution algorithm we will need a lemma concerning Hadamard products of products of matrices. First, we define some products between matrices.

Definition 3.5.1. Let A = (A_{i,j}) ∈ C^{m×n} and B = (B_{i,j}) ∈ C^{p×q}. Then the Kronecker product A ⊗ B ∈ C^{mp×nq} is defined by

(A ⊗ B)_{p(i−1)+k, q(j−1)+ℓ} = A_{i,j} B_{k,ℓ}.

Definition 3.5.2. Let A ∈ C^{m×n} and B ∈ C^{p×n} have columns aᵢ, bᵢ for i ∈ [n]₀. Then the Khatri–Rao product A • B ∈ C^{mp×n} is defined by

A • B = [a₀ ⊗ b₀  a₁ ⊗ b₁  …  a_{n−1} ⊗ b_{n−1}]. (3.24)

Definition 3.5.3. Let A ∈ C^{m×n} and B ∈ C^{m×p} be matrices with rows aᵢ, bᵢ for i ∈ [m]₀. Then the transposed Khatri–Rao product (or face-splitting product), denoted ⊙, is the matrix whose rows are Kronecker products of the rows of A and B, i.e., the rows of A ⊙ B ∈ C^{m×np} are given by

(A ⊙ B)ᵢ = aᵢ ⊗ bᵢ, i ∈ [m]₀.

We then utilize the following lemma concerning these products.

Lemma 3.5.1 (Theorem 1, [101]). Let A ∈ C^{m×n}, B ∈ C^{n×p}, C ∈ C^{m×q}, D ∈ C^{q×p}. Then we have that

(AB) ∘ (CD) = (A ⊙ C)(B • D),

where ∘ is the Hadamard product, • is the Khatri–Rao product, and ⊙ is the transposed Khatri–Rao (face-splitting) product.

Thus, by Lemma 3.5.1 we have that g = x ∘ x̄ = (Cx′) ∘ (C̄x̄′) = C′x″, where C′ ∈ C^{d×N²} and x″ ∈ C^{N²} are given by

C′ = C ⊙ C̄, x″ = x′ • x̄′.

We now compute RRR blind deconvolution (Algorithm 3.3) with y, f, g, C′, and K = δ as above (with B taken to be the last K columns of the DFT matrix) to obtain an estimate of x′ • x̄′, use angular synchronization to solve for x′, and thus solve for x. A numerical check of the underlying product identity is given below, and the full zero-shift procedure is formalized in Algorithm 3.4.
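Lemma 3.5.1 is easy to verify numerically. The following Python/NumPy sketch implements the Khatri–Rao and face-splitting products of Definitions 3.5.2 and 3.5.3 (the helper function names are ours) and checks the identity on random matrices.

```python
import numpy as np

def khatri_rao(B, D):
    # Column-wise Kronecker product B . D: (n x p), (q x p) -> (nq x p).
    n, p = B.shape
    q, _ = D.shape
    return (B[:, None, :] * D[None, :, :]).reshape(n * q, p)

def face_split(A, C):
    # Row-wise Kronecker (transposed Khatri-Rao) product: (m x n), (m x q) -> (m x nq).
    m, n = A.shape
    _, q = C.shape
    return (A[:, :, None] * C[:, None, :]).reshape(m, n * q)

rng = np.random.default_rng(2)
m, n, p, q = 5, 4, 3, 6
A = rng.standard_normal((m, n)); B = rng.standard_normal((n, p))
C = rng.standard_normal((m, q)); D = rng.standard_normal((q, p))

lhs = (A @ B) * (C @ D)                     # (AB) o (CD)
rhs = face_split(A, C) @ khatri_rao(B, D)   # (A face-split C)(B Khatri-Rao D)
assert np.allclose(lhs, rhs)

# In the blind-ptychography setting, with B = x' and D = conj(x') as columns:
# g = (C x') o (conj(C) conj(x')) = face_split(C, conj(C)) @ khatri_rao(x', conj(x')).
```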
Algorithm 3.4 Blind Ptychography (Zero Shift)
Input:
1) x ∈ C^d unknown, x = Cx′, C ∈ C^{d×N}, N ≪ d known, x′ ∈ C^N or R^N unknown.
2) m ∈ C^d unknown, supp(m) ⊆ [δ]₀, δ known, ‖m‖₂ known.
3) Known noisy measurements Y.
Output: Estimate x_est of x, true up to a global phase.
1) Let y be the first column of (1/√d) · F((FY)^T), and f = m̃ ∘ m̃̄ (so ‖f‖₂ known).
2) Let g = x ∘ x̄ = (Cx′) ∘ (C̄x̄′). Then g = C′x″, where C′ ∈ C^{d×N²}, x″ ∈ C^{N²} are given by C′ = C ⊙ C̄, x″ = x′ • x̄′.
3) Compute RRR blind deconvolution (Algorithms 1 & 2, [71]) with y, f, g, C′, K = δ as above (B the last K columns of the DFT matrix) to obtain an estimate of x′ • x̄′.
4) Use angular synchronization to solve for x′, and thus compute x_est.

3.5.2.2 RECOVERING THE MASK

Once the estimate of x has been found, denoted x_est, we use this estimate to find m_est. We first compute g_est = x_est ∘ x̄_est, and then use point-wise division to find

F(m̃ ∘ m̃̄) = F⁻¹((FY)^T) / F(x_est ∘ x̄_est). (3.25)

We then use an inverse Fourier transform, a reversal, and angular synchronization, similar to the procedure used to obtain x_est.

Algorithm 3.5 Recovering The Mask
Input:
1) x_est generated by Algorithm 3.4.
2) Known noisy measurements Y.
3) supp(m) ⊆ [δ]₀, δ known, ‖m‖₂ known.
Output: Estimate m_est of m, true up to a global phase.
1) Compute g_est = x_est ∘ x̄_est and perform 2δ − 1 point-wise divisions to obtain

F(m̃_est ∘ S_{−k} m̃̄_est) = [F⁻¹((FY)^T)]_k / F(x_est ∘ S_k x̄_est). (3.26)

2) Compute inverse Fourier transforms to obtain m̃_est ∘ S_{−k} m̃̄_est, and use these to form the diagonals of a banded matrix.
3) Use angular synchronization to solve for m̃_est, and then perform a reversal to compute m_est.
4) Let α = ‖m_est‖₂ / ‖m‖₂. Finally, let x_est = α x_est, m_est = α⁻¹ m_est.

3.5.2.3 MULTIPLE SHIFTS

To generalize the setup, we let y^{(k)} denote the k-th column of (1/√d) · F((FY)^T), and set f^{(k)} = m̃ ∘ S_{−k} m̃̄. Let g^{(k)} = x ∘ S_k x̄ = (Cx′) ∘ (S_k C̄ x̄′). Then, by another application of Lemma 3.5.1, g^{(k)} = C′^{(k)} x″, where C′^{(k)} ∈ C^{d×N²} and x″ ∈ C^{N²} are given by

C′^{(k)} = C ⊙ S_k C̄, for 0 ≤ k ≤ δ − 1 and d − δ + 1 ≤ k ≤ d − 1, x″ = x′ • x̄′ = vec(x′(x′)*).

We then perform 2δ − 1 blind deconvolutions to obtain 2δ − 1 estimates each of x and m, labelled x_est^i and m_est^j respectively for i, j ∈ [2δ − 1]₀. Ideally, we would select the estimates which generate the minimum error for each of x and m, but that would require prior knowledge of x and m. Instead, we compute (2δ − 1)² estimates of the Fourier measurements via

(Y_est^{i,j})_{ℓ,k} = |(F(x_est^i ∘ S_k m_est^j))_ℓ|², i, j ∈ [2δ − 1]₀, (3.27)

and then compute the index pair minimizing the associated relative error,

(i′, j′) = argmin_{(i,j)} ‖Y_est^{i,j} − Y‖_F² / ‖Y‖_F², i, j ∈ [2δ − 1]₀. (3.28)

Then we let x_est = x_est^{i′} and m_est = m_est^{j′}, as sketched below; the full procedure is formalized in Algorithm 3.6.
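The selection step (3.27)–(3.28) admits a direct implementation. A Python/NumPy sketch follows (the function name is ours, and it assumes the same DFT and shift conventions used for (3.22) above).

```python
import numpy as np

def argmin_shift(x_cands, m_cands, Y):
    # Given the 2*delta - 1 candidate estimates of x and of m, choose the pair
    # (i', j') whose synthesized measurements (3.27) best match Y, per (3.28).
    d = Y.shape[0]
    best_pair, best_err = None, np.inf
    for i, xc in enumerate(x_cands):
        for j, mc in enumerate(m_cands):
            Yij = np.stack([np.abs(np.fft.fft(xc * np.roll(mc, k))) ** 2
                            for k in range(d)], axis=1)
            err = np.linalg.norm(Yij - Y) ** 2 / np.linalg.norm(Y) ** 2
            if err < best_err:
                best_pair, best_err = (i, j), err
    return best_pair, best_err
```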
Algorithm 3.6 Blind Ptychography (Multiple Shifts)
Input:
1) x ∈ C^d unknown, x = Cx′, C ∈ C^{d×N}, N ≪ d known, x′ ∈ C^N or R^N unknown.
2) m ∈ C^d unknown, supp(m) ⊆ [δ]₀, δ known, ‖m‖₂ known.
3) Known noisy measurements Y.
Output: Estimates x_est of x and m_est of m, true up to a global phase.
1) Let y^{(k)} denote the k-th column of (1/√d) · F((FY)^T), and f^{(k)} = m̃ ∘ S_{−k} m̃̄ (so ‖f^{(k)}‖₂ known).
2) Let g^{(k)} = x ∘ S_k x̄ = (Cx′) ∘ (S_k C̄ x̄′). Then g^{(k)} = C′^{(k)} x″, where C′^{(k)} ∈ C^{d×N²}, x″ ∈ C^{N²} are given by C′^{(k)} = C ⊙ S_k C̄ (for the 2δ − 1 relevant shifts k), x″ = x′ • x̄′.
3) Perform 2δ − 1 RRR blind deconvolutions (Algorithms 1 & 2, [71]) with y^{(k)}, f^{(k)}, g^{(k)}, C′^{(k)} as above to obtain 2δ − 1 estimates of x′ • x̄′.
4) Use angular synchronization to solve for 2δ − 1 estimates x′_est, and thus for the 2δ − 1 estimates x_est^i = Cx′_est, i ∈ [2δ − 1]₀.
5) Use these estimates x_est^i to compute 2δ − 1 estimates m_est^j, j ∈ [2δ − 1]₀.
6) Let α_i = ‖m_est^i‖₂ / ‖m‖₂, and for i ∈ [2δ − 1]₀ let x_est^i = α_i x_est^i, m_est^i = (1/α_i) m_est^i.
7) Compute the (2δ − 1)² estimates of the Fourier measurements given by (3.27), and the minimizing index pair (i′, j′) given by (3.28).
8) Let x_est = x_est^{i′}, m_est = m_est^{j′}.

3.5.3 NUMERICAL SIMULATIONS

All simulations were performed using MATLAB R2021b on an Intel desktop with a 2.60GHz i7-10750H CPU and 16GB DDR4 2933MHz memory. All code used to generate the figures below is publicly available at https://github.com/MarkPhilipRoach/BlindPtychography.

To be more precise, we have defined the immeasurable (in practice, since x and m are both unknown) estimates

Max Shift(x) = argmax_{x_est^i} ‖x − x_est^i‖₂², Max Shift(m) = argmax_{m_est^j} ‖m − m_est^j‖₂², i, j ∈ [2δ − 1]₀,

Min Shift(x) = argmin_{x_est^i} ‖x − x_est^i‖₂², Min Shift(m) = argmin_{m_est^j} ‖m − m_est^j‖₂², i, j ∈ [2δ − 1]₀,

and the measurable estimates. First, No Shift(x) and No Shift(m) refer to the zero-shift estimates outlined in Algorithm 3.4. Secondly, we have the estimates achieved by Algorithm 3.6,

(Argmin Shift(x), Argmin Shift(m)) = (x_est^{i′}, m_est^{j′}), (i′, j′) = argmin_{(i,j)} ‖Y_est^{i,j} − Y‖_F² / ‖Y‖_F², i, j ∈ [2δ − 1]₀.

Figure 3.4: d = 2⁶, K = δ = log₂ d, N = 4, C complex Gaussian. Max Shift refers to the maximum error achieved from a blind deconvolution of a particular shift; Min Shift refers to the minimum such error. Argmin Shift refers to the choice of object and mask made in Step 8 of Algorithm 3.6. Averaged over 100 simulations, 1000 iterations each.

Figure 3.4 demonstrates robust recovery under noise. It also demonstrates the impact of performing the 2δ − 1 blind deconvolutions and taking the Argmin Shift, versus simply taking the non-shifted object and mask.
It also demonstrates how close the reconstruction errors from Argmin Shift and Min Shift are, in particular for the mask. Figure 3.5 demonstrates the impact even more clearly, showing that with a larger known subspace dimension, Argmin Shift and Min Shift become more accurate, and that the gap between Max Shift and Min Shift is large.

Figure 3.5: d = 2⁶, K = δ = log₂ d, N = 6, C complex Gaussian. Max Shift refers to the maximum error achieved from a blind deconvolution of a particular shift; Min Shift refers to the minimum such error. Argmin Shift refers to the choice of object and mask made in Step 8 of Algorithm 3.6. Averaged over 100 simulations, 1000 iterations each.

The following figures demonstrate recovery against additional noise, with varying δ and N.

Figure 3.6: d = 2⁶, N = 4, C complex Gaussian. Application of Algorithm 3.6 with varying K = δ.

Figure 3.7: d = 2⁶, K = δ = 6, C complex Gaussian. Application of Algorithm 3.6 with varying N.

Next, we consider the frequency of the index chosen by the argmin step (Step 7 of Algorithm 3.6), compared to the true minimizing indices for the object and the mask separately. Firstly, we have the frequency of the argmin indices.

Figure 3.8: d = 2⁶, δ = 6, N = 4, C complex Gaussian. 1000 simulations. Frequency of each index being chosen to compute Argmin Shift(x) and Argmin Shift(m).

Secondly, we have the frequency of the min shift for both the object and the mask. Both Figure 3.8 and Figure 3.9 were computed on the same 1000 tests.

Figure 3.9: d = 2⁶, δ = 6, N = 4, C complex Gaussian. 1000 simulations. Frequency of each index being chosen to compute Min Shift(x) and Min Shift(m).

Finally, we plot these choices of indices for both the Argmin Shift and the Min Shift in a two-dimensional plot.

Figure 3.10: d = 2⁶, δ = 6, N = 4, C complex Gaussian. 1000 simulations. Frequency of index pairs being chosen to compute (Argmin Shift(x), Argmin Shift(m)) and (Min Shift(x), Min Shift(m)).

3.6 CONCLUSIONS AND FUTURE WORK

We have introduced an algorithm for recovering a specimen of interest from blind far-field ptychographic measurements. This algorithm relies on reformulating the measurements so that they resemble widely-studied blind deconvolution measurements. This leads to transposed Khatri–Rao product estimates of our specimen, which can then be recovered by angular synchronization. We then use these estimates, applying inverse Fourier transforms, point-wise division, and angular synchronization, to recover estimates of the mask. Finally, we use a best-error sorting step to find the final estimates of both the specimen and the mask.

As shown in the numerical results, Algorithm 3.6 recovers both the sample and the mask within a good margin of error, and it is stable under noise. A further goal for this research would be to adapt the existing recovery guarantee theorems for the selected blind deconvolution recovery algorithm to the case in which the assumed Gaussian matrix C is replaced with the Khatri–Rao-structured matrix C′^{(k)} = C ⊙ S_k C̄. In particular, this would mean providing alternate inequalities for the four key conditions laid out in Theorem 3.4.6.
CHAPTER 4
ON OUTER BI-LIPSCHITZ EXTENSIONS OF LINEAR JOHNSON-LINDENSTRAUSS EMBEDDINGS OF LOW-DIMENSIONAL SUBMANIFOLDS OF R^N

4.1 ABSTRACT

Let M be a compact d-dimensional submanifold of R^N with reach τ and volume V_M. Fix ε ∈ (0, 1). In this chapter, it is proven that a nonlinear function f : R^N → R^m exists with

m ≤ C (d/ε²) log( √d V_M^{1/d} / τ )

such that

(1 − ε)‖x − y‖₂ ≤ ‖f(x) − f(y)‖₂ ≤ (1 + ε)‖x − y‖₂ (4.1)

holds for all x ∈ M and y ∈ R^N. In effect, f not only serves as a bi-Lipschitz function from M into R^m with bi-Lipschitz constants close to one, but also approximately preserves all distances from points not in M to all points in M in its image. Furthermore, the proof is constructive and yields an algorithm which works well in practice. In particular, it is empirically demonstrated herein that such nonlinear functions allow for more accurate compressive nearest neighbor classification than standard linear Johnson–Lindenstrauss embeddings do in practice.

4.2 INTRODUCTION

The classical Kirszbraun theorem [63] ensures that a Lipschitz continuous function f : S → R^m from a subset S ⊂ R^N into R^m can always be extended to a function f̃ : R^N → R^m with the same Lipschitz constant as f. More recently, similar results have been proven for bi-Lipschitz functions f : S → R^m, from S ⊂ R^N into R^m, in the theoretical computer science literature. In particular, it was shown in [75] that outer extensions of such bi-Lipschitz functions f, f̃ : R^N → R^{m+1}, exist which both (i) approximately preserve f's bi-Lipschitz constants, and (ii) satisfy f̃(x) = (f(x), 0) for all x ∈ S. Narayanan and Nelson [81] then applied similar outer extension methods to a special class of the linear bi-Lipschitz maps guaranteed to exist for any given finite set S ⊂ R^N by the Johnson–Lindenstrauss (JL) lemma [60] in order to prove the following remarkable result: for each finite set S ⊂ R^N and ε ∈ (0, 1) there exists a terminal embedding of S, f : R^N → R^{O(log|S| / ε²)}, with the property that

(1 − ε)‖x − y‖₂ ≤ ‖f(x) − f(y)‖₂ ≤ (1 + ε)‖x − y‖₂ (4.2)

holds for all x ∈ S and all y ∈ R^N.

In this chapter, we generalize Narayanan and Nelson's theorem for finite sets to also hold for infinite subsets S ⊂ R^N, and then give a specialized variant for the case where the infinite subset S ⊂ R^N in question is a compact and smooth submanifold of R^N. As we shall see below, generalizing this result requires us both to alter the bi-Lipschitz extension methods of [75] and to replace the use of embedding techniques utilizing cardinality in [81] with different JL-type embedding methods involving alternate measures of set complexity which remain meaningful for infinite sets (i.e., the Gaussian width of the unit secants of the set S in question). In the special case where S is a submanifold of R^N, recent results bounding the Gaussian widths of the unit secants of such sets in terms of other fundamental geometric quantities (e.g., their reach, dimension, volume, etc.) [53] can then be brought to bear in order to produce terminal manifold embeddings of S into R^m satisfying (4.2) with m near-optimally small.
Note that a non-trivial terminal embedding, f, of S satisfying (4.2) for all x ∈ S and y ∈ R^N must be nonlinear. In contrast, prior work on bi-Lipschitz maps of submanifolds of R^N into lower-dimensional Euclidean space in the mathematical data science literature has all utilized linear maps (see, e.g., [5, 27, 53]). As a result, it is impossible for such previously considered linear maps to serve as terminal embeddings of submanifolds of R^N into lower-dimensional Euclidean space without substantial modification. Another way of viewing the work carried out herein is that it constructs outer bi-Lipschitz extensions of such prior linear JL embeddings of manifolds in a way that effectively preserves their near-optimal embedding dimension in the final resulting extension. Motivating applications of terminal embeddings of submanifolds of R^N related to compressive classification via manifold models [21] are discussed next.

4.2.1 UNIVERSALLY ACCURATE COMPRESSIVE CLASSIFICATION VIA NOISY MANIFOLD DATA

It is one of the sad facts of life that most everyone eventually comes to accept: everything living must eventually die, you can't always win, you aren't always right, and, worst of all to the most dedicated of data scientists, there is always noise contaminating your datasets. Nevertheless, there are mitigating circumstances and achievable victories implicit in every statement above. Most pertinently here, there are mountains of empirical evidence that noisy training data still permits accurate learning. In particular, when the noise level is not too large, the mere existence of a low-dimensional data model which only approximately fits your noisy training data can still allow for successful, e.g., nearest-neighbor classification using only a highly compressed version of your original training dataset (even when you know very little about the model specifics) [21]. Better quantifying these empirical observations in the context of low-dimensional manifold models is the primary motivation for our main result below.

For example, let M ⊂ R^N be a d-dimensional submanifold of R^N (our data model), fix δ ∈ R⁺ (our effective noise level), and choose

T ⊆ tube(δ, M) := { x | ∃ y ∈ M with ‖x − y‖₂ ≤ δ }

(our "noisy" and potentially high-dimensional training data). Fix ε ∈ (0, 1). For a terminal embedding f : R^N → R^m of M as per (4.2), one can see that

(1 − ε)‖z − t‖₂ − 2(1 − ε)δ ≤ ‖f(z) − f(t)‖₂ ≤ (1 + ε)‖z − t‖₂ + 2(1 + ε)δ (4.3)

will hold simultaneously for all z ∈ R^N and t ∈ T, where f has an embedding dimension that only depends on the geometric properties of M (and not necessarily on |T|). (One can prove (4.3) by comparing both z and t to a point x_t ∈ M satisfying ‖t − x_t‖₂ ≤ δ via several applications of the (reverse) triangle inequality.) Thus, if T includes a sufficiently dense external cover of M, then f will allow us to approximate the distance of all z ∈ R^N to M in the compressed embedding space via the estimator

d̃(f(z), f(T)) := inf_{t∈T} ‖f(z) − f(t)‖₂ ≈ d(z, M) := inf_{y∈M} ‖z − y‖₂ (4.4)

up to O(δ)-error.
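Once an embedding f is available, the compressed estimator (4.4) and the associated nearest-neighbor classifier are straightforward to express in code. A minimal Python/NumPy sketch follows (the function names are ours, and f may be any terminal embedding, e.g., one computed as in Algorithm 4.1 later in this chapter).

```python
import numpy as np

def compressed_distance(fz, fT):
    # d~(f(z), f(T)) from (4.4): fz is the embedded query f(z), and the rows of
    # fT are the embedded training points f(t) for t in T.
    return np.min(np.linalg.norm(fT - fz[None, :], axis=1))

def compressed_nn_label(fz, fT, labels):
    # Compressive nearest-neighbor classification: the label of the embedded
    # training point closest to f(z).
    return labels[np.argmin(np.linalg.norm(fT - fz[None, :], axis=1))]
```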
As a result, if one has noisy data from two disjoint manifolds M₁, M₂ ⊂ R^N, one can use this compressed estimator d̃ to correctly classify all data z ∈ tube(δ, M₁) ∪ tube(δ, M₂) as being in either T₁ := tube(δ, M₁) (class 1) or T₂ := tube(δ, M₂) (class 2), as long as inf_{x∈T₁, y∈T₂} ‖x − y‖₂ is sufficiently large. In short, terminal manifold embeddings demonstrate that accurate compressive nearest-neighbor classification based on noisy manifold training data is always possible as long as the manifolds in question are sufficiently far apart (though not necessarily separable from one another by, e.g., a hyperplane).

Note that in the discussion above we may in fact take T = tube(δ, M). In that case (4.3) will hold simultaneously for all z ∈ R^N and (t, δ) ∈ R^N × R⁺ with t ∈ tube(δ, M), so that f : R^N → R^m will approximately preserve the distances of all points z ∈ R^N to tube(δ, M) up to errors on the order of O(ε) d(z, tube(δ, M)) + O(δ) for all δ ∈ R⁺. This is in fact rather remarkable when one recalls that the best achievable embedding dimension, m, here only depends on the geometric properties of the low-dimensional manifold M (see Theorem 4.2.1 for a detailed accounting of these dependences).

We further note that alternate applications of Theorem 4.4.2 (on which Theorem 4.2.1 depends) involving other data models are also possible. As a more explicit second example, suppose that M is a union of n d-dimensional affine subspaces, so that its unit secants S_M, defined as per (4.7), are contained in the union of at most (n choose 2) + n unit spheres ⊂ S^{N−1}, each of dimension at most 2d + 1. The Gaussian width (see Definition 4.3.1) of S_M can then be upper-bounded by C√(d + log n) using standard techniques, where C ∈ R⁺ is an absolute constant. An application of Theorem 4.4.2 now guarantees the existence of a terminal embedding f : R^N → R^{O((d + log n)/ε²)} which will allow approximate nearest subspace queries to be answered for any input point z ∈ R^N using only f(z) in the compressed O((d + log n)/ε²)-dimensional space. Even more specifically, if we choose, e.g., M to consist of all at-most-s-sparse vectors in R^N (i.e., so that M is the union of n = (N choose s) subspaces of R^N), we can now see that Theorem 4.4.2 guarantees the existence of a deterministic compressed estimator (4.4) which allows for the accurate approximation of the best s-term approximation error inf_{y s-sparse} ‖z − y‖₂ for all z ∈ R^N, using only f(z) ∈ R^{O(s log(N/s))} as input. Note that this is only possible due to the non-linearity of f herein. In, e.g., the setting of classical compressive sensing theory, where f must be linear, it is known that such good performance is impossible [20, Section 5].

4.2.2 THE MAIN RESULT AND A BRIEF OUTLINE OF ITS PROOF

The following theorem is proven in Section 4.5. Given a low-dimensional submanifold M of R^N, it establishes the existence of a function f : R^N → R^m with m ≪ N that approximately preserves the Euclidean distances from all points in R^N to all points in M.
As a result, it guarantees the existence of a low-dimensional embedding which will, e.g., always allow for the correct compressed nearest-neighbor classification of images living near different well-separated submanifolds of Euclidean space.

Theorem 4.2.1 (The Main Result). Let M ↪ R^N be a compact d-dimensional submanifold of R^N with boundary ∂M, finite reach τ_M (see Definition 4.3.2), and volume V_M. Enumerate the connected components of ∂M and let τᵢ be the reach of the i-th connected component of ∂M as a submanifold of R^N. Set τ := minᵢ {τ_M, τᵢ}, let V_∂M be the volume of ∂M, and denote the volume of the d-dimensional Euclidean ball of radius 1 by ω_d. Next,

1. if d = 1, define α_M := 20 V_M / τ + V_∂M, else
2. if d ≥ 2, define α_M := (V_M / ω_d)(41/τ)^d + (V_∂M / ω_{d−1})(81/τ)^{d−1}.

Finally, fix ε ∈ (0, 1) and define

β_M := (α_M² d + 3) α_M. (4.5)

Then, there exists a map f : R^N → C^m with m ≤ c (ln(β_M) + 4d) / ε² that satisfies

| ‖f(x) − f(y)‖₂² − ‖x − y‖₂² | ≤ ε ‖x − y‖₂² (4.6)

for all x ∈ M and y ∈ R^N. Here c ∈ R⁺ is an absolute constant independent of all other quantities.

Proof. See Section 4.5.

The remainder of the chapter is organized as follows. In Section 4.3 we review notation and state a result from [53] that bounds the Gaussian width of the unit secants of a given submanifold of R^N in terms of geometric quantities of the original submanifold. Next, in Section 4.4 we prove an optimal terminal embedding result for arbitrary subsets of R^N in terms of the Gaussian widths of their unit secants by generalizing results from the computer science literature concerning finite sets [75, 81]; see Theorem 4.4.2 therein. We then combine the results from Sections 4.3 and 4.4 in order to prove our main theorem in Section 4.5. Finally, in Section 4.6 we conclude by demonstrating that terminal embeddings allow for more accurate compressive nearest neighbor classification than standard linear embeddings in practice.

4.3 NOTATION AND PRELIMINARIES

Below, B^N_{ℓ₂}(x, γ) will denote the open Euclidean ball around x of radius γ in R^N. Given an arbitrary subset S ⊂ R^N, we will further define −S := {−x | x ∈ S} and S ± S := {x ± y | x, y ∈ S}. For a given T ⊂ R^N we will also let T̄ denote its closure, and further define the normalization operator U : R^N \ {0} → S^{N−1} by U(x) := x/‖x‖₂. With this notation in hand, we can then define the unit secants of T ⊂ R^N to be

S_T := closure( U((T − T) \ {0}) ) = closure( { (x − y)/‖x − y‖₂ | x, y ∈ T, x ≠ y } ). (4.7)

Note that S_T is always a compact subset of the unit sphere S^{N−1} ⊂ R^N, and that S_T = −S_T.

Herein we will call a matrix A ∈ C^{m×N} an ε-JL map of a set T ⊂ R^N into C^m if

(1 − ε)‖x‖₂² ≤ ‖Ax‖₂² ≤ (1 + ε)‖x‖₂²

holds for all x ∈ T. Note that this is equivalent to A ∈ C^{m×N} having the property that

sup_{x∈T\{0}} | ‖A(x/‖x‖₂)‖₂² − 1 | = sup_{x∈U(T)} | ‖Ax‖₂² − 1 | ≤ ε, (4.8)

where U(T) ⊂ R^N is the normalized version of T \ {0} ⊂ R^N defined as above.
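For a finite set T, both the unit secants (4.7) and the empirical distortion (4.8) of a candidate map can be computed exactly. The following Python/NumPy sketch, with illustrative sizes of our own choosing, checks how good an ε-JL embedding a rescaled Gaussian matrix is for a random point set.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m, n = 100, 40, 200                      # ambient dim, target dim, #points (illustrative)

T = rng.standard_normal((n, N))             # a finite set T, one point per row
diffs = (T[:, None, :] - T[None, :, :]).reshape(-1, N)
diffs = diffs[np.linalg.norm(diffs, axis=1) > 1e-12]
S_T = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)   # unit secants (4.7)

Phi = rng.standard_normal((m, N)) / np.sqrt(m)               # rescaled Gaussian map
# Empirical distortion sup_{x in U((T-T)\{0})} | ||Phi x||_2^2 - 1 |, as in (4.8):
eps_emp = np.max(np.abs(np.sum((S_T @ Phi.T) ** 2, axis=1) - 1.0))
print(f"Phi is (empirically) an {eps_emp:.3f}-JL embedding of T")
```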
Furthermore, we will say that a matrix A ∈ C^{m×N} is an ε-JL embedding of a set T ⊂ R^N into C^m if A is an ε-JL map of T − T := {x − y | x, y ∈ T} into C^m. Here we will be working with random matrices which embed any fixed set T of bounded size (measured with respect to, e.g., Gaussian width [117]) with high probability. Such matrix distributions are often called oblivious, and are discussed as randomized embeddings in the absence of any specific set T, since their embedding quality can be determined independently of any properties of a given set T beyond its size. In particular, the class of oblivious sub-Gaussian random matrices having independent, isotropic, and sub-Gaussian rows will receive special attention below.

4.3.1 SOME COMMON MEASURES OF SET SIZE AND COMPLEXITY WITH ASSOCIATED BOUNDS

We will denote the cardinality of a finite set T by |T|. For a (potentially infinite) set T ⊂ R^N we define its radius and diameter to be

rad(T) := sup_{x∈T} ‖x‖₂ and diam(T) := rad(T − T) = sup_{x,y∈T} ‖x − y‖₂,

respectively. Given a value δ ∈ R⁺, a δ-cover of T (also sometimes called a δ-net of T) will be a subset S ⊂ T such that

∀ x ∈ T, ∃ y ∈ S such that ‖x − y‖₂ ≤ δ.

The δ-covering number of T, denoted by N(T, δ) ∈ N, is then the smallest achievable cardinality of a δ-cover of T. Finally, the Gaussian width of a set T is defined as follows.

Definition 4.3.1 (Gaussian Width [117, Definition 7.5.1]). The Gaussian width of a set T ⊂ R^N is

w(T) := E sup_{x∈T} ⟨g, x⟩,

where g is a random vector with N independent and identically distributed (i.i.d.) mean 0 and variance 1 Gaussian entries.

For a list of useful properties of the Gaussian width we refer the reader to [117, Proposition 7.5.2].
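The Gaussian width of Definition 4.3.1 is easy to estimate by Monte Carlo for any finite point set (or finite approximation of a set); a short Python/NumPy sketch of such an estimator (the function name and trial count are ours) is:

```python
import numpy as np

def gaussian_width_mc(T, trials=2000, seed=0):
    # Monte Carlo estimate of w(T) = E sup_{x in T} <g, x> (Definition 4.3.1),
    # where the rows of T are the points of a finite set in R^N.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((trials, T.shape[1]))
    return np.mean(np.max(G @ T.T, axis=1))

# Sanity check: sampling many random unit vectors approximates the sphere
# S^{N-1}, whose Gaussian width is E||g||_2, roughly sqrt(N).
```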
Finally, reach is an extrinsic parameter of a subset S of Euclidean space, defined based on how far away points can be from S while still having a unique closest point in S [32, 113]. The following formal definition of reach utilizes the Euclidean distance d between a given point x ∈ R^N and a subset S ⊂ R^N.

Definition 4.3.2 (Reach [32, Definition 4.1]). For a subset S ⊂ R^N of Euclidean space, the reach τ_S is

τ_S := sup { t ≥ 0 | ∀ x ∈ R^N such that d(x, S) < t, x has a unique closest point in S }.

The following theorem is a restatement of Theorem 20 in [53]. It bounds the Gaussian width of the unit secants of a smooth submanifold of R^N in terms of its dimension, reach, and volume.

Theorem 4.3.1 (Gaussian Width of the Unit Secants of a Submanifold of R^N, Potentially with Boundary). Let M ↪ R^N be a compact d-dimensional submanifold of R^N with boundary ∂M, finite reach τ_M, and volume V_M. Enumerate the connected components of ∂M and let τᵢ be the reach of the i-th connected component of ∂M as a submanifold of R^N. Set τ := minᵢ {τ_M, τᵢ}, let V_∂M be the volume of ∂M, and denote the volume of the d-dimensional Euclidean ball of radius 1 by ω_d. Next,

1. if d = 1, define α_M := 20 V_M / τ + V_∂M, else
2. if d ≥ 2, define α_M := (V_M / ω_d)(41/τ)^d + (V_∂M / ω_{d−1})(81/τ)^{d−1}.

Finally, define

β_M := (α_M² d + 3) α_M. (4.9)

Then, the Gaussian width of U((M − M) \ {0}) satisfies

w(S_M) = w( U((M − M) \ {0}) ) ≤ 8√2 √( ln(β_M) + 4d ).

With this Gaussian width bound in hand we can now begin the proof of our main result. The approach will be to combine Theorem 4.3.1 above with general theorems concerning the existence of outer bi-Lipschitz extensions of ε-JL embeddings of arbitrary subsets of R^N into lower-dimensional Euclidean space. These general existence theorems are proven in the next section.

4.4 THE MAIN BI-LIPSCHITZ EXTENSION RESULTS AND THEIR PROOFS

Our first main technical result guarantees that any given JL map Φ of a special subset of S^{N−1} related to M will not only be a bi-Lipschitz map from M ⊂ R^N into a lower-dimensional Euclidean space R^m, but will also have an outer bi-Lipschitz extension into R^{m+1}. It is useful as a means of extending particular (structured) JL maps Φ of special interest in the context of, e.g., saving on memory costs [52].

Theorem 4.4.1. Let M ⊂ R^N, ε ∈ (0, 1), and suppose that Φ ∈ C^{m×N} is an (ε²/2304)-JL map of S_M + S_M into C^m. Then, there exists an outer bi-Lipschitz extension of Φ : M → C^m, f : R^N → C^{m+1}, with the property that

| ‖f(x) − f(y)‖₂² − ‖x − y‖₂² | ≤ ε ‖x − y‖₂²

holds for all x ∈ M and y ∈ R^N.

Proof. See Section 4.4.4.

Looking at Theorem 4.4.1, we can see that an (ε²/2304)-JL map of S_M + S_M is required in order to achieve the outer extension f of interest. This result is sub-optimal in two respects. First, the constant factor 1/2304 is certainly not tight and can likely be improved substantially. More important, though, is the fact that ε is squared in the required map distortion, which means that the terminal embedding dimension, m + 1, will have to scale sub-optimally in ε (see Remark 4.4.1 below for details). Unfortunately, this is impossible to rectify when extending arbitrary maps Φ (see, e.g., [75]). For sub-Gaussian Φ an improvement is in fact possible, however, and this is the subject of our second main technical result just below. Using specialized theory for sub-Gaussian matrices, it demonstrates the existence of terminal JL embeddings for arbitrary subsets of R^N which achieve an optimal terminal embedding dimension up to constants.

Theorem 4.4.2. Let M ⊂ R^N and ε ∈ (0, 1). There exists a map f : R^N → C^m with m ≤ c (w(S_M)/ε)² that satisfies

| ‖f(x) − f(y)‖₂² − ‖x − y‖₂² | ≤ ε ‖x − y‖₂² (4.10)

for all x ∈ M and y ∈ R^N. Here c ∈ R⁺ is an absolute constant independent of all other quantities.

Proof. See Section 4.4.5.

To see the optimality of the terminal embedding dimension m provided by Theorem 4.4.2, we note that the embedding dimension of any function f which satisfies (4.10) for all x, y ∈ M must in fact generally scale quadratically in both w(S_M) and 1/ε (see [51, Theorem 7] and [67]).

We will now begin proving supporting results for both of the main technical theorems above. The first supporting results pertain to the so-called convex hull distortion of a given linear ε-JL map.

4.4.1 ALL LINEAR ε-JL MAPS PROVIDE O(√ε)–CONVEX HULL DISTORTION

A crucial component involved in proving our main results is the approximate norm preservation of all points in the convex hull of a given bounded set S ⊂ R^N.
Recall that the convex hull of S ⊂ C^N is

conv(S) := ⋃_{j=1}^{∞} { Σ_{ℓ=1}^{j} α_ℓ x_ℓ | x₁, …, x_j ∈ S, α₁, …, α_j ∈ [0, 1] s.t. Σ_{ℓ=1}^{j} α_ℓ = 1 }.

The next theorem states that each point in the convex hull of S ⊂ R^N can be expressed as a convex combination of at most N + 1 points from S. Hence, the convex hulls of subsets of R^N are actually a bit simpler than they first appear.

Theorem 4.4.3 (Carathéodory; see, e.g., [11]). Given S ⊂ R^N, for all x ∈ conv(S) there exist y₁, …, y_Ñ, with Ñ = min(|S|, N + 1), such that x = Σ_{ℓ=1}^{Ñ} α_ℓ y_ℓ for some α₁, …, α_Ñ ∈ [0, 1] with Σ_{ℓ=1}^{Ñ} α_ℓ = 1.

Finally, we say that a matrix Φ ∈ C^{m×N} provides ε-convex hull distortion for S ⊂ R^N if

| ‖Φx‖₂ − ‖x‖₂ | ≤ ε

holds for all x ∈ conv(S). The main result of this subsection states that linear ε-JL maps can provide ε-convex hull distortion for the unit secants of any given set. In particular, we have the following theorem, which generalizes arguments in [75] for finite sets to arbitrary and potentially infinite sets.

Theorem 4.4.4. Let M ⊂ R^N, ε ∈ (0, 1), and suppose that Φ ∈ C^{m×N} is an (ε²/4)-JL map of S_M + S_M into C^m. Then Φ will also provide ε-convex hull distortion for S_M.

The proof of Theorem 4.4.4 depends on two intermediate lemmas. The first lemma is a slight modification of Lemma 3 in [52].

Lemma 4.4.1. Let S ⊂ R^N and ε ∈ (0, 1). Then, an ε-JL map Φ ∈ C^{m×N} of the set

S′ = { x/‖x‖₂ + y/‖y‖₂, x/‖x‖₂ − y/‖y‖₂ | x, y ∈ S }

will satisfy

|ℜ(⟨Φx, Φy⟩) − ⟨x, y⟩| ≤ 2ε ‖x‖₂ ‖y‖₂ for all x, y ∈ S.

Proof. If x = 0 or y = 0 the inequality holds trivially. Thus, suppose x, y ≠ 0, and consider the normalizations u = x/‖x‖₂, v = y/‖y‖₂. The polarization identities for complex/real inner products imply that

|ℜ(⟨Φu, Φv⟩) − ⟨u, v⟩| = | (1/4) ℜ( Σ_{ℓ=0}^{3} i^ℓ ‖Φu + i^ℓ Φv‖₂² ) − (1/4)( ‖u + v‖₂² − ‖u − v‖₂² ) |
= (1/4) | ( ‖Φu + Φv‖₂² − ‖Φu − Φv‖₂² ) − ( ‖u + v‖₂² − ‖u − v‖₂² ) |
≤ (1/4) ( | ‖Φu + Φv‖₂² − ‖u + v‖₂² | + | ‖Φu − Φv‖₂² − ‖u − v‖₂² | )
≤ (ε/4) ( ‖u + v‖₂² + ‖u − v‖₂² ) ≤ (ε/2) (‖u‖₂ + ‖v‖₂)² ≤ 2ε.

The result now follows by multiplying the inequality through by ‖x‖₂‖y‖₂.

Next, we see that linear ε-JL maps are capable of preserving the angles between the elements of the convex hull of any bounded subset S ⊂ R^N.

Lemma 4.4.2. Suppose S ⊂ B^N_{ℓ₂}(0, γ) and ε ∈ (0, 1). Let Φ ∈ C^{m×N} be an (ε/(2γ²))-JL map of the set S′, defined as in Lemma 4.4.1, into C^m. Then

|ℜ(⟨Φx, Φy⟩) − ⟨x, y⟩| ≤ ε holds for all x, y ∈ conv(S).

Proof. Let x, y ∈ conv(S). By Theorem 4.4.3, there exist {x_ℓ}_{ℓ=1}^{Ñ}, {y_ℓ}_{ℓ=1}^{Ñ} ⊂ S and {α_ℓ}_{ℓ=1}^{Ñ}, {β_ℓ}_{ℓ=1}^{Ñ} ⊂ [0, 1] with Σ_{ℓ=1}^{Ñ} α_ℓ = Σ_{ℓ=1}^{Ñ} β_ℓ = 1 such that

x = Σ_{ℓ=1}^{Ñ} α_ℓ x_ℓ, and y = Σ_{ℓ=1}^{Ñ} β_ℓ y_ℓ.
Hence, by Lemma 4.4.1 we have that

|ℜ(⟨Φx, Φy⟩) − ⟨x, y⟩| = | Σ_{ℓ=1}^{Ñ} Σ_{j=1}^{Ñ} α_ℓ β_j ( ℜ(⟨Φx_ℓ, Φy_j⟩) − ⟨x_ℓ, y_j⟩ ) |
≤ 2 Σ_{ℓ=1}^{Ñ} Σ_{j=1}^{Ñ} α_ℓ β_j (ε/(2γ²)) ‖x_ℓ‖₂ ‖y_j‖₂
≤ ε ( Σ_{ℓ=1}^{Ñ} α_ℓ )( Σ_{j=1}^{Ñ} β_j ) = ε.

Here we have also used the mapping error ε/(2γ²) and the fact that all norms of vectors in this case will be at most γ.

We are now prepared to prove Theorem 4.4.4.

4.4.1.1 PROOF OF THEOREM 4.4.4

Applying Lemma 4.4.2 with S = S_M = S_M ∪ −S_M, we note that S′ = S_M + S_M = (S_M ∪ −S_M) + (S_M ∪ −S_M) since S ⊂ S^{N−1}. Furthermore, γ = 1 in this case. Hence, Φ ∈ C^{m×N} being an (ε²/4)-JL map of S_M + S_M into C^m implies that

|ℜ(⟨Φx, Φy⟩) − ⟨x, y⟩| ≤ ε²/2 (4.11)

holds for all x, y ∈ conv(S_M) ⊂ B^N_{ℓ₂}(0, 1). In particular, (4.11) with x = y implies that

| ‖Φx‖₂ − ‖x‖₂ | · | ‖Φx‖₂ + ‖x‖₂ | = | ‖Φx‖₂² − ‖x‖₂² | ≤ ε²/2.

Noting that |‖Φx‖₂ + ‖x‖₂| ≥ ‖x‖₂, we can see that the desired result holds automatically if ‖x‖₂ ≥ ε/2. Thus, it suffices to assume that ‖x‖₂ < ε/2, but then we are also finished since

| ‖Φx‖₂ − ‖x‖₂ | ≤ max{ ‖x‖₂, ‖Φx‖₂ } ≤ √( ‖x‖₂² + ε²/2 ) < (√3/2) ε

will hold in that case.

Remark 4.4.1. Though Theorem 4.4.4 holds for arbitrary linear maps, we note that it has sub-optimal dependence on the distortion parameter ε. In particular, a linear (ε²/4)-JL map of an arbitrary set will generally embed that set into C^m with m = Ω(1/ε⁴) [67]. However, it has been shown in [81] that sub-Gaussian matrices will behave better with high probability, allowing for outer bi-Lipschitz extensions of JL-embeddings of finite sets into R^m with m = O(1/ε²). In the next subsection we generalize these better-scaling results for sub-Gaussian random matrices to (potentially) infinite sets.

4.4.2 SUB-GAUSSIAN MATRICES AND ε-CONVEX HULL DISTORTION FOR INFINITE SETS

Motivated by results in [81] for finite sets which achieve optimal dependence on the distortion parameter ε for sub-Gaussian matrices, in this section we will do the same for infinite sets using results from [117]. Our main tool will be the following result (see also [53, Theorem 4]).

Theorem 4.4.5 (See Theorem 9.1.1 and Exercise 9.1.8 in [117]). Let Φ be an m × N matrix whose rows are independent, isotropic, and sub-Gaussian random vectors in R^N. Let p ∈ (0, 1) and S ⊂ R^N. Then there exists a constant c depending only on the distribution of the rows of Φ such that

sup_{x∈S} | ‖Φx‖₂ − √m ‖x‖₂ | ≤ c [ w(S) + √(ln(2/p)) · rad(S) ]

holds with probability at least 1 − p.

The main result of this section is a simple consequence of Theorem 4.4.5 together with standard results concerning Gaussian widths [117, Proposition 7.5.2].

Corollary 4.4.1. Let M ⊂ R^N, ε, p ∈ (0, 1), and let Φ ∈ R^{m×N} be an m × N matrix whose rows are independent, isotropic, and sub-Gaussian random vectors in R^N. Furthermore, suppose that

m ≥ (c′/ε²) ( w(S_M) + √(ln(2/p)) )²,

where c′ is a constant depending only on the distribution of the rows of Φ.
Then, with probability at least 1 − p, the random matrix (1/√m)Φ will simultaneously be both an ε-JL embedding of M into R^m and also provide ε-convex hull distortion for S_M.

Proof. We apply Theorem 4.4.5 to S = conv(S_M). In doing so we note that w(conv(S_M)) = w(S_M) [117, Proposition 7.5.2], and that rad(conv(S_M)) = 1 since conv(S_M) ⊆ B^N_{ℓ₂}(0, 1). The result is that (1/√m)Φ provides ε-convex hull distortion for S_M as long as c′ ≥ c². Next, we note that providing ε-convex hull distortion for S_M implies that (1/√m)Φ will also approximately preserve the ℓ₂-norms of all the unit vectors in S_M ⊂ conv(S_M). In particular, (1/√m)Φ will be a 3ε-JL map of S_M into R^m, which in turn implies that (1/√m)Φ will also be a 3ε-JL embedding of M − M into R^m by linearity/rescaling. Adjusting the constant c′ to account for the additional factor of 3 now yields the stated result.

We are now prepared to prove our general theorems regarding outer bi-Lipschitz extensions of JL-embeddings of potentially infinite sets.

4.4.3 OUTER BI-LIPSCHITZ EXTENSION RESULTS FOR JL EMBEDDINGS OF GENERAL SETS

Before we can prove our final results for general sets we will need two supporting lemmas. They are adapted from the proofs of analogous results in [75, 81] for finite sets.

Lemma 4.4.3. Let M ⊂ R^N, ε ∈ (0, 1), and suppose that Φ ∈ C^{m×N} provides ε-convex hull distortion for S_M. Then, there exists a function g : R^N → C^m such that

|ℜ(⟨g(y), Φx⟩) − ⟨y, x⟩| ≤ 2ε ‖y‖₂ ‖x‖₂ (4.12)

holds for all x ∈ M − M and y ∈ R^N.

Proof. First, we note that (4.12) holds trivially for y = 0 as long as g(0) = 0. Thus, it suffices to consider nonzero y. Second, we claim that it suffices to prove the existence of a function g : R^N → C^m that satisfies both of the following properties for all y ∈ R^N:

1. ‖g(y)‖₂ ≤ ‖y‖₂, and
2. |ℜ(⟨g(y), Φx′⟩) − ⟨y, x′⟩| ≤ ε ‖y‖₂ for all x′ in a finite (ε / (2 max{1, ‖Φ‖_{2→2}}))-cover C of S_M.

To see why, fix y ≠ 0, x ∈ S_M, and let x′ ∈ C ⊂ S_M satisfy ‖x − x′‖₂ ≤ ε / (2 max{1, ‖Φ‖_{2→2}}). We can see that any function g satisfying both of the properties above will have

|ℜ(⟨g(y), Φx⟩) − ⟨y, x⟩| = |ℜ(⟨g(y), Φx′⟩) + ℜ(⟨g(y), Φ(x − x′)⟩) − ⟨y, x − x′⟩ − ⟨y, x′⟩|
≤ |ℜ(⟨g(y), Φx′⟩) − ⟨y, x′⟩| + |⟨g(y), Φ(x − x′)⟩| + |⟨y, x − x′⟩|
≤ ε ‖y‖₂ + ‖g(y)‖₂ ‖Φ‖_{2→2} ‖x − x′‖₂ + ‖y‖₂ ‖x − x′‖₂,

where the second property was used in the last inequality above. Appealing to the first property above, we can now also see that |ℜ(⟨g(y), Φx⟩) − ⟨y, x⟩| ≤ 2ε ‖y‖₂ will hold. Finally, as a consequence of the definition of S_M, we therefore have that (4.12) will hold for all x ∈ M − M and y ∈ R^N whenever Properties 1 and 2 hold above. (Showing that (4.12) holds for all x ∈ M − M more generally can be proven by contradiction using a limiting argument, combined with the fact that both the right- and left-hand sides of (4.12) are continuous in x for fixed y.) Hence, we have reduced the proof to constructing a function g that satisfies both Properties 1 and 2 above.
Let

g(y) := argmin_{v ∈ B^{2m}_{ℓ₂}(0, ‖y‖₂)} max_{λ ∈ B^{|C|}_{ℓ₁}(0, 1)} h_y(v, λ), where (4.13)

h_y(v, λ) := Σ_{u∈C} ( λ_u (⟨y, u⟩ − ℜ(⟨v, Φu⟩)) − ε |λ_u| · ‖y‖₂ ), (4.14)

and where we identify C^m with R^{2m} above. Note that Property 1 above is guaranteed by definition (4.13). Furthermore, we note that if

max_{λ ∈ {±e_j}_{j=1}^{|C|}} h_y(g(y), λ) = max_{u∈C} ( |⟨y, u⟩ − ℜ(⟨g(y), Φu⟩)| − ε ‖y‖₂ ) ≤ max_{λ ∈ B^{|C|}_{ℓ₁}(0, 1)} h_y(g(y), λ) ≤ 0,

then Property 2 above will hold as well. Thus, it suffices to show that

min_{v ∈ B^{2m}_{ℓ₂}(0, ‖y‖₂)} max_{λ ∈ B^{|C|}_{ℓ₁}(0, 1)} h_y(v, λ) ≤ 0

always holds in order to finish the proof. Noting that h_y : R^{2m+|C|} → R defined in (4.14) is continuous, convex (affine) in v, and concave in λ, and further noting that both B^{|C|}_{ℓ₁}(0, 1) and B^{2m}_{ℓ₂}(0, ‖y‖₂) are compact and convex, we may apply Von Neumann's minimax theorem [84] to see that

min_{v ∈ B^{2m}_{ℓ₂}(0, ‖y‖₂)} max_{λ ∈ B^{|C|}_{ℓ₁}(0, 1)} h_y(v, λ) = max_{λ ∈ B^{|C|}_{ℓ₁}(0, 1)} min_{v ∈ B^{2m}_{ℓ₂}(0, ‖y‖₂)} h_y(v, λ)

holds. Thus, we will in fact be finished if we can show that min_{v ∈ B^{2m}_{ℓ₂}(0, ‖y‖₂)} h_y(v, λ) ≤ 0 holds for each λ ∈ B^{|C|}_{ℓ₁}(0, 1). By rescaling, this in turn is implied by showing that

∀ u ∈ conv(C ∪ −C), ∃ v ∈ B^{2m}_{ℓ₂}(0, ‖y‖₂) such that ⟨y, u⟩ − ℜ(⟨v, Φu⟩) − ε ‖y‖₂ ≤ 0 (4.15)

holds. To prove (4.15) for a fixed u ∈ conv(C ∪ −C) ⊆ conv(S_M ∪ −S_M) = conv(S_M), and thereby establish the stated lemma, one may set v = ‖y‖₂ Φu / ‖Φu‖₂. Doing so, we see that the left side of (4.15) simplifies to ⟨y, u⟩ − ‖y‖₂ ‖Φu‖₂ − ε ‖y‖₂. To finish, we note that indeed

⟨y, u⟩ − ‖y‖₂ ‖Φu‖₂ − ε ‖y‖₂ ≤ ‖y‖₂ ‖u‖₂ − ‖y‖₂ ‖Φu‖₂ − ε ‖y‖₂ ≤ ‖y‖₂ ( ‖u‖₂ − ‖Φu‖₂ − ε ) ≤ 0

will then hold since Φ provides ε-convex hull distortion for S_M.

Lemma 4.4.4. Let M ⊂ R^N be non-empty, ε ∈ (0, 1), and suppose that Φ ∈ C^{m×N} provides ε-convex hull distortion for S_M. Then, there exists an outer bi-Lipschitz extension of Φ, f : R^N → C^{m+1}, with the property that

| ‖f(x) − f(y)‖₂² − ‖x − y‖₂² | ≤ 24ε ‖x − y‖₂² (4.16)

holds for all x ∈ M and y ∈ R^N.

Proof. Given y ∈ R^N, let y_M ∈ M satisfy ‖y − y_M‖₂ = inf_{x∈M} ‖y − x‖₂. (One can see that it suffices to approximately compute y_M in order to achieve (4.16) up to a fixed precision.) We define

f(y) := (Φy, 0) if y ∈ M, and
f(y) := ( Φy_M + g(y − y_M), √( ‖y − y_M‖₂² − ‖g(y − y_M)‖₂² ) ) if y ∉ M,

where g is defined as in Lemma 4.4.3. Fix x ∈ M. If y ∈ M then ‖f(x) − f(y)‖₂² = ‖Φ(x − y)‖₂², and so | ‖f(x) − f(y)‖₂² − ‖x − y‖₂² | ≤ 3ε ‖x − y‖₂² will hold since Φ will be a 3ε-JL embedding of M − M (recall the proof of Corollary 4.4.1 and note the linearity of Φ). Thus, it suffices to consider a fixed y ∉ M. In that case we have

‖f(x) − f(y)‖₂² = ‖Φ(x − y_M) − g(y − y_M)‖₂² + ‖y − y_M‖₂² − ‖g(y − y_M)‖₂²
= ‖y − y_M‖₂² + ‖Φ(x − y_M)‖₂² − 2ℜ(⟨g(y − y_M), Φ(x − y_M)⟩) (4.17)

by the polarization identity and parallelogram law.
Similarly, we have that

‖x − y‖₂² = ‖(x − y_M) − (y − y_M)‖₂² = ‖y − y_M‖₂² + ‖x − y_M‖₂² − 2⟨y − y_M, x − y_M⟩. (4.18)

Subtracting (4.18) from (4.17), we can now see that

| ‖f(x) − f(y)‖₂² − ‖x − y‖₂² | ≤ | ‖Φ(x − y_M)‖₂² − ‖x − y_M‖₂² | + 2 |ℜ(⟨g(y − y_M), Φ(x − y_M)⟩) − ⟨y − y_M, x − y_M⟩|
≤ 3ε ‖x − y_M‖₂² + 4ε ‖y − y_M‖₂ ‖x − y_M‖₂
≤ 3ε ‖x − y_M‖₂² + 2ε ( ‖y − y_M‖₂² + ‖x − y_M‖₂² ), (4.19)

where the second inequality again appeals to Φ being a 3ε-JL embedding of M − M, and to Lemma 4.4.3. Considering (4.19), we can see that

• ‖y − y_M‖₂ ≤ ‖y − x‖₂ by the definition of y_M, and so
• ‖x − y_M‖₂ ≤ ‖x − y‖₂ + ‖y − y_M‖₂ ≤ 2‖x − y‖₂, and thus
• ‖y − y_M‖₂² + ‖x − y_M‖₂² ≤ (‖y − y_M‖₂ + ‖x − y_M‖₂)² ≤ 9‖x − y‖₂².

Using the last two inequalities above in (4.19) now yields the stated result.

We are now prepared to prove the two main results of this section.

4.4.4 PROOF OF THEOREM 4.4.1

Apply Theorem 4.4.4 with ε ← ε/24 in order to obtain ε/24-convex hull distortion for S_M via Φ. Then, apply Lemma 4.4.4.

4.4.5 PROOF OF THEOREM 4.4.2

To begin, we apply Corollary 4.4.1 with, e.g., p = 1/2 to demonstrate that an (c″/ε²)( w(S_M) + √(ln 4) )² × N matrix with i.i.d. standard normal random entries can provide (ε/24)-convex hull distortion for S_M, where c″ is an absolute constant. Hence, such a matrix Φ exists. An application of Lemma 4.4.4 now finishes the proof.

4.5 THE PROOF OF THEOREM 4.2.1

We apply Theorem 4.4.2 together with Theorem 4.3.1, which bounds the Gaussian width of S_M.

4.6 A NUMERICAL EVALUATION OF TERMINAL EMBEDDINGS

In this section we consider several variants of the optimization approach mentioned in Section 3.3 of [81] for implementing a terminal embedding f : R^N → R^{m+1} of a finite set X ⊂ R^N. In effect, this requires us to implement a function satisfying two sets of constraints from [81, Section 3.3] that are analogous to the two properties of g : R^N → C^m listed at the beginning of the proof of Lemma 4.4.3. See Lines 1 and 2 of Algorithm 4.1 for a concrete example of one type of constrained minimization problem solved herein to accomplish this task.
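We note in passing that Line 2 of Algorithm 4.1 below is a small convex program (a quadratic objective, one norm constraint, and finitely many constraints that are affine in z inside absolute values), and so is easy to prototype. A hedged Python/cvxpy sketch of the per-point computation follows; the dissertation's released implementation uses MATLAB/CVX, and the function name here is ours.

```python
import numpy as np
import cvxpy as cp

def terminal_embed_point(u, X, Pi, eps=0.1):
    # One evaluation f(u) of a terminal embedding of the rows of X, following
    # the structure of Algorithm 4.1 (a sketch, not the released implementation).
    x_np = X[np.argmin(np.linalg.norm(X - u[None, :], axis=1))]  # Line 1
    r = np.linalg.norm(u - x_np)

    z = cp.Variable(Pi.shape[0])                                 # Line 2
    obj = cp.Minimize(cp.sum_squares(z) + 2 * (Pi @ (u - x_np)) @ z)
    cons = [cp.norm(z, 2) <= r]
    for x in X:
        lhs = (Pi @ (x - x_np)) @ z - np.dot(u - x_np, x - x_np)
        cons.append(cp.abs(lhs) <= eps * r * np.linalg.norm(x - x_np))
    cp.Problem(obj, cons).solve()
    u_prime = z.value

    # Line 3: lift to R^{m+1}.
    last = np.sqrt(max(r**2 - np.linalg.norm(u_prime)**2, 0.0))
    return np.concatenate([Pi @ x_np + u_prime, [last]])
```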
Algorithm 4.1 Terminal Embedding of a Finite Set
Input: ε ∈ (0, 1); X ⊂ R^N with |X| =: n; S ⊂ R^N with |S| =: n′; m ∈ N with m < N; a random matrix Φ ∈ R^{m×N} with i.i.d. standard Gaussian entries, rescaled to perform as a JL embedding matrix Π := (1/√m)Φ.
Output: A terminal embedding of X, f : R^N → R^{m+1}, evaluated on S.
for u ∈ S do
1) Compute x_np := argmin_{x∈X} ‖u − x‖₂
2) Solve the following constrained minimization problem to compute a minimizer u′ ∈ R^m:
  Minimize h_{u,x_np}(z) := ‖z‖₂² + 2⟨Π(u − x_np), z⟩
  subject to ‖z‖₂ ≤ ‖u − x_np‖₂,
  |⟨z, Π(x − x_np)⟩ − ⟨u − x_np, x − x_np⟩| ≤ ε ‖u − x_np‖₂ ‖x − x_np‖₂ for all x ∈ X
3) Compute f : R^N → R^{m+1} at u via
  f(u) := (Πu, 0) if u ∈ X, and f(u) := ( Πx_np + u′, √( ‖u − x_np‖₂² − ‖u′‖₂² ) ) if u ∉ X
end for

Crucially, we note that any choice u′ ∈ R^m of a z satisfying the two sets of constraints in Line 2 of Algorithm 4.1 for a given u ∈ R^N is guaranteed to correspond to an evaluation of a valid terminal embedding of X at u in Line 3. This leaves the choice of the objective function, h_{u,x_np}, minimized in Line 2 of Algorithm 4.1 open to change without affecting its theoretical performance guarantees. Given this setup, several heretofore unexplored practical questions about terminal embeddings immediately present themselves. These include:

1. Repeatedly solving the optimization problem in Line 2 of Algorithm 4.1 to evaluate a terminal embedding of X on S is certainly more computationally expensive than simply evaluating a standard linear Johnson–Lindenstrauss (JL) embedding of X on S instead. How do terminal embeddings empirically compare to standard linear JL embedding matrices on real-world data in the context of, e.g., compressive classification? When, if ever, is their additional computational expense actually justified in practice?

2. Though any choice of objective function h_{u,x_np} in Line 2 of Algorithm 4.1 must result in a terminal embedding f of X based on the available theory, some choices probably lead to better empirical performance than others. What's a good default choice?

3. How much dimensionality reduction are terminal embeddings capable of in the context of, e.g., accurate compressive classification using real-world data?

In keeping with the motivating application discussed in Section 4.2.1 above, we will explore some preliminary answers to these three questions in the context of compressive classification based on real-world data below.

4.6.1 A COMPARISON CRITERION: COMPRESSIVE NEAREST NEIGHBOR CLASSIFICATION

Given a labelled data set D ⊂ R^N with label set L, we let Label : D → L denote the function which assigns the correct label to each element of the data set. To address the three questions above, we will use compressive nearest neighbor classification accuracy as a primary measure of an embedding strategy's quality. See Algorithm 4.2 for a detailed description of how this accuracy can be computed for a given data set D.
Algorithm 4.2 Measuring Compressive Nearest Neighbor Classification Accuracy
Input: ε ∈ (0, 1); a labeled data set D ⊂ R^N split into two disjoint subsets, a training set X ⊂ D with |X| =: n and a test set S ⊂ D with |S| =: n′, such that S ∩ X = ∅; a compressive dimension m < N.
Output: Successful nearest neighbor classification percentage for data embedded in R^{m+1}.
Fix f : R^N → R^{m+1}, an embedding of the training data X ⊂ R^N into R^{m+1} satisfying

(1 − ε)‖x − y‖₂ ≤ ‖f(x) − f(y)‖₂ ≤ (1 + ε)‖x − y‖₂

for all x, y ∈ X. [Note: this can either be a JL-embedding of X, or a stronger terminal embedding of X.]
% Embed the training data into R^{m+1}.
for x ∈ X do
  Compute f(x) using, e.g., Algorithm 4.1.
end for
% Classify the test data using its embedded distance in R^{m+1}.
c = 0
for u ∈ S do
  Compute f(u) using, e.g., Algorithm 4.1
  Compute x = argmin_{y∈X} ‖f(u) − f(y)‖₂
  if Label(u) = Label(x) then
    c = c + 1
  end if
end for
Output the successful classification percentage = (c/n′) × 100%

Note that Algorithm 4.2 can be used to help us compare the quality of different embedding strategies. For example, one can use Algorithm 4.2 to compare different choices of objective function h_{u,x_np} in Line 2 of Algorithm 4.1 against one another by running Algorithm 4.2 multiple times on the same training and test data sets while only varying the implementation of Algorithm 4.1 each time. This is exactly the type of approach we will use below. Of course, before we can begin we must first decide on some labelled data sets D to use in our classification experiments.

4.6.2 OUR CHOICE OF TRAINING AND TESTING DATA SETS

Herein we consider two standard benchmark image data sets which allow for accurate uncompressed Nearest Neighbor (NN) classification. The images in each data set can then be vectorized and embedded using, e.g., Algorithm 4.1 in order to test the accuracies of compressed NN classification variants against both one another and standard uncompressed NN classification. These benchmark data sets are as follows.

Figure 4.1: Example images from the MNIST data set (left), and the COIL-100 data set (right).

The MNIST data set [68, 22] consists of 60,000 training images of 28 × 28-pixel grayscale hand-written images of the digits 0 through 9. Thus, MNIST has 10 labels to correctly classify between, and N = 28² = 784. For all experiments involving the MNIST dataset, n/10 digits of each type are selected uniformly at random to form the training set X, for a total of n vectorized training images in R^784. Then, 100 digits of each type are randomly selected from those not used for training in order to form the test set S, leading to a total of n′ = 1000 vectorized test images in R^784. See the left side of Figure 4.1 for example MNIST images.

The COIL-100 data set [83] is a collection of 128 × 128-pixel color images of 100 objects, each photographed 72 times, with the object rotated by 5 degrees between successive shots to obtain a complete rotation. Only the green color channel of each image is used herein for simplicity. Thus, herein COIL-100 consists of 7,200 total vectorized images in R^N with N = 128² = 16,384, where each image has one of 100 different labels (72 images per label). For all experiments involving this COIL-100 data set, n/100 training images are down-sampled from each of the 100 objects' rotational image sequences. Thus, the training sets each contain n/100 vectorized images of each object, photographed at rotations of ≈ 36000/n degrees (rounded to multiples of 5).
The resulting training data sets therefore all consist of n vectorized images in R^16,384. After forming each training set, 10 images of each object are then randomly selected from those not used for training in order to form the test set S, leading to a total of n′ = 1000 vectorized test images in R^16,384 per experiment. See the right side of Figure 4.1 for example COIL-100 images.
4.6.3 A COMPARISON OF FOUR EMBEDDING STRATEGIES VIA NN CLASSIFICATION
In this section we seek to better understand (i) when terminal embeddings outperform standard JL-embedding matrices in practice with respect to accurate compressive NN classification, (ii) what type of objective functions h_{u,x_NN} in Line 2 of Algorithm 4.1 perform best in practice when computing a terminal embedding, and (iii) how much dimensionality reduction one can achieve with a terminal embedding without appreciably degrading standard NN classification results in practice. To gain insight on these three questions we will compare the following four embedding strategies in the context of NN classification. These strategies begin with the most trivial linear embeddings (i.e., the identity map) and slowly progress toward extremely non-linear terminal embeddings.
(a) Identity: We use the data in its original uncompressed form (i.e., we use the trivial embedding f : R^N → R^N defined by f(u) = u in Algorithm 4.2). Here the embedding dimension m + 1 is always fixed to be N.
(b) Linear: We compressively embed our training data X using a JL embedding. More specifically, we generate an m × N random matrix Φ with i.i.d. standard Gaussian entries and then set f : R^N → R^{m+1} to be f(u) := ((1/√m)Φu, 0) in Algorithm 4.2 for various choices of m. It is then hoped that f will embed the test data S well in addition to the training data X. Note that this embedding choice for f is consistent with Algorithm 4.1 where one lets X = X ∪ S when evaluating Line 3, thereby rendering the minimization problem in Line 2 irrelevant.
(c) A Valid Terminal Embedding That's as Linear as Possible: To minimize the point-wise difference between the terminal embedding f computed by Algorithm 4.1 and the linear map defined above in (b), we may choose the objective function in Line 2 of Algorithm 4.1 to be h_{u,x_NN}(z) := ⟨Π(x_NN − u), z⟩. To see why solving this minimizes the pointwise difference between f and the linear map in (b), let u′ be such that ⟨Π(x_NN − u), z⟩ is minimal subject to the constraints in Line 2 of Algorithm 4.1 when z = u′. Since u and x_NN are fixed here, we note that z = u′ will then also minimize
$$\|\Pi(\mathbf{x}_{NN}-\mathbf{u})\|_2^2 + 2\langle \Pi(\mathbf{x}_{NN}-\mathbf{u}), \mathbf{z}\rangle + \|\mathbf{u}-\mathbf{x}_{NN}\|_2^2$$
$$= \|\Pi(\mathbf{x}_{NN}-\mathbf{u})\|_2^2 + \|\mathbf{z}\|_2^2 + 2\langle \Pi(\mathbf{x}_{NN}-\mathbf{u}), \mathbf{z}\rangle + \|\mathbf{u}-\mathbf{x}_{NN}\|_2^2 - \|\mathbf{z}\|_2^2$$
$$= \|\Pi(\mathbf{x}_{NN}-\mathbf{u}) + \mathbf{z}\|_2^2 + \|\mathbf{u}-\mathbf{x}_{NN}\|_2^2 - \|\mathbf{z}\|_2^2$$
$$= \left\| \left(\Pi\mathbf{x}_{NN} + \mathbf{z},\ \sqrt{\|\mathbf{u}-\mathbf{x}_{NN}\|_2^2 - \|\mathbf{z}\|_2^2}\right) - (\Pi\mathbf{u}, 0) \right\|_2^2$$
subject to the desired constraints. Hence, we can see that choosing z = u′ as above is equivalent to minimizing ∥f(u) − (Πu, 0)∥₂² over all valid choices of terminal embeddings f that satisfy the existing theory.
(d) A Terminal Embedding Computed by Algorithm 4.1 as Presented: This terminal embedding is computed using Algorithm 4.1 exactly as it is formulated above (i.e., with the objective function in Line 2 chosen to be h_{u,x_NN}(z) := ∥z∥₂² + 2⟨Π(u − x_NN), z⟩). Note that this choice of objective function was made to encourage non-linearity in the resulting terminal embedding f computed by Algorithm 4.1. To understand our intuition for making this choice, suppose that ∥z∥₂² + 2⟨Π(u − x_NN), z⟩ is minimal subject to the constraints in Line 2 of Algorithm 4.1 when z = u′. Since u and x_NN are fixed independently of z, this means that z = u′ then also minimizes
$$\|\mathbf{z}\|_2^2 + 2\langle \Pi(\mathbf{u}-\mathbf{x}_{NN}), \mathbf{z}\rangle + \|\Pi(\mathbf{u}-\mathbf{x}_{NN})\|_2^2 = \|\mathbf{z} + \Pi(\mathbf{u}-\mathbf{x}_{NN})\|_2^2.$$
Hence, this objective function encourages u′ to be as close to −Π(u − x_NN) = Π(x_NN − u) as possible subject to satisfying the constraints in Line 2 of Algorithm 4.1. Recalling (c) just above, we can now see that this is exactly encouraging u′ to be a value for which the objective function we seek to minimize in (c) is relatively large. (A solver-level sketch of strategies (c) and (d) follows this list.)
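Since the available theory leaves the objective in Line 2 free, both strategy (c) and strategy (d) can be computed with any convex solver. The experiments reported below used MATLAB and CVX; purely for illustration, a minimal Python/CVXPY translation of Lines 1–3 of Algorithm 4.1 might look as follows. All function and variable names here are hypothetical, and this sketch is not the code used to generate the figures below.

```python
import numpy as np
import cvxpy as cp

def terminal_embed_point(u, X, Pi, eps, objective="default"):
    """Evaluate a terminal embedding at a single point u not in X.

    u   : (N,)  point to embed
    X   : (n,N) training set, one point per row
    Pi  : (m,N) JL matrix (i.i.d. N(0,1) entries scaled by 1/sqrt(m))
    eps : distortion parameter in (0, 1)
    """
    # Line 1: nearest training point to u.
    x_nn = X[np.argmin(np.linalg.norm(X - u, axis=1))]

    # Line 2: constrained convex program defining u'.
    m = Pi.shape[0]
    z = cp.Variable(m)
    diffs = X - x_nn                          # rows are x - x_nn
    rhs = diffs @ (u - x_nn)                  # <u - x_nn, x - x_nn> for each x
    bounds = eps * np.linalg.norm(u - x_nn) * np.linalg.norm(diffs, axis=1)
    constraints = [
        cp.norm(z, 2) <= np.linalg.norm(u - x_nn),
        cp.abs((diffs @ Pi.T) @ z - rhs) <= bounds,
    ]
    if objective == "default":                # strategy (d): ||z||^2 + 2<Pi(u - x_nn), z>
        obj = cp.sum_squares(z) + 2 * (Pi @ (u - x_nn)) @ z
    else:                                     # strategy (c): <Pi(x_nn - u), z>
        obj = (Pi @ (x_nn - u)) @ z
    cp.Problem(cp.Minimize(obj), constraints).solve()
    u_prime = z.value

    # Line 3: assemble the (m+1)-dimensional image of u.
    last = np.sqrt(max(np.linalg.norm(u - x_nn) ** 2
                       - np.linalg.norm(u_prime) ** 2, 0.0))
    return np.concatenate([Pi @ x_nn + u_prime, [last]])
```

For points u ∈ X one simply returns (Πu, 0) directly, as in Line 3 of Algorithm 4.1.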
We are now prepared to empirically compare the four types of embeddings (a) – (d) on the data sets discussed above in Section 4.6.2. To do so, we run Algorithm 4.2 four times for several different choices of embedding dimension m on each data set below, varying the choice of embedding f between (a), (b), (c), and (d) for each value of m. The successful classification percentage is then plotted as a function of m for each different data set and choice of embedding. See Figures 4.2(a) and 4.2(c) for the results.
In addition, to quantify the extent to which the embedding strategies (b) – (d) above are increasingly nonlinear, we also measure the relative distance between where each training-set embedding f maps points in the test sets versus where its associated linear training-set embedding would map them. More specifically, for each embedding f and test point u ∈ S we let
$$\mathrm{Nonlinearity}_f(\mathbf{u}) = \frac{\|f(\mathbf{u}) - (\Pi\mathbf{u}, 0)\|_2}{\|(\Pi\mathbf{u}, 0)\|_2} \times 100\%.$$
See Figures 4.2(b) and 4.2(d) for plots of Mean_{u∈S} Nonlinearity_f(u) for each of the embedding strategies (b) – (d) on the data sets discussed in Section 4.6.2.
To compute solutions to the minimization problem in Line 2 of Algorithm 4.1 below we used the MATLAB package CVX [41, 40] with the initialization z₀ = Π(u − x_NN) and ε = 0.1 in the constraints. All simulations were performed using MATLAB R2021b on an Intel desktop with a 2.60GHz i7-10750H CPU and 16GB DDR4 2933MHz memory. All code used to generate the figures below is publicly available at https://github.com/MarkPhilipRoach/TerminalEmbedding.
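Since the Nonlinearity measure above is just a relative distance between two embeddings of the same test point, it is straightforward to re-implement; the following NumPy sketch (with illustrative names, not taken from the repository above) computes it for one test point.

```python
import numpy as np

def nonlinearity(f_u, Pi, u):
    """Nonlinearity_f(u): relative distance (in percent) between the terminal
    embedding f(u) and the purely linear image (Pi u, 0) of the same point."""
    lin = np.concatenate([Pi @ u, [0.0]])          # (Pi u, 0) in R^{m+1}
    return 100.0 * np.linalg.norm(f_u - lin) / np.linalg.norm(lin)
```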
Figure 4.2 Figures 4.2(a) and 4.2(b) concern the MNIST data set with training set size n = 4000 and test set size n′ = 1000 in all experiments. Similarly, Figures 4.2(c) and 4.2(d) concern the COIL-100 data set with training set size n = 3600 and test set size n′ = 1000 in all experiments. In both Figures 4.2(a) and 4.2(c) the dashed black "NearestNeighbor" line plots the classification accuracy when the Identity map (a) is used in Algorithm 4.2. Note that the "NearestNeighbor" line is independent of m because the identity map involves no compression. Similarly, in all of the Figures 4.2(a) – 4.2(d) the red "TerminalEmbed" curves correspond to the use of Algorithm 4.1 as it is presented to compute highly non-linear terminal embeddings (embedding strategy (d) above), the green "InnerProd" curves correspond to the use of nearly linear terminal embeddings (embedding strategy (c) above), and the blue "Linear" curves correspond to the use of linear JL embedding matrices (embedding strategy (b) above).
Looking at Figure 4.2 one can see that the most non-linear embedding strategy (d) – i.e., Algorithm 4.1 – allows for the best compressed NN classification performance, outperforming standard linear JL embeddings for all choices of m. Perhaps most interestingly, it also quickly converges to the uncompressed NN classification performance, matching it to within 1 percent at the values of m = 24 for MNIST and m = 15 for COIL-100. This corresponds to relative dimensionality reductions of 100(1 − 24/784)% ≈ 96.9% and 100(1 − 15/16384)% ≈ 99.9%, respectively, with negligible loss of NN classification accuracy. As a result, it does indeed appear as if nonlinear terminal embeddings have the potential to allow for improvements in dimensionality reduction in the context of classification beyond what standard linear JL embeddings can achieve.
Of course, challenges remain in the practical application of such nonlinear terminal embeddings. Principally, their computation by, e.g., Algorithm 4.1 is orders of magnitude slower than simply applying a JL embedding matrix to the data one wishes to compressively classify. Nonetheless, if dimension reduction at all costs is one's goal, terminal embeddings appear capable of providing better results than their linear brethren. And, recent theoretical work [18] aimed at lessening their computational deficiencies looks promising.
4.6.4 ADDITIONAL EXPERIMENTS ON EFFECTIVE DISTORTIONS AND RUN TIMES
In this section we further investigate the best performing terminal embedding strategy from the previous section (i.e., Algorithm 4.1) on the MNIST and COIL-100 data sets. In particular, we provide illustrative experiments concerning the improvement of (i) compressive classification accuracy with training set size, and (ii) the effective distortion of the terminal embedding with embedding dimension m + 1. Furthermore, we also investigate (iii) the run time scaling of Algorithm 4.1. To compute the effective distortions of a given (terminal) embedding of training data X, f : R^N → R^{m+1}, over all available test and train data X ∪ S we use
$$\mathrm{MaxDist}_f = \max_{\mathbf{x}\in X}\ \max_{\mathbf{u}\in (S\cup X)\setminus\{\mathbf{x}\}} \frac{\|f(\mathbf{u})-f(\mathbf{x})\|_2}{\|\mathbf{u}-\mathbf{x}\|_2}, \qquad \mathrm{MinDist}_f = \min_{\mathbf{x}\in X}\ \min_{\mathbf{u}\in (S\cup X)\setminus\{\mathbf{x}\}} \frac{\|f(\mathbf{u})-f(\mathbf{x})\|_2}{\|\mathbf{u}-\mathbf{x}\|_2}.$$
Note that these correspond to estimates of the upper and lower multiplicative distortions, respectively, of a given terminal embedding in (4.2). In order to better understand the effect of the minimizer u′ of the minimization problem in Line 2 of Algorithm 4.1 on the final embedding f, we will also separately consider the effective distortions of its component linear JL embedding u ↦ (Πu, 0) below. See Figures 4.3 and 4.4 for such plots using the MNIST and COIL-100 data sets, respectively.
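For concreteness, a small NumPy sketch of these two effective-distortion estimates is given below; the names and the brute-force double loop are illustrative only.

```python
import numpy as np

def effective_distortions(points, f_vals, train_mask):
    """Brute-force estimates of MaxDist_f and MinDist_f over all pairs (u, x)
    with x in the training set X and u ranging over (S union X) minus {x}.

    points     : (n_tot, N)   array holding all of X and S, one point per row
    f_vals     : (n_tot, m+1) array of the corresponding embedded points
    train_mask : (n_tot,)     boolean array flagging the rows belonging to X
    """
    max_d, min_d = 0.0, np.inf
    for i in np.flatnonzero(train_mask):        # x ranges over X
        for j in range(points.shape[0]):        # u ranges over S union X
            if j == i:
                continue
            ratio = (np.linalg.norm(f_vals[j] - f_vals[i])
                     / np.linalg.norm(points[j] - points[i]))
            max_d, min_d = max(max_d, ratio), min(min_d, ratio)
    return max_d, min_d
```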
Figure 4.3 This figure compares (a) compressive NN classification accuracies, and (b) the classification run times of Algorithm 4.2 averaged over all u ∈ S, on the MNIST data set. Three different training data set sizes n = |X| ∈ {1000, 2000, 4000} were fixed as the embedding dimension m + 1 varied for each of the first two subfigures. Recall that the test set size is always fixed to n′ = 1000. In addition, Figure (c) compares MaxDist_f and MinDist_f for the nonlinear f computed by Algorithm 4.1 versus its component linear embedding u ↦ (Πu, 0) as m varies for a fixed embedded training set size of n = 4000.
Figure 4.4 Figures (a) and (b) here are run with identical parameters as for their corresponding subfigures in Figure 4.3, except using the COIL-100 data set. Similarly, Figure (c) compares MaxDist_f and MinDist_f for the nonlinear f computed by Algorithm 4.1 versus its component linear embedding u ↦ (Πu, 0) as m varies for a fixed embedded training set size of n = 3600.
Looking at Figures 4.3 and 4.4 one notes several consistent trends. First, compressive classification accuracy increases with both training set size n and embedding dimension m, as generally expected. Second, compressive classification run times also increase with training set size n (as well as more mildly with embedding dimension m). This is mainly due to the increase in the number of constraints in Line 2 of Algorithm 4.1 with the training set size n. Finally, the distortion plots indicate that the nonlinear terminal embeddings f computed by Algorithm 4.1 tend to preserve the lower distortions of their component linear JL embeddings while simultaneously increasing their upper distortions. As a result, the nonlinear terminal embeddings considered here appear to spread the initially JL embedded data out, perhaps pushing different classes away from one another in the process. If so, it would help explain the increased compressive NN classification accuracy observed for Algorithm 4.1 in Figure 4.2.
4.6.5 ADDITIONAL SIMULATION
Figure 4.5 Example images from the Fashion-MNIST data set.
The Fashion-MNIST data set [68, 22] consists of 60,000 training images of 28 × 28-pixel grayscale images of clothing items, with 10 labels to correctly classify between, and N = 28² = 784. For all experiments involving the Fashion-MNIST dataset, n/10 images of each clothing item are selected uniformly at random to form the training set X, for a total of n vectorized training images in R^784. Then, 100 images of each type are randomly selected from those not used for training in order to form the test set S, leading to a total of n′ = 1000 vectorized test images in R^784. See Figure 4.5 for example Fashion-MNIST images.
Figure 4.6 Figure 4.6(a) compares compressive NN classification accuracies. Three different training data set sizes n = |X| ∈ {1000, 2000, 4000} were fixed as the embedding dimension m + 1 varied. The test set size is again fixed to n′ = 1000. Figure 4.6(b) concerns the algorithmic comparison, as discussed in Figure 4.2, with training set size n = 4000.
4.6.6 DEMONSTRATION OF NON-LINEARITY BY DENSELY APPROXIMATED TERMINAL EMBEDDING OF MANIFOLDS
In this section, we perform numerical simulations to visualize the various ways of embedding from R³ to R². In particular, we will look at the TerminalEmbed, InnerProd, and Linear approaches outlined in Section 4.6.3.
For the set X, we take a randomly generated dense sample of a plane P ⊂ R³, and for S, we take a dense sample of a smooth curve. In all cases, S is formed from evenly spaced samples. Figure 4.7 demonstrates the case where S densely approximates a line passing through this densely sampled plane. Figure 4.8 demonstrates the case where S densely approximates a circle passing through this densely sampled plane. Both examples demonstrate the non-linearity of our TerminalEmbed output in comparison to the InnerProd output, and especially the Linear output.
Figure 4.7 Figure 4.7 demonstrates when S is densely approximating a line. Three different training data set sizes n = |X| ∈ {10², 10³, 10⁴} were used, with n′ = |S| = 100 in all three cases.
Figure 4.8 Figure 4.8 demonstrates when S is densely approximating a circle. Three different training data set sizes n = |X| ∈ {10², 10³, 10⁴} were used, with n′ = |S| = 100 in all three cases.
4.7 COMPRESSED CLASSIFICATION FROM PHASELESS MEASUREMENTS
We now consider applying compressed classification to the measurements generated from Chapters 2 and 3. For near-field ptychographic measurements, we use measurements of the form given in (2.2), using p and m as given in Lemma 2.4.2. We could also perform a similar simulation using far-field ptychographic measurements, where measurements are of the form given in (3.1), using masks given in (2.8). For both of these measurement types, we vectorize the measurements so that we can apply our classification algorithm. For the addition of noise, we apply a Gaussian noise vector to each u ∈ S as given in Algorithm 4.2; that is, we replace u with u + n, n ∈ R^d, before applying the minimization problem from Algorithm 4.1. For our x, we use the grayscale vectorizations of the MNIST and COIL-100 images as in the previous section.
This, however, causes an issue with the size of the space from which we embed. For instance, consider a grayscale image that is P × P pixels. Once vectorized, this is a P²-vector, and thus the matrix of measurements will be P² × P². Once this matrix has itself been vectorized, we finally arrive at a P⁴-vector for our training/testing data. For the MNIST data, this results in vectors of size 28⁴ ≈ 600 thousand, whereas for the COIL-100 data, it results in vectors of size 128⁴ ≈ 268 million. As a consequence, the running times of the algorithm become long. To counteract this, rather than taking the full measurement matrix, we instead sub-sample based on the frequencies. In particular, we simply take the first column for our training/testing data. We will demonstrate that this approach not only allows for successful classification, but also does not impact the results in a significant manner.
First, we demonstrate our results on classifying NFP measurements of the MNIST dataset. Since the vectorized images are of length d = 28² = 784, we choose δ = 25 so that d is divisible by 2δ − 1. (A sketch of how such sub-sampled NFP feature vectors can be formed is given below.)
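Purely as an illustration of the pipeline just described (and not the exact code used for these experiments), the following NumPy sketch forms NFP-style measurements Y_{k,ℓ} = |(p ∗ (S_k m ◦ x))_ℓ|², as written out in Figure 4.11 below, for a vectorized image x and then keeps only the first column as the sub-sampled training/testing feature vector. Here p and m are random stand-ins for the PSF and mask of Lemma 2.4.2.

```python
import numpy as np

def nfp_features(x, p, m, K=1):
    """Sub-sampled NFP feature vector: Y[k, l] = |(p * (S_k m . x))_l|^2,
    circular convolution *, Hadamard product ., keeping frequencies l < K."""
    d = x.size
    Y = np.empty((d, K))
    for k in range(d):
        shifted = np.roll(m, -k) * x          # (S_k m)_n = m_{n+k}, then . x
        conv = np.fft.ifft(np.fft.fft(p) * np.fft.fft(shifted))  # p * (...)
        Y[k] = np.abs(conv[:K]) ** 2
    return Y[:, 0] if K == 1 else Y.ravel()   # K = 1 keeps the first column

rng = np.random.default_rng(0)
d = 784                                       # vectorized 28 x 28 MNIST image
x = rng.random(d)
p, m = rng.random(d), rng.random(d)           # illustrative PSF and mask only
feature = nfp_features(x, p, m, K=1)          # length-d feature vector
```

In the full (non-sub-sampled) setting one would take K up to 2δ − 1, exactly as in the sub-sampling study of Figure 4.11 below.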
Figure 4.9 Figure 4.9(a) compares compressive NN classification accuracies for the MNIST-NFP measurements (δ = 25). Two different training data set sizes n = |X| ∈ {4000, 8000} were fixed as the embedding dimension m + 1 varied. NearestNeighbor refers to the nearest neighbor classification in the original space. Linear refers to the linear embedding described in Section 4.6.3. Figure 4.9(b) concerns the classification in which varying levels of noise are applied, with the training data set size fixed to n = 8000. Noiseless NearestNeighbor refers to the noiseless nearest neighbor classification in the original space. For both figures, the test set size is fixed to n′ = 1000.
For COIL-100 we encounter an issue in choosing a suitable δ, since the vectorized COIL-100 images are of length d = 128² = 16,384 = 2^14; no δ exists for which d is divisible by 2δ − 1, since any 2δ − 1 > 1 would be an odd divisor of a power of two. Instead, we choose a δ that approximates the ratio d/δ used in the MNIST-NFP simulations above. We then artificially extend d using the same process discussed in Section 2.8 to ensure divisibility.
Figure 4.10 Figure 4.10(a) compares compressive NN classification accuracies for the COIL-NFP measurements (δ = 525). Two different training data set sizes n = |X| ∈ {1800, 3600} were fixed as the embedding dimension m + 1 varied. NearestNeighbor refers to the nearest neighbor classification in the original space. Linear refers to the linear embedding described in Section 4.6.3. Figure 4.10(b) concerns the classification in which varying levels of noise are applied, with the training data set size fixed to n = 3600. Noiseless NearestNeighbor refers to the noiseless nearest neighbor classification in the original space. For both figures, the test set size is fixed to n′ = 1000.
Finally, we look at the effect of sub-sampling the frequency index, using simply the nearest neighbor classification in the original space.
Figure 4.11 Figure representing the non-compressed NN classification of the MNIST-NFP measurements (δ = 25) with varying levels of sub-sampling; that is, Y_{k,ℓ} = |(p ∗ (S_k m ◦ x))_ℓ|², (k, ℓ) ∈ [d]₀ × [K]₀, with varying levels for K, up to 2δ − 1.
CHAPTER 5 CONTRIBUTIONS AND FUTURE WORK
Herein we outline a summary of contributions and possible areas for future work. In Chapter 2, we proved that for a certain given point spread function and mask, we could provide a recovery guarantee theorem when using a lifted linear approach. We developed a weighted spectral gap lower bound which we then applied to our specific problem. We introduced two new algorithms for recovering a specimen of interest from near-field ptychographic measurements. Both of these algorithms rely on first reformulating and reshaping our measurements so that they resemble widely-studied far-field ptychographic measurements. We then recover our specimen using either Wirtinger Flow or methods based on [54]. Algorithm 2.1 is computationally efficient and, to the best of our knowledge, is the first algorithm with provable recovery guarantees for measurements of this form. Algorithm 2.2, on the other hand, has the advantage of applying to more general masks with global support. Developing more efficient and provably accurate algorithms for this latter class of measurements remains an interesting avenue for future work.
In Chapter 3, we developed a novel approach for solving blind far-field ptychography. We introduced an algorithm for recovering a specimen of interest from blind far-field ptychographic measurements. This algorithm relies on reformulating the measurements so that they resemble widely-studied blind deconvolutional measurements. This leads to transposed Khatri-Rao product estimates of our specimen which are then able to be recovered by angular synchronization.
We then use these estimates in applying inverse Fourier transforms, point-wise division, and angular synchronization to recover estimates for the mask. Finally, we use a best-error-estimate sorting algorithm to find the final estimate of both the specimen and mask. As shown in the numerical results, Algorithm 3.6 recovers both the sample and mask within a good margin of error. It is also stable under noise. A further goal for this research would be to adapt the existing recovery guarantee theorems for the selected blind deconvolutional recovery algorithm, in which the assumed Gaussian matrix C is replaced with the Khatri-Rao matrix C′(k) = C • S_k C̄. In particular, this would mean providing alternate inequalities for the four key conditions laid out in Theorem 3.4.6.
In Chapter 4, we generalized the Johnson-Lindenstrauss lemma for manifolds. In particular, we let M be a compact d-dimensional submanifold of R^N with reach τ and volume V_M, and we proved that for all ε ∈ (0, 1), a nonlinear function f : R^N → R^m exists with
$$m \le C \frac{d}{\epsilon^2} \log\!\left(\frac{\sqrt{d}\, V_{\mathcal{M}}}{\tau}\right)$$
such that
$$(1-\epsilon)\|\mathbf{x}-\mathbf{y}\|_2 \le \|f(\mathbf{x})-f(\mathbf{y})\|_2 \le (1+\epsilon)\|\mathbf{x}-\mathbf{y}\|_2$$
holds for all x ∈ M and y ∈ R^N. The proof is constructive and yields an algorithm which works well in practice. In particular, we empirically demonstrated herein that such nonlinear functions allow for more accurate compressive nearest neighbor classification than standard linear Johnson-Lindenstrauss embeddings do in practice. Furthermore, it was demonstrated that this approach works when the labelled data consists of NFP measurements. Future work in this area would be to develop more computationally efficient algorithms for computing terminal embeddings. Additionally, one could explore the upper Lipschitz constants of terminal embeddings of sets with reach > 0 in small tubes around the set. Achieving these might allow for new results to be proven in machine learning application contexts.
BIBLIOGRAPHY
[1] Ali Ahmed, Alireza Aghasi, and Paul Hand. Blind deconvolutional phase retrieval via convex programming. Advances in Neural Information Processing Systems, 31, 2018.
[2] Ali Ahmed, Benjamin Recht, and Justin Romberg. Blind deconvolution using convex programming. IEEE Transactions on Information Theory, 60(3):1711–1732, 2013.
[3] Jacopo Antonello and Michel Verhaegen. Modal-based phase retrieval for adaptive optics. JOSA A, 32(6):1160–1170, 2015.
[4] GR Ayers and J Christopher Dainty. Iterative blind deconvolution method and its applications. Optics letters, 13(7):547–549, 1988.
[5] Richard G Baraniuk and Michael B Wakin. Random projections of smooth manifolds. Foundations of computational mathematics, 9(1):51–77, 2009.
[6] Robert Beinert and Gerlind Plonka. Ambiguities in one-dimensional discrete phase retrieval from fourier magnitudes. Journal of Fourier Analysis and Applications, 21(6):1169–1198, 2015.
[7] Tamir Bendory, Robert Beinert, and Yonina C Eldar. Fourier phase retrieval: Uniqueness and algorithms. In Compressed Sensing and its Applications, pages 55–91. Springer, 2017.
[8] JR Bond and G Efstathiou. The statistics of cosmic background radiation fluctuations. Monthly Notices of the Royal Astronomical Society, 226(3):655–687, 1987.
[9] Joe Buhler and Zinovy Reichstein. Symmetric functions and the phase problem in crystallography. Transactions of the American Mathematical Society, 357(6):2353–2377, 2005.
[10] Oliver Bunk, Martin Dierolf, Sรธren Kynde, Ian Johnson, Othmar Marti, and Franz Pfeiffer. Influence of the overlap parameter on the convergence of the ptychographical iterative engine. Ultramicroscopy, 108(5):481โ€“487, 2008. [11] Imre Bรกrรกny. A generalization of carathรฉodoryโ€™s theorem. Discrete Mathematics, 40(2):141โ€“ 152, 1982. [12] Emmanuel J Candรจs, Yonina C Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM review, 57(2):225โ€“251, 2015. [13] Emmanuel J. Candรจs, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis, 39(2):277 โ€“ 299, 2015. [14] Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985โ€“2007, 111 2015. [15] Alfred S Carasso. Direct blind deconvolution. SIAM Journal on Applied Mathematics, 61(6):1980โ€“2007, 2001. [16] Huibin Chang, Pablo Enfedaque, and Stefano Marchesini. Blind ptychographic phase re- trieval via convergent alternating direction method of multipliers. SIAM Journal on Imaging Sciences, 12(1):153โ€“185, 2019. [17] Huibin Chang, Li Yang, and Stefano Marchesini. Fast iterative algorithms for blind phase retrieval: A survey. arXiv preprint arXiv:2211.06619, 2022. [18] Yeshwanth Cherapanamjeri and Jelani Nelson. Terminal embeddings in sublinear time. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1209โ€“1216. IEEE, 2022. [19] Jesse N Clark, Xiaojing Huang, Ross J Harder, and Ian K Robinson. Continuous scanning mode for ptychography. Optics letters, 39(20):6066โ€“6069, 2014. [20] Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Compressed sensing and best ๐‘˜-term approximation. Journal of the American mathematical society, 22(1):211โ€“231, 2009. [21] Mark A Davenport, Marco F Duarte, Michael B Wakin, Jason N Laska, Dharmpal Takhar, Kevin F Kelly, and Richard G Baraniuk. The smashed filter for compressive classification and target recognition. In Computational Imaging V, volume 6498, page 64980H. International Society for Optics and Photonics, 2007. [22] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141โ€“142, 2012. [23] Chao Dong, Yubin Deng, Chen Change Loy, and Xiaoou Tang. Compression artifacts reduction by a deep convolutional network. In Proceedings of the IEEE international conference on computer vision, pages 576โ€“584, 2015. [24] Jan Drenth. Principles of protein X-ray crystallography. Springer Science & Business Media, 2007. [25] George H Dunteman. Principal components analysis. Number 69. Sage, 1989. [26] TB Edo, DJ Batey, AM Maiden, C Rau, U Wagner, ZD Pevsiฤ‡, TA Waigh, and JM Rodenburg. Sampling in x-ray ptychography. Physical Review A, 87(5):053850, 2013. [27] Armin Eftekhari and Michael B Wakin. New analysis of manifold embeddings and signal recovery from compressive measurements. Applied and Computational Harmonic Analysis, 39(1):67โ€“109, 2015. 112 [28] Yonina C Eldar, Pavel Sidorenko, Dustin G Mixon, Shaby Barel, and Oren Cohen. Sparse phase retrieval from short-time fourier measurements. IEEE Signal Processing Letters, 22(5):638โ€“642, 2014. [29] Michael Elkin, Arnold Filtser, and Ofer Neiman. Terminal embeddings. arXiv preprint arXiv:1603.02321, 2016. [30] Albert Fannjiang and Pengwen Chen. Blind ptychography: uniqueness and ambiguities. Inverse Problems, 36(4):045005, 2020. 
[31] Albert Fannjiang and Zheqing Zhang. Fixed point analysis of douglasโ€“rachford splitting for ptychography and phase retrieval. SIAM Journal on Imaging Sciences, 13(2):609โ€“650, 2020. [32] Herbert Federer. Curvature measures. Transactions of the American Mathematical Society, 93(3):418โ€“418, March 1959. [33] Frank Filbir, Felix Krahmer, and Oleh Melnyk. On recovery guarantees for angular synchro- nization. Journal of Fourier Analysis and Applications, 27(2):1โ€“26, 2021. [34] DA Fish, AM Brinicombe, ER Pike, and JG Walker. Blind deconvolution by means of the richardsonโ€“lucy algorithm. JOSA A, 12(1):58โ€“65, 1995. [35] Horacio E Fortunato and Manuel M Oliveira. Fast high-quality non-blind deconvolution using sparse adaptive priors. The Visual Computer, 30(6):661โ€“671, 2014. [36] Grant R Fowles. Introduction to modern optics. Courier Corporation, 1989. [37] Si Gao, Peng Wang, Fucai Zhang, Gerardo T Martinez, Peter D Nellist, Xiaoqing Pan, and Angus I Kirkland. Electron ptychographic microscopy for three-dimensional imaging. Nature communications, 8(1):1โ€“8, 2017. [38] Pierre Godard, Marc Allain, and Virginie Chamard. Imaging of highly inhomogeneous strain field in nanocrystals using x-ray bragg ptychography: A numerical study. Physical Review B, 84(14):144109, 2011. [39] JW Goodman. Film-grain noise in wavefront-reconstruction imaging. JOSA, 57(4):493โ€“502, 1967. [40] Michael Grant and Stephen Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95โ€“110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html. [41] Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex program- 113 ming, version 2.1. http://cvxr.com/cvx, March 2014. [42] D Griffin, D Deadrick, and Jae Lim. Speech synthesis from short-time fourier transform magnitude and its application to speech processing. In ICASSPโ€™84. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 9, pages 61โ€“64. IEEE, 1984. [43] R Hegerl and W Hoppe. Phase evaluation in generalized diffraction (ptychography). Proc. Fifth Eur. Cong. Electron Microscopy, pages 628โ€“629, 1972. [44] Reiner Hegerl and W Hoppe. Dynamische theorie der kristallstrukturanalyse durch elektro- nenbeugung im inhomogenen primรคrstrahlwellenfeld. Berichte der Bunsengesellschaft fรผr physikalische Chemie, 74(11):1148โ€“1154, 1970. [45] W Hoppe. Diffraction in inhomogeneous primary wave fields. Acta Crystallogr. A 25, pages 495โ€“501,508โ€“515, 1969. [46] SO Hruszkewycz, Marc Allain, MV Holt, CE Murray, JR Holt, PH Fuoss, and Virginie Chamard. High-resolution three-dimensional structural microscopy by single-angle bragg ptychography. Nature materials, 16(2):244โ€“251, 2017. [47] Xiaojing Huang, Kenneth Lauer, Jesse N Clark, Weihe Xu, Evgeny Nazaretski, Ross Harder, Ian K Robinson, and Yong S Chu. Fly-scan ptychography. Scientific reports, 5(1):1โ€“5, 2015. [48] IBM. https://www.ibm.com/topics/knn. [49] M. A. Iwen, B. Preskitt, R. Saab, and A. Viswanathan. Phase retrieval from local measure- ments: improved robustness via eigenvector-based angular synchronization. Applied and Computational Harmonic Analysis, 48:415 โ€“ 444, 2020. [50] Mark Iwen, Michael Perlmutter, and Mark Philip Roach. Toward fast and provably accurate near-field ptychographic phase retrieval. Sampling Theory, Signal Processing, and Data Analysis, 21(1):6, 2023. 
[51] Mark Iwen, Arman Tavakoli, and Benjamin Schmidt. Lower bounds on the low-distortion embedding dimension of submanifolds of R๐‘› . arXiv preprint arXiv:2105.13512, 2021. [52] Mark A Iwen, Deanna Needell, Elizaveta Rebrova, and Ali Zare. Lower memory oblivious (tensor) subspace embeddings with fewer random bits: modewise methods for least squares. SIAM Journal on Matrix Analysis and Applications, 42(1):376โ€“416, 2021. [53] Mark A. Iwen, Benjamin Schmidt, and Arman Tavakoli. On fast johnson-lindenstrauss embeddings of compact submanifolds of R๐‘ with boundary. Arxiv, 2110.04193, 2021. [54] Mark A. Iwen, Aditya Viswanathan, and Yang Wang. Fast Phase Retrieval from Local 114 Correlation Measurements. SIAM Journal on Imaging Sciences, 9(4):1655โ€“1688, 2016. [55] Kishore Jaganathan, Yonina Eldar, and Babak Hassibi. Phase retrieval with masks using convex optimization. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 1655โ€“1659. IEEE, 2015. [56] Kishore Jaganathan, Yonina C Eldar, and Babak Hassibi. Stft phase retrieval: Uniqueness guarantees and recovery algorithms. IEEE Journal of selected topics in signal processing, 10(4):770โ€“781, 2016. [57] Francis Arthur Jenkins and Harvey Elliott White. Fundamentals of optics. Indian Journal of Physics, 25:265โ€“266, 1957. [58] Yi Jiang, Zhen Chen, Yimo Han, Pratiti Deb, Hui Gao, Saien Xie, Prafull Purohit, Mark W Tate, Jiwoong Park, Sol M Gruner, et al. Electron ptychography of 2d materials to deep sub-รฅngstrรถm resolution. Nature, 559(7714):343โ€“349, 2018. [59] GH John, R Kohavi, and K Pfleger. Machine learning: proceedings of the eleventh interna- tional conference. Irrelevant features and the subset selection problem, 1994. [60] William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics, 26:189โ€“206, 1984. [61] Aggelos K Katsaggelos and Kuen-Tsair Lay. Maximum likelihood blur identification and im- age restoration using the em algorithm. IEEE Transactions on Signal Processing, 39(3):729โ€“ 733, 1991. [62] Samina Khalid, Tehmina Khalil, and Shamila Nasreen. A survey of feature selection and fea- ture extraction techniques in machine learning. In 2014 science and information conference, pages 372โ€“378. IEEE, 2014. [63] Mojzesz Kirszbraun. รœber die zusammenziehende und lipschitzsche transformationen. Fun- damenta Mathematicae, 22(1):77โ€“108, 1934. [64] Deepa Kundur and Dimitrios Hatzinakos. Blind image restoration via recursive filtering using deterministic constraints. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, volume 4, pages 2283โ€“2286. IEEE, 1996. [65] Deepa Kundur and Dimitrios Hatzinakos. A novel blind deconvolution scheme for image restoration using recursive filtering. IEEE transactions on signal processing, 46(2):375โ€“390, 1998. [66] Marcus Frederick Charles Ladd, Rex Alfred Palmer, and Rex Alfred Palmer. Structure determination by X-ray crystallography, volume 233. Springer, 1977. 115 [67] Kasper Green Larsen and Jelani Nelson. Optimality of the johnson-lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633โ€“638. IEEE, 2017. [68] Y. LeCun, C. Cortes, and C.J.C. Burges. The mnist database of handwritten digits. 1998. [69] Anat Levin, Yair Weiss, Fredo Durand, and William T Freeman. Understanding blind deconvolution algorithms. IEEE transactions on pattern analysis and machine intelligence, 33(12):2354โ€“2367, 2011. 
[70] Peng Li, Nicholas W Phillips, Steven Leake, Marc Allain, Felix Hofmann, and Virginie Chamard. Revealing nano-scale lattice distortions in implanted material with 3d bragg ptychography. Nature communications, 12(1):1โ€“13, 2021. [71] Xiaodong Li, Shuyang Ling, Thomas Strohmer, and Ke Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Applied and computational harmonic analysis, 47(3):893โ€“934, 2019. [72] Aristidis C Likas and Nikolas P Galatsanos. A variational approach for bayesian blind image deconvolution. IEEE transactions on signal processing, 52(8):2222โ€“2233, 2004. [73] Z-C Liu, Rui Xu, and Y-H Dong. Phase retrieval in protein crystallography. Acta Crystallo- graphica Section A: Foundations of Crystallography, 68(2):256โ€“265, 2012. [74] Deepali Lodhia, Daniel Brown, Frank Brueckner, Ludovico Carbone, Paul Fulda, Keiko Kokeyama, and Andreas Freise. Phase effects due to beam misalignment on diffraction gratings. arXiv preprint arXiv:1303.7016, 2013. [75] Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, and Ilya Razenshteyn. Non- linear dimension reduction via outer bi-lipschitz extensions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1088โ€“1101, 2018. [76] Andrew Maiden, Daniel Johnson, and Peng Li. Further improvements to the ptychographical iterative engine. Optica, 4(7):736โ€“745, 2017. [77] Andrew M Maiden and John M Rodenburg. An improved ptychographical phase retrieval algorithm for diffractive imaging. Ultramicroscopy, 109(10):1256โ€“1262, 2009. [78] Sami Eid Merhi. Phase Retrieval from Continuous and Discrete Ptychographic Measure- ments. Michigan State University, 2019. [79] Rafael Molina, Aggelos K Katsaggelos, Javier Abad, and Javier Mateos. A bayesian ap- proach to blind deconvolution based on dirichlet distributions. In 1997 IEEE international conference on acoustics, speech, and signal processing, volume 4, pages 2809โ€“2812. IEEE, 1997. 116 [80] Shyam Narayanan and Jelani Nelson. Optimal terminal dimensionality reduction in euclidean space. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1064โ€“1069, 2019. [81] Shyam Narayanan and Jelani Nelson. Optimal terminal dimensionality reduction in euclidean space. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1064โ€“1069, 2019. [82] S Nawab, T Quatieri, and Jae Lim. Signal reconstruction from short-time fourier transform magnitude. IEEE Transactions on Acoustics, Speech, and Signal Processing, 31(4):986โ€“998, 1983. [83] Sameer A Nene, Shree K Nayar, and Hiroshi Murase. Columbia object image library (coil-100). 1996. [84] J v Neumann. Zur theorie der gesellschaftsspiele. Mathematische annalen, 100(1):295โ€“320, 1928. [85] Michal Odstrvcil, Mirko Holler, and Manuel Guizar-Sicairos. Arbitrary-path fly-scan pty- chography. Optics express, 26(10):12585โ€“12593, 2018. [86] Byung Tae Oh, Shaw-min Lei, and C-C Jay Kuo. Advanced film grain noise extraction and synthesis for high-definition video coding. IEEE transactions on circuits and systems for video technology, 19(12):1717โ€“1729, 2009. [87] Olympus. https://www.olympus-lifescience.com/en/microscope-resource/primer/ techniques/oblique/obliqueintro/. [88] Xiaoze Ou, Roarke Horstmeyer, Guoan Zheng, and Changhuei Yang. High numerical aper- ture fourier ptychography: principle, implementation and characterization. Optics express, 23(3):3472โ€“3491, 2015. [89] Michael Perlmutter, Sami Merhi, Aditya Viswanathan, and Mark Iwen. 
Inverting spectro- gram measurements via aliased wigner distribution deconvolution and angular synchroniza- tion. Information and Inference: A Journal of the IMA, 2020. [90] Franz Pfeiffer. X-ray ptychography. Nature Photonics, 12(1):9โ€“17, 2018. [91] M Portnoff. Short-time fourier analysis of sampled speech. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(3):364โ€“373, 1981. [92] Brian Preskitt and Rayan Saab. Admissible measurements and robust algorithms for pty- chography. Journal of Fourier Analysis and Applications, 27(2):1โ€“39, 2021. [93] Brian P Preskitt. Phase retrieval from locally supported measurements. University of 117 California, San Diego, 2018. [94] Klaus G Puschmann and Franz Kneer. On super-resolution in astronomical imaging. As- tronomy & Astrophysics, 436(1):373โ€“378, 2005. [95] Jianliang Qian, Chao Yang, A Schirotzek, F Maia, and S Marchesini. Efficient algorithms for ptychographic phase retrieval. Inverse Problems and Applications, Contemp. Math, 615:261โ€“280, 2014. [96] JM Rodenburg. Ptychography and related diffractive imaging methods. Advances in Imaging and Electron Physics, 150:87โ€“184, 2008. [97] P Rosero-Montalvo, P Diaz, Jose Alejandro Salazar-Castro, DF Pena-Unigarro, Andres J Anaya-Isaza, Juan C Alvarado-Pรฉrez, Roberto Therรณn, and Diego Hernรกn Peluffo-Ordรณรฑez. Interactive data visualization using dimensionality reduction and similarity-based repre- sentations. In IberoAmerican Congress on Pattern Recognition, pages 334โ€“342. Springer, 2017. [98] P Yu Rotha and David M Paganin. Blind phase retrieval for aberrated linear shift-invariant imaging systems. New Journal of Physics, 12(7):073040, 2010. [99] John W Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on computers, 100(5):401โ€“409, 1969. [100] Pavel Sidorenko and Oren Cohen. Single-shot ptychography. Optica, 3(1):9โ€“14, 2016. [101] VI Slyusar. A family of face products of matrices and its properties. Cybernetics and systems analysis, 35(3):379โ€“384, 1999. [102] George F Smoot and Douglas Scott. Cosmic background radiation. The European Physical Journal C-Particles and Fields, 15:145โ€“149, 2000. [103] MS Smyth and JHJ Martin. x ray crystallography. Molecular Pathology, 53(1):8, 2000. [104] D.A. Spielman. Spectral and algebraic graph theory. Incomplete Draft, Yale University, 2019. [105] Filip Sroubek and Jan Flusser. Multichannel blind iterative image restoration. IEEE Trans- actions on Image Processing, 12(9):1094โ€“1106, 2003. [106] J-L Starck and Fionn Murtagh. Astronomical image and data analysis. 2007. [107] Marco Stockmar, Peter Cloetens, Irene Zanette, Bjoern Enders, Martin Dierolf, Franz Pfeif- fer, and Pierre Thibault. Near-field ptychography: phase retrieval for inline holography using a structured illumination. Scientific reports, 3(1):1โ€“6, 2013. 118 [108] Marco Stockmar, Irene Zanette, Martin Dierolf, Bjoern Enders, Richard Clare, Franz Pfeiffer, Peter Cloetens, Anne Bonnin, and Pierre Thibault. X-ray near-field ptychography for optically thick specimens. Physical Review Applied, 3(1):014005, 2015. [109] Yukio Takahashi, Akihiro Suzuki, Shin Furutaku, Kazuto Yamauchi, Yoshiki Kohmura, and Tetsuya Ishikawa. Bragg x-ray ptychography of a silicon crystal: Visualization of the dislo- cation strain field and the production of a vortex beam. Physical Review B, 87(12):121201, 2013. [110] Pierre Thibault, Martin Dierolf, Oliver Bunk, Andreas Menzel, and Franz Pfeiffer. Probe retrieval in ptychographic coherent diffractive imaging. 
Ultramicroscopy, 109(4):338–343, 2009.
[111] Pierre Thibault, Martin Dierolf, Andreas Menzel, Oliver Bunk, Christian David, and Franz Pfeiffer. High-resolution scanning x-ray diffraction microscopy. Science, 321(5887):379–382, 2008.
[112] Eric Thiébaut and J-M Conan. Strict a priori constraints for maximum-likelihood blind deconvolution. JOSA A, 12(3):485–492, 1995.
[113] Christoph Thäle. 50 years sets with positive reach–a survey. Surveys in Mathematics and its Applications, 3:123–165, 2008. Publisher: University Constantin Brancusi.
[114] Lei Tian, Xiao Li, Kannan Ramchandran, and Laura Waller. Multiplexed coded illumination for fourier ptychography with an led array microscope. Biomedical optics express, 5(7):2376–2389, 2014.
[115] Esther HR Tsai, Ivan Usov, Ana Diaz, Andreas Menzel, and Manuel Guizar-Sicairos. X-ray ptychography with extended depth of field. Optics express, 24(25):29089–29108, 2016.
[116] Robert N Tubbs. Lucky exposures: Diffraction limited astronomical imaging through the atmosphere. arXiv preprint astro-ph/0311481, 2003.
[117] Roman Vershynin. High-dimensional probability: an introduction with applications in data science. Number 47 in Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, New York, NY, 2018.
[118] Aditya Viswanathan and Mark Iwen. Fast angular synchronization for phase retrieval via incomplete information. In Wavelets and Sparsity XVI, volume 9597, page 959718. International Society for Optics and Photonics, 2015.
[119] Adriaan Walther. The question of phase retrieval in optics. Optica Acta: International Journal of Optics, 10(1):41–49, 1963.
[120] DP Woody and PL Richards. Spectrum of the cosmic background radiation. Physical Review Letters, 42(14):925, 1979.
[121] Michael M Woolfson and Michael Mark Woolfson. An introduction to X-ray crystallography. Cambridge University Press, 1997.
[122] Shixiang Wu, Chao Dong, and Yu Qiao. Blind image restoration based on cycle-consistent network. IEEE Transactions on Multimedia, 2022.
[123] H Yang, RN Rutte, L Jones, M Simson, R Sagawa, H Ryll, M Huth, TJ Pennycook, MLH Green, H Soltau, et al. Simultaneous atomic-resolution electron ptychography and z-contrast imaging of light and heavy elements in complex nanostructures. Nature Communications, 7(1):1–8, 2016.
[124] Yu-Li You and Mostafa Kaveh. Blind image restoration by anisotropic regularization. IEEE Transactions on Image Processing, 8(3):396–407, 1999.
[125] He Zhang, Shaowei Jiang, Jun Liao, Junjing Deng, Jian Liu, Yongbing Zhang, and Guoan Zheng. Near-field fourier ptychography: super-resolution phase retrieval via speckle illumination. Optics express, 27(5):7498–7512, 2019.
[126] Guoan Zheng. Fourier ptychographic imaging: a MATLAB tutorial. Morgan & Claypool Publishers, 2016.
[127] Guoan Zheng, Roarke Horstmeyer, and Changhuei Yang. Wide-field, high-resolution fourier ptychographic microscopy. Nature photonics, 7(9):739–745, 2013.
[128] Guoan Zheng, Cheng Shen, Shaowei Jiang, Pengming Song, and Changhuei Yang. Concept, implementations and applications of fourier ptychography. Nature Reviews Physics, 3(3):207–223, 2021.
APPENDIX A NEAR-FIELD PTYCHOGRAPHY
A.1 TECHNICAL LEMMAS
We state the following lemmas for the sake of completeness. Note that we index all vectors modulo d, and that the inner product below is conjugate-linear in its second argument.
Lemma A.1.1. Let x, y ∈ C^d. We have that |⟨x, y⟩|² = |⟨x̄, ȳ⟩|².
Lemma A.1.2. Let x, y, z ∈ C^d. We have that ⟨x, y ◦ z⟩ = ⟨x ◦ ȳ, z⟩.
Proof. By the definition of the inner product and the Hadamard product,
$$\langle \mathbf{x}, \mathbf{y} \circ \mathbf{z}\rangle = \sum_{n=0}^{d-1} x_n \overline{(\mathbf{y} \circ \mathbf{z})_n} = \sum_{n=0}^{d-1} x_n \bar{y}_n \bar{z}_n = \sum_{n=0}^{d-1} (\mathbf{x} \circ \bar{\mathbf{y}})_n \bar{z}_n = \langle \mathbf{x} \circ \bar{\mathbf{y}}, \mathbf{z}\rangle.$$
Lemma A.1.3. Let x, y ∈ C^d, k ∈ Z. We have that
1. (x ∗ y)_k = ⟨S₋ₖx̃, ȳ⟩;
2. x ◦ Sₖy = Sₖ(S₋ₖx ◦ y);
3. ⟨Sₖx, y⟩ = ⟨x, S₋ₖy⟩.
Proof. Proof of 1: Let x, y ∈ C^d. By the definition of the circular convolution,
$$(\mathbf{x} \ast \mathbf{y})_k = \sum_{n=0}^{d-1} x_{k-n}\, y_n = \sum_{n=0}^{d-1} (S_{-k}\tilde{\mathbf{x}})_n\, y_n = \langle S_{-k}\tilde{\mathbf{x}}, \bar{\mathbf{y}}\rangle.$$
Proof of 2: Let x, y ∈ C^d, k ∈ Z, and let n ∈ [d]₀ be arbitrary. Then we have that
$$(\mathbf{x} \circ S_k\mathbf{y})_n = x_n (S_k\mathbf{y})_n = (S_{-k}\mathbf{x})_{n+k}\, y_{n+k} = (S_k(S_{-k}\mathbf{x} \circ \mathbf{y}))_n.$$
Proof of 3: Noting that we index modulo d, we have that
$$\langle S_k\mathbf{x}, \mathbf{y}\rangle = \sum_{n=0}^{d-1} x_{n+k}\,\bar{y}_n = \sum_{n=0}^{d-1} x_n\,\bar{y}_{n-k} = \langle \mathbf{x}, S_{-k}\mathbf{y}\rangle.$$
We now give our proof of Lemma 2.5.1.
Proof of Lemma 2.5.1. Fix φ ∈ [0, 2π). By the triangle inequality we have
$$\|\mathbf{x} - e^{i\phi}\mathbf{x}_{\mathrm{est}}\|_2 = \|\mathbf{x}^{(\mathrm{mag})} \circ \mathbf{x}^{(\theta)} - \mathbf{x}_{\mathrm{est}}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_2 \le \|\mathbf{x}^{(\mathrm{mag})} \circ \mathbf{x}^{(\theta)} - \mathbf{x}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_2 + \|\mathbf{x}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)} - \mathbf{x}_{\mathrm{est}}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_2. \quad (\mathrm{A.1})$$
For the first term, we may use the inequality ∥u ◦ v∥₂ ≤ ∥u∥_∞ ∥v∥₂ to see that
$$\|\mathbf{x}^{(\mathrm{mag})} \circ \mathbf{x}^{(\theta)} - \mathbf{x}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_2 \le \|\mathbf{x}^{(\mathrm{mag})}\|_\infty \|\mathbf{x}^{(\theta)} - e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_2 = \|\mathbf{x}\|_\infty \|\mathbf{x}_{\mathrm{est}}^{(\theta)} - e^{-i\phi}\mathbf{x}^{(\theta)}\|_2. \quad (\mathrm{A.2})$$
For the second term, since the entries of the phase vector have unit modulus, we see that
$$\|\mathbf{x}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)} - \mathbf{x}_{\mathrm{est}}^{(\mathrm{mag})} \circ e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_2 \le \|e^{i\phi}\mathbf{x}_{\mathrm{est}}^{(\theta)}\|_\infty \cdot \|\mathbf{x}^{(\mathrm{mag})} - \mathbf{x}_{\mathrm{est}}^{(\mathrm{mag})}\|_2 = \|\mathbf{x}^{(\mathrm{mag})} - \mathbf{x}_{\mathrm{est}}^{(\mathrm{mag})}\|_2. \quad (\mathrm{A.3})$$
Combining (A.2) and (A.3) with (A.1) and minimizing over φ completes the proof.
A.2 AUXILIARY RESULTS FROM SPECTRAL GRAPH THEORY
In this section, we will prove several lemmas related to the graph Laplacian and its eigenvalues. The following definition defines a partial ordering on the set of weighted graphs induced by the spectrum of their graph Laplacians.
Definition A.2.1. We say that a symmetric matrix A is positive semi-definite and write A ⪰ 0 if x^T A x ≥ 0 for all x ∈ R^n (or equivalently, if all the eigenvalues of A are non-negative). We define the Loewner order¹ ⪰ by the rule that A ⪰ B if A − B is positive semi-definite (or equivalently, if x^T A x ≥ x^T B x for all x ∈ R^n). For two graphs G and H with the same number of vertices, we will define G ⪰ H if L_G ⪰ L_H. We will also write G ⪰ Σ_{i=0}^{n−1} H_i if L_G ⪰ Σ_{i=0}^{n−1} L_{H_i}, and for a scalar c we will write G ⪰ cH if L_G ⪰ cL_H.
¹The Loewner order is actually a partial ordering since there exist A and B such that neither A ⪰ B nor B ⪰ A holds.
Remark A.2.1. If G ⪰ H and τ_G and τ_H are the smallest non-zero eigenvalues of L_G and L_H, then one can use the fact that τ_G = min_{x∈R^n, x⊥1} (x^T L_G x)/(x^T x) (see [104]) to verify that τ_G ≥ τ_H.
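As a quick numerical illustration of Remark A.2.1, the following NumPy sketch (our own toy example, not taken from the text) computes the spectral gap τ_G as the second-smallest eigenvalue of the weighted graph Laplacian, and checks that adding an edge, which adds a positive semi-definite term to L_G and hence makes the new graph dominate the old one in the Loewner order, can only increase the gap.

```python
import numpy as np

def spectral_gap(W):
    """Return tau_G, the smallest nonzero Laplacian eigenvalue, for a
    connected weighted graph with symmetric weight matrix W (L_G = D - W)."""
    L = np.diag(W.sum(axis=1)) - W
    return np.sort(np.linalg.eigvalsh(L))[1]   # eigenvalue 0 is discarded

# A small weighted graph on 4 vertices.
W = np.zeros((4, 4))
for i, j, w in [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 0.5)]:
    W[i, j] = W[j, i] = w
gap_before = spectral_gap(W)

# Adding an edge adds a PSD rank-one Laplacian term, so the new graph
# dominates the old one in the Loewner order and the gap cannot decrease.
W[0, 2] = W[2, 0] = 1.0
assert spectral_gap(W) >= gap_before - 1e-12
```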
We now define some basic terminology for weighted graphs. (We note that these definitions may also be applied to unweighted graphs by interpreting each edge as having weight one.)
Definition A.2.2. (Weighted Distance Definitions) Let G = (V, E, W) be a weighted graph.
(i) For any subgraph H = (V′, E′) of G, we define the weight of H, denoted w(H), as w(H) := Σ_{(i,j)∈E′} W_{i,j}.
(ii) If P is a path inside G, we will let len(P) := w(P) denote the weighted length of P.
(iii) We define the weighted distance between two vertices u and v, dist_G(u, v), to be the minimal weighted length of any path from u to v.
(iv) The weighted diameter of G, denoted by diam(G), is the maximum distance between any two vertices in G, that is, diam(G) := max{dist_G(u, v) | (u, v) ∈ V × V}.
In some contexts, it will be useful to consider the pointwise inverses of the weights W_{i,j}.
Definition A.2.3. (Inverse Weighted Distance Definitions) Let G = (V, E, W) be a weighted graph.
(i) For any subgraph H = (V′, E′) of G, the inverse weight of H is defined by w^{−1}(H) := Σ_{(i,j)∈E′} 1/W_{i,j}.
(ii) For a path P inside G, we refer to len^{−1}(P) := w^{−1}(P) as the inverse weighted length of P.
(iii) For two vertices u and v we will refer to the minimal value of w^{−1}(P) over all paths P from u to v as the inverse weighted distance, denoted by dist_G^{−1}(u, v).
(iv) The inverse weighted diameter of G, denoted by diam^{−1}(G), is the maximum inverse weighted distance between any two vertices in G, that is, diam^{−1}(G) := max{dist_G^{−1}(u, v) | (u, v) ∈ V × V}.
The proof of Lemma 2.5.4 (and thus Theorem 2.2.1) relies on the following lemma to provide a lower bound for the spectral gap τ_G.
Lemma A.2.1. (Weighted Spectral Bound) Let G = (V, E, W) be a weighted, connected graph with |V| = n, and let W_min and W_max denote the minimum and maximum value of any of the (nonzero) weights of G. Then
$$\tau_G \ge \frac{2 \cdot W_{\min}}{W_{\max}\,(n-1)\cdot \mathrm{diam}^{-1}(G)}.$$
To prove Lemma A.2.1, we recall the following lemma from [104].
Lemma A.2.2. (Weighted Path Inequality) (Lemma 5.6.1, [104]) Let P_n = (v_0, v_1, ..., v_{n−1}) be a path of length n and assume that, for all 0 ≤ i ≤ n − 2, the weight w_i of (v_i, v_{i+1}) is strictly positive. For 0 ≤ i ≤ n − 2, let G_{i,i+1} = (V, (v_i, v_{i+1})) be the graph whose vertex set V is the same as the vertex set of G but which has only the single edge (v_i, v_{i+1}). Similarly, let G_{0,n−1} = (V, (v_0, v_{n−1})) be the graph with only a single edge (v_0, v_{n−1}). Then
$$G_{0,n-1} \preceq \left(\sum_{i=0}^{n-2} \frac{1}{w_i}\right)\left(\sum_{i=0}^{n-2} w_i\, G_{i,i+1}\right) = \mathrm{len}^{-1}(P_n)\cdot P_n,$$
where the final equality is interpreted in the sense of A ⪯ B and B ⪯ A.
The Proof of Lemma A.2.1. For u, v ∈ V, let G_{u,v} = (V, (u, v)) denote the graph with only a single edge from u to v and let P_{u,v} denote a path from u to v with minimal inverse weighted length.
Then, by Lemma A.2.2, we have
$$G_{u,v} \preceq \mathrm{len}^{-1}(P_{u,v}(G)) \cdot P_{u,v}(G) \preceq \mathrm{diam}^{-1}(G) \cdot P_{u,v}(G) \preceq \mathrm{diam}^{-1}(G) \cdot G,$$
where the last inequality holds since H ⪯ G for all subgraphs H of a graph G (Section 5.2, [104]). Let K̃_n be the extended weighted, complete graph on n vertices with weight matrix W̃, where W̃_{i,j} = W_{i,j} if (i,j) ∈ E and W̃_{i,j} = W_min if (i,j) ∉ E. Then, by summing over all pairs of vertices, we have that
$$L_{\tilde{K}_n} = \sum_{0 \le i < j \le n-1} \tilde{W}_{i,j}\, L_{G_{i,j}} \preceq W_{\max} \sum_{0 \le i < j \le n-1} \mathrm{diam}^{-1}(G) \cdot L_G.$$
Since Σ_{0≤i<j≤n−1} 1 = n(n−1)/2, we then have that
$$\tilde{K}_n \preceq \frac{W_{\max}\, n(n-1)}{2}\, \mathrm{diam}^{-1}(G) \cdot G,$$
which, by Remark A.2.1, implies
$$\tau_{\tilde{K}_n} \le \frac{W_{\max}\, n(n-1)}{2}\, \mathrm{diam}^{-1}(G)\, \tau_G, \qquad \text{and therefore} \qquad \tau_G \ge \frac{2\,\tau_{\tilde{K}_n}}{W_{\max}\, n(n-1) \cdot \mathrm{diam}^{-1}(G)}.$$
Under this construction, we see that if we have a weight which is much larger than all of the others, it effectively gets nullified by taking the inverse. Letting K_n be the unweighted complete graph on n vertices, we see that
$$\mathbf{x}^T L_{\tilde{K}_n} \mathbf{x} = \sum_{(a,b)} \tilde{W}_{a,b}\,(x(a)-x(b))^2 \ge W_{\min} \sum_{(a,b)} (x(a)-x(b))^2 = W_{\min}\, \mathbf{x}^T L_{K_n} \mathbf{x}.$$
Therefore, τ_{K̃_n} = min_{x⊥1} (x^T L_{K̃_n} x)/(x^T x) ≥ W_min · min_{x⊥1} (x^T L_{K_n} x)/(x^T x) = W_min · τ_{K_n}. Thus, since τ_{K_n} = n (Section 5.4.1, [104]), we have that
$$\tau_G \ge \frac{2\, W_{\min}}{W_{\max}\,(n-1)\cdot \mathrm{diam}^{-1}(G)}.$$
Our next result uses Lemma A.2.1 to produce a bound for τ_G in terms of the diameter of the underlying unweighted graph.
Theorem A.2.1. Let G = (V, E, W) be a weighted graph and let W_min and W_max be the minimum and maximum value of any of its (nonzero) weights. Then
$$\tau_G \ge \frac{2 \cdot (W_{\min})^2}{W_{\max}\,(n-1)\,\mathrm{diam}(G_{\mathrm{unw}})},$$
where G_unw = (V, E) is the unweighted counterpart of G.
Proof. Let G′ = (V, E, W′), where W′_{i,j} = 1/W_{i,j} if W_{i,j} ≠ 0 and W′_{i,j} = 0 otherwise. Let W′_max be the maximum element of W′. Observe that, by construction, we have W′_max = 1/W_min. Moreover, it follows immediately from Definition A.2.2 that diam^{−1}(G) = diam(G′). Therefore,
$$\mathrm{diam}^{-1}(G) = \mathrm{diam}(G') \le W'_{\max} \cdot \mathrm{diam}(G_{\mathrm{unw}}) = \frac{\mathrm{diam}(G_{\mathrm{unw}})}{W_{\min}}.$$
So by Lemma A.2.1, we have that
$$\tau_G \ge \frac{2 \cdot W_{\min}}{W_{\max}\,(n-1)\cdot \mathrm{diam}^{-1}(G)} \ge \frac{2 \cdot (W_{\min})^2}{W_{\max}\,(n-1)\,\mathrm{diam}(G_{\mathrm{unw}})}.$$
APPENDIX B FAR-FIELD PTYCHOGRAPHY
B.1 SUB-SAMPLING
In this section, we discuss sub-sampling lemmas that can be used in conjunction with Algorithm 3.1. In many cases, an illumination of the sample can cause damage to the sample, and applying the illumination beam (which can be highly irradiative) repeatedly at a single point can destroy it. Considering the risks to the sample and the costs of operating the measurement equipment, there are strong incentives to reduce the number of illuminations applied to any object.
Definition B.1.1. Let s ∈ N be such that s | d. We define the sub-sampling operator Z_s : C^d → C^{d/s} component-wise via
$$(Z_s \mathbf{x})_n := x_{n \cdot s}, \quad \forall n \in [d/s]_0. \quad (\mathrm{B.1})$$
We now have an aliasing lemma which allows us to see the impact of performing the Fourier transform on a sub-sampled specimen.
Lemma B.1.1. (Aliasing) ([78], Lemma 2.0.1.) Let s ∈ N be such that s | d, x ∈ C^d, and ω ∈ [d/s]₀. Then we have that
$$\left(F_{d/s}(Z_s \mathbf{x})\right)_\omega = \frac{1}{s} \sum_{r=0}^{s-1} \hat{x}_{\omega - r\frac{d}{s}}. \quad (\mathrm{B.2})$$
Proof. Let d ∈ N and suppose s ∈ N divides d. Let x ∈ C^d and ω ∈ [d/s]₀ be arbitrary. By the definition of the discrete Fourier transform and the sub-sampling operator, we have that
$$\left(F_{d/s}(Z_s \mathbf{x})\right)_\omega = \sum_{n=0}^{d/s-1} (Z_s \mathbf{x})_n\, e^{-2\pi i n \omega / (d/s)} = \sum_{n=0}^{d/s-1} x_{ns}\, e^{-2\pi i n \omega s / d}. \quad (\mathrm{B.3})$$
By the inverse DFT and by collecting terms, we have that
$$\sum_{n=0}^{d/s-1} x_{ns}\, e^{-2\pi i n \omega s / d} = \frac{1}{d} \sum_{n=0}^{d/s-1} \sum_{r=0}^{d-1} \hat{x}_r\, e^{2\pi i n (r-\omega) s / d}. \quad (\mathrm{B.4})$$
By treating this as a sum of DFTs (the inner geometric sum over n vanishes unless r ≡ ω mod d/s, in which case it equals d/s), we then have that
$$\frac{1}{d} \sum_{n=0}^{d/s-1} \sum_{r=0}^{d-1} \hat{x}_r\, e^{2\pi i n (r-\omega) s / d} = \frac{1}{s} \sum_{r=0}^{s-1} \hat{x}_{\omega + r\frac{d}{s}} = \frac{1}{s} \sum_{r=0}^{s-1} \hat{x}_{\omega - r\frac{d}{s}}. \quad (\mathrm{B.5})$$
Before we start looking at aliased WDD, we need to introduce a lemma which will show the effect of taking a Fourier transform of an autocorrelation.
Lemma B.1.2. (Fourier Transform of Autocorrelation) ([78], Lemma 2.0.2.) Let x ∈ C^d and α, ω ∈ [d]₀. Then
$$\left(F_d(\mathbf{x} \circ S_\omega \bar{\mathbf{x}})\right)_\alpha = \frac{1}{d}\, e^{2\pi i \omega \alpha / d} \left(F_d(\hat{\mathbf{x}} \circ S_{-\alpha} \overline{\hat{\mathbf{x}}})\right)_\omega. \quad (\mathrm{B.6})$$
Proof. Let x ∈ C^d and let α, ω ∈ [d]₀ be arbitrary. By the convolution theorem, we have that
$$\left(F_d(\mathbf{x} \circ S_\omega \bar{\mathbf{x}})\right)_\alpha = \frac{1}{d}\left(\hat{\mathbf{x}} \ast_d F_d(S_\omega \bar{\mathbf{x}})\right)_\alpha. \quad (\mathrm{B.7})$$
By technical equality (iii), we can convert the Fourier transform of the shift into a modulation of the Fourier transform, so that by the definitions of the convolution and the modulation operator W_ω,
$$\frac{1}{d}\left(\hat{\mathbf{x}} \ast_d F_d(S_\omega \bar{\mathbf{x}})\right)_\alpha = \frac{1}{d}\left(\hat{\mathbf{x}} \ast_d (W_\omega F_d(\bar{\mathbf{x}}))\right)_\alpha = \frac{1}{d} \sum_{n=0}^{d-1} \hat{x}_n\, (F_d(\bar{\mathbf{x}}))_{\alpha-n}\, e^{2\pi i \omega (\alpha-n)/d}. \quad (\mathrm{B.8})$$
By applying reversals and using that the reversal of a reversal returns the original vector, we have (F_d(x̄))_{α−n} = conj(x̂_{n−α}), so that
$$\frac{1}{d} \sum_{n=0}^{d-1} \hat{x}_n\, (F_d(\bar{\mathbf{x}}))_{\alpha-n}\, e^{2\pi i \omega (\alpha-n)/d} = \frac{1}{d}\, e^{2\pi i \omega \alpha / d} \sum_{n=0}^{d-1} \hat{x}_n\, \overline{\hat{x}_{n-\alpha}}\, e^{-2\pi i \omega n / d}. \quad (\mathrm{B.9})$$
Finally, by applying technical equality (vi) and using the definitions of the shift operator and the Hadamard product, we have that
$$\frac{1}{d}\, e^{2\pi i \omega \alpha / d} \sum_{n=0}^{d-1} \hat{x}_n\, \overline{\hat{x}_{n-\alpha}}\, e^{-2\pi i \omega n / d} = \frac{1}{d}\, e^{2\pi i \omega \alpha / d} \left(F_d(\hat{\mathbf{x}} \circ S_{-\alpha} \overline{\hat{\mathbf{x}}})\right)_\omega.$$
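Lemma B.1.1 is easy to check numerically. The following short NumPy snippet (illustrative only) verifies (B.2) for a random vector, using the fact that NumPy's unnormalized FFT matches the DFT convention used above.

```python
import numpy as np

d, s = 12, 3
rng = np.random.default_rng(1)
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)

lhs = np.fft.fft(x[::s])        # F_{d/s}(Z_s x), a vector of length d/s
xhat = np.fft.fft(x)            # unnormalized DFT, matching F_d above
# np.roll(xhat, r*d//s)[w] = xhat[(w - r*d/s) % d], i.e. the term in (B.2)
rhs = sum(np.roll(xhat, r * d // s) for r in range(s))[: d // s] / s
assert np.allclose(lhs, rhs)    # (B.2) holds
```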
B.1.1 SUB-SAMPLING IN FREQUENCY

We will first look at sub-sampling in frequency.

Definition B.1.2. Let $K$ be a positive factor of $d$, and assume that the data is measured at $K$ equally spaced Fourier modes. We denote the set of Fourier modes of step-size $\frac{d}{K}$ by
\[
\mathcal{K} = \frac{d}{K}[K]_0 = \left\{0, \frac{d}{K}, \frac{2d}{K}, \dots, d - \frac{d}{K}\right\}. \tag{B.10}
\]

Definition B.1.3. Let $\mathbf{A} \in \mathbb{C}^{d \times d}$ with columns $\mathbf{a}_j$, and let $K \mid d$. We denote by $\mathbf{A}_{K,d} \in \mathbb{C}^{K \times d}$ the sub-matrix of $\mathbf{A}$ whose $\ell$th column is equal to $Z_{d/K}(\mathbf{a}_\ell)$.

With these definitions, we will now convert the sub-sampled measurements into a more solvable form.

Lemma B.1.3 ([78], Lemma 2.1.1). Suppose that the noisy spectrogram measurements are collected on a subset $\mathcal{K} \subseteq [d]_0$ of $K$ equally spaced Fourier modes. Then, for any $\omega \in [K]_0$,
\[
\left((F_K \mathbf{Y}_{K,d})^T\right)_\omega = K \sum_{r=0}^{\frac{d}{K}-1} \left((\mathbf{x} \circ S_{\omega - rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK - \omega}\bar{\tilde{\mathbf{m}}})\right) + \left((F_K \mathbf{N}_{K,d})^T\right)_\omega,
\]
where $\mathbf{Y}_{K,d} - \mathbf{N}_{K,d} \in \mathbb{C}^{K \times d}$ is the matrix of sub-sampled noiseless $K \cdot d$ measurements.

Proof. For $\ell \in [d]_0$, the $\ell$th column of the matrix $\mathbf{Y}$ is
\[
\mathbf{y}_\ell = F_d\left((\mathbf{x} \circ S_{-\ell}\mathbf{m}) \ast_d (\bar{\tilde{\mathbf{x}}} \circ S_\ell \bar{\tilde{\mathbf{m}}})\right) + \boldsymbol{\eta}_\ell, \tag{B.11}
\]
and thus, for any $\alpha \in [K]_0$,
\[
\left(Z_{d/K}(\mathbf{y}_\ell)\right)_\alpha = \left(F_d\left((\mathbf{x} \circ S_{-\ell}\mathbf{m}) \ast_d (\bar{\tilde{\mathbf{x}}} \circ S_\ell \bar{\tilde{\mathbf{m}}})\right)\right)_{\alpha \frac{d}{K}} + \left(Z_{d/K}(\boldsymbol{\eta}_\ell)\right)_\alpha, \tag{B.12}
\]
and, by the aliasing lemma (with $s = \frac{d}{K}$),
\[
\left(F_K(Z_{d/K}(\mathbf{y}_\ell))\right)_\omega = \frac{K}{d}\sum_{r=0}^{\frac{d}{K}-1} (\hat{\mathbf{y}}_\ell)_{\omega - rK} = d \cdot \frac{K}{d}\sum_{r=0}^{\frac{d}{K}-1} \left((\mathbf{x} \circ S_{\omega - rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK - \omega}\bar{\tilde{\mathbf{m}}})\right)_\ell + \left(F_K(Z_{d/K}(\boldsymbol{\eta}_\ell))\right)_\omega.
\]
The $\ell$th column of $\mathbf{Y}_{K,d} \in \mathbb{C}^{K \times d}$ is equal to $Z_{d/K}(\mathbf{y}_\ell)$. Then, for any $\omega \in [K]_0$, the $\omega$th column of $(F_K \mathbf{Y}_{K,d})^T \in \mathbb{C}^{d \times K}$ may be computed as
\[
\left((F_K \mathbf{Y}_{K,d})^T\right)_\omega = K \sum_{r=0}^{\frac{d}{K}-1} \left((\mathbf{x} \circ S_{\omega - rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK - \omega}\bar{\tilde{\mathbf{m}}})\right) + \left((F_K \mathbf{N}_{K,d})^T\right)_\omega. \tag{B.13}
\]

B.1.2 SUB-SAMPLING IN FREQUENCY AND SPACE

We will now look at sub-sampling in both frequency and space.

Definition B.1.4. Let $L$ be a positive factor of $d$, and suppose measurements are collected at $L$ equally spaced physical shifts of step-size $\frac{d}{L}$. We denote the set of shifts by $\mathcal{L}$, that is,
\[
\mathcal{L} = \frac{d}{L}[L]_0 = \left\{0, \frac{d}{L}, \frac{2d}{L}, \dots, d - \frac{d}{L}\right\}. \tag{B.14}
\]

Definition B.1.5. Let $\mathbf{A} \in \mathbb{C}^{d \times d}$ and $L \mid d$. We denote by $\mathbf{A}_{d,L} \in \mathbb{C}^{d \times L}$ the sub-matrix of $\mathbf{A}$ whose rows are those of $\mathbf{A}$, sub-sampled in step-size $\frac{d}{L}$.
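Before moving on, we note that the index sets and sub-matrices of Definitions B.1.2 through B.1.5 are simple strided selections. The following sketch (with arbitrary illustrative sizes) makes the correspondence explicit:

\begin{verbatim}
import numpy as np

d, K, L = 12, 4, 3                    # require K | d and L | d
A = np.arange(d * d).reshape(d, d)    # stand-in for a d x d data matrix

K_set = (d // K) * np.arange(K)       # Definition B.1.2: {0, d/K, ..., d - d/K}
A_Kd = A[::d // K, :]                 # Definition B.1.3: each column is
                                      # Z_{d/K} of a column of A; shape (K, d)

L_set = (d // L) * np.arange(L)       # Definition B.1.4: {0, d/L, ..., d - d/L}
A_dL = A[:, ::d // L]                 # Definition B.1.5: rows of A sub-sampled
                                      # in step-size d/L; shape (d, L)

print(K_set, A_Kd.shape, L_set, A_dL.shape)
\end{verbatim}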
We will now prove a similar lemma as before, but now we sub-sample in both frequency and space.

Lemma B.1.4 ([78], Lemma 2.1.2). Suppose we have noisy spectrogram measurements collected on a subset $\mathcal{K} \subseteq [d]_0$ of $K$ equally spaced frequencies and a subset $\mathcal{L} \subseteq [d]_0$ of $L$ equally spaced physical shifts. Then, for any $\omega \in [K]_0$ and $\alpha \in [L]_0$,
\[
\left(F_L\left(\mathbf{Y}^T_{K,L}(F^T_K)_\omega\right)\right)_\alpha = \frac{KL}{d}\sum_{r=0}^{\frac{d}{K}-1}\sum_{\ell=0}^{\frac{d}{L}-1} \left(F_d\left((\mathbf{x} \circ S_{\omega-rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK-\omega}\bar{\tilde{\mathbf{m}}})\right)\right)_{\alpha - \ell L} + \left(F_L\left(\mathbf{N}^T_{K,L}(F^T_K)_\omega\right)\right)_\alpha,
\]
where $\mathbf{Y}_{K,L} - \mathbf{N}_{K,L} \in \mathbb{C}^{K \times L}$ is the matrix of sub-sampled noiseless $K \cdot L$ measurements.

Proof. For fixed $\ell \in [d]_0$ and $\omega \in [K]_0$, we have already computed that the noiseless portion of the measurements satisfies
\[
\left(F_K(Z_{d/K}(\mathbf{y}_\ell))\right)_\omega = K\sum_{r=0}^{\frac{d}{K}-1}\left((\mathbf{x} \circ S_{\omega-rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK-\omega}\bar{\tilde{\mathbf{m}}})\right)_\ell.
\]
Fix $\omega \in [K]_0$ and define the vector $\mathbf{p}_\omega \in \mathbb{C}^L$ by
\[
(\mathbf{p}_\omega)_\ell := \left(F_K(Z_{d/K}(\mathbf{y}_{\ell \frac{d}{L}}))\right)_\omega, \quad \forall \ell \in [L]_0.
\]
Note that the rows of $\mathbf{Y}_{K,L}, \mathbf{N}_{K,L} \in \mathbb{C}^{K \times L}$ are those of $\mathbf{Y}_{K,d}, \mathbf{N}_{K,d} \in \mathbb{C}^{K \times d}$, sub-sampled in step-size $\frac{d}{L}$. Thus
\[
(\mathbf{p}_\omega)_\ell = \left((\mathbf{Y}_{K,L})^T(F^T_K)_\omega\right)_\ell = \left((\mathbf{Y}_{K,L} - \mathbf{N}_{K,L})^T(F^T_K)_\omega\right)_\ell + \left((\mathbf{N}_{K,L})^T(F^T_K)_\omega\right)_\ell,
\]
where $(F^T_K)_\omega \in \mathbb{C}^K$ is the $\omega$th column of $F^T_K$. Therefore,
\[
\mathbf{p}_\omega = \mathbf{Y}^T_{K,L}(F^T_K)_\omega \in \mathbb{C}^L, \quad \forall \omega \in [K]_0.
\]
For any $\ell \in [L]_0$, we have
\[
(\mathbf{p}_\omega)_\ell = K\sum_{r=0}^{\frac{d}{K}-1}\left((\mathbf{x} \circ S_{\omega-rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK-\omega}\bar{\tilde{\mathbf{m}}})\right)_{\ell\frac{d}{L}} + \left(\mathbf{N}^T_{K,L}(F^T_K)_\omega\right)_\ell
= K \cdot \left(Z_{d/L}\Big(\sum_{r=0}^{\frac{d}{K}-1}(\mathbf{x} \circ S_{\omega-rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK-\omega}\bar{\tilde{\mathbf{m}}})\Big)\right)_\ell + \left(\mathbf{N}^T_{K,L}(F^T_K)_\omega\right)_\ell,
\]
and for any $\alpha \in [L]_0$, by aliasing (Lemma B.1.1 with $s = \frac{d}{L}$), we have that
\[
(F_L \mathbf{p}_\omega)_\alpha = \frac{KL}{d}\sum_{r=0}^{\frac{d}{K}-1}\sum_{\ell=0}^{\frac{d}{L}-1} \left(F_d\left((\mathbf{x} \circ S_{\omega-rK}\bar{\mathbf{x}}) \ast_d (\tilde{\mathbf{m}} \circ S_{rK-\omega}\bar{\tilde{\mathbf{m}}})\right)\right)_{\alpha - \ell L} + \left(F_L\left(\mathbf{N}^T_{K,L}(F^T_K)_\omega\right)\right)_\alpha.
\]

APPENDIX C

BLIND DECONVOLUTION

C.1 ALTERNATIVE APPROACH

In this section, we discuss the convex relaxation approach studied in [2].

C.1.1 CONVEX RELAXATION

In [2], the approach is to solve a convex version of the problem. Given $\mathbf{y} \in \mathbb{C}^L$, the goal is to find $\mathbf{h} \in \mathbb{R}^K$ and $\mathbf{x} \in \mathbb{R}^N$ that are consistent with the observations. Making no assumptions beyond the dimensions, a natural way to choose among the many feasible points is to select the pair of least energy, i.e., to solve the least-squares problem
\[
\underset{\mathbf{u},\mathbf{v}}{\text{minimize}} \quad \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 \quad \text{subject to} \quad y(\ell) = \langle \mathbf{c}_\ell, \mathbf{u} \rangle \langle \mathbf{v}, \mathbf{b}_\ell \rangle, \quad 1 \le \ell \le L.
\]
This is a non-convex quadratic optimization problem: the cost function is convex, but the quadratic equality constraints make the feasible set non-convex. The dual of this minimization problem is a semidefinite program (SDP), and taking the dual again gives the convex program
\[
\min_{\mathbf{W}_1, \mathbf{W}_2, \mathbf{X}} \; \frac{1}{2}\operatorname{tr}(\mathbf{W}_1) + \frac{1}{2}\operatorname{tr}(\mathbf{W}_2) \quad \text{subject to} \quad
\begin{bmatrix} \mathbf{W}_1 & \mathbf{X} \\ \mathbf{X}^* & \mathbf{W}_2 \end{bmatrix} \succeq 0, \quad \mathbf{y} = \mathcal{A}(\mathbf{X}),
\]
which is equivalent to
\[
\min \|\mathbf{X}\|_* \quad \text{subject to} \quad \mathbf{y} = \mathcal{A}(\mathbf{X}),
\]
where $\|\mathbf{X}\|_* = \operatorname{tr}(\sqrt{\mathbf{X}^*\mathbf{X}})$ denotes the nuclear norm. In [2], recovery guarantees were achieved for relatively large $K$ and $N$, provided $\mathbf{B}$ is incoherent in the Fourier domain and $\mathbf{C}$ is generic. We can now outline the algorithm from [2].

Algorithm C.1 Convex Relaxed Blind Deconvolution Algorithm
Input: Normalized Fourier measurements $\mathbf{y}$
Output: Estimates of the underlying signal and blurring function
1) Compute $\mathcal{A}^*(\mathbf{y})$.
2) Find the leading singular value of $\mathcal{A}^*(\mathbf{y})$ and its associated left and right singular vectors, denoted by $d$, $\tilde{\mathbf{h}}_0$, and $\tilde{\mathbf{x}}_0$, respectively.
3) Let $\mathbf{X}_0 = \tilde{\mathbf{h}}_0 \tilde{\mathbf{x}}_0^*$ denote the initial estimate, and solve the optimization problem
\[
\min \|\mathbf{X}\|_* \quad \text{subject to} \quad \|\mathbf{y} - \mathcal{A}(\mathbf{X})\| \le \delta, \tag{C.1}
\]
where $\|\cdot\|_*$ denotes the nuclear norm and $\delta$ is an upper bound on the noise level $\|\mathbf{e}\|_2$.
Return $(\mathbf{h}, \mathbf{x})$, where the recovered minimizer is factored as $\mathbf{X} = \mathbf{h}\mathbf{x}^*$.
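To illustrate program (C.1), the following is a small synthetic sketch using the cvxpy modeling package. It is a sketch only: it assumes real-valued generic Gaussian $\mathbf{B}$ and $\mathbf{C}$ rather than the structured Fourier-domain $\mathbf{B}$ analyzed in [2], and the dimensions, noise bound $\delta$, and variable names are illustrative.

\begin{verbatim}
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
K, N, L = 5, 5, 60                     # unknown sizes, number of measurements
B = rng.normal(size=(L, K))            # known; rows b_ell (illustrative)
C = rng.normal(size=(L, N))            # known; rows c_ell (illustrative)
h_true, x_true = rng.normal(size=K), rng.normal(size=N)
y = (B @ h_true) * (C @ x_true)        # bilinear measurements y(ell)

# Lifted linear operator: A(X)_ell = b_ell^T X c_ell = <X, b_ell c_ell^T>
def A_op(X):
    return cp.hstack([B[l] @ X @ C[l] for l in range(L)])

X = cp.Variable((K, N))
delta = 1e-6                           # noise-level bound, ||e||_2 <= delta
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [cp.norm(y - A_op(X), 2) <= delta])
prob.solve()

# Extract (h, x) from the leading singular pair of the minimizer X ~ h x^*
U, s, Vt = np.linalg.svd(X.value)
h, x = np.sqrt(s[0]) * U[:, 0], np.sqrt(s[0]) * Vt[0]
corr = abs(h @ h_true) / (np.linalg.norm(h) * np.linalg.norm(h_true))
print(f"correlation with true h (up to scale): {corr:.4f}")
\end{verbatim}

The recovered pair is only determined up to the usual scaling ambiguity $(\mathbf{h}, \mathbf{x}) \mapsto (c\,\mathbf{h}, \mathbf{x}/c)$, which is why the check above uses a normalized correlation.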