STATISTICAL SIGNAL PROCESSING APPROACHES FOR MULTI-REFERENCE ALIGNMENT AND NEURAL TEXTURE SYNTHESIS

By Liping Yin

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics—Doctor of Philosophy; Computational Mathematics, Science, and Engineering—Dual Major

2023

ABSTRACT

Statistical signal processing plays a crucial role in numerous fields of modern technology and science. Important applications include extracting signals from noisy data, processing images and videos for tasks like compression and enhancement, and analyzing time-varying data such as climate records and asset prices. In this dissertation, we address two problems related to statistical signal and image processing. The first involves a generalized version of the multi-reference alignment problem in one dimension, inspired by modern data applications such as cryo-electron microscopy. The objective is to recover an unknown signal $f : \mathbb{R} \to \mathbb{R}$ from multiple observations that have been translated, dilated, and corrupted by additive noise. In the presence of large dilations and corruptions, the observations do not resemble the underlying signal. Although current approaches have shown empirical success in the absence of dilations, no approach has provided convergence guarantees for signal inversion when the observations are simultaneously dilated, translated, and corrupted. We therefore propose an unbiased estimator for the bispectrum of the unknown signal which depends only on the corrupted samples and knowledge of the dilation distribution. To validate the proposed estimator, we use it for bispectrum recovery, and invert the recovered bispectrum to achieve full signal inversion. The second problem concerns neural texture synthesis, which is important for understanding how humans perceive texture.
Current approaches require regularization terms or some type of supervision to capture long-range constraints in images, such as the alignment of bricks. To remedy this issue, we propose a new set of statistics for exemplar-based neural texture synthesis based on the Sliced Wasserstein Loss, and augment our proposed algorithm with a multi-scale synthesis process. Based on qualitative and quantitative experiments, our results are comparable to or better than current state-of-the-art methods.

Copyright by LIPING YIN 2023

ACKNOWLEDGEMENTS

I completed this thesis thanks to the guidance of my committee members and the support of family and friends. First of all, I would like to express my appreciation to both of my committee chairs, Dr. Di Liu and Dr. Yuying Xie, for supporting me during my PhD studies. It is my greatest luck to have met them. I am very grateful to Dr. Yuying Xie, to the point that I wish I could have met him much earlier. He offered incredible support during the hardest period of my PhD life. He gave me the freedom to create my own research plan and at the same time offered help whenever I needed it. He also helped with every little thing, including checking over my thesis, applying for fellowships, and much more. I still remember how he modified my slides word by word and checked my thesis writing carefully. I am impressed by his knowledge and intelligence. Most importantly, I am also learning from his sincerity, sense of responsibility, and perseverance. Dr. Di Liu has been my committee chair for three years. He has been very supportive and helpful during those years. He was always there when I needed help with anything. I appreciate that he gave me suggestions when I was having trouble and encouraged me when I was under stress. He is also kind and wise. I would not have been able to continue without his support and help. I also very much appreciate my committee members Dr. Yingda Cheng and Dr. Ekaterina Rapinchuk. They were always reachable and helpful.
They went through my work and gave me helpful feedback. Their suggestions were very useful, and I used them to improve my research. They asked helpful questions to ensure my research was sound. I would also like to thank Dr. Matthew Hirn, who was my advisor during the third and fourth years of my PhD. He introduced both of my projects to me and taught me the background and all the mathematical tools relevant to them. He also funded me for three semesters so that I could focus on courses and research. I would not have been able to grow without his supervision. He is very creative and was able to help me whenever I got stuck. I also took two courses with him and learned a lot from them. I am also grateful that he introduced another important committee member, Dr. Anna Little, to me. I would like to send my best wishes to him. As I mentioned above, Dr. Anna Little played a critical role in my dissertation. I would not have been able to finish the MRA project, the most important part of this dissertation, without her. She has done pioneering work on dilation MRA and guided me through the project. I am grateful that she was very patient and answered all my questions, no matter how simple they were. She read my dissertation and commented on it so that I could improve it, and I am very thankful for that. Regardless of the time zone difference, she was always there to discuss with me. I still remember how she checked over my proofs, came up with ideas, and debugged my code. There are too many things to thank her for. I appreciate her time and effort, and I learned a lot from her: not only mathematical techniques, but also how to think scientifically. She is a great mathematician from whom I have learned much. Lastly, I would like to thank my parents. Thanks to my mother, who always encouraged and supported me. And thanks to my father, who left this world a long time ago. The good memories and love will never fade.
TABLE OF CONTENTS

CHAPTER 1 THESIS OVERVIEW . . . 1
CHAPTER 2 BISPECTRUM INVERSION FOR MULTI-REFERENCE ALIGNMENT . . . 3
CHAPTER 3 NONLINEAR HEEGER-BERGEN TEXTURE SYNTHESIS . . . 57
CHAPTER 4 LONG RANGE CONSTRAINTS FOR NEURAL TEXTURE SYNTHESIS USING SLICED WASSERSTEIN LOSS . . . 81
CHAPTER 5 CONCLUDING REMARKS . . . 104
BIBLIOGRAPHY . . . 104

CHAPTER 1 THESIS OVERVIEW

We provide a brief outline of this thesis, which focuses on two problems in statistical signal processing: multi-reference alignment and texture synthesis. The second chapter focuses on a variation of the multi-reference alignment problem, motivated by cryo-electron microscopy, a technique whose development was recognized with the 2017 Nobel Prize in Chemistry. The original multi-reference alignment problem focuses on recovering a function $f : \mathbb{R}^3 \to \mathbb{R}$ from observations that have been translated, rotated, or corrupted by additive noise. The challenge in this problem is the following: in the presence of large corruptions, observational data do not resemble the underlying signal. However, the multi-reference alignment problem is not necessarily an accurate representation of reality. One expects small physical variations in objects, such as macro-molecular structure in the context of cryo-EM or deformation of an object in the context of object registration. Thus, it is of interest to consider a model where each observation has been deformed by a diffeomorphism $\tau \in C^2(\mathbb{R}^3)$. However, the class of diffeomorphisms is challenging to study. We address a simplified one-dimensional version of diffeomorphism MRA where we only consider a specific class of diffeomorphisms, dilations of the form $\tau(x) = (1 - c)x$, which we will call dilation MRA.
Under mild assumptions, we propose an algorithm that is able to recover the ground truth signal using Fourier invariants, namely the power spectrum and bispectrum. Previous results provide a nonlinear estimator for power spectrum recovery; we recover the bispectrum via a novel nonlinear estimator. This is a first step towards a model that can address general classes of diffeomorphisms, which is left for future work. The third chapter considers the problem of texture synthesis, which focuses on generating a new texture from a reference texture. The generated texture should have the same perceptual qualities as the reference texture without being a repetition of it. We consider a modification of the Heeger-Bergen texture synthesis algorithm, which generates new textures by matching wavelet coefficients between a reference texture and white noise. To increase the robustness of the algorithm, we first create a nonlinear version of the wavelet transform by applying a set of nonlinearities to each wavelet coefficient after performing a wavelet transform. We then extend our formulation by creating a modification of Mallat's invariant scattering transform, the Two Layer Scattering Pyramid. We attempt to histogram match coefficients of the Two Layer Scattering Pyramid to get good synthesis results. However, we find that our results are not competitive with state-of-the-art algorithms, which motivates the last main chapter of this dissertation. In chapter four, instead of using a well-understood and mathematically tested representation like the scattering transform, we switch to using statistics of a deep convolutional neural network, VGG19. We propose a new set of statistics for texture synthesis using the Sliced Wasserstein Loss, which is an approximate solution to an optimal transport problem. Our results are compared with current state-of-the-art algorithms in single texture synthesis that have comparable run-time.
We find our results are competitive with the state-of-the-art, but do not require hyperparameter tuning or user-added spatial constraints. Additionally, we propose a multi-scale algorithm to augment our approach, which improves our synthesis results dramatically.

CHAPTER 2 BISPECTRUM INVERSION FOR MULTI-REFERENCE ALIGNMENT

2.1 Introduction to MRA

In the multi-reference alignment (MRA) problem, we want to recover an unknown signal $f : \mathbb{R} \to \mathbb{R}$ from many observations that have been randomly translated and corrupted by additive noise. Is there a way to recover $f$ from these observations? A formal description of the assumptions is given in Model 1.

Model 1. Suppose we have $M$ independent observations of a function $f \in L^2(\mathbb{R})$ defined by
$$f_j(x) = f(x - t_j) + \varepsilon_j(x), \quad 1 \leq j \leq M,$$
where
• $\mathrm{supp}(f_j) \subset [-\frac{1}{2}, \frac{1}{2}]$ for $1 \leq j \leq M$.
• $\{t_j\}_{j=1}^{M}$ are independent samples of a random variable $t \in \mathbb{R}$.
• $\{\varepsilon_j(x)\}_{j=1}^{M}$ are independent white noise processes on $[-\frac{1}{2}, \frac{1}{2}]$ with variance $\sigma^2$.

The MRA problem is a simplified version of problems in cryo-electron microscopy (cryo-EM), and is similar to problems in other fields such as structural biology [17, 35, 36, 41, 42, 50], image registration [11, 18], and image processing [54]. To solve the problem outlined in Model 1, current methods can be grouped into two categories. The first is synchronization methods [2, 3, 4, 5, 9, 12, 13, 38, 46, 53], which try to recover each translation factor $\{t_j\}_{j=1}^{M}$, align the signals using the recovered translation factors, and average the aligned signals to get a smoother estimate of the ground truth signal. However, synchronization methods can be problematic when the signal-to-noise ratio (SNR) is small.
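As a concrete discrete illustration of the generative model above, the following numpy sketch produces randomly shifted, noise-corrupted copies of a signal. All names are our own, and a cyclic shift on a grid stands in for the continuous translation $f(x - t_j)$:

```python
import numpy as np

def simulate_mra(f, M, sigma, seed=0):
    """Discrete sketch of Model 1: M randomly (cyclically) shifted copies of f,
    each corrupted by i.i.d. Gaussian noise of variance sigma^2."""
    rng = np.random.default_rng(seed)
    n = f.size
    shifts = rng.integers(0, n, size=M)                  # the translations t_j
    clean = np.stack([np.roll(f, s) for s in shifts])    # f(x - t_j)
    return clean + sigma * rng.normal(size=(M, n))       # + eps_j(x)

x = np.linspace(-0.5, 0.5, 128)
f = np.exp(-(x / 0.05) ** 2)        # a narrow bump supported in [-1/2, 1/2]
observations = simulate_mra(f, M=200, sigma=0.3)
```

At low noise the peak of each row is still visible and alignment by peak-finding works; at high noise it does not, which is the failure mode of synchronization discussed next.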
To illustrate this point, consider Figure 2.1. With small amounts of noise, one can align the signals and then average the translated signals to get a cleaner version of the ground truth signal, up to a translation. However, at high noise levels, the peaks are not recognizable, which makes this "synchronization" process unreliable.

Figure 2.1 Left Column: three ground truth signals that have been translated without any additive noise. Middle Column: Adding Gaussian noise with mean zero and $\sigma^2 = 0.2$ to each of the signals in the left column. Right Column: Adding Gaussian noise with mean zero and $\sigma^2 = 1.2$ to each of the signals in the left column. Reference: [6].

The second approach involves estimating the signal directly using ideas such as the method of moments [22, 28, 44]; a subclass of the method of moments technique is the method of invariants [2, 7, 15, 26, 27]. Additionally, expectation maximization (EM) algorithms [1, 16] have also shown success for signal recovery. One problem with Model 1 is that it is not an accurate representation of real world applications. In particular, for three dimensional applications such as cryo-EM, molecules are randomly rotated and one only has access to 2D projections of the molecule, as in Figure 2.2.

Figure 2.2 An example of the cryo-EM problem. The 3D object at the top of the figure is the ground truth object. One has 2D slices of the object that are noisy, rotated, and possibly translated. Reference: [47].

Moreover, the model does not account for physical variations in the objects, such as macro-molecular structures which "flop" around. One can think of these deformations as a diffeomorphism acting on the molecule before retrieving the slices. In other words, if $g$ is a function extracting the slices and $x$ is a molecule, we retrieve $g(\xi(x))$, where $\xi \in C^2(\mathbb{R}^n)$ has a bijective, continuous Hessian.
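In the one-dimensional setting studied below, the simplest such deformation is a dilation combined with a translation. A minimal numpy sketch of a single deformed observation $f((1-\tau)^{-1}(x - t))$, with the grid, interpolation scheme, and function names our own:

```python
import numpy as np

def dilate_translate(f, x, tau, t):
    """Evaluate f((1 - tau)^(-1) (x - t)) on the grid x by linear
    interpolation, taking f to vanish outside the grid."""
    return np.interp((x - t) / (1.0 - tau), x, f, left=0.0, right=0.0)

x = np.linspace(-0.5, 0.5, 1001)
f = np.exp(-(x / 0.05) ** 2)                 # narrow bump
g = dilate_translate(f, x, tau=0.1, t=0.05)  # rescaled by (1 - tau), centered at t
```

The deformed bump is centered at $t$ and its support is rescaled by the factor $1-\tau$, so its integral scales by $1-\tau$ as well.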
The problem of adding random diffeomorphisms to Model 1 is much harder, though, because the class of all diffeomorphisms encompasses a variety of different possible deformations. To simplify the problem, we consider a simple subset of all possible diffeomorphisms, the set of dilations. This leads to the formulation of Model 2 below:

Model 2. Suppose we have $M$ independent observations of a function $f \in L^2(\mathbb{R})$ defined by
$$y_j(x) = f((1 - \tau_j)^{-1}(x - t_j)) + \varepsilon_j(x) =: f_j(x) + \varepsilon_j(x), \quad 1 \leq j \leq M.$$
Furthermore, assume that
• $\mathrm{supp}(f_j) \subset [-\frac{1}{2}, \frac{1}{2}]$ for $1 \leq j \leq M$.
• $\{t_j\}_{j=1}^{M}$ are independent samples of a random variable $t \in \mathbb{R}$.
• $\{\tau_j\}_{j=1}^{M}$ are independent samples of a uniformly distributed random variable $\tau \in \mathbb{R}$ satisfying $\mathbb{E}[\tau] = 0$ and $\mathrm{Var}(\tau) = \eta^2 \leq \frac{1}{12}$.
• $\{\varepsilon_j(x)\}_{j=1}^{M}$ are independent white noise processes on $[-\frac{1}{2}, \frac{1}{2}]$ with variance $\sigma^2$.

Dilations are one of the simplest classes of diffeomorphisms other than translations, but Model 2 is significantly more difficult to solve than Model 1, even at low noise levels. Intuitively, as we see in Figure 2.3, dilations add another degree of difficulty to the problem.

Figure 2.3 Left Column: three ground truth signals that have been translated without any additive noise. Middle Left Column: Dilating each of the signals. Middle Right Column: Adding Gaussian noise with mean zero and $\sigma^2 = 0.5$ to each of the signals in the left column. Right Column: Adding Gaussian noise with mean zero and $\sigma^2 = 2$ to each of the signals in the left column. Reference: [26].

Regarding alignment, the main challenge arises from the additive noise. Regarding the method of invariants, it utilizes translation invariant Fourier features, such as the power spectrum and bispectrum.
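These two Fourier features can be computed directly with the DFT. The following numpy sketch (our own discretization, with cyclic shifts standing in for continuous translation) verifies their translation invariance:

```python
import numpy as np

def power_spectrum(f):
    """(P f)(w) = |f^(w)|^2 for the DFT of a discrete signal."""
    return np.abs(np.fft.fft(f)) ** 2

def bispectrum(f):
    """Discrete analogue of B f(w1, w2) = f^(w1) f^*(w2) f^(w2 - w1),
    with frequency indices taken modulo the signal length."""
    fh = np.fft.fft(f)
    n = fh.size
    w1, w2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return fh[w1] * np.conj(fh[w2]) * fh[(w2 - w1) % n]
```

A cyclic shift by $s$ multiplies the DFT coefficient at frequency $\omega$ by the unit-modulus phase $e^{-2\pi i \omega s / n}$; the phases cancel in $|\hat f(\omega)|^2$ and in the product $\hat f(\omega_1)\hat f^*(\omega_2)\hat f(\omega_2 - \omega_1)$, so both statistics of a shifted signal match those of the original exactly.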
Before we begin, define $L^q(\mathbb{R})$ as the set of functions $f$ such that $\|f\|_q^q = \int_{\mathbb{R}} |f|^q \, dx < \infty$. The Fourier transform of $f \in L^1(\mathbb{R})$ is
$$\hat{f}(\omega) = \int_{\mathbb{R}} f(t) e^{-i\omega t} \, dt, \quad (2.1)$$
the power spectrum is
$$(Pf)(\omega) = |\hat{f}(\omega)|^2, \quad (2.2)$$
and the bispectrum is
$$Bf(\omega_1, \omega_2) = \hat{f}(\omega_1) \hat{f}^*(\omega_2) \hat{f}(\omega_2 - \omega_1). \quad (2.3)$$
However, in the case of Model 2, using an approach with Fourier invariants results in significant difficulties because the Fourier transform modulus is unstable to small dilations. Consider the operator $L_c f(t) = f((1 - c)t)$. For small $c$, this is a minor rescaling of the function $f$. Let $\xi(t) = (1 - c)t$; it is clear that $\xi$ is a diffeomorphism, specifically a dilation. Then $L_c f(t) = f(\xi(t))$ with $\|\xi'\|_\infty = 1 - c$. Choose $f(t) = e^{i\alpha t} \theta(t)$, where $\theta$ is smooth with fast decay. We see that $L_c f(t) = e^{i\alpha(1-c)t} \theta((1 - c)t)$, and it follows that
$$\widehat{L_c f}(\omega) = \int_{\mathbb{R}} \theta((1 - c)t) e^{i\alpha(1-c)t - i\omega t} \, dt = \frac{1}{1 - c} \int_{\mathbb{R}} \theta(t) e^{i\alpha t - i\frac{\omega}{1-c} t} \, dt = \frac{1}{1 - c} \hat{\theta}\big( (1 - c)^{-1}\omega - \alpha \big).$$
Looking at the norm now, for $c < 1/2$, we see from an argument identical to [34] that
$$\big\| |\widehat{L_c f}| - |\hat{f}| \big\|_2 \approx |c\alpha| \cdot \|f\|_2. \quad (2.4)$$
Since $\alpha$ is arbitrary, we see why Fourier invariants are unstable to small dilations. While we will address Model 2 later in this thesis, we first address a noiseless model, given below.

Model 3.
Suppose we have $M$ independent observations of a function $f \in L^2(\mathbb{R})$ defined by
$$f_j(x) = f((1 - \tau_j)^{-1}(x - t_j)), \quad 1 \leq j \leq M.$$
Furthermore, assume that
• $\mathrm{supp}(f_j) \subset [-\frac{1}{2}, \frac{1}{2}]$ for $1 \leq j \leq M$.
• $\{t_j\}_{j=1}^{M}$ are independent samples of a random variable $t \in \mathbb{R}$.
• $\{\tau_j\}_{j=1}^{M}$ are independent samples of a uniformly distributed random variable $\tau \in \mathbb{R}$ with $\mathbb{E}[\tau] = 0$ and $\mathrm{Var}(\tau) = \eta^2 \leq \frac{1}{12}$.

Note that the lack of additive noise means one can estimate $C_f = \|f\|_2$ and then dilate each observation to have norm $C_f$, which makes this problem trivial. However, we explore a solution to Model 3 via Fourier invariants to provide intuition on how to approach Model 2.

2.2 Notation

Let $g = Bf$. For the second and third models, let $g_\eta(\omega_1, \omega_2) = \mathbb{E}_\tau[(Bf_j)(\omega_1, \omega_2)]$. We will also define the following constants and operations used in the rest of the paper:
$$C_0 = \frac{1 - \sqrt{3}\eta}{1 + \sqrt{3}\eta}, \quad C_1 = 2\sqrt{3}\eta, \quad C_2 = \frac{1}{1 + \sqrt{3}\eta}. \quad (2.5)$$
Additionally, define the dilation operator $L_C g(\omega_1, \omega_2) := C^4 g(C\omega_1, C\omega_2)$. Note that in polar coordinates $(r, \theta)$, we have $L_C g(r, \theta) = C^4 g(Cr, \theta)$.

2.3 Power Spectrum Recovery for Model 3

Specific work from [26] has produced results for recovery of the power spectrum of a hidden signal under certain assumptions in Models 2 and 3. We summarize their results below for Model 3 and explain how to adjust them for Model 2. Define
$$p_\eta(\omega) := \mathbb{E}_\tau\big[ (Pf_j)(\omega) \big]. \quad (2.6)$$
We also let $(S_C g)(\omega) = C^3 g(C\omega)$ be a rescaled dilation operator. In the case of the infinite sample limit, we have the following theorem:

Theorem 1. Assume $Pf \in \mathcal{C}_0(\mathbb{R})$ and $p_\eta$ as defined in (2.6).
Then for $\omega \neq 0$:
$$(Pf)(\omega) = (I - S_{C_0})^{-1} C_1 S_{C_2} \big( 3 p_\eta(\omega) + \omega p_\eta'(\omega) \big),$$
where $C_0, C_1, C_2$ are as defined in (2.5).

One can now use this to derive an approximation of the infinite sample estimator. In practice, one can estimate $p_\eta$ by
$$\tilde{p}_\eta(\omega) := \frac{1}{M} \sum_{j=1}^{M} (Pf_j)(\omega). \quad (2.7)$$
Additionally, a natural approximation for the estimator in Theorem 1 is
$$(\widetilde{Pf})(\omega) := (I - S_{C_0})^{-1} C_1 S_{C_2} \big( 3 \tilde{p}_\eta(\omega) + \omega \tilde{p}_\eta'(\omega) \big). \quad (2.8)$$
For the next theorem, define the quantity
$$(Pf)^{(k)}_{\max}(\omega) := \max_{\xi \in [\omega/2, 2\omega]} |(Pf)^{(k)}(\xi)|. \quad (2.9)$$
Using (2.8), the following convergence theorem holds:

Theorem 2. Assume Model 3, the estimator $(\widetilde{Pf})(\omega)$ defined in (2.8), $Pf \in \mathcal{C}^3(\mathbb{R})$, and that $\omega^k (Pf)^{(k)}_{\max}(\omega) \in L^2(\mathbb{R})$ for $k = 2, 3$. Then:
$$\mathbb{E}\big[ \|Pf - \widetilde{Pf}\|_2^2 \big] \lesssim \frac{\eta^2}{M} \Big( \|(Pf)(\omega)\|_2^2 + \|\omega (Pf)'(\omega)\|_2^2 + \|\omega^2 (Pf)''(\omega)\|_2^2 \Big) + r,$$
where $r$ is a higher-order term satisfying
$$r \leq \frac{\eta^4}{M} \Big( \|\omega^2 (Pf)''_{\max}(\omega)\|_2^2 + \|\omega^3 (Pf)'''_{\max}(\omega)\|_2^2 \Big).$$
That is, in expectation, the convergence rate of (2.8) to the true power spectrum is $O\big( \frac{\eta^2}{M} \big)$.

2.4 Power Spectrum Recovery for Model 2

Using these results for Model 3, we design estimators for Model 2. The main difficulty in Model 2 stems from the additive noise. First of all, the mean-squared error (MSE) is only finite on a bounded interval due to the additive noise. Thus we consider a bounded frequency domain $\Omega$, and consider the MSE of an estimator $\widetilde{Pf}$ over $\Omega$: $\mathbb{E}\big[ \|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2 \big]$. Another difficulty is that one cannot use $\tilde{p}_\eta$, the average of the power spectra of the noiseless observations.
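The operator inverse $(I - S_{C_0})^{-1}$ appearing in (2.8) (and in the smoothed Model 2 estimator below) can be applied numerically through its truncated Neumann series, since the norms $\|S_{C_0}^j\|$ decay geometrically. A sketch under our own discretization choices (frequency grid, linear interpolation, truncation depth; function names are ours):

```python
import numpy as np

def S(C, h, w):
    """Rescaled dilation operator (S_C h)(w) = C^3 h(C w); h is sampled on the
    grid w and evaluated at C*w by linear interpolation (zero off the grid)."""
    return C ** 3 * np.interp(C * w, w, h, left=0.0, right=0.0)

def apply_inverse(C0, h, w, terms=40):
    """Truncated Neumann series (I - S_C0)^(-1) h = sum_{j>=0} S_C0^j h,
    which converges geometrically for 0 < C0 < 1."""
    total = np.zeros_like(h)
    term = h.copy()
    for _ in range(terms):
        total += term
        term = S(C0, term, w)      # next term of the series
    return total
```

Taking $C_0 = (1 - \sqrt{3}\eta)/(1 + \sqrt{3}\eta)$ from (2.5) with a small $\eta$, the truncation error after 40 terms is negligible, which can be checked by applying $(I - S_{C_0})$ to the result and comparing against the input.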
Instead of $\tilde{p}_\eta$, one has access to
$$\frac{1}{M} \sum_{j=1}^{M} P y_j - \sigma^2 = \tilde{p}_\eta + \tilde{p}_\sigma,$$
where
$$\tilde{p}_\sigma := \frac{1}{M} \sum_{j=1}^{M} \Big( \hat{f}_j \hat{\varepsilon}_j^* + \hat{f}_j^* \hat{\varepsilon}_j + P\varepsilon_j - \sigma^2 \Big).$$
The particular problem is the term $\tilde{p}_\sigma$, which is not continuous due to the additive noise. To remedy this issue, we smooth the noisy power spectra via low pass filtering:
$$(\tilde{p}_\eta + \tilde{p}_\sigma) * \phi_L \quad (2.10)$$
using a Gaussian filter of width $L$, $\phi_L(\omega) = (2\pi L^2)^{-1/2} e^{-\frac{\omega^2}{2L^2}}$. Accordingly, we define a modified version of (2.8):
$$(\widetilde{Pf})'(\omega) := (I - S_{C_0})^{-1} C_1 S_{C_2} \Big( 3 \big( (\tilde{p}_\eta + \tilde{p}_\sigma) * \phi_L \big)(\omega) + \omega \big( (\tilde{p}_\eta + \tilde{p}_\sigma) * \phi_L \big)'(\omega) \Big). \quad (2.11)$$
Similar to before, (2.11) is an unbiased estimator of $Pf$ in the limit $M \to \infty$ and $L \to 0$. Additionally, we have the following result:

Theorem 3. Assume Model 2, the estimator $(\widetilde{Pf})(\omega)$ defined in (2.11), $Pf \in \mathcal{C}^3(\mathbb{R})$, and that $\omega^k (Pf)^{(k)}_{\max}(\omega) \in L^2(\mathbb{R})$ for $k = 2, 3$. Then
$$\mathbb{E}\Big[ \|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2 \Big] \lesssim C_{f,\Omega} \left( \frac{\eta^2}{M} + L^4 + \frac{\sigma^2 \vee \sigma^4}{L^2 M} \right).$$
When $\sigma \geq 1$, we may choose $L = \big( \frac{\sigma^4}{M} \big)^{1/6}$. Then we have
$$\mathbb{E}\Big[ \|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2 \Big] \lesssim C_{f,\Omega} \left( \frac{\eta^2}{M} + \Big( \frac{\sigma^4}{M} \Big)^{2/3} \right).$$
In other words, for a proper choice of $L$ and $M$, the expected MSE converges to 0 as $M \to \infty$. We will generalize these results for bispectrum recovery in the next sections.

2.5 Bispectrum Recovery for Model 3

Following [26], we propose a similar process for creating an estimator for the bispectrum. We will first consider the case where we have an infinite number of samples and find an unbiased estimator.

Theorem 4.
Suppose we have an infinite number of samples and assume that $Bf \in C^1(\mathbb{R}^2)$. An unbiased estimator for the bispectrum is given by
$$Bf(r, \theta) = (I - L_{C_0})^{-1} C_1 L_{C_2} \left( 4 g_\eta(r, \theta) + r \frac{\partial g_\eta}{\partial r}(r, \theta) \right).$$

Proof. Recall that the bispectrum is given by $Bf(\omega_1, \omega_2) = \hat{f}(\omega_1) \hat{f}^*(\omega_2) \hat{f}(\omega_2 - \omega_1)$. The Fourier transform of each $f_j$ is $e^{-i\omega t_j}(1 - \tau_j) \hat{f}((1 - \tau_j)\omega)$. Making a substitution,
$$Bf_j(\omega_1, \omega_2) = (1 - \tau_j)^3 e^{-i\omega_1 t_j} \hat{f}((1 - \tau_j)\omega_1) \cdot \big[ e^{-i\omega_2 t_j} \hat{f}((1 - \tau_j)\omega_2) \big]^* \cdot e^{-i(\omega_2 - \omega_1) t_j} \hat{f}((1 - \tau_j)(\omega_2 - \omega_1)) = (1 - \tau_j)^3 \hat{f}((1 - \tau_j)\omega_1) \hat{f}^*((1 - \tau_j)\omega_2) \hat{f}((1 - \tau_j)(\omega_2 - \omega_1)).$$
So $(Bf_j)(\omega_1, \omega_2) = (1 - \tau_j)^3 (Bf)((1 - \tau_j)\omega_1, (1 - \tau_j)\omega_2)$. Since $\tau$ has uniform distribution with variance $\eta^2$, the pdf of $\tau$ has the form $p_\tau = \frac{1}{2\sqrt{3}\eta} \chi_{[-\sqrt{3}\eta, \sqrt{3}\eta]}$. Thus:
$$g_\eta(\omega_1, \omega_2) = \mathbb{E}_\tau[Bf_j(\omega_1, \omega_2)] = \mathbb{E}_\tau\big[ (1 - \tau)^3 g((1 - \tau)\omega_1, (1 - \tau)\omega_2) \big] = \int (1 - \tau)^3 g((1 - \tau)\omega_1, (1 - \tau)\omega_2) \, p_\tau(\tau) \, d\tau.$$
Now we convert to polar coordinates $(r, \theta)$ and let $\tilde{\tau} = (1 - \tau)r$:
$$g_\eta(r, \theta) = \frac{1}{2\sqrt{3}\eta} \int_{-\sqrt{3}\eta}^{\sqrt{3}\eta} (1 - \tau)^3 g((1 - \tau)r, \theta) \, d\tau = \frac{1}{2\sqrt{3}\eta \, r^4} \int_{(1 - \sqrt{3}\eta)r}^{(1 + \sqrt{3}\eta)r} \tilde{\tau}^3 g(\tilde{\tau}, \theta) \, d\tilde{\tau}.$$
Let $H$ be the antiderivative in the variable $w$ of the function $h(w, \theta) = w^3 g(w, \theta)$. In other words,
$$\frac{\partial H}{\partial w}(w, \theta) = h(w, \theta) = w^3 g(w, \theta).$$
๐œ•๐‘ค 11 By Fundamental Theorem of Calculus, โˆš โˆš โˆš 2 3๐œ‚๐‘Ÿ 4 ๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) = ๐ป ((1 + 3 ๐œ‚)๐‘Ÿ, ๐œƒ) โˆ’ ๐ป ((1 โˆ’ 3 ๐œ‚)๐‘Ÿ, ๐œƒ). Now take derivative with respect to ๐‘Ÿ and divide both sides by ๐‘Ÿ 3 to get โˆš   ๐œ•๐‘”๐œ‚ โˆš โˆš โˆš โˆš 2 3๐œ‚ 4๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) = (1 + 3 ๐œ‚) 4 ๐‘”((1 + 3 ๐œ‚)๐‘Ÿ, ๐œƒ) โˆ’ (1 โˆ’ 3 ๐œ‚) 4 ๐‘”((1 โˆ’ 3 ๐œ‚)๐‘Ÿ, ๐œƒ). ๐œ•๐‘Ÿ We now apply the dilation operation ๐ฟ ๐ถ2 to both sides, which yields โˆš   โˆš 1โˆ’3 ๐œ‚ 4 1โˆ’3 ๐œ‚     ๐œ•๐‘”๐œ‚ ๐ถ1 ๐ฟ ๐ถ2 4๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) = ๐‘”(๐‘Ÿ, ๐œƒ) โˆ’ โˆš ๐‘” โˆš ๐‘Ÿ, ๐œƒ . ๐œ•๐‘Ÿ 1+3 ๐œ‚ 1+3 ๐œ‚ We can also rewrite the right side in terms of ๐ผ and ๐ฟ ๐ถ0 to get   ๐œ•๐‘”๐œ‚ ๐ถ1 ๐ฟ ๐ถ2 4๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) = (๐ผ โˆ’ ๐ฟ ๐ถ0 )๐‘”(๐‘Ÿ, ๐œƒ). ๐œ•๐‘Ÿ Thus, an unbiased estimator is   โˆ’1 ๐œ•๐‘”๐œ‚ ๐‘”(๐‘Ÿ, ๐œƒ) = (๐ผ โˆ’ ๐ฟ ๐ถ0 ) ๐ถ1 ๐ฟ ๐ถ2 4๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) . ๐œ•๐‘Ÿ โ–ก However, we are only given a finite number of samples and do not have access to the estimator above in actual applications. For Model 3, we can approximate ๐‘”๐œ‚ by taking an average of ๐‘€ samples using ๐‘€ 1 โˆ‘๏ธ ๐‘”๐œ‚ (๐œ”1 , ๐œ”2 ) := e (๐ต ๐‘“ ๐‘— )(๐œ”1 , ๐œ”2 ). ๐‘€ ๐‘—=1 Based on Proposition 4, a good choice for the estimator is   โˆ’1 ๐œ• ๐‘”หœ ๐œ‚ (๐ตf๐‘“ )(๐‘Ÿ, ๐œƒ) := (๐ผ โˆ’ ๐ฟ ๐ถ0 ) ๐ถ1 ๐ฟ ๐ถ2 4๐‘”หœ ๐œ‚ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) . ๐œ•๐‘Ÿ To show the estimator ๐ต f๐‘“ has a small error, we will need the following lemma. Lemma 1. Assume that ๐ต ๐‘“ โˆˆ ๐ถ 1 (R). Then 2 ๐œ•๐‘”๐œ‚ ๐œ• ๐‘”หœ ๐œ‚ โˆฅ๐ต ๐‘“ (๐‘Ÿ, ๐œƒ) โˆ’ ๐ตf๐‘“ (๐‘Ÿ, ๐œƒ)โˆฅ 2 โ‰ฒ โˆฅ๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘”หœ ๐œ‚ (๐‘Ÿ, ๐œƒ)โˆฅ 2 + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) . 2 2 ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ 2 12 Proof. 
We start with
$$Bf(r, \theta) - \widetilde{Bf}(r, \theta) = (I - L_{C_0})^{-1} C_1 L_{C_2} \left( 4 \big( g_\eta(r, \theta) - \tilde{g}_\eta(r, \theta) \big) + r \left( \frac{\partial g_\eta}{\partial r}(r, \theta) - \frac{\partial \tilde{g}_\eta}{\partial r}(r, \theta) \right) \right).$$
By using the triangle inequality,
$$\|Bf(r, \theta) - \widetilde{Bf}(r, \theta)\|_2^2 \leq 32 C_1^2 \|(I - L_{C_0})^{-1}\|^2 \|L_{C_2}\|^2 \|g_\eta(r, \theta) - \tilde{g}_\eta(r, \theta)\|_2^2 + 2 C_1^2 \|(I - L_{C_0})^{-1}\|^2 \|L_{C_2}\|^2 \left\| r \frac{\partial g_\eta}{\partial r}(r, \theta) - r \frac{\partial \tilde{g}_\eta}{\partial r}(r, \theta) \right\|_2^2.$$
To compute the spectral norm of $L_C$, we revert back to Cartesian coordinates:
$$\|L_C^m g_\eta\|_2^2 = C^{8m} \int_{\mathbb{R}} \int_{\mathbb{R}} |g_\eta(C^m \omega_1, C^m \omega_2)|^2 \, d\omega_1 \, d\omega_2.$$
Let $u = C^m \omega_1$ and $v = C^m \omega_2$:
$$\|L_C^m g_\eta\|_2^2 = C^{6m} \int_{\mathbb{R}} \int_{\mathbb{R}} |g_\eta(u, v)|^2 \, du \, dv = C^{6m} \|g_\eta\|_2^2.$$
Thus $\|L_{C_2}\|^2 = C_2^6$. Our result also implies that
$$\|(I - L_{C_0})^{-1}\| \leq \sum_{j=0}^{\infty} \|L_{C_0}^j\| \leq \sum_{j=0}^{\infty} C_0^{3j} = \frac{1}{1 - C_0^3}.$$
Putting everything together,
$$\|Bf(r, \theta) - \widetilde{Bf}(r, \theta)\|_2^2 \leq \frac{16 \cdot 12\eta^2}{(1 - C_0^6)^2 (1 + \sqrt{3}\eta)^{12}} \|g_\eta(r, \theta) - \tilde{g}_\eta(r, \theta)\|_2^2 + \frac{12\eta^2}{(1 - C_0^6)^2 (1 + \sqrt{3}\eta)^{12}} \left\| r \frac{\partial g_\eta}{\partial r}(r, \theta) - r \frac{\partial \tilde{g}_\eta}{\partial r}(r, \theta) \right\|_2^2.$$
(1 โˆ’ ๐ถ06 ) 2 (1 + 3๐œ‚) 12 ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ 2 13 12๐œ‚2 Now it suffices to prove that โˆš is bounded by constant: (1โˆ’๐ถ06 ) 2 (1+ 3๐œ‚) 12 12๐œ‚2 12๐œ‚2 โˆš โ‰ค  (1 โˆ’ ๐ถ06 ) 2 (1 + 3๐œ‚) 12  โˆš 6 2 1โˆ’ 3๐œ‚ 1โˆ’ โˆš 1+ 3๐œ‚ 12๐œ‚2 =   โˆš 6  โˆš 6 2 1+ 3๐œ‚ โˆš โˆ’ 1โˆ’โˆš3๐œ‚ 1+ 3๐œ‚ 1+ 3๐œ‚ โˆš 12๐œ‚2 (1 + 3๐œ‚) 12 = โˆš โˆš โˆš (108 3๐œ‚5 + 120 3๐œ‚3 + 12 3๐œ‚) 2 โˆš 12๐œ‚2 (1 + 3๐œ‚) 12 = โˆš โˆš โˆš ๐œ‚2 (108 3๐œ‚4 + 120 3๐œ‚2 + 12 3) 2 ๐œ‚2 โˆš โ‰ฒ 2 (1 + 3๐œ‚) 12 ๐œ‚ = ๐‘‚ (1). โ–ก This lemma implies that we only have to bound E โˆฅ๐‘”๐œ‚ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘”หœ ๐œ‚ (๐‘Ÿ, ๐œƒ)โˆฅ 22 ,   " # 2 ๐œ•๐‘”๐œ‚ ๐œ• ๐‘”หœ ๐œ‚ E ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ 2 with an ๐‘‚ (1/๐‘€) bound on ๐‘€ for ๐ต f๐‘“ (๐‘Ÿ, ๐œƒ) to converge to ๐ต ๐‘“ (๐‘Ÿ, ๐œƒ) with an ๐‘‚ (1/๐‘€) bound. We have the following result now. Theorem 5. Suppose that ๐ต ๐‘“ โˆˆ ๐ถ 3 (R2 ) and also assume that ๐‘Ÿ 2 max๐›ผโˆˆ[๐‘Ÿ/2,2๐‘Ÿ] |๐œ•๐›ผ๐›ผ (๐ต ๐‘“ )(๐›ผ, ๐œƒ)| 2 โˆˆ ๐ฟ 2 (R2 , ๐‘‘๐‘Ÿ ร— ๐‘‘๐œƒ) and ๐‘Ÿ 3 max๐›พโˆˆ[๐‘Ÿ/2,2๐‘Ÿ] |๐œ•๐›พ๐›พ๐›พ (๐ต ๐‘“ )(๐›พ, ๐œƒ)| 2 โˆˆ ๐ฟ 2 (R2 , ๐‘‘๐‘Ÿ ร— ๐‘‘๐œƒ). Then the following bound holds: h 2 i ๐œ‚2  2 2 2 2  E โˆฅ๐ต ๐‘“ โˆ’ ๐ต ๐‘“ โˆฅ 2 โ‰ฒ f โˆฅ(๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ)โˆฅ 2 + 2โˆฅ๐‘Ÿ๐œ•๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ)โˆฅ 2 + โˆฅ๐‘Ÿ ๐œ•๐‘Ÿ๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ)โˆฅ 2 ๐‘€ ! 2 2 ๐œ‚4 + ๐‘Ÿ 2 max ๐œ•๐›ผ๐›ผ (๐ต ๐‘“ )(๐›ผ, ๐œƒ) + ๐‘Ÿ 3 max ๐œ•๐›พ๐›พ๐›พ (๐ต ๐‘“ )(๐›พ, ๐œƒ) . ๐‘€ ๐›ผโˆˆ[๐‘Ÿ/2,2๐‘Ÿ] 2 ๐›พโˆˆ[๐‘Ÿ/2,2๐‘Ÿ] 2 with   โˆ’1 ๐œ• ๐‘”หœ ๐œ‚ ( ๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ) = (๐ผ โˆ’ ๐ฟ ๐ถ0 ) ๐ถ1 ๐ฟ ๐ถ2 4๐‘”หœ ๐œ‚ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ f (๐‘Ÿ, ๐œƒ) . ๐œ•๐‘Ÿ 14 Proof. First, assume that the ๐ต ๐‘“ : R2 โ†’ R. The argument will be generalized to the complex case after. 
Notice that
$$|\tilde{g}_\eta(r, \theta) - g_\eta(r, \theta)|^2 = \left| \frac{1}{M} \sum_{j=1}^{M} (Bf_j)(r, \theta) - g_\eta(r, \theta) \right|^2.$$
Define $X_j = Bf_j(r, \theta) - g_\eta(r, \theta) = Bf_j(r, \theta) - \mathbb{E}[(Bf_j)(r, \theta)]$. This means each $X_j$ is zero centered, so we have
$$\mathbb{E}\left[ \left| \frac{1}{M} \sum_{j=1}^{M} X_j \right|^2 \right] = \mathrm{var}\left[ \frac{1}{M} \sum_{j=1}^{M} X_j \right] = \frac{\mathrm{var}(X_j)}{M}.$$
Write $X_j = Bf_j(r, \theta) - (Bf)(r, \theta) + (Bf)(r, \theta) - \mathbb{E}[(Bf_j)(r, \theta)]$. Then
$$X_j^2 \lesssim \big( Bf_j(r, \theta) - (Bf)(r, \theta) \big)^2 + \big( (Bf)(r, \theta) - \mathbb{E}[(Bf_j)(r, \theta)] \big)^2$$
and
$$\mathbb{E}[X_j^2] \lesssim \mathbb{E}\big[ (Bf_j(r, \theta) - (Bf)(r, \theta))^2 \big] + \mathbb{E}\big[ ((Bf)(r, \theta) - \mathbb{E}[(Bf_j)(r, \theta)])^2 \big] \lesssim \mathbb{E}\big[ (Bf_j(r, \theta) - (Bf)(r, \theta))^2 \big].$$
Each $\tau_j$ has bounded variance and is supported on $[-1/2, 1/2]$. Taylor expand the dilated bispectrum in the radial variable on the interval $[r/2, 2r]$:
$$(Bf)((1 - \tau_j)r, \theta) = (Bf)(r, \theta) - \partial_r (Bf)(r, \theta) \, r \tau_j + \frac{1}{2} \partial_{rr}(Bf)(r, \theta)\Big|_{r = \alpha} r^2 \tau_j^2, \quad \alpha \in [r/2, 2r].$$
Now multiply both sides by $(1 - \tau_j)^3$ to get
$$(Bf_j)(r, \theta) = (1 - \tau_j)^3 (Bf)((1 - \tau_j)r, \theta) = (1 - \tau_j)^3 (Bf)(r, \theta) - (1 - \tau_j)^3 \partial_r (Bf)(r, \theta) \, r \tau_j + \frac{1}{2} (1 - \tau_j)^3 \partial_{\alpha\alpha}(Bf)(\alpha, \theta) \, r^2 \tau_j^2,$$
with $\alpha \in [r/2, 2r]$.
It now follows that
$$(Bf_j)(r,\theta)-(Bf)(r,\theta) = (3\tau_j^2-3\tau_j-\tau_j^3)(Bf)(r,\theta) - (1-\tau_j)^3\partial_r(Bf)(r,\theta)\,r\tau_j + \frac12(1-\tau_j)^3\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\,r^2\tau_j^2,$$
with $\alpha\in[r/2,2r]$. Square both sides to get:
$$\begin{aligned}\big((Bf_j)(r,\theta)-(Bf)(r,\theta)\big)^2 &= (3\tau_j^2-3\tau_j-\tau_j^3)^2(Bf)^2(r,\theta)\\ &\quad - 2(3\tau_j^2-3\tau_j-\tau_j^3)(1-\tau_j)^3 (Bf)(r,\theta)\,\partial_r(Bf)(r,\theta)\, r\tau_j\\ &\quad + (3\tau_j^2-3\tau_j-\tau_j^3)(1-\tau_j)^3 (Bf)(r,\theta)\,\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\, r^2\tau_j^2\\ &\quad + (1-\tau_j)^6\big[\partial_r(Bf)(r,\theta)\big]^2 r^2\tau_j^2\\ &\quad - (1-\tau_j)^6 \partial_r(Bf)(r,\theta)\,\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\, r^3\tau_j^3\\ &\quad + \frac{(1-\tau_j)^6}{4}\big[\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big]^2 r^4\tau_j^4.\end{aligned}$$
Using the inequality $2|ab|\le |a|^2+|b|^2$, it follows that
$$2\big|(3\tau_j^2-3\tau_j-\tau_j^3)(1-\tau_j)^3 (Bf)(r,\theta)\,\partial_r(Bf)(r,\theta)\, r\tau_j\big| \le (3\tau_j^2-3\tau_j-\tau_j^3)^2 |(Bf)(r,\theta)|^2 + (1-\tau_j)^6\big|\partial_r(Bf)(r,\theta)\, r\tau_j\big|^2$$
and
$$\big|(3\tau_j^2-3\tau_j-\tau_j^3)(1-\tau_j)^3(Bf)(r,\theta)\,\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\, r^2\tau_j^2\big| \le \frac12 (3\tau_j^2-3\tau_j-\tau_j^3)^2|(Bf)(r,\theta)|^2 + \frac12 |1-\tau_j|^6 \big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\, r^2\tau_j^2\big|^2
$$$$
and
$$\big|(1-\tau_j)^6\partial_r(Bf)(r,\theta)\,\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\, r^3\tau_j^3\big| \le \frac12(1-\tau_j)^6\big|\partial_r(Bf)(r,\theta)\,\tau_j r\big|^2 + \frac12 |1-\tau_j|^6\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\, r^2\tau_j^2\big|^2.$$
Take the expectation of both sides now. Since the pdf of $\tau$ is supported on $[-1/2,1/2]$ and zero centered,
$$\mathbb{E}\big[(3\tau_j^2-3\tau_j-\tau_j^3)^2\big] \lesssim \mathbb{E}[\tau_j^2]\lesssim \eta^2.$$
The other terms with $\tau_j$ are bounded in a similar way. Thus
$$\mathbb{E}\big[((Bf_j)(r,\theta)-(Bf)(r,\theta))^2\big] \lesssim \eta^2 (Bf)^2(r,\theta) + r^2\eta^2\big[\partial_r(Bf)(r,\theta)\big]^2 + \eta^4 r^4 \max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|^2.$$
We also have
$$\mathrm{var}[X_j] = \mathbb{E}[X_j^2] \lesssim \mathbb{E}\big[((Bf_j)(r,\theta)-(Bf)(r,\theta))^2\big] \lesssim \eta^2(Bf)^2(r,\theta) + r^2\eta^2\big[\partial_r(Bf)(r,\theta)\big]^2 + \eta^4 r^4\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|^2,$$
and
$$\mathbb{E}\big[(g_\eta(r,\theta)-\tilde g_\eta(r,\theta))^2\big] \lesssim \frac{\eta^2}{M}(Bf)^2(r,\theta) + \frac{\eta^2}{M}r^2\big[\partial_r(Bf)(r,\theta)\big]^2 + \frac{\eta^4}{M} r^4\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|^2.$$
Now we can take the integral and expectation to get
$$\mathbb{E}\,\|g_\eta(r,\theta)-\tilde g_\eta(r,\theta)\|_2^2 \lesssim \frac{\eta^2}{M}\|(Bf)(r,\theta)\|_2^2 + \frac{\eta^2}{M}\|r\,\partial_r(Bf)(r,\theta)\|_2^2 + \frac{\eta^4}{M}\left\| r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|\right\|_2^2.$$
The first term is now handled appropriately. We can now repeat a nearly identical argument for the second term. Let $g_j = Bf_j$ and
$$p_j = r\frac{\partial g_j}{\partial r}(r,\theta) - r\frac{\partial g_\eta}{\partial r}(r,\theta).$$
๐œ•๐‘Ÿ ๐œ•๐‘Ÿ We have ๐‘€ ๐œ• ๐‘”หœ ๐œ‚ ๐œ•๐‘”๐œ‚ 1 โˆ‘๏ธ ๐œ•๐‘” ๐‘— ๐œ•๐‘”๐œ‚ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) = ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ). ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ ๐‘€ ๐‘—=1 ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ 17 By Leibniz Rule, we can take the derivative inside the expectation to get E[๐‘ ๐‘— ] = 0, and a similar argument from before yields  2  2 ๐œ•๐‘” ๐‘— ๐œ•๐‘” ๐œ•๐‘” ๐œ•๐‘”๐œ‚ ๐‘ 2๐‘— โ‰ฒ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) + ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ and " 2# ๐œ•๐‘” ๐‘— ๐œ•๐‘” E[๐‘ 2๐‘— ] โ‰ฒ E ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) . ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ Taylor expand ๐œ•๐‘Ÿ (๐ต ๐‘“ )((1 โˆ’ ๐œ ๐‘— )๐‘Ÿ, ๐œƒ) to get 1 ๐œ•๐‘Ÿ (๐ต ๐‘“ ) ((1 โˆ’ ๐œ ๐‘— )๐‘Ÿ, ๐œƒ) = ๐œ•๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ) โˆ’ ๐œ•๐‘Ÿ๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ)๐‘Ÿ๐œ ๐‘— + ๐œ•๐›พ๐›พ๐›พ (๐ต ๐‘“ )(๐›พ, ๐œƒ)๐‘Ÿ 2 ๐œ 2๐‘— , ๐›พ โˆˆ [๐‘Ÿ/2, 2๐‘Ÿ]. 2 ๐œ•๐‘” ๐‘— Since ๐‘Ÿ ๐œ•๐‘Ÿ (๐‘Ÿ, ๐œƒ) = ๐‘Ÿ (1 โˆ’ ๐œ ๐‘— ) 4 ๐œ•๐‘Ÿ (๐ต ๐‘“ )((1 โˆ’ ๐œ ๐‘— )๐‘Ÿ, ๐œƒ), multiply both sides by (1 โˆ’ ๐œ ๐‘— ) 4 : (1 โˆ’ ๐œ ๐‘— ) 4๐‘Ÿ๐œ•๐‘Ÿ (๐ต ๐‘“ )((1 โˆ’ ๐œ ๐‘— )๐‘Ÿ, ๐œƒ) = (1 โˆ’ ๐œ ๐‘— ) 4๐‘Ÿ๐œ•๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ) 1 โˆ’ (1 โˆ’ ๐œ ๐‘— ) 4 ๐œ•๐‘Ÿ๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ)๐‘Ÿ 2 ๐œ ๐‘— + (1 โˆ’ ๐œ ๐‘— ) 4 ๐œ•๐›พ๐›พ๐›พ (๐ต ๐‘“ )(๐›พ, ๐œƒ)๐‘Ÿ 3 ๐œ 2๐‘— 2 with ๐›พ โˆˆ [๐‘Ÿ/2, 2๐‘Ÿ]. Then ๐œ•๐‘” ๐‘— ๐œ•๐‘” ๐‘Ÿ (๐‘Ÿ, ๐œƒ) โˆ’ ๐‘Ÿ (๐‘Ÿ, ๐œƒ) = (๐œ 4๐‘— โˆ’ 4๐œ 3๐‘— + 6๐œ 2๐‘— โˆ’ 4๐œ ๐‘— )๐‘Ÿ๐œ•๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ) ๐œ•๐‘Ÿ ๐œ•๐‘Ÿ 1 โˆ’ (1 โˆ’ ๐œ ๐‘— ) 4 ๐œ•๐‘Ÿ๐‘Ÿ (๐ต ๐‘“ )(๐‘Ÿ, ๐œƒ)๐‘Ÿ 2 ๐œ ๐‘— + (1 โˆ’ ๐œ ๐‘— ) 4 ๐œ•๐›พ๐›พ๐›พ (๐ต ๐‘“ )(๐›พ, ๐œƒ)๐‘Ÿ 3 ๐œ 2๐‘— 2 with ๐›พ โˆˆ [๐‘Ÿ/2, 2๐‘Ÿ]. 
By a similar process to the above,
$$\mathbb{E}\left[\left\|r\frac{\partial\tilde g_\eta}{\partial r}(r,\theta) - r\frac{\partial g_\eta}{\partial r}(r,\theta)\right\|_2^2\right] \lesssim \frac{\eta^2}{M}\|r\,\partial_r(Bf)(r,\theta)\|_2^2 + \frac{\eta^2}{M}\|r^2\,\partial_{rr}(Bf)(r,\theta)\|_2^2 + \frac{\eta^4}{M}\left\|r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}(Bf)(\gamma,\theta)\big|\right\|_2^2.$$
Thus
$$\mathbb{E}\left[\|Bf-\widetilde{Bf}\|_2^2\right] \lesssim \frac{\eta^2}{M}\left(\|(Bf)(r,\theta)\|_2^2 + 2\|r\,\partial_r(Bf)(r,\theta)\|_2^2 + \|r^2\,\partial_{rr}(Bf)(r,\theta)\|_2^2\right) + \frac{\eta^4}{M}\left(\left\|r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|\right\|_2^2 + \left\|r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}(Bf)(\gamma,\theta)\big|\right\|_2^2\right).$$
For the case where $Bf\colon\mathbb{R}^2\to\mathbb{C}$, simply write $Bf = \mathrm{Re}(Bf) + i\,\mathrm{Im}(Bf)$ and repeat the argument above. Then it follows that
$$\mathbb{E}\left[\|\mathrm{Re}(Bf)-\mathrm{Re}(\widetilde{Bf})\|_2^2\right] \lesssim \frac{\eta^2}{M}\left(\|\mathrm{Re}(Bf)(r,\theta)\|_2^2 + 2\|r\,\partial_r\mathrm{Re}(Bf)(r,\theta)\|_2^2 + \|r^2\,\partial_{rr}\mathrm{Re}(Bf)(r,\theta)\|_2^2\right) + \frac{\eta^4}{M}\left(\left\|r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}\mathrm{Re}(Bf)(\alpha,\theta)\big|\right\|_2^2 + \left\|r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}\mathrm{Re}(Bf)(\gamma,\theta)\big|\right\|_2^2\right)$$
and
$$\mathbb{E}\left[\|\mathrm{Im}(Bf)-\mathrm{Im}(\widetilde{Bf})\|_2^2\right] \lesssim \frac{\eta^2}{M}\left(\|\mathrm{Im}(Bf)(r,\theta)\|_2^2 + 2\|r\,\partial_r\mathrm{Im}(Bf)(r,\theta)\|_2^2 + \|r^2\,\partial_{rr}\mathrm{Im}(Bf)(r,\theta)\|_2^2\right) + \frac{\eta^4}{M}\left(\left\|r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}\mathrm{Im}(Bf)(\alpha,\theta)\big|\right\|_2^2 + \left\|r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}\mathrm{Im}(Bf)(\gamma,\theta)\big|\right\|_2^2\right).$$
Adding the two inequalities together yields the desired result. $\square$

Now that we have handled the noiseless case, the goal is to move on to Model 2 and adapt our method to work in the presence of additive noise.

2.6 Bispectrum Recovery for Model 2

Now we move on to Model 2.
This is similar, but the additive noise creates two difficulties. First, we must restrict ourselves from $\mathbb{R}^2$ to some finite domain $\Omega$, since the MSE is not well defined on infinite domains because of the noise. Second, we do not necessarily have access to $\tilde g_\eta$ like before. Instead, we only know
$$\frac{1}{M}\sum_{j=1}^M By_j.$$
Writing $\hat\varepsilon(\omega) = \int_{-1/2}^{1/2} e^{-i\omega x}\,dB_x$ as an integral with respect to a Brownian motion, it is clear that $\mathbb{E}[B\varepsilon_j] = 0$. Also, notice that $\mathbb{E}[B\varepsilon_j] = \mathbb{E}[\hat\varepsilon_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1)]$, where each $\varepsilon_j$ is a white noise process on $[-1/2,1/2]$ with variance $\sigma^2$. We see that for each index
$$\begin{aligned} By_j(\omega_1,\omega_2) &= B(f_j+\varepsilon_j)(\omega_1,\omega_2) \\ &= \big(\hat f_j(\omega_1)+\hat\varepsilon_j(\omega_1)\big)\big(\hat f_j^*(\omega_2)+\hat\varepsilon_j^*(\omega_2)\big)\big(\hat f_j(\omega_2-\omega_1)+\hat\varepsilon_j(\omega_2-\omega_1)\big) \\ &= \hat f_j(\omega_1)\hat f_j^*(\omega_2)\hat f_j(\omega_2-\omega_1) + \hat f_j(\omega_1)\hat f_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1) + \hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1) \\ &\quad + \hat f_j^*(\omega_2)\hat f_j(\omega_2-\omega_1)\hat\varepsilon_j(\omega_1) + \hat f_j^*(\omega_2)\hat\varepsilon_j(\omega_1)\hat\varepsilon_j(\omega_2-\omega_1) + \hat\varepsilon_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat f_j(\omega_2-\omega_1) \\ &\quad + \hat f_j(\omega_1)\hat f_j(\omega_2-\omega_1)\hat\varepsilon_j^*(\omega_2) + \hat\varepsilon_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1) \\ &:= Bf_j(\omega_1,\omega_2) + R_j(\omega_1,\omega_2).\end{aligned}$$
Thus, we need to perform a $\sigma$-based centering to recover $\frac1M\sum_{j=1}^M Bf_j(\omega_1,\omega_2)$.
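The claim that the noise bispectrum is mean zero can be checked with a small Monte Carlo simulation: $B\varepsilon$ is an odd-order moment of a mean-zero Gaussian process, so its expectation vanishes. The discretization of the white noise by Brownian increments and all numerical constants below are illustrative assumptions for the sketch, not part of the method:

```python
import numpy as np

# Monte Carlo check that E[B eps(w1, w2)] = E[eps^(w1) conj(eps^(w2)) eps^(w2-w1)] = 0.
rng = np.random.default_rng(2)
sigma, n, trials = 1.0, 128, 50_000
x = (np.arange(n) + 0.5) / n - 0.5            # grid on [-1/2, 1/2]
dx = 1.0 / n
w1, w2 = 2.0, 5.0

# White noise on [-1/2, 1/2] discretized by Brownian increments dB ~ N(0, sigma^2 dx).
dB = rng.normal(0.0, sigma * np.sqrt(dx), size=(trials, n))
eps_hat = lambda w: dB @ np.exp(-1j * w * x)  # \hat{eps}(w), one value per trial
B_eps = eps_hat(w1) * np.conj(eps_hat(w2)) * eps_hat(w2 - w1)
print(abs(B_eps.mean()))                      # close to 0
```

The sample mean shrinks at the usual $1/\sqrt{\text{trials}}$ rate, consistent with $\mathbb{E}[B\varepsilon_j]=0$.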
We see that if we take the expectation over $\varepsilon$, we have
$$\mathbb{E}_\varepsilon[By_j(\omega_1,\omega_2)] = Bf_j(\omega_1,\omega_2) + \mathbb{E}_\varepsilon\big[\hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1) + \hat f_j^*(\omega_2)\hat\varepsilon_j(\omega_1)\hat\varepsilon_j(\omega_2-\omega_1) + \hat\varepsilon_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat f_j(\omega_2-\omega_1)\big].$$
Now we know from Theorem 4.5 of [29] that
$$\mathbb{E}_\varepsilon[\hat\varepsilon(\omega_1)\hat\varepsilon^*(\omega_2)] = 2\sigma^2\,\frac{\sin\big(\tfrac12(\omega_2-\omega_1)\big)}{\omega_2-\omega_1}.$$
Define $h(\omega) = 2\sigma^2\frac{\sin(\frac12\omega)}{\omega}$. Note that $h$ is an even function since it is the quotient of two odd functions. Taking the expectation yields
$$\mathbb{E}_\varepsilon[By_j(\omega_1,\omega_2)] = Bf_j(\omega_1,\omega_2) + \hat f_j(\omega_1)h(\omega_1) + \hat f_j^*(\omega_2)h(\omega_2) + \hat f_j(\omega_2-\omega_1)h(\omega_2-\omega_1).$$
Now take the expectation over the translation and dilation parameters to get
$$\mathbb{E}[By_j(\omega_1,\omega_2)] = \mathbb{E}[Bf_j(\omega_1,\omega_2)] + \mathbb{E}[\hat f_j(\omega_1)]h(\omega_1) + \mathbb{E}[\hat f_j^*(\omega_2)]h(\omega_2) + \mathbb{E}[\hat f_j(\omega_2-\omega_1)]h(\omega_2-\omega_1).$$
Denote $\mu(\omega) = \mathbb{E}[\hat f_j(\omega)]$ and let $\tilde\mu(\omega) = \frac1M\sum_{j=1}^M \hat y_j(\omega)$. We will approximate the expectation using
$$\frac1M\sum_{j=1}^M \mathbb{E}[Bf_j] \approx \frac1M\sum_{j=1}^M By_j - \tilde\mu(\omega_1)h(\omega_1) - \tilde\mu^*(\omega_2)h(\omega_2) - \tilde\mu(\omega_2-\omega_1)h(\omega_2-\omega_1) = \frac1M\sum_{j=1}^M By_j - R_\sigma.$$
After empirical centering by $R_\sigma$, we can thus decompose the computable quantity into two pieces:
$$\frac1M\sum_{j=1}^M By_j - R_\sigma = \tilde g_\eta + \tilde g_\sigma,$$
where
$$\tilde g_\sigma = \frac1M\sum_{j=1}^M R_j(\omega_1,\omega_2) - R_\sigma(\omega_1,\omega_2) = \frac1M\sum_{j=1}^M R_j(r,\theta) - R_\sigma(r,\theta).$$
The term $\tilde g_\eta + \tilde g_\sigma$ is not smooth due to the additive noise.
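The covariance formula from Theorem 4.5 of [29] used above, and the evenness of $h$, can both be verified by a quick Monte Carlo; the grid size, frequencies, and seed below are illustrative assumptions:

```python
import numpy as np

# Check E[eps^(w1) conj(eps^(w2))] = 2 sigma^2 sin((w2-w1)/2)/(w2-w1) = h(w2-w1),
# and that h is even.
rng = np.random.default_rng(1)
sigma, n, trials = 1.0, 100, 30_000
x = (np.arange(n) + 0.5) / n - 0.5            # grid on [-1/2, 1/2]
dx = 1.0 / n
w1, w2 = 3.0, 7.5

def h(w):
    return 2 * sigma**2 * np.sin(w / 2) / w

dB = rng.normal(0.0, sigma * np.sqrt(dx), size=(trials, n))
emp = np.mean((dB @ np.exp(-1j * w1 * x)) * np.conj(dB @ np.exp(-1j * w2 * x)))
print(abs(emp - h(w2 - w1)))                  # small
```

The empirical covariance matches $h(\omega_2-\omega_1)$ up to Monte Carlo and discretization error.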
Thus we need to add some procedure to smooth the signal. Let $\phi_L(r) = (2\pi L^2)^{-1/2}e^{-r^2/(2L^2)}$ be a low pass filter. We define a new estimator for Model 2 as
$$(\widetilde{Bf})(r,\theta) := (I-L_{C_0})^{-1}C_1L_{C_2}\Big(4\big((\tilde g_\eta+\tilde g_\sigma)*\phi_L\big)(r,\theta) + r\,\partial_r\big((\tilde g_\eta+\tilde g_\sigma)*\phi_L\big)(r,\theta)\Big). \tag{2.12}$$
We will use the following two lemmas, whose proofs are similar to [26].

**Lemma 2.** Let $q\in L^2(\mathbb{R}^2)$ and assume $|\hat q(\omega)|$ decays like $|\omega|^{-\alpha}$ for some integer $\alpha\ge2$ and all $\omega$ such that $|\omega|\ge\omega_0$ for some $\omega_0>0$. Then for $L$ small enough,
$$\|q - q*\phi_L\|_2^2 \lesssim \|q\|_2^2\,L^4 + L^{4\wedge(2\alpha-2)}.$$

*Proof.* We have $\hat\phi_L(|\omega|) = e^{-L^2|\omega|^2/2}$ and
$$1-\hat\phi_L(|\omega|) = \frac{L^2|\omega|^2}{2} + O(L^3).$$
Letting $|\omega| = r$, we write the integral in polar coordinates and get
$$\begin{aligned}\|q - q*\phi_L\|_2^2 &= \frac1{(2\pi)^2}\|\hat q(1-\hat\phi_L)\|_2^2 = \frac1{(2\pi)^2}\int_0^{2\pi}\!\!\int_0^\infty |\hat q(r,\theta)|^2|1-\hat\phi_L(r)|^2\, r\,dr\,d\theta\\ &= \frac1{(2\pi)^2}\int_0^{2\pi}\!\!\int_0^{\omega_0}|\hat q(r,\theta)|^2|1-\hat\phi_L(r)|^2\, r\,dr\,d\theta + \frac1{(2\pi)^2}\int_0^{2\pi}\!\!\int_{\omega_0}^\infty |\hat q(r,\theta)|^2|1-\hat\phi_L(r)|^2\, r\,dr\,d\theta\\ &:= I_1 + I_2.\end{aligned}$$
For $I_1$, we have
$$I_1 = \frac1{(2\pi)^2}\int_0^{2\pi}\!\!\int_0^{\omega_0}|\hat q(r,\theta)|^2\left(\frac{L^2 r^2}{2}+O(L^3)\right)^2 r\,dr\,d\theta \le \left(\frac{L^4\omega_0^4}{4}+O(L^5)\right)\frac1{(2\pi)^2}\int_0^{2\pi}\!\!\int_0^{\omega_0}|\hat q(r,\theta)|^2\, r\,dr\,d\theta \lesssim \omega_0^4\|q\|_2^2\,L^4 + O(L^5).$$
For $I_2$,
$$I_2 \le \frac{C^2}{(2\pi)^2}\int_0^{2\pi}\!\!\int_{\omega_0}^\infty r^{-2\alpha+1}\big(1-e^{-L^2r^2/2}\big)^2\,dr\,d\theta = \frac{C^2}{2\pi}\int_{\omega_0}^\infty r^{-2\alpha+1}\big(1-e^{-L^2r^2/2}\big)^2\,dr.
$$$$
Using the same argument as [26], it follows that $I_2 \lesssim L^{4\wedge(2\alpha-2)}$. $\square$

**Lemma 3.** Let $rq(r,\theta)\in L^2(\mathbb{R}^2, dr\times d\theta)$ and assume its Fourier transform $\widehat{(\cdot)q(\cdot)}(\omega)$ decays like $|\omega|^{-\alpha}$ for some integer $\alpha\ge2$. Then for $L$ small enough,
$$\|r(q - q*\phi_L)(r,\theta)\|_2^2 \lesssim \|rq(r,\theta)\|_2^2\,L^4 + L^{4\wedge(2\alpha-2)} + L^3\|q\|_2^2.$$

*Proof.* First, we switch back to rectangular coordinates:
$$\begin{aligned}\|r(q - q*\phi_L)(r,\theta)\|_2^2 &= \|(\omega_1^2+\omega_2^2)^{1/2}(q-q*\phi_L)(\omega_1,\omega_2)\|_2^2\\ &= \int_{\mathbb{R}^2}\omega_1^2\,|(q-q*\phi_L)(\omega_1,\omega_2)|^2\,d\omega_1\,d\omega_2 + \int_{\mathbb{R}^2}\omega_2^2\,|(q-q*\phi_L)(\omega_1,\omega_2)|^2\,d\omega_1\,d\omega_2\\ &= \frac1{(2\pi)^2}\int_{\mathbb{R}^2}\big|\partial_{t_1}\hat q(t_1,t_2) - \partial_{t_1}\big(\hat q(t_1,t_2)\hat\phi_L(t_1,t_2)\big)\big|^2\,dt_1\,dt_2 + \frac1{(2\pi)^2}\int_{\mathbb{R}^2}\big|\partial_{t_2}\hat q(t_1,t_2) - \partial_{t_2}\big(\hat q(t_1,t_2)\hat\phi_L(t_1,t_2)\big)\big|^2\,dt_1\,dt_2\\ &:= I_1 + I_2.\end{aligned}$$
For the first term, take derivatives to get
$$\begin{aligned}I_1 &= \frac1{(2\pi)^2}\int_{\mathbb{R}^2}\big|\partial_{t_1}\hat q(t_1,t_2) - \partial_{t_1}\hat q(t_1,t_2)\hat\phi_L(t_1,t_2) - \hat q(t_1,t_2)\partial_{t_1}\hat\phi_L(t_1,t_2)\big|^2\,dt_1\,dt_2\\ &\lesssim \frac1{(2\pi)^2}\int_{\mathbb{R}^2}\big|\partial_{t_1}\hat q(t_1,t_2) - \partial_{t_1}\hat q(t_1,t_2)\hat\phi_L(t_1,t_2)\big|^2\,dt_1\,dt_2 + \frac1{(2\pi)^2}\int_{\mathbb{R}^2}\big|\hat q(t_1,t_2)\partial_{t_1}\hat\phi_L(t_1,t_2)\big|^2\,dt_1\,dt_2\\ &\lesssim \|\omega_1 q(\omega_1,\omega_2) - (\omega_1 q)*\phi_L(\omega_1,\omega_2)\|_2^2 + \|\hat q(t_1,t_2)\partial_{t_1}\hat\phi_L(t_1,t_2)\|_2^2.\end{aligned}$$
We use Lemma 2 to get
$$\|\omega_1 q(\omega_1,\omega_2) - (\omega_1 q)*\phi_L(\omega_1,\omega_2)\|_2^2 \lesssim \|\omega_1 q(\omega_1,\omega_2)\|_2^2\,L^4 + L^{4\wedge(2\alpha-2)}.$$
By how we defined $\phi_L$, we also have $\hat\phi_L(t_1,t_2) = e^{-L^2(t_1^2+t_2^2)/2}$. Take the derivative with respect to $t_1$:
$$\partial_{t_1}\hat\phi_L(t_1,t_2) = -L^2 t_1\, e^{-L^2(t_1^2+t_2^2)/2}$$
and get
$$\|\hat q(t_1,t_2)\partial_{t_1}\hat\phi_L(t_1,t_2)\|_2^2 \le L^3\|q\|_2^2.$$
The result is
$$I_1 \lesssim \|\omega_1 q(\omega_1,\omega_2)\|_2^2\,L^4 + L^{4\wedge(2\alpha-2)} + L^3\|q\|_2^2.$$
An identical argument is used to get
$$I_2 \lesssim \|\omega_2 q(\omega_1,\omega_2)\|_2^2\,L^4 + L^{4\wedge(2\alpha-2)} + L^3\|q\|_2^2.$$
Now combine $I_1$ and $I_2$. The previous work implies
$$\|\omega_1 q(\omega_1,\omega_2)\|_2^2 + \|\omega_2 q(\omega_1,\omega_2)\|_2^2 = \|rq(r,\theta)\|_2^2$$
and
$$\|r(q - q*\phi_L)(r,\theta)\|_2^2 \lesssim \|rq(r,\theta)\|_2^2\,L^4 + L^{4\wedge(2\alpha-2)} + L^3\|q\|_2^2. \qquad\square$$
Along with these two lemmas, we will also need the following lemma.

**Lemma 4.** Suppose $\varepsilon$ is a mean zero Gaussian white noise supported on $[-1/2,1/2]$ with variance $\sigma^2$. For all $p>0$ and $\omega\in\mathbb{R}$,
$$\mathbb{E}\big[|\hat\varepsilon(\omega)|^p\big] \lesssim_p \sigma^p.$$

*Proof.* We rewrite $\hat\varepsilon$ as an integral with respect to a Brownian motion:
$$\hat\varepsilon(\omega) = \int_{-1/2}^{1/2} e^{-i\omega x}\,dB_x.$$
Let
$$g_1(\omega) = \mathrm{Re}(\hat\varepsilon(\omega)) = \int_{-1/2}^{1/2}\cos(\omega x)\,dB_x,\qquad g_2(\omega) = \mathrm{Im}(\hat\varepsilon(\omega)) = \int_{-1/2}^{1/2}\sin(\omega x)\,dB_x.$$
For fixed $\omega$, the random vector $\begin{pmatrix}g_1(\omega)\\ g_2(\omega)\end{pmatrix}\sim N(0,\Sigma(\omega))$, where the covariance matrix is given by
$$\Sigma(\omega) = \sigma^2\begin{pmatrix}1+\frac{\sin\omega\cos\omega}{\omega} & 0\\ 0 & 1-\frac{\sin\omega\cos\omega}{\omega}\end{pmatrix}.$$
We will now prove a bound for $g_1(\omega)$.
An identical bound applies for $g_2(\omega)$. Since $g_1(\omega)$ is a centered normal random variable,
$$\mathbb{E}\big[|g_1(\omega)|^p\big] = \mathrm{Var}(g_1(\omega))^{p/2}\,\frac{2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt\pi} \le \sigma^p\left(1+\frac{\sin\omega\cos\omega}{\omega}\right)^{p/2}\frac{2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt\pi} \lesssim_p \sigma^p.$$
Now we have
$$\mathbb{E}\big[|\hat\varepsilon(\omega)|^p\big] = \mathbb{E}\Big[\big(|\hat\varepsilon(\omega)|^2\big)^{p/2}\Big] = \mathbb{E}\Big[\big(g_1^2(\omega)+g_2^2(\omega)\big)^{p/2}\Big] \lesssim_p \mathbb{E}\big[|g_1(\omega)|^{p}\big] + \mathbb{E}\big[|g_2(\omega)|^{p}\big] \lesssim_p \sigma^p. \qquad\square$$

**Lemma 5.** Assume that the assumptions of Model 2 hold, $Bf\in C^3(\mathbb{R}^2)$, $\widehat{(\cdot)Bf(\cdot)}$ decays like $|\cdot|^{-\kappa}$ for some $\kappa\ge2$,
$$r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|\in L^2(\mathbb{R}^2, dr\times d\theta),$$
and
$$r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}(Bf)(\gamma,\theta)\big|\in L^2(\mathbb{R}^2, dr\times d\theta).$$
We have the bound
$$\mathbb{E}\,\|\tilde g_\sigma\|^2_{L^2(\Omega)} \lesssim_{\Omega,\tau} \frac{\sigma^2}{M}\|f\|_2^4 + \frac{\sigma^4}{M}\|f\|_2^2 + \frac{\sigma^6}{M}.$$

*Proof.* We use the triangle inequality to get
$$\begin{aligned}\|\tilde g_\sigma\|^2_{L^2(\Omega)} &\lesssim \left\|\frac1M\sum_{j=1}^M \hat f_j(\omega_1)\hat f_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1)\right\|^2_{L^2(\Omega)} + \left\|\frac1M\sum_{j=1}^M \hat f_j^*(\omega_2)\hat f_j(\omega_2-\omega_1)\hat\varepsilon_j(\omega_1)\right\|^2_{L^2(\Omega)}\\ &\quad + \left\|\frac1M\sum_{j=1}^M \hat f_j(\omega_1)\hat f_j(\omega_2-\omega_1)\hat\varepsilon_j^*(\omega_2)\right\|^2_{L^2(\Omega)} + \left\|\frac1M\sum_{j=1}^M B\varepsilon_j(\omega_1,\omega_2)\right\|^2_{L^2(\Omega)}\\ &\quad + \left\|\frac1M\sum_{j=1}^M \hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1) - h(\omega_1)\tilde\mu(\omega_1)\right\|^2_{L^2(\Omega)}\\ &\quad + \left\|\frac1M\sum_{j=1}^M \hat f_j^*(\omega_2)\hat\varepsilon_j(\omega_1)\hat\varepsilon_j(\omega_2-\omega_1) - h(\omega_2)\tilde\mu^*(\omega_2)\right\|^2_{L^2(\Omega)}\\ &\quad + \left\|\frac1M\sum_{j=1}^M \hat f_j(\omega_2-\omega_1)\hat\varepsilon_j(\omega_1)\hat\varepsilon_j^*(\omega_2) - h(\omega_2-\omega_1)\tilde\mu(\omega_2-\omega_1)\right\|^2_{L^2(\Omega)}.\end{aligned}$$
๐‘€ ๐‘—=1 ๐ฟ 2 (ฮฉ) The argument for bounding the expectation of the first three terms is similar, so we only provide 2 the proof for the first term ๐‘€1 ๐‘€ ร ห† ห†โˆ— ๐‘—=1 ๐‘“ ๐‘— (๐œ”1 ) ๐‘“ ๐‘— (๐œ”2 ) ๐œ€ห† ๐‘— (๐œ”2 โˆ’ ๐œ”1 ) 2 . We have ๐ฟ (ฮฉ) ๏ฃฎ 2 ๏ฃน ๏ฃฏ 1 โˆ‘๏ธ ๐‘€ ๏ฃบ ๐‘“ห†๐‘— (๐œ”1 ) ๐‘“ห†๐‘—โˆ— (๐œ”2 ) ๐œ€ห† ๐‘— (๐œ”2 โˆ’ ๐œ”1 ) ๏ฃฏ ๏ฃบ E๏ฃฏ ๏ฃบ ๏ฃฏ ๐‘€ ๏ฃบ ๏ฃฏ ๐‘—=1 2 (ฮฉ) ๏ฃบ ๏ฃฐ ๐ฟ ๏ฃป ๏ฃฎ 2๏ฃน โˆซ ๏ฃฏ 1 โˆ‘๏ธ ๐‘€ ๏ฃบ = ๏ฃฏ E๏ฃฏ 2 ห† ห† ๐‘“ ๐‘— (๐œ”1 ) ๐‘“ ๐‘— (๐œ”2 ) ๐œ€ห† ๐‘— (๐œ”2 โˆ’ ๐œ”1 ) ๏ฃบ ๐‘‘๐œ”1 ๐‘‘๐œ”2 ๏ฃบ ฮฉ ๏ฃฏ๐‘€ ๏ฃฏ ๐‘—=1 ๏ฃบ ๏ฃบ ๏ฃฐ ๏ฃป 2 โˆซ โˆ‘๏ธ ๐‘€ ๐œŽ = 2 | ๐‘“ห†๐‘— (๐œ”1 ) ๐‘“ห†๐‘— (๐œ”2 )| 2 ๐‘‘๐œ”1 ๐‘‘๐œ”2 ๐‘€ ฮฉ ๐‘—=1 ๐‘€ โˆซ ๐œŽ 2 โˆ‘๏ธ = 2 | ๐‘“ห†๐‘— (๐œ”1 ) ๐‘“ห†๐‘— (๐œ”2 )| 2 ๐‘‘๐œ”1 ๐‘‘๐œ”2 ๐‘€ ๐‘—=1 ฮฉ ๐‘€ โˆซ ๐œŽ 2 โˆ‘๏ธ โˆซ 2 = 2 ห† | ๐‘“ ๐‘— (๐œ”1 )| ๐‘‘๐œ”1 | ๐‘“ห†๐‘— (๐œ”2 )| 2 ๐‘‘๐œ”2 ๐‘€ ๐‘—=1 ฮฉ ฮฉ ๐œŽ2 ห† 4 = โˆฅ ๐‘“ โˆฅ ๐ฟ 2 (ฮฉ) ๐‘€ ๐œŽ2 โ‰ค โˆฅ ๐‘“ โˆฅ| 4๐ฟ 2 (R) , ๐‘€ 26 where the last line follows from Phlancherel Theorem. A similar argument proves that each of the ๐œŽ2 first three terms on the right are bounded on the order of ๐‘€โˆฅ ๐‘“ โˆฅ 42 . For the fourth term, since E[๐ต๐œ– ๐‘— ] = 0, we have ๏ฃฎ 2 ๏ฃน ๏ฃฎโˆซ 2 ๏ฃน ๐‘€ ๐‘€ ๏ฃฏ 1 โˆ‘๏ธ ๏ฃบ ๏ฃฏ 1 โˆ‘๏ธ ๏ฃบ ๐ต๐œ€ ๐‘— (๐œ”1 , ๐œ”2 ) ๏ฃบ =E๏ฃฏ ๐ต๐œ€ ๐‘— (๐œ”1 , ๐œ”2 ) ๐‘‘๐œ”1 ๐‘‘๐œ”2 ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ E๏ฃฏ ๏ฃฏ ๐‘€ ๏ฃบ ๏ฃฏ ฮฉ ๐‘€ ๏ฃบ ๐‘—=1 ๐‘—=1 ๏ฃฏ ๏ฃฐ ๐ฟ 2 (ฮฉ) ๏ฃบ๏ฃป ๏ฃฏ ๏ฃฐ ๏ฃบ ๏ฃป ๏ฃฎ 2๏ฃน โˆซ ๏ฃฏ 1 โˆ‘๏ธ ๐‘€ ๏ฃบ = ๐ต๐œ€ ๐‘— (๐œ”1 , ๐œ”2 ) ๏ฃบ ๐‘‘๐œ”1 ๐‘‘๐œ”2 ๏ฃฏ ๏ฃบ E๏ฃฏ ฮฉ ๏ฃฏ๏ฃฏ ๐‘€ ๐‘—=1 ๏ฃบ ๏ฃบ ๏ฃฐ ๏ฃป โˆซ ๏ฃฎ โˆ‘๏ธ ๏ฃฏ1 ๐‘€ ๏ฃน ๏ฃบ = Var ๏ฃฏ๏ฃฏ ๐ต๐œ€ ๐‘— (๐œ”1 , ๐œ”2 ) ๏ฃบ๏ฃบ ๐‘‘๐œ”1 ๐‘‘๐œ”2 ฮฉ ๏ฃฏ ๐‘€ ๐‘—=1 ๏ฃบ โˆซ ๏ฃฐ ๏ฃป 1 = Var(๐ต๐œ€ ๐‘— ) ๐‘‘๐œ”1 ๐‘‘๐œ”2 . ๐‘€ ฮฉ We now bound Var(๐ต๐œ€ ๐‘— ). 
First, since the expectation is zero, Hölder's inequality and Lemma 4 give
$$\mathrm{Var}(B\varepsilon_j) = \mathbb{E}\big[|B\varepsilon_j|^2\big] = \mathbb{E}\big[|\hat\varepsilon(\omega_1)|^2|\hat\varepsilon(\omega_2)|^2|\hat\varepsilon(\omega_2-\omega_1)|^2\big] \le \mathbb{E}\big[|\hat\varepsilon(\omega_1)|^6\big]^{1/3}\,\mathbb{E}\big[|\hat\varepsilon(\omega_2)|^6\big]^{1/3}\,\mathbb{E}\big[|\hat\varepsilon(\omega_2-\omega_1)|^6\big]^{1/3} \lesssim \sigma^6.$$
Thus, we obtain
$$\mathbb{E}\left[\left\|\frac1M\sum_{j=1}^M B\varepsilon_j(\omega_1,\omega_2)\right\|^2_{L^2(\Omega)}\right] \lesssim \frac{\sigma^6}{M}|\Omega|.$$
Now we bound the last three terms. We start with
$$A = \mathbb{E}\left[\left\|\frac1M\sum_{j=1}^M \hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1) - h(\omega_1)\tilde\mu(\omega_1)\right\|^2_{L^2(\Omega)}\right].$$
Consider the random variable $A_j = \hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1)$. Then
$$\begin{aligned}A &= \mathbb{E}\left[\left\|\frac1M\sum_{j=1}^M A_j - h(\omega_1)\tilde\mu(\omega_1)\right\|^2_{L^2(\Omega)}\right] = \mathbb{E}\left[\left\|\left(\frac1M\sum_{j=1}^M A_j - h(\omega_1)\mu(\omega_1)\right) + h(\omega_1)\mu(\omega_1) - h(\omega_1)\tilde\mu(\omega_1)\right\|^2_{L^2(\Omega)}\right]\\ &\lesssim \mathbb{E}\left[\left\|\frac1M\sum_{j=1}^M A_j - h(\omega_1)\mu(\omega_1)\right\|^2_{L^2(\Omega)}\right] + \mathbb{E}\left[\big\|h(\omega_1)\mu(\omega_1) - h(\omega_1)\tilde\mu(\omega_1)\big\|^2_{L^2(\Omega)}\right].\end{aligned}$$
For the first term at the end of the inequality, we see that $h(\omega_1)\mu(\omega_1)$ is the mean of $A_j$ for fixed $(\omega_1,\omega_2)$.
Thus,
$$\mathbb{E}\left[\left\|\frac1M\sum_{j=1}^M A_j - h(\omega_1)\mu(\omega_1)\right\|^2_{L^2(\Omega)}\right] = \int_\Omega \mathbb{E}\left[\left|\frac1M\sum_{j=1}^M A_j - h(\omega_1)\mu(\omega_1)\right|^2\right]d\omega_1\,d\omega_2 = \int_\Omega \frac1M\,\mathrm{Var}(A_j)\,d\omega_1\,d\omega_2.$$
We have
$$\mathrm{Var}(A_j) = \mathbb{E}\big[|\hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1)|^2\big] - h^2(\omega_1)\mu^2(\omega_1) \le \mathbb{E}\big[|\hat f_j(\omega_1)\hat\varepsilon_j^*(\omega_2)\hat\varepsilon_j(\omega_2-\omega_1)|^2\big] \lesssim \sigma^4\,\mathbb{E}_{t,\tau}\big[|\hat f_j(\omega_1)|^2\big].$$
Now substitute this back into the integral to get
$$\int_\Omega \frac1M\,\mathrm{Var}(A_j)\,d\omega_1\,d\omega_2 \lesssim \frac{\sigma^4}{M}\int_\Omega \mathbb{E}_{t,\tau}\big[|\hat f_j(\omega_1)|^2\big]\,d\omega_1\,d\omega_2 = \frac{\sigma^4}{M}\int_\Omega \mathbb{E}_\tau\big[|\hat f_j(\omega_1)|^2\big]\,d\omega_1\,d\omega_2$$
by translation invariance of the power spectrum. Now we have
$$\frac{\sigma^4}{M}\int_\Omega \mathbb{E}_\tau\big[|\hat f_j(\omega_1)|^2\big]\,d\omega_1\,d\omega_2 = \frac{\sigma^4}{M}\,\mathbb{E}_\tau\left[\int_\Omega |\hat f_j(\omega_1)|^2\,d\omega_1\,d\omega_2\right] = \frac{\sigma^4}{M}\,\mathbb{E}_\tau\left[\int \frac1{1-\tau_j}\,|\hat f(\tilde\omega_1)|^2\,d\tilde\omega_1\,d\omega_2\right] \lesssim_{\tau,\Omega} \frac{\sigma^4}{M}\|f\|_2^2.$$
For the second term,
$$\mathbb{E}\left[\big\|h(\omega_1)\mu(\omega_1) - h(\omega_1)\tilde\mu(\omega_1)\big\|^2_{L^2(\Omega)}\right] = \int_\Omega h^2(\omega_1)\,\mathbb{E}\big[|\mu(\omega_1)-\tilde\mu(\omega_1)|^2\big]\,d\omega_1\,d\omega_2 \lesssim \sigma^4\int_\Omega \mathbb{E}\big[|\mu(\omega_1)-\tilde\mu(\omega_1)|^2\big]\,d\omega_1\,d\omega_2 = \sigma^4\int_\Omega \mathbb{E}\left[\left|\frac1M\sum_{j=1}^M \hat y_j(\omega_1) - \mu(\omega_1)\right|^2\right]d\omega_1\,d\omega_2.$$
Define $z_j = \hat y_j(\omega) - \mu(\omega)$.
Using similar steps to before, we get
$$\mathbb{E}\left[\big\|h(\omega_1)\mu(\omega_1) - h(\omega_1)\tilde\mu(\omega_1)\big\|^2_{L^2(\Omega)}\right] \lesssim \sigma^4\int_\Omega \mathbb{E}\left[\left|\frac1M\sum_{j=1}^M z_j\right|^2\right]d\omega_1\,d\omega_2 = \frac{\sigma^4}{M}\int_\Omega \mathrm{Var}(z_j)\,d\omega_1\,d\omega_2 \lesssim_\Omega \frac{\sigma^4}{M}\big(\|f\|_2^2 + \sigma^2\big).$$
This means that we can put everything together to conclude
$$\mathbb{E}\,\|\tilde g_\sigma\|^2_{L^2(\Omega)} \lesssim_{\Omega,\tau} \frac{\sigma^2}{M}\|f\|_2^4 + \frac{\sigma^4}{M}\|f\|_2^2 + \frac{\sigma^6}{M}. \qquad\square$$

**Theorem 6.** Assume that the assumptions of Model 2 hold, $Bf\in C^3(\mathbb{R}^2)$, $\widehat{(\cdot)Bf(\cdot)}$ decays like $|\cdot|^{-\kappa}$ for some $\kappa\ge2$,
$$r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|\in L^2(\mathbb{R}^2, dr\times d\theta),$$
and
$$r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}(Bf)(\gamma,\theta)\big|\in L^2(\mathbb{R}^2, dr\times d\theta).$$
For the estimator $(\widetilde{Bf})(r,\theta)$ defined for Model 2,
$$\mathbb{E}\left[\|Bf-\widetilde{Bf}\|^2_{L^2(\Omega)}\right] \le C_{f,\Omega}\left(\frac{\eta^2}{M} + L^4 + \frac{\sigma^2\vee\sigma^6}{L^2 M}\right),$$
where $C_{f,\Omega}$ only depends on $f$ and $\Omega$.

*Proof.*
First, we see that by an argument as in Lemma 1,
$$\begin{aligned}\|Bf - \widetilde{Bf}\|^2_{L^2(\Omega)} &\lesssim \big\|4g_\eta + r\partial_r g_\eta - 4(\tilde g_\eta+\tilde g_\sigma)*\phi_L - r\partial_r\big((\tilde g_\eta+\tilde g_\sigma)*\phi_L\big)\big\|^2_{L^2(\Omega)}\\ &\lesssim \|g_\eta - \tilde g_\eta\|^2_{L^2(\Omega)} + \|r\partial_r g_\eta - r\partial_r\tilde g_\eta\|^2_{L^2(\Omega)} + \big\|4\tilde g_\eta + r\partial_r\tilde g_\eta - 4(\tilde g_\eta+\tilde g_\sigma)*\phi_L - r\partial_r\big((\tilde g_\eta+\tilde g_\sigma)*\phi_L\big)\big\|^2_{L^2(\Omega)}\\ &\lesssim \|g_\eta - \tilde g_\eta\|^2_{L^2(\Omega)} + \|r\partial_r g_\eta - r\partial_r\tilde g_\eta\|^2_{L^2(\Omega)} + \|\tilde g_\eta - \tilde g_\eta*\phi_L\|^2_{L^2(\Omega)} + \|r\partial_r\tilde g_\eta - r\partial_r(\tilde g_\eta*\phi_L)\|^2_{L^2(\Omega)} + \|\tilde g_\sigma*\phi_L\|^2_{L^2(\Omega)} + \|r\partial_r(\tilde g_\sigma*\phi_L)\|^2_{L^2(\Omega)}.\end{aligned}$$
By Theorem 5,
$$\mathbb{E}\left[\|g_\eta - \tilde g_\eta\|^2_{L^2(\Omega)} + \|r\partial_r g_\eta - r\partial_r\tilde g_\eta\|^2_{L^2(\Omega)}\right] \lesssim \frac{\eta^2}{M}\left(\|(Bf)(r,\theta)\|_2^2 + 2\|r\,\partial_r(Bf)(r,\theta)\|_2^2 + \|r^2\,\partial_{rr}(Bf)(r,\theta)\|_2^2\right) + \frac{\eta^4}{M}\left(\left\|r^2\max_{\alpha\in[r/2,2r]}\big|\partial_{\alpha\alpha}(Bf)(\alpha,\theta)\big|\right\|_2^2 + \left\|r^3\max_{\gamma\in[r/2,2r]}\big|\partial_{\gamma\gamma\gamma}(Bf)(\gamma,\theta)\big|\right\|_2^2\right).$$
It suffices to bound the other four terms appropriately now. By Lemma 2,
$$\|\tilde g_\eta - \tilde g_\eta*\phi_L\|^2_{L^2(\Omega)} \lesssim \|\tilde g_\eta\|_2^2\,L^4 + L^{4\wedge(2\kappa-2)}.$$
For $\tilde g_\eta$, notice that
$$\|Bf_j\|_2^2 = \int_{\mathbb{R}^2}(1-\tau_j)^6\big|(Bf)\big((1-\tau_j)\omega_1,(1-\tau_j)\omega_2\big)\big|^2\,d\omega_1\,d\omega_2 = (1-\tau_j)^4\int_{\mathbb{R}^2}\big|(Bf)(\omega_1,\omega_2)\big|^2\,d\omega_1\,d\omega_2 = (1-\tau_j)^4\|Bf\|_2^2 \le 2^4\|Bf\|_2^2.$$
The triangle inequality implies
$$\|\tilde g_\eta\|_2 = \left\|\frac1M\sum_{j=1}^M Bf_j\right\|_2 \le \frac1M\sum_{j=1}^M \|Bf_j\|_2 \lesssim \|Bf\|_2$$
and
$$\|\tilde g_\eta - \tilde g_\eta*\phi_L\|^2_{L^2(\Omega)} \lesssim \|Bf\|_2^2\,L^4 + L^{4\wedge(2\kappa-2)}.$$
Also, by Lemma 3,
$$\|r\partial_r(\tilde g_\eta - \tilde g_\eta*\phi_L)\|^2_{L^2(\Omega)} \lesssim \|r\partial_r\tilde g_\eta\|_2^2\,L^4 + L^{4\wedge(2\kappa-2)} + L^3\|\partial_r\tilde g_\eta\|_2^2.$$
Now consider the error terms. First start with $\|\tilde g_\sigma*\phi_L\|^2_{L^2(\Omega)}$. We have
$$\|\tilde g_\sigma*\phi_L\|^2_{L^2(\Omega)} \le \|\phi_L\|^2_{L^1}\,\|\tilde g_\sigma\|^2_{L^2(\Omega)} \lesssim \|\tilde g_\sigma\|^2_{L^2(\Omega)}.$$
Now we consider $\|r\partial_r(\tilde g_\sigma*\phi_L)\|_{L^2(\Omega)}$. Let $R_\Omega = \max_\Omega|r|$. We have
$$\|r\partial_r(\tilde g_\sigma*\phi_L)\|^2_{L^2(\Omega)} \le R_\Omega^2\,\|\tilde g_\sigma*\partial_r\phi_L\|^2_{L^2(\Omega)} \le R_\Omega^2\,\|\partial_r\phi_L\|_1^2\,\|\tilde g_\sigma\|^2_{L^2(\Omega)}.$$
Since $\phi_L$ is a radial filter, the derivative of the filter is
$$\partial_r\phi_L(r) = (2\pi L^2)^{-1/2}\,\partial_r e^{-r^2/(2L^2)} = -(2\pi L^2)^{-1/2}\,\frac{r}{L^2}\,e^{-r^2/(2L^2)}.$$
It follows that, up to constants,
$$\|\partial_r\phi_L\|_1 = \frac1{L^3}\int_0^\infty r\,e^{-r^2/(2L^2)}\,dr = L^{-1}$$
and $\|\partial_r\phi_L\|_1^2 = L^{-2}$. We can also use our previous bound to get
$$\|r\partial_r(\tilde g_\sigma*\phi_L)\|^2_{L^2(\Omega)} \lesssim_{\tau,\Omega,f} \frac{\sigma^2\vee\sigma^6}{M}\,L^{-2}.$$
This finishes the proof of the theorem, since each term now depends on $L$, $M$, and $\eta^2$. $\square$

Note that when $\sigma\ge1$ we have $\sigma^2\vee\sigma^6 = \sigma^6$, and we can balance the two smoothing terms by setting $L^4 = \frac{\sigma^6}{L^2M}$, i.e. $L = \sigma M^{-1/6}$; this gives a convergence rate of $O\big(\frac{\eta^2}{M} + \frac{\sigma^4}{M^{2/3}}\big)$ on the squared error.
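The balancing of the $L^4$ smoothing-bias term against the $\sigma^6/(L^2M)$ noise term can be double-checked with a one-dimensional minimization; the values of $\sigma$ and $M$ below are arbitrary illustrative choices:

```python
import numpy as np

# Minimize b(L) = L^4 + sigma^6 / (L^2 M) over L > 0 on a fine grid.
# Calculus gives the exact minimizer L* = (sigma^6 / (2M))^(1/6), which is
# sigma * M^(-1/6) up to a constant, matching the bandwidth choice above.
sigma, M = 2.0, 10_000
L = np.linspace(0.05, 2.0, 20_000)
b = L**4 + sigma**6 / (L**2 * M)
L_star = L[np.argmin(b)]
L_pred = (sigma**6 / (2.0 * M))**(1.0 / 6.0)
print(L_star, L_pred)
```

At the minimizer both terms are of order $\sigma^4 M^{-2/3}$, which is the rate quoted above.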
2.7 Numerical Implementation of Bispectrum Recovery

Now that we have theoretical results for recovering the bispectrum, we will put them to use numerically. After demonstrating successful bispectrum recovery, we will use the recovered statistics for signal inversion with the phase synchronization algorithm from [7], which requires both a recovered power spectrum and a recovered bispectrum; this motivates our approach of performing power spectrum recovery first.

We can now discuss computing (2.12) numerically. We cannot compute $(I-L_{C_0})^{-1}$ directly, but we can still solve for the bispectrum by means of optimization. Similar to how we constructed the estimator in the finite sample case, we will consider the infinite sample case first and derive an optimization procedure. Based on this procedure, we will design another optimization procedure for the finite sample case that is a good estimator when $M$ is large. In the infinite sample limit, we have access to the term
$$d(r,\theta) = 4g_\eta(r,\theta) + r\,\partial_r g_\eta(r,\theta) \tag{2.13}$$
and Proposition 4 implies that we can recover the bispectrum from $d$ by solving the convex optimization problem
$$g = \operatorname{argmin}_{\tilde g}\,\|(I-L_{C_0})\tilde g - C_1L_{C_2}d\|_2^2, \tag{2.14}$$
where the constants $C_0, C_1, C_2$ depend on $\eta$ as given in (2.5). Note that for fixed $\eta$, this problem is convex. Since the variance of the dilations is not necessarily known, $\eta$ is possibly an unknown parameter and we must actually minimize
$$\mathfrak{J}(\tilde g,\tilde\eta) = \left\|\left(I - L_{C_0(\tilde\eta)}\right)\tilde g - C_1(\tilde\eta)L_{C_2(\tilde\eta)}d\right\|_2^2.$$
In [26], the authors find $\eta$ via a nonconvex optimization problem while recovering the power spectrum. We could mimic a similar process for bispectrum recovery, but it is likely that this process would fail: the optimization problem for bispectrum recovery has a much larger number of variables, so memory-limited methods are necessary.
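For a fixed $\eta$, (2.14) becomes a finite-dimensional linear least-squares problem once the operators are discretized, and plain gradient descent converges. The sketch below is illustrative only: the one-dimensional grid, the linear-interpolation discretization of the scaling operator, the value of $C_0$, and the synthetic right-hand side are all assumptions made for the example, not the implementation used in the thesis.

```python
import numpy as np

# Solve g = argmin || (I - K) g - b ||^2 by gradient descent, with K a
# (hypothetical) discretization of the scaling operator L_{C0}.
n, C0 = 200, 0.7
w = np.linspace(0.0, 10.0, n)
dw = w[1] - w[0]

# (K g)(w_i) = C0^3 * g(C0 * w_i), with g(C0 * w_i) by linear interpolation.
K = np.zeros((n, n))
for i in range(n):
    t = C0 * w[i] / dw
    k = int(np.floor(t))
    frac = t - k
    K[i, k] += C0**3 * (1 - frac)
    if k + 1 < n:
        K[i, k + 1] += C0**3 * frac
A = np.eye(n) - K                 # discretized I - L_{C0}; ||K|| < 1, so A is invertible

g_true = np.exp(-(w - 4.0)**2)    # synthetic target
b = A @ g_true                    # synthetic data term playing the role of C1 L_{C2} d

g = np.zeros(n)
step = 1.0 / (2.0 * np.linalg.norm(A, 2)**2)   # conservative step below 1/Lipschitz
for _ in range(5000):
    g -= step * 2.0 * A.T @ (A @ g - b)        # gradient of ||A g - b||^2
print(np.max(np.abs(g - g_true)))              # small
```

Because $\|K\|<1$, the quadratic is strongly convex and the iteration contracts geometrically to the unique minimizer.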
In addition, we also need a recovered power spectrum for our bispectrum inversion algorithm anyway, so we would still need to perform power spectrum recovery. We choose to learn $\eta$ and recover the power spectrum using a modification of the algorithm from [26], described below. For recovering the power spectrum, our data term is given by
$$p_{\mathrm{data}}(\omega) = 3p_\eta(\omega) + \omega\,p_\eta'(\omega), \tag{2.15}$$
and we can consider the minimizer
$$p = \operatorname{argmin}_{\tilde p}\,\frac{\|(I-S_{C_0})\tilde p - C_1S_{C_2}p_{\mathrm{data}}\|_2^2}{\eta^2}. \tag{2.16}$$
Since $\eta$ may be unknown, we use the loss function
$$\mathcal{L}(\tilde p,\tilde\eta) = \frac{\left\|\left(I - S_{C_0(\tilde\eta)}\right)\tilde p - C_1(\tilde\eta)S_{C_2(\tilde\eta)}p_{\mathrm{data}}\right\|_2^2}{\tilde\eta^2}. \tag{2.17}$$

**Theorem 7.** Define the operator $A = I - S_{C_0}$. For the loss function in (2.17), we have
$$\nabla_{\tilde p}\mathcal{L}(\tilde p,\tilde\eta) = \frac{2A^*(A\tilde p - C_1S_{C_2}p_{\mathrm{data}})}{\tilde\eta^2},$$
$$\nabla_{\tilde\eta}\mathcal{L}(\tilde p,\tilde\eta) = \frac{1}{\tilde\eta^2}\int 2\big(A\tilde p(\omega) - C_1S_{C_2}p_{\mathrm{data}}(\omega)\big)\,\frac{\partial}{\partial\tilde\eta}\big(A\tilde p(\omega) - C_1S_{C_2}p_{\mathrm{data}}(\omega)\big)\,d\omega - \frac{2}{\tilde\eta^3}\|A\tilde p - C_1S_{C_2}p_{\mathrm{data}}\|_2^2.$$

*Proof.* The following computation is almost identical to [26], but we provide the details for completeness. We start with $\nabla_{\tilde p}\mathcal{L}(\tilde p,\tilde\eta)$ and fix $\tilde\eta$; we will suppress the $\tilde\eta$ dependence for this part of the computation. The Fréchet derivative is computed from
$$\mathcal{L}(\tilde p) = \frac{\|A\tilde p - C_1S_{C_2}p_{\mathrm{data}}\|_2^2}{\eta^2} = c(A\tilde p), \qquad\text{where}\qquad c(f) = \frac{1}{\eta^2}\,\|f - C_1S_{C_2}p_{\mathrm{data}}\|_2^2.$$
Let $h$ be a test function. Then we have
$$(D\mathcal{L})(\tilde p)h = (Dc)(A\tilde p)\circ D(A\tilde p)h = (Dc)(A\tilde p)\circ Ah$$
since $A$ is a linear operator.
It follows that
$$\frac{\big| c(f + h) - c(f) - \frac{2}{\tilde\eta^2} \langle f - C_1 S_{C_2} p_{\mathrm{data}}, h \rangle \big|}{\| h \|_2} = \frac{1}{\tilde\eta^2} \frac{\| h \|_2^2}{\| h \|_2} \to 0$$
as $\| h \|_2 \to 0$, and $(Dc)(f) h = \frac{2}{\tilde\eta^2} \langle f - C_1 S_{C_2} p_{\mathrm{data}}, h \rangle$. Thus
$$(D\mathcal{L})(\tilde p) h = \frac{2}{\tilde\eta^2} \langle A \tilde p - C_1 S_{C_2} p_{\mathrm{data}}, A h \rangle = \Big\langle \frac{2}{\tilde\eta^2} A^* (A \tilde p - C_1 S_{C_2} p_{\mathrm{data}}), h \Big\rangle.$$
It now follows that
$$\nabla \mathcal{L}(\tilde p) = \frac{2}{\tilde\eta^2} A^* (A \tilde p - C_1 S_{C_2} p_{\mathrm{data}}). \qquad (2.18)$$
Additionally, using the work above and the quotient rule, we have
$$\nabla_{\tilde\eta} \mathcal{L}(\tilde p, \tilde\eta) = \frac{\tilde\eta^2 \nabla_{\tilde\eta} \big( \| A \tilde p - C_1 S_{C_2} p_{\mathrm{data}} \|_2^2 \big) - 2 \tilde\eta \| A \tilde p - C_1 S_{C_2} p_{\mathrm{data}} \|_2^2}{\tilde\eta^4}$$
$$= \frac{1}{\tilde\eta^2} \int 2 \big(A \tilde p(\omega) - C_1 S_{C_2} p_{\mathrm{data}}(\omega)\big) \frac{\partial}{\partial \tilde\eta} \big(A \tilde p(\omega) - C_1 S_{C_2} p_{\mathrm{data}}(\omega)\big) \, d\omega - \frac{2}{\tilde\eta^3} \big\| A \tilde p - C_1 S_{C_2} p_{\mathrm{data}} \big\|_2^2. \qquad \square$$

For this implementation, we need $A^*$, which we calculate analytically. We have
$$\langle A g, h \rangle = \int A g(\omega) h(\omega) \, d\omega = \int \big( g(\omega) - C_0^3 g(C_0 \omega) \big) h(\omega) \, d\omega = \int g(\tilde\omega) \Big( h(\tilde\omega) - C_0^2 \, h\big(\tfrac{\tilde\omega}{C_0}\big) \Big) \, d\tilde\omega = \langle g, A^* h \rangle.$$
Thus,
$$A^* h(\omega) = h(\omega) - C_0^2 \, h\Big(\frac{\omega}{C_0}\Big). \qquad (2.19)$$
In implementations, one only has access to an estimate of $p_\eta$, so numerical implementations use the following estimate for the data term:
$$\tilde p_{\mathrm{data}}(\omega) := 3 (\tilde p_\eta + \tilde p_\sigma) * \phi_L(\omega) + \omega \, (\tilde p_\eta + \tilde p_\sigma) * \phi'_L(\omega),$$
$$\tilde{\mathcal{L}}(\tilde p, \tilde\eta) := \big\| \big(I - S_{C_0(\tilde\eta)}\big) \tilde p - C_1(\tilde\eta) S_{C_2(\tilde\eta)} \tilde p_{\mathrm{data}} \big\|_2^2,$$
and the recovered power spectrum is computed by minimizing $\tilde{\mathcal{L}}$.
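The adjoint computation in (2.19) can be checked numerically: for smooth, rapidly decaying test functions, the discretized inner products $\langle A g, h \rangle$ and $\langle g, A^* h \rangle$ should agree. The Gaussian test functions, the grid, and $C_0 = 0.8$ below are illustrative choices.

```python
import numpy as np

# Check <A g, h> = <g, A* h> for
#   A g(w)  = g(w) - C0^3 g(C0 w)   and   A* h(w) = h(w) - C0^2 h(w / C0).
C0 = 0.8
w = np.linspace(-20.0, 20.0, 20001)
dw = w[1] - w[0]

def g(w): return np.exp(-w**2)                 # arbitrary smooth test function
def h(w): return np.exp(-0.5 * (w - 1.0)**2)   # arbitrary smooth test function

Ag  = g(w) - C0**3 * g(C0 * w)                 # A applied to g
Ash = h(w) - C0**2 * h(w / C0)                 # A* applied to h

lhs = np.sum(Ag * h(w)) * dw                   # <A g, h> (Riemann sum)
rhs = np.sum(g(w) * Ash) * dw                  # <g, A* h> (Riemann sum)
```

The two integrals agree to machine precision because both functions and all their derivatives are negligible at the ends of the integration window.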
After recovering the power spectrum and an estimate of $\eta$, the problem of recovering the bispectrum, for a fixed estimate $\tilde\eta$, is
$$g = \operatorname{argmin}_{\tilde g} \big\| \big(I - L_{C_0(\tilde\eta)}\big) \tilde g - C_1(\tilde\eta) L_{C_2(\tilde\eta)} d \big\|_2^2, \qquad (2.20)$$
which is a convex optimization problem. Let $Y = I - L_{C_0}$. By a proof identical to the one above, we get
$$\nabla_{\tilde g} \mathfrak{B}(\tilde g) = 2 Y^* (Y \tilde g - C_1 L_{C_2} d).$$
Additionally, we have
$$Y^* h(\omega_1, \omega_2) = h(\omega_1, \omega_2) - C_0^2 \, h\big(C_0^{-1} \omega_1, C_0^{-1} \omega_2\big). \qquad (2.21)$$
As before, in numerical applications we only have access to a finite number of samples, so we must modify the data term and loss function based on the estimator provided in Model 2:
$$\tilde d(r, \theta) = 4 (\tilde g_\eta + \tilde g_\sigma) * \phi_L(r, \theta) + r \big[ (\tilde g_\eta + \tilde g_\sigma) * \partial_r \phi_L \big](r, \theta),$$
$$\tilde{\mathfrak{B}}(\tilde g) = \big\| (I - L_{C_0}) \tilde g - C_1 L_{C_2} \tilde d \big\|_2^2.$$
With this surrogate loss and data term, we can conduct numerical experiments in the next section.

2.8 Numerical Experiments for Bispectrum Recovery

We now test our bispectrum recovery algorithm on the following signals:
$$f_1(x) = A_1 e^{-5x^2} \cos(4x), \qquad f_2(x) = A_2 e^{-5x^2} \cos(8x), \qquad f_3(x) = A_3 e^{-5x^2} \cos(12x),$$
$$\hat f_4(x) = A_4 \mathbf{1}_{[-1/8,\,1/8]}(x), \qquad f_5(x) = A_5 \operatorname{sinc}(4x), \qquad f_6(x) = A_6 \begin{cases} 2 - 2|x|, & |x| < 1, \\ 0, & \text{otherwise.} \end{cases}$$
As in [27], we define our hidden signals on $[-c/4, c/4]$, and the noisy signals are defined on $[-c/2, c/2]$ with $c = 2^5$. In frequency, the signals were sampled on the interval $[-2^\ell \pi, 2^\ell \pi]$ with $\ell = 4$ and sampling rate $\pi/c$. The constants $A_i$ with $i = 1, \ldots, 6$ were chosen so that the SNR for each $f_i$ was set to $\sigma^{-2}$, where $\mathrm{SNR} = \big( \int_{-c/2}^{c/2} |f(x)|^2 \, dx \big) / \sigma^2$. All signals other than $f_4$ were generated in space; we chose to generate $f_4$ in frequency because of aliasing.

We chose these signals to test the robustness of our proposed method. The signals $f_1$ to $f_3$ are smooth with fast decay, so they do not fit the assumptions needed to employ our proposed estimators. Nonetheless, $f_1$ to $f_3$ still perform well, most likely because of their exponential decay rate. However, we note that as the peak of the power spectrum moves farther from the origin, the problem becomes much harder because the dilations cause larger perturbations. This is shown in Figure 2.4.

We start with the case of oracle $\eta$ (i.e., $\eta$ is known) and consider two cases: $\sigma = 0.5$ and $\sigma = 1.0$. Error plots are shown in Figures 2.4 and 2.5. All oracle experiments were run with $\eta = 12^{-1/2}$, a Gaussian width of $L = 5(\sigma^4/M)^{1/6}$ for bispectrum recovery, and width $L = 10(\sigma^4/M)^{1/6}$ for power spectrum recovery.

Figure 2.4 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum recovery using Model 2 under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 0.5$.

Figure 2.5 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum recovery using Model 2 under the assumption of oracle $\eta = 12^{-1/2}$ and $\sigma = 1.0$.

For all the signals, in the case where one only performs empirical noise centering on the average bispectra (marked in blue), there is a diminishing return in performance for large enough $M$. This is most likely because the blue estimator has a dilation bias, which a large sample size cannot overcome.
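The test signals above, and the amplitude choice that achieves the target SNR, can be written out directly. Setting $\mathrm{SNR} = \sigma^{-2}$ forces $A^2 \int |f_0|^2 = 1$, so the amplitude is independent of $\sigma$. The grid below and the unit placeholder amplitudes are illustrative; note that `np.sinc(t)` is $\sin(\pi t)/(\pi t)$, so $\operatorname{sinc}(4x)$ is written `np.sinc(4 * x / np.pi)`.

```python
import numpy as np

def f1(x): return np.exp(-5.0 * x**2) * np.cos(4.0 * x)
def f2(x): return np.exp(-5.0 * x**2) * np.cos(8.0 * x)
def f3(x): return np.exp(-5.0 * x**2) * np.cos(12.0 * x)
def f4_hat(w): return np.where(np.abs(w) <= 0.125, 1.0, 0.0)  # f4 given in frequency
def f5(x): return np.sinc(4.0 * x / np.pi)                    # sin(4x)/(4x)
def f6(x): return np.maximum(2.0 - 2.0 * np.abs(x), 0.0)      # triangle on [-1, 1]

# Amplitude making SNR = sigma^{-2}: A^2 * integral(|f0|^2) = 1.
x = np.linspace(-8.0, 8.0, 16001)      # hidden signals live on [-c/4, c/4], c = 2^5
dx = x[1] - x[0]
A1 = 1.0 / np.sqrt(np.sum(f1(x)**2) * dx)
sigma = 0.5
snr = np.sum((A1 * f1(x))**2) * dx / sigma**2   # equals sigma**-2 by construction
```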
It will only be able to overcome the additive noise bias, so eventually the blue line plateaus. In comparison, our inversion unbiasing procedure, marked in red, demonstrates a continual drop in error for both choices of $\sigma$, except for $f_5$. Additionally, note that the red decay line does not plateau but is linear on the log-log plot, supporting the claim of Theorem 5 that $\tilde B_f$ is truly an unbiased estimator. The poor performance of $f_5$ stems from the fact that it does not obey the assumptions of Model 2, since sinc is not a compactly supported function. The same is true for each of the Gabor functions, but their decay is exponential rather than polynomial like the sinc.

Bispectrum recovery examples for $\sigma = 0.5$ are given in Figures 2.6 and 2.7. Note that in all the examples, one can "undilate" the additive-noise-unbiased bispectrum average. The results are similar for $\sigma = 1.0$, but they are not provided in this thesis.

Figure 2.6 Example plots of recovery for $f_1$, $f_2$, and $f_3$ under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 0.5$. Left: ground truth. Middle: only additive noise unbiasing. Right: our unbiasing procedure.

Figure 2.7 Example plots of recovery for $f_4$, $f_5$, and $f_6$ under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 0.5$. Left: ground truth. Middle: only additive noise unbiasing. Right: our unbiasing procedure.

We now consider the case where $\eta$ is unknown and estimated. Again, we have two cases: $\sigma = 0.5$ and $\sigma = 1.0$. For estimating $\sigma$: since all the signals decay away from the origin, the values of the power spectrum average should be near zero there, apart from noise. Thus, we estimate $\sigma$ from the variance of the function values on the edge of the signal. All experiments will be run with $\eta = 12^{-1/2}$.
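The edge-based noise estimate can be sketched as follows. Since the signal is negligible near the edges of the observation window, samples there are essentially pure noise, and their standard deviation estimates $\sigma$. The grid, the edge width (one eighth of the window per side), and the single-observation setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 1024, 0.5
x = np.linspace(-16.0, 16.0, n)
f = np.exp(-5.0 * x**2) * np.cos(4.0 * x)     # negligible for |x| >= 12
y = f + sigma * rng.standard_normal(n)        # one noisy observation

# Samples near the window edges are ~ pure noise; their std estimates sigma.
edge = np.concatenate([y[:n // 8], y[-n // 8:]])
sigma_hat = edge.std()
```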
Since we now need to estimate the power spectrum, we let the smoothing width for power spectrum recovery be $5(\sigma^4/M)^{1/6}$ and use the smoothing parameter $5(\sigma^4/M)^{1/6}$ for bispectrum recovery. Note that the smoothing parameter for the power spectrum is smaller than in the oracle case; we found that $10(\sigma^4/M)^{1/6}$ and $5(\sigma^4/M)^{1/6}$ yielded similar results, so we decided not to change the parameter. In Figure 2.8, an error plot based on the number of samples is shown with $\sigma = 0.5$; in Figure 2.9, a similar error plot is shown with $\sigma = 1.0$. With regard to the estimation of $\eta$, the process given in [26] was unreliable for all sample sizes. Our results show that our modified loss function yields a more reliable estimate of $\eta$ for large $M$; see Figures 2.20 and 2.21.

Figure 2.8 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum recovery using Model 2 without prior knowledge of $\eta$ and with $\sigma = 0.5$.

Figure 2.9 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum recovery using Model 2 without prior knowledge of $\eta$ and with $\sigma = 1.0$.

Our results indicate that we can recover the bispectrum with a reasonable degree of accuracy. The question is: do we have enough accuracy for full signal inversion?

2.9 Bispectrum Inversion and Hidden Signal Recovery

For hidden signal recovery, the tools we need are a power spectrum recovery algorithm, a bispectrum recovery algorithm, and a bispectrum inversion algorithm. We use [26] for power spectrum recovery and $\eta$ estimation, our proposed bispectrum recovery algorithm, and the iterative phase synchronization algorithm of [7] for bispectrum inversion. The general outline of the recovery and inversion process is given in Algorithm 2.2.
The idea behind Algorithm 2.2 is that we have already recovered the magnitudes of the target signal via power spectrum recovery. Thus, if we are able to recover the corresponding phase measurements, we will recover the original signal. The algorithm we use for phase recovery is iterative phase synchronization. Suppose that the bispectrum of our hidden signal is given by $B$, and our estimate of the phases for $B$ is given by $\tilde B$. Suppose we have an estimate of the phase, say $\tilde y_{k-1}$, that is close to the ground truth. Then we should have
$$\tilde B \circ T(\tilde y_{k-1}) \approx \tilde y_{k-1} \tilde y_{k-1}^*,$$
where $T(y)$ is the circulant matrix for the vector $y$ and $\circ$ is the elementwise product of two matrices. We then approximate the phases by solving the optimization problem
$$\operatorname{argmax}_{z \in \mathbb{C}^n} \operatorname{Re}\{ z^* (\tilde B \circ T(y)) z \} \quad \text{subject to} \quad |z[\ell]| = 1 \;\; \forall \ell.$$
However, this solution is only correct up to a global phase. We need additional information, namely $\hat f(0)$ or some estimate of it. Algorithm 2.1 describes the iterative phase synchronization algorithm.

Algorithm 2.1 Iterative Phase Synchronization
1: INPUT: normalized bispectrum $\hat B$, estimate of the phase of $\hat f(0)$, given by $\bar y(0)$.
2: OUTPUT: phase of the signal, $\tilde y$.
3: Let $k = 0$.
4: while the stopping criterion does not occur do
5: Increase $k$ by 1.
6: $\hat y_k \leftarrow \operatorname{argmax}_{z \in \mathbb{C}^n} \operatorname{Re}\{ z^* (\hat B \circ T(\hat y_{k-1})) z \}$ subject to $|z[\ell]| = 1 \;\; \forall \ell$; then set $\hat y_k \leftarrow \hat y_k \cdot \dfrac{\tilde y(0)}{\hat y_k(0)}$.
7: If the signal is real, symmetrize it.
8: end while

Other methods exist, such as frequency marching [7], local non-convex optimization over the manifold of phases [7], and semidefinite relaxation [4]. Frequency marching and iterative phase synchronization have been found to have similar empirical performance. However, iterative phase synchronization requires fewer assumptions; in particular, one does not need $\hat f(1) \neq 0$.
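A minimal sketch of the iteration follows. The unit-modulus argmax in each step is replaced here by one power-method step followed by entrywise phase projection (a standard relaxation; the thesis' inner solver may differ), and the normalized bispectrum phases are built from the relation $B[i,j] = y[i]\,\overline{y[j]}\,\overline{y[i-j]}$ on $\mathbb{Z}_n$, so that $B \circ T(y) = y y^*$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
y = np.exp(1j * rng.uniform(-np.pi, np.pi, n))
y[0] = 1.0                                    # plays the role of the phase of f_hat(0)

idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
B = y[:, None] * np.conj(y)[None, :] * np.conj(y)[idx]   # bispectrum phase matrix

def phase_sync(B, z0, iters=50):
    z = z0.copy()
    for _ in range(iters):
        M = B * z[idx]            # B o T(z); equals z z^* at the fixed point
        w = M @ z                 # one power-method step toward the top eigenvector
        z = w / np.abs(w)         # project each entry back onto the unit circle
        z = z * np.conj(z[0])     # pin the global phase using index 0
    return z

z0 = y * np.exp(0.1j * rng.standard_normal(n))   # initial guess close to the truth
z0[0] = 1.0
z = phase_sync(B, z0)
```

Started close to the truth, the iteration contracts rapidly back to the ground-truth phases, consistent with the local nature of the method described above.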
We have found that local non-convex optimization over the manifold of phases did not yield good results in our experiments, and semidefinite programming was infeasible due to memory requirements.

Algorithm 2.2 Hidden Signal Recovery Algorithm
1: INPUTS: noisy signals $\{y_j\}$.
2: Calculate $\tilde f = \sum_j \hat y_j$, $\tilde p_\eta + \tilde p_\sigma$, and $\tilde g_\eta + \tilde g_\sigma$.
3: Estimate the additive noise level $\tilde\sigma$ using the power spectrum on the edge of the signal.
4: Perform additive noise centering on the power spectrum and bispectrum.
5: if $\eta$ known then
6: Recover the power spectrum by solving the convex optimization problem $\operatorname{argmin}_{\tilde p} \| (I - S_{C_0}) \tilde p - C_1 S_{C_2} p_{\mathrm{data}} \|_2^2$.
7: else if $\eta$ unknown then
8: Estimate the power spectrum and $\eta$ by solving the nonconvex optimization problem $\operatorname{argmin}_{\tilde p, \tilde\eta} \| (I - S_{C_0(\tilde\eta)}) \tilde p - C_1(\tilde\eta) S_{C_2(\tilde\eta)} p_{\mathrm{data}} \|_2^2 / \tilde\eta^2$.
9: end if
10: Recover the bispectrum by solving the convex optimization problem $\operatorname{argmin}_{\tilde g} \| (I - L_{C_0}) \tilde g - C_1 L_{C_2} d \|_2^2$.
11: Apply APS (iterative phase synchronization) to recover the original signal using $\tilde f(0)$, the recovered power spectrum, and the recovered bispectrum.

To illustrate the difficulty of this process, we provide a ground truth example for $f_5$ in Figure 2.10 and four observations of $f_5$ in Figure 2.11. One can see that the observations do not resemble the actual signal, even at low noise levels.

Figure 2.10 Ground truth signal for $f_5$.

Figure 2.11 Corrupted samples for $f_5$. The top row is $\sigma = 0.5$ and the bottom row is $\sigma = 1.0$.

We test our results on the signals from the previous section with $M = 2^{20}$ samples and $\eta = 12^{-1/2}$. We start with an oracle $\eta$. The results for $\sigma = 0.5$ are given in Figure 2.12, and example inversion plots are given in Figure 2.13.
Figure 2.12 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum inversion using Model 2 under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 0.5$.

The blue lines correspond to inversion using additive-noise-centered versions of the average bispectrum and average power spectrum, and the red lines use our inversion unbiasing procedure. One can see that there is a large gain in performance for the first five signals. However, for $f_6$ there is no gain in performance. We hypothesize this occurs because the power spectrum and bispectrum of $f_6$ have a peak around the origin: dilating the power spectrum and bispectrum does not have a large effect on the shape of the signal, since their support is restricted to very low frequencies. This behavior is observed in Figure 2.7 as well.

Figure 2.13 (panels (a)-(f): $f_1$-$f_6$) Bispectrum inversion results using Model 2 under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 0.5$. "Rec" uses our inversion unbiasing procedure and "NO UB" uses the centered averages. "RE" stands for relative error.

Note that while the relative error between the signals is similar in a few cases, such as $f_1$ and $f_5$, the quality of the recovery results cannot be fully judged by relative error alone. For instance, for $f_1$ and $f_5$, one can see that the general shape of the signals is incorrect without the inversion unbiasing, but the relative error is low because the recovered signal is relatively smooth. On the other hand, with the inversion unbiasing, the recovered signal has the right general shape but is not very smooth. Also, with regard to $f_5$, the assumptions of the model are not met, so the drop in performance is expected. For this specific run, for the high frequency signal $f_3$, the recovery error decreased from 0.45 without unbiasing to 0.09 with unbiasing.
Additionally, for $f_2$ and $f_4$, the unbiasing procedure decreased the error by over 50%. For $f_1$ and $f_6$ we do not see much improvement, but that is expected since they are supported in the low frequencies; for $f_5$ there is a moderate improvement, but the bispectrum estimation was less reliable because the model assumptions were not met.

Now we consider the case where $\sigma = 1.0$. The error decay plots are given in Figure 2.14 and the corresponding inversion plots are given in Figure 2.15.

Figure 2.14 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum inversion using Model 2 under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 1.0$.

Notice that the gap in relative error between not using the unbiasing procedure and using it has decreased. This suggests that the bispectrum inversion algorithm we have used is sensitive to noise, and it also explains the slight performance drop we have observed.

Figure 2.15 (panels (a)-(f): $f_1$-$f_6$) Bispectrum inversion results using Model 2 under the assumption of oracle $\eta = 12^{-1/2}$ with $\sigma = 1.0$.

Surprisingly, our error has not increased too much except for $f_4$. For $f_4$, we notice that noise in the recovered signal is most notable around the discontinuities of the function. There is a similar phenomenon for $f_6$: at the ends of the triangle and at its peak, we observe more noise. This suggests that the bispectrum inversion algorithm has a hard time learning discontinuities of the signal, which is expected.

Now we consider the case where $\eta$ is unknown and estimated. For $\sigma = 0.5$, the error decay plots are in Figure 2.16 and the inversion results are in Figure 2.17. For $\sigma = 1.0$, the error decay plots are in Figure 2.18 and the inversion results are in Figure 2.19. Surprisingly, the results are similar to the corresponding oracle plots in many cases.
Figure 2.16 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum inversion using Model 2 with $\sigma = 0.5$ and no prior knowledge of $\eta$.

Figure 2.17 (panels (a)-(f): $f_1$-$f_6$) Bispectrum inversion results using Model 2 with $\sigma = 0.5$ and no prior knowledge of $\eta$.

Figure 2.18 (panels (a)-(f): $f_1$-$f_6$) Relative error decay with standard error bars for bispectrum inversion using Model 2 with $\sigma = 1.0$ and no prior knowledge of $\eta$.

Figure 2.19 (panels (a)-(f): $f_1$-$f_6$) Bispectrum inversion results using Model 2 with $\sigma = 1.0$ and no prior knowledge of $\eta$.

Finally, we give error plots for the estimation of $\eta$ in the empirical case: Figure 2.20 is for $\sigma = 0.5$ and Figure 2.21 is for $\sigma = 1.0$. Note that the improvement tends to plateau once we have more than $2^{14}$ samples in most of our cases. Note also that the $\eta$ estimation was subpar for $f_4$. We believe this is because the signal is not continuous, but we have not done rigorous tests to confirm this. Nonetheless, the bispectrum recovery results were still adequate in the oracle case.

Figure 2.20 (panels (a)-(f): $f_1$-$f_6$) Mean relative error in estimating $\eta$ with $\sigma = 0.5$.

Figure 2.21 (panels (a)-(f): $f_1$-$f_6$) Mean relative error in estimating $\eta$ with $\sigma = 1.0$.

2.10 Conclusions and Future Work

In this chapter, we have provided a solution for the dilation MRA model by recovering the bispectrum of the noisy observations. After recovering the bispectrum, we perform full signal inversion and observe competitive results compared to other methods without dilations. One natural extension is to work in two dimensions.
That is, we want to recover a signal $f: \mathbb{R}^2 \to \mathbb{R}$ from many noisy observations that have been randomly translated, dilated, rotated, and corrupted by additive noise. Define the rotation matrix
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \qquad (2.22)$$
A formal description is now given by Model 4.

Model 4. Suppose we have $M$ independent observations of a function $f \in L^2(\mathbb{R}^2)$ defined by
$$y_j(x) = f\big( R_{\theta_j}^{-1} (1 - \tau_j)^{-1} (x - t_j) \big) + \varepsilon_j(x) := f_j(x) + \varepsilon_j(x), \qquad 1 \le j \le M.$$
Furthermore, assume that
• $\operatorname{supp}(f_j) \subset [-\frac{1}{2}, \frac{1}{2}]^2$ for $1 \le j \le M$;
• $\{t_j\}_{j=1}^M$ are independent samples of a random variable $t \in \mathbb{R}^2$;
• $\{\theta_j\}_{j=1}^M$ are independent samples of a uniformly distributed random variable $\theta \in [-\pi, \pi)$;
• $\{\tau_j\}_{j=1}^M$ are independent samples of a uniformly distributed random variable $\tau \in \mathbb{R}$ satisfying $\mathbb{E}[\tau] = 0$ and $\operatorname{Var}(\tau) = \eta^2 \le \frac{1}{12}$;
• $\{\varepsilon_j(x)\}_{j=1}^M$ are independent white noise processes on $[-\frac{1}{2}, \frac{1}{2}]^2$ with variance $\sigma^2$.
Can we recover $f$?

The main difficulty in this model comes from the addition of the rotations, as mentioned in previous sections. Even the noiseless case is difficult, because one cannot simply align the signals anymore: intuitively, one must first rotate all the signals into place, which is not very feasible. The corresponding noiseless model is given by Model 5.

Model 5. Suppose we have $M$ independent observations of a function $f \in L^2(\mathbb{R}^2)$ defined by
$$f_j(x) = f\big( R_{\theta_j}^{-1} (1 - \tau_j)^{-1} (x - t_j) \big), \qquad 1 \le j \le M.$$
Furthermore, assume that
• $\operatorname{supp}(f_j) \subset [-\frac{1}{2}, \frac{1}{2}]^2$ for $1 \le j \le M$;
• $\{t_j\}_{j=1}^M$ are independent samples of a random variable $t \in \mathbb{R}^2$;
โ€ข {๐œƒ ๐‘— } ๐‘€๐‘—=1 are independent samples of a uniformly distributed random variable ๐œƒ โˆˆ [โˆ’๐œ‹, ๐œ‹). โ€ข {๐œ ๐‘— } ๐‘€ ๐‘—=1 are independent samples of a uniformly distributed random variable ๐œ โˆˆ R satisfying 1 E[๐œ] = 0 and Var(๐œ) = ๐œ‚2 โ‰ค 12 . Can we recover ๐‘“ ? We will start with Model 4 and generalize our results to Model 5, like in the one dimensional case. 56 CHAPTER 3 NONLINEAR HEEGER-BERGEN TEXTURE SYNTHESIS 3.1 Background on Texture Synthesis Texture synthesis is the process of generating an image from a reference image by taking advantage of its statistical properties. The new texture should have similar qualities as the reference texture, but look different. In other words, for example, the same reference texture, a rotation of the reference texture, or a rearrangement of the reference texture should not be the result after synthesis. An example of texture synthesis is given in Figure 3.1. Figure 3.1 "Example" is the reference texture and "Generated" is the synthesized texture. Note that the images have the same perceptual qualities, but are not the same image. While this task can be phrased simply, one difficulty in this task is that traditional image metrics do not lead to good synthesis results. For example, if one were to minimize the MSE loss between a reference image ๐ผ ๐‘… and synthesized image ๐ผ๐‘† via solving the optimization problem ๐ผ๐‘† = argmin ๐ผห†๐‘† โˆฅ ๐ผห†๐‘† โˆ’ ๐ผ ๐‘… โˆฅ 22 , the result would be same texture. To illustrate why such a metric is difficult, we will provide some examples. Consider Figure 3.2. The synthesis in the last two columns have similar quality compared to the reference, and itโ€™s clear that both textures are not simply repetitions of the reference texture. 57 Figure 3.2 Left: Original Texture. Middle: Synthesis using [24]. Right: Synthesis using [20]. 
One could argue that the dots are more solid in the middle column, which matches the reference texture, so that synthesis is better. However, this is a subjective argument without quantitative backing. Additionally, consider Figure 3.3. One can see that the middle column has worse alignment compared to the reference texture on the left, so it is clear that the synthesis on the right is of higher quality, since there is no repetition.

Figure 3.3 Left: original texture. Middle: synthesis using [24]. Right: synthesis using [20].

The synthesis in the right column of Figure 3.3 captures what we will call "long range constraints" in an image. That is, it is able to capture macroscopic features that the middle column of Figure 3.3 cannot capture. To approach this problem, there are two factors to consider. First, can one find a representation that captures texture? If so, what is a good measure of similarity between textures? One needs to account for long range constraints without creating a texture too similar to the original reference texture. One approach, which we will generalize in this chapter, is to update a white noise image by matching the histograms of wavelet coefficients between a reference image and the white noise. First, we introduce the filter bank we use and the wavelet transform.

3.2 Filter Construction

In this section we describe the filter bank used in the Heeger-Bergen algorithm. This is the analytic description of the filters that does not rely on downsampling. We follow the presentation in [10], but some of the notation is changed, so care should be taken when comparing the two. All filters are defined in frequency on the box $[-\pi, \pi]^2 := [-\pi, \pi] \times [-\pi, \pi]$ using polar coordinates. Write $\omega = (\omega_1, \omega_2) \in [-\pi, \pi)^2$ as $\omega = (r, \theta)$ with
$$r := \sqrt{\omega_1^2 + \omega_2^2}, \qquad \theta := \begin{cases} \pi, & \omega_1 \le 0 \text{ and } \omega_2 = 0, \\ 2 \arctan\left( \dfrac{\omega_2}{\omega_1 + r} \right), & \text{otherwise.} \end{cases}$$

We begin by defining some "building block" functions. The first is a low frequency radial function $L: [0, \infty) \to \mathbb{R}$ defined as
$$\forall r \ge 0, \quad L(r) := \begin{cases} 1, & r \le \pi/2, \\ \cos\left( \frac{\pi}{2} \log_2\left( \frac{2r}{\pi} \right) \right), & \pi/2 < r \le \pi, \\ 0, & r \ge \pi. \end{cases}$$
The second is a high frequency function $H: [0, \infty) \to \mathbb{R}$ defined as
$$\forall r \ge 0, \quad H(r) := \begin{cases} 0, & r \le \pi/2, \\ \cos\left( \frac{\pi}{2} \log_2\left( \frac{r}{\pi} \right) \right), & \pi/2 < r \le \pi, \\ 1, & r \ge \pi. \end{cases}$$
One can verify that $L$ and $H$ satisfy the following important property:
$$|L(r)|^2 + |H(r)|^2 = 1, \qquad \forall r \ge 0. \qquad (3.1)$$

Figure 3.4 Plot of $L(r)$, $H(r)$, and $|L(r)|^2 + |H(r)|^2$.

In fact, the cosine part of $L(r)$ can be replaced with a different decreasing function, and the cosine part of $H(r)$ with a different increasing function, so long as (3.1) holds. Figure 3.4 plots $L(r)$ and $H(r)$ and verifies equation (3.1) numerically.

Remark 1. Looking at the plots in Figure 3.4, we may indeed want to use a different form for $L(r)$ and $H(r)$, as they are not very smooth at the points where they equal zero.

We also define dilations of $L$ and $H$. They are given by
$$\forall j \in \mathbb{Z}, \quad L_j(r) := L(2^j r) = \begin{cases} 1, & r \le \pi/2^{j+1}, \\ \cos\left( \frac{\pi}{2} \log_2\left( \frac{2^{j+1} r}{\pi} \right) \right), & \pi/2^{j+1} < r \le \pi/2^j, \\ 0, & r \ge \pi/2^j, \end{cases}$$
and
$$\forall j \in \mathbb{Z}, \quad H_j(r) := H(2^j r) = \begin{cases} 0, & r \le \pi/2^{j+1}, \\ \cos\left( \frac{\pi}{2} \log_2\left( \frac{2^j r}{\pi} \right) \right), & \pi/2^{j+1} < r \le \pi/2^j, \\ 1, & r \ge \pi/2^j. \end{cases}$$
We remark that it follows from (3.1) that
$$|L_j(r)|^2 + |H_j(r)|^2 = 1, \qquad \forall r \ge 0, \; \forall j \in \mathbb{Z}.$$
Now we define a family of functions for the angular variable $\theta$. Let $Q$ be the number of such functions, which will later correspond to the number of directional filters at each scale. For all $0 \le q < Q$ and $\theta \in (-\pi, \pi]$, define
$$G_q(\theta) := \alpha_Q \left( \cos\left( \theta - \frac{\pi q}{Q} \right)^{Q-1} \mathbf{1}_{|\theta - \pi q/Q| \le \pi/2} + \cos\left( \theta - \frac{\pi (q - Q)}{Q} \right)^{Q-1} \mathbf{1}_{|\theta - \pi (q - Q)/Q| \le \pi/2} \right),$$
where
$$\alpha_Q := 2^{Q-1} \frac{(Q-1)!}{\sqrt{Q \, (2(Q-1))!}}.$$
One can verify that
$$\sum_{q=0}^{Q-1} |G_q(\theta)|^2 = 1, \qquad \forall \theta \in (-\pi, \pi]. \qquad (3.2)$$

Figure 3.5 Plots of $G_q(\theta)$ for $0 \le q < Q = 4$, and of $\sum_{q=0}^{Q-1} |G_q(\theta)|^2$.

Figure 3.5 plots the functions $G_q(\theta)$ and verifies (3.2) numerically. As with the $L(r)$ and $H(r)$ functions, the functions $G_q(\theta)$ can be changed so long as (3.2) is satisfied.

Now we use the functions $L(r)$, $H(r)$, and $G_q(\theta)$ to build our filters. We construct three types of filters:
• a high frequency filter $h: \mathbb{R}^2 \to \mathbb{R}$;
• directional wavelet filters $\psi_q: \mathbb{R}^2 \to \mathbb{R}$;
• a low frequency filter $\ell: \mathbb{R}^2 \to \mathbb{R}$.
The high pass filter $h(u)$ is defined through its Fourier transform $\hat h(r, \theta)$ as
$$\hat h(r, \theta) := H(r), \qquad \forall r \ge 0, \; \forall \theta \in (-\pi, \pi],$$
and the low pass filter $\ell(u)$ is defined analogously,
$$\hat\ell(r, \theta) := L(r), \qquad \forall r \ge 0, \; \forall \theta \in (-\pi, \pi].$$
The directional filters $\psi_q(u)$ are defined as
$$\hat\psi_q(r, \theta) := L_0(r) H_1(r) G_q(\theta), \qquad \forall r \ge 0, \; \forall \theta \in (-\pi, \pi].$$
One may verify that all the filters are symmetric through the origin (i.e., $h(-u) = h(u)$, $\ell(-u) = \ell(u)$, and $\psi_q(-u) = \psi_q(u)$) and real valued in space.
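The angular identity (3.2) can also be checked numerically. One caveat: since the two cosine lobes for each $q$ are centered at angles that differ by $\pi$, the indicator conditions must be read circularly for $\theta \in (-\pi, \pi]$; the wrapping of angular offsets below is our reading of that convention.

```python
import numpy as np
from math import factorial

def G(theta, q, Q=4):
    a = 2**(Q - 1) * factorial(Q - 1) / np.sqrt(Q * factorial(2 * (Q - 1)))
    out = np.zeros_like(theta)
    for c in (np.pi * q / Q, np.pi * (q - Q) / Q):       # the two lobe centers
        t = ((theta - c + np.pi) % (2.0 * np.pi)) - np.pi  # wrap offset to [-pi, pi)
        out += a * np.cos(t)**(Q - 1) * (np.abs(t) <= np.pi / 2)
    return out

theta = np.linspace(-np.pi, np.pi, 1441)
total = sum(np.abs(G(theta, q))**2 for q in range(4))    # should be identically 1
```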
We now define the wavelet transform of an image $x \in L^2(\mathbb{R}^2)$ with frequency support contained in $[-\pi, \pi]^2$, i.e., $\operatorname{supp}(\hat x) \subseteq [-\pi, \pi]^2$. Let $J \ge 1$ be the number of scales and let $Q \ge 1$ be the number of orientations. A dilation of a filter $f: \mathbb{R}^2 \to \mathbb{C}$ is defined as
$$f_j(u) := 2^{-2j} f(2^{-j} u), \qquad \forall u \in \mathbb{R}^2, \; \forall j \in \mathbb{Z}.$$
We remark that the Fourier transform of a dilated filter satisfies
$$\hat f_j(\omega) = \hat f(2^j \omega), \quad \text{i.e.,} \quad \hat f_j(r, \theta) = \hat f(2^j r, \theta), \qquad \forall \omega \in \mathbb{R}^2, \; \forall j \in \mathbb{Z}.$$
Using this notation, define the filter bank $\mathcal{F}_{J,Q}$ as
$$\mathcal{F}_{J,Q} := \{ h, \; \ell_J, \; \psi_{j,q} : 0 \le j < J, \; 0 \le q < Q \},$$
where $\ell_J(u)$ is the dilation of the low pass filter $\ell(u)$, meaning
$$\ell_J(u) = 2^{-2J} \ell(2^{-J} u) \implies \hat\ell_J(r, \theta) = \hat\ell(2^J r, \theta) = L(2^J r) = L_J(r),$$
and $\psi_{j,q}(u)$ is the dilation of the directional filter $\psi_q(u)$, meaning
$$\psi_{j,q}(u) = 2^{-2j} \psi_q(2^{-j} u) \implies \hat\psi_{j,q}(r, \theta) = \hat\psi_q(2^j r, \theta) = L_j(r) H_{j+1}(r) G_q(\theta).$$
Figure 3.6 plots the high frequency residual filter $h(u)$ and the low frequency filter $\ell_J(u)$ for $J = 4$, along with their Fourier transforms. Figure 3.7 plots the Fourier transforms $\hat\psi_{j,q}(\omega)$ of the directional band-pass wavelets for $0 \le j < J = 4$ and $0 \le q < Q = 4$, while Figure 3.8 plots $\psi_{j,q}(u)$ in space for $0 \le j < J = 4$ and $0 \le q < Q = 4$.

Figure 3.6 (panels (a) $\hat h(\omega)$, (b) $h(u)$, (c) $\hat\ell_J(\omega)$, (d) $\ell_J(u)$) The high frequency residual filter $h(u)$ and the low frequency filter $\ell_J(u)$ with $J = 4$, and their Fourier transforms $\hat h(\omega)$ and $\hat\ell_J(\omega)$, respectively.
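With the dilated filters in hand, we can numerically verify that the filter bank tiles the frequency box, i.e. that $|\hat h|^2 + |\hat\ell_J|^2 + \sum_{j,q} |\hat\psi_{j,q}|^2 = 1$ on $[-\pi,\pi]^2$ (this is the Littlewood-Paley condition stated shortly as Theorem 8). The radial and angular profiles follow the definitions above; as before, angular offsets are wrapped to $(-\pi, \pi]$, which we take as the intended circular convention.

```python
import numpy as np
from math import factorial

def Lr(r):
    out = np.ones_like(r)
    mid = (r > np.pi / 2) & (r <= np.pi)
    out[mid] = np.cos(np.pi / 2 * np.log2(2.0 * r[mid] / np.pi))
    out[r > np.pi] = 0.0
    return out

def Hr(r):
    out = np.zeros_like(r)
    mid = (r > np.pi / 2) & (r <= np.pi)
    out[mid] = np.cos(np.pi / 2 * np.log2(r[mid] / np.pi))
    out[r > np.pi] = 1.0
    return out

def Gq(theta, q, Q):
    a = 2**(Q - 1) * factorial(Q - 1) / np.sqrt(Q * factorial(2 * (Q - 1)))
    out = np.zeros_like(theta)
    for c in (np.pi * q / Q, np.pi * (q - Q) / Q):
        t = ((theta - c + np.pi) % (2.0 * np.pi)) - np.pi
        out += a * np.cos(t)**(Q - 1) * (np.abs(t) <= np.pi / 2)
    return out

J, Q = 4, 4
grid = np.linspace(-np.pi, np.pi, 201)
w1, w2 = np.meshgrid(grid, grid)
r, theta = np.hypot(w1, w2), np.arctan2(w2, w1)

total = Hr(r)**2 + Lr(2.0**J * r)**2                  # |h_hat|^2 + |l_J_hat|^2
for j in range(J):
    for q in range(Q):
        # psi_hat_{j,q} = L_j(r) H_{j+1}(r) G_q(theta)
        total += (Lr(2.0**j * r) * Hr(2.0**(j + 1) * r) * Gq(theta, q, Q))**2
```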
Figure 3.7 The Fourier transforms $\hat\psi_{j,q}(\omega)$ of the directional wavelets for $0 \le j < J = 4$ and $0 \le q < Q = 4$.

Figure 3.8 The directional wavelets $\psi_{j,q}(u)$ for $0 \le j < J = 4$ and $0 \le q < Q = 4$.

Define the wavelet transform of $x(u)$ as the map $W_{J,Q}: L^2(\mathbb{R}^2) \to L^2(\mathbb{R}^2)^{JQ+2}$, where
$$W_{J,Q} x := \{ x * h, \; x * \ell_J, \; x * \psi_{j,q} : 0 \le j < J, \; 0 \le q < Q \}.$$
Notice that $W_{J,Q}$ takes an image $x(u)$ and returns $JQ + 2$ images, which we refer to as the wavelet coefficients of $x$. Let $\| x \|$ denote the $L^2(\mathbb{R}^2)$ norm of $x$,
$$\| x \|^2 = \int_{\mathbb{R}^2} |x(u)|^2 \, du.$$
The norm of $W_{J,Q} x$ is defined as
$$\| W_{J,Q} x \|^2 := \| x * h \|^2 + \| x * \ell_J \|^2 + \sum_{j=0}^{J-1} \sum_{q=0}^{Q-1} \| x * \psi_{j,q} \|^2.$$
This particular wavelet transform has many nice properties due to the filter construction. We collect them in the next theorem.

Theorem 8. The filter bank $\mathcal{F}_{J,Q}$ satisfies the following Littlewood-Paley condition:
$$|\hat h(\omega)|^2 + |\hat\ell_J(\omega)|^2 + \sum_{j=0}^{J-1} \sum_{q=0}^{Q-1} |\hat\psi_{j,q}(\omega)|^2 = 1, \qquad \forall \omega \in [-\pi, \pi]^2. \qquad (3.3)$$
Therefore, for any $x \in L^2(\mathbb{R}^2)$ with $\operatorname{supp}(\hat x) \subseteq [-\pi, \pi]^2$, the wavelet transform $W_{J,Q}$ is an isometry, $\| W_{J,Q} x \| = \| x \|$, and furthermore has the following inverse:
$$x = x * h * h + x * \ell_J * \ell_J + \sum_{j=0}^{J-1} \sum_{q=0}^{Q-1} x * \psi_{j,q} * \psi_{j,q}. \qquad (3.4)$$

Proof. First, convert everything to polar coordinates.
Use condition (3.2) to simplify the third term, condition (3.1) for the other terms, and also use the definition of $L_j(r)$ to observe that $L_j(r)L_{j+1}(r) = L_{j+1}(r)$, which results in a telescoping sum:
\begin{align*}
|\widehat{h}(\omega)|^2 &+ |\widehat{\ell_J}(\omega)|^2 + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1}|\widehat{\psi_{j,q}}(\omega)|^2\\
&= |H(r)|^2 + |L_J(r)|^2 + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1}|L_j(r)|^2|H_{j+1}(r)|^2|G_q(\theta)|^2\\
&= |H(r)|^2 + |L_J(r)|^2 + \sum_{j=0}^{J-1}|L_j(r)|^2|H_{j+1}(r)|^2\left(\sum_{q=0}^{Q-1}|G_q(\theta)|^2\right)\\
&= |H(r)|^2 + |L_J(r)|^2 + \sum_{j=0}^{J-1}|L_j(r)|^2|H_{j+1}(r)|^2\\
&= 1 - |L(r)|^2 + |L_J(r)|^2 + \sum_{j=0}^{J-1}|L_j(r)|^2\bigl(1 - |L_{j+1}(r)|^2\bigr)\\
&= 1 - |L(r)|^2 + |L_J(r)|^2 + \sum_{j=0}^{J-1}\bigl(|L_j(r)|^2 - |L_j(r)|^2|L_{j+1}(r)|^2\bigr)\\
&= 1 - |L(r)|^2 + |L_J(r)|^2 + \sum_{j=0}^{J-1}\bigl(|L_j(r)|^2 - |L_{j+1}(r)|^2\bigr)\\
&= 1 - |L(r)|^2 + |L_J(r)|^2 + |L(r)|^2 - |L_J(r)|^2 = 1.
\end{align*}
To prove the isometry property, multiply both sides of (3.3) by $|\widehat{x}(\omega)|^2$, integrate, and apply the Plancherel formula. To prove the inversion formula, take the Fourier transform of
\[
g = x * h * h + x * \ell_J * \ell_J + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1} x * \psi_{j,q} * \psi_{j,q}.
\]
Then
\begin{align*}
\widehat{g}(\omega) &= \widehat{x}(\omega)\widehat{h}(\omega)\widehat{h}(\omega) + \widehat{x}(\omega)\widehat{\ell_J}(\omega)\widehat{\ell_J}(\omega) + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1}\widehat{x}(\omega)\widehat{\psi_{j,q}}(\omega)\widehat{\psi_{j,q}}(\omega)\\
&= \widehat{x}(\omega)\left[|\widehat{h}(\omega)|^2 + |\widehat{\ell_J}(\omega)|^2 + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1}|\widehat{\psi_{j,q}}(\omega)|^2\right] = \widehat{x}(\omega),
\end{align*}
where we used that the filters are real and symmetric, so their Fourier transforms are real. Taking the inverse Fourier transform gives the result. □

It turns out that the inverse formula (3.4) is based on the adjoint of the wavelet transform. Let us explain. For $x, y \in \mathbf{L}^2(\mathbb{R}^2)$ define their inner product as:
\[
\langle x, y\rangle := \int_{\mathbb{R}^2} x(u)\overline{y(u)}\, du, \tag{3.5}
\]
where $\overline{y(u)}$ is the complex conjugate of $y(u)$. Using this inner product we can define an inner product on $\mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$. Write $F \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$ as
\[
F = \{f_h,\ f_J,\ f_{j,q} : 0 \leq j < J,\ 0 \leq q < Q\},
\]
where $f_h \in \mathbf{L}^2(\mathbb{R}^2)$, $f_J \in \mathbf{L}^2(\mathbb{R}^2)$, and $f_{j,q} \in \mathbf{L}^2(\mathbb{R}^2)$ for each $j$ and $q$. Then for $F, G \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$, define their inner product as:
\[
\langle F, G\rangle := \langle f_h, g_h\rangle + \langle f_J, g_J\rangle + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1} \langle f_{j,q}, g_{j,q}\rangle,
\]
where the inner products on the right hand side are defined using (3.5). Notice that with this inner product we have $\|W_{J,Q}x\|^2 = \langle W_{J,Q}x, W_{J,Q}x\rangle$. Now let us define the adjoint of the wavelet transform. The adjoint of $W_{J,Q} : \mathbf{L}^2(\mathbb{R}^2) \to \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$ is the map $W_{J,Q}^* : \mathbf{L}^2(\mathbb{R}^2)^{JQ+2} \to \mathbf{L}^2(\mathbb{R}^2)$ such that the following relation holds:
\[
\langle W_{J,Q}x, F\rangle = \langle x, W_{J,Q}^* F\rangle, \quad \forall\, x \in \mathbf{L}^2(\mathbb{R}^2),\ \forall\, F \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}. \tag{3.6}
\]
Using (3.6) as the definition of $W_{J,Q}^*$, one can prove the following theorem.

Theorem 9.
The adjoint of $W_{J,Q}$ is
\[
W_{J,Q}^* F = f_h * h + f_J * \ell_J + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1} f_{j,q} * \psi_{j,q}, \quad \forall\, F \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}.
\]
Proof. Let $x \in \mathbf{L}^2(\mathbb{R}^2)$ and $F \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$. Expanding the inner product,
\[
\langle W_{J,Q}x, F\rangle = \langle x * h, f_h\rangle + \langle x * \ell_J, f_J\rangle + \sum_{j=0}^{J-1}\sum_{q=0}^{Q-1} \langle x * \psi_{j,q}, f_{j,q}\rangle.
\]
The filters are real and symmetric, so $h(u-t) = h(t-u)$ (and similarly for $\ell_J$ and $\psi_{j,q}$), and Fubini's theorem justifies exchanging the order of integration. We illustrate the computation on the high pass term; the low pass and band pass terms are handled identically:
\begin{align*}
\langle x * h, f_h\rangle &= \int_{\mathbb{R}^2}\left(\int_{\mathbb{R}^2} x(t)h(u-t)\, dt\right)\overline{f_h(u)}\, du\\
&= \int_{\mathbb{R}^2} x(t)\overline{\left(\int_{\mathbb{R}^2} f_h(u)h(u-t)\, du\right)}\, dt && \text{(Fubini)}\\
&= \int_{\mathbb{R}^2} x(t)\overline{\left(\int_{\mathbb{R}^2} f_h(u)h(t-u)\, du\right)}\, dt && \text{(symmetry of } h\text{)}\\
&= \int_{\mathbb{R}^2} x(t)\overline{(f_h * h)(t)}\, dt = \langle x, f_h * h\rangle.
\end{align*}
Summing the three groups of terms gives $\langle W_{J,Q}x, F\rangle = \langle x, W_{J,Q}^* F\rangle$. □

Looking back at (3.4) we see that a left inverse of $W_{J,Q}$ is given by $W_{J,Q}^*$, i.e.,
\[
x = W_{J,Q}^* W_{J,Q} x.
\]
Having the adjoint of $W_{J,Q}$ and this relationship will be useful when we get to the Heeger-Bergen texture synthesis algorithm.

3.3 Heeger Bergen Texture Synthesis

Using the filter bank from before, we can now state the Heeger-Bergen texture synthesis algorithm. We will need an auxiliary histogram matching algorithm taken from [10]:

Algorithm 3.1 Histogram Matching
1: Start with an input image $u$ and reference image $v$, both of size $M \times N$.
2: Define $L = MN$ and assume that $u$ and $v$ are unraveled.
3: Determine the permutation $\tau$ such that $v_{\tau(1)} \leq v_{\tau(2)} \leq \ldots \leq v_{\tau(L)}$.
4: Determine the permutation $\sigma$ such that $u_{\sigma(1)} \leq u_{\sigma(2)} \leq \ldots \leq u_{\sigma(L)}$.
5: for $i = 1$ to $L$ do
6:   $u_{\sigma(i)} \leftarrow v_{\tau(i)}$
7: end for

The general idea behind the Heeger-Bergen texture synthesis algorithm is to match distributions of wavelet coefficients. One starts with an $M \times N$ white noise image, say $I_W$, with pixels sampled from a standard normal distribution, and a reference image $I_R$. One then calculates $W_{J,Q}I_R$ and $W_{J,Q}I_W$, histogram matches corresponding coefficients, and inverts the transform.
The general idea is that matching distributions of wavelet coefficients will turn the white noise into an image that is similar to the reference texture. Pseudocode is provided in Algorithm 3.2.

Algorithm 3.2 Heeger Bergen Texture Synthesis Algorithm
1: Start with a white noise image $I_W \in \mathbb{R}^{M\times N}$, reference image $I_R \in \mathbb{R}^{M\times N}$, number of scales $J$, number of rotations $Q$, and number of iterations $T$.
2: Calculate $W_{J,Q}I_R \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$.
3: for $t = 1$ to $T$ do
4:   Calculate $W_{J,Q}I_W \in \mathbf{L}^2(\mathbb{R}^2)^{JQ+2}$.
5:   Update $W_{J,Q}I_W$ by histogram matching each element of $W_{J,Q}I_W$ with the corresponding filter response in $W_{J,Q}I_R$ (using each filter response in $W_{J,Q}I_R$ as the reference histogram).
6:   Update $I_W$ via the formula $I_W = W_{J,Q}^* W_{J,Q} I_W$.
7:   Histogram match $I_W$ with $I_R$ using $I_R$ as the reference histogram.
8: end for

Unfortunately, Algorithm 3.2 does not yield high quality texture synthesis. One possible reason for this is that it does not use a nonlinear filter bank, so complex features in a texture cannot be effectively captured.

3.4 One Layer Nonlinear Heeger Bergen Texture Synthesis

To improve the synthesis quality of Algorithm 3.2, we propose a slight modification. Consider the nonlinearity $\mathrm{ReLU}(x) := \max\{0, x\}$ and notice that we have the identities
\[
|x| = \mathrm{ReLU}(x) + \mathrm{ReLU}(-x) \tag{3.7}
\]
and
\[
x = \mathrm{ReLU}(x) - \mathrm{ReLU}(-x). \tag{3.8}
\]
Now define the nonlinear filter bank
\[
W_{J,Q,\varepsilon}x := \{\mathrm{ReLU}(x * \varepsilon h),\ \mathrm{ReLU}(x * \varepsilon\ell_J),\ \mathrm{ReLU}(x * \varepsilon\psi_{j,q}) : 0 \leq j < J,\ 0 \leq q < Q,\ \varepsilon = \pm 1\}. \tag{3.9}
\]
In particular, for any filter $\Psi$ in $\mathcal{F}_{J,Q}$, we see that
\[
|f * \Psi| = \mathrm{ReLU}(f * \Psi) + \mathrm{ReLU}(f * (-\Psi)) \tag{3.10}
\]
and
\[
f * \Psi = \mathrm{ReLU}(f * \Psi) - \mathrm{ReLU}(f * (-\Psi)). \tag{3.11}
\]
Thus, it follows (componentwise) that
\[
W_{J,Q}x = W_{J,Q,1}x - W_{J,Q,-1}x, \tag{3.12}
\]
which means that
\[
x = W_{J,Q}^*(W_{J,Q,1}x - W_{J,Q,-1}x). \tag{3.13}
\]
Using this inverse formula, we can create the following algorithm. Results using Algorithm 3.3 are given in Figure 3.9.

Algorithm 3.3 ReLU Heeger Bergen Texture Synthesis Algorithm
1: Start with a white noise image $I_W \in \mathbb{R}^{M\times N}$, reference image $I_R \in \mathbb{R}^{M\times N}$, number of scales $J$, number of rotations $Q$, and number of iterations $T$.
2: Calculate $W_{J,Q,\varepsilon}I_R \in \mathbf{L}^2(\mathbb{R}^2)^{2(JQ+2)}$.
3: for $t = 1$ to $T$ do
4:   Calculate $W_{J,Q,\varepsilon}I_W \in \mathbf{L}^2(\mathbb{R}^2)^{2(JQ+2)}$.
5:   Update $W_{J,Q,\varepsilon}I_W$ by histogram matching each element of $W_{J,Q,\varepsilon}I_W$ with the corresponding filter response in $W_{J,Q,\varepsilon}I_R$ (using each filter response in $W_{J,Q,\varepsilon}I_R$ as the reference histogram).
6:   Update $I_W$ via the formula $I_W = W_{J,Q}^*(W_{J,Q,1}I_W - W_{J,Q,-1}I_W)$.
7:   Histogram match $I_W$ with $I_R$ using $I_R$ as the reference histogram.
8: end for

Figure 3.9 Left: Reference Texture. Middle: No ReLU. Right: With ReLU. Ran with $J = 4$ and $Q = 6$ for 50 iterations.

3.5 An Invertible Windowed Scattering Transform

Before we describe the next modification to the Heeger Bergen texture synthesis algorithm, we need to motivate the usage of wavelet scattering transforms. We will use the following notation again:
\begin{align}
L_c f(x) &= f(x - c), \tag{3.14}\\
L_\tau f(x) &= f(x - \tau(x)). \tag{3.15}
\end{align}
The first operator, $L_c$, is a translation operator, and the second operator, $L_\tau$, can be thought of as a deformation operator, provided $\tau \in C^2(\mathbb{R}^n)$ and
\[
\|\nabla\tau\|_\infty = \sup_{x\in\mathbb{R}^n} |\nabla\tau(x)| < 1. \tag{3.16}
\]
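The action of $L_\tau$ on a sampled signal can be sketched with linear interpolation. This is a toy 1D illustration (`deform_1d` is a hypothetical helper, not part of the algorithms above); note that a constant $\tau$ reduces $L_\tau$ to the translation operator $L_c$, and the non-constant example satisfies $\|\nabla\tau\|_\infty = 0.1 < 1$ as required by (3.16):

```python
import numpy as np

def deform_1d(f_vals, x, tau):
    """Evaluate L_tau f(x) = f(x - tau(x)) by linear interpolation of samples f_vals on grid x."""
    return np.interp(x - tau(x), x, f_vals)

x = np.linspace(0.0, 2.0 * np.pi, 4001)
f = np.sin(x)

# Constant tau: L_tau reduces to the translation L_c with c = 0.5.
translated = deform_1d(f, x, lambda t: np.full_like(t, 0.5))

# A genuine deformation: a smooth local stretch with sup |tau'(x)| = 0.1 < 1.
warped = deform_1d(f, x, lambda t: 0.1 * np.sin(t))
```

On a fine grid the interpolated warp agrees with the analytic $\sin(x - \tau(x))$ to high accuracy; near the left boundary, `np.interp` clamps arguments outside the grid.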
Suppose that $\mathbf{H}_1, \mathbf{H}_2$ are Hilbert spaces and $\Phi : \mathbf{H}_1 \to \mathbf{H}_2$ is an operator for a vision-related task, such as classification. We would like $\Phi$ to have the following properties:

• Local Translation Invariance: small translations of an object should not greatly affect the output of $\Phi$.

• Nonexpansiveness: for $f, g \in \mathbf{H}_1$,
\[
\|\Phi f - \Phi g\|_{\mathbf{H}_2} \leq \|f - g\|_{\mathbf{H}_1}. \tag{3.17}
\]
In other words, the distance between $\Phi f$ and $\Phi g$ should not be larger than the original distance between $f$ and $g$, for stability reasons.

• Stability to Diffeomorphisms: there exists $C > 0$ such that
\[
\|\Phi f - \Phi L_\tau f\|_{\mathbf{H}_2} \leq C\|f\|_{\mathbf{H}_1}\bigl(\|\tau\|_\infty + \|\nabla\tau\|_\infty + \|H\tau\|_\infty\bigr) \tag{3.18}
\]
for all $\tau \in C^2(\mathbb{R}^n)$. That is, the operator "linearizes" small deformations.

As shown in [34], the Fourier modulus is not stable to small dilations, which are a simple class of diffeomorphisms. A natural next step is to use wavelets, which produce a representation that is well localized in space and frequency. Let $G^+$ be the set of "positive" rotations and define
\[
\Lambda_J = \begin{cases} \{\lambda = 2^j r : r \in G^+,\ j < J\} & \text{if } J < \infty,\\ \{\lambda = 2^j r : r \in G^+,\ j \in \mathbb{Z}\} & \text{if } J = \infty. \end{cases} \tag{3.19}
\]
Let $\psi$ be a wavelet. For $x \in \mathbf{L}^2(\mathbb{R}^n)$ and $\lambda \in \Lambda_J$, define
\[
U[\lambda]x = |x * \psi_\lambda|. \tag{3.20}
\]
A path $p$ is an ordered tuple $p = (\lambda_1, \ldots, \lambda_\ell)$ with $\lambda_i \in \Lambda_\infty$ for $i = 1, \ldots, \ell$; we write $P_J$ for the set of paths in which each element is from $\Lambda_J$. For paths, define the operator
\[
U[p]x = U[\lambda_\ell]\cdots U[\lambda_2]U[\lambda_1]x = \big|\,\big||x * \psi_{\lambda_1}| * \psi_{\lambda_2}\big| \cdots * \psi_{\lambda_\ell}\big|. \tag{3.21}
\]
Let $\phi_J(u)$ be a low pass filter.
For ๐‘ โˆˆ ๐‘ƒ๐ฝ , define the windowed scattering operator ๐‘† ๐ฝ [ ๐‘]๐‘ฅ(๐‘ข) = (๐‘ˆ [ ๐‘]๐‘ฅ โˆ— ๐œ™ ๐ฝ )(๐‘ข) (3.22) For any set of paths, say ฮฉ, define the path set ๐‘† ๐ฝ [ฮฉ] = {๐‘† ๐ฝ [ ๐‘]} ๐‘โˆˆฮฉ . We now define the windowed scattering transform as the set ๐‘† ๐ฝ [๐‘ƒ๐ฝ ]๐‘ฅ and use the following Hilbert Space norm โˆ‘๏ธ โˆฅ๐‘† ๐ฝ [ฮฉ]๐‘ฅโˆฅ 2 = โˆฅ๐‘† ๐ฝ [ ๐‘]๐‘ฅโˆฅ 22 . (3.23) ๐‘โˆˆฮฉ Notably, the windowed scattering transform has many properties we would like in a feature extractor. Theorem 10 (The Windowed Scattering Norm is Well-defined). Suppose that ๐œ“ is a wavelet such that there ๐œ‚ โˆˆ R๐‘› , ๐œŒ โ‰ฅ 0, | ๐œŒ(๐œ”)| ห† ห† โ‰ค | ๐œ™(2๐œ”)|, ห† ๐œŒ(0) = 1, and โˆž โˆ‘๏ธ   ฮจฬ‚(๐œ”) = | ๐œŒ(๐œ” ห† โˆ’ ๐œ‚)| โˆ’ 2 ห† โˆ’๐‘˜ ๐œ” โˆ’ ๐œ‚)| 2 ๐‘˜ 1 โˆ’ | ๐œŒ(2 (3.24) ๐‘˜=1 satisfies โˆ‘๏ธโˆž โˆ‘๏ธ ๐›ผ โˆ’ inf ฮจฬ‚(2โˆ’ ๐‘— ๐‘Ÿ โˆ’1 ๐œ”)| ๐œ“(2 ห† โˆ’ ๐‘— ๐‘Ÿ โˆ’1 ๐œ”)| 2 > 0. (3.25) 1โ‰ค๐œ”โ‰ค2 ๐‘—=โˆ’โˆž ๐‘Ÿโˆˆ๐บ We will call this the admissibility condition. Under the admissibility condition, for all ๐‘ฅ โˆˆ L2 (R๐‘› ), we have the isometry โˆฅ๐‘† ๐ฝ [๐‘ƒ๐ฝ ]๐‘ฅโˆฅ = โˆฅ๐‘ฅโˆฅ. (3.26) Theorem 11 (Nonexpansive Property of the Windowed Scattering Transform). For all ๐‘ฅ, ๐‘ฆ โˆˆ ๐ฟ 2 (R๐‘› ), โˆฅ๐‘† ๐ฝ [๐‘ƒ๐ฝ ]๐‘ฅ โˆ’ ๐‘† ๐ป [๐‘ƒ๐ฝ ]๐‘ฆโˆฅ โ‰ค โˆฅ๐‘ฅ โˆ’ ๐‘ฆโˆฅ 2 . (3.27) 75 Theorem 12 (Local Translation Invariance of the Windowed Scattering Transform). For all ๐‘ โˆˆ R๐‘› , ๐‘ฅ โˆˆ ๐ฟ 2 (R๐‘› ), and an admissible wavelet, lim โˆฅ๐‘† ๐ฝ [๐‘ƒ๐ฝ ]๐‘ฅ โˆ’ ๐‘† ๐ฝ [๐‘ƒ๐ฝ ] ๐ฟ ๐‘ ๐‘ฅโˆฅ = 0. (3.28) ๐ฝโ†’โˆž 1 Theorem 13 (Diffeomorphism Stability). Let ๐œ โˆˆ ๐ถ 2 (R๐‘› ) with โˆฅ๐ท๐œโˆฅ โˆž < 2๐‘› . 
For an admissible wavelet and any $x \in \mathbf{L}^2(\mathbb{R}^n)$, there exists a constant $C$ such that
\[
\|S_J[P_J]L_\tau x - S_J[P_J]x\| \leq CK(\tau)\|x\|, \tag{3.29}
\]
where $K(\tau) \to 0$ as $\|\tau\|_\infty + \|\nabla\tau\|_\infty + \|H\tau\|_\infty \to 0$.

That is to say, a windowed scattering operator is a good feature extractor. In practice, it is not possible to compute an infinite cascade of wavelet transforms, but empirical studies have shown that two layers are enough [33]. We will now construct a two layer modification of the scattering transform, which we will denote the "Two Layer Scattering Pyramid," using the operators $W_{J,Q}$ and $W_{J,Q,\varepsilon}$:
\[
S_2 x := \{x * \phi_J,\ x * h,\ W_{J,Q}\,\mathrm{ReLU}(x * \varepsilon\psi_{j,q}) : 0 \leq j \leq J-1,\ 0 \leq q \leq Q-1,\ \varepsilon = \pm 1\}. \tag{3.30}
\]
The algorithm is provided in Algorithm 3.4. Note that this can be generalized to multiple layers, but the computational cost is infeasible; additionally, most of the energy should be in the first two layers.

Algorithm 3.4 Two Layer Scattering Pyramid
1: INPUTS: An image $x$ and operators $W_{J,Q}$, $W_{J,Q,-1}$, $W_{J,Q,1}$.
2: OUTPUT: $S_2 x$, a modification of the Two Layer Scattering Pyramid.
3: Initialize a set of functions $S_2 x$.
4: Calculate $W_{J,Q,-1}x$ and $W_{J,Q,1}x$, and add $x * \phi_J = \mathrm{ReLU}(x * \phi_J) - \mathrm{ReLU}(x * (-\phi_J))$ and $x * h = \mathrm{ReLU}(x * h) - \mathrm{ReLU}(x * (-h))$ to $S_2 x$.
5: for $j = 0$ to $J-1$ do
6:   for $q = 0$ to $Q-1$ do
7:     for $\varepsilon = \pm 1$ do
8:       Calculate $W_{J,Q,\varepsilon}(x * \psi_{j,q})$ and add to $S_2 x$.
9:     end for
10:   end for
11: end for
12: Return $S_2 x$.

Algorithm 3.5 Two Layer Scattering Pyramid Inverse
1: INPUTS: The two layer pyramid $S_2 x$.
2: OUTPUT: The original image $x$.
3: for $j = 0$ to $J-1$ do
4:   for $q = 0$ to $Q-1$ do
5:     Calculate $W_{J,Q}^*(W_{J,Q,1}(x * \psi_{j,q}) - W_{J,Q,-1}(x * \psi_{j,q}))$, where corresponding filter responses are subtracted in the subtraction operation.
6:   end for
7: end for
8: Note that this recovers all the bandpass coefficients in $W_{J,Q}x$, and the high and low pass residuals are already in $S_2 x$.
9: $x = W_{J,Q}^* W_{J,Q} x$.
10: Return $x$.

In the next section, we will use the algorithms above to formulate a modified texture synthesis algorithm.

3.6 Two Layer Nonlinear Heeger Bergen Texture Synthesis

Using Algorithm 3.4, we can formulate a Heeger Bergen texture synthesis algorithm motivated by the scattering transform.

Algorithm 3.6 Heeger Bergen Scattering Texture Synthesis
1: Start with a white noise image $I_W \in \mathbb{R}^{M\times N}$, reference image $I_R \in \mathbb{R}^{M\times N}$, number of scales $J$, number of rotations $Q$, and number of iterations $T$.
2: Calculate $S_2 I_R$ and save it.
3: for $t = 1$ to $T$ do
4:   Calculate $W_{J,Q,1}I_W$ and $W_{J,Q,-1}I_W$.
5:   Update $W_{J,Q,1}I_W$ by histogram matching each element of $W_{J,Q,1}I_W$ with the corresponding filter response in $W_{J,Q,1}I_R$ (using each filter response in $W_{J,Q,1}I_R$ as the reference histogram).
6:   Update $W_{J,Q,-1}I_W$ in the same way, using each filter response in $W_{J,Q,-1}I_R$ as the reference histogram.
7:   Keep $I_W * \phi_J = \mathrm{ReLU}(I_W * \phi_J) - \mathrm{ReLU}(I_W * (-\phi_J))$ and $I_W * h = \mathrm{ReLU}(I_W * h) - \mathrm{ReLU}(I_W * (-h))$ for inversion.
8:   for $j = 0$ to $J-1$ do
9:     for $q = 0$ to $Q-1$ do
10:       for $\varepsilon = \pm 1$ do
11:         Calculate $W_{J,Q,\varepsilon}(I_W * \psi_{j,q})$ and histogram match with the corresponding filter response in $S_2 I_R$.
12:       end for
13:     end for
14:   end for
15:   Invert the scattering pyramid $S_2 I_W$ via Algorithm 3.5.
16: end for
17: Histogram match $I_W$ with $I_R$ using $I_R$ as the reference histogram.

Figure 3.10 Left: Reference Texture. Middle: One Layer ReLU Synthesis for 50 iterations. Right: Two Layer Synthesis for 10 iterations. Ran with $J = 4$ and $Q = 6$.

3.7 Conclusions and Future Work

Based on our current experiments, we believe there are multiple reasons why our approach fails. First, matching wavelet coefficients is an approximation of optimal transport. However, this approximation seems to get stuck in local minima quickly. There are two possible reasons:

• Histogram matching wavelet coefficients is a bad approximation of optimal transport.

• Matching each coefficient works, but the matches "interfere" with each other. That is, a match in one coefficient might lead to better synthesis, while a match in a different coefficient leads to worse synthesis.

The second point leads to an interesting question. Two representations can have close to the same histogram for each component, yet their structures do not necessarily look the same; what else do we need to match to obtain structure such as edges and lines? One approach, which we will consider for deep neural networks in the next chapter, is to match $n$-dimensional histograms. We can regard the set of wavelet coefficients as an $H \times W \times C$ tensor, with each $H \times W$ slice of the last dimension being a wavelet coefficient. If we unroll this tensor into an $HW \times C$ matrix and match the distribution of the $HW$ pixel vectors (each of dimension $C$), we obtain a good approximation of an optimal transport problem. Will this yield good results for synthesis? Our preliminary experiments suggest yes.
If this is the case, can we study what types of frames yield good texture synthesis? Is invertibility or pseudo-invertibility really a necessary requirement?

CHAPTER 4
LONG RANGE CONSTRAINTS FOR NEURAL TEXTURE SYNTHESIS USING SLICED WASSERSTEIN LOSS

This work is based on [51], which has been submitted to IEEE ICIP 2023. We introduce an algorithm for texture synthesis using a modified form of Sliced Wasserstein Loss. The main idea of both methods is similar: we match 1D histograms between feature maps. However, our approach in this chapter will use deep convolutional neural networks (CNNs) instead of an invertible wavelet frame.

4.1 Background on Convolutional Neural Networks

The contents of this section are based on [21]. A CNN is a neural network that uses a series of convolutions to learn features from data. The general steps of the network for an image classification task are the following:

• Initialize random weights for a set of filters and put in an input image $x$.
• Apply a cascade of convolutional layers and nonlinearities using the filters.
• Subsample the result using a subsampling operation such as max pooling, min pooling, or average pooling.
• Repeat this process multiple times.
• Use a classifier (usually a feedforward neural network) to classify the image.
• Apply backpropagation to update the filters via gradient descent.
• Repeat these steps until one reaches the desired performance on the desired task.

An example of a convolutional architecture, LeNet, is given in Figure 4.1. First, consider the discrete convolution of functions $x(t)$ and $w(t)$, which are only defined for integer $t$:
\[
s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)w(t-a). \tag{4.1}
\]
Convolutions can also be done over more than one dimension at a time.

Figure 4.1 LeNet Architecture demonstrating the procedure for the forward pass of a CNN.
In particular, for a two-dimensional image $I$ and two-dimensional kernel $K$, the convolution is
\[
S(i,j) = (K * I)(i,j) = \sum_m\sum_n I(i-m, j-n)K(m,n), \tag{4.2}
\]
which is a commutative operation. However, neural network libraries usually use the cross-correlation function
\[
S(i,j) = (K \star I)(i,j) = \sum_m\sum_n I(i+m, j+n)K(m,n) \tag{4.3}
\]
due to ease of implementation. While this operation is not commutative, the difference does not affect the accuracy of a network because the kernel $K$ is learned from the data anyway.

In practice, when one uses a convolutional neural network, we regard an image or feature map as a 3-tensor. For example, an $M \times N$ RGB image is an $M \times N \times 3$ tensor, where the last dimension is known as the channel dimension. Each channel of the 3-tensor (i.e., each $M \times N$ matrix) is known as a feature map. We can now provide a mathematical formulation for a convolutional layer of a network. Assume we have an $M \times N \times C_1$ input $x = (x_1, \ldots, x_{C_1})$; in other words, there are $C_1$ feature maps in the channel dimension. If we would like to output $C_2$ feature maps from our convolutional layer, define the set of filters $\{F_{i,j}\}$ with $1 \leq i \leq C_1$ and $1 \leq j \leq C_2$. The "convolution" step of a convolutional layer produces, for each output channel $1 \leq j \leq C_2$,
\[
C(x)_j = \sum_{i=1}^{C_1} F_{i,j} * x_i, \tag{4.4}
\]
so that $C(x)$ has channel dimension $C_2$. After the application of the convolution step above, one applies a pointwise nonlinearity $\sigma$ to get
\[
\sigma(C(x))_j = \sigma\!\left(\sum_{i=1}^{C_1} F_{i,j} * x_i\right). \tag{4.5}
\]
Common choices are the sigmoid function, tanh, ReLU, and so on. After a convolutional layer, CNNs usually have a subsampling operation to decrease the size of the feature maps. This makes training networks less computationally intensive and helps with extracting features such as edges.
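The difference between the convolution (4.2) and the cross-correlation (4.3) is easy to see with naive numpy loops (written for clarity, not speed; `conv2d_full` returns the full-size convolution, while `corr2d_valid` returns the "valid" cross-correlation that deep learning libraries typically call convolution):

```python
import numpy as np

def conv2d_full(I, K):
    """Full 2D convolution (4.2); output size (Mi+Mk-1, Ni+Nk-1)."""
    Mi, Ni = I.shape
    Mk, Nk = K.shape
    S = np.zeros((Mi + Mk - 1, Ni + Nk - 1))
    for m in range(Mk):
        for n in range(Nk):
            S[m:m + Mi, n:n + Ni] += K[m, n] * I  # accumulate shifted copies of I
    return S

def corr2d_valid(I, K):
    """Valid 2D cross-correlation (4.3): slide K over I, no flipping."""
    Mk, Nk = K.shape
    out = np.zeros((I.shape[0] - Mk + 1, I.shape[1] - Nk + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + Mk, j:j + Nk] * K)
    return out
```

Convolution is commutative (`conv2d_full(I, K)` equals `conv2d_full(K, I)`), and cross-correlation is exactly convolution with the kernel flipped along both axes, which is why the distinction is immaterial for a learned kernel.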
Usually, a pooling operation such as max pooling, which involves sliding a kernel and taking the maximum along each window, is used. The reason CNNs work better than, say, feedforward neural networks for vision tasks is that they encode information in a more relevant manner. Specifically, the convolution step of a convolutional layer is translation equivariant (applying a translation to an image before or after the convolution step yields the same result); convolutions are a local operation, and local information is more relevant for images; and a feature map shares parameters with other feature maps, which captures interactions between different feature maps and lowers the total number of learned parameters. Parameter sharing and more capacity for larger networks are some of the reasons one would expect a deep CNN to work better than a scattering transform for classification tasks. Regarding deep CNNs, we are interested in a specific architecture commonly used for neural texture synthesis tasks, VGG19 [45]. See Figures 4.2 and 4.3.

Figure 4.2 Schematic of VGG19 architecture. The blocks "Conv" are a convolution block and ReLU, "MaxPooling" is a max pooling operation, and "FC+relu" is a linear layer and ReLU.

Figure 4.3 VGG19 schematic showing how feature map size decreases as the number of filters increases.

4.2 Image Quality Metrics

To evaluate the effectiveness of our texture synthesis algorithms, we discuss a few image quality metrics that we will use to compare the quality of the images we generate. Traditional image metrics, like PSNR and SSIM, are not perfect. Consider the following example:

Figure 4.4 Two examples where human perception does not match with common image metrics.

In the example, one can clearly see that Patch 1 looks similar to the reference and is only slightly deformed. However, common image metrics like MSE would not be able to handle the distortions properly.
Additionally, Patch 0, which is a low pass filtering of the reference, would be recognized as similar by traditional metrics. Note that the metrics we will discuss are not optimal either: in all cases, if we calculate the similarity between two copies of the same image, the metric yields the best possible score. As mentioned previously, this is not ideal for texture synthesis.

The first metric we consider is LPIPS [52], which measures the perceptual similarity between two images. The idea is to use the MSE between feature maps of a deep convolutional neural network.

Figure 4.5 LPIPS calculation between an image $x$ and $x_0$.

Suppose we have $\ell = 1, \ldots, L$ layers of a neural network. We extract the feature stacks from layer $\ell$ and unit-normalize them in the channel dimension, which we designate $y^\ell, \hat{y}^\ell \in \mathbb{R}^{H_\ell\times W_\ell\times C_\ell}$. We apply a channel-wise scaling of the activations by a vector $w_\ell \in \mathbb{R}^{C_\ell}$; finally, we average spatially and sum over the channels and layers. This can be represented as
\[
d(x, x_0) = \sum_{\ell=1}^{L} \frac{1}{H_\ell W_\ell} \sum_{i=1}^{H_\ell}\sum_{j=1}^{W_\ell} \|w_\ell \circ (y^\ell_{ij} - \hat{y}^\ell_{ij})\|_2^2. \tag{4.6}
\]
Usually, a deep network like VGG or AlexNet is used for feature map extraction.

The second metric we use is the Frechet Inception Distance (FID) [25]. The 2-Wasserstein distance, or Frechet distance, is given by
\[
d_F^2(\mu,\nu) = \inf_{\gamma\in\Gamma(\mu,\nu)} \int_{\mathbb{R}^n\times\mathbb{R}^n} \|x - y\|^2\, d\gamma(x,y), \tag{4.7}
\]
where $\Gamma(\mu,\nu)$ is the set of joint probability measures whose marginals are $\mu$ and $\nu$, i.e.,
\[
\Gamma(\mu,\nu) = \left\{\gamma : \int_{\mathbb{R}^n}\gamma(x,y)\, dy = \mu(x) \ \text{and}\ \int_{\mathbb{R}^n}\gamma(x,y)\, dx = \nu(y)\right\}.
\]
For FID, one assumes that the model fits a Gaussian distribution.
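Stripped of the learned network, the LPIPS computation (4.6) is just channel normalization, channel-wise weighting, and spatial averaging. A toy sketch using random stand-ins (the feature stacks `feats`, `feats0` and the weights `ws` below are placeholders, not real VGG or AlexNet activations):

```python
import numpy as np

def lpips_like(ys, ys0, ws, eps=1e-10):
    """Toy LPIPS-style distance (4.6): per layer, unit-normalize each
    H x W x C feature stack along channels, scale channels by w_l,
    then average spatially and sum over layers."""
    d = 0.0
    for y, y0, w in zip(ys, ys0, ws):
        yn = y / (np.linalg.norm(y, axis=-1, keepdims=True) + eps)
        y0n = y0 / (np.linalg.norm(y0, axis=-1, keepdims=True) + eps)
        d += np.mean(np.sum((w * (yn - y0n)) ** 2, axis=-1))
    return d

rng = np.random.default_rng(0)
shapes = [(16, 16, 8), (8, 8, 16)]              # (H_l, W_l, C_l) per layer
feats  = [rng.normal(size=s) for s in shapes]   # placeholder features for x
feats0 = [rng.normal(size=s) for s in shapes]   # placeholder features for x_0
ws     = [np.abs(rng.normal(size=s[-1])) for s in shapes]
```

The distance is zero for identical feature stacks and strictly positive otherwise, which is the behavior the real metric inherits from the squared norm.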
Then for two Gaussian distributions $p_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $p_2 = \mathcal{N}(\mu_2, \Sigma_2)$, one can write the FID as
\[
d_F^2(p_1, p_2) = \|\mu_1 - \mu_2\|_2^2 + \mathrm{tr}\!\left(\Sigma_1 + \Sigma_2 - 2\bigl(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\bigr)^{1/2}\right). \tag{4.8}
\]
Usually, the function used to estimate the distribution is a specific layer of a deep convolutional neural network, similar to LPIPS. The KID score [8] is similar to the FID score, but relaxes the assumption that both distributions are Gaussian; it is defined as
\[
\mathrm{KID}(p_1, p_2) = \mathrm{MMD}(p_1, p_2)^2. \tag{4.9}
\]
More detail is given in [8], but it will be omitted from this thesis.

4.3 Texture Synthesis Using VGG19 and Gram Matrices

The contents of this section are adapted from [19], which has led to remarkable improvements in the field of texture synthesis since its publication. For notation, assume we have an $L+1$ layer convolutional neural network; for $\ell = 0, \ldots, L$, layer $\ell$ has $N_\ell$ distinct feature maps, each of which is a vector of length $M_\ell$ after flattening. We reshape the feature maps in layer $\ell$ into a matrix $F^\ell \in \mathbb{R}^{N_\ell\times M_\ell}$, where each row of the matrix is a feature map. Assuming that each row of $F^\ell$ (i.e., each feature map in layer $\ell$) is mean centered, we define the Gram matrix $G^\ell \in \mathbb{R}^{N_\ell\times N_\ell}$ as
\[
G^\ell_{ij} = \sum_k F^\ell_{ik}F^\ell_{jk}. \tag{4.10}
\]
In matrix form, we have $G^\ell = F^\ell(F^\ell)^T$. To generate a new texture from a reference texture, we update a white noise image using gradient descent, optimizing the mean-squared error (MSE) between the Gram matrices of the reference image and of the white noise. More formally, let $\vec{x}$ be a reference image and $\hat{\vec{x}}$ be the generated image. Also let the Gram matrices for $\vec{x}$ be $G^\ell$ and the Gram matrices for $\hat{\vec{x}}$ be $\hat{G}^\ell$.
Define
\[
E_\ell = \frac{1}{4N_\ell^2 M_\ell^2}\sum_{i,j}\bigl(G^\ell_{ij} - \hat{G}^\ell_{ij}\bigr)^2, \tag{4.11}
\]
which is the MSE between the Gram matrices of $\vec{x}$ and $\hat{\vec{x}}$ in layer $\ell$. The full loss function is given by
\[
\mathcal{L}(\vec{x}, \hat{\vec{x}}) = \sum_{\ell=0}^{L} w_\ell E_\ell, \tag{4.12}
\]
where the $w_\ell$ are user-defined weights. For updating the image, one uses gradient descent. In particular, the following identity holds:
\[
\frac{\partial E_\ell}{\partial \hat{F}^\ell_{ij}} = \begin{cases} \dfrac{1}{N_\ell^2 M_\ell^2}\Bigl((\hat{F}^\ell)^T\bigl(\hat{G}^\ell - G^\ell\bigr)\Bigr)_{ji}, & \hat{F}^\ell_{ij} > 0,\\[4pt] 0, & \text{otherwise.} \end{cases} \tag{4.13}
\]
In practice, one does not implement the gradient manually. Instead, one can use open-source deep learning packages, like PyTorch or TensorFlow, with a built-in optimizer (L-BFGS) to update the white noise image. Before we provide a unified workflow, we discuss the specific CNN architecture, a modified version of VGG-19, used for texture synthesis. The following changes were made to the original version of VGG-19:

• The max pooling layers are switched to average pooling layers. No retraining of the network is done. The authors noticed that this change improved image quality.
• The weights of the network are rescaled so that the mean activation of each feature map is 1.

Additionally, during the synthesis process, the feature maps from the layers 'conv1_1', 'pool1', 'pool2', 'pool3', and 'pool4' are used to build Gram matrices. The other convolutional layers and the fully connected part of the network are not used. A schematic of the general workflow is given in Figure 4.6 and some synthesis results are given in Figure 4.7.

Figure 4.6 The general synthesis method. The images $\vec{x}$ and $\hat{\vec{x}}$ are both passed through the network. Gram matrices are created from specific feature maps, and the loss is calculated. Using a backpropagation engine, $\hat{\vec{x}}$ is updated. Picture reference: [19].
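Once the features are in the $N_\ell \times M_\ell$ layout, the per-layer loss (4.11) is a one-liner. A numpy sketch (in practice the same computation is written in PyTorch so that autograd and L-BFGS handle the gradient (4.13) automatically):

```python
import numpy as np

def gram(F):
    """Gram matrix (4.10) for features F of shape (N_l, M_l); rows are flattened feature maps."""
    return F @ F.T

def layer_loss(F, F_hat):
    """E_l from (4.11): scaled squared error between the two Gram matrices."""
    N, M = F.shape
    return np.sum((gram(F) - gram(F_hat)) ** 2) / (4.0 * N**2 * M**2)
```

Because $G^\ell = F^\ell (F^\ell)^T$ is symmetric and discards all spatial indexing, the loss compares only second-order channel statistics, which is exactly why long range arrangement is invisible to it.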
Figure 4.7 Results for texture synthesis after each Gram matrix is added for synthesis. "Original" denotes the reference image and "Portilla and Simoncelli" are results from [40]. Picture reference: [19].

One notable issue with the results above is that long range constraints are not captured by this algorithm. In other words, the alignment of objects is not captured by this algorithm. This is expected because CNNs aggregate local information.

4.4 Improvements Via Regularization

To ameliorate the lack of long range information captured by [19], [31] propose adding a regularization term to (4.12). For an image $I$, let

$$\mathcal{E}_I = \{ \tilde{I} : |\mathcal{F}(I)| = |\mathcal{F}(\tilde{I})| \}. \quad (4.14)$$

That is, (4.14) is the set of images with the same power spectrum as $I$. We can find the projection of an image $\hat{I}$ onto $\mathcal{E}_I$ via

$$\tilde{I} := \mathcal{F}^{-1}\!\left( \frac{\mathcal{F}(\hat{I}) \cdot \mathcal{F}(I)^*}{|\mathcal{F}(\hat{I}) \cdot \mathcal{F}(I)^*|} \cdot \mathcal{F}(I) \right). \quad (4.15)$$

Let $d(\hat{I}, \mathcal{E}_I)$ be the distance of $\hat{I}$ to $\mathcal{E}_I$. If we denote $\mathcal{L}_{\mathrm{CNN}}$ to be the loss from (4.12), then our new loss function between images is

$$\mathcal{L} = \mathcal{L}_{\mathrm{CNN}} + \frac{\beta}{2}\, d(\hat{I}, \mathcal{E}_I)^2, \quad (4.16)$$

where $\beta$ is a hyperparameter. In Figure 4.8, some examples of synthesis using the spectrum constraint are given. As shown in Figure 4.8, there is notable improvement in alignment for these textures. However, it is difficult to find a value of $\beta$ that works on a variety of images; this would require time-consuming hyperparameter tuning. Additionally, if one required high quality synthesis for multiple textures, one might have to tune $\beta$ separately for each texture. It would be ideal to have a method that did not require hyperparameter tuning for high quality synthesis results.

Figure 4.8 Results for texture synthesis using a spectrum constraint. The hyperparameter $\beta$ is $10^5$. Left: Reference Texture. Middle: Without Spectrum Constraint. Right: With Spectrum Constraint.
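The projection in Eq. (4.15) is straightforward to implement with FFTs. The sketch below is an illustration for a single-channel image, assuming NumPy's FFT conventions; the small epsilon guarding against division by zero is our addition:

```python
import numpy as np

def project_to_spectrum(i_hat: np.ndarray, i_ref: np.ndarray) -> np.ndarray:
    """Project i_hat onto the set of images sharing i_ref's power
    spectrum, Eq. (4.15), for a single-channel real image."""
    f_hat = np.fft.fft2(i_hat)
    f_ref = np.fft.fft2(i_ref)
    cross = f_hat * np.conj(f_ref)
    phase = cross / (np.abs(cross) + 1e-12)   # unit-modulus phase factor
    # The product is conjugate-symmetric, so the inverse FFT is real.
    return np.real(np.fft.ifft2(phase * f_ref))
```

By construction the output keeps $\hat{I}$'s phase alignment relative to $I$ while inheriting $I$'s Fourier magnitudes, which is exactly the power-spectrum constraint being enforced.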
4.5 Texture Synthesis Using Sliced Wasserstein Loss

Before we discuss the method from [51], we will provide some background information about texture synthesis using Sliced Wasserstein Loss from [24]. Suppose that layer $\ell$ of an $L$ layer convolutional neural network has $N_\ell$ channels and $M_\ell$ pixels in each channel. We denote the feature vector located at pixel $m$ as $F_m^\ell \in \mathbb{R}^{N_\ell}$, which is a change in notation from previous sections. The authors of [24] propose using a different set of statistics instead of the mean squared error between Gram matrices. In a manner similar to [23, 48], the authors match the distributions between feature maps, but the feature maps used in [24] are feature maps of VGG19 [45]. With respect to a network architecture, let $p^\ell$ and $\hat{p}^\ell$ be the probability density functions associated with the sets of feature vectors $\{F_m^\ell\}$ and $\{\hat{F}_m^\ell\}$. Since our network is discrete, we assume that the probability density functions are always an average of Dirac delta distributions of the form

$$p^\ell(x) = \frac{1}{M_\ell} \sum_{m=1}^{M_\ell} \delta_{F_m^\ell}(x). \quad (4.17)$$

Let $V \in \mathcal{S}^{N_\ell}$ be a random direction on the unit sphere of dimension $N_\ell$.
For the purpose of this paper, the Sliced Wasserstein Distance between two distributions of features is of the form

$$\mathcal{L}_{\mathrm{SW},\ell}(p^\ell, \hat{p}^\ell) = \mathbb{E}_V\left[ \mathcal{L}_{\mathrm{SW1D}}(p_V^\ell, \hat{p}_V^\ell) \right], \quad (4.18)$$

where

$$p_V^\ell := \{ \langle F_m^\ell, V \rangle \} \quad (4.19)$$

is a set consisting of batched projections of the feature vectors $F_m^\ell$ onto the directions $V$. If we make a vector $P_V^\ell$ consisting of the elements of $p_V^\ell$, the 1D Sliced Wasserstein Loss is the squared 2-norm between the sorted projections:

$$\mathcal{L}_{\mathrm{SW1D}}(p_V^\ell, \hat{p}_V^\ell) = \frac{1}{\mathrm{len}(P_V^\ell)} \left\| \mathrm{sort}(P_V^\ell) - \mathrm{sort}(\hat{P}_V^\ell) \right\|_2^2, \quad (4.20)$$

and the full Sliced Wasserstein Loss over all the layers is

$$\mathcal{L}_{\mathrm{SW}}(I_1, I_2) = \sum_{\ell=1}^{L} \mathcal{L}_{\mathrm{SW},\ell}(p_{V,I_1}^\ell, p_{V,I_2}^\ell), \quad (4.21)$$

for images $I_1$ and $I_2$, respectively. For practical applications, one uses a loss of the form

$$\mathcal{L}_{\mathrm{SW}}(I_1, I_2) = \sum_{\ell=1}^{L} w_\ell\, \mathcal{L}_{\mathrm{SW},\ell}(p_{V,I_1}^\ell, p_{V,I_2}^\ell), \quad (4.22)$$

where $w_\ell$ are weight terms that are set to zero for layers that are not used. We will use this formulation for the rest of the paper. A synthesis procedure based on Eq. (4.22) is given below in pseudocode:

Algorithm 4.1 Synthesis using the Sliced Wasserstein Loss
1: Set $I_{\mathrm{WN}}$ as the variable to be updated by the optimizer.
2: for $k = 1, \ldots, M$ do
3:   Calculate $\mathrm{Extract}(I_{\mathrm{WN}})$.
4:   Calculate $\mathrm{Extract}(I_{\mathrm{ref}})$.
5:   Calculate $\mathcal{L}_{\mathrm{SW}}(I_{\mathrm{WN}}, I_{\mathrm{ref}})$.
6:   Backpropagate and update $I_{\mathrm{WN}}$.
7: end for
8: Return updated $I_{\mathrm{WN}}$ as synthesized texture.

Pitie et al. [39] showed that the Sliced Wasserstein Distance satisfies $\mathcal{L}_{\mathrm{SW}}(p, \hat{p}) = 0 \implies p = \hat{p}$. The same does not hold for other losses used for texture synthesis, such as the Gram matrix loss. Thus, using a Sliced Wasserstein-based loss should capture more stationary statistics compared to the traditional Gram loss.

Figure 4.9 Synthesis results using the Gram matrix loss, denoted $\mathcal{L}_{\mathrm{Gram}}$, and using $\mathcal{L}_{\mathrm{SW}}$.
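A minimal sketch of Eqs. (4.18)–(4.20) follows, estimating the expectation over $V$ with a finite batch of random unit directions; the function name and the number of directions are our choices, not taken from [24]:

```python
import torch

def sw1d_loss(feats_a: torch.Tensor, feats_b: torch.Tensor,
              n_dirs: int = 32) -> torch.Tensor:
    """Sliced Wasserstein loss between two (M, N) sets of feature
    vectors: project onto random unit directions, sort, and compare
    as in Eq. (4.20), averaging over directions as in Eq. (4.18)."""
    n = feats_a.shape[1]
    v = torch.randn(n, n_dirs)
    v = v / v.norm(dim=0, keepdim=True)          # random directions on the sphere
    pa = torch.sort(feats_a @ v, dim=0).values   # sorted projections P_V
    pb = torch.sort(feats_b @ v, dim=0).values
    return ((pa - pb) ** 2).sum(dim=0).div(feats_a.shape[0]).mean()
```

Sorting realizes the optimal transport plan in one dimension, which is why the sliced loss is cheap to evaluate compared to full Wasserstein distances.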
In Figure 4.9, some examples of synthesis using Sliced Wasserstein Loss are given. While the synthesized textures are closer to the reference, the algorithm still has trouble capturing long range constraints, like alignment. The authors propose adding a spatial tag as a fourth channel in an RGB image to guide synthesis, which is shown in Figure 4.10.

Figure 4.10 Synthesis results with and without a spatial tag.

While adding a spatial tag creates visually appealing textures, one needs a spatial tag for each constraint one wishes to impose. However, it is unlikely to have a prepared spatial tag for each texture one would like to generate, especially if the texture is highly irregular. Thus, it would be ideal to have an algorithm that could capture long range constraints without user guidance and without tedious hyperparameter tuning.

4.6 New Texture Synthesis Algorithm

The method proposed in [24] cannot effectively capture long range constraints unless a user-added spatial tag is provided to guide synthesis. This is most likely because the Sliced Wasserstein Loss does not fully capture nonstationary statistics in an image. Thus we propose a new set of statistics for texture synthesis based on Sliced Wasserstein Loss to capture long range constraints without any supervision or hyperparameter tuning. Our experiments show that our proposed set of statistics provides competitive results with current approaches. Additionally, we augment our synthesis results via a coarse-to-fine multi-scale procedure, which yields state-of-the-art results.

Instead of just matching distributions via slicing over the channel dimension of the feature maps, we propose matching more statistics in a very simple way. Consider a set of feature maps $F^\ell \in \mathbb{R}^{H_\ell \times W_\ell \times N_\ell}$. In the original algorithm, we unravel each $H_\ell \times W_\ell$ feature map, consider each pixel $m$ to get feature vectors of length $N_\ell$, and project onto the direction $V$ in Eq. (4.21).
Another way to reshape the feature maps is to unravel them into $H_\ell$ different $W_\ell \times N_\ell$ feature maps $F_H^\ell$ (with $F_{H,n}^\ell$ being the vector of all $n$th pixels along the height dimension), and project them onto $V_{H_\ell} \in \mathcal{S}^{H_\ell}$. Analogous to Eq. (4.19), for the distribution $p_H^\ell$ associated to the feature vectors $\{F_{H,n}^\ell\}$ we can define another set of batched projections given by

$$p_{V_{H_\ell}}^\ell = \{ \langle F_{H,n}^\ell, V_{H_\ell} \rangle \}. \quad (4.23)$$

The corresponding additional loss term is

$$\mathcal{L}_{\mathrm{SW},H}(I_1, I_2) = \sum_{\ell=1}^{L} w_\ell\, \mathcal{L}_{\mathrm{SW},\ell}\left( p_{V_{H_\ell}, I_1}^\ell, p_{V_{H_\ell}, I_2}^\ell \right). \quad (4.24)$$

Intuitively, this loss term accounts for alignment in an image by slicing over the height dimension of the feature maps rather than the channel dimension. The new loss function we consider is

$$\mathcal{L}_{\mathrm{Slicing}}(I_1, I_2) = \mathcal{L}_{\mathrm{SW}}(I_1, I_2) + \mathcal{L}_{\mathrm{SW},H}(I_1, I_2), \quad (4.25)$$

which is the sum of Eq. (4.21) and Eq. (4.24). Denote the feature map extraction from VGG19 as $\mathrm{Extract}(I)$. For our algorithm, we start with a reference image $I_{\mathrm{ref}}$ and a white noise image $I_{\mathrm{WN}}$ and run for $M$ epochs. Our implementation for slicing synthesis is the same as in [24] for Eq. (4.21), where we perform a batched projection onto $N_\ell$ directions and sort. For our additional loss term in Eq. (4.24), the number of batched projections we make is $H_\ell$. In the next algorithm, assume without loss of generality that the goal is to synthesize an image the same size as the reference image.

Algorithm 4.2 Synthesis Algorithm
1: Set $I_{\mathrm{WN}}$ as the variable to be updated by the optimizer.
2: for $k = 1, \ldots, M$ do
3:   Calculate $\mathrm{Extract}(I_{\mathrm{WN}})$.
4:   Calculate $\mathrm{Extract}(I_{\mathrm{ref}})$.
5:   Calculate $\mathcal{L}_{\mathrm{Slicing}}(I_{\mathrm{WN}}, I_{\mathrm{ref}})$.
6:   Backpropagate and update $I_{\mathrm{WN}}$.
7: end for
8: Return updated $I_{\mathrm{WN}}$ as synthesized texture.
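The two slicing terms that make up Eq. (4.25) can be sketched for a single layer as follows. Helper names are hypothetical, and the expectation over directions is again approximated by a finite batch of random unit directions:

```python
import torch

def sorted_proj_loss(feats_a: torch.Tensor, feats_b: torch.Tensor,
                     n_dirs: int) -> torch.Tensor:
    """Sliced Wasserstein term: project feature vectors onto random
    unit directions, sort, and compare, as in Eq. (4.20)."""
    d = feats_a.shape[1]
    v = torch.randn(d, n_dirs)
    v = v / v.norm(dim=0, keepdim=True)
    pa = torch.sort(feats_a @ v, dim=0).values
    pb = torch.sort(feats_b @ v, dim=0).values
    return ((pa - pb) ** 2).sum(dim=0).div(feats_a.shape[0]).mean()

def slicing_loss_layer(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """One-layer analogue of Eq. (4.25) for (H, W, N) feature tensors:
    channel slicing (Eq. (4.21)) plus height slicing (Eq. (4.24))."""
    h, w, n = f1.shape
    # Channel slicing: H * W feature vectors in R^N.
    l_sw = sorted_proj_loss(f1.reshape(-1, n), f2.reshape(-1, n), n_dirs=n)
    # Height slicing: W * N feature vectors in R^H.
    l_h = sorted_proj_loss(f1.reshape(h, -1).T, f2.reshape(h, -1).T, n_dirs=h)
    return l_sw + l_h
```

Summing this quantity over the chosen VGG19 layers, each term weighted as in Eq. (4.22), gives the full slicing loss used by Algorithm 4.2.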
The settings for the slicing loss are to use the first 12 layers of VGG19 for calculating $\mathcal{L}_{\mathrm{SW}}$ and the first two convolutions (after the ReLU) in each convolution block for calculating $\mathcal{L}_{\mathrm{SW},H}$. The L-BFGS optimizer [30] is used for optimization with a learning rate of $\eta = 1$.

We start by comparing results with [24]. We use the authors' TensorFlow implementation, which is a previous commit in https://github.com/tchambon/A-Sliced-Wasserstein-Loss-for-Neural-Texture-Synthesis, without their spatial tag, on some relatively periodic textures. For all the experiments in this paper, our texture sources were the following:

• https://github.com/omrysendik/DCor/tree/master/Data

• https://www.robots.ox.ac.uk/~vgg/data/dtd

for generating 256 × 256 textures. The results are given in Fig. 4.11.

Figure 4.11 Original SW loss vs. Eq. (4.25). Left: Reference. Mid: SW Loss. Right: Eq. (4.25) Loss (Ours).

However, there are still some textures that are not generated perfectly. See Fig. 4.12 for some examples.

Figure 4.12 Less successful cases. Left: Reference. Mid: SW Loss. Right: Eq. (4.25) Loss (Ours).

Next, the results of using Eq. (4.25) are compared to [31]; we use the implementation from [20] for this comparison as an additional point of reference. There is no comparison with [43] because experiments from [43, 20] have shown that the algorithm does not yield much improvement for nonperiodic textures.

Figure 4.13 Gram Matrix + Spectrum vs. Eq. (4.25) Loss. Left: Reference. Mid: Spectrum Constraint. Right: Eq. (4.25) Loss (Ours).

Take note of the top row of Figure 4.13 in particular: the spectrum constraint produces better results, which suggests that there is room for improvement in our proposed synthesis method.

Quantitative comparisons between the original SW loss, the spectrum constraint, and our proposed method using a set of 34 images are provided in this section. The LPIPS [52], FID [25], and crop-based FID (c-FID) scores are used.
The crop-based scores are computed by taking sixty-four 128 × 128 crops of the reference texture and of the synthesized texture for each exemplar; the FID/KID score is calculated between these two sets of crops, following [32] (for the ground truth case, a different set of crops of the reference is used). The KID [8] and crop-based KID (c-KID) scores are provided alongside the others in Table 4.1. For the FID and KID based scores, the implementation from [37] is used. In Table 4.1, SW stands for the method using the original SW Loss, Spec. stands for using a spectrum constraint, and GT stands for the Ground Truth.

Table 4.1 Quantitative Comparison
Method  LPIPS   FID       c-FID    KID      c-KID
Ours    0.437   107.220   71.938   -0.014   0.073
SW      0.454   101.768   78.683   -0.016   0.083
Spec.   0.447    99.615   78.250   -0.016   0.083
GT      0         0       18.069   -0.025   0

From the table, our results are competitive, and our proposed set of statistics did not require searching for a proper hyperparameter to obtain them. Note that our results for FID and c-FID vary compared to [32] because FID is a biased estimate [14] and our sample count is very low.

4.7 Improvements via a Multi-scale Approach

Since there is room for improvement in our synthesis, we augment our algorithm with a multi-scale procedure in a manner identical to [20, 49]. For the multi-scale algorithm at $K$ scales, let $I_{\mathrm{ref},i}$ be the reference image downsampled by a scale factor of $2^i$ with $i = 0, \ldots, K$, and define the upsampling operator as $\mathrm{Upsample}(I)$. Lastly, denote the output of Algorithm 4.2 by $I_{\mathrm{Synthesis}} = \mathrm{SWSynthesis}(I_{\mathrm{input}}, I_{\mathrm{ref}})$, where $I_{\mathrm{input}}$ is the input to be optimized via backpropagation, $I_{\mathrm{ref}}$ is the reference texture, and $I_{\mathrm{Synthesis}}$ is the output after synthesis using Algorithm 4.2. The results are given in Figure 4.14.

Algorithm 4.3 Multi-scale Synthesis Algorithm
1: Initialize $I_{\mathrm{Synthesis}}$ as a white noise image that is the same size as the reference texture downsampled by $2^K$.
2: for $i = 0, \ldots, K$ do
3:   $I_{\mathrm{Synthesis}} \leftarrow \mathrm{SWSynthesis}(I_{\mathrm{Synthesis}}, I_{\mathrm{ref},K-i})$.
4:   $I_{\mathrm{Synthesis}} \leftarrow \mathrm{Upsample}(I_{\mathrm{Synthesis}})$.
5: end for
6: Return $I_{\mathrm{Synthesis}}$ as the synthesized texture.

In Figure 4.14, note the small improvements in edge generation and general structure when using $K = 1$ compared to $K = 0$.

Figure 4.14 Multi-scale procedure at different scales. Left: Reference. Mid Left: $K = 0$. Mid Right: $K = 1$. Right: $K = 2$.

The intuition for why the multi-scale procedure works is that it generates the details of the texture in a coarse-to-fine way; the initial scale generates the general color and macro-scale features, and the additional scales add fine-grained details to the image.

Figure 4.15 Progression of synthesis that leads to repetitions. Left: Reference Texture. Middle Left: $K = 0$. Middle Right: $K = 1$. Right: $K = 2$.

However, it is possible to create replica textures for larger values of $K$. Of the 34 images generated for the experiments, there were four repetitions when $K = 2$. See Figure 4.15 for examples. Additionally, a quantitative comparison using the same image quality metrics as before is provided. Unlike in previous comparisons, it would be better to have a higher LPIPS score at a comparable generative metric; this would mean that our textures are less likely to be replicas, but still have similar qualities. The results are given in Table 4.2.

Table 4.2 Quantitative Scores for $K = 0, 1, 2$
Scale   LPIPS   FID       c-FID    KID      c-KID
K = 0   0.437   107.220   71.938   -0.014   0.073
K = 1   0.381    67.118   53.908   -0.018   0.044
K = 2   0.250    38.304   40.220   -0.022   0.027

Based on the scores and Figure 4.15, $K = 1$ provides a nice mix between diversity and image quality at our fixed image size. In Figure 4.16, the multi-scale approach is applied without the additional loss term in Eq. (4.25).

Figure 4.16 Results with SW Loss. Left: Reference. Mid Left: $K = 0$. Mid Right: $K = 1$. Right: $K = 2$.
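The coarse-to-fine loop of Algorithm 4.3 can be sketched as follows; `sw_synthesis` stands in for Algorithm 4.2 and is assumed given. So that the final output matches the reference size, this sketch applies the upsampling step only between consecutive scales:

```python
import torch
import torch.nn.functional as F

def multiscale_synthesis(i_ref: torch.Tensor, k: int, sw_synthesis):
    """Coarse-to-fine synthesis over K + 1 scales (Algorithm 4.3).

    i_ref: reference texture of shape (1, C, H, W).
    sw_synthesis(x, ref): placeholder for Algorithm 4.2.
    """
    # Reference pyramid: i_ref downsampled by 2^i for i = 0, ..., K.
    refs = [i_ref]
    for _ in range(k):
        refs.append(F.avg_pool2d(refs[-1], 2))
    # Start from white noise at the coarsest scale.
    out = torch.randn_like(refs[-1])
    for i in range(k + 1):
        out = sw_synthesis(out, refs[k - i])   # synthesize at scale K - i
        if i < k:                              # upsample between scales
            out = F.interpolate(out, scale_factor=2, mode='bilinear',
                                align_corners=False)
    return out
```

Each scale's output seeds the next finer scale, so the coarse pass fixes colors and macro-structure while later passes only have to add detail.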
From our results, the multi-scale approach by itself is not enough to fully capture nonstationary statistics or enforce long range constraints. That is to say, the loss term added in Eq. (4.25) is necessary to capture long range constraints in textures.

Figure 4.17 Comparison of results. Left: Reference. Mid Left: SW Loss. Mid Right: Gonthier. Right: $K = 1$ (Ours).

Lastly, the results using $K = 1$ are compared with [20], using the default settings from their experiments ($K = 2$), to show the effectiveness of our multi-scale results relative to another multi-scale algorithm. The results from [24] are shown again as an additional point of reference. Some results are shown in Figure 4.17 and a quantitative study is given in Table 4.3.

Table 4.3 Comparison of $K = 1$, SW Loss, and Gonthier
Method     LPIPS   FID       c-FID    KID      c-KID
K = 1      0.381    67.118   53.908   -0.018   0.044
SW         0.454   101.768   78.683   -0.016   0.083
Gonthier   0.415    77.569   67.728   -0.018   0.067
GT         0         0       18.069   -0.025   0

4.8 Conclusions

We present a modification of texture synthesis via Sliced Wasserstein Loss that has the ability to enforce long range constraints without user-added spatial tags (supervision). Our additional loss term can be thought of as a regularization term, but unlike traditional regularization terms, one does not need to tune a hyperparameter to enforce long range constraints. That is to say, the proposed method requires less user supervision for competitive results. One thing we have not tested is whether the number of scales depends on the image size. We believe it does, and it is probable that one would need to choose the number of scales based on the size of the image.

CHAPTER 5 CONCLUDING REMARKS

We have addressed two important problems in statistical signal processing: multi-reference alignment and texture synthesis. For multi-reference alignment, as we mentioned in Chapter 2, many open problems remain.
In one dimension, one question to consider is how we can approach general diffeomorphisms. While considering a specific set of group actions seems to be the most viable approach, one wonders whether it would be possible to limit the size of $\|\tau'\|_\infty$ and $\|\tau''\|_\infty$, which would make $\tau$ "close" to a translation. Additionally, is it possible to generalize all our results to two and three dimensions? Both of these cases would be more relevant to practitioners in cryo-EM.

Regarding texture synthesis, one wonders whether a deep representation is actually needed. That is, could we use a wavelet transform or scattering transform with Sliced Wasserstein Loss to generate textures? Our preliminary experiments, which are not included in this thesis, suggest this is possible. However, the synthesis quality is not as strong, which suggests that we could use a different filter bank for better synthesis. VGG models have omnidirectional filters, and not all the filters are necessarily positive, which we believe is responsible for strong synthesis. Future work will focus on testing our hypotheses.

BIBLIOGRAPHY

[1] Emmanuel Abbe, Tamir Bendory, William Leeb, João M Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. IEEE Transactions on Information Theory, 65(6):3565–3584, 2018.

[2] Afonso Bandeira, Yutong Chen, Roy R Lederman, and Amit Singer. Non-unique games over compact groups and orientation estimation in cryo-EM. Inverse Problems, 2020.

[3] Afonso S Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Weed, and Alexander S Wein. Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163, 2017.

[4] Afonso S Bandeira, Nicolas Boumal, and Vladislav Voroninski. On the low-rank approach for semidefinite programs arising in synchronization and community detection. In Conference on Learning Theory, pages 361–382, 2016.
[5] Afonso S Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science, pages 459–470. ACM, 2014.

[6] Tamir Bendory, Alberto Bartesaghi, and Amit Singer. Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities. IEEE Signal Processing Magazine, pages 58–76, March 2020.

[7] Tamir Bendory, Nicolas Boumal, Chao Ma, Zhizhen Zhao, and Amit Singer. Bispectrum inversion with application to multireference alignment. IEEE Transactions on Signal Processing, 66(4):1037–1050, 2017.

[8] Mikołaj Bińkowski, Danica J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. In International Conference on Learning Representations.

[9] Nicolas Boumal. Nonconvex phase synchronization. SIAM Journal on Optimization, 26(4):2355–2377, 2016.

[10] Thibaud Briand, Jonathan Vacher, Bruno Galerne, and Julien Rabin. The Heeger & Bergen pyramid based texture synthesis algorithm. Image Processing On Line, 4:276–299, 2014.

[11] Lisa Gottesfeld Brown. A survey of image registration techniques. ACM Computing Surveys (CSUR), 24(4):325–376, 1992.

[12] Yuxin Chen and Emmanuel J Candès. The projected power method: An efficient algorithm for joint alignment from pairwise differences. Communications on Pure and Applied Mathematics, 71(8):1648–1714, 2018.

[13] Yuxin Chen, Leonidas J Guibas, and Qi-Xing Huang. Near-optimal joint object matching via convex relaxation. In Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 100–108, 2014.

[14] Min Jin Chong and David Forsyth. Effectively unbiased FID and Inception Score and where to find them. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6070–6079, 2020.

[15] WB Collis, PR White, and JK Hammond.
Higher-order spectra: the bispectrum and trispectrum. Mechanical Systems and Signal Processing, 12(3):375–394, 1998.

[16] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22, 1977.

[17] Robert Diamond. On the multiple simultaneous superposition of molecular structures by rigid body transformations. Protein Science, 1(10):1279–1287, 1992.

[18] Hassan Foroosh, Josiane B Zerubia, and Marc Berthod. Extension of phase correlation to subpixel registration. IEEE Transactions on Image Processing, 11(3):188–200, 2002.

[19] Leon Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. Advances in Neural Information Processing Systems, 28, 2015.

[20] Nicolas Gonthier, Yann Gousseau, and Saïd Ladjal. High-resolution neural texture synthesis with long-range constraints. Journal of Mathematical Imaging and Vision, 64(5):478–492, 2022.

[21] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[22] Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica: Journal of the Econometric Society, pages 1029–1054, 1982.

[23] David J Heeger and James R Bergen. Pyramid-based texture analysis/synthesis. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, pages 229–238, 1995.

[24] Eric Heitz, Kenneth Vanhoey, Thomas Chambon, and Laurent Belcour. A sliced Wasserstein loss for neural texture synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9412–9420, 2021.

[25] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
[26] Matthew Hirn and Anna Little. Unbiasing procedures for scale-invariant multi-reference alignment. arXiv preprint arXiv:2107.01274, 2021.

[27] Matthew Hirn and Anna Little. Wavelet invariants for statistically robust multi-reference alignment. Information and Inference: A Journal of the IMA, 10(4):1287–1351, 2021.

[28] Zvi Kam. The reconstruction of structure from electron micrographs of randomly oriented particles. In Electron Microscopy at Molecular Dimensions, pages 270–277. Springer, 1980.

[29] Fima C Klebaner. Introduction to Stochastic Calculus with Applications. World Scientific Publishing Company, 2012.

[30] Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.

[31] Gang Liu, Yann Gousseau, and Gui-Song Xia. Texture synthesis through convolutional neural networks and spectrum constraints. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 3234–3239. IEEE, 2016.

[32] Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A Reda, Karan Sapra, Andrew Tao, and Bryan Catanzaro. Transposer: Universal texture synthesis using feature maps as transposed convolution filter. arXiv preprint arXiv:2007.07243, 2020.

[33] Stéphane Mallat. Recursive interferometric representations. In 18th European Signal Processing Conference (EUSIPCO-2010), Aalborg, Denmark, 2010.

[34] Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, October 2012.

[35] Wooram Park and Gregory S Chirikjian. An assembly automation approach to alignment of noncircular projections in electron microscopy. IEEE Transactions on Automation Science and Engineering, 11(3):668–679, 2014.

[36] Wooram Park, Charles R Midgett, Dean R Madden, and Gregory S Chirikjian. A stochastic kinematic model of class averaging in single-particle electron microscopy.
The International Journal of Robotics Research, 30(6):730–754, 2011.

[37] Gaurav Parmar, Richard Zhang, and Jun-Yan Zhu. On aliased resizing and surprising subtleties in GAN evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11410–11420, 2022.

[38] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Message-passing algorithms for synchronization problems over compact groups. Communications on Pure and Applied Mathematics, 71(11):2275–2322, 2018.

[39] F. Pitie, A.C. Kokaram, and R. Dahyot. N-dimensional probability density function transfer and its application to color transfer. In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, volume 2, pages 1434–1439, 2005.

[40] Javier Portilla and Eero P Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40:49–70, 2000.

[41] Brian M Sadler and Georgios B Giannakis. Shift- and rotation-invariant object reconstruction using the bispectrum. JOSA A, 9(1):57–69, 1992.

[42] Sjors HW Scheres, Mikel Valle, Rafael Nuñez, Carlos OS Sorzano, Roberto Marabini, Gabor T Herman, and Jose-Maria Carazo. Maximum-likelihood multi-reference refinement for electron microscopy images. Journal of Molecular Biology, 348(1):139–149, 2005.

[43] Omry Sendik and Daniel Cohen-Or. Deep correlations for texture synthesis. ACM Transactions on Graphics (ToG), 36(5):1–15, 2017.

[44] Nir Sharon, Joe Kileel, Yuehaw Khoo, Boris Landa, and Amit Singer. Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles. Inverse Problems, 36(4):044003, 2020.

[45] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[46] Amit Singer. Angular synchronization by eigenvectors and semidefinite programming.
Applied and Computational Harmonic Analysis, 30(1):20–36, 2011.

[47] Amit Singer. Mathematics for cryo-electron microscopy. In Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, pages 3995–4014. World Scientific, 2018.

[48] Guillaume Tartavel, Yann Gousseau, and Gabriel Peyré. Variational texture synthesis with sparsity and spectrum constraints. Journal of Mathematical Imaging and Vision, 52:124–144, 2015.

[49] Guillaume Tartavel, Gabriel Peyré, and Yann Gousseau. Wasserstein loss for image synthesis and restoration. SIAM Journal on Imaging Sciences, 9(4):1726–1755, 2016.

[50] Douglas L Theobald and Phillip A Steindel. Optimal simultaneous superpositioning of multiple structures with missing data. Bioinformatics, 28(15):1972–1979, 2012.

[51] Liping Yin and Albert Chua. Long range constraints for neural texture synthesis using sliced Wasserstein loss. arXiv preprint arXiv:2211.11137, 2022.

[52] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.

[53] Yiqiao Zhong and Nicolas Boumal. Near-optimal bounds for phase synchronization. SIAM Journal on Optimization, 28(2):989–1016, 2018.

[54] J Portegies Zwart, René van der Heiden, Sjoerd Gelsema, and Frans Groen. Fast translation invariant classification of HRR range profiles in a zero phase representation. IEE Proceedings - Radar, Sonar and Navigation, 150(6):411–418, 2003.