ON THE WAVELET SCATTERING TRANSFORM AND ITS GENERALIZATIONS

By Albert Chua

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mathematics - Doctor of Philosophy

2023

ABSTRACT

In this thesis, we study generalizations of Mallat's wavelet scattering transform. In the second chapter, we generalize finite depth wavelet scattering transforms, which we formulate as $\mathbf{L}^q(\mathbb{R}^n)$ norms of a cascade of continuous wavelet transforms (or dyadic wavelet transforms) and contractive nonlinearities. We then provide norms for these operators, prove that these operators are well defined, and prove that they are Lipschitz continuous to the action of $C^2$ diffeomorphisms in specific cases; additionally, we extend our results to formulate an operator that is invariant to the action of rotations $R \in \mathrm{SO}(n)$ and an operator that is equivariant to the action of rotations $R \in \mathrm{SO}(n)$. In the third and fourth chapters, we generalize our results to stochastic processes and signals on compact manifolds, respectively.

ACKNOWLEDGEMENTS

First, and most importantly, I would like to thank my advisor, Yang Yang, for all the support that he has provided me during the one and a half years we have worked with each other. In spite of the fact that I always try "interesting" ideas for my research, he has never been dismissive, and has been very supportive by providing help whenever he can. I'll always appreciate the support he has provided me.

I also would like to thank my committee members, Jianliang Qian, Ekaterina Rapinchuk, and Rongrong Wang, for all the support they have provided me throughout my time at MSU. Without their thoughtful suggestions for this thesis, it would be in much worse condition than it is now.

I also appreciate the help of Michael Perlmutter, who was one of the first people to provide edits to what became the journal version of Chapter 2 of this thesis. I definitely would not have caught most of those errors myself.
I also appreciate his advice on how and where to publish the paper, because I wasn't really sure what to do with it at the time.

I would also like to thank Anna Little, who was one of the coauthors on the journal version of Chapter 2 in this thesis. She provided a lot of help with foundational parts of the work, which made my life a lot easier. I also appreciate the time she took to help me through the review process for the paper, which I was completely unprepared for on my own.

Matt Hirn, who was my advisor during the third and fourth years of my PhD, is also someone I would like to thank. He was the one who introduced me to the scattering transform, and he helped me while I started my research on nonwindowed scattering transforms. Despite all the trouble I had at first, he was very patient with me and helped me with any trouble I had. I appreciate all the proofs he checked, especially all the "proofs" with mistakes that he caught.

Lastly, I would like to thank anyone who I have missed on this list. I don't think I could have finished this thesis without the help of many people in my life.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 Notation
1.2 Machine Learning and Model Fitting
1.3 Background On Convolutional Neural Networks
1.4 Invariance, Equivariance, Stability, Frequency Representations, and Machine Learning
1.5 Wavelets
1.6 Scattering Transforms
1.7 Contributions
CHAPTER 2 GENERALIZING THE NONWINDOWED SCATTERING TRANSFORM
2.1 Fourier Transforms and Hardy Spaces
2.2 Wavelet Scattering is a Bounded Operator
2.3 Stability to Dilations
2.4 Stability to Diffeomorphisms
2.5 Equivariance and Invariance to Rotations
CHAPTER 3 EXPECTED SCATTERING TRANSFORMS
3.1 Background
3.2 Wavelet Transforms for Stochastic Processes
3.3 Scattering Moments and the Expected Scattering Transform
3.4 The Expected Scattering Transform When $q = 2$
3.5 The Expected Scattering Transform When $1 < q < 2$
CHAPTER 4 NONWINDOWED SCATTERING ON COMPACT RIEMANNIAN MANIFOLDS
4.1 Notation for Scattering on Manifolds
4.2 Spectral Filters and the Geometric Wavelet Transform
4.3 The Geometric Scattering Transform
4.4 Generalizing Geometric Scattering Transforms
CHAPTER 5 CONCLUSIONS
BIBLIOGRAPHY

CHAPTER 1 INTRODUCTION

1.1 Notation

Set $\mathbb{R}_+$ to be the positive real numbers, i.e. $\mathbb{R}_+ := (0, \infty)$. The gradient of a function $f : \mathbb{R}^n \to \mathbb{C}$ is given by $\nabla f$, the Jacobian of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is given by $Df$, and the Hessian is given by $D^2 f$. For $1 \le q < \infty$, the $\mathbf{L}^q(\mathbb{R}^n)$ norm of a function $f : \mathbb{R}^n \to \mathbb{C}$ is
\[
\|f\|_q := \left[ \int_{\mathbb{R}^n} |f(x)|^q \, dx \right]^{1/q} .
\]
When $q = \infty$, $\|f\|_\infty := \operatorname{ess\,sup} |f|$. We will also use the notation
\[
\|\Delta f\|_\infty = \sup_{x, y \in \mathbb{R}^n} |f(x) - f(y)|
\]
for the first two chapters of this thesis (which should not be mistaken for applying a Laplacian operator).

Greek letters with a vector symbol, such as $\vec{\alpha} = (\alpha_1, \dots, \alpha_n)$, will denote a multi-index of nonnegative integers; additionally, we write $|\vec{\alpha}| = \alpha_1 + \cdots + \alpha_n$, and the usage will be clear from context. The operator $D^{\vec{\alpha}}$ denotes the corresponding mixed partial derivative:
\[
D^{\vec{\alpha}} f = \frac{\partial^{|\vec{\alpha}|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}} .
\]
For integer $s \ge 0$, we define the function space $\mathbf{H}^s(\mathbb{R}^n) = \{ f \in \mathbf{L}^2(\mathbb{R}^n) : D^{\vec{\alpha}} f \in \mathbf{L}^2(\mathbb{R}^n) \text{ for } |\vec{\alpha}| \le s \}$.

1.2 Machine Learning and Model Fitting

The following material is based on [1]. A functional perspective on supervised learning is the following. Suppose we have a set of data that is split into a training set $T = \{(x_i, y_i)\}_{i=1}^N$, which has known data $\{x_i\} \subset \mathcal{X}$ and labels $\{y_i\}_{i=1}^N$. Our goal is to find a model $F_\theta$, parameterized by a set of weights $\theta \in \mathbb{R}^n$, that best fits the data with respect to some metric (e.g. mean squared loss). To check if our model $F_\theta$ actually fits the data, we are given a set of test points $T_{\text{test}}$, which are only accessible for evaluating the fit of $F_\theta$.

Define the set of all possible models $\mathcal{F}$ as
\[
\mathcal{F} = \{ f_\theta(x) : \theta \in \mathbb{R}^n \},
\]
where each $f_\theta$ is a model parameterized by weights $\theta \in \mathbb{R}^n$. One needs to narrow down the search space by choosing an appropriate model to fit the data. One such instance is when one has prior knowledge of the distribution of data.
For example, consider linear regression; suppose that $\{x_i\}_{i=1}^N \subset \mathbb{R}^n$ and $y_i \in \mathbb{R}$ with $y_i = w^T x_i + \varepsilon$, where $w \in \mathbb{R}^n$ is a set of unknown weights and $\varepsilon$ is a small noise term. Lastly, assume that we want to find a representation that minimizes the mean squared error (written here without the factor $1/N$, which does not change the minimizer):
\[
\sum_{i=1}^N (f_\theta(x_i) - y_i)^2 .
\]
At this point, it is natural to restrict the set of functions to have the following representation:
\[
f_\theta(x) = \theta^T x, \quad \theta \in \mathbb{R}^n .
\]
Note that this example is relatively simple. For more complex data, such as images, one needs to consider more sophisticated representations. Over the past two decades, convolutional neural networks have shown remarkable success in image recognition tasks. For example, [2, 3, 4, 5] gradually redefined the state of the art on benchmark datasets in the 2010s. However, the mechanisms behind how they work were not well understood until recently [6, 7, 8, 9, 10].

1.3 Background On Convolutional Neural Networks

Before we provide more discussion about invariants in machine learning, we will discuss the architecture of convolutional neural networks. Consider two discrete functions: $a_1 : \mathbb{Z} \to \mathbb{R}$ and $b_1 : \mathbb{Z} \to \mathbb{R}$. Practitioners in deep learning generally define the convolution (which is really cross-correlation) as
\[
(a_1 * b_1)(i) = \sum_{j \in \mathbb{Z}} a_1(i + j) \, b_1(j). \tag{1.1}
\]
More generally, we can assume that we have two dimensional functions $a_2 : \mathbb{Z}^2 \to \mathbb{R}$ and $b_2 : \mathbb{Z}^2 \to \mathbb{R}$. The two dimensional convolution is given by
\[
(a_2 * b_2)(i_1, i_2) = \sum_{(j_1, j_2) \in \mathbb{Z}^2} a_2(i_1 + j_1, i_2 + j_2) \, b_2(j_1, j_2). \tag{1.2}
\]
This is the first building block for convolution operations similar to the operations seen in deep learning libraries, such as "Conv2d" in PyTorch. However, in practice, these operations are generally implemented with finite filters rather than infinite filters like the ones above.
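As a small illustration of (1.1) and (1.2) with finite filters, the following sketch evaluates both sums in plain Python. The function names, the choice to keep only positions where the filter fully overlaps the signal (no padding), and the sample data are assumptions made for this example; they are not taken from any library.

```python
def cross_correlate_1d(a, b):
    """Evaluate (1.1) for a finite signal a and a finite filter b.

    Assumption: b is supported on indices 0..len(b)-1 and only output
    positions where the filter fully overlaps the signal are kept.
    """
    n, k = len(a), len(b)
    return [sum(a[i + j] * b[j] for j in range(k)) for i in range(n - k + 1)]


def cross_correlate_2d(a, b):
    """Evaluate (1.2) for a finite image a and a finite filter b (lists of rows)."""
    rows, cols = len(a), len(a[0])
    k_rows, k_cols = len(b), len(b[0])
    return [
        [
            sum(
                a[i1 + j1][i2 + j2] * b[j1][j2]
                for j1 in range(k_rows)
                for j2 in range(k_cols)
            )
            for i2 in range(cols - k_cols + 1)
        ]
        for i1 in range(rows - k_rows + 1)
    ]


signal = [1.0, 2.0, 4.0, 7.0, 11.0]
difference_filter = [-1.0, 1.0]  # a zero-mean filter: responds to changes only
print(cross_correlate_1d(signal, difference_filter))  # [1.0, 2.0, 3.0, 4.0]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
diagonal_filter = [[1, 0], [0, -1]]
print(cross_correlate_2d(image, diagonal_filter))  # [[-4, -4], [-4, -4]]
```

Note that the filter is not flipped, which is what distinguishes cross-correlation from the convolution used in signal processing.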
To construct a full Conv2d layer, suppose that we have a set of $N_1$ functions, and the goal is to get a representation with $N_2$ functions via a set of convolutions. Define a set of functions $\{F_{n_1, n_2}\}$ with indexes $1 \le n_1 \le N_1$ and $1 \le n_2 \le N_2$. The Conv2d layer can be mathematically expressed as
\[
C(f) = \sum_{n_1 = 1}^{N_1} F_{n_1, n_2} * f . \tag{1.3}
\]
Here the sum is understood channel-wise: the $n_2$-th output function combines the $N_1$ input functions, each convolved with its own filter $F_{n_1, n_2}$. After applying $C$, a nonlinearity is applied to each entry of the result, and some form of subsampling is done to reduce the data necessary for the representation. A convolutional neural network, less formally speaking, is a cascade of a Conv2d layer, a nonlinearity, and a subsampling operator, applied in that exact order.

1.4 Invariance, Equivariance, Stability, Frequency Representations, and Machine Learning

Let $\mathcal{B}_1, \mathcal{B}_2$ be Banach spaces, let $\Phi : \mathcal{B}_1 \to \mathcal{B}_2$ be an operator, and let $T : \mathcal{B}_1 \to \mathcal{B}_1$ be an operator. We say that $\Phi$ is invariant to $T$ if $\Phi T f = \Phi f$ for all $f \in \mathcal{B}_1$, and in that case we call $\Phi$ a $T$-invariant operator. Similarly, for an operator $T$ that acts on both $\mathcal{B}_1$ and $\mathcal{B}_2$, we say that $\Phi$ is equivariant with respect to $T$ if $\Phi T f = T \Phi f$ for all $f \in \mathcal{B}_1$.

Similar to the regression example, CNNs restrict the possible set of models we consider. With respect to images, convolution has two properties that are helpful for image recognition tasks:

• Convolution is inherently a local operation and depends on neighboring pixels. That is to say, we utilize the underlying geometry of an image.

• Convolution is equivariant with respect to translation. In other words, translating a function and then convolving yields the same output as convolving first and then translating.

However, it is not necessarily useful to have translation equivariance. Suppose we have the following two tasks:

• Determine if a cat is in the picture.

• Determine where the cat is in the picture.

For the first task, the location of the cat does not matter, so translating the cat in the picture is irrelevant.
Thus, we would like a representation that is invariant to translation. On the other hand, in the second task, to keep track of the location of the cat, we would like a representation that is equivariant with respect to translations. This example illustrates the following point: using relevant information about our task is a way of narrowing down the search space of possible models.

Along with some type of invariance or equivariance, stability is also an important property for our representation. Let $L_\tau f(x) = f(\gamma^{-1}(x))$, where $\gamma(x) := x - \tau(x)$ for $\tau \in C^2(\mathbb{R}^n)$ suitably small. We would like a representation such that
\[
\|\Phi f - \Phi L_\tau f\|_{\mathcal{B}_2} \le K(\tau) \, \|f\|_{\mathcal{B}_1},
\]
where $K(\tau)$ gets smaller as $\tau$ gets smaller. The intuition is that small deformations of the signal will not change the representation too much.

An important aspect of convolutional models is their ability to discern frequency information. Empirically, high frequency information is important for image recognition. In Figure 1.1, one can see that the high frequency information is what allows us to determine that the image is in fact an image of a polar bear, so it is important that a representation can extract high frequency information. Notably, convolutions are useful for this task because of the convolution theorem.

Figure 1.1 Left: Polar Bear. Middle: Low Pass filtering. Right: High Pass filtering.

Lastly, one also needs sufficient model complexity to retain enough meaningful information, which is a key ingredient of deep convolutional neural networks. For example, notice that using the representation $\|f\|_2^2$ yields a translation invariant operator, but any meaningful information about the function $f$ is lost, including high frequency information.

Since convolutional neural networks learn the best model via optimizing a set of weights, it is hard to study their mathematical properties.
Instead, one can consider a proxy by using unlearned filters to simplify the analysis. Ideally, the representation should have the following properties:

• Has some invariance/equivariance properties.

• Stable to small deformations.

• Keeps meaningful information and is sufficiently complex.

Regarding the third point, choosing an operator with sufficient complexity, invariance, and stability is not an easy task. For now, consider a simple dilation operator $L_c f(x) = f((1 - c)x)$ for $|c| < \frac{1}{2n}$. A feasible way to extract more information is via low pass filtering (e.g. define an operator $K_\phi f = f * \phi$, where $\hat{\phi}(\omega) = 1_{B_R(0)}(\omega)$ for some $R > 0$). One can check that for functions $f$ such that $\hat{f}$ is supported in $B_R(0)$, we have
\[
\|f - L_c f\|_2^2 = \|f * \phi - L_c f * \phi\|_2^2 \le c^2 \cdot C_R \, \|f\|_2^2
\]
for some constant $C_R$. However, high frequency information is lost because $\hat{f}$ is only supported in some bounded ball.

To keep high frequency information, a feasible translation invariant operator to consider is the Fourier modulus. However, this operator is not even stable to dilations in the 2-norm. The following informal argument is from [11]. Suppose that $f(x) = e^{i \xi x} \theta(x)$, where $\theta$ is regular with fast decay. Then one can prove that
\[
\bigl\| \, |\widehat{L_c f}| - |\hat{f}| \, \bigr\|_2 \approx |c| \, |\xi| \, \|\theta\|_2 .
\]
Since $\xi$ is arbitrary, we can choose it so that the Fourier modulus is not stable to dilations.

The main point of these examples is to show that Fourier invariants, which are a natural choice for a feature extractor, are simply not enough. Even for the simplest class of dilations, we do not have any stability result that retains high frequency information. To create an operator with the properties mentioned above, we consider using wavelets.
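The invariance and equivariance definitions of this section can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: signals are finite and periodic (so translations are circular shifts), the filter and signal values are arbitrary integers, and the helper names are hypothetical.

```python
def circular_correlate(a, b):
    """Cross-correlation as in (1.1), with periodic boundary conditions (an assumption)."""
    n = len(a)
    return [sum(a[(i + j) % n] * b[j] for j in range(len(b))) for i in range(n)]


def translate(a, c):
    """Periodic translation: (L_c a)(i) = a(i - c)."""
    n = len(a)
    return [a[(i - c) % n] for i in range(n)]


def squared_norm(a):
    """Discrete analogue of the squared 2-norm."""
    return sum(x * x for x in a)


f = [0, 1, 3, 2, 0, 0, 5, 1]
psi = [1, -1]
c = 3

# Equivariance: filtering a translated signal equals translating the filtered signal.
equivariant = circular_correlate(translate(f, c), psi) == translate(circular_correlate(f, psi), c)

# Invariance: the squared norm does not change under translation.
invariant = squared_norm(translate(f, c)) == squared_norm(f)

# Loss of information: a rearranged signal has the same squared norm as f.
rearranged = sorted(f)
same_norm = squared_norm(rearranged) == squared_norm(f)

print(equivariant, invariant, same_norm, rearranged != f)  # True True True True
```

The last check mirrors the remark about $\|f\|_2^2$: the representation is translation invariant, but it cannot distinguish $f$ from a signal with a very different shape.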
1.5 Wavelets

We let $\psi \in \mathbf{L}^1(\mathbb{R}^n) \cap \mathbf{L}^2(\mathbb{R}^n)$ be a wavelet, which means it is a function that is localized in both space and frequency and has zero average, i.e.,
\[
\int_{\mathbb{R}^n} \psi(x) \, dx = 0 .
\]
Assume $f \in \mathbf{L}^2(\mathbb{R}^n)$. The continuous wavelet transform $\mathcal{W} f \in \mathbf{L}^2(\mathbb{R}^n \times \mathbb{R}_+)$ is defined as:
\[
\forall \, (x, \lambda) \in \mathbb{R}^n \times \mathbb{R}_+ , \quad \mathcal{W} f(x, \lambda) := f * \psi_\lambda(x) .
\]
Furthermore, if $\psi$ satisfies the following admissibility condition
\[
\int_0^\infty \frac{|\widehat{\psi}(\lambda \omega)|^2}{\lambda} \, d\lambda = \mathcal{C}_\psi , \quad \forall \, \omega \in \mathbb{R}^n \setminus \{0\} , \tag{1.4}
\]
for some $\mathcal{C}_\psi > 0$, then we will say that $\psi$ is a Littlewood-Paley wavelet for the continuous wavelet transform. If $\psi$ satisfies (1.4), one can show that the norm of $\mathcal{W} f$ computed with the weighted measure $(dx, d\lambda / \lambda^{n+1})$ on $\mathbb{R}^n \times \mathbb{R}_+$ is well defined:
\[
\|\mathcal{W} f\|^2_{\mathbf{L}^2(\mathbb{R}^n \times \mathbb{R}_+)} := \int_0^\infty \int_{\mathbb{R}^n} |\mathcal{W} f(x, \lambda)|^2 \, dx \, \frac{d\lambda}{\lambda^{n+1}} = \int_0^\infty \int_{\mathbb{R}^n} |f * \psi_\lambda(x)|^2 \, dx \, \frac{d\lambda}{\lambda^{n+1}} = \int_0^\infty \|f * \psi_\lambda\|_2^2 \, \frac{d\lambda}{\lambda^{n+1}} .
\]
We note, in fact, that one can show:
\[
\|\mathcal{W} f\|^2_{\mathbf{L}^2(\mathbb{R}^n \times \mathbb{R}_+)} = \beta \cdot \mathcal{C}_\psi \, \|f\|_2^2 , \quad \text{where} \quad \beta = \begin{cases} 1/2 & \text{if } \psi \text{ is real valued,} \\ 1 & \text{if } \psi \text{ is complex valued.} \end{cases} \tag{1.5}
\]
For a function $f \in \mathbf{L}^2(\mathbb{R}^n)$ we define the dyadic wavelet transform $W f \in \ell^2(\mathbf{L}^2(\mathbb{R}^n))$ as
\[
W f = \left( f * \psi_j \right)_{j \in \mathbb{Z}} .
\]
If $\psi$ satisfies
\[
\sum_{j \in \mathbb{Z}} |\widehat{\psi}(2^j \omega)|^2 = \hat{C}_\psi , \quad \forall \, \omega \in \mathbb{R}^n \setminus \{0\} , \tag{1.6}
\]
for some $\hat{C}_\psi > 0$, then we will say that $\psi$ is a Littlewood-Paley wavelet for the dyadic wavelet transform. If $\psi$ satisfies (1.6), one can show that the norm of $W f$ given below is well defined:
\[
\|W f\|^2_{\ell^2(\mathbf{L}^2(\mathbb{R}^n))} := \sum_{j \in \mathbb{Z}} \|f * \psi_j\|_2^2 .
\]
In fact, we have the following norm equivalence:
\[
\|W f\|^2_{\ell^2(\mathbf{L}^2(\mathbb{R}^n))} = \beta \cdot \hat{C}_\psi \, \|f\|_2^2 ,
\]
where $\beta$ is defined in (1.5). Wavelets are an ideal choice because the wavelet transform provides a decomposition of a function into frequency bins.

1.6 Scattering Transforms

We now introduce the windowed scattering transform, which is a simple model for a convolutional neural network with desirable mathematical properties. Let $\phi : \mathbb{R}^n \to \mathbb{R}$ be a low pass filter ($\hat{\phi}(0) \neq 0$), let $\psi : \mathbb{R}^n \to \mathbb{C}$ be a suitable mother wavelet ($\hat{\psi}(0) = 0$), and let $G$ be a rotation group and $G^+ = G / \{-1, 1\}$, where $1$ is the identity element of the group. Define a set of rotations and dilations by
\[
\Lambda_J := \{ \lambda = 2^j r : r \in G^+, \; j > -J \} \quad \text{if } J \neq \infty \tag{1.7}
\]
and
\[
\Lambda_\infty := \{ 2^j r : r \in G^+, \; j \in \mathbb{Z} \}. \tag{1.8}
\]
Let $\lambda = 2^j r \in \Lambda_J$. We further assume that our wavelet satisfies the following unitary frame condition:
\[
|\hat{\phi}(2^J \omega)|^2 + \sum_{\lambda \in \Lambda_J} |\hat{\psi}(\lambda^{-1} \omega)|^2 = 1
\]
if $\psi$ is a complex wavelet, and
\[
|\hat{\phi}(2^J \omega)|^2 + \frac{1}{2} \sum_{\lambda \in \Lambda_J} \left[ |\hat{\psi}(\lambda^{-1} \omega)|^2 + |\hat{\psi}(-\lambda^{-1} \omega)|^2 \right] = 1
\]
if $\psi$ is a real wavelet. Consider the operator
\[
U[\lambda] f(x) = \left| \int_{\mathbb{R}^n} f(u) \, 2^{nj} \psi(2^j r^{-1}(x - u)) \, du \right| . \tag{1.9}
\]
For a tuple of rotations and dilations in $\Lambda_J$, define a path of length $m$ as the tuple $p := (\lambda_1, \dots, \lambda_m)$ and let $\mathcal{P}_J$ be the set of all finite paths. The scattering propagator for $f \in \mathbf{L}^2(\mathbb{R}^n)$ and $p \in \mathcal{P}_J$ is
\[
U[p] f := U[\lambda_m] \cdots U[\lambda_1] f , \tag{1.10}
\]
which gathers high frequency information via a cascade of wavelet transforms and nonlinearities.
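The cascade in (1.10) can be sketched in a few lines for one dimensional, periodic, discrete signals. This is a simplified illustration, not the construction of [11]: rotations are dropped, a path is a tuple of nonnegative integer scale indices, and the wavelet is replaced by a hypothetical Haar-like filter of width $2^{j+1}$ (zero mean, unit $\ell^1$ norm), so a larger index here means a coarser filter.

```python
def circular_convolve(f, h):
    """Periodic convolution: (f * h)(i) = sum_k f(i - k) h(k)."""
    n = len(f)
    return [sum(f[(i - k) % n] * h[k] for k in range(len(h))) for i in range(n)]


def haar_filter(j):
    """Hypothetical stand-in wavelet for integer j >= 0: zero mean, unit l1 norm."""
    half = 2 ** j
    return [1.0 / (2 * half)] * half + [-1.0 / (2 * half)] * half


def propagate(f, path):
    """Discrete analogue of U[p] f: alternate a wavelet convolution and a modulus."""
    u = list(f)
    for j in path:
        u = [abs(v) for v in circular_convolve(u, haar_filter(j))]
    return u


spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(propagate(spike, (0,)))    # [0.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0]
print(propagate(spike, (0, 1)))  # [0.0, 0.0, 0.0, 0.125, 0.25, 0.0, 0.25, 0.125]
print(propagate([1.0] * 8, (2,)))  # a constant signal is annihilated: all zeros
```

The last line reflects the zero-average property of the wavelet: constant (zero frequency) content does not pass through the first layer.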
The scattering operator is
\[
S f(p) = \frac{1}{\mu_p} \int_{\mathbb{R}^n} U[p] f(x) \, dx \tag{1.11}
\]
with $\mu_p := \int_{\mathbb{R}^n} U[p] \delta(x) \, dx$. Additionally, to aggregate features in a manner similar to pooling, the author of [11] defines the windowed scattering operator for $f \in \mathbf{L}^2(\mathbb{R}^n)$ and $p \in \mathcal{P}_J$ as
\[
S_J[p] f(x) = \int_{\mathbb{R}^n} U[p] f(u) \, 2^{-nJ} \phi(2^{-J}(x - u)) \, du . \tag{1.12}
\]
The windowed scattering transform is the set of functions
\[
S_J[\mathcal{P}_J] f = \{ S_J[p] f \}_{p \in \mathcal{P}_J} . \tag{1.13}
\]
This operator is similar to a convolutional neural network because along each path (analogous to each layer of a convolutional neural network) a convolution and a nonlinearity are applied, and feature aggregation occurs via the low pass filter. The scattering norm for any set of paths $\Omega$ is
\[
\|S_J[\Omega] f\|^2 = \sum_{p \in \Omega} \|S_J[p] f\|_2^2 . \tag{1.14}
\]
Notably, we see that the windowed scattering transform has a structure similar to a convolutional neural network.

Since it is important for a feature extractor to extract high frequency information, we will provide an informal explanation of how the modulus nonlinearity does this. Suppose $f \in \mathbf{L}^2(\mathbb{R}^n)$, and assume that $\psi$ is $C^\infty$ without any loss of generality. Then $\widehat{(f * \psi_j)}(0) = \hat{f}(0) \hat{\psi}_j(0) = 0$. Assume $f * \psi_j \neq 0$ on a set of positive measure. Then
\[
\widehat{|f * \psi_j|}(0) = \int_{\mathbb{R}^n} |f * \psi_j|(x) \, dx > 0 .
\]
Since $\widehat{|f * \psi_j|}$ is continuous, we can find a neighborhood of the origin where $\widehat{|f * \psi_j|}(\omega)$ is nonzero. In other words, high frequency information is pushed down to lower frequency bins.

Before we discuss the theoretical properties of scattering transforms, we provide empirical justification of scattering architectures for feature extraction.
First, the seminal paper [12] provided justification for using the windowed scattering transform on small benchmark datasets. Since then, scattering features have shown competitive results for audio tasks [13, 14, 15] and image tasks [16, 17]. Adding learning, as in [18, 19], has been shown to help improve performance in classification tasks as well.

Moving on to theoretical properties, the windowed scattering transform has the following properties, which are desirable for a feature extractor. The first property is energy preservation, under strict assumptions on the wavelet.

Theorem 1 ([11]). A scattering wavelet $\psi$ is said to be admissible if there exists $\eta \in \mathbb{R}^n$ and $\rho \ge 0$, with $|\hat{\rho}(\omega)| \le |\hat{\phi}(2\omega)|$ and $\hat{\rho}(0) = 1$, such that the function
\[
\hat{\Psi}(\omega) = |\hat{\rho}(\omega - \eta)|^2 - \sum_{k=1}^{\infty} k \left( 1 - |\hat{\rho}(2^{-k}(\omega - \eta))|^2 \right) \tag{1.15}
\]
satisfies
\[
\alpha = \inf_{1 \le |\omega| \le 2} \sum_{j = -\infty}^{\infty} \sum_{r \in G} \hat{\Psi}(2^{-j} r^{-1} \omega) \, |\hat{\psi}(2^{-j} r^{-1} \omega)|^2 > 0 . \tag{1.16}
\]
If a wavelet is admissible, then $\|S_J[\mathcal{P}_J] f\| = \|f\|$.

The problem with the admissibility condition above is that very few classes of wavelets are known to be admissible. The author of [11] mentions that an analytic cubic spline Battle-Lemarié wavelet is admissible in one dimension, but provides no other examples. On a related note, [20] has shown that scattering coefficients have exponential decay for $n = 1$ under relatively mild assumptions, but the proof only applies for $n = 1$, which makes the admissibility condition still necessary for $n \ge 2$. Additionally, to our knowledge, there are no examples in the literature of wavelets that satisfy the admissibility condition when $n > 1$.

The second property is that the windowed scattering transform is nonexpansive.

Theorem 2 ([11]). Suppose $\psi$ is an admissible wavelet.
For all $f, h \in \mathbf{L}^2(\mathbb{R}^n)$,
\[
\|S_J[\mathcal{P}_J] f - S_J[\mathcal{P}_J] h\| \le \|f - h\|_2 .
\]
The third property is an "almost translation invariance" property.

Theorem 3 ([11]). Define $L_c f(u) = f(u - c)$. For admissible wavelets,
\[
\lim_{J \to \infty} \|S_J[\mathcal{P}_J] f - S_J[\mathcal{P}_J] L_c f\| = 0
\]
for all $c \in \mathbb{R}^n$ and for all $f \in \mathbf{L}^2(\mathbb{R}^n)$.

The last property is a deformation stability bound.

Theorem 4 ([11], informal). Let $\tau \in C^2(\mathbb{R}^n)$ and $L_\tau f(u) = f(u - \tau(u))$. For $f \in \mathbf{L}^2(\mathbb{R}^n)$ and $\|D\tau\|_\infty < \frac{1}{2n}$,
\[
\|S_J[\mathcal{P}_J] L_\tau f - S_J[\mathcal{P}_J] f\| \le K(\tau) \, \|f\|_2
\]
with $K(\tau) \to 0$ as $\|\tau\|_\infty + \|D\tau\|_\infty + \|D^2 \tau\|_\infty \to 0$.

Deformation stability bounds have become a major point of importance in mathematical deep learning. Since Mallat's work, other works have tried to find feature extractors with similar mathematical properties. For example, [21, 22] consider a generalization of the scattering transform where one uses a general frame instead of a wavelet frame. Another set of related works is [23, 24], which use a generalization of Gabor frames, called uniform covering frames, as a convolution layer. Convolutional kernel networks, as seen in [25, 6], also have desirable mathematical properties. Additionally, rather than working on Euclidean space, a better intrinsic representation can be found by working on a graph or manifold (e.g. point cloud data); works such as [26, 27, 28, 29, 30] focus on feature extractors for non-Euclidean data. We will provide a preliminary generalization of [28] in Chapter 4 of this thesis.

Notably, other than [11, 23, 24], all these feature extractors for Euclidean data only provide stability bounds for bandlimited functions, that is, the set of functions $\{ f : \hat{f} \text{ has compact support} \}$.
This assumption is reasonable for actual signals because real-world signals are implemented on a domain with compact time and frequency support. The work in [23, 24] makes a slight generalization to $(\varepsilon, R)$ bandlimited functions. Let
\[
Q_R(x) = \{ y \in \mathbb{R}^n : \|y - x\|_\infty < R \}.
\]
A function $f \in \mathbf{L}^2(\mathbb{R}^n)$ is $(\varepsilon, R)$ bandlimited for some $\varepsilon \in [0, 1)$ and $R > 0$ if
\[
\|\hat{f}\|_{\mathbf{L}^2(Q_R(0))} \ge (1 - \varepsilon) \, \|f\|_2 .
\]
However, their stability result is slightly weaker because there are terms in their bound that are independent of the deformation. To our knowledge, a result similar to Mallat's stability bound, which does not rely on the function being bandlimited, does not exist for other feature extractors in the current literature.

An interesting line of work appears in [31], where one relaxes the assumption on $\tau$ in Theorem 4 from $\tau \in C^2(\mathbb{R}^n)$ to $\tau \in C^{1+\alpha}(\mathbb{R}^n)$ for $\alpha \in (0, 1)$. Similar results also apply to our stability bound in Chapter 2.

1.7 Contributions

Windowed scattering transforms are useful when the representation does not need to be rigid. For example, object detection does not require translation invariance, so a windowed scattering transform would be appropriate since, for a smaller choice of $J$, the coefficients are not nearly translation invariant. For a task like classification that needs rigid translation invariance, windowed scattering coefficients are not necessarily the best option. Since the set of functions $\{\phi_J\}$ forms an approximate identity,
\[
\lim_{J \to \infty} S[p] f = \lim_{J \to \infty} 2^{nJ} \int_{\mathbb{R}^n} U[p] (f * \phi_J)(x) \, dx = \phi(0) \, \|U[p] f\|_1 .
\]
Here, the norm acts as a global pooling layer instead of a local pooling layer with the low pass filter.
Mallat considered the set of all nonwindowed scattering coefficients, given by $S[\mathcal{P}_\infty] f$, which provides a rigid representation. However, he was not able to provide stability results for the norm he considered.

We consider a slightly different problem than Mallat did for the nonwindowed scattering transform. As mentioned before, the nonwindowed scattering transform introduced in [11] was a collection of $\mathbf{L}^1(\mathbb{R}^n)$ norms of various cascades of dyadic wavelet convolutions and modulus nonlinearities applied to a signal. Here, we extend the definition of the scattering transform to the continuous wavelet transform and to $\mathbf{L}^q(\mathbb{R}^n)$ norms with $q \in [1, 2]$. For a continuous dilation parameter $\lambda \in \mathbb{R}_+$ we define the dilations of $\psi$ as:
\[
\forall \, \lambda \in \mathbb{R}_+ , \quad \psi_\lambda(x) := \lambda^{-n/2} \psi(\lambda^{-1} x) ,
\]
which preserves the $\mathbf{L}^2(\mathbb{R}^n)$ norm of $\psi$:
\[
\|\psi_\lambda\|_2 = \|\psi\|_2 , \quad \forall \, \lambda \in \mathbb{R}_+ .
\]
For the continuous wavelet transform, the one layer wavelet scattering transform with $\mathbf{L}^q(\mathbb{R}^n)$ norm is the function $S_{\text{cont},q} f : \mathbb{R}_+ \to \mathbb{R}$ defined as:
\[
\forall \, \lambda \in \mathbb{R}_+ , \quad S_{\text{cont},q} f(\lambda) := \|f * \psi_\lambda\|_q . \tag{1.17}
\]
For a dyadic dilation parameter $j \in \mathbb{Z}$ we define dilations of $\psi$ as:
\[
\forall \, j \in \mathbb{Z} , \quad \psi_j(x) = 2^{-nj} \psi(2^{-j} x) ,
\]
which preserves the $\mathbf{L}^1(\mathbb{R}^n)$ norm of $\psi$:
\[
\|\psi_j\|_1 = \|\psi\|_1 , \quad \forall \, j \in \mathbb{Z} .
\]
The one layer wavelet scattering transform for the dyadic wavelet transform is the function $S_{\text{dyad},q} f : \mathbb{Z} \to \mathbb{R}$ defined as:
\[
\forall \, j \in \mathbb{Z} , \quad S_{\text{dyad},q} f(j) := \|f * \psi_j\|_q . \tag{1.18}
\]
More generally, the $m$-layer wavelet scattering transforms $S^m_{\text{cont},q} f : \mathbb{R}^m_+ \to \mathbb{R}$ and $S^m_{\text{dyad},q} f : \mathbb{Z}^m \to \mathbb{R}$ are defined as
\[
S^m_{\text{cont},q} f(\lambda_1, \dots, \lambda_m) := \bigl\| \, ||f * \psi_{\lambda_1}| * \psi_{\lambda_2}| * \cdots | * \psi_{\lambda_m} \bigr\|_q , \tag{1.19}
\]
\[
S^m_{\text{dyad},q} f(j_1, \dots, j_m) := \bigl\| \, ||f * \psi_{j_1}| * \psi_{j_2}| * \cdots | * \psi_{j_m} \bigr\|_q . \tag{1.20}
\]
This is similar to working with a windowed scattering transform with a finite number of layers. However, our operator is different from the operator $S_J$ in [11] because it does not contain the filter $A_J$ (convolution with the low pass filter $\phi$ at scale $2^J$) to aggregate low frequency information, so the scale parameter in our formulation is not bounded above or below. Additionally, because the averaging filter is replaced by $\mathbf{L}^q(\mathbb{R}^n)$ norms, our representation is fully translation invariant rather than translation invariant only as $J \to \infty$.

As for the significance of using $\mathbf{L}^q(\mathbb{R}^n)$ norms to replace the averaging filter, there is one area with direct application: quantum energy regression tasks [32], where a representation that is similar to the rotation invariant representation in Section 2.5 has already been used. Given a configuration of atoms, we would like to estimate the ground state energy of the configuration. Suppose we have a molecule with $K$ atoms with nuclear charges $z_k$ and nuclear positions $p_k$ with $k = 1, \dots, K$. The state $x$ of a molecule is given by
\[
x = \{ (p_k, z_k) \in \mathbb{R}^3 \times \mathbb{R} : k = 1, \dots, K \}. \tag{1.21}
\]
Due to how we have defined our state, we would like our representation to have the following properties:

• Permutation Invariance: the energy should not depend on the indexing of the atoms.

• Deformation Stability: small deformations of the molecule should only lead to small changes in energy of the system.

• Isometry Invariance: the energy should be invariant to group actions such as translations, rotations, and other general isometries.

• Multiscale Interactions: molecules have many interaction terms, and these interaction terms depend on the pairwise distance between atoms (e.g. short range covalent bonds and longer range van der Waals interactions).
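Translation invariance, one of the isometry invariances listed above, is easy to check numerically for the dyadic coefficients in (1.20). The sketch below is a one dimensional, periodic, discrete illustration under stated assumptions: the Haar-like filter is a hypothetical stand-in for $\psi_j$ (defined only for integer $j \ge 0$), integrals become finite sums, and all function names are made up for this example.

```python
import math


def circular_convolve(f, h):
    """Periodic convolution: (f * h)(i) = sum_k f(i - k) h(k)."""
    n = len(f)
    return [sum(f[(i - k) % n] * h[k] for k in range(len(h))) for i in range(n)]


def haar_filter(j):
    """Hypothetical stand-in for psi_j (integer j >= 0): zero mean, unit l1 norm."""
    half = 2 ** j
    return [1.0 / (2 * half)] * half + [-1.0 / (2 * half)] * half


def lq_norm(values, q):
    """Discrete analogue of the L^q norm for 1 <= q < infinity."""
    return sum(abs(v) ** q for v in values) ** (1.0 / q)


def scattering_norm(f, scales, q):
    """Discrete analogue of (1.20): cascade of convolutions and moduli, then an L^q norm.

    The modulus after the last convolution does not change the norm.
    """
    u = list(f)
    for j in scales:
        u = [abs(v) for v in circular_convolve(u, haar_filter(j))]
    return lq_norm(u, q)


def translate(f, c):
    """Periodic translation: (L_c f)(i) = f(i - c)."""
    n = len(f)
    return [f[(i - c) % n] for i in range(n)]


signal = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0]
for q in (1.0, 4.0 / 3.0, 5.0 / 3.0, 2.0):
    original = scattering_norm(signal, (0, 1), q)
    shifted = scattering_norm(translate(signal, 5), (0, 1), q)
    # The two values agree up to floating point rounding for every q.
    print(q, math.isclose(original, shifted, rel_tol=1e-12))
```

Because the norm sums over every position, no choice of window size is involved: the invariance holds exactly in this periodic setting rather than only in a limit.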
The rotation invariant version of our scattering transform in Section 2.5 satisfies permutation invariance and deformation stability, and has multiscale interactions, based on the proofs we have provided. We do not prove isometry invariance, but the operator is rotation and translation invariant.

Motivated by density functional theory (DFT), the paper [32] uses a dictionary of one and two layer scattering norms with $q = 1$ and $q = 2$ to get (at the time) state-of-the-art results for energy regression tasks for planar molecules. In particular, scattering operators with $q = 1$ scaled with the number of atoms in the system and $q = 2$ encoded pairwise interactions. The motivation for using $1 < q < 2$ comes from [33, 34], which, based on the Thomas–Fermi–Dirac–von Weizsäcker model [35], also use scattering norms with $q = 4/3, 5/3$. Later papers, like [33, 34], use a similar representation, involving spherical harmonics, for 3D quantum energy regression.

Remark 1. We can replace all the modulus operators with any contraction mapping (or use different contraction mappings in each layer) in the definition above, and all the proofs in the rest of this work will still hold. In particular, the modulus can be replaced with a complex version of the rectified linear unit (ReLU) nonlinearity, $(\max(0, \operatorname{Re}(a_i)))_{i = 1, \dots, n}$ for $a \in \mathbb{C}^n$, which is a popular choice for complex neural networks. Nonetheless, we will use the modulus operator throughout this work without any loss of generality.

We provide a general roadmap for Chapter 2. First, we will cover notation, basic properties of wavelets and the wavelet scattering operator, and the harmonic analysis that will be necessary for our results. We then provide norms for the $m$-layer wavelet scattering transforms and prove that the operators are well defined mappings into specific spaces when $1 \le q \le 2$.
Next, we explore conditions under which the $m$-layer scattering transform is stable to dilations, and we generalize our results to diffeomorphisms. Lastly, in the last section of this chapter, we formulate two new translation invariant operators that are stable to diffeomorphisms. The first is rotation equivariant, and the second is rotation invariant.

Our contributions include, but are not limited to, the following:
• We formulate an extension of the dyadic wavelet scattering operator for a finite, arbitrary number of layers with parameter $q \in [1, 2]$ by applying $L^q(\mathbb{R}^n)$ norms instead of $L^1(\mathbb{R}^n)$ norms. Additionally, we formulate a wavelet scattering operator with $q \in [1, 2]$ that uses a continuous scale parameter, like the continuous wavelet transform.
• We create a new finite depth scattering norm using dyadic and continuous scales in the case when $q \in [1, 2]$, prove that the mappings are well defined, and provide theoretical justification for a broader class of wavelets that make the scattering transform Lipschitz continuous to the action of $C^2$ diffeomorphisms. However, the trade-off is that our stability bound depends on the number of layers.
• We provide a condition for norm equivalence in the case of $q = 2$ that is less stringent.
• In the case of $q \in (1, 2]$, we prove that our norm is stable to diffeomorphisms $\tau \in C^2(\mathbb{R}^n)$ provided that $\|\tau\|_\infty < \frac{1}{2n}$ and the wavelet and its first and second partial derivatives have sufficient decay. In the case of $q = 1$, we show stability to dilations.
• We extend our formulation to include invariance or equivariance to the action of rotations $R \in \mathrm{SO}(n)$.

CHAPTER 2
GENERALIZING THE NONWINDOWED SCATTERING TRANSFORM

The contents of this chapter were joint work with Matthew Hirn and Anna Little. A journal version of this chapter is published in [36].
We start by providing basic prerequisite knowledge that will be necessary for the results in this chapter.

2.1 Fourier Transforms and Hardy Spaces

The Fourier transform of a function $f \in L^1(\mathbb{R}^n)$ is the function $\widehat{f} \in L^\infty(\mathbb{R}^n)$ defined as
\[ \forall\, \omega \in \mathbb{R}^n, \quad \widehat{f}(\omega) := \int_{\mathbb{R}^n} f(x) e^{-i x \cdot \omega} \, dx . \]
The Hilbert transform of a function $f \in L^1(\mathbb{R})$ is denoted by $Hf$ and is defined as
\[ Hf(x) := \lim_{\epsilon \to 0} \int_{|x-y| > \epsilon} \frac{f(y)}{x - y} \, dy . \]
The map $H$ is a convolution operator in which $f$ is convolved against the function $1/x$. We note that
\[ H : L^q(\mathbb{R}) \to L^q(\mathbb{R}), \quad \forall\, 1 < q < \infty ; \]
however, the result is not true for $q = 1$, i.e., if $f \in L^1(\mathbb{R})$ it is not necessarily true that $Hf \in L^1(\mathbb{R})$. We thus introduce the Hardy space. We denote the Hardy space by $\mathcal{H}^1(\mathbb{R})$; it consists of those functions $f \in L^1(\mathbb{R})$ such that $Hf \in L^1(\mathbb{R})$ as well. For $f \in \mathcal{H}^1(\mathbb{R})$ the Hardy space norm is $\|f\|_{\mathcal{H}^1(\mathbb{R})}$, which we define as (see Corollary 2.4.7 of [37])
\[ \|f\|_{\mathcal{H}^1(\mathbb{R})} := \|f\|_1 + \|Hf\|_1 . \tag{2.1} \]
One can show that if $f \in \mathcal{H}^1(\mathbb{R})$, then $f$ must necessarily have zero average. An important property of the Hilbert transform and convolution is the following:
\[ H(f * g) = Hf * g = f * Hg, \quad f \in L^p(\mathbb{R}), \ g \in L^q(\mathbb{R}), \ 1 < \frac{1}{p} + \frac{1}{q} . \]
We have a similar definition for Hardy spaces when $n \ge 2$. For $1 \le j \le n$, define the $j$th Riesz transform as
\[ R_j f(x) = \lim_{\varepsilon \to 0} \int_{|x-y| > \varepsilon} \frac{x_j - y_j}{|x - y|^{n+1}} f(y) \, dy , \tag{2.2} \]
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. The Hardy space $\mathcal{H}^1(\mathbb{R}^n)$ consists of functions $f$ such that $f \in L^1(\mathbb{R}^n)$ and $R_j f \in L^1(\mathbb{R}^n)$ for $1 \le j \le n$ as well.
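The convolution identity $H(f * g) = Hf * g = f * Hg$ stated above can be checked numerically in a periodic, discrete setting. The sketch below is only illustrative and rests on assumptions not made in the text: the Hilbert transform is realized as the Fourier multiplier $-i\,\mathrm{sgn}(\omega)$ (a normalization choice), signals live on a periodic grid, and the helper names (`dft`, `idft`, `hilbert`, `circ_conv`) are hypothetical.

```python
import cmath
import math

def dft(x):
    # Naive discrete Fourier transform (O(N^2)); adequate for small N.
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    # Inverse of dft above.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def hilbert(x):
    # Periodic Hilbert transform: multiply Fourier coefficient k by -i * sgn(k).
    n = len(x)
    X = dft(x)
    Y = []
    for k in range(n):
        freq = k if k < n / 2 else k - n       # signed frequency index
        if freq == 0 or 2 * k == n:            # zero and Nyquist frequencies
            sign = 0
        else:
            sign = 1 if freq > 0 else -1
        Y.append(-1j * sign * X[k])
    return idft(Y)

def circ_conv(f, g):
    # Circular convolution, the periodic analogue of f * g.
    n = len(f)
    return [sum(f[m] * g[(k - m) % n] for m in range(n)) for k in range(n)]

N = 32
f = [math.exp(-0.1 * (k - 10) ** 2) for k in range(N)]
g = [math.cos(2 * math.pi * 3 * k / N) + 0.5 * math.sin(2 * math.pi * 5 * k / N)
     for k in range(N)]

lhs = hilbert(circ_conv(f, g))
mid = circ_conv(hilbert(f), g)
rhs = circ_conv(f, hilbert(g))
err = max(max(abs(a - b), abs(a - c)) for a, b, c in zip(lhs, mid, rhs))
print(f"max deviation between H(f*g), (Hf)*g and f*(Hg): {err:.1e}")
```

Because both the Hilbert transform and convolution act as multiplications on the Fourier side, the three quantities coincide up to rounding. The multiplier also vanishes at frequency zero, which mirrors the zero-average property of Hardy space functions.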
For $f \in \mathcal{H}^1(\mathbb{R}^n)$ the Hardy space norm is $\|f\|_{\mathcal{H}^1(\mathbb{R}^n)}$, which we define as (see Corollary 2.4.7 of [37])
\[ \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} := \|f\|_1 + \sum_{j=1}^{n} \|R_j f\|_1 . \tag{2.3} \]

2.1.1 Operator Valued Spaces

Consider a Banach space $\mathcal{B}$. Suppose $f : \mathbb{R}^n \to \mathcal{B}$ and $x \mapsto \|f(x)\|_{\mathcal{B}}$ is measurable in the Lebesgue sense. For $1 \le p < \infty$, define $L^p_{\mathcal{B}}(\mathbb{R}^n)$ through the norm
\[ \|f\|^p_{L^p_{\mathcal{B}}(\mathbb{R}^n)} = \int_{\mathbb{R}^n} \|f(x)\|^p_{\mathcal{B}} \, dx . \]
Also, for $1 \le p < \infty$, define
\[ \|f\|_{L^{p,\infty}_{\mathcal{B}}(\mathbb{R}^n)} = \sup_{\delta > 0} \delta \cdot m(\{x \in \mathbb{R}^n : \|f(x)\|_{\mathcal{B}} > \delta\})^{1/p} . \]
We also have the following relation:
\[ \|f\|_{L^{p,\infty}_{\mathcal{B}}(\mathbb{R}^n)} \le \|f\|_{L^p_{\mathcal{B}}(\mathbb{R}^n)} . \]
Note that for $f : \mathbb{R}^n \to \mathbb{R}^n$,
\[ \|f\|^p_{L^p_{\mathbb{R}^n}(\mathbb{R}^n)} = \int_{\mathbb{R}^n} \|f(x)\|^p_{\mathbb{R}^n} \, dx = \int_{\mathbb{R}^n} |f(x)|^p \, dx = \|f\|_p^p . \]

2.2 Wavelet Scattering is a Bounded Operator

In this chapter we explore for which $q > 0$ and $m \ge 1$ the wavelet scattering transforms $S^m_{\text{cont},q} f$ and $S^m_{\text{dyad},q} f$ are well-defined as functions in some Banach space (i.e., have finite norm), and under what circumstances. Let $\psi$ be a wavelet. We assume that $\psi$ has the following properties:
\[ |\psi(x)| \le A (1 + |x|)^{-n-\varepsilon} , \tag{2.4} \]
\[ \int_{\mathbb{R}^n} |\psi(x - y) - \psi(x)| \, dx \le A |y|^{\varepsilon'} , \tag{2.5} \]
for some constants $A, \varepsilon', \varepsilon > 0$ and for all $y \ne 0$. Consider the Littlewood-Paley $G$-function
\[ G_\psi(f)(x) = \left( \int_{(0,\infty)} |f * t^{-n} \psi(x/t)|^2 \, \frac{dt}{t} \right)^{1/2} . \tag{2.6} \]
Let $\mathcal{B} = L^2\left((0, \infty), \frac{dt}{t}\right)$. We can rewrite this as a Bochner integral by considering the function $K(x) = (t^{-n/2} \psi_t(x))_{t > 0}$. This is a mapping $K : \mathbb{R}^n \to \mathcal{B}$ and the function $x \mapsto \|K(x)\|_{\mathcal{B}}$ is measurable.
Also, if we let
\[ \mathcal{T}(f)(x) = \left( \int_{\mathbb{R}^n} t^{-n/2} \psi_t(x - y) f(y) \, dy \right)_{t > 0} = \left( (t^{-n/2} \psi_t * f)(x) \right)_{t > 0} , \]
we observe that
\[ G_\psi(f)(x) = \|\mathcal{T}(f)(x)\|_{\mathcal{B}} \quad \text{and} \quad \|G_\psi(f)\|_p^p = \|\mathcal{T}(f)\|^p_{L^p_{\mathcal{B}}(\mathbb{R}^n)} . \]
From Problem 6.1.4 of [38], the two properties above for the wavelet $\psi$ imply that
\[ \|K(x)\|_{\mathcal{B}} \le \frac{c_n A}{|x|^n} , \tag{2.7} \]
and
\[ \sup_{y \in \mathbb{R}^n \setminus \{0\}} \int_{|x| \ge 2|y|} \|K(x - y) - K(x)\|_{\mathcal{B}} \, dx \le c'_n A , \tag{2.8} \]
where $c_n$ and $c'_n$ depend only on $n$, $\varepsilon$, and $\varepsilon'$. We will omit the dependence on $\varepsilon$ and $\varepsilon'$ throughout the rest of this paper, and this will have no effect on any of our proofs.

Remark 2. For the rest of this paper, we will write $G$ in place of $G_\psi$ when referring to the $G$-function because the dependence on the mother wavelet is clear.

Remark 3. Note that (2.5) holds under the alternative condition
\[ |\nabla \psi(x)| \le A (1 + |x|)^{-n-1-\varepsilon'} . \tag{2.9} \]
This is a consequence of the Mean Value Theorem.

We have the following result taken from Problem 6.1.4 of [38] and from Chapter V of [39].

Lemma 5 ([38, 39]). Assume that $\psi$ is defined as above and satisfies (2.7) and (2.8). Then the operator $G$ is bounded from $L^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$. Also, for $p \in (1, \infty)$ and $\mathcal{B} = L^2(\mathbb{R}_+, dt/t)$, we have
\[ \|\mathcal{T} f\|_{L^p_{\mathcal{B}}(\mathbb{R}^n)} \le C_n A \max(p, (p-1)^{-1}) \|f\|_{L^p(\mathbb{R}^n)} , \tag{2.10} \]
for some $C_n$. For all $f \in L^1(\mathbb{R}^n)$, we also have
\[ \|\mathcal{T} f\|_{L^{1,\infty}_{\mathcal{B}}(\mathbb{R}^n)} \le C'_n A \|f\|_{L^1(\mathbb{R}^n)} , \tag{2.11} \]
and, for $f \in \mathcal{H}^1(\mathbb{R}^n)$,
\[ \|\mathcal{T} f\|_{L^1_{\mathcal{B}}(\mathbb{R}^n)} \le C'_n A \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} , \tag{2.12} \]
for some $C'_n$.

Remark 4. We can also formulate similar bounds for the Littlewood-Paley $\mathfrak{g}$ operator
\[ \mathfrak{g}(f)(x) := \left[ \sum_{j \in \mathbb{Z}} |\psi_j * f(x)|^2 \right]^{1/2} \tag{2.13} \]
using similar arguments.

Remark 5. Let $\psi$ be a wavelet that has properties (2.4) and (2.5). Then with the $L^2$ normalized dilations, the Littlewood-Paley $G$-function can be written as
\[ G(f)(x) = \left[ \int_0^\infty |f * \psi_\lambda(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \right]^{1/2} . \tag{2.14} \]
Note that the $\lambda$ measure for $G(f)$ matches the measure used in defining the norm of $\mathcal{W} f$.

2.2.1 The $L^2(\mathbb{R}^n)$ Wavelet Scattering Transform

In this section we prove that the $L^2(\mathbb{R}^n)$ scattering transforms are bounded operators. More specifically, we prove that $S^m_{\text{cont},2} : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+)$, where $L^2(\mathbb{R}^m_+)$ has the weighted measure defined by
\[ \|S^m_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^m_+)} := \int_0^\infty \cdots \int_0^\infty |S^m_{\text{cont},2} f(\lambda_1, \ldots, \lambda_m)|^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} , \]
and we show that $\|S^m_{\text{cont},2} f\|_{L^2(\mathbb{R}^m_+)} \le C \|f\|_{L^2(\mathbb{R}^n)}$. We also show that $S^m_{\text{dyad},2} : L^2(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m)$, where
\[ \|S^m_{\text{dyad},2} f\|^2_{\ell^2(\mathbb{Z}^m)} := \sum_{j_m \in \mathbb{Z}} \cdots \sum_{j_1 \in \mathbb{Z}} |S^m_{\text{dyad},2} f(j_1, \ldots, j_m)|^2 . \]

Proposition 6. For any wavelet satisfying (2.4) and (2.5), we have $S^m_{\text{cont},2} : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+)$ and $S^m_{\text{dyad},2} : L^2(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m)$.

Proof. The proof of the dyadic case is essentially identical to the proof given below and is thus omitted. The case of $m = 1$ follows by an application of Fubini's Theorem:
\[ \begin{aligned}
\|S_{\text{cont},2} f\|^2_{L^2(\mathbb{R}_+)} &= \int_0^\infty \|f * \psi_\lambda\|_2^2 \, \frac{d\lambda}{\lambda^{n+1}} \\
&= \int_0^\infty \int_{\mathbb{R}^n} |(f * \psi_\lambda)(x)|^2 \, dx \, \frac{d\lambda}{\lambda^{n+1}} \\
&= \int_{\mathbb{R}^n} |G(f)(x)|^2 \, dx \le C \|f\|_2^2
\end{aligned} \]
by boundedness of the $G$-function. Now we proceed by using induction.
Assume that we have $\|S^m_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^m_+)} \le C^m \|f\|_2^2$. Let $\mathcal{W}_t f = f * \psi_t$, define $Mf = |f|$, and $U_\lambda = M \mathcal{W}_\lambda$ for notational brevity. Then notice that
\[ \| \, |||f * \psi_{\lambda_1}| * \psi_{\lambda_2}| * \cdots * \psi_{\lambda_m}| * \psi_{\lambda_{m+1}} \|_2^2 = \|\mathcal{W}_{\lambda_{m+1}} U_{\lambda_m} \cdots U_{\lambda_1} f\|_2^2 . \]
Substituting yields
\[ \begin{aligned}
\|S^{m+1}_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^{m+1}_+)} &= \int_0^\infty \cdots \int_0^\infty \|\mathcal{W}_{\lambda_{m+1}} U_{\lambda_m} \cdots U_{\lambda_1} f\|_2^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \\
&= \int_0^\infty \cdots \int_0^\infty \left( \int_0^\infty \|(U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}\|_2^2 \, \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right) \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \\
&= \int_0^\infty \cdots \int_0^\infty \|S_{\text{cont},2}(U_{\lambda_m} \cdots U_{\lambda_1} f)\|^2_{L^2(\mathbb{R}_+)} \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \\
&\le C \int_0^\infty \cdots \int_0^\infty \|U_{\lambda_m} \cdots U_{\lambda_1} f\|_2^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \\
&= C \int_0^\infty \cdots \int_0^\infty |S^m_{\text{cont},2} f(\lambda_1, \ldots, \lambda_m)|^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \\
&\le C^{m+1} \|f\|_2^2 ,
\end{aligned} \]
where we used the induction hypothesis in the last line. This completes the proof. □

Proposition 7. Suppose $\psi$ is a Littlewood-Paley wavelet satisfying (2.4) and (2.5). Then $S^m_{\text{cont},2} : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+)$ and specifically $\|S^m_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^m_+)} = C_\psi^m \|f\|_2^2$. Also, $S^m_{\text{dyad},2} : L^2(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m)$ and $\|S^m_{\text{dyad},2} f\|^2_{\ell^2(\mathbb{Z}^m)} = \hat{C}_\psi^m \|f\|_2^2$.

Proof. We again only provide the proof of the continuous case. First consider the case $m = 1$.
We have:
\[ \begin{aligned}
\|S_{\text{cont},2} f\|^2_{L^2(\mathbb{R}_+)} &= \int_0^\infty \|f * \psi_\lambda\|_2^2 \, \frac{d\lambda}{\lambda^{n+1}} \\
&= \frac{1}{(2\pi)^n} \int_0^\infty \|\hat{f} \cdot \hat{\psi}_\lambda\|_2^2 \, \frac{d\lambda}{\lambda^{n+1}} \\
&= \frac{1}{(2\pi)^n} \int_0^\infty \left( \int_{\mathbb{R}^n} |\hat{f}(\omega)|^2 |\hat{\psi}_\lambda(\omega)|^2 \, d\omega \right) \frac{d\lambda}{\lambda^{n+1}} \\
&= \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \left( \int_0^\infty |\hat{\psi}(\lambda \omega)|^2 \, \frac{d\lambda}{\lambda} \right) |\hat{f}(\omega)|^2 \, d\omega \\
&= \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \left( C_\psi |\hat{f}(\omega)|^2 \right) d\omega \\
&= \frac{1}{(2\pi)^n} C_\psi \|\hat{f}\|_2^2 = C_\psi \|f\|_2^2 .
\end{aligned} \]
Thus the claim holds for $m = 1$. Now assume that it holds through $m$. Then by the inductive hypothesis,
\[ \|S^m_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^m_+)} = \int_0^\infty \cdots \int_0^\infty \| \, ||f * \psi_{\lambda_1}| * \psi_{\lambda_2}| * \cdots * \psi_{\lambda_m} \|_2^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} = C_\psi^m \|f\|_2^2 . \]
Now consider the case of $m + 1$. Similar to the previous proposition, we have
\[ \begin{aligned}
\|S^{m+1}_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^{m+1}_+)} &= \int_0^\infty \cdots \int_0^\infty \left( \int_0^\infty \|(U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}\|_2^2 \, \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right) \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \\
&= C_\psi \int_0^\infty \cdots \int_0^\infty |S^m_{\text{cont},2} f(\lambda_1, \ldots, \lambda_m)|^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \\
&= C_\psi \|S^m_{\text{cont},2} f\|^2_{L^2(\mathbb{R}^m_+)} = C_\psi^{m+1} \|f\|_2^2 .
\end{aligned} \]
Thus, the claim is proven by induction. □

2.2.2 The $L^1(\mathbb{R}^n)$ Wavelet Scattering Transform

Define the notation $\mathcal{W}_t f = f * \psi_t$, $Mf = |f|$, and $U_t = M \mathcal{W}_t$. We now prove that for $m \in \mathbb{N}$, $S^m_{\text{cont},1} : \mathcal{H}^1(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+)$. The norm for $S^m_{\text{cont},1} f$ is:
\[ \begin{aligned}
\|S^m_{\text{cont},1} f\|_{L^2(\mathbb{R}^m_+)} &:= \left( \int_0^\infty \int_0^\infty \cdots \int_0^\infty |S^m_{\text{cont},1} f(\lambda_1, \lambda_2, \ldots, \lambda_m)|^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \frac{d\lambda_2}{\lambda_2^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&= \left( \int_0^\infty \int_0^\infty \cdots \int_0^\infty \left\| (U_{\lambda_{m-1}} \cdots U_{\lambda_1} f) * \psi_{\lambda_m} \right\|_1^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \frac{d\lambda_2}{\lambda_2^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} .
\end{aligned} \]
An analogous result will also hold for the operator $S^m_{\text{dyad},1} : \mathcal{H}^1(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m)$ with norm
\[ \|S^m_{\text{dyad},1} f\|_{\ell^2(\mathbb{Z}^m)} := \left( \sum_{j_m \in \mathbb{Z}} \cdots \sum_{j_1 \in \mathbb{Z}} |S^m_{\text{dyad},1} f(j_1, \ldots, j_m)|^2 \right)^{1/2} . \]
Before we begin, we will need an important multiplier property of the individual Riesz transforms:
\[ \widehat{R_j f}(\omega) = -i \frac{\omega_j}{|\omega|} \hat{f}(\omega) . \tag{2.15} \]
Let $\vec{\alpha} = (\alpha_1, \ldots, \alpha_n)$ be a multi-index with $n$ elements, and let $t = (t_1, \ldots, t_n) \in \mathbb{R}^n$. We say that $\psi$ has $k$ vanishing moments if for all $|\vec{\alpha}| < k$, we have
\[ \int_{\mathbb{R}^n} \left( \prod_{i=1}^n t_i^{\alpha_i} \right) \psi(t) \, dt = 0 . \tag{2.16} \]
The following lemmas will be necessary.

Lemma 8 ([40]). Suppose that $\psi$ has $p$ vanishing moments, let $M > 1$ be an integer, let $\vec{\alpha}$ be defined as before, and let $\vec{\beta} = (\beta_1, \ldots, \beta_n)$ be a multi-index. Assume that $\psi$ satisfies the following properties:
• $\psi \in H^s(\mathbb{R}^n) \cap C(\mathbb{R}^n)$ for some $s > M + \frac{n}{2}$.
• There exist $A > 0$ and $\varepsilon \in [0, 1)$ such that $\psi$ satisfies $|D^{\vec{\alpha}} \psi| \le A (1 + |x|)^{-n-p-|\vec{\alpha}|+\varepsilon}$ for $0 \le |\vec{\alpha}| \le M$.
• For $0 \le |\vec{\alpha}| \le M - 1$ and $|\vec{\beta}| < p + |\vec{\alpha}|$, $\int_{\mathbb{R}^n} \prod_{i=1}^n t_i^{\beta_i} D^{\vec{\alpha}} \psi(t) \, dt = 0$.
Then
\[ |D^{\vec{\alpha}} R_i \psi(x)| = |R_i D^{\vec{\alpha}} \psi(x)| \le A (1 + |x|)^{-n-p-|\vec{\alpha}|+\varepsilon+\delta} \]
for some $0 < \delta < 1 - \varepsilon$, and $D^{\vec{\alpha}} R_i \psi$ has vanishing moments up to degree $p - 1 + |\vec{\alpha}|$.

An immediate consequence is the following Lemma, which we provide without proof.

Lemma 9. Suppose that $\psi$ satisfies the following conditions:
• $\psi \in H^s(\mathbb{R}^n) \cap C(\mathbb{R}^n)$ for some $s > 2 + \frac{n}{2}$.
• There exist $A > 0$ and $\varepsilon \in [0, 1)$ such that $\psi$ satisfies $|D^{\vec{\alpha}} \psi| \le A (1 + |x|)^{-n-2-|\vec{\alpha}|+\varepsilon}$ for $0 \le |\vec{\alpha}| \le 3$.
• For $0 \le |\vec{\alpha}| \le 2$ and $|\vec{\beta}| < 2 + |\vec{\alpha}|$, $\int_{\mathbb{R}^n} \prod_{i=1}^n t_i^{\beta_i} D^{\vec{\alpha}} \psi(t) \, dt = 0$.
Then $R_j \psi$ and all of its first and second partial derivatives have $O((1 + |x|)^{-n-1+\eta})$ decay for some $\eta \in (0, 1)$.

The first implication to take note of is that $R_j \psi$ is a wavelet with "good" decay of itself and all its first and second partial derivatives. Note that the strict decay on the partial derivatives is necessary for technical reasons in later proofs, but decay on all second partial derivatives can be relaxed for the following theorem.

Theorem 10. Let $\psi$ be a wavelet satisfying Lemma 9 and let $S^m_{\text{cont},1}$ be defined as above. Then for $f \in \mathcal{H}^1(\mathbb{R}^n)$, there exists a constant $C_m$ such that
\[ \|S^m_{\text{cont},1} f\|_{L^2(\mathbb{R}^m_+)} \le C_m \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} . \]
Additionally,
\[ \|S^m_{\text{dyad},1} f\|_{\ell^2(\mathbb{Z}^m)} \le C_m \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} . \]

Proof. We proceed by induction and only provide a proof for the continuous case because the dyadic case follows by almost identical reasoning. Let $f \in \mathcal{H}^1(\mathbb{R}^n)$ throughout the proof.
By Minkowski's integral inequality ([41], Theorem 202), we have
\[ \begin{aligned}
\|S_{\text{cont},1} f\|_{L^2(\mathbb{R}_+)} &= \left( \int_0^\infty \|f * \psi_\lambda\|_1^2 \, \frac{d\lambda}{\lambda^{n+1}} \right)^{1/2} \\
&= \left( \int_0^\infty \left( \int_{\mathbb{R}^n} |f * \psi_\lambda(x)| \, dx \right)^2 \frac{d\lambda}{\lambda^{n+1}} \right)^{1/2} \\
&\le \int_{\mathbb{R}^n} \left( \int_0^\infty |f * \psi_\lambda(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \right)^{1/2} dx \\
&= \int_{\mathbb{R}^n} G(f)(x) \, dx = \|G(f)\|_1 \le C \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} ,
\end{aligned} \]
where in the last inequality we used Lemma 5. Now we assume that there exists some $m \ge 1$ such that
\[ \|S^m_{\text{cont},1} f\|_{L^2(\mathbb{R}^m_+)} \le C_m \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} . \]
We have
\[ \begin{aligned}
\|S^{m+1}_{\text{cont},1} f\|_{L^2(\mathbb{R}^{m+1}_+)} &= \left( \int_0^\infty \cdots \int_0^\infty \left\| (U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}} \right\|_1^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right)^{1/2} \\
&= \left( \int_0^\infty \cdots \int_0^\infty \left( \int_{\mathbb{R}^n} \left| (U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}(x) \right| dx \right)^2 \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right)^{1/2} \\
&\le \left( \int_0^\infty \cdots \int_0^\infty \left[ \int_{\mathbb{R}^n} \left( \int_0^\infty \left| (U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}(x) \right|^2 \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right)^{1/2} dx \right]^2 \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&= \left( \int_0^\infty \cdots \int_0^\infty \left[ \int_{\mathbb{R}^n} G(U_{\lambda_m} \cdots U_{\lambda_1} f)(x) \, dx \right]^2 \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&= \left( \int_0^\infty \cdots \int_0^\infty \|G(U_{\lambda_m} \cdots U_{\lambda_1} f)\|_1^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&= \left( \int_0^\infty \cdots \int_0^\infty \|G(\mathcal{W}_{\lambda_m} U_{\lambda_{m-1}} \cdots U_{\lambda_1} f)\|_1^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} ,
\end{aligned} \]
since the $G$ function has a modulus already. It follows that
\[ \|S^{m+1}_{\text{cont},1} f\|_{L^2(\mathbb{R}^{m+1}_+)} \le C \left( \int_0^\infty \cdots \int_0^\infty \|\mathcal{W}_{\lambda_m} U_{\lambda_{m-1}} \cdots U_{\lambda_1} f\|^2_{\mathcal{H}^1(\mathbb{R}^n)} \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} . \]
Now use the definition of the $\mathcal{H}^1(\mathbb{R}^n)$ norm to write
\[ \|\mathcal{W}_{\lambda_m} U_{\lambda_{m-1}} \cdots U_{\lambda_1} f\|_{\mathcal{H}^1(\mathbb{R}^n)} = \|\mathcal{W}_{\lambda_m} U_{\lambda_{m-1}} \cdots U_{\lambda_1} f\|_{L^1(\mathbb{R}^n)} + \sum_{j=1}^n \left\| (R_j \mathcal{W}_{\lambda_m}) (U_{\lambda_{m-1}} \cdots U_{\lambda_1} f) \right\|_{L^1(\mathbb{R}^n)} . \]
Thus, since $R_j \mathcal{W}_{\lambda_m} h = h * (R_j \psi_{\lambda_m})$ and $R_j \psi$ is a wavelet, we can use our induction hypothesis and the previous lemma to get
\[ \begin{aligned}
& C \left( \int_0^\infty \cdots \int_0^\infty \|\mathcal{W}_{\lambda_m} (U_{\lambda_{m-1}} \cdots U_{\lambda_1} f)\|^2_{\mathcal{H}^1(\mathbb{R}^n)} \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&\quad \le C \left( \int_0^\infty \cdots \int_0^\infty \|\mathcal{W}_{\lambda_m} (U_{\lambda_{m-1}} \cdots U_{\lambda_1} f)\|^2_{L^1(\mathbb{R}^n)} \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&\qquad + C \sum_{j=1}^n \left( \int_0^\infty \cdots \int_0^\infty \left\| (R_j \mathcal{W}_{\lambda_m}) (U_{\lambda_{m-1}} \cdots U_{\lambda_1} f) \right\|^2_{L^1(\mathbb{R}^n)} \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{1/2} \\
&\quad \le C_{m+1} \|f\|_{\mathcal{H}^1(\mathbb{R}^n)} .
\end{aligned} \]
Thus, the theorem is proved by induction. □

The case of $n = 1$ is a little trickier.
We have the following multiplier property for the Hilbert transform:
\[ \widehat{Hf}(\omega) = \begin{cases} +i \widehat{f}(\omega), & \omega < 0, \\ -i \widehat{f}(\omega), & \omega > 0. \end{cases} \tag{2.17} \]
Unfortunately, this yields less regularity for $\widehat{Hf}$ at the origin without additional assumptions. However, notice that the Hilbert transform commutes with dilations, so in particular:
\[ H(\psi_\lambda) = H(\psi)_\lambda \quad \text{and} \quad H(\psi_j) = H(\psi)_j . \]
Using the calculation of $\widehat{Hf}$ in (2.17) we see that
\[ H\psi = -i\psi \]
if $\psi$ is complex analytic (that is, $\hat{\psi}(\omega) = 0$ for $\omega < 0$). Thus, we have the following corollary.

Corollary 11. Let $\psi$ be a complex analytic wavelet such that (2.4) and (2.5) hold. Then for $f \in \mathcal{H}^1(\mathbb{R})$, there exists a constant $C_m$ such that
\[ \|S^m_{\text{cont},1} f\|_{L^2(\mathbb{R}^m_+)} \le C_m \|f\|_{\mathcal{H}^1(\mathbb{R})} . \]
Additionally,
\[ \|S^m_{\text{dyad},1} f\|_{\ell^2(\mathbb{Z}^m)} \le C_m \|f\|_{\mathcal{H}^1(\mathbb{R})} . \]

2.2.3 $L^q(\mathbb{R}^n)$ Wavelet Scattering Transform

In this section, assume $1 < q < 2$. We prove that for $m \in \mathbb{N}$, $S^m_{\text{cont},q} : L^q(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+)$. The norm for $S^m_{\text{cont},q} f$ is:
\[ \begin{aligned}
\|S^m_{\text{cont},q} f\|^q_{L^2(\mathbb{R}^m_+)} &:= \left( \int_0^\infty \int_0^\infty \cdots \int_0^\infty |S^m_{\text{cont},q} f(\lambda_1, \lambda_2, \ldots, \lambda_m)|^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \frac{d\lambda_2}{\lambda_2^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{q/2} \\
&= \left( \int_0^\infty \int_0^\infty \cdots \int_0^\infty \left( \left\| (U_{\lambda_{m-1}} \cdots U_{\lambda_1} f) * \psi_{\lambda_m} \right\|_q \right)^2 \frac{d\lambda_1}{\lambda_1^{n+1}} \frac{d\lambda_2}{\lambda_2^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{q/2} .
\end{aligned} \]
There is also an analogous result for
\[ \|S^m_{\text{dyad},q} f\|^q_{\ell^2(\mathbb{Z}^m)} := \left( \sum_{j_m \in \mathbb{Z}} \cdots \sum_{j_1 \in \mathbb{Z}} |S^m_{\text{dyad},q} f(j_1, j_2, \ldots, j_m)|^2 \right)^{q/2} . \]

Theorem 12. Let $1 < q < 2$.
Also, let $\psi$ be a wavelet that satisfies properties (2.4) and (2.5) and let $S^m_{\text{cont},q}$ and $S^m_{\text{dyad},q}$ be defined as above. Then there exists a universal constant $C_m > 0$ such that $\|S^m_{\text{cont},q} f\|^q_{L^2(\mathbb{R}^m_+)} \le C_m \|f\|_q^q$ for all $f \in L^q(\mathbb{R}^n)$, and furthermore $\|S^m_{\text{dyad},q} f\|^q_{\ell^2(\mathbb{Z}^m)} \le C_m \|f\|_q^q$.

Proof. We proceed by induction and consider the case of $m = 1$ first. Let $f \in L^q(\mathbb{R}^n)$. For the continuous wavelet transform, we apply Minkowski's integral inequality:
\[ \begin{aligned}
\|S_{\text{cont},q} f\|^q_{L^2(\mathbb{R}_+)} &= \left[ \int_0^\infty \|f * \psi_\lambda\|_q^2 \, \frac{d\lambda}{\lambda^{n+1}} \right]^{q/2} \\
&= \left[ \int_0^\infty \left( \int_{\mathbb{R}^n} |f * \psi_\lambda(x)|^q \, dx \right)^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right]^{q/2} \\
&\le \int_{\mathbb{R}^n} \left( \int_0^\infty |f * \psi_\lambda(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \right)^{q/2} dx \\
&= \|G(f)\|_q^q \le C^q \|f\|_q^q ,
\end{aligned} \]
where in the last inequality we used Lemma 5. Now, let us assume that
\[ \|S^m_{\text{cont},q} f\|^q_{L^2(\mathbb{R}^m_+)} \le C^{m \cdot q} \|f\|^q_{L^q(\mathbb{R}^n)} . \]
We apply Minkowski's integral inequality [41] to swap and then bound:
\[ \begin{aligned}
\|S^{m+1}_{\text{cont},q} f\|^q_{L^2(\mathbb{R}^{m+1}_+)} &= \left[ \int_0^\infty \cdots \int_0^\infty \left\| (U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}} \right\|_q^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right]^{q/2} \\
&= \left[ \int_0^\infty \cdots \int_0^\infty \left( \int_{\mathbb{R}^n} |(U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}(x)|^q \, dx \right)^{2/q} \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right]^{q/2} \\
&= \left[ \int_0^\infty \cdots \int_0^\infty \left[ \left( \int_0^\infty \left( \int_{\mathbb{R}^n} |(U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}(x)|^q \, dx \right)^{2/q} \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right)^{q/2} \right]^{2/q} \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right]^{q/2} \\
&\le \left[ \int_0^\infty \cdots \int_0^\infty \left[ \int_{\mathbb{R}^n} \left( \int_0^\infty |(U_{\lambda_m} \cdots U_{\lambda_1} f) * \psi_{\lambda_{m+1}}(x)|^2 \, \frac{d\lambda_{m+1}}{\lambda_{m+1}^{n+1}} \right)^{q/2} dx \right]^{2/q} \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right]^{q/2} \\
&= \left[ \int_0^\infty \cdots \int_0^\infty \|G(U_{\lambda_m} \cdots U_{\lambda_1} f)\|_q^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right]^{q/2} \\
&\le C^q \left[ \int_0^\infty \cdots \int_0^\infty \|(U_{\lambda_m} \cdots U_{\lambda_1}) f\|_q^2 \, \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots \frac{d\lambda_m}{\lambda_m^{n+1}} \right]^{q/2} \\
&= C^q \|S^m_{\text{cont},q} f\|^q_{L^2(\mathbb{R}^m_+)} \le C^{(m+1)q} \|f\|_q^q .
\end{aligned} \]
This proves the desired claim. □

2.3 Stability to Dilations

We now consider dilations defined by $\tau(x) = cx$ for some constant $c$, so that $L_\tau f(x) = f((1 - c)x)$. We will start by proving a lemma that will be useful for our work.

Lemma 13. Assume $L_\tau$ is defined as above. Then
\[ L_\tau f * \psi_\lambda(x) = (1 - c)^{-n/2} \left( f * \psi_{(1-c)\lambda} \right)((1 - c)x) . \]

Proof. Notice that
\[ L_\tau f * \psi_\lambda(x) = \int_{\mathbb{R}^n} f((1 - c)y) \psi_\lambda(x - y) \, dy . \]
We make the substitution $z = (1 - c)y$.
Then it follows that
\[ \begin{aligned}
L_\tau f * \psi_\lambda(x) &= (1 - c)^{-n} \int_{\mathbb{R}^n} f(z) \psi_\lambda(x - (1 - c)^{-1} z) \, dz \\
&= (1 - c)^{-n} \int_{\mathbb{R}^n} f(z) \lambda^{-n/2} \psi\left( \lambda^{-1} (x - (1 - c)^{-1} z) \right) dz \\
&= (1 - c)^{-n/2} \int_{\mathbb{R}^n} f(z) [(1 - c)\lambda]^{-n/2} \psi\left( [(1 - c)\lambda]^{-1} ((1 - c)x - z) \right) dz \\
&= (1 - c)^{-n/2} \int_{\mathbb{R}^n} f(z) \psi_{(1-c)\lambda}((1 - c)x - z) \, dz \\
&= (1 - c)^{-n/2} f * \psi_{(1-c)\lambda}((1 - c)x) \\
&= (1 - c)^{-n/2} L_\tau \left( f * \psi_{(1-c)\lambda} \right)(x) .
\end{aligned} \]
□

Remark 6. We also have $L_\tau \mathcal{W}_\lambda f(x) = (f * \psi_\lambda)(x(1 - c))$.

Before we begin the next Lemma, we explain the general idea behind our approach to motivate the necessity of Lemma 14. Define
\[ \Psi(x) = (1 - c)^{-n/2} \psi_{(1-c)}(x) - \psi(x) . \tag{2.18} \]
We want to prove that $\Psi$ satisfies (2.4) and (2.5) with a linear dependence on $c$ for future stability lemmas.

Lemma 14. Suppose that $\psi$ is a wavelet that satisfies the following three conditions:
\[ |\psi(x)| \le \frac{A}{(1 + |x|)^{n+1+\alpha}} , \quad x \in \mathbb{R}^n , \tag{2.19} \]
\[ |\nabla \psi(x)| \le \frac{A}{(1 + |x|)^{n+1+\beta}} , \quad x \in \mathbb{R}^n , \tag{2.20} \]
\[ \|D^2 \psi(x)\|_\infty \le \frac{A}{(1 + |x|)^{n+1+\kappa}} , \quad x \in \mathbb{R}^n , \tag{2.21} \]
for $\alpha, \beta, \kappa > 0$. Consider
\[ \Psi(x) = (1 - c)^{-n/2} \psi_{(1-c)}(x) - \psi(x) \]
for $c < \frac{1}{2n}$. Then $\Psi$ is a wavelet satisfying (2.4) and (2.5).

Proof. Without loss of generality, assume $\alpha < \beta < \kappa < 1$. First, it is clear that $\int_{\mathbb{R}^n} \Psi = 0$. We now just need to verify properties (2.4) and (2.5). Assume $c > 0$. We can modify the proof accordingly if $c < 0$.
Then
\[ \begin{aligned}
|\Psi(x)| &= \left| (1 - c)^{-n/2} \psi_{(1-c)}(x) - \psi(x) \right| \\
&= (1 - c)^{-n} \left| \psi\left( \frac{x}{1 - c} \right) - (1 - c)^n \psi(x) \right| \\
&\le (1 - c)^{-n} \left| \psi\left( \frac{x}{1 - c} \right) - \psi\left( \frac{1 - c}{1 - c} x \right) \right| + (1 - c)^{-n} \sum_{j=1}^n \binom{n}{j} c^j |\psi(x)| .
\end{aligned} \]
Now use the mean value theorem on the first term to choose a point $z$ on the segment connecting $\frac{x}{1-c}$ and $x$ such that
\[ \frac{c}{1 - c} \left| [\nabla \psi(z)]^T x \right| = \left| \psi\left( \frac{x}{1 - c} \right) - \psi\left( \frac{1 - c}{1 - c} x \right) \right| . \]
We now use Cauchy-Schwarz to bound the left side:
\[ \frac{c}{1 - c} \left| [\nabla \psi(z)]^T x \right| \le \frac{c}{1 - c} \frac{A |x|}{(1 + |z|)^{n+1+\beta}} . \]
Since $z$ lies on the segment connecting $\frac{x}{1-c}$ and $x$, we see that for some $t \in [0, 1]$, we have
\[ z = (1 - t) \frac{x}{1 - c} + t x = \frac{1 - t}{1 - c} x + \frac{t - tc}{1 - c} x = \frac{1 - t + t - tc}{1 - c} x = \frac{1 - tc}{1 - c} x . \]
Thus, $|z| \ge |x|$. It now follows that
\[ \frac{c}{1 - c} \frac{A |x|}{(1 + |z|)^{n+1+\beta}} \le \frac{c}{1 - c} \frac{A}{(1 + |x|)^{n+\beta}} . \]
Finally, we get
\[ \begin{aligned}
|\Psi(x)| &\le \frac{c}{(1 - c)^{n+1}} \frac{A}{(1 + |x|)^{n+\beta}} + \frac{\sum_{j=1}^n \binom{n}{j} c^j}{(1 - c)^{n+1}} \frac{A}{(1 + |x|)^{n+\alpha}} \\
&\le 2A \left( \frac{2n}{2n - 1} \right)^{n+1} \frac{\sum_{j=1}^n \binom{n}{j} c^j}{(1 + |x|)^{n+\alpha}} \\
&\le \frac{A_n c}{(1 + |x|)^{n+\alpha}}
\end{aligned} \]
for some constant $A_n$, since we assume $\alpha < \beta$ and $c < \frac{1}{2n}$. Thus, (2.4) is satisfied.

We use a similar idea for proving that (2.5) holds. Assume $c > 0$ without loss of generality and further assume that $|x| \ge 2|y|$. By the Mean Value Theorem, there exists $z$ on the line segment connecting $x$ and $x - y$ such that
\[ |\Psi(x - y) - \Psi(x)| \le |\nabla \Psi(z)| |y| . \]
Like before, we notice that
\[ \begin{aligned}
|\nabla \Psi(z)| &= \left| (1 - c)^{-n/2} \nabla \psi_{(1-c)}(z) - \nabla \psi(z) \right| \\
&= \left| (1 - c)^{-n-1} \nabla \psi\left( \frac{z}{1 - c} \right) - \nabla \psi(z) \right| \\
&= (1 - c)^{-n-1} \left| \nabla \psi\left( \frac{z}{1 - c} \right) - (1 - c)^{n+1} \nabla \psi(z) \right| \\
&\le (1 - c)^{-n-1} \left| \nabla \psi\left( \frac{z}{1 - c} \right) - \nabla \psi\left( \frac{1 - c}{1 - c} z \right) \right| + (1 - c)^{-n-1} \sum_{j=1}^{n+1} \binom{n+1}{j} c^j |\nabla \psi(z)| .
\end{aligned} \]
Let $S$ be the set of points on the segment connecting $\frac{z}{1-c}$ and $z$. By the Mean Value Inequality, since $S$ is closed and bounded, we have
\[ \left| \nabla \psi\left( \frac{z}{1 - c} \right) - \nabla \psi\left( \frac{1 - c}{1 - c} z \right) \right| \le \frac{c}{1 - c} \max_{w \in S} \left\| D^2 \psi(w) \right\|_\infty |z| . \]
The maximum for the quantity above is attained in $S$, so let us say the maximizer is $w_1 = (1 - t) \frac{z}{1-c} + t z$ for some $t \in [0, 1]$. Now use decay of the Hessian to bound the right side:
\[ \frac{c}{1 - c} \max_{w \in S} \left\| D^2 \psi(w) \right\|_\infty |z| \le \frac{c}{1 - c} \frac{A |z|}{(1 + |w_1|)^{n+1+\kappa}} . \]
It follows that
\[ w_1 = (1 - t) \frac{z}{1 - c} + t z = \frac{1 - t}{1 - c} z + \frac{t - tc}{1 - c} z = \frac{1 - t + t - tc}{1 - c} z = \frac{1 - tc}{1 - c} z . \]
Thus, $|w_1| \ge |z|$. We conclude
\[ \frac{c}{1 - c} \frac{A |z|}{(1 + |w_1|)^{n+1+\kappa}} \le \frac{c}{1 - c} \frac{A}{(1 + |z|)^{n+\kappa}} . \]
For bounding $|\nabla \Psi(z)|$, we see
\[ \begin{aligned}
|\nabla \Psi(z)| &\le \frac{c}{(1 - c)^{n+2}} \frac{A}{(1 + |z|)^{n+\kappa}} + \frac{\sum_{j=1}^{n+1} \binom{n+1}{j} c^j}{(1 - c)^{n+1}} \frac{A}{(1 + |z|)^{n+1+\beta}} \\
&\le A (1 - c)^{-n-2} \frac{2 \sum_{j=1}^{n+1} \binom{n+1}{j} c^j}{(1 + |z|)^{n+\kappa}} \\
&\le \left( \frac{2n}{2n - 1} \right)^{n+2} \frac{2A \sum_{j=1}^{n+1} \binom{n+1}{j} c^j}{(1 + |z|)^{n+\kappa}} .
\end{aligned} \]
Going back to proving that (2.5) holds for $\Psi$,
\[ |\Psi(x - y) - \Psi(x)| \le |\nabla \Psi(z)| |y| \le \left( \frac{2n}{2n - 1} \right)^{n+2} \frac{2A \sum_{j=1}^{n+1} \binom{n+1}{j} c^j |y|}{(1 + |z|)^{n+\kappa}} . \]
Since the point $z$ lies on a line segment connecting $x - y$ and $x$ with $|x| \ge 2|y|$, we can use an argument similar to the one above to conclude
\[ |\Psi(x - y) - \Psi(x)| \le 2^{n+1+\kappa} \left( \frac{2n}{2n - 1} \right)^{n+2} \frac{A \sum_{j=1}^{n+1} \binom{n+1}{j} c^j}{(1 + |x|)^{n+\kappa}} |y| . \]
Now integrate to get
\[
\begin{aligned}
\int_{|x|\ge2|y|}|\Psi(x-y)-\Psi(x)|\,dx &\le 2^{n+1+\kappa}\left(\frac{2n}{2n-1}\right)^{n+2}A\sum_{j=1}^{n+1}\binom{n+1}{j}c^j\,|y|\int_{|x|\ge2|y|}\frac{dx}{|x|^{n+\kappa}} \\
&= 2^{n+1+\kappa}\left(\frac{2n}{2n-1}\right)^{n+2}A\,I_n\sum_{j=1}^{n+1}\binom{n+1}{j}c^j\,|y|^{1-\kappa},
\end{aligned}
\]
where $I_n$ is some constant associated with the integration. Finally, we have a bound of
\[
\int_{|x|\ge2|y|}|\Psi(x-y)-\Psi(x)|\,dx \le \tilde{A}_n c\,|y|^{1-\kappa}
\]
for some constant $\tilde{A}_n$ only dependent on the dimension $n$. Thus, (2.5) holds with exponent $1-\kappa\in(0,1)$. Let $\hat{A}_n = \max\{A_n,\tilde{A}_n\}$. It follows that
\[
|\Psi_\lambda(x)| \le \frac{\hat{A}_n c}{(1+|x|)^{n+\alpha}}, \qquad \int_{|x|\ge2|y|}|\Psi(x-y)-\Psi(x)|\,dx \le \hat{A}_n c\,|y|^{1-\kappa}. \qquad\square
\]

It follows from Problem 6.1.2 in [38] that the bound in the $G$-function depends linearly on the constant $A$ from Theorem 5 when proving $L^2(\mathbb{R}^n)$ boundedness. Thus, the following corollaries hold.

Corollary 15. Assume $|c|<\frac{1}{2n}$. For $\psi$ satisfying the conditions of Lemma 14, when $1<p<\infty$, there exist constants $C_{n,p}$ and $\hat{C}_{n,p}$ such that
\[
\left\|\left(\int_0^\infty|f*\Psi_\lambda(x)|^2\,\frac{d\lambda}{\lambda^{n+1}}\right)^{1/2}\right\|_{L^p(\mathbb{R}^n)} \le c\cdot C_{n,p}\max\{p,(p-1)^{-1}\}\,\|f\|_{L^p(\mathbb{R}^n)}
\]
and
\[
\left\|\left(\sum_{j\in\mathbb{Z}}|f*\Psi_j(x)|^2\right)^{1/2}\right\|_{L^p(\mathbb{R}^n)} \le c\cdot\hat{C}_{n,p}\max\{p,(p-1)^{-1}\}\,\|f\|_{L^p(\mathbb{R}^n)}.
\]
Alternatively, if one of the following holds:
• $n=1$, $\psi$ is complex analytic and satisfies the conditions of Lemma 14,
• $n\ge2$ and $\psi$ satisfies the conditions of Lemma 9,
there exist constants $H_n$ and $\hat{H}_n$ such that
\[
\left\|\left(\int_0^\infty|f*\Psi_\lambda(x)|^2\,\frac{d\lambda}{\lambda^{n+1}}\right)^{1/2}\right\|_{L^1(\mathbb{R}^n)} \le c\cdot H_n\,\|f\|_{H^1(\mathbb{R}^n)}
\]
and
\[
\left\|\left(\sum_{j\in\mathbb{Z}}|f*\Psi_j(x)|^2\right)^{1/2}\right\|_{L^1(\mathbb{R}^n)} \le c\cdot\hat{H}_n\,\|f\|_{H^1(\mathbb{R}^n)}.
\]

Now we can use the results above for our main dilation stability results.

Theorem 16. Suppose that $\psi$ is a wavelet that satisfies the conditions of Lemma 14. Then there exist constants $K_{n,m}$ and $\hat{K}_{n,m}$ only dependent on $n$ and $m$ such that
\[
\|S^m_{\mathrm{cont},2}f - S^m_{\mathrm{cont},2}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \le |c|\cdot K_{n,m}\,\|f\|_2
\]
and
\[
\|S^m_{\mathrm{dyad},2}f - S^m_{\mathrm{dyad},2}L_\tau f\|_{\ell^2(\mathbb{Z}^m)} \le |c|\cdot\hat{K}_{n,m}\,\|f\|_2
\]
for any $|c|<\frac{1}{2n}$.

Proof. Without loss of generality, assume $c>0$. Let
\[
\mathcal{W}_t f = f*\psi_t, \qquad Mf = |f|, \qquad U_t = M\mathcal{W}_t, \qquad A_q f = \left(\int_{\mathbb{R}^n}f^q(x)\,dx\right)^{1/q}.
\]
It follows that $S^m_{\mathrm{cont},2} = A_2M\mathcal{W}_{\lambda_m}U_{\lambda_{m-1}}\cdots U_{\lambda_1}$. We will also let $V_{m-1} = U_{\lambda_{m-1}}\cdots U_{\lambda_1}$, with $V_0$ being the identity operator, and make a slight abuse of notation by denoting $\mathcal{W}_{\lambda_m}$ as $\mathcal{W}$.
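To make the operator notation concrete, the following is a minimal numerical sketch of the cascade $A_qM\mathcal{W}_{\lambda_m}U_{\lambda_{m-1}}\cdots U_{\lambda_1}$ in dimension $n=1$. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the Mexican-hat function stands in for $\psi$, the signal lives on a periodic grid, and integrals are replaced by Riemann sums.

```python
import math

def mexican_hat(x):
    """Illustrative zero-mean wavelet (an assumption, not the thesis's psi)."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def wavelet_kernel(t, size, dx):
    """Sample psi_t(x) = t^(-1/2) psi(x / t) (n = 1) at signed periodic offsets."""
    kernel = []
    for j in range(size):
        offset = j if j < size // 2 else j - size  # wrap index to a signed offset
        kernel.append(t ** -0.5 * mexican_hat(offset * dx / t))
    return kernel

def wavelet_transform(f, t, dx):
    """W_t f = f * psi_t, approximated by a periodic Riemann sum."""
    size = len(f)
    kernel = wavelet_kernel(t, size, dx)
    return [dx * sum(f[k] * kernel[(i - k) % size] for k in range(size))
            for i in range(size)]

def modulus(f):
    """M f = |f|."""
    return [abs(v) for v in f]

def propagate(f, t, dx):
    """U_t f = M W_t f."""
    return modulus(wavelet_transform(f, t, dx))

def lq_norm(f, q, dx):
    """A_q f = (integral of |f|^q)^(1/q), approximated by a Riemann sum."""
    return (dx * sum(abs(v) ** q for v in f)) ** (1.0 / q)

def scattering_coefficient(f, scales, q, dx):
    """A_q U_{lambda_m} ... U_{lambda_1} f for scales = (lambda_1, ..., lambda_m)."""
    g = list(f)
    for t in scales:
        g = propagate(g, t, dx)
    return lq_norm(g, q, dx)
```

On a periodic grid this discrete coefficient is unchanged by circular shifts of the input and scales linearly under $f\mapsto af$ with $a>0$, which mirrors the translation invariance and homogeneity of the continuous operator.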
First, we will add and subtract $A_2ML_\tau\mathcal{W}V_{m-1}f$ and apply the triangle inequality:
\[
\begin{aligned}
\|S^m_{\mathrm{cont},2}f - S^m_{\mathrm{cont},2}L_\tau f\|_{L^2(\mathbb{R}^m_+)} &= \|A_2M\mathcal{W}V_{m-1}f - A_2M\mathcal{W}V_{m-1}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \\
&\le \|A_2M\mathcal{W}V_{m-1}f - A_2ML_\tau\mathcal{W}V_{m-1}f\|_{L^2(\mathbb{R}^m_+)} \\
&\quad + \|A_2ML_\tau\mathcal{W}V_{m-1}f - A_2M\mathcal{W}V_{m-1}L_\tau f\|_{L^2(\mathbb{R}^m_+)}.
\end{aligned}
\]
We'll start by bounding the first term. We see that $g = \mathcal{W}V_{m-1}f\in L^2(\mathbb{R}^n)$. Thus
\[
|A_2M\mathcal{W}V_{m-1}f - A_2ML_\tau\mathcal{W}V_{m-1}f| = \big|\|g\|_2 - \|L_\tau g\|_2\big|.
\]
Now use a change of variables:
\[
\|L_\tau g\|_2^2 = \int_{\mathbb{R}^n}|g((1-c)x)|^2\,dx = (1-c)^{-n}\|g\|_2^2.
\]
It then follows that
\[
\big|\|L_\tau g\|_2 - \|g\|_2\big| = \|g\|_2\left(\frac{1}{(1-c)^{n/2}}-1\right) \le \|g\|_2\left(\frac{1}{(1-c)^n}-1\right).
\]
Taking the scattering norm yields
\[
\begin{aligned}
\|A_2M\mathcal{W}V_{m-1}f - A_2ML_\tau\mathcal{W}V_{m-1}f\|^2_{L^2(\mathbb{R}^m_+)} &\le \left(\frac{1}{(1-c)^n}-1\right)^2\|S^m_{\mathrm{cont},2}f\|^2_{L^2(\mathbb{R}^m_+)} \\
&= \left(\frac{1-(1-c)^n}{(1-c)^n}\right)^2\|S^m_{\mathrm{cont},2}f\|^2_{L^2(\mathbb{R}^m_+)} \\
&\le \left(\frac{1}{(1-c)^n}\sum_{j=1}^n\binom{n}{j}c^j\right)^2\|S^m_{\mathrm{cont},2}f\|^2_{L^2(\mathbb{R}^m_+)} \\
&\le \left[\left(\frac{2n}{2n-1}\right)^n\sum_{j=1}^n\binom{n}{j}c^j\right]^2\|S^m_{\mathrm{cont},2}f\|^2_{L^2(\mathbb{R}^m_+)} \\
&\le c^2\cdot C_{m,n}\,\|f\|_2^2.
\end{aligned}
\]
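The chain of elementary inequalities used for the first term can be checked numerically. The sketch below is only a sanity check under the standing assumption $0<c<\frac{1}{2n}$; the function names and the crude constant $(\frac{2n}{2n-1})^n(2^n-1)$ are illustrative choices and do not come from the text.

```python
from math import comb

def dilation_norm_gap(c, n):
    """(1 - c)^(-n) - 1, the factor bounding | ||L_tau g||_2 - ||g||_2 | / ||g||_2."""
    return (1.0 - c) ** (-n) - 1.0

def binomial_bound(c, n):
    """(2n / (2n - 1))^n * sum_{j=1}^n C(n, j) c^j."""
    prefactor = (2.0 * n / (2.0 * n - 1.0)) ** n
    return prefactor * sum(comb(n, j) * c ** j for j in range(1, n + 1))

def linear_constant(n):
    """A crude constant C with binomial_bound(c, n) <= C * c when 0 < c <= 1."""
    # Each c^j is at most c for 0 < c <= 1, so the sum is at most (2^n - 1) * c.
    return (2.0 * n / (2.0 * n - 1.0)) ** n * (2 ** n - 1)
```

For example, with $n=1$ and $c=0.45$ the gap is $0.45/0.55\approx0.82$ while the binomial bound is $0.9$, so the estimate is not far from sharp near the endpoint $c=\frac{1}{2n}$.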
For the second term, apply Minkowski's inequality for 2-norms:
\[
\begin{aligned}
&\|A_2ML_\tau\mathcal{W}V_{m-1}f - A_2M\mathcal{W}V_{m-1}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \\
&\qquad= \left(\int_0^\infty\!\!\cdots\!\int_0^\infty\big|\|L_\tau\mathcal{W}V_{m-1}f\|_2 - \|\mathcal{W}V_{m-1}L_\tau f\|_2\big|^2\,\frac{d\lambda_1}{\lambda_1^{n+1}}\cdots\frac{d\lambda_m}{\lambda_m^{n+1}}\right)^{1/2} \\
&\qquad\le \left(\int_0^\infty\!\!\cdots\!\int_0^\infty\|L_\tau\mathcal{W}V_{m-1}f - \mathcal{W}V_{m-1}L_\tau f\|_2^2\,\frac{d\lambda_1}{\lambda_1^{n+1}}\cdots\frac{d\lambda_m}{\lambda_m^{n+1}}\right)^{1/2} \\
&\qquad= \|A_2M[\mathcal{W}V_{m-1},L_\tau]f\|_{L^2(\mathbb{R}^m_+)}.
\end{aligned}
\]
Now this is a commutator term, and we can bound it:
\[
\begin{aligned}
\|A_2M[\mathcal{W}V_{m-1},L_\tau]f\|^2_{L^2(\mathbb{R}^m_+)} &= \int_0^\infty\!\!\cdots\!\int_0^\infty\|[\mathcal{W}V_{m-1},L_\tau]f\|_2^2\,\frac{d\lambda_1}{\lambda_1^{n+1}}\cdots\frac{d\lambda_m}{\lambda_m^{n+1}} \\
&= \|[\mathcal{W}V_{m-1},L_\tau]f\|^2_{L^2(\mathbb{R}^m_+\times\mathbb{R}^n)} \\
&\le \|[\mathcal{W}V_{m-1},L_\tau]\|^2_{L^2(\mathbb{R}^m_+\times\mathbb{R}^n)\to L^2(\mathbb{R}^n)}\,\|f\|_2^2.
\end{aligned}
\]
We examine the commutator term more closely. Without loss of generality, assume $m\ge2$. By expanding it, we see that each term contains $[\mathcal{W},L_\tau]$. It follows that
\[
\begin{aligned}
\|[\mathcal{W}V_{m-1},L_\tau]\|_{L^2(\mathbb{R}^m_+\times\mathbb{R}^n)\to L^2(\mathbb{R}^n)} &\le m\,\|\mathcal{W}\|^{m-1}_{L^2(\mathbb{R}_+\times\mathbb{R}^n)\to L^2(\mathbb{R}^n)}\,\|M\|^{m-1}_{L^2(\mathbb{R}^n)\to L^2(\mathbb{R}^n)}\,\|[\mathcal{W},L_\tau]\|_{L^2(\mathbb{R}_+\times\mathbb{R}^n)\to L^2(\mathbb{R}^n)} \\
&\le C_m\,\|[\mathcal{W},L_\tau]\|_{L^2(\mathbb{R}_+\times\mathbb{R}^n)\to L^2(\mathbb{R}^n)}.
\end{aligned}
\]
Thus, once we bound this quantity appropriately, we will finish the proof. We start by writing
\[
\|[\mathcal{W},L_\tau]f\|^2_{L^2(\mathbb{R}_+\times\mathbb{R}^n)} = \int_0^\infty\|(L_\tau f)*\psi_\lambda - L_\tau(f*\psi_\lambda)\|_2^2\,\frac{d\lambda}{\lambda^{n+1}}.
\]
By substitution with $z=(1-c)x$ and Lemma 13,
\[
\begin{aligned}
\|(L_\tau f)*\psi_\lambda - L_\tau(f*\psi_\lambda)\|_2^2 &= \int_{\mathbb{R}^n}\left|(L_\tau f*\psi_\lambda)(x) - L_\tau(f*\psi_\lambda)(x)\right|^2dx \\
&= \int_{\mathbb{R}^n}\left|(1-c)^{-n/2}\big(f*\psi_{(1-c)\lambda}\big)((1-c)x) - (f*\psi_\lambda)((1-c)x)\right|^2dx \\
&= (1-c)^{-n}\int_{\mathbb{R}^n}\left|(1-c)^{-n/2}\big(f*\psi_{(1-c)\lambda}\big)(z) - (f*\psi_\lambda)(z)\right|^2dz \\
&= (1-c)^{-n}\int_{\mathbb{R}^n}\left|f*\big((1-c)^{-n/2}\psi_{(1-c)\lambda} - \psi_\lambda\big)(z)\right|^2dz \\
&= (1-c)^{-n}\int_{\mathbb{R}^n}|(f*\Psi_\lambda)(z)|^2\,dz \\
&= (1-c)^{-n}\,\|f*\Psi_\lambda\|_2^2.
\end{aligned}
\]
Thus, we obtain
\[
\begin{aligned}
\int_0^\infty\|(L_\tau f)*\psi_\lambda - L_\tau(f*\psi_\lambda)\|_2^2\,\frac{d\lambda}{\lambda^{n+1}} &= (1-c)^{-n}\int_0^\infty\|f*\Psi_\lambda\|_2^2\,\frac{d\lambda}{\lambda^{n+1}} \\
&= (1-c)^{-n}\int_{\mathbb{R}^n}\int_0^\infty|f*\Psi_\lambda(x)|^2\,\frac{d\lambda}{\lambda^{n+1}}\,dx \\
&= (1-c)^{-n}\left\|\left(\int_0^\infty|f*\Psi_\lambda(x)|^2\,\frac{d\lambda}{\lambda^{n+1}}\right)^{1/2}\right\|_2^2 \\
&\le c^2\cdot\left(\frac{2n}{2n-1}\right)^n C_{n,p}\,\|f\|_2^2.
\end{aligned}
\]
It follows that
\[
\|S^m_{\mathrm{cont},2}f - S^m_{\mathrm{cont},2}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \le |c|\cdot K_{n,m}\,\|f\|_2
\]
for any $c<\frac{1}{2n}$. $\square$

As is customary at this point, we have the following corollaries. We start with the case where $1<q<2$.

Corollary 17. Assume $|c|<\frac{1}{2n}$.
For $q\in(1,2)$, there exist constants $K_{n,m,q}$ and $\hat{K}_{n,m,q}$ such that
\[
\|S^m_{\mathrm{cont},q}f - S^m_{\mathrm{cont},q}L_\tau f\|^q_{L^2(\mathbb{R}^m_+)} \le |c|^q\cdot K_{n,m,q}\,\|f\|_q^q
\]
and
\[
\|S^m_{\mathrm{dyad},q}f - S^m_{\mathrm{dyad},q}L_\tau f\|^q_{\ell^2(\mathbb{Z}^m)} \le |c|^q\cdot\hat{K}_{n,m,q}\,\|f\|_q^q.
\]

Proof. Without loss of generality again, assume $c>0$. First, we will add and subtract $A_qML_\tau\mathcal{W}V_{m-1}f$ and apply the triangle inequality:
\[
\begin{aligned}
\|S^m_{\mathrm{cont},q}f - S^m_{\mathrm{cont},q}L_\tau f\|_{L^2(\mathbb{R}^m_+)} &= \|A_qM\mathcal{W}V_{m-1}f - A_qM\mathcal{W}V_{m-1}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \\
&\le \|A_qM\mathcal{W}V_{m-1}f - A_qML_\tau\mathcal{W}V_{m-1}f\|_{L^2(\mathbb{R}^m_+)} \\
&\quad + \|A_qML_\tau\mathcal{W}V_{m-1}f - A_qM\mathcal{W}V_{m-1}L_\tau f\|_{L^2(\mathbb{R}^m_+)}.
\end{aligned}
\]
We'll start by bounding the first term again. Define $g = \mathcal{W}V_{m-1}f\in L^q(\mathbb{R}^n)$, and we have
\[
|A_qM\mathcal{W}V_{m-1}f - A_qML_\tau\mathcal{W}V_{m-1}f| = \big|\|g\|_q - \|L_\tau g\|_q\big|.
\]
By change of variables,
\[
\big|\|g\|_q - \|L_\tau g\|_q\big| = \|g\|_q\left(\frac{1}{(1-c)^{n/q}}-1\right) \le \|g\|_q\left(\frac{1}{(1-c)^n}-1\right).
\]
Again, we have
\[
\begin{aligned}
\|A_qM\mathcal{W}V_{m-1}f - A_qML_\tau\mathcal{W}V_{m-1}f\|^q_{L^2(\mathbb{R}^m_+)} &\le \left(\frac{1}{(1-c)^{n/q}}-1\right)^q\|S^m_{\mathrm{cont},q}f\|^q_{L^2(\mathbb{R}^m_+)} \\
&\le \left(\frac{1}{(1-c)^n}-1\right)^q\|S^m_{\mathrm{cont},q}f\|^q_{L^2(\mathbb{R}^m_+)} \\
&= \left[\frac{1-(1-c)^n}{(1-c)^n}\right]^q\|S^m_{\mathrm{cont},q}f\|^q_{L^2(\mathbb{R}^m_+)} \\
&\le \left[\frac{1}{(1-c)^n}\sum_{j=1}^n\binom{n}{j}c^j\right]^q\|S^m_{\mathrm{cont},q}f\|^q_{L^2(\mathbb{R}^m_+)} \\
&\le \left[\left(\frac{2n}{2n-1}\right)^n\sum_{j=1}^n\binom{n}{j}c^j\right]^q\|S^m_{\mathrm{cont},q}f\|^q_{L^2(\mathbb{R}^m_+)} \\
&\le |c|^q\cdot C_{m,n}\,\|f\|_q^q.
\end{aligned}
\]
For the second term, apply Minkowski's inequality for $q$-norms:
\[
\begin{aligned}
&\|A_qML_\tau\mathcal{W}V_{m-1}f - A_qM\mathcal{W}V_{m-1}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \\
&\qquad= \left(\int_0^\infty\!\!\cdots\!\int_0^\infty\big|\|L_\tau\mathcal{W}V_{m-1}f\|_q - \|\mathcal{W}V_{m-1}L_\tau f\|_q\big|^2\,\frac{d\lambda_1}{\lambda_1^{n+1}}\cdots\frac{d\lambda_m}{\lambda_m^{n+1}}\right)^{1/2} \\
&\qquad\le \left(\int_0^\infty\!\!\cdots\!\int_0^\infty\|L_\tau\mathcal{W}V_{m-1}f - \mathcal{W}V_{m-1}L_\tau f\|_q^2\,\frac{d\lambda_1}{\lambda_1^{n+1}}\cdots\frac{d\lambda_m}{\lambda_m^{n+1}}\right)^{1/2} \\
&\qquad= \|A_qM[\mathcal{W}V_{m-1},L_\tau]f\|_{L^2(\mathbb{R}^m_+)}.
\end{aligned}
\]
Via a reduction similar to the one in the proof of Theorem 16, we can reduce to a commutator bound as above. Additionally, we have
\[
\|(L_\tau f)*\psi_\lambda - L_\tau(f*\psi_\lambda)\|_q^q = (1-c)^{-n}\,\|f*\Psi_\lambda\|_q^q.
\]
Thus,
\[
\begin{aligned}
\|A_qM[\mathcal{W},L_\tau]f\|^q_{L^2(\mathbb{R}^m_+)} &= \left(\int_0^\infty\|(L_\tau f)*\psi_\lambda - L_\tau(f*\psi_\lambda)\|_q^2\,\frac{d\lambda}{\lambda^{n+1}}\right)^{q/2} \\
&= (1-c)^{-n}\left(\int_0^\infty\|f*\Psi_\lambda\|_q^2\,\frac{d\lambda}{\lambda^{n+1}}\right)^{q/2} \\
&\le (1-c)^{-n}\left\|\left(\int_0^\infty|f*\Psi_\lambda(x)|^2\,\frac{d\lambda}{\lambda^{n+1}}\right)^{1/2}\right\|_q^q \\
&\le |c|^q\cdot\tilde{C}_n\,\|f\|_q^q.
\end{aligned}
\]
It follows that
\[
\|S^m_{\mathrm{cont},q}f - S^m_{\mathrm{cont},q}L_\tau f\|^q_{L^2(\mathbb{R}^m_+)} \le |c|^q\cdot K_{n,m,q}\,\|f\|_q^q
\]
for any $|c|<\frac{1}{2n}$. $\square$

Additionally, for the case of $q=1$, we have the following corollary that we will state, but not prove, since the idea is the same as the previous corollary.

Corollary 18. Suppose one of the following holds:
• $n=1$, $\psi$ is complex analytic and satisfies the conditions of Lemma 14,
• $n\ge2$ and $\psi$ satisfies the conditions of Lemma 9,
then there exist constants $K_{H,m}$ and $\hat{K}_{H,m}$ such that
\[
\|S^m_{\mathrm{cont},1}f - S^m_{\mathrm{cont},1}L_\tau f\|_{L^2(\mathbb{R}^m_+)} \le c\cdot K_{H,m}\,\|f\|_{H^1(\mathbb{R}^n)}
\]
and
\[
\|S^m_{\mathrm{dyad},1}f - S^m_{\mathrm{dyad},1}L_\tau f\|_{\ell^2(\mathbb{Z}^m)} \le c\cdot\hat{K}_{H,m}\,\|f\|_{H^1(\mathbb{R}^n)}.
\]

2.4 Stability to Diffeomorphisms

We now focus on the stability of $S^m_{\mathrm{cont},q}f$ for general diffeomorphisms with $\|D\tau\|_\infty<\frac{1}{2n}$. The corresponding operator for diffeomorphisms is defined as $L_\tau f(x) = f(x-\tau(x))$.

2.4.1 Stability to Diffeomorphisms When $q=2$

Proposition 19. Assume $\psi$ and its first and second order derivatives have decay* in $O((1+|x|)^{-n-3})$, and $\int_{\mathbb{R}^n}\psi(x)\,dx=0$.
Then for every $\tau\in C^2(\mathbb{R}^n)$ with $\|D\tau\|_\infty\le\frac{1}{2n}$, there exists $\tilde{C}_n>0$ such that:
\[
\|[\mathcal{W},L_\tau]\|_{L^2(\mathbb{R}_+\times\mathbb{R}^n)\to L^2(\mathbb{R}^n)} \le \tilde{C}_n\left(\|D\tau\|_\infty\left(\log\frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty}\vee1\right) + \|D^2\tau\|_\infty\right).
\]

* Similar to [31], we have found that there needs to be $O((1+|x|)^{-n-2+\alpha})$ decay for some $\alpha>0$ to bound (E.26) in [11].

Proof. The argument is a continuous version of Lemma 2.14 in [11]. We will first show how to transform our commutator term into an analogous commutator term from [11]. To shorten notation, we will denote $\|[\mathcal{W},L_\tau]\|_{L^2(\mathbb{R}_+\times\mathbb{R}^n)}$ as $\|[\mathcal{W},L_\tau]\|$. We have
\[
\begin{aligned}
\|[\mathcal{W},L_\tau]f\|^2_{L^2(\mathbb{R}_+\times\mathbb{R}^n)} &= \int_0^\infty\|[\mathcal{W}_t,L_\tau]f\|_2^2\,\frac{dt}{t^{n+1}} \\
&= \int_0^\infty\|\psi_t*(L_\tau f) - L_\tau(\psi_t*f)\|_2^2\,\frac{dt}{t^{n+1}} \\
&= \int_0^\infty\int_{\mathbb{R}^n}|\psi_t*(L_\tau f) - L_\tau(\psi_t*f)|^2\,dx\,\frac{dt}{t^{n+1}}.
\end{aligned}
\]
Notice that $\psi_{1/t}(x) = t^{n/2}\psi(tx)$. Use the change of variables $t=\frac{1}{\lambda}$ to get
\[
\begin{aligned}
\|[\mathcal{W},L_\tau]f\|^2_{L^2(\mathbb{R}_+\times\mathbb{R}^n)} &= \int_0^\infty\left\|\psi_{1/\lambda}*(L_\tau f) - L_\tau(\psi_{1/\lambda}*f)\right\|_2^2\,\lambda^{n-1}\,d\lambda \\
&= \int_0^\infty\left\|\lambda^{n/2}\psi_{1/\lambda}*(L_\tau f) - L_\tau(\lambda^{n/2}\psi_{1/\lambda}*f)\right\|_2^2\,\frac{d\lambda}{\lambda}.
\end{aligned}
\]
Define $\mathscr{W}_\lambda f = f*\lambda^{n/2}\psi_{1/\lambda}$ with $\lambda^{n/2}\psi_{1/\lambda}(x) = \lambda^n\psi(\lambda x)$. In other words, $\mathscr{W}_\lambda$ is a convolution with an $L^1$-normalized wavelet, which matches the normalization in [11]. Now we have
\[
\|[\mathcal{W},L_\tau]f\|^2_{L^2(\mathbb{R}_+\times\mathbb{R}^n)} = \int_0^\infty\|[\mathscr{W}_\lambda,L_\tau]f\|_2^2\,\frac{d\lambda}{\lambda}.
\]
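The renormalization step above can be illustrated numerically in dimension $n=1$. The sketch below assumes a Mexican-hat function as a stand-in for $\psi$ (a hypothetical choice, not the thesis's wavelet) and checks that $\lambda^{n/2}\psi_{1/\lambda}(x)=\lambda^n\psi(\lambda x)$ and that the $L^1$ norm of this kernel does not depend on $\lambda$.

```python
import math

def psi(x):
    """Illustrative mother wavelet (Mexican hat); an assumption for this sketch."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def psi_dilated(t, x):
    """psi_t(x) = t^(-1/2) psi(x / t), the L^2-normalized dilation for n = 1."""
    return t ** -0.5 * psi(x / t)

def l1_normalized(lam, x):
    """lambda^(1/2) psi_{1/lambda}(x), which should equal lambda * psi(lambda * x)."""
    return lam ** 0.5 * psi_dilated(1.0 / lam, x)

def l1_norm(func, half_width=60.0, steps=120000):
    """Midpoint Riemann sum of |func| over [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    return dx * sum(abs(func(-half_width + (k + 0.5) * dx)) for k in range(steps))
```

The scale-independent $L^1$ norm is what makes the measure $\frac{d\lambda}{\lambda}$, rather than $\frac{d\lambda}{\lambda^{n+1}}$, the natural one after the change of variables.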
This implies
\[
[\mathcal{W},L_\tau]^*[\mathcal{W},L_\tau] = \int_0^\infty[\mathscr{W}_\lambda,L_\tau]^*[\mathscr{W}_\lambda,L_\tau]\,\frac{d\lambda}{\lambda}.
\]
Defining $K_\lambda = \mathscr{W}_\lambda - L_\tau\mathscr{W}_\lambda L_\tau^{-1}$ so that $[\mathscr{W}_\lambda,L_\tau] = K_\lambda L_\tau$, we have:
\[
\begin{aligned}
\|[\mathcal{W},L_\tau]\| &= \|[\mathcal{W},L_\tau]^*[\mathcal{W},L_\tau]\|^{1/2} \\
&= \left\|\int_0^\infty[\mathscr{W}_\lambda,L_\tau]^*[\mathscr{W}_\lambda,L_\tau]\,\frac{d\lambda}{\lambda}\right\|^{1/2} \\
&= \left\|\int_0^\infty L_\tau^*K_\lambda^*K_\lambda L_\tau\,\frac{d\lambda}{\lambda}\right\|^{1/2} \\
&\le \|L_\tau\|\cdot\left\|\int_0^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2}.
\end{aligned}
\]
Since $\|L_\tau f\|_2^2\le\left(\frac{1}{1-n\|D\tau\|_\infty}\right)\|f\|_2^2$,
\[
\|L_\tau\| \le \frac{1}{1-n\|D\tau\|_\infty} \le 2,
\]
and the problem is reduced to bounding $\left\|\int_0^\infty K_\lambda^*K_\lambda\,\lambda^{-1}\,d\lambda\right\|^{1/2}$. Let $\gamma\ge1$.
The integral is divided into three pieces:
\[
\begin{aligned}
\left\|\int_0^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} &\le \left(\left\|\int_0^{2^{-\gamma}}K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\| + \left\|\int_{2^{-\gamma}}^1K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\| + \left\|\int_1^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|\right)^{1/2} \\
&\le \left\|\int_0^{2^{-\gamma}}K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} + \left\|\int_{2^{-\gamma}}^1K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} + \left\|\int_1^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} \\
&= P_1 + P_2 + P_3.
\end{aligned}
\]
To bound $P_1$, we decompose $K_\lambda = \tilde{K}_{\lambda,1} + \tilde{K}_{\lambda,2}$, where the kernels defining $\tilde{K}_{\lambda,1}$, $\tilde{K}_{\lambda,2}$ are
\[
\begin{aligned}
\tilde{k}_{\lambda,1}(x,u) &:= (1-\det(I-D\tau(u)))\,\lambda^n\psi(\lambda(x-u)) := a(u)\,\lambda^n\psi(\lambda(x-u)), \\
\tilde{k}_{\lambda,2}(x,u) &:= \det(I-D\tau(u))\big(\lambda^n\psi(\lambda(x-u)) - \lambda^n\psi(\lambda(x-\tau(x)-u+\tau(u)))\big),
\end{aligned}
\]
respectively. Since our normalization matches that of [11], (E.13) implies that there exists a constant $C_n$ such that
\[
\|\tilde{K}_{\lambda,2}\| \le C_n\lambda\,\|\Delta\tau\|_\infty.
\]
We want to prove that
\[
\left\|\int_0^1\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} \le C_n\,\|D\tau\|_\infty.
\]
Let $f\in L^2(\mathbb{R}^n)$ be arbitrary and define $\tilde\psi(t) = \psi^*(-t)$.
Based on [11], the kernel of $\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}$ is given by
\[
\tilde{k}_\lambda(y,z) := a(y)\,a(z)\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(z-y).
\]
Thus, it is sufficient to bound the quantity
\[
\int_0^1\|\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}f\|_2^2\,\frac{d\lambda}{\lambda}.
\]
We see that $\|a\|_\infty\le n\|D\tau\|_\infty$. Substituting in the kernel and bounding yields
\[
\begin{aligned}
\int_0^1\|\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}f\|_2^2\,\frac{d\lambda}{\lambda} &= \int_0^1\int_{\mathbb{R}^n}\left|\int_{\mathbb{R}^n}a(y)a(z)\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(z-y)\,f(y)\,dy\right|^2dz\,\frac{d\lambda}{\lambda} \\
&= \int_0^1\int_{\mathbb{R}^n}|a(z)|^2\left|\int_{\mathbb{R}^n}a(y)\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(z-y)\,f(y)\,dy\right|^2dz\,\frac{d\lambda}{\lambda} \\
&\le n^2\|D\tau\|_\infty^2\int_0^1\int_{\mathbb{R}^n}\left|\int_{\mathbb{R}^n}a(y)\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(z-y)\,f(y)\,dy\right|^2dz\,\frac{d\lambda}{\lambda}.
\end{aligned}
\]
Let $F(y) = a(y)f(y)\in L^2(\mathbb{R}^n)$ and let $\mathcal{F}$ denote the Fourier transform.
Then we substitute $F(y)$ for $a(y)f(y)$ in the last line of the inequality above to get
\[
\begin{aligned}
&n^2\|D\tau\|_\infty^2\int_0^1\int_{\mathbb{R}^n}\left|\int_{\mathbb{R}^n}a(y)\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(z-y)\,f(y)\,dy\right|^2dz\,\frac{d\lambda}{\lambda} \\
&\qquad= n^2\|D\tau\|_\infty^2\int_0^1\int_{\mathbb{R}^n}\left|\int_{\mathbb{R}^n}\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(z-y)\,F(y)\,dy\right|^2dz\,\frac{d\lambda}{\lambda} \\
&\qquad= n^2\|D\tau\|_\infty^2\int_0^1\int_{\mathbb{R}^n}\left|\mathcal{F}\!\left(\lambda^{n/2}\tilde\psi_{1/\lambda}*\lambda^{n/2}\psi_{1/\lambda}\right)(\omega)\,\hat{F}(\omega)\right|^2d\omega\,\frac{d\lambda}{\lambda} \\
&\qquad= n^2\|D\tau\|_\infty^2\int_{\mathbb{R}^n}|\hat{F}(\omega)|^2\left(\int_0^1\left|\hat\psi\!\left(\frac{\omega}{\lambda}\right)\right|^4\frac{d\lambda}{\lambda}\right)d\omega.
\end{aligned}
\]
To finish up the argument, we make a substitution to rewrite
\[
\int_0^1\left|\hat\psi\!\left(\frac{\omega}{\lambda}\right)\right|^4\frac{d\lambda}{\lambda} = \int_1^\infty|\hat\psi(\lambda\omega)|^4\,\frac{d\lambda}{\lambda}.
\]
Using our decay assumptions on $\psi$ and its partial derivatives, from Problem 6.1.3 in [38], we know that $|\hat\psi(\omega)|\le M_\psi\min\{|\omega|,|\omega|^{-2}\}$ for some constant $M_\psi$. Now, consider the quantity
\[
\int_0^\infty|\hat\psi(\lambda\omega)|^4\,\frac{d\lambda}{\lambda}.
\]
Without loss of generality, assume that $|\omega|=1$, since dilations of $\omega$ do not change the integral. It follows that
\[
\int_0^\infty|\hat\psi(\lambda\omega)|^4\,\frac{d\lambda}{\lambda} \le M_\psi^4\int_0^1\lambda^3\,d\lambda + M_\psi^4\int_1^\infty\lambda^{-9}\,d\lambda < \infty,
\]
and we conclude that
\[
\int_1^\infty|\hat\psi(\lambda\omega)|^4\,\frac{d\lambda}{\lambda} \le A_\psi
\]
for some constant $A_\psi$.
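The finiteness of the scale integral can be made explicit. With the envelope $\min\{\lambda,\lambda^{-2}\}$ and the normalization $M_\psi=1$ (an assumption made only for this sketch), the two pieces evaluate to $\frac14$ and $\frac18$. The following numerical check integrates in the variable $s=\ln\lambda$; the function names are illustrative.

```python
import math

def fourier_decay_envelope(r):
    """min{r, r^(-2)}: the assumed envelope of |psi_hat| at radius r, with M_psi = 1."""
    return min(r, r ** -2.0)

def log_scale_integral(lower, upper, steps=40000):
    """Midpoint rule for the integral of envelope(lam)^4 d(lam)/lam over [lower, upper].

    The substitution s = ln(lam) turns d(lam)/lam into ds.
    """
    a, b = math.log(lower), math.log(upper)
    h = (b - a) / steps
    return h * sum(fourier_decay_envelope(math.exp(a + (k + 0.5) * h)) ** 4
                   for k in range(steps))
```

Under this normalization, any $A_\psi\ge\frac18$ would serve for the integral over $[1,\infty)$, and $\frac38$ bounds the integral over all of $(0,\infty)$.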
To finish up,
\[
\begin{aligned}
n^2\|D\tau\|_\infty^2\int_{\mathbb{R}^n}|\hat{F}(\omega)|^2\left(\int_0^1\left|\hat\psi\!\left(\frac{\omega}{\lambda}\right)\right|^4\frac{d\lambda}{\lambda}\right)d\omega &\le n^2\|D\tau\|_\infty^2A_\psi\int_{\mathbb{R}^n}|\hat{F}(\omega)|^2\,d\omega \\
&\le n^2\|D\tau\|_\infty^2A_\psi\int_{\mathbb{R}^n}|a(z)f(z)|^2\,dz \\
&\le n^2\|D\tau\|_\infty^2A_\psi\,\|f\|_2^2.
\end{aligned}
\]
Thus, we have the desired bound on
\[
\left\|\int_0^1\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2}.
\]
Substituting everything in yields
\[
\begin{aligned}
\left\|\int_0^{2^{-\gamma}}K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} &= \left\|\int_0^{2^{-\gamma}}(\tilde{K}_{\lambda,1}+\tilde{K}_{\lambda,2})^*(\tilde{K}_{\lambda,1}+\tilde{K}_{\lambda,2})\,\frac{d\lambda}{\lambda}\right\|^{1/2} \\
&= \left\|\int_0^{2^{-\gamma}}\big(\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1} + \tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,2} + \tilde{K}_{\lambda,2}^*\tilde{K}_{\lambda,1} + \tilde{K}_{\lambda,2}^*\tilde{K}_{\lambda,2}\big)\,\frac{d\lambda}{\lambda}\right\|^{1/2} \\
&\le \left(\left\|\int_0^{2^{-\gamma}}\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\| + \left\|\int_0^{2^{-\gamma}}\big(\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,2} + \tilde{K}_{\lambda,2}^*\tilde{K}_{\lambda,1} + \tilde{K}_{\lambda,2}^*\tilde{K}_{\lambda,2}\big)\,\frac{d\lambda}{\lambda}\right\|\right)^{1/2} \\
&\le \left(\left\|\int_0^{2^{-\gamma}}\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\| + \int_0^{2^{-\gamma}}\|\tilde{K}_{\lambda,2}\|^2\,\frac{d\lambda}{\lambda} + \int_0^{2^{-\gamma}}2\|\tilde{K}_{\lambda,1}\|\|\tilde{K}_{\lambda,2}\|\,\frac{d\lambda}{\lambda}\right)^{1/2} \\
&\le \left\|\int_0^{2^{-\gamma}}\tilde{K}_{\lambda,1}^*\tilde{K}_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} + \left(\int_0^{2^{-\gamma}}\|\tilde{K}_{\lambda,2}\|^2\,\frac{d\lambda}{\lambda}\right)^{1/2} + \left(\int_0^{2^{-\gamma}}2\|\tilde{K}_{\lambda,1}\|\|\tilde{K}_{\lambda,2}\|\,\frac{d\lambda}{\lambda}\right)^{1/2} \\
&\le 2C_n\left(\|D\tau\|_\infty + \|\Delta\tau\|_\infty\left(\int_0^{2^{-\gamma}}\lambda^2\,\frac{d\lambda}{\lambda}\right)^{1/2} + \|D\tau\|_\infty^{1/2}\|\Delta\tau\|_\infty^{1/2}\left(\int_0^{2^{-\gamma}}2\lambda\,\frac{d\lambda}{\lambda}\right)^{1/2}\right) \\
&\le 2C_n\left(\|D\tau\|_\infty + 2^{-\gamma}\|\Delta\tau\|_\infty + 2^{-\gamma/2}\|D\tau\|_\infty^{1/2}\|\Delta\tau\|_\infty^{1/2}\right) \\
&\le 4C_n\left(\|D\tau\|_\infty + 2^{-\gamma}\|\Delta\tau\|_\infty\right).
\end{aligned}
\]
To bound $P_3$, we decompose $K_\lambda = K_{\lambda,1} + K_{\lambda,2}$, where the kernels defining $K_{\lambda,1}$, $K_{\lambda,2}$ are
\[
\begin{aligned}
k_{\lambda,1}(x,u) &= \lambda^n\psi(\lambda(x-u)) - \lambda^n\psi(\lambda(I-D\tau(u))(x-u))\,\det(I-D\tau(u)), \\
k_{\lambda,2}(x,u) &= \det(I-D\tau(u))\big(\lambda^n\psi(\lambda(I-D\tau(u))(x-u)) - \lambda^n\psi(\lambda(x-\tau(x)-u+\tau(u)))\big).
\end{aligned}
\]
A computation similar to the one for $P_1$ shows that:
\[
\left\|\int_1^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} \le \left\|\int_1^\infty K_{\lambda,1}^*K_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} + \left(\int_1^\infty\|K_{\lambda,2}\|^2\,\frac{d\lambda}{\lambda}\right)^{1/2} + \left(\int_1^\infty2\|K_{\lambda,1}\|\|K_{\lambda,2}\|\,\frac{d\lambda}{\lambda}\right)^{1/2}.
\]
Letting $Q_j = K_{2^j,1}^*K_{2^j,1}$, it is shown in [11] that:
\[
\begin{aligned}
\|K_{\lambda,1}\| &\le C_n\,\|D\tau\|_\infty, \\
\|K_{\lambda,2}\| &\le \min\{\lambda^{-n}\|D^2\tau\|_\infty,\ \|D\tau\|_\infty\}, \\
\|Q_jQ_\ell\| &\le C_n^2\,2^{-|j-\ell|}\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty)^4,
\end{aligned}
\]
so that
\[
\left\|\int_1^\infty K_{\lambda,1}^*K_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} = \left\|\int_0^\infty K_{2^j,1}^*K_{2^j,1}\,\log(2)\,dj\right\|^{1/2} = \sqrt{\log(2)}\,\left\|\int_0^\infty Q_j\,dj\right\|^{1/2}.
\]
We now apply a continuous version of Cotlar's Lemma (see Ch. 7 of [42], Sec. 5.5 for the continuous extension). We define:
\[
\beta(j,\ell) = \begin{cases} C_n\,2^{-|j-\ell|/2}\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty)^2 & j\ge0 \text{ and } \ell\ge0, \\ 0 & \text{otherwise.} \end{cases}
\]
Defining $Q_j=0$ for $j<0$, we have $\|Q_j^*Q_\ell\|\le\beta(j,\ell)^2$ and $\|Q_jQ_\ell^*\|\le\beta(j,\ell)^2$ for all $j,\ell$. Thus by Cotlar's Lemma:
\[
\begin{aligned}
\left\|\int_{\mathbb{R}}Q_j\,dj\right\| &\le \sup_{j\in\mathbb{R}}\int_{\mathbb{R}}\beta(j,\ell)\,d\ell, \\
\left\|\int_0^\infty Q_j\,dj\right\| &\le \sup_{j\ge0}\int_0^\infty\beta(j,\ell)\,d\ell \le C_n\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty)^2\left(\sup_{j\ge0}\int_0^\infty2^{-|j-\ell|/2}\,d\ell\right).
\end{aligned}
\]
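The supremum in the last bound is finite, and the text evaluates it next by a change of variables. As an independent cross-check, splitting the integral at $\ell=j$ gives $\int_0^\infty2^{-|j-\ell|/2}\,d\ell = (4-2\cdot2^{-j/2})/\ln2$, which increases to $4/\ln2$ as $j\to\infty$. The sketch below compares this closed form with a midpoint-rule approximation; the truncation length and step size are arbitrary illustrative choices.

```python
import math

def kernel_integral_closed_form(j):
    """(4 - 2 * 2^(-j/2)) / ln 2: the integral of 2^(-|j - l|/2) over l >= 0."""
    return (4.0 - 2.0 * 2.0 ** (-j / 2.0)) / math.log(2.0)

def kernel_integral_numeric(j, tail=80.0, step=1e-3):
    """Midpoint rule on [0, j + tail]; the neglected tail is of order 2^(-tail/2)."""
    steps = int(round((j + tail) / step))
    h = (j + tail) / steps
    return h * sum(2.0 ** (-abs(j - (k + 0.5) * h) / 2.0) for k in range(steps))
```

Since the closed form is strictly increasing in $j$, the supremum over $j\ge0$ is approached but not attained.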
Now observing that with the change of variable $\lambda_1=2^j$, $\lambda_2=2^\ell$, we have $2^{-|j-\ell|/2} = \sqrt{\frac{\lambda_1}{\lambda_2}\wedge\frac{\lambda_2}{\lambda_1}} = \frac{\lambda_1\wedge\lambda_2}{\sqrt{\lambda_1\lambda_2}}$, we obtain:
\[
\begin{aligned}
\sup_{j\ge0}\int_0^\infty2^{-|j-\ell|/2}\,d\ell &= \sup_{\lambda_1\ge1}\int_1^\infty\frac{\lambda_1\wedge\lambda_2}{\sqrt{\lambda_1\lambda_2}}\,\frac{d\lambda_2}{\ln(2)\lambda_2} \\
&= \frac{1}{\ln(2)}\sup_{\lambda_1\ge1}\left(\int_1^{\lambda_1}\frac{1}{\sqrt{\lambda_1\lambda_2}}\,d\lambda_2 + \int_{\lambda_1}^\infty\frac{\sqrt{\lambda_1}}{\lambda_2^{3/2}}\,d\lambda_2\right) \\
&= \frac{1}{\ln(2)}\sup_{\lambda_1\ge1}\left(\frac{1}{\sqrt{\lambda_1}}\big(2\sqrt{\lambda_1}-2\big) + \sqrt{\lambda_1}\left(\frac{2}{\sqrt{\lambda_1}}\right)\right) \\
&= \frac{1}{\ln(2)}\sup_{\lambda_1\ge1}\left(4-\frac{2}{\sqrt{\lambda_1}}\right) \\
&= \frac{4}{\ln(2)},
\end{aligned}
\]
and conclude that
\[
\left\|\int_1^\infty K_{\lambda,1}^*K_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} \le 3C_n\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty).
\]
Thus we have:
\[
\left\|\int_1^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} \le \left\|\int_1^\infty K_{\lambda,1}^*K_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} + \left(\int_1^\infty\|K_{\lambda,2}\|^2\,\frac{d\lambda}{\lambda}\right)^{1/2} + \left(\int_1^\infty2\|K_{\lambda,1}\|\|K_{\lambda,2}\|\,\frac{d\lambda}{\lambda}\right)^{1/2}.
\]
Now we see that there exists a constant $C_n$ such that
\[
\begin{aligned}
\left\|\int_1^\infty K_{\lambda,1}^*K_{\lambda,1}\,\frac{d\lambda}{\lambda}\right\|^{1/2} &\le C_n\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty), \\
\left(\int_1^\infty\|K_{\lambda,2}\|^2\,\frac{d\lambda}{\lambda}\right)^{1/2} &\le C_n\,\|D^2\tau\|_\infty\left(\int_1^\infty\lambda^{-2n}\,\frac{d\lambda}{\lambda}\right)^{1/2},
\end{aligned}
\]
and
\[
\left(\int_1^\infty2\|K_{\lambda,1}\|\|K_{\lambda,2}\|\,\frac{d\lambda}{\lambda}\right)^{1/2} \le C_n\,\|D\tau\|_\infty^{1/2}\|D^2\tau\|_\infty^{1/2}\left(\int_1^\infty2\lambda^{-n}\,\frac{d\lambda}{\lambda}\right)^{1/2}.
\]
Therefore,
\[
\begin{aligned}
\left\|\int_1^\infty K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} &\le C_n\left(\|D\tau\|_\infty + \frac{1}{2n}\|D^2\tau\|_\infty + \frac{2}{n}\|D\tau\|_\infty^{1/2}\|D^2\tau\|_\infty^{1/2}\right) \\
&\le C_n\left(\|D\tau\|_\infty + \frac{1}{2n}\|D^2\tau\|_\infty + \frac{1}{n}\|D\tau\|_\infty + \frac{1}{n}\|D^2\tau\|_\infty\right) \\
&\le 2C_n\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty).
\end{aligned}
\]
Finally, we bound $P_2$. Note that in the previous section it was observed (shown in [11]) that
\[
\|K_{\lambda,1}\| \le C_n\,\|D\tau\|_\infty, \qquad \|K_{\lambda,2}\| \le \min\{\lambda^{-n}\|D^2\tau\|_\infty,\ \|D\tau\|_\infty\}.
\]
The above two inequalities imply
\[
\|K_\lambda\| = \|K_{\lambda,1}+K_{\lambda,2}\| \le \|K_{\lambda,1}\| + \|K_{\lambda,2}\| \le 2C_n\,\|D\tau\|_\infty,
\]
so that
\[
\begin{aligned}
\left\|\int_{2^{-\gamma}}^1K_\lambda^*K_\lambda\,\frac{d\lambda}{\lambda}\right\|^{1/2} &\le \left(\int_{2^{-\gamma}}^1\|K_\lambda\|^2\,\frac{d\lambda}{\lambda}\right)^{1/2} \\
&\le 2C_n\,\|D\tau\|_\infty\left(\int_{2^{-\gamma}}^1\frac{d\lambda}{\lambda}\right)^{1/2} \\
&\le 2C_n\,\|D\tau\|_\infty\,(-\ln(2^{-\gamma}))^{1/2} \\
&\le 2C_n\,\gamma^{1/2}\,\|D\tau\|_\infty.
\end{aligned}
\]
Putting everything together and since $\gamma\ge1$, we obtain:
\[
\begin{aligned}
\|[\mathcal{W},L_\tau]\| &\le 2(P_1+P_2+P_3) \\
&\le 4C_n\left(\|D\tau\|_\infty + 2^{-\gamma}\|\Delta\tau\|_\infty\right) + 2C_n\gamma^{1/2}\|D\tau\|_\infty + 3C_n\,(\|D\tau\|_\infty + \|D^2\tau\|_\infty) \\
&\le \tilde{C}_n\left(\gamma\|D\tau\|_\infty + 2^{-\gamma}\|\Delta\tau\|_\infty + \|D^2\tau\|_\infty\right).
\end{aligned}
\]
Choosing $\gamma = \left(\log\frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty}\right)\vee1$ gives
\[
\|[\mathcal{W},L_\tau]\| \le \tilde{C}_n\left(\left(\log\frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty}\vee1\right)\|D\tau\|_\infty + \|D^2\tau\|_\infty\right),
\]
and the proposition is proved. $\square$

Theorem 20. Assume $\psi$ and its first and second order derivatives have decay in $O((1+|x|)^{-n-3})$ and $\int_{\mathbb{R}^n}\psi(x)\,dx=0$.
Then for every ๐œ โˆˆ ๐ถ2(R๐‘›) with โˆฅ๐ท๐œโˆฅโˆž โ‰ค 1 2๐‘› , there exists ๐ถ๐‘š,๐‘› > 0 and ห†๐ถ๐‘š,๐‘› > 0 such that โˆฅ๐‘†๐‘š cont,2 ๐‘“ โˆ’ ๐‘†๐‘š cont,2 ๐ฟ๐œ ๐‘“ โˆฅ2 L2 (R๐‘š + ) โ‰ค ๐ถ๐‘š,๐‘›๐พ2(๐œ) โˆฅ ๐‘“ โˆฅ2 2 . and with โˆฅ๐‘†๐‘š dyad,2 ๐‘“ โˆ’ ๐‘†๐‘š dyad,2 ๐ฟ๐œ ๐‘“ โˆฅ2 โ„“2 (Z๐‘š) โ‰ค ห†๐ถ๐‘š,๐‘›๐พ2(๐œ) โˆฅ ๐‘“ โˆฅ2 2 , ๐พ2(๐œ) = โˆฅ๐ท๐œโˆฅ2 โˆž + (cid:18) โˆฅ๐ท๐œโˆฅโˆž (cid:18) log โˆฅฮ”๐œโˆฅโˆž โˆฅ๐ท๐œโˆฅโˆž (cid:19) โˆจ 1 + โˆฅ๐ท2๐œโˆฅโˆž (cid:19) 2 . Proof. The proof is only provided for the continuous case. We have the following bound for some ๐ถ๐‘š: โˆฅ๐‘†๐‘š cont,2 ๐‘“ โˆ’ ๐‘†๐‘š cont,2 ๐ฟ๐œ ๐‘“ โˆฅL2 (R๐‘š + ) โ‰ค โˆฅ ๐ด2๐‘€W๐‘‰๐‘šโˆ’1 ๐‘“ โˆ’ ๐ด2๐‘€ ๐ฟ๐œW๐‘‰๐‘šโˆ’1 ๐‘“ โˆฅL2 (R๐‘š + ) + โˆฅ ๐ด2๐‘€ [W๐‘‰๐‘šโˆ’1, ๐ฟ๐œ] ๐‘“ โˆฅL2 (R๐‘š + ) โ‰ค โˆฅ ๐ด2๐‘€W๐‘‰๐‘šโˆ’1 ๐‘“ โˆ’ ๐ด2๐‘€ ๐ฟ๐œW๐‘‰๐‘šโˆ’1 ๐‘“ โˆฅL2 (R๐‘š + ) + ๐ถ2 ๐‘š โˆฅ [W, ๐ฟ๐œ] โˆฅ2 L2 (R๐‘š + ร—R๐‘›)โ†’L2 (R๐‘›) โˆฅ ๐‘“ โˆฅ2 2 . 46 For the first term, we can mimic the dilation argument to get | ๐ด2๐‘€W๐‘‰๐‘šโˆ’1 ๐‘“ โˆ’ ๐ด2๐‘€ ๐ฟ๐œW๐‘‰๐‘šโˆ’1 ๐‘“ | = |โˆฅ๐‘”โˆฅ2 โˆ’ โˆฅ๐ฟ๐œ๐‘”โˆฅ2| . The difference is the term with the diffeomorphism. Let ๐‘ฆ = ๐›พ(๐‘ฅ) = ๐‘ฅ โˆ’ ๐œ(๐‘ฅ). Then it follows that ๐›พโˆ’1(๐‘ฆ) = ๐‘ฅ and change of variables implies that โˆฅ๐ฟ๐œ ๐‘“ โˆฅ2 2 = โˆซ R๐‘› | ๐‘“ (๐‘ฅ โˆ’ ๐œ(๐‘ฅ))|2 ๐‘‘๐‘ฅ = โˆซ R๐‘› | ๐‘“ (๐‘ฆ)|2 ๐‘‘๐‘ฆ | det(๐ผ โˆ’ ๐ท๐œ(๐›พโˆ’1(๐‘ฆ)))| . We also have Thus, we obtain 1 โˆ’ ๐‘›โˆฅ๐ท๐œโˆฅโˆž โ‰ค | det(๐ผ โˆ’ ๐ท๐œ(๐›พโˆ’1(๐‘ฆ)))| โ‰ค 1 + ๐‘›โˆฅ๐ท๐œโˆฅโˆž. 
\[
\frac{1}{1 + n\|D\tau\|_\infty} \int_{\mathbb{R}^n} |f(y)|^2 \, dy \le \|L_\tau f\|_2^2 \le \frac{1}{1 - n\|D\tau\|_\infty} \int_{\mathbb{R}^n} |f(y)|^2 \, dy,
\]
that is,
\[
\frac{1}{1 + n\|D\tau\|_\infty} \|f\|_2^2 \le \|L_\tau f\|_2^2 \le \frac{1}{1 - n\|D\tau\|_\infty} \|f\|_2^2.
\]
Since we have a bound on $\|D\tau\|_\infty$, we see that
\[
\frac{1}{1 + n\|D\tau\|_\infty} = \frac{1 - n\|D\tau\|_\infty}{1 - n^2\|D\tau\|_\infty^2} \ge 1 - n\|D\tau\|_\infty
\]
since $1 \ge 1 - n^2\|D\tau\|_\infty^2 > 0$. Similarly,
\[
\frac{1}{1 - n\|D\tau\|_\infty} = \frac{1 + 2n\|D\tau\|_\infty}{1 + n\|D\tau\|_\infty - 2n^2\|D\tau\|_\infty^2}
\]
and
\[
1 + n\|D\tau\|_\infty - 2n^2\|D\tau\|_\infty^2 \ge 1 + n\|D\tau\|_\infty - \frac{2n^2}{2n}\|D\tau\|_\infty = 1
\]
since $\|D\tau\|_\infty \le \frac{1}{2n}$. It follows that $\frac{1}{1 - n\|D\tau\|_\infty} \le 1 + 2n\|D\tau\|_\infty$ and
\[
(1 - n\|D\tau\|_\infty)^{1/2} \|f\|_2 \le \|L_\tau f\|_2 \le (1 + 2n\|D\tau\|_\infty)^{1/2} \|f\|_2.
\]
Since $0 < 1 - n\|D\tau\|_\infty \le 1$ and $1 + 2n\|D\tau\|_\infty \ge 1$, we have $(1 - n\|D\tau\|_\infty)^{1/2} \ge 1 - n\|D\tau\|_\infty$ and $(1 + 2n\|D\tau\|_\infty)^{1/2} \le 1 + 2n\|D\tau\|_\infty$. Use the lower bound on $\|L_\tau f\|_2$ to get
\[
\|f\|_2 - \|L_\tau f\|_2 \le \|f\|_2 \left( 1 - (1 - n\|D\tau\|_\infty)^{1/2} \right) \le \|f\|_2 \left( 1 - (1 - n\|D\tau\|_\infty) \right) = n\|D\tau\|_\infty \|f\|_2,
\]
and the upper bound to get
\[
\|L_\tau f\|_2 - \|f\|_2 \le \|f\|_2 \left( (1 + 2n\|D\tau\|_\infty)^{1/2} - 1 \right) \le \|f\|_2 \left( (1 + 2n\|D\tau\|_\infty) - 1 \right) = 2n\|D\tau\|_\infty \|f\|_2.
\]
Finally, we have
\[
\left| \|f\|_2 - \|L_\tau f\|_2 \right| \le 2n\|D\tau\|_\infty \|f\|_2
\]
for any $f \in L^2(\mathbb{R}^n)$. Now we mimic the argument given for dilation stability to get
\[
\|A_2 M \mathcal{W} V_{m-1} f - A_2 M L_\tau \mathcal{W} V_{m-1} f\|^2_{L^2(\mathbb{R}^m_+)} \le C \|D\tau\|_\infty^2 \|f\|_2^2
\]
for some constant $C$.
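As a quick sanity check on the scalar inequalities used in this step (it is not part of the proof), the following sketch evaluates them on a grid of values of $a = n\|D\tau\|_\infty \in [0, 1/2]$. The function name and the grid are illustrative choices of ours, not notation from the text.

```python
import math


def elementary_bounds_hold(a, tol=1e-12):
    """Check the four scalar inequalities for a = n * ||D tau||_inf."""
    lower = 1.0 / (1.0 + a) >= (1.0 - a) - tol                 # 1/(1+a) >= 1 - a
    upper = 1.0 / (1.0 - a) <= (1.0 + 2.0 * a) + tol           # 1/(1-a) <= 1 + 2a
    root_lower = 1.0 - math.sqrt(1.0 - a) <= a + tol           # 1 - (1-a)^(1/2) <= a
    root_upper = math.sqrt(1.0 + 2.0 * a) - 1.0 <= 2.0 * a + tol  # (1+2a)^(1/2) - 1 <= 2a
    return lower and upper and root_lower and root_upper


# Grid over the admissible range 0 <= a <= 1/2.
grid = [0.5 * k / 1000.0 for k in range(1001)]
all_hold = all(elementary_bounds_hold(a) for a in grid)
print(all_hold)

# The assumption a <= 1/2 matters: the bound 1/(1-a) <= 1 + 2a fails at a = 0.6.
print(elementary_bounds_hold(0.6))
```

The second call illustrates why the hypothesis $\|D\tau\|_\infty \le \frac{1}{2n}$ is needed: beyond $a = 1/2$ the upper bound on $1/(1-a)$ no longer holds.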
For the second term, we have
\[
C_m^2 \|[\mathcal{W}, L_\tau]\|^2_{L^2(\mathbb{R}^m_+ \times \mathbb{R}^n) \to L^2(\mathbb{R}^n)} \|f\|_2^2 \le C' \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^2 \|f\|_2^2
\]
for some constant $C'$. We now choose $C_{n,m} = \max\{C', C\}$ to get the desired bound. □

2.4.2 Stability to Diffeomorphisms When 1 < q < 2

Lemma 21. Let $\gamma(z) = z - \tau(z)$, $g(z) = f(\gamma(z))$, and
\[
K_\lambda(x, z) = \det(D\gamma(z)) \psi_\lambda(\gamma(x) - \gamma(z)) - \psi_\lambda(x - z).
\]
Additionally, define
\[
T_\lambda g(x) = \int_{\mathbb{R}^n} g(z) K_\lambda(x, z) \, dz
\]
and consider $Tg : \mathbb{R}^n \to L^2\left(\mathbb{R}_+, \frac{d\lambda}{\lambda^{n+1}}\right)$ defined by $Tg(x) = (T_\lambda g(x))_{\lambda \in \mathbb{R}_+}$. Then for the Banach space $X = L^2\left(\mathbb{R}_+, \frac{d\lambda}{\lambda^{n+1}}\right)$,
\[
\|Tg\|^2_{L^2_X(\mathbb{R}^n)} \le C_{n,m} \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^2 \|f\|_2^2
\]
for some constant $C_{n,m} > 0$.

Proof. Notice that
\[
\begin{aligned}
\|Tg\|^2_{L^2_X(\mathbb{R}^n)}
&= \int_{\mathbb{R}^n} \int_0^\infty |T_\lambda g(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \, dx \\
&= \int_{\mathbb{R}^n} \int_0^\infty \left| \int_{\mathbb{R}^n} K_\lambda(x, z) g(z) \, dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} \, dx \\
&= \int_{\mathbb{R}^n} \int_0^\infty \left| \int_{\mathbb{R}^n} f(\gamma(z)) \left[ \det(D\gamma(z)) \psi_\lambda(\gamma(x) - \gamma(z)) - \psi_\lambda(x - z) \right] dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} \, dx \\
&= \int_{\mathbb{R}^n} \int_0^\infty \left| \int_{\mathbb{R}^n} \det(D\gamma(z)) f(\gamma(z)) \psi_\lambda(\gamma(x) - \gamma(z)) \, dz - \int_{\mathbb{R}^n} f(\gamma(z)) \psi_\lambda(x - z) \, dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} \, dx.
\end{aligned}
\]
Using the change of variables $u = \gamma(z)$, we get
\[
\begin{aligned}
\|Tg\|^2_{L^2_X(\mathbb{R}^n)}
&= \int_{\mathbb{R}^n} \int_0^\infty |L_\tau(f * \psi_\lambda)(x) - (L_\tau f * \psi_\lambda)(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \, dx \\
&= \int_{\mathbb{R}^n} \int_0^\infty |[\mathcal{W}_\lambda, L_\tau] f(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \, dx \\
&= \int_0^\infty \int_{\mathbb{R}^n} |[\mathcal{W}_\lambda, L_\tau] f(x)|^2 \, dx \, \frac{d\lambda}{\lambda^{n+1}} \\
&= \int_0^\infty \|[\mathcal{W}_\lambda, L_\tau] f\|_2^2 \, \frac{d\lambda}{\lambda^{n+1}} \\
&= \|[\mathcal{W}, L_\tau] f\|^2_{L^2(\mathbb{R}_+ \times \mathbb{R}^n)} \\
&\le C_{n,m} \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^2 \|f\|_2^2,
\end{aligned}
\]
where the last inequality follows from the $q = 2$ case. □

Lemma 22 ([39], Marcinkiewicz Interpolation). Let $A$ and $B$ be Banach spaces and let $T$ be a quasilinear operator defined on $L^{p_0}_A(\mathbb{R}^n)$ and $L^{p_1}_A(\mathbb{R}^n)$ with $0 < p_0 < p_1$. If $T$ satisfies
\[
\|Tf\|_{L^{p_i,\infty}_B(\mathbb{R}^n)} \le M_i \|f\|_{L^{p_i}_A(\mathbb{R}^n)}
\]
for $i = 0, 1$, then for all $p \in (p_0, p_1)$,
\[
\|Tf\|_{L^p_B(\mathbb{R}^n)} \le N_p \|f\|_{L^p_A(\mathbb{R}^n)},
\]
where $N_p$ only depends on $M_0$, $M_1$, and $p$.

Remark 7. Like with the scalar valued estimate, it can be shown that $N_p = \eta M_0^{\delta} M_1^{1-\delta}$, where
\[
\delta = \begin{cases} \dfrac{p_0(p_1 - p)}{p(p_1 - p_0)}, & p_1 < \infty, \\[2mm] \dfrac{p_0}{p}, & p_1 = \infty, \end{cases}
\]
and
\[
\eta = \begin{cases} 2 \left( \dfrac{p(p_1 - p_0)}{(p - p_0)(p_1 - p)} \right)^{1/p}, & p_1 < \infty, \\[2mm] 2 \left( \dfrac{p_0}{p - p_0} \right)^{1/p}, & p_1 = \infty. \end{cases}
\]

Lemma 23. Let $T$ be the operator defined in Lemma 21. Let $q \in (1, 2)$ and $r \in (1, q)$. Then $T$ satisfies
\[
\|Tg\|_{L^{r,\infty}_X(\mathbb{R}^n)} \le M_r \|f\|_{L^r(\mathbb{R}^n)}
\]
for some constant $M_r > 0$, which is independent of $\|D\tau\|_\infty$ and $\|D^2\tau\|_\infty$.
Furthermore, $T$ also satisfies
\[
\|Tg\|^2_{L^{2,\infty}_X(\mathbb{R}^n)} \le \tilde{C}_n \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^2 \|f\|^2_{L^2(\mathbb{R}^n)}
\]
for some constant $\tilde{C}_n > 0$.

Proof. The second inequality follows from strong boundedness of the operator, so we will omit the proof. For the first inequality, the norm satisfies
\[
\begin{aligned}
\|Tg(x)\|_X^2
&= \int_0^\infty \left| \int_{\mathbb{R}^n} \det(D\gamma(z)) f(\gamma(z)) \psi_\lambda(\gamma(x) - \gamma(z)) \, dz - \int_{\mathbb{R}^n} f(\gamma(z)) \psi_\lambda(x - z) \, dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} \\
&= \int_0^\infty \left| \int_{\mathbb{R}^n} f(z) \psi_\lambda(\gamma(x) - z) \, dz - \int_{\mathbb{R}^n} f(\gamma(z)) \psi_\lambda(x - z) \, dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} \\
&\le 4 \int_0^\infty \left| \int_{\mathbb{R}^n} f(z) \psi_\lambda(\gamma(x) - z) \, dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} + 4 \int_0^\infty \left| \int_{\mathbb{R}^n} f(\gamma(z)) \psi_\lambda(x - z) \, dz \right|^2 \frac{d\lambda}{\lambda^{n+1}} \\
&= 4|(Gf)(\gamma(x))|^2 + 4|G L_\tau f(x)|^2.
\end{aligned}
\]
We see
\[
\|Tg(x)\|_X \le \sqrt{4|(Gf)(\gamma(x))|^2 + 4|G L_\tau f(x)|^2} \le 2|(Gf)(\gamma(x))| + 2|G L_\tau f(x)|.
\]
For $\delta > 0$, Chebyshev's inequality implies that there exists $A_r$ such that
\[
\begin{aligned}
m\{\|Tg(x)\|_X > \delta\}
&\le m\{2|(Gf)(\gamma(x))| + 2|G L_\tau f(x)| > \delta\} \\
&\le \frac{A_r}{\delta^r} \left( \|(Gf)(\gamma(\cdot))\|^r_{L^r(\mathbb{R}^n)} + \|G L_\tau f\|^r_{L^r(\mathbb{R}^n)} \right).
\end{aligned}
\]
We now want to ensure that $\|(Gf)(\gamma(\cdot))\|^r_{L^r(\mathbb{R}^n)}$ can be bounded above by a constant multiple of $\|Gf\|^r_{L^r(\mathbb{R}^n)}$.
Since $\gamma$ is a diffeomorphism, we can use a change of variables to get
\[
\begin{aligned}
\|(Gf)(\gamma(\cdot))\|^r_{L^r(\mathbb{R}^n)}
&= \int_{\mathbb{R}^n} |Gf(\gamma(x))|^r \, dx \\
&= \int_{\mathbb{R}^n} |Gf(u)|^r \, \frac{du}{\det\left[ (D\gamma)(\gamma^{-1}(u)) \right]} \\
&\le 2 \int_{\mathbb{R}^n} |Gf(x)|^r \, dx \\
&= 2\|Gf\|^r_{L^r(\mathbb{R}^n)}.
\end{aligned}
\]
By Theorem 5, we get
\[
\|G L_\tau f\|^r_{L^r(\mathbb{R}^n)} \le C_r \|L_\tau f\|^r_{L^r(\mathbb{R}^n)} \le 2C_r \|f\|^r_{L^r(\mathbb{R}^n)}
\]
for some constant $C_r$ dependent on $r$. Thus, we have
\[
m\{\|Tg(x)\|_X > \delta\}^{1/r} \le \frac{M_r}{\delta} \|f\|_{L^r(\mathbb{R}^n)}
\]
for some constant $M_r > 0$. □

Lemma 24. Fix $r = \frac{1+q}{2}$ so that $r \in (1, q)$. For some constant $C_{n,q} > 0$, the operator $T$ defined in Lemma 21 satisfies the estimate
\[
\|Tg\|^q_{L^q_X(\mathbb{R}^n)} \le C_{n,q} \eta^q M_r^{q\delta} \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^{q(1-\delta)} \|f\|_q^q,
\]
where $\eta$ and $\delta$ come from interpolation, and $M_r$ comes from the constant for weak boundedness in Lemma 23.

Proof. Since $T$ is an integral operator, it is clear that it is quasilinear. Using the $L^r(\mathbb{R}^n)$ and $L^2(\mathbb{R}^n)$ estimates from the previous lemma, we interpolate using Marcinkiewicz since $\|g\|_r \le 2\|f\|_r \le 4\|g\|_r$. □

Theorem 25. Let $1 < q < 2$. Assume $\psi$ and its first and second order derivatives have decay in $O((1 + |x|)^{-n-3})$, and $\int_{\mathbb{R}^n} \psi(x) \, dx = 0$.
Then for every $\tau \in C^2(\mathbb{R}^n)$ with $\|D\tau\|_\infty < \frac{1}{2n}$, there exists $C_{n,q} > 0$ such that
\[
\|S_{\mathrm{cont},q} f - S_{\mathrm{cont},q} L_\tau f\|^q_{L^2(\mathbb{R}_+)} \le C_{n,q} K_q(\tau) \|f\|_q^q
\]
with
\[
K_q(\tau) = \|D\tau\|_\infty^q + \eta^q M_r^{q\delta} \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^{q(1-\delta)}.
\]

Proof. We use the same notation as Theorem 16. Using a nearly identical argument to Corollary 17, we get
\[
\begin{aligned}
\|S_{\mathrm{cont},q} f - S_{\mathrm{cont},q} L_\tau f\|_{L^2(\mathbb{R}_+)}
&= \|A_q M \mathcal{W} f - A_q M \mathcal{W} L_\tau f\|_{L^2(\mathbb{R}_+)} \\
&= \|A_q M \mathcal{W} f - A_q M L_\tau \mathcal{W} f + A_q M L_\tau \mathcal{W} f - A_q M \mathcal{W} L_\tau f\|_{L^2(\mathbb{R}_+)} \\
&\le \|A_q M \mathcal{W} f - A_q M L_\tau \mathcal{W} f\|_{L^2(\mathbb{R}_+)} + \|A_q M L_\tau \mathcal{W} f - A_q M \mathcal{W} L_\tau f\|_{L^2(\mathbb{R}_+)} \\
&\le \|(A_q M - A_q M L_\tau) \mathcal{W} f\|_{L^2(\mathbb{R}_+)} + \|A_q M [\mathcal{W}, L_\tau] f\|_{L^2(\mathbb{R}_+)}.
\end{aligned}
\]
The first term, $\|(A_q M - A_q M L_\tau) \mathcal{W} f\|_{L^2(\mathbb{R}_+)}$, can be bounded using an argument identical to the $q = 2$ case. In particular, we can prove that
\[
(1 - n\|D\tau\|_\infty) \|f\|_q \le (1 - n\|D\tau\|_\infty)^{1/q} \|f\|_q \le \|L_\tau f\|_q
\]
and
\[
\|L_\tau f\|_q \le (1 + 2n\|D\tau\|_\infty)^{1/q} \|f\|_q \le (1 + 2n\|D\tau\|_\infty) \|f\|_q,
\]
which means
\[
\|(A_q M - A_q M L_\tau) \mathcal{W} f\|^q_{L^2(\mathbb{R}_+)} \le C \|D\tau\|_\infty^q \|f\|_q^q.
\]
For the other term,
\[
\|A_q M [\mathcal{W}, L_\tau] f\|^q_{L^2(\mathbb{R}_+)} = \left( \int_0^\infty \left[ \int_{\mathbb{R}^n} |(L_\tau f * \psi_\lambda)(x) - L_\tau(f * \psi_\lambda)(x)|^q \, dx \right]^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right)^{q/2}.
\]
Now, expand the convolution and then use a change of variables to get
\[
\begin{aligned}
\|A_q M [\mathcal{W}, L_\tau] f\|^q_{L^2(\mathbb{R}_+)}
&= \left( \int_0^\infty \left[ \int_{\mathbb{R}^n} \left| \int_{\mathbb{R}^n} f(\gamma(z)) \left( \det(D\gamma(z)) \psi_\lambda(\gamma(x) - \gamma(z)) - \psi_\lambda(x - z) \right) dz \right|^q dx \right]^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right)^{q/2} \\
&= \left( \int_0^\infty \left[ \int_{\mathbb{R}^n} \left| \int_{\mathbb{R}^n} g(z) K_\lambda(x, z) \, dz \right|^q dx \right]^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right)^{q/2} \\
&= \left( \int_0^\infty \left[ \int_{\mathbb{R}^n} |T_\lambda g(x)|^q \, dx \right]^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right)^{q/2} \\
&\le \int_{\mathbb{R}^n} \left[ \int_0^\infty \left( |T_\lambda g(x)|^q \right)^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right]^{q/2} dx \\
&= \int_{\mathbb{R}^n} \left[ \int_0^\infty |T_\lambda g(x)|^2 \, \frac{d\lambda}{\lambda^{n+1}} \right]^{q/2} dx \\
&= \int_{\mathbb{R}^n} \|Tg(x)\|^q_{L^2\left(\mathbb{R}_+, \frac{d\lambda}{\lambda^{n+1}}\right)} \, dx \\
&= \|Tg\|^q_{L^q_X(\mathbb{R}^n)} \\
&\le C_n \eta^q M_r^{q\delta} \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^{q(1-\delta)} \|f\|_q^q.
\end{aligned}
\]
Thus, the proof is complete. □

Corollary 26. Let $1 < q < 2$. Assume $\psi$ and its first and second order derivatives have decay in $O((1 + |x|)^{-n-3})$, and $\int_{\mathbb{R}^n} \psi(x) \, dx = 0$. Then for every $\tau \in C^2(\mathbb{R}^n)$ with $\|D\tau\|_\infty < \frac{1}{2n}$, there exist constants $C_{n,m}, \hat{C}_{n,m} > 0$ such that
\[
\|S^m_{\mathrm{cont},q} f - S^m_{\mathrm{cont},q} L_\tau f\|^q_{L^2(\mathbb{R}^m_+)} \le C_{n,m} K_q(\tau) \|f\|_q^q
\]
and
\[
\|S^m_{\mathrm{dyad},q} f - S^m_{\mathrm{dyad},q} L_\tau f\|^q_{\ell^2(\mathbb{Z}^m)} \le \hat{C}_{n,m} K_q(\tau) \|f\|_q^q.
\]

Remark 8.
This bound is not exactly the same as the definition for stability to diffeomorphisms in [11], but the idea is similar. Since $r$ is fixed, so is $\delta$. Substituting $p_0 = r = \frac{1+q}{2}$, $p_1 = 2$, and $p = q$ into the formula of Remark 7 gives $\delta = \frac{(1+q)(2-q)}{q(3-q)} \in (0, 1)$ when using Marcinkiewicz interpolation in Lemma 24, so $q(1-\delta) > 0$ and
\[
C_{n,q} \eta^q M_r^{q\delta} \left( \|D\tau\|_\infty \left( \log \frac{\|\Delta\tau\|_\infty}{\|D\tau\|_\infty} \vee 1 \right) + \|D^2\tau\|_\infty \right)^{q(1-\delta)} \to 0
\]
when $\|D\tau\|_\infty \to 0$ and $\|D^2\tau\|_\infty \to 0$.

2.5 Equivariance and Invariance to Rotations

We now consider adding group actions to our scattering transform and prove invariance to rotations. Let $SO(n)$ be the group of $n \times n$ rotation matrices. Since $SO(n)$ is a compact Lie group, we can define a Haar measure, say $\mu$, with $\mu(SO(n)) < \infty$. We say that $f \in L^2(SO(n))$ if and only if $f$ is $\mu$-measurable and
\[
\int_{SO(n)} |f(r)|^2 \, d\mu(r) < \infty.
\]

2.5.1 Rotation Equivariant Representations

Let $\psi : \mathbb{R}^n \to \mathbb{R}$ be a wavelet. Define
\[
\psi_{\lambda,R}(x) = \lambda^{-n/2} \psi(\lambda^{-1} R^{-1} x),
\]
where $R \in SO(n)$ is an $n \times n$ rotation matrix. The continuous and dyadic wavelet transforms of $f$ are given by
\[
\begin{aligned}
\mathcal{W}_{\mathrm{Rot}} f &:= \{ f * \psi_{\lambda,R}(x) : x \in \mathbb{R}^n, \ \lambda \in (0, \infty), \ R \in SO(n) \}, \\
W_{\mathrm{Rot}} f &:= \{ f * \psi_{j,R}(x) : x \in \mathbb{R}^n, \ j \in \mathbb{Z}, \ R \in SO(n) \}.
\end{aligned}
\]
We will first consider a translation invariant and rotation equivariant formulation of continuous and dyadic one-layer scattering using
\[
\mathfrak{S}_{\mathrm{cont},q} f(\lambda, R) := \|f * \psi_{\lambda,R}\|_q, \qquad \mathfrak{S}_{\mathrm{dyad},q} f(j, R) := \|f * \psi_{j,R}\|_q.
\]
The translation invariance of our representation follows from translation invariance of the norm.
For rotation equivariance, notice that if $f_{\tilde{R}}(x) := f(\tilde{R}^{-1} x)$, then we have
\[
\mathfrak{S}_{\mathrm{cont},q} f_{\tilde{R}}(\lambda, R) = \mathfrak{S}_{\mathrm{cont},q} f(\lambda, \tilde{R}^{-1} R), \qquad \mathfrak{S}_{\mathrm{dyad},q} f_{\tilde{R}}(j, R) = \mathfrak{S}_{\mathrm{dyad},q} f(j, \tilde{R}^{-1} R).
\]
Now suppose we have $m$ layers again. Then we define our $m$ layer transforms by
\[
\begin{aligned}
\mathfrak{S}^m_{\mathrm{cont},q} f(\lambda_1, \ldots, \lambda_m, R_1, \ldots, R_m) &:= \| |f * \psi_{\lambda_1,R_1}| * \cdots | * \psi_{\lambda_m,R_m} \|_q, \\
\mathfrak{S}^m_{\mathrm{dyad},q} f(j_1, \ldots, j_m, R_1, \ldots, R_m) &:= \| |f * \psi_{j_1,R_1}| * \cdots | * \psi_{j_m,R_m} \|_q,
\end{aligned}
\]
and rotation equivariance implies
\[
\begin{aligned}
\mathfrak{S}^m_{\mathrm{cont},q} f_{\tilde{R}}(\lambda_1, \ldots, \lambda_m, R_1, \ldots, R_m) &= \mathfrak{S}^m_{\mathrm{cont},q} f(\lambda_1, \ldots, \lambda_m, \tilde{R}^{-1} R_1, \ldots, \tilde{R}^{-1} R_m), \\
\mathfrak{S}^m_{\mathrm{dyad},q} f_{\tilde{R}}(j_1, \ldots, j_m, R_1, \ldots, R_m) &= \mathfrak{S}^m_{\mathrm{dyad},q} f(j_1, \ldots, j_m, \tilde{R}^{-1} R_1, \ldots, \tilde{R}^{-1} R_m).
\end{aligned}
\]
The norm we will use is similar to our previous formulations. Denote the scattering norm for the continuous transform as $\|\mathfrak{S}^m_{\mathrm{cont},q} f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^m}$, which is defined as
\[
\left( \int_0^\infty \int_{SO(n)} \cdots \int_0^\infty \int_{SO(n)} \| |f * \psi_{\lambda_1,R_1}| * \cdots | * \psi_{\lambda_m,R_m} \|_q^2 \, d\mu_1(R_1) \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots d\mu_m(R_m) \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{q/2}.
\]
For the dyadic transform, we denote the norm using $\|\mathfrak{S}^m_{\mathrm{dyad},q} f\|^q_{\ell^2(\mathbb{Z}^m) \times SO(n)^m}$, which is given by
\[
\left( \sum_{j_m \in \mathbb{Z}} \int_{SO(n)} \cdots \sum_{j_1 \in \mathbb{Z}} \int_{SO(n)} \| |f * \psi_{j_1,R_1}| * \cdots | * \psi_{j_m,R_m} \|_q^2 \, d\mu_1(R_1) \cdots d\mu_m(R_m) \right)^{q/2}.
\]
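The one-layer equivariance identity can be illustrated numerically in a simplified discrete setting. The sketch below is ours and makes several assumptions that are not part of the text: signals live on a periodic $8 \times 8$ grid, convolution is circular, only rotations by 90 degrees are used (so that the grid is mapped to itself), and `psi` is a hypothetical oriented filter standing in for $\psi_{\lambda,R}$. Rotating both the signal and the filter by the same rotation corresponds to the case $R = \tilde{R}$ in the identity, where the right-hand side reduces to the unrotated coefficient.

```python
import math
import random


def rot90(a):
    """Rotate a square periodic grid by 90 degrees: out[i][j] = a[j][n-1-i]."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def circ_conv(a, b):
    """Circular (periodic) convolution of two n-by-n grids."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for x0 in range(n):
        for x1 in range(n):
            out[x0][x1] = sum(
                a[y0][y1] * b[(x0 - y0) % n][(x1 - y1) % n]
                for y0 in range(n)
                for y1 in range(n)
            )
    return out


def lq_norm(a, q):
    """Discrete L^q norm of a grid."""
    return sum(abs(v) ** q for row in a for v in row) ** (1.0 / q)


def centered(k, n):
    """Signed coordinate of index k on a periodic grid of size n."""
    return k if k < n // 2 else k - n


n, q = 8, 1.5
rng = random.Random(0)
f = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
# Hypothetical oriented filter (odd in the first coordinate); an assumption, not from the text.
psi = [
    [centered(i, n) * math.exp(-(centered(i, n) ** 2) / 2.0 - centered(j, n) ** 2)
     for j in range(n)]
    for i in range(n)
]

unrotated = lq_norm(circ_conv(f, psi), q)
both_rotated = lq_norm(circ_conv(rot90(f), rot90(psi)), q)
only_signal_rotated = lq_norm(circ_conv(rot90(f), psi), q)

print(abs(unrotated - both_rotated))         # round-off only
print(abs(unrotated - only_signal_rotated))  # generally nonzero: the coefficient is not invariant
```

On the periodic grid, rotating both arrays permutes the values of the convolution, which is why the first difference vanishes up to round-off; rotating only the signal moves the coefficient to a different rotation index instead of leaving it unchanged.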
We will start by proving that these formulations of the scattering transform are well defined, and prove properties about stability to diffeomorphisms like in previous chapters.

Lemma 27. Let $\psi$ be a wavelet that satisfies properties (2.4) and (2.5).

• If $1 < q \le 2$, we have $\mathfrak{S}^m_{\mathrm{cont},q} : L^q(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+) \times SO(n)^m$ and $\mathfrak{S}^m_{\mathrm{dyad},q} : L^q(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m) \times SO(n)^m$.

• If $q = 1$ and one of the following holds:
  - $n = 1$ and $\psi$ is complex analytic,
  - $n \ge 2$ and $\psi$ satisfies the conditions of Lemma 9,
then $\mathfrak{S}^m_{\mathrm{cont},1} : L^1(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+) \times SO(n)^m$ and $\mathfrak{S}^m_{\mathrm{dyad},1} : L^1(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m) \times SO(n)^m$.

• If $\psi$ is also a Littlewood-Paley wavelet, we have
\[
\begin{aligned}
\|\mathfrak{S}^m_{\mathrm{cont},2} f\|^2_{L^2(\mathbb{R}^m_+) \times SO(n)^m} &= \mu(SO(n))^m C_\psi^m \|f\|_2^2, \\
\|\mathfrak{S}^m_{\mathrm{dyad},2} f\|^2_{\ell^2(\mathbb{Z}^m) \times SO(n)^m} &= \mu(SO(n))^m \hat{C}_\psi^m \|f\|_2^2.
\end{aligned}
\]

Proof. We prove the first and third claims. The second claim is almost identical to the first claim, so the proof will be omitted for brevity. Note that we will only provide arguments for the continuous scattering transform since the proofs for the dyadic transform are very similar. By Fubini's theorem and boundedness of the $m$-layer scattering transform, there exists a constant $C_q > 0$, which is dependent on $q$, such that
\[
\begin{aligned}
\|\mathfrak{S}^m_{\mathrm{cont},q} f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^m}
&= \left[ \int_0^\infty \int_{SO(n)} \cdots \int_0^\infty \int_{SO(n)} \| |f * \psi_{\lambda_1,R_1}| * \cdots | * \psi_{\lambda_m,R_m} \|_q^2 \, d\mu(R_1) \frac{d\lambda_1}{\lambda_1^{n+1}} \cdots d\mu(R_m) \frac{d\lambda_m}{\lambda_m^{n+1}} \right]^{q/2} \\
&\le \left[ \int_{SO(n)} \cdots \int_{SO(n)} \left( C_q^{mq} \|f\|_q^q \right)^{2/q} d\mu(R_1) \cdots d\mu(R_m) \right]^{q/2} \\
&= C_q^{mq} \mu(SO(n))^{mq/2} \|f\|_q^q
\end{aligned}
\]
because each $\psi_{\lambda_i,R_i}$ is still a wavelet with sufficient decay even if the rotation is applied. For the third claim, we see that
\[
\begin{aligned}
\|\mathfrak{S}^m_{\mathrm{cont},2} f\|^2_{L^2(\mathbb{R}^m_+) \times SO(n)^m}
&= \int_{SO(n)} \cdots \int_{SO(n)} C_\psi^m \|f\|_2^2 \, d\mu(R_1) \cdots d\mu(R_m) \\
&= \mu(SO(n))^m C_\psi^m \|f\|_2^2.
\end{aligned}
\]
□

Theorem 28. Assume $|c| < \frac{1}{2n}$. Let $\tau(x) = cx$ and $L_\tau f(x) = f((1 - c)x)$. Suppose that $\psi$ is a wavelet that satisfies the conditions of Lemma 14. Then there exist constants $\tilde{K}_{n,m,q}$ and $\tilde{K}'_{n,m,q}$ dependent only on $n$, $m$, and $q$ such that
\[
\|\mathfrak{S}^m_{\mathrm{cont},q} f - \mathfrak{S}^m_{\mathrm{cont},q} L_\tau f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^m} \le |c|^q \cdot \tilde{K}_{n,m,q} \|f\|_q^q
\]
and
\[
\|\mathfrak{S}^m_{\mathrm{dyad},q} f - \mathfrak{S}^m_{\mathrm{dyad},q} L_\tau f\|^q_{\ell^2(\mathbb{Z}^m) \times SO(n)^m} \le |c|^q \cdot \tilde{K}'_{n,m,q} \|f\|_q^q.
\]
Alternatively, if one of the following holds:

• $n = 1$, $\psi$ is complex analytic and satisfies the conditions of Lemma 14,

• $n \ge 2$ and $\psi$ satisfies the conditions of Lemma 9,

there exist $\tilde{H}_{m,n}$ and $\tilde{H}'_{m,n}$ such that
\[
\|\mathfrak{S}^m_{\mathrm{cont},1} f - \mathfrak{S}^m_{\mathrm{cont},1} L_\tau f\|_{L^2(\mathbb{R}^m_+) \times SO(n)^m} \le |c| \cdot \tilde{H}_{m,n} \|f\|_{H^1(\mathbb{R}^n)}
\]
and
\[
\|\mathfrak{S}^m_{\mathrm{dyad},1} f - \mathfrak{S}^m_{\mathrm{dyad},1} L_\tau f\|_{\ell^2(\mathbb{Z}^m) \times SO(n)^m} \le |c| \cdot \tilde{H}'_{m,n} \|f\|_{H^1(\mathbb{R}^n)}.
\]

Theorem 29. Let $\tau \in C^2(\mathbb{R}^n)$, and let $L_\tau f(x) = f(x - \tau(x))$.
Suppose that $\psi$ is a wavelet such that the wavelet and all its first and second partial derivatives have $O((1 + |x|)^{-n-3})$ decay. When $q \in (1, 2)$, there exist constants $C_{n,m,q}$, $\tilde{C}_{n,m,q}$, $C_{n,m}$, and $\tilde{C}_{n,m}$ dependent on $\mu(SO(n))$, $n$, $m$, and $q$ such that
\[
\begin{aligned}
\|\mathfrak{S}^m_{\mathrm{cont},q} f - \mathfrak{S}^m_{\mathrm{cont},q} L_\tau f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^m} &\le C_{n,m,q} K_q(\tau) \|f\|_q^q, \\
\|\mathfrak{S}^m_{\mathrm{dyad},q} f - \mathfrak{S}^m_{\mathrm{dyad},q} L_\tau f\|^q_{\ell^2(\mathbb{Z}^m) \times SO(n)^m} &\le \tilde{C}_{n,m,q} K_q(\tau) \|f\|_q^q, \\
\|\mathfrak{S}^m_{\mathrm{cont},2} f - \mathfrak{S}^m_{\mathrm{cont},2} L_\tau f\|^2_{L^2(\mathbb{R}^m_+) \times SO(n)^m} &\le C_{n,m} K_2(\tau) \|f\|_2^2, \\
\|\mathfrak{S}^m_{\mathrm{dyad},2} f - \mathfrak{S}^m_{\mathrm{dyad},2} L_\tau f\|^2_{\ell^2(\mathbb{Z}^m) \times SO(n)^m} &\le \tilde{C}_{n,m} K_2(\tau) \|f\|_2^2.
\end{aligned}
\]

2.5.2 Rotation Invariant Representations

The representation before was rotation equivariant, but in some tasks, we would rather have rotation invariance. In [11], the authors choose to integrate over each group action in a group of transformations. However, this removes the information about the relative angles between the actions if we have multiple layers in our transform. In the case of one layer, since there is only one angle, we use a similar formulation to [11] and define continuous and dyadic scattering transforms for rotation invariance as
\[
\begin{aligned}
\mathcal{S}_{\mathrm{cont},q} f(\lambda) &= \int_{SO(n)} \|f * \psi_{\lambda,R}\|^q_{L^q(\mathbb{R}^n)} \, d\mu(R), \\
\mathcal{S}_{\mathrm{dyad},q} f(j) &= \int_{SO(n)} \|f * \psi_{j,R}\|^q_{L^q(\mathbb{R}^n)} \, d\mu(R).
\end{aligned}
\]
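To illustrate how integrating over the rotation variable produces invariance in the one-layer case, here is a small discrete sketch. It is an analogue under assumptions of ours, not the construction of the text: the Haar measure on $SO(n)$ is replaced by the counting measure on the four rotations by multiples of 90 degrees, signals live on a periodic $8 \times 8$ grid with circular convolution, and `psi` is a hypothetical oriented filter.

```python
import math
import random


def rot90(a):
    """Rotate a square periodic grid by 90 degrees: out[i][j] = a[j][n-1-i]."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def circ_conv(a, b):
    """Circular (periodic) convolution of two n-by-n grids."""
    n = len(a)
    return [
        [
            sum(
                a[y0][y1] * b[(x0 - y0) % n][(x1 - y1) % n]
                for y0 in range(n)
                for y1 in range(n)
            )
            for x1 in range(n)
        ]
        for x0 in range(n)
    ]


def invariant_feature(f, psi, q):
    """Sum of ||f * psi_R||_q^q over the four 90-degree rotations R of the filter."""
    total = 0.0
    psi_r = psi
    for _ in range(4):
        total += sum(abs(v) ** q for row in circ_conv(f, psi_r) for v in row)
        psi_r = rot90(psi_r)
    return total


def centered(k, n):
    """Signed coordinate of index k on a periodic grid of size n."""
    return k if k < n // 2 else k - n


n, q = 8, 1.5
rng = random.Random(1)
f = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
# Hypothetical oriented filter; an assumption made for illustration only.
psi = [
    [centered(i, n) * math.exp(-(centered(i, n) ** 2) / 2.0 - centered(j, n) ** 2)
     for j in range(n)]
    for i in range(n)
]

original = invariant_feature(f, psi, q)
rotated = invariant_feature(rot90(f), psi, q)
print(original, rotated)  # the two values agree up to round-off
```

Rotating the signal only permutes the four terms of the sum, which is the discrete counterpart of the invariance of the Haar integral under $R \mapsto \tilde{R}^{-1} R$.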
The corresponding norms are given by
\[
\begin{aligned}
\|\mathcal{S}_{\mathrm{cont},q} f\|^q_{L^2(\mathbb{R}_+)} &:= \left[ \int_0^\infty \left[ \int_{SO(n)} \|f * \psi_{\lambda,R}\|_q^q \, d\mu(R) \right]^{2/q} \frac{d\lambda}{\lambda^{n+1}} \right]^{q/2}, \\
\|\mathcal{S}_{\mathrm{dyad},q} f\|^q_{\ell^2(\mathbb{Z})} &:= \left[ \sum_{j \in \mathbb{Z}} \left[ \int_{SO(n)} \|f * \psi_{j,R}\|_q^q \, d\mu(R) \right]^{2/q} \right]^{q/2}.
\end{aligned}
\]
Now we generalize to the case where $m \ge 2$. Let $R_1, \ldots, R_m \in SO(n)$. Define
\[
\begin{aligned}
\mathcal{S}^m_{\mathrm{cont},q} f(\lambda_1, \ldots, \lambda_m, R_2, \ldots, R_m) &:= \int_{SO(n)} \| |f * \psi_{\lambda_1,R_2R_1}| * \cdots | * \psi_{\lambda_m,R_mR_1} \|_q^2 \, d\mu(R_1), \\
\mathcal{S}^m_{\mathrm{dyad},q} f(j_1, \ldots, j_m, R_2, \ldots, R_m) &:= \int_{SO(n)} \| |f * \psi_{j_1,R_2R_1}| * \cdots | * \psi_{j_m,R_mR_1} \|_q^2 \, d\mu(R_1).
\end{aligned}
\]
For the continuous transform, the norm $\|\mathcal{S}^m_{\mathrm{cont},q} f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}}$ is given by
\[
\left( \int_0^\infty \int_{SO(n)} \cdots \int_0^\infty \int_{SO(n)} \int_0^\infty \mathcal{S}^m_{\mathrm{cont},q} f \, \frac{d\lambda_1}{\lambda_1^{n+1}} \, d\mu_2(R_2) \frac{d\lambda_2}{\lambda_2^{n+1}} \cdots d\mu_m(R_m) \frac{d\lambda_m}{\lambda_m^{n+1}} \right)^{q/2},
\]
where we use the shorthand notation
\[
\mathcal{S}^m_{\mathrm{cont},q} f := \mathcal{S}^m_{\mathrm{cont},q} f(\lambda_1, \ldots, \lambda_m, R_2, \ldots, R_m)
\quad \text{and} \quad
\mathcal{S}^m_{\mathrm{dyad},q} f := \mathcal{S}^m_{\mathrm{dyad},q} f(j_1, \ldots, j_m, R_2, \ldots, R_m)
\]
for brevity. For the dyadic transform, the norm $\|\mathcal{S}^m_{\mathrm{dyad},q} f\|^q_{\ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}}$ is given by
\[
\left( \sum_{j_m \in \mathbb{Z}} \int_{SO(n)} \cdots \sum_{j_2 \in \mathbb{Z}} \int_{SO(n)} \sum_{j_1 \in \mathbb{Z}} \mathcal{S}^m_{\mathrm{dyad},q} f \, d\mu_2(R_2) \cdots d\mu_m(R_m) \right)^{q/2}.
\]
Like before, we will discuss the well-definedness and stability of these operators to diffeomorphisms.
The proofs will be omitted since they follow directly from the previous sections with minor modifications.

Lemma 30. Let $\psi$ be a wavelet that satisfies properties (2.4) and (2.5).

• If $1 < q \le 2$, we have $\mathcal{S}^m_{\mathrm{cont},q} : L^q(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}$ and $\mathcal{S}^m_{\mathrm{dyad},q} : L^q(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}$.

• If $q = 1$ and one of the following holds:
  - $n = 1$ and $\psi$ is complex analytic,
  - $n \ge 2$ and $\psi$ satisfies the conditions of Lemma 9,
then $\mathcal{S}^m_{\mathrm{cont},1} : L^1(\mathbb{R}^n) \to L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}$ and $\mathcal{S}^m_{\mathrm{dyad},1} : L^1(\mathbb{R}^n) \to \ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}$.

• If $q = 2$ and $\psi$ is also a Littlewood-Paley wavelet, we have
\[
\|\mathcal{S}^m_{\mathrm{cont},2} f\|_{L^1(\mathbb{R}^m_+) \times SO(n)^{m-1}} = \mu(SO(n))^{m-1} C_\psi^m \|f\|_2^2
\]
and
\[
\|\mathcal{S}^m_{\mathrm{dyad},2} f\|_{\ell^1(\mathbb{Z}^m) \times SO(n)^{m-1}} = \mu(SO(n))^{m-1} \hat{C}_\psi^m \|f\|_2^2.
\]

Theorem 31. Assume $|c| < \frac{1}{2n}$ and $1 < q < 2$. Let $\tau(x) = cx$ and let $L_\tau f(x) = f((1 - c)x)$. Suppose that $\psi$ is a wavelet that satisfies the conditions of Lemma 14. Then there exist constants $\hat{K}_{n,m,q}$ and $\hat{K}'_{n,m,q}$ dependent only on $n$, $m$, and $q$ such that
\[
\|\mathcal{S}^m_{\mathrm{cont},q} f - \mathcal{S}^m_{\mathrm{cont},q} L_\tau f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}} \le |c|^q \cdot \hat{K}_{n,m,q} \|f\|_q^q
\]
and
\[
\|\mathcal{S}^m_{\mathrm{dyad},q} f - \mathcal{S}^m_{\mathrm{dyad},q} L_\tau f\|^q_{\ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}} \le |c|^q \cdot \hat{K}'_{n,m,q} \|f\|_q^q.
\]
Additionally, if $q = 1$ and one of the following holds:

• $n = 1$, $\psi$ is complex analytic and satisfies the conditions of Lemma 14,

• $n \ge 2$ and $\psi$ satisfies the conditions of Lemma 9,

there exist $\hat{H}_{m,n}$ and $\hat{H}'_{m,n}$ such that
\[
\|\mathcal{S}^m_{\mathrm{cont},1} f - \mathcal{S}^m_{\mathrm{cont},1} L_\tau f\|_{L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}} \le |c| \cdot \hat{H}_{m,n} \|f\|_{H^1(\mathbb{R}^n)}
\]
and
\[
\|\mathcal{S}^m_{\mathrm{dyad},1} f - \mathcal{S}^m_{\mathrm{dyad},1} L_\tau f\|_{\ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}} \le |c| \cdot \hat{H}'_{m,n} \|f\|_{H^1(\mathbb{R}^n)}.
\]

Theorem 32. Let $\tau \in C^2(\mathbb{R}^n)$ and define $L_\tau f(x) = f(x - \tau(x))$ with $\|D\tau\|_\infty < \frac{1}{2n}$. Suppose that $\psi$ is a wavelet such that the wavelet and all its first and second partial derivatives have $O((1 + |x|)^{-n-3})$ decay. For $q \in (1, 2]$, there exist constants $C_{m,n}$, $\hat{C}_{m,n}$, $C_{m,n,q}$, and $\hat{C}_{m,n,q}$ such that
\[
\begin{aligned}
\|\mathcal{S}^m_{\mathrm{cont},2} f - \mathcal{S}^m_{\mathrm{cont},2} L_\tau f\|^2_{L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}} &\le C_{m,n} K_2(\tau) \|f\|_2^2, \\
\|\mathcal{S}^m_{\mathrm{dyad},2} f - \mathcal{S}^m_{\mathrm{dyad},2} L_\tau f\|^2_{\ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}} &\le \hat{C}_{m,n} K_2(\tau) \|f\|_2^2, \\
\|\mathcal{S}^m_{\mathrm{cont},q} f - \mathcal{S}^m_{\mathrm{cont},q} L_\tau f\|^q_{L^2(\mathbb{R}^m_+) \times SO(n)^{m-1}} &\le C_{m,n,q} K_q(\tau) \|f\|_q^q, \\
\|\mathcal{S}^m_{\mathrm{dyad},q} f - \mathcal{S}^m_{\mathrm{dyad},q} L_\tau f\|^q_{\ell^2(\mathbb{Z}^m) \times SO(n)^{m-1}} &\le \hat{C}_{m,n,q} K_q(\tau) \|f\|_q^q.
\end{aligned}
\]

CHAPTER 3
EXPECTED SCATTERING TRANSFORMS

3.1 Background

Generalizing to stochastic processes, one can also consider scattering moments [11, 15], which have similar desirable properties as the nonwindowed scattering transform; other tangential works include [43, 44]. For the modeling of objects such as audio and image textures, one can think of them as realizations of highly non-Gaussian processes [15].
In the particular case of audio/image synthesis, one would like to generate a texture with the same statistical properties without generating a repetition of the texture. Equivariant features are more likely to lead to repetitions in textures. Thus, it is sensible to get a small number of rich descriptors that are translation invariant (e.g. using a realization of a process and calculating the nonwindowed scattering transform). In practice, instead of calculating an expectation, one takes an average of multiple realizations. Further applications include cosmology [45]. The main idea in all these applications is that the nonwindowed scattering transform has desirable mathematical properties and provides a small number of relevant descriptors for high dimensional, complicated data.

3.2 Wavelet Transforms for Stochastic Processes

Let $X$ be a real valued stationary stochastic process with finite second moment. Also, let $\psi$ be a wavelet. As a reminder, let $G$ be a finite rotation group, let $G^+$ be the quotient of $G$ with the set $\{-1, 1\}$, and let $\Lambda = \{(2^j, r) : j \in \mathbb{Z}, \ r \in G^+\}$. For all $\lambda \in \Lambda$, dilations of the wavelet are given by
\[
\psi_\lambda(u) = 2^{-nj} \psi(2^{-j} r^{-1} u), \tag{3.1}
\]
and we define the wavelet transform of $X$ at scale $2^j$ as
\[
X * \psi_j(t) = \int_{\mathbb{R}^n} X(u) \psi_j(t - u) \, du. \tag{3.2}
\]
The dyadic wavelet transform is given by
\[
WX = \{X * \psi_\lambda\}_{\lambda \in \Lambda}. \tag{3.3}
\]
We say that $\psi$ is a Littlewood-Paley wavelet if $\psi$ satisfies the following admissibility condition:
\[
\sum_{\lambda \in \Lambda} |\hat{\psi}_\lambda(\omega)|^2 = \sum_{r \in G^+} \sum_{j \in \mathbb{Z}} |\hat{\psi}(2^j r^{-1} \omega)|^2 = C_\psi, \quad \forall \omega \ne 0. \tag{3.4}
\]
For any Littlewood-Paley wavelet, we have the following relation between the variance $\sigma^2(X)$ and the energy of the wavelet transform:
\[
\sum_{\lambda \in \Lambda} \mathbb{E}[|X * \psi_\lambda|^2] = \beta C_\psi \sigma^2(X), \tag{3.5}
\]
where
\[
\beta = \begin{cases} 1/2 & \text{if } \psi \text{ is real valued}, \\ 1 & \text{if } \psi \text{ is complex valued}. \end{cases}
\]

3.3 Scattering Moments and the Expected Scattering Transform

Following [11], first order scattering moments are defined as
\[
S_1 X(\lambda) = \mathbb{E}[|X * \psi_\lambda|], \quad \forall \lambda \in \Lambda. \tag{3.6}
\]
Scattering moments for $m > 1$ are an iterative application of a wavelet transform followed by a modulus, which is given by:
\[
S_1^m X(\lambda_1, \ldots, \lambda_m) = \mathbb{E}\left[ ||X * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_m}| \right], \quad \forall (\lambda_1, \ldots, \lambda_m) \in \Lambda^m. \tag{3.7}
\]
The expected scattering transform is the set of all scattering moments:
\[
S_1 X = \{ S_1^m X(\lambda_1, \ldots, \lambda_m) : \forall (\lambda_1, \ldots, \lambda_m) \in \Lambda^m, \ \forall m \in \mathbb{N} \} \tag{3.8}
\]
with norm
\[
\|S_1 X\|^2 = \sum_{m=1}^\infty \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_1^m X(\lambda_1, \ldots, \lambda_m)|^2. \tag{3.9}
\]
Additionally, suppose that $Y$ is also a stochastic process with finite second moment. The scattering distance is given by
\[
\|S_1 X - S_1 Y\|^2 = \sum_{m=1}^\infty \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_1^m X(\lambda_1, \ldots, \lambda_m) - S_1^m Y(\lambda_1, \ldots, \lambda_m)|^2. \tag{3.10}
\]

3.4 The Expected Scattering Transform When q = 2

Generalizing the norms above, we begin by defining the expected scattering transform and scattering norm when $q = 2$. The expected scattering transform is the set of all scattering moments:
\[
S_2 X = \{ S_2^m X(\lambda_1, \ldots, \lambda_m) : \forall (\lambda_1, \ldots, \lambda_m) \in \Lambda^m, \ \forall m \in \mathbb{N} \} \tag{3.11}
\]
with norm
\[
\|S_2 X\|_2^2 = \sum_{m=1}^\infty \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_2^m X(\lambda_1, \ldots, \lambda_m)|^2 \tag{3.12}
\]
and scattering distance
\[
\|S_2 X - S_2 Y\|_2^2 = \sum_{m=1}^\infty \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_2^m X(\lambda_1, \ldots, \lambda_m) - S_2^m Y(\lambda_1, \ldots, \lambda_m)|^2. \tag{3.13}
\]

3.4.1 General Properties

Lemma 33. Suppose $\psi$ is a Littlewood-Paley wavelet. Then we have the following bound:
\[
\sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_2^m X(\lambda_1, \ldots, \lambda_m)|^2 = \beta^m C_\psi^m \sigma^2(X) \le \beta^m C_\psi^m \mathbb{E}[X^2].
\]

Proof. Without loss of generality, assume that $\psi$ is complex and remove $\beta$ from all the proofs. We proceed by induction. The base case follows directly from (3.5). Thus, we have
\[
\|S_2 X\|^2_{\ell^2(\mathbb{Z})} = C_\psi \sigma^2(X) \le C_\psi \mathbb{E}[X^2].
\]
Now assume that for some $k \in \mathbb{N}$,
\[
\sum_{(\lambda_1, \ldots, \lambda_k) \in \Lambda^k} |S_2^k X(\lambda_1, \ldots, \lambda_k)|^2 = C_\psi^k \sigma^2(X) \le C_\psi^k \mathbb{E}[X^2].
\]
Define the random variable $Y_k = ||X * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_k}|$, which is clearly stationary since the modulus operator and wavelet transform both preserve stationarity of a stochastic process. It follows that we can write
\[
\begin{aligned}
\sum_{(\lambda_1, \ldots, \lambda_{k+1}) \in \Lambda^{k+1}} |S_2^{k+1} X(\lambda_1, \ldots, \lambda_{k+1})|^2
&= \sum_{(\lambda_1, \ldots, \lambda_k) \in \Lambda^k} \sum_{\lambda_{k+1} \in \Lambda} \mathbb{E}[|Y_k * \psi_{\lambda_{k+1}}|^2] \\
&= C_\psi \sum_{(\lambda_1, \ldots, \lambda_k) \in \Lambda^k} \sigma^2(Y_k) \\
&\le C_\psi \sum_{(\lambda_1, \ldots, \lambda_k) \in \Lambda^k} \mathbb{E}[Y_k^2] \\
&\le C_\psi^{k+1} \mathbb{E}[X^2].
\end{aligned}
\]
□

We begin by proving that our expected scattering transform with $q = 2$ is a nonexpansive operator.

Theorem 34 (Nonexpansive Operator). Suppose $\psi$ is a Littlewood-Paley wavelet with $\beta C_\psi \le \frac{1}{2}$. Then
\[
\|S_2 X - S_2 Y\|_2^2 \le \mathbb{E}[|X - Y|^2]
\]
and
\[
\|S_2 X\|_2^2 \le \mathbb{E}[X^2].
\]

Proof.
For notational simplicity, define
\[
X_k = ||X * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_k}|, \qquad Y_k = ||Y * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_k}|.
\]
We begin by applying Minkowski's inequality and (3.5) repeatedly to get
\begin{align*}
\sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} & |S_2^m X(\lambda_1, \ldots, \lambda_m) - S_2^m Y(\lambda_1, \ldots, \lambda_m)|^2 \\
&= \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \left| \mathbb{E}\left[|X_{m-1} * \psi_{\lambda_m}|^2\right]^{1/2} - \mathbb{E}\left[|Y_{m-1} * \psi_{\lambda_m}|^2\right]^{1/2} \right|^2 \\
&\le \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \mathbb{E}\left[|(X_{m-1} - Y_{m-1}) * \psi_{\lambda_m}|^2\right] \\
&\le C_\psi \sum_{(\lambda_1, \ldots, \lambda_{m-1}) \in \Lambda^{m-1}} \mathbb{E}\left[|X_{m-1} - Y_{m-1}|^2\right] \\
&\le C_\psi \sum_{(\lambda_1, \ldots, \lambda_{m-1}) \in \Lambda^{m-1}} \mathbb{E}\left[|(X_{m-2} - Y_{m-2}) * \psi_{\lambda_{m-1}}|^2\right] \\
&\le C_\psi^2 \sum_{(\lambda_1, \ldots, \lambda_{m-2}) \in \Lambda^{m-2}} \mathbb{E}\left[|X_{m-2} - Y_{m-2}|^2\right] \\
&\ \ \vdots \\
&\le C_\psi^m \mathbb{E}[|X - Y|^2].
\end{align*}
Now sum over all $m$ to get
\[
\|S_2 X - S_2 Y\|_2^2 \le \sum_{m=1}^{\infty} C_\psi^m \mathbb{E}[|X - Y|^2] = \frac{C_\psi}{1 - C_\psi} \mathbb{E}[|X - Y|^2] \le \mathbb{E}[|X - Y|^2].
\]
Setting $Y = 0$ proves $\|S_2 X\|_2^2 \le \mathbb{E}[X^2]$, which completes the proof. □

3.4.2 Diffeomorphism Contraction Estimates

Let $\tau$ be a stationary random process independent of $X$ such that $\|D\tau\|_\infty \le \frac{1}{2n}$ with probability 1. Define the deformed process $L_\tau X(x) = X(x - \tau(x))$, which is still stationary. We will need the following lemma.

Lemma 35 ([11], Lemma 4.8). Let $K_\tau$ be an integral operator with a kernel $k_\tau(x, u)$ which depends upon a random process $\tau$.
If the following two conditions are satisfied:
\[
\mathbb{E}\left[k_\tau(x, u)\, k_\tau^*(x, u')\right] = \bar{k}_\tau(x - u, x - u')
\]
and
\[
\int_{\mathbb{R}^n} \int_{\mathbb{R}^n} |\bar{k}_\tau(v, v')|\, |v - v'| \, dv \, dv' < \infty,
\]
then for any stationary process $Y$ independent of $\tau$, $\mathbb{E}[|K_\tau Y(x)|^2]$ does not depend on $x$ and
\[
\mathbb{E}[|K_\tau Y|^2] \le \mathbb{E}[\|K_\tau\|^2]\, \mathbb{E}[|Y|^2],
\]
where $\|K_\tau\|$ is the operator norm in $L^2(\mathbb{R}^n)$ for each realization of $\tau$.

Theorem 36 (Diffeomorphism Contraction Estimate). Consider the random process $X - L_\tau X$. Assume that $\beta C_\psi < 1/2$ and that $\hat{R}_{X - L_\tau X}$, the Fourier transform of the covariance function, is bandlimited with support contained in the ball $B_M(0)$. We have the following estimate for some $C > 0$:
\[
\|S_2 X - S_2 L_\tau X\|_2^2 \le \left(C M^2\, \mathbb{E}[\|\tau\|_\infty^2]\right) \mathbb{E}[|X|^2].
\]

Proof. Let $\phi$ be a function such that
\[
\hat{\phi}(\omega) = \begin{cases} 1, & \omega \in B_1(0), \\ 0, & \omega \notin B_1(0). \end{cases}
\]
Define $\phi_M(x) = M^{n} \phi(Mx)$, so that $\hat{\phi}_M(\omega) = \hat{\phi}(\omega / M)$. Then we also know that $\int_{\mathbb{R}^n} \phi_M(x) \, dx = 1$. Since our scattering operator is nonexpansive, we have
\[
\|S_2 X - S_2 L_\tau X\|_2^2 \le \mathbb{E}[|X - L_\tau X|^2],
\]
where the expectation is over all possible randomness. Notice that since $\int_{\mathbb{R}^n} \phi_M(x) \, dx = 1$, we can write
\begin{align*}
\mathbb{E}[|(X - L_\tau X) * \phi_M|^2]
&= \int_{\mathbb{R}^n} \hat{R}_{X - L_\tau X}(\omega)\, |\hat{\phi}_M(\omega)|^2 \, d\omega + \mathbb{E}^2[(X - L_\tau X) * \phi_M] \\
&= \int_{\mathbb{R}^n} \hat{R}_{X - L_\tau X}(\omega)\, |\hat{\phi}_M(\omega)|^2 \, d\omega + \mathbb{E}^2[X - L_\tau X] \\
&= \int_{B_M(0)} \hat{R}_{X - L_\tau X}(\omega) \, d\omega + \mathbb{E}^2[X - L_\tau X] \\
&= \mathbb{E}[|X - L_\tau X|^2].
\end{align*}
In other words, if we define $A_{\phi_M} f := f * \phi_M$, we can write
\[
\mathbb{E}[|(X - L_\tau X) * \phi_M|^2] = \mathbb{E}[|(A_{\phi_M} - A_{\phi_M} L_\tau) X|^2].
\]
From estimates given in Theorem 3.6 of [23], in the deterministic case with $f \in L^2(\mathbb{R}^n)$ we have
\[
\|(A_{\phi_M} - A_{\phi_M} L_\tau) f\|_2^2 \le 4 M^2 \|\nabla \phi\|_1^2\, \|\tau\|_\infty^2\, \|f\|_2^2,
\]
where $\tau \in C^1(\mathbb{R}^n)$. It is proven in Appendix H of [11] that an operator of the form $A_{\phi_M} - A_{\phi_M} L_\tau$ has a kernel that satisfies the conditions of Lemma 35. Thus, we have
\[
\mathbb{E}[|X - L_\tau X|^2] \le 4 M^2 \|\nabla \phi\|_1^2\, \mathbb{E}[\|\tau\|_\infty^2]\, \mathbb{E}[|X|^2].
\]
□

3.5 The Expected Scattering Transform When $1 < q < 2$

Now we generalize to the case where $q \in (1, 2)$. The case of $q = 1$ has been addressed in [11]. The expected scattering transform is the set of all scattering moments:
\[
S_q X = \{S_q^m X(\lambda_1, \ldots, \lambda_m) : \forall (\lambda_1, \ldots, \lambda_m) \in \Lambda^m, \ \forall m \in \mathbb{N}\} \tag{3.14}
\]
with norm
\[
\|S_q X\|_2^2 = \sum_{m=1}^{\infty} \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_q^m X(\lambda_1, \ldots, \lambda_m)|^2 \tag{3.15}
\]
and scattering distance
\[
\|S_q X - S_q Y\|_2^2 = \sum_{m=1}^{\infty} \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_q^m X(\lambda_1, \ldots, \lambda_m) - S_q^m Y(\lambda_1, \ldots, \lambda_m)|^2. \tag{3.16}
\]
We start with a lemma that will help us determine when our generalized expected scattering transform is well defined.

Lemma 37. Suppose $\psi$ is a Littlewood-Paley wavelet. Then we have the following bound:
\[
\sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_q^m X(\lambda_1, \ldots, \lambda_m)|^2 \le \beta^m C_\psi^m \sigma^2(X) \le \beta^m C_\psi^m \mathbb{E}[X^2].
\]

Proof. Without loss of generality, assume that $\psi$ is complex and remove $\beta$ from all the proofs. For each $m \in \mathbb{N}$, we apply Jensen's inequality to get
\begin{align*}
\sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} |S_q^m X(\lambda_1, \ldots, \lambda_m)|^2
&= \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \mathbb{E}\left[\,||X * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_m}|^q\,\right]^{2/q} \\
&\le \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \mathbb{E}\left[\,||X * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_m}|^2\,\right] \\
&= \beta^m C_\psi^m \sigma^2(X) \le \beta^m C_\psi^m \mathbb{E}[X^2].
\end{align*}
□

Additionally, the expected scattering transforms for $1 < q < 2$ are all nonexpansive operators, by the following theorem.

Theorem 38. Suppose $\psi$ is a Littlewood-Paley wavelet with $\beta C_\psi \le \frac{1}{2}$. Then
\[
\|S_q X - S_q Y\|_2^2 \le \mathbb{E}[|X - Y|^2]
\]
and
\[
\|S_q X\|_2^2 \le \mathbb{E}[X^2].
\]

Proof. For notational simplicity, we use
\[
X_k = ||X * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_k}|, \qquad Y_k = ||Y * \psi_{\lambda_1}| * \cdots | * \psi_{\lambda_k}|.
\]
We have
\begin{align*}
\sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} & |S_q^m X(\lambda_1, \ldots, \lambda_m) - S_q^m Y(\lambda_1, \ldots, \lambda_m)|^2 \\
&= \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \left| \mathbb{E}\left[|X_{m-1} * \psi_{\lambda_m}|^q\right]^{1/q} - \mathbb{E}\left[|Y_{m-1} * \psi_{\lambda_m}|^q\right]^{1/q} \right|^2 \\
&\le \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \mathbb{E}\left[|(X_{m-1} - Y_{m-1}) * \psi_{\lambda_m}|^q\right]^{2/q} \\
&\le \sum_{(\lambda_1, \ldots, \lambda_m) \in \Lambda^m} \mathbb{E}\left[|(X_{m-1} - Y_{m-1}) * \psi_{\lambda_m}|^2\right] \\
&\le C_\psi^m \mathbb{E}[|X - Y|^2].
\end{align*}
Now sum over all $m$ to finish the proof. □

The following corollary also follows immediately from the proof above and the $q = 2$ case.

Corollary 39. Suppose $\tau$ is a stochastic process independent of $X$ and $\psi$ is a Littlewood-Paley wavelet with $\beta C_\psi \le 1/2$.
Consider the random process $X - L_\tau X$, and suppose that the Fourier transform of its covariance function, $\hat{R}_{X - L_\tau X}(\omega)$, is supported on some finite ball with radius $R$ centered at the origin, $B_R(0)$. We have the following estimate for some $C > 0$:
\[
\|S_q X - S_q L_\tau X\|_2^2 \le C R^2\, \mathbb{E}[\|\tau\|_\infty^2]\, \mathbb{E}[|X|^2].
\]

CHAPTER 4
NONWINDOWED SCATTERING ON COMPACT RIEMANNIAN MANIFOLDS

In this chapter, we generalize our results with $q = 2$ to compact Riemannian manifolds.

First, let us motivate why one would consider scattering transforms for non-Euclidean data. Suppose we have digits written on a set of spheres (e.g. spherical MNIST), and we would like to classify which digit appears on each sphere. A Euclidean approach would be to voxelize each of these spheres as an $N \times N \times N$ discretized cube and feed these cubes into a feature extractor (e.g. a scattering transform or a convolutional neural network). However, compared to an $N \times N$ image, this approach is $N$ times more expensive in terms of memory because of the extra dimension. One can instead consider these as signals on the sphere, which has a lower intrinsic dimension. The point is that a Euclidean representation is not necessarily the best representation for feature extraction.

The paper [46] was the first to explore a unified framework for geometric deep learning, and [28, 27, 29] provided a mathematical framework for scattering transforms for non-Euclidean data. Additionally, for spherical data, windowed scattering transforms have been generalized in [47, 48], where the convolution operation is specific to the sphere, and numerical implementations are optimized relative to [28] (with a trade-off in flexibility). As an aside, one could consider nonwindowed versions of [47, 48] for classification tasks on the sphere.
In particular, [28] defines the nonwindowed scattering transform for compact manifolds as $L^1$ norms of a cascade of wavelet transforms and nonlinearities, which will be reviewed below. Similar to scattering moments and nonwindowed scattering transforms for Euclidean data, one would suspect that using $L^q$ norms instead of $L^1$ norms provides richer descriptors for signals on manifolds. This motivates our results for $q = 2$. Other values of $q$ have been left to future work.

4.1 Notation for Scattering on Manifolds

Let $\mathcal{M}$ be a compact, smooth, $n$-dimensional Riemannian manifold without boundary contained in $\mathbb{R}^d$, where $d \ge n$, with geodesic distance between two points $x_1, x_2 \in \mathcal{M}$ given by $r(x_1, x_2)$ and Laplace-Beltrami operator denoted by $\Delta$. The notation $L^q(\mathcal{M})$ denotes the set of all functions $f : \mathcal{M} \to \mathbb{R}$ such that $\int_{\mathcal{M}} |f(x)|^q \, dx < \infty$, where $dx$ denotes integration with respect to the Riemannian volume. We let $\mathrm{Isom}(\mathcal{M}_1, \mathcal{M}_2)$ be the set of isometries between manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$. Lastly, the set of diffeomorphisms on $\mathcal{M}$ will be denoted by $\mathrm{Diff}(\mathcal{M})$, and the maximum displacement of $\gamma \in \mathrm{Diff}(\mathcal{M})$ will be given by $\|\gamma\|_\infty := \sup_{x \in \mathcal{M}} r(x, \gamma(x))$.

4.2 Spectral Filters and the Geometric Wavelet Transform

We provide a brief summary of the geometric wavelet transform, as presented in [28]. The convolution of $f, g \in L^2(\mathbb{R}^n)$ is usually defined in space as
\[
(f * g)(x) = \int_{\mathbb{R}^n} f(y)\, g(x - y) \, dy.
\]
However, for a general manifold, even under the conditions we have prescribed, a notion of translation does not necessarily exist. Instead, one can consider a spectral definition of convolution via the spectral decomposition of $-\Delta$. Denote $\mathbb{N} \cup \{0\} = \mathbb{N}_0$. Because our manifold is compact, it is well known that $-\Delta$ has a discrete spectrum, and we can order the eigenvalues in increasing order and denote them as $\{\lambda_n\}_{n \in \mathbb{N}_0}$.
We will denote the corresponding eigenfunctions as $\{\phi_n(x)\}_{n \in \mathbb{N}_0}$, which form an orthonormal basis for $L^2(\mathcal{M})$. Suppose $f \in L^2(\mathcal{M})$. Since the set of functions $\{\phi_n(x)\}_{n \in \mathbb{N}_0}$ forms a basis in $L^2(\mathcal{M})$, we decompose
\[
f(x) = \sum_{n \in \mathbb{N}_0} \langle f, \phi_n \rangle \phi_n(x) = \sum_{n \in \mathbb{N}_0} \left( \int_{\mathcal{M}} f(y) \phi_n(y) \, dy \right) \phi_n(x), \tag{4.1}
\]
which is similar to a Fourier series. Since $\phi_n(y)$ is a replacement for a Fourier mode, it is natural to let
\[
\hat{f}(n) = \int_{\mathcal{M}} f(y) \phi_n(y) \, dy \tag{4.2}
\]
and define convolution on $\mathcal{M}$ between functions $f, h \in L^2(\mathcal{M})$ as
\[
f * h(x) = \sum_{n \in \mathbb{N}_0} \hat{f}(n)\, \hat{h}(n)\, \phi_n(x). \tag{4.3}
\]
Defining the operator $T_h f(x) := f * h(x)$, it is easy to verify that the kernel for $T_h$ is given by
\[
K_h(x, y) := \sum_{n \in \mathbb{N}_0} \hat{h}(n)\, \phi_n(x)\, \phi_n(y). \tag{4.4}
\]

Similar to how convolution commutes with translations on $\mathbb{R}^n$, it is important for convolution on $\mathcal{M}$ to be equivariant to a group action on $\mathcal{M}$. The authors of [28] construct an operator by convolving with functions that commute with isometries, since the geometry of $\mathcal{M}$ should be preserved by a representation. To accomplish this goal, we use a similar definition for spectral filters. A filter $h \in L^2(\mathcal{M})$ is a spectral filter if $\lambda_k = \lambda_\ell$ implies $\hat{h}(k) = \hat{h}(\ell)$. One can prove that there exists $H : [0, \infty) \to \mathbb{R}$ such that
\[
H(\lambda_n) = \hat{h}(n), \quad \forall n \in \mathbb{N}_0.
\]
Let $G : [0, \infty) \to \mathbb{R}$ be nonnegative and decreasing with $G(0) > 0$. A low-pass spectral filter $\phi$ is given in frequency as $\hat{\phi}(k) := G(\lambda_k)$, and its dilation at scale $2^j$ for $j \in \mathbb{Z}$ is $\hat{\phi}_j(k) := G(2^j \lambda_k)$.
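As a concrete illustration of the spectral convolution (4.3) and of dilated low-pass spectral filters, the following sketch works on the unit circle, where the eigenvalues of $-\Delta$ are $0, 1, 1, 4, 4, \ldots$. It is only a minimal example under stated assumptions: the choice of manifold, the profile $G(\lambda) = e^{-\lambda}/\sqrt{2}$, the truncation to finitely many modes, and all function names are illustrative and are not taken from [28].

```python
import math

def circle_eigenvalues(num_modes):
    """First num_modes eigenvalues of -Laplacian on the unit circle, in increasing order.

    The spectrum is 0, 1, 1, 4, 4, 9, 9, ... (constants, then cos(kx) and sin(kx)).
    """
    eigenvalues = [0.0]
    k = 1
    while len(eigenvalues) < num_modes:
        eigenvalues.extend([float(k * k), float(k * k)])
        k += 1
    return eigenvalues[:num_modes]

def G(lam):
    """Assumed profile (illustrative): nonnegative, decreasing, G(0) = 1/sqrt(2)."""
    return math.exp(-lam) / math.sqrt(2.0)

def low_pass_hat(eigenvalues, j):
    """Coefficients of the dilated low-pass filter: phi_hat_j(k) = G(2^j * lambda_k)."""
    return [G((2.0 ** j) * lam) for lam in eigenvalues]

def spectral_convolve(f_hat, h_hat):
    """Coefficients of f * h as in (4.3): multiply the coefficients mode by mode."""
    return [a * b for a, b in zip(f_hat, h_hat)]

def l2_norm(coeffs):
    """L2 norm computed from coefficients in an orthonormal basis (Parseval)."""
    return math.sqrt(sum(c * c for c in coeffs))

eigenvalues = circle_eigenvalues(7)
f_hat = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75]  # arbitrary test coefficients
phi_hat = low_pass_hat(eigenvalues, j=0)
smoothed_hat = spectral_convolve(f_hat, phi_hat)
```

Because the filter coefficients depend only on the eigenvalue, the two modes sharing the eigenvalue $k^2$ are scaled identically, which is the spectral-filter condition stated above. Since $G$ is decreasing, convolving with $\phi_j$ cannot increase the $L^2$ norm by more than the factor $G(0)$.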
Using the set of low-pass filters, $\{\hat{\phi}_j\}_{j \in \mathbb{Z}}$, we define wavelets by
\[
\hat{\psi}_j(k) := \left[ |\hat{\phi}_{j-1}(k)|^2 - |\hat{\phi}_j(k)|^2 \right]^{1/2}, \tag{4.5}
\]
which is identical to standard constructions of Littlewood-Paley wavelets in Euclidean space. Fix $J \in \mathbb{Z}$. Define the operators
\[
A_J f := f * \phi_J, \qquad \Psi_j f := f * \psi_j, \quad j \le J.
\]
The windowed geometric wavelet transform is given by
\[
W_J f := \{A_J f, \ \Psi_j f : j \le J\} \tag{4.6}
\]
and the nonwindowed geometric wavelet transform is given by
\[
W f := \{\Psi_j f : j \in \mathbb{Z}\}. \tag{4.7}
\]
We have the following theorem, which provides a condition for when our wavelet frame is a nonexpansive frame.

Theorem 40. Let $G : [0, \infty) \to \mathbb{R}$ be nonnegative and decreasing with $0 < G(0) = C$ and $\lim_{x \to \infty} G(x) = 0$, and let $\{\psi_j\}_{j \in \mathbb{Z}}$ be a set of wavelets generated by the low-pass filter $\hat{\phi}(k) = G(\lambda_k)$. Then we have
\[
\sum_{j \in \mathbb{Z}} \|f * \psi_j\|_2^2 = C^2 \|f\|_2^2. \tag{4.8}
\]

Proof. For fixed $J > 1$, we telescope to get
\[
\sum_{j=-J}^{J} |\hat{\psi}_j(k)|^2 = \sum_{j=-J}^{J} \left[ |G(2^{j-1} \lambda_k)|^2 - |G(2^j \lambda_k)|^2 \right] = |G(2^{-J-1} \lambda_k)|^2 - |G(2^{J} \lambda_k)|^2.
\]
Since $\lim_{J \to \infty} |G(2^{-J-1} \lambda_k)|^2$ and $\lim_{J \to \infty} |G(2^{J} \lambda_k)|^2$ both exist, it follows that
\[
\sum_{j \in \mathbb{Z}} |\hat{\psi}_j(k)|^2 = \lim_{J \to \infty} |G(2^{-J-1} \lambda_k)|^2 - \lim_{J \to \infty} |G(2^{J} \lambda_k)|^2 = C^2.
\]
We can write
\[
\|f * \psi_j\|_2^2 = \sum_{k \in \mathbb{N}_0} |\hat{\psi}_j(k)|^2\, |\hat{f}(k)|^2.
\]
Thus, it follows that
\[
\sum_{j \in \mathbb{Z}} \|f * \psi_j\|_2^2 = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{N}_0} |\hat{f}(k)|^2\, |\hat{\psi}_j(k)|^2 = \sum_{k \in \mathbb{N}_0} |\hat{f}(k)|^2 \left( \sum_{j \in \mathbb{Z}} |\hat{\psi}_j(k)|^2 \right) = C^2 \|f\|_2^2.
\]
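The telescoping sum used in this proof can be checked numerically. The sketch below evaluates $\sum_{j=-J}^{J} |\hat{\psi}_j|^2$ from definition (4.5) at a few positive eigenvalues and compares it with the closed form $G(2^{-J-1}\lambda)^2 - G(2^{J}\lambda)^2$, which tends to $G(0)^2$ as $J \to \infty$. The profile $G(\lambda) = e^{-\lambda}/\sqrt{2}$ and the function names are assumptions made for illustration only; at $\lambda = 0$ every wavelet coefficient vanishes, so the comparison with $G(0)^2$ is made only at positive eigenvalues.

```python
import math

def G(lam):
    """Assumed low-pass profile (illustrative): nonnegative, decreasing, G(0) = 1/sqrt(2)."""
    return math.exp(-lam) / math.sqrt(2.0)

def wavelet_hat_squared(lam, j):
    """|psi_hat_j|^2 = |phi_hat_{j-1}|^2 - |phi_hat_j|^2 at eigenvalue lam, as in (4.5)."""
    return G((2.0 ** (j - 1)) * lam) ** 2 - G((2.0 ** j) * lam) ** 2

def littlewood_paley_sum(lam, J):
    """Partial sum of |psi_hat_j|^2 over the scales j = -J, ..., J."""
    return sum(wavelet_hat_squared(lam, j) for j in range(-J, J + 1))

def telescoped(lam, J):
    """Closed form of the same partial sum after telescoping."""
    return G((2.0 ** (-J - 1)) * lam) ** 2 - G((2.0 ** J) * lam) ** 2

J = 40
partial_sums = {lam: littlewood_paley_sum(lam, J) for lam in (1.0, 4.0, 9.0)}
```

With this choice of $G$, each partial sum is close to $G(0)^2 = 1/2$, which is the constant that appears in the first step of the nonexpansiveness argument for $G(0) = 1/\sqrt{2}$.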
□

4.3 The Geometric Scattering Transform

In an analogous manner to the Euclidean definition of the scattering transform, one would like to find a representation that meaningfully encodes high frequency information of a signal $f$. Define the propagator as
\[
U[j] f := |\Psi_j f| = |f * \psi_j| \quad \forall j \in \mathbb{Z}, \tag{4.9}
\]
which is convolution with a wavelet followed by a nonlinearity. Similarly, we can define the windowed propagator as
\[
U_J[j] f := |\Psi_j f| \quad \forall j \le J. \tag{4.10}
\]
Similar to scattering transforms on Euclidean space, one can apply a cascade of convolutions and modulus operators repeatedly. In particular, for $m \in \mathbb{N}$, let $j_1, \ldots, j_m \in \mathbb{Z}$. The $m$-layer propagator is defined as
\[
U[j_1, \ldots, j_m] f := U[j_m] \cdots U[j_1] f = ||f * \psi_{j_1}| * \psi_{j_2}| \cdots * \psi_{j_m}| \tag{4.11}
\]
and the $m$-layer windowed propagator is defined as
\[
U_J[j_1, \ldots, j_m] f := U_J[j_m] \cdots U_J[j_1] f = ||f * \psi_{j_1}| * \psi_{j_2}| \cdots * \psi_{j_m}|, \quad j_1, \ldots, j_m \le J, \tag{4.12}
\]
with $U[\emptyset] f = f$ and $U_J[\emptyset] f = f$. To aggregate low-frequency information and get local isometry invariance, one can apply a low-pass filter, in a manner similar to pooling, to each windowed propagator to get windowed scattering coefficients:
\[
S_J[j_1, \ldots, j_m] f = A_J U_J[j_1, \ldots, j_m] f = U_J[j_1, \ldots, j_m] f * \phi_J,
\]
where we define $S_J[\emptyset] f = f * \phi_J$. The windowed geometric scattering transform is given by
\[
S_J f = \{S_J[j_1, \ldots, j_m] f : m \ge 0, \ j_i \le J \ \forall 1 \le i \le m\}. \tag{4.13}
\]
The authors of [28] were able to prove that this windowed scattering operator is nonexpansive, invariant to isometries up to the scale of the low-pass filter, and stable to diffeomorphisms under mild assumptions.

In addition, the authors consider a nonwindowed scattering transform, which removes the low-pass filtering. Applications such as manifold classification require full isometry invariance instead of isometry invariance up to the scale $2^J$. We see that
\[
\lim_{J \to \infty} S_J[j_1, \ldots, j_m] f(x) = \mathrm{vol}(\mathcal{M})^{-1/2} \|U[j_1, \ldots, j_m] f\|_1. \tag{4.14}
\]
As a proxy, it is more appropriate to consider
\[
S f(j_1, \ldots, j_m) = \|U[j_1, \ldots, j_m] f\|_1, \tag{4.15}
\]
which motivates defining the nonwindowed geometric scattering transform as
\[
S f = \{S[j_1, \ldots, j_m] f : m \ge 0, \ j_i \in \mathbb{Z}, \ \forall 1 \le i \le m\}. \tag{4.16}
\]
However, as mentioned previously, [36, 49] motivate the use of nonwindowed geometric scattering operators as 2-norms of a cascade of convolutions and modulus operators:
\[
S_2 f(j_1, \ldots, j_m) = \|U[j_1, \ldots, j_m] f\|_2.
\]
Additionally, one can generalize the nonwindowed geometric scattering transform to
\[
S_2 f = \{S_2[j_1, \ldots, j_m] f : m \ge 0, \ j_i \in \mathbb{Z}, \ \forall 1 \le i \le m\}, \tag{4.17}
\]
which we will call the 2-nonwindowed geometric scattering transform.

4.4 Generalizing Geometric Scattering Transforms

To measure stability and invariance properties of the 2-nonwindowed geometric scattering transform, we need to define appropriate norms. The original nonwindowed geometric scattering transform was a mapping $\ell^2(L^1(\mathcal{M})) \to L^2(\mathcal{M})$, but our interpretation is slightly different.
In particular, rather than thinking of the coefficients as a sequence, we group the coefficients in each layer and define the norm
\[
\|S_2 f\|^2 = \sum_{m=1}^{\infty} \left( \sum_{(j_1, \ldots, j_m) \in \mathbb{Z}^m} |S_2 f(j_1, \ldots, j_m)|^2 \right) \tag{4.18}
\]
with scattering distance given by
\[
\|S_2 f - S_2 g\|^2 = \sum_{m=1}^{\infty} \left( \sum_{(j_1, \ldots, j_m) \in \mathbb{Z}^m} |S_2 f(j_1, \ldots, j_m) - S_2 g(j_1, \ldots, j_m)|^2 \right). \tag{4.19}
\]

Theorem 41. Let $G : [0, \infty) \to \mathbb{R}$ be nonnegative and decreasing with $0 < G(0) = \frac{1}{\sqrt{2}}$ and $\lim_{x \to \infty} G(x) = 0$, and let $\{\psi_j\}_{j \in \mathbb{Z}}$ be a set of spectral filters generated by $G$. Then we have
\[
\|S_2 f - S_2 g\| \le \|f - g\|_2
\]
for all $f, g \in L^2(\mathcal{M})$.

Proof. We begin by proving that
\[
\sum_{(j_1, \ldots, j_m) \in \mathbb{Z}^m} |S_2 f(j_1, \ldots, j_m) - S_2 g(j_1, \ldots, j_m)|^2 \le 2^{-m} \|f - g\|_2^2
\]
for all $m \in \mathbb{N}$ via induction. In the case of $m = 1$, we see that
\begin{align*}
\sum_{j \in \mathbb{Z}} |S_2 f(j) - S_2 g(j)|^2
&= \sum_{j \in \mathbb{Z}} \left| \|f * \psi_j\|_2 - \|g * \psi_j\|_2 \right|^2 \\
&\le \sum_{j \in \mathbb{Z}} \|f * \psi_j - g * \psi_j\|_2^2 \\
&= \sum_{j \in \mathbb{Z}} \|(f - g) * \psi_j\|_2^2 \\
&\le 2^{-1} \|f - g\|_2^2.
\end{align*}
We can now work recursively. It follows that we can use similar ideas to the $m = 1$ case to get
\begin{align*}
\sum_{(j_1, \ldots, j_{m+1}) \in \mathbb{Z}^{m+1}} & |S_2 f(j_1, \ldots, j_{m+1}) - S_2 g(j_1, \ldots, j_{m+1})|^2 \\
&= \sum_{(j_1, \ldots, j_{m+1}) \in \mathbb{Z}^{m+1}} \left| \|U[j_1, \ldots, j_m] f * \psi_{j_{m+1}}\|_2 - \|U[j_1, \ldots, j_m] g * \psi_{j_{m+1}}\|_2 \right|^2 \\
&\le \sum_{(j_1, \ldots, j_{m+1}) \in \mathbb{Z}^{m+1}} \|(U[j_1, \ldots, j_m] f - U[j_1, \ldots, j_m] g) * \psi_{j_{m+1}}\|_2^2 \\
&\le 2^{-1} \sum_{(j_1, \ldots, j_m) \in \mathbb{Z}^m} \left\| |U[j_1, \ldots, j_{m-1}] f * \psi_{j_m}| - |U[j_1, \ldots, j_{m-1}] g * \psi_{j_m}| \right\|_2^2 \\
&\le 2^{-1} \sum_{(j_1, \ldots, j_m) \in \mathbb{Z}^m} \|U[j_1, \ldots, j_{m-1}] f * \psi_{j_m} - U[j_1, \ldots, j_{m-1}] g * \psi_{j_m}\|_2^2 \\
&\le 2^{-2} \sum_{(j_1, \ldots, j_{m-1}) \in \mathbb{Z}^{m-1}} \|U[j_1, \ldots, j_{m-1}] f - U[j_1, \ldots, j_{m-1}] g\|_2^2 \\
&\ \ \vdots \\
&\le 2^{-(m+1)} \|f - g\|_2^2.
\end{align*}
Now we can sum over all $m$ to get
\[
\|S_2 f - S_2 g\|^2 \le \sum_{m=1}^{\infty} 2^{-m} \|f - g\|_2^2 = \|f - g\|_2^2.
\]
□

Corollary 42. Let $G : [0, \infty) \to \mathbb{R}$ be nonnegative and decreasing with $0 < G(0) \le \frac{1}{\sqrt{2}}$ and $\lim_{x \to \infty} G(x) = 0$, and let $\{\psi_j\}_{j \in \mathbb{Z}}$ be a set of spectral filters generated by $G$. Then we have
\[
\|S_2 f\| \le \|f\|_2
\]
for all $f \in L^2(\mathcal{M})$.

Towards the point of embedding proper invariance, we provide a theorem that demonstrates that the 2-nonwindowed geometric scattering transform is invariant to isometries.

Theorem 43. Let $\xi \in \mathrm{Isom}(\mathcal{M}, \mathcal{M}')$, and let $f \in L^2(\mathcal{M})$. Define $f' = V_\xi f$ and let $S_2'$ be the corresponding 2-nonwindowed geometric scattering transform on $\mathcal{M}'$ produced by a Littlewood-Paley wavelet satisfying the conditions described in Theorem 40. We have $S_2' f' = S_2 f$.

Proof. We see that $S_2[\emptyset] f = \|f\|_2 = \|V_\xi f\|_2$ since $V_\xi$ is an isometry. Now suppose that we consider $p = (j_1, \ldots, j_m)$. Then
\[
S_2[j_1, \ldots, j_m] f = \|U[p] f\|_2 = \|V_\xi U[p] f\|_2 = \|U[p] V_\xi f\|_2 = \|U[p] f'\|_2 = S_2'[j_1, \ldots, j_m] f'.
\]
Thus, we can see that $S_2' f' = S_2 f$. □

Additionally, we also have a diffeomorphism stability result for $\lambda$-bandlimited functions (i.e. $\hat{f}(k) = \langle f, \phi_k \rangle = 0$ whenever $\lambda_k > \lambda$).

Lemma 44 ([28]). Suppose $\xi \in \mathrm{Diff}(\mathcal{M})$. If $f \in L^2(\mathcal{M})$ is $\lambda$-bandlimited, and $\xi$ can be decomposed as $\xi = \xi_1 \circ \xi_2$, where $\xi_2 \in \mathrm{Diff}(\mathcal{M})$ and $\xi_1 \in \mathrm{Isom}(\mathcal{M})$, then
\[
\|f - V_\xi f\|_2 \le C(\mathcal{M})\, \lambda^n\, \|\xi\|_\infty\, \|f\|_2
\]
for some constant $C(\mathcal{M})$.

Theorem 45. Let $f \in L^2(\mathcal{M})$ be $\lambda$-bandlimited, and assume that $\psi$ is a wavelet family satisfying the conditions of Theorem 41 with $G(\lambda) \le e^{-\lambda}$. If $\xi \in \mathrm{Diff}(\mathcal{M})$ can be decomposed as $\xi = \xi_1 \circ \xi_2$, where $\xi_2 \in \mathrm{Diff}(\mathcal{M})$ and $\xi_1 \in \mathrm{Isom}(\mathcal{M})$, then
\[
\|S_2 f - S_2 V_\xi f\|_2 \le C(\mathcal{M})\, \lambda^n\, \|\xi\|_\infty\, \|f\|_2.
\]

Proof. The transform is nonexpansive, so Lemma 44 gives the desired result. □

CHAPTER 5
CONCLUSIONS

This thesis has provided a generalization of nonwindowed scattering transforms to signals in Euclidean space, realizations of stochastic processes, and signals on compact manifolds. Future work involves the following:

• Generalize the diffeomorphism bound from Chapter 2 to stochastic processes. This is possible, but it is more difficult because the techniques used in Euclidean space for Chapter 2 do not apply directly.

• Apply $q$-scattering moments to audio texture synthesis. Based on the results of [50], one would expect that these scattering moments yield additional, relevant signal descriptors. However, does this yield better signal synthesis?

• Generalize the results of Chapter 2 to create nonwindowed scattering transforms as a cascade of wavelet transforms, nonlinearities, and $L^q$ norms on a compact manifold. This is left to future work, and requires results from singular integral theory.

BIBLIOGRAPHY

[1] M.
Hirn, Lecture notes for mathematics of deep learning (April 2020).

[2] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Communications of the ACM 60 (6) (2017) 84–90.

[3] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations, 2015.

[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.

[5] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[6] A. Bietti, J. Mairal, Group invariance, stability to deformations, and complexity of deep convolutional representations, The Journal of Machine Learning Research 20 (1) (2019) 876–924.

[7] S. Mallat, Understanding deep convolutional networks, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2065) (2016) 20150203.

[8] J. Zarka, F. Guth, S. Mallat, Separation and concentration in deep networks, arXiv preprint arXiv:2012.10424 (2020).

[9] F. Guth, J. Zarka, S. Mallat, Phase collapse in neural networks, arXiv preprint arXiv:2110.05283 (2021).

[10] F. Guth, B. Ménard, G. Rochette, S. Mallat, A rainbow in deep network black boxes, arXiv preprint arXiv:2305.18512 (2023).

[11] S. Mallat, Group invariant scattering, Communications on Pure and Applied Mathematics 65 (10) (2012) 1331–1398.

[12] J. Bruna, S. Mallat, Invariant scattering convolution networks, IEEE transactions on pattern analysis and machine intelligence 35 (8) (2013) 1872–1886.

[13] J. Andén, S. Mallat, Deep scattering spectrum, IEEE Transactions on Signal Processing 62 (16) (2014) 4114–4128.

[14] J. Andén, V. Lostanlen, S. Mallat, Joint time–frequency scattering, IEEE Transactions on Signal Processing 67 (14) (2019) 3704–3718.

[15] J. Bruna, S. Mallat, Audio texture synthesis with scattering moments (2013). arXiv:1311.0407.

[16] T. Angles, S. Mallat, Generative networks as inverse problems with scattering transforms, arXiv preprint arXiv:1805.06621 (2018).

[17] J. Bruna, S. Mallat, Multiscale sparse microcanonical models, arXiv preprint arXiv:1801.02013 (2018).

[18] E. Oyallon, E. Belilovsky, S. Zagoruyko, Scaling the scattering transform: Deep hybrid networks, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 5618–5627.

[19] S. Gauthier, B. Thérien, L. Alsene-Racicot, M. Chaudhary, I. Rish, E. Belilovsky, M. Eickenberg, G. Wolf, Parametric scattering networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5749–5758.

[20] I. Waldspurger, Exponential decay of scattering coefficients, in: 2017 international conference on sampling theory and applications (SampTA), IEEE, 2017, pp. 143–146.

[21] T. Wiatowski, H. Bölcskei, A mathematical theory of deep convolutional neural networks for feature extraction, IEEE Transactions on Information Theory 64 (3) (2017) 1845–1866.

[22] M. Koller, J. Großmann, U. Monich, H. Boche, Deformation stability of deep convolutional neural networks on Sobolev spaces, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 6872–6876.

[23] W. Czaja, W. Li, Analysis of time-frequency scattering transforms, Applied and Computational Harmonic Analysis 47 (1) (2019) 149–171.

[24] W. Czaja, W. Li, Rotationally invariant time–frequency scattering transforms, Journal of Fourier Analysis and Applications 26 (2020) 1–23.

[25] J. Mairal, P. Koniusz, Z. Harchaoui, C. Schmid, Convolutional kernel networks, Advances in neural information processing systems 27 (2014).

[26] X. Cheng, X. Chen, S. Mallat, Deep haar scattering networks, Information and Inference: A Journal of the IMA 5 (2) (2016) 105–133.

[27] F. Gao, G. Wolf, M. Hirn, Geometric scattering for graph data analysis, in: International Conference on Machine Learning, PMLR, 2019, pp. 2122–2131.

[28] M. Perlmutter, F. Gao, G. Wolf, M. Hirn, Geometric wavelet scattering networks on compact Riemannian manifolds, in: Mathematical and Scientific Machine Learning, PMLR, 2020, pp. 570–604.

[29] F. Gama, A. Ribeiro, J. Bruna, Stability of graph scattering transforms, Advances in Neural Information Processing Systems 32 (2019).

[30] D. Zou, G. Lerman, Graph convolutional neural networks via scattering, Applied and Computational Harmonic Analysis 49 (3) (2020) 1046–1074.

[31] F. Nicola, S. I. Trapasso, Stability of the scattering transform for deformations with minimal regularity, arXiv preprint arXiv:2205.11142 (2022).

[32] M. Hirn, S. Mallat, N. Poilvert, Wavelet scattering regression of quantum chemical energies, Multiscale Modeling & Simulation 15 (2) (2017) 827–863.

[33] M. Eickenberg, G. Exarchakis, M. Hirn, S. Mallat, L. Thiry, Solid harmonic wavelet scattering for predictions of molecule properties, The Journal of chemical physics 148 (24) (2018) 241732.

[34] P. Sinz, M. W. Swift, X. Brumwell, J. Liu, K. J. Kim, Y. Qi, M. Hirn, Wavelet scattering networks for atomistic systems with extrapolation of material properties, The Journal of Chemical Physics 153 (8) (2020) 084109.

[35] E. Cancès, M. Defranceschi, W. Kutzelnigg, C. Le Bris, Y. Maday, Computational quantum chemistry: A primer, in: Special Volume, Computational Chemistry, Vol. 10 of Handbook of Numerical Analysis, Elsevier, 2003, pp. 3–270. doi:https://doi.org/10.1016/S1570-8659(03)10003-8.

[36] A. Chua, M. Hirn, A. Little, On generalizations of the nonwindowed scattering transform, Applied and Computational Harmonic Analysis 68 (2024) 101597.

[37] L. Grafakos, Modern fourier analysis, Vol. 250, Springer, 2009.

[38] L. Grafakos, Classical fourier analysis, Vol. 249, Springer, 2014.

[39] J. García-Cuerva, J. R. De Francia, Weighted norm inequalities and related topics, Elsevier, 1985.

[40] J. P. Ward, K. N. Chaudhury, M. Unser, Decay properties of riesz transforms and steerable wavelets, SIAM Journal on Imaging Sciences 6 (2) (2013) 984–998.

[41] G. H. Hardy, J. E. Littlewood, G. Pólya, Inequalities, Cambridge University Press, Cambridge, 1988.

[42] E. M. Stein, T. S. Murphy, Harmonic Analysis (PMS-43): Real-Variable Methods, Orthogonality, and Oscillatory Integrals, Princeton University Press, 1993.

[43] G.-R. Liu, Y.-C. Sheu, H.-T. Wu, Asymptotic analysis of higher-order scattering transform of gaussian processes, Electronic Journal of Probability 27 (2022) 1–27.

[44] G.-R. Liu, Y.-C. Sheu, H.-T. Wu, Central and noncentral limit theorems arising from the scattering transform and its neural activation generalization, SIAM Journal on Mathematical Analysis 55 (2) (2023) 1170–1213.

[45] E. Allys, F. Levrier, S. Zhang, C. Colling, B. Regaldo-Saint Blancard, F. Boulanger, P. Hennebelle, S. Mallat, The rwst, a comprehensive statistical description of the non-gaussian structures in the ism, Astronomy & Astrophysics 629 (2019) A115.

[46] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, P. Vandergheynst, Geometric deep learning: going beyond euclidean data, IEEE Signal Processing Magazine 34 (4) (2017) 18–42.

[47] J. D. McEwen, C. G. Wallis, A. N. Mavor-Parker, Scattering networks on the sphere for scalable and rotationally equivariant spherical cnns, arXiv preprint arXiv:2102.02828 (2021).

[48] Y. Xiong, W. Dai, W. Fei, S. Li, H. Xiong, Anisotropic spherical scattering networks via directional spin wavelet, IEEE Transactions on Signal Processing 71 (2023) 2981–2996. doi:10.1109/TSP.2023.3304410.

[49] J. Chew, H. Steach, S. Viswanath, H.-T. Wu, M. Hirn, D. Needell, M. D. Vesely, S. Krishnaswamy, M. Perlmutter, The manifold scattering transform for high-dimensional point cloud data, in: Topological, Algebraic and Geometric Learning Workshops 2022, PMLR, 2022, pp. 67–78.

[50] J. Bruna, S. Mallat, E. Bacry, J. F. R. Muzy, Intermittent process analysis with scattering moments, Annals of Statistics 43 (1) (2015) 323–351.