MINIMUM EMBEDDING DIMENSION FROM THE PERSPECTIVE OF PERSISTENT HOMOLOGY

By

Christopher Lloyd Sukhu

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computational Mathematics, Science and Engineering – Master of Science

2019

ABSTRACT

MINIMUM EMBEDDING DIMENSION FROM THE PERSPECTIVE OF PERSISTENT HOMOLOGY

By Christopher Lloyd Sukhu

We investigate the use of 1-dimensional persistence diagrams to determine minimum embedding dimension. In particular, we test the claim that persistence diagrams look qualitatively the same once the correct dimension is reached. In some cases, this appears not to be true, so we turn to a quantitative measure, the bottleneck distance, to see if the persistence diagrams are close once the minimum embedding dimension is attained. In some instances, we see that the persistence diagrams fail to converge experimentally under the bottleneck distance. The main issue appears to be that it is difficult to explicitly characterize the persistent homology of delay embeddings of arbitrary time series. Instead we restrict to periodic time series, where there exists such an explicit characterization. We apply Fourier analysis to see that the number of peaks in the frequency spectrum of a delay embedded time series is related to the minimum embedding dimension. Moreover, we give a method to filter out less significant peaks while not altering the persistent homology much, with respect to the bottleneck distance.

ACKNOWLEDGEMENTS

Thank you to Drs. Elizabeth Munch and Jose Perea for their respective roles in helping me complete this thesis. The images used in Figures 2.1 and 2.3 are credited to Dr. Elizabeth Munch. Thank you to Luis Polanco for helping me generate Figure 4.3.

Software used includes the Ripser and Hera libraries for computing persistence and bottleneck distances, respectively; the Julia libraries DifferentialEquations.jl, DynamicalSystems.jl, DataFrames.jl, RecurrenceAnalysis.jl, and Plots.jl; and the Python libraries scikit-tda and NumPy/SciPy.

TABLE OF CONTENTS

LIST OF FIGURES
CHAPTER 1  INTRODUCTION
CHAPTER 2  BACKGROUND
  2.1 Delay embeddings
    2.1.1 Takens' embedding theorem
    2.1.2 Minimum embedding dimension
    2.1.3 Average False Nearest Neighbors
  2.2 Persistent homology
    2.2.1 Simplicial Homology
    2.2.2 Persistent homology
CHAPTER 3  PERSISTENCE FOR MINIMUM EMBEDDING DIMENSION
  3.1 Introduction
  3.2 Method
  3.3 Results
  3.4 Discussion
CHAPTER 4  MINIMUM EMBEDDING DIMENSION OF PERIODIC SIGNALS
  4.1 Sliding windows and persistence
  4.2 Minimum embedding dimension and persistence
  4.3 Application and discussion
CHAPTER 5  CONCLUSIONS, DISCUSSION, AND FUTURE WORK
BIBLIOGRAPHY
LIST OF FIGURES

Figure 2.1: Delay embedding of a noisy cosine wave into R^3.
Figure 2.2: Lorenz attractor and reconstructed attractor.
Figure 2.3: A point cloud in R^2 at a fixed value in the Rips filtration, with its one-dimensional persistence diagram.
Figure 3.1: AFNN (E1, E2) plots for the Henon attractor.
Figure 3.2: Bottleneck distance plots for the Henon attractor.
Figure 3.3: AFNN (E1, E2) plots for the torus.
Figure 3.4: Bottleneck distance plots for the torus.
Figure 3.5: AFNN (E1, E2) plots for the sum of four iterated sine maps.
Figure 3.6: Bottleneck distance plots for the sum of four iterated sine maps.
Figure 3.7: AFNN (E1, E2) plots for the Santa Fe time series.
Figure 3.8: Bottleneck distance plots for the Santa Fe time series.
Figure 3.9: Lorenz attractor 1-dimensional persistence diagrams for embedding dimensions 3 and 4.
Figure 4.1: Sampled noisy signal and its power spectrum.
Figure 4.2: Truncated power spectrum and the corresponding cleaner signal.
Figure 4.3: Superimposed persistence diagrams showing the bottleneck matching.
CHAPTER 1

INTRODUCTION

Studying physical processes or phenomena through experimental time series is ubiquitous in science. To understand the underlying dynamical behavior from these time series, we turn to Takens' embedding theorem [17], which allows us to reconstruct the underlying dynamics using delay embeddings. However, delay embeddings require parameter choices, namely the delay and the embedding dimension. The choice of embedding dimension is what this thesis chiefly addresses.

Tracking topological invariants like homology for determining the embedding dimension has been done before [11]. We build upon this by considering multi-scale topological quantities, namely persistent homology. We take a different perspective in this thesis by considering the geometry of the embedded signal to be important as well. Periodic signals correspond to circular embeddings, so we exploit the connection to 1-dimensional persistence to choose the embedding dimension. This is quite different from, but still has some of the same flavor as, the traditional methods involving nearest neighbors [2, 8].

This thesis is divided into three main chapters. In Chapter 2, we develop some background pertaining to delay embeddings and persistent homology.

In Chapter 3, we investigate the use of 1-dimensional persistence diagrams to determine minimum embedding dimension. This was first considered in [6, 9], where it was claimed that persistence diagrams look qualitatively the same once the minimum embedding dimension is reached. In some cases, this appears not to be true, so we turn to a quantitative measure, the bottleneck distance, to see if the persistence diagrams are close once the minimum embedding dimension is attained. In some instances, we see that the persistence diagrams fail to converge experimentally under the bottleneck distance.

The main issue appears to be that it is difficult to explicitly characterize the persistent homology of delay embeddings of arbitrary time series. Therefore, in Chapter 4, we restrict to periodic time series, where there exists such an explicit characterization [14]. We apply Fourier analysis to see that the number of peaks in the frequency spectrum of a delay embedded time series is related to the minimum embedding dimension. Moreover, we give a method to filter out less significant peaks while not altering the persistent homology much, with respect to the bottleneck distance.

CHAPTER 2

BACKGROUND

2.1 Delay embeddings

We first state Takens' embedding theorem [7, 17] and explain its interpretation and consequences in the context of time series analysis and signal processing. We will often reformulate the notation and terminology involved in delay embeddings to maintain consistency with the primary sources while making certain things clearer and more convenient in context. The only disadvantage is a small amount of redundancy, which stems from the abundance of domain-specific applications motivated by Takens' embedding theorem. Then we discuss the minimum embedding dimension problem.
2.1.1 Takens' embedding theorem

First, we explain some of the notation and terminology. The functions y are called measurement functions. The space Diff^2(M) is the subspace of diffeomorphisms in C^2(M, M), also called the diffeomorphism group. And "generic" means open and dense with respect to the C^1 topology. We use the term "delay embedding" to refer to the process of taking repeated measurements implied by Takens' embedding theorem. See Figure 2.1.

Theorem 2.1.1 (Takens) Let M be a compact manifold of dimension m. For pairs (φ, y), with φ ∈ Diff^2(M) and y ∈ C^2(M, R), it is a generic property that the map Φ_{(φ,y)} : M → R^{2m+1}, defined by

  Φ_{(φ,y)}(x) = (y(x), y(φ(x)), ..., y(φ^{2m}(x))),

is an embedding.

We often make the assumption that real-world data lies on a lower dimensional manifold [3]. This assumption, combined with Takens' embedding theorem, means that we can take a sequence of measurements, often indexed by time, to reconstruct a dynamical system, i.e. any deterministic process that evolves with time. All we need is a sufficiently nice measurement function of some observable of the dynamical system and enough measurements. The number of measurements required is twice the dimension of the dynamical system's true state space plus one [7]. We note that this dimension is usually not known in practice, but we take consolation that it is at least finite. To sum up, we can study any deterministic process through a finite time series of measurements of a single observable. This is truly remarkable, so we take care to state some caveats and hidden assumptions.

Figure 2.1: Delay embedding of a noisy cosine wave into R^3 which captures the underlying periodic structure.

Takens' embedding theorem assumes no experimental noise and the ability to make measurements up to arbitrary precision. Moreover, while the number of measurements needed is finite, in practice it might still be very large for dynamical systems with high-dimensional state spaces, especially if there is noise involved. Another hidden assumption is that we need evenly spaced measurements, which is often not possible in some applications. We also note that the time interval, or delay, between measurements is not specified by the theorem. Despite these limitations, delay embedding has been widely employed as a tool in time series analysis and signal processing. See [1] for a survey of delay embeddings as well as specific applications to EEG analysis.

For a real-world application, there are a number of parameter choices that go into delay embeddings. One is the observable to measure and another is the measurement function to use. These are typically chosen with regard to whatever is available and convenient. The main two parameters that must be specified are the number of measurements and the delay. We refer to the number of measurements as the "embedding dimension" and endeavor to find the smallest embedding dimension; this is the minimum embedding dimension problem, which is the central problem addressed by this thesis.

Before we proceed, we mention that there are a number of reformulations and extensions of Takens' embedding theorem. See [16] for the following (paraphrased) version involving fractal sets: a dynamical system with an underlying attractor A of dimension d_A needs an embedding dimension greater than twice d_A to be reconstructed faithfully. See Figure 2.2 for the Lorenz attractor with parameters σ = 16, ρ = 40, and β = 4, for which d_A = 2.06 ± 0.01 [10]. We can see that the minimum embedding dimension is 3. According to the embedding theorem, an embedding dimension of 5 is sufficient for a good reconstruction, larger than the minimum embedding dimension.

Figure 2.2: Lorenz attractor (left) and reconstructed attractor (right).
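To make the construction concrete, a delay embedding is just a matrix of lagged copies of the series. The following is a minimal NumPy sketch; the function name and parameter values are illustrative choices, not taken from the thesis experiments.

```python
import numpy as np

def delay_embedding(x, dim, tau):
    """Delay-embed a 1-D time series x into R^dim with integer delay tau.

    Returns an array of shape (len(x) - (dim - 1) * tau, dim) whose i-th row
    is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.column_stack([x[j * tau : j * tau + n] for j in range(dim)])

# Example: a noisy cosine, as in Figure 2.1.
t = np.linspace(0, 8 * np.pi, 500)
x = np.cos(t) + 0.1 * np.random.randn(t.size)
Y = delay_embedding(x, dim=3, tau=10)  # point cloud in R^3
```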
2.1.2 Minimum embedding dimension

The minimum dimension required for an embedding is often smaller than the bound given in Takens' embedding theorem. Being able to determine the minimum embedding dimension for a particular dynamical system is interesting in its own right. However, the main practical benefit is that having a smaller dimension facilitates further computations on a reconstructed dynamical system.

Before we proceed, we acknowledge that the terms "minimum embedding dimension" and "embedding dimension" are potentially ambiguous or misleading. When performing a delay embedding with a particular "embedding dimension", the result may fail to be an embedding. A better term, perhaps, would be "reconstruction dimension" or "attempted embedding dimension"; however, we stick to the terms commonly used in the literature with the hope that the reader will keep this warning in mind [2, 7, 14].

A more serious issue is what constitutes a "minimum embedding dimension." Takens' embedding theorem and its related variants [16] give theoretically sufficient (minimum) embedding dimensions. Since we seek even smaller dimensions for practical benefits, which is often possible for specific dynamical systems, we are content to call whichever dimension we settle on according to our analysis the "minimum embedding dimension", regardless of whether we can theoretically justify this or not. And when designing a method which chooses or optimizes the embedding dimension according to a particular heuristic or quantity of interest, we call the result the "minimum embedding dimension".

2.1.3 Average False Nearest Neighbors

There exist many methods in the literature attempting to determine the minimum embedding dimension. One survey of these methods can be found in [1]. We now discuss the popular method of Average False Nearest Neighbors (AFNN) [2], which we will use for comparison purposes later. AFNN builds upon the method of False Nearest Neighbors (FNN) [8], which rests on the idea that points close together in a dimension too low to be an embedding will move further apart as the dimension increases.

Let {x_i : i = 1, ..., N} be a collection of sequential measurements, i.e. a time series. The reconstructed time-delay vectors are

  y_i(d) = (x_i, x_{i+τ}, ..., x_{i+(d−1)τ}),  i = 1, ..., N − (d − 1)τ,

for a fixed delay τ, determined in advance. Next we define a quantity that relates distances in one embedding dimension to the next,

  a(i, d) = ‖y_i(d + 1) − y_{n(i,d)}(d + 1)‖_∞ / ‖y_i(d) − y_{n(i,d)}(d)‖_∞,

where n(i, d) is the index of the nearest neighbor of y_i(d) in dimension d (if the denominator were to be zero, we choose the next nearest neighbor). AFNN builds on FNN by considering the arithmetic mean

  E(d) = (1 / (N − dτ)) Σ_{i=1}^{N−dτ} a(i, d)

and defining

  E1(d) = E(d + 1) / E(d)

to track changes as the embedding dimension increases. If E1(d) stops changing for dimensions d = 1, 2, ... after some d_k, we say d_k + 1 is the minimum embedding dimension. In practice, it may be difficult to determine whether E1(d) has stopped changing or is merely increasing slowly. Indeed, when the time series is random, E1(d) might fail to converge in any meaningful sense. So we define

  E*(d) = (1 / (N − dτ)) Σ_{i=1}^{N−dτ} |x_{i+dτ} − x_{n(i,d)+dτ}|

and

  E2(d) = E*(d + 1) / E*(d),

where we expect that for random data E2(d) will always equal 1, since the value of the data is independent across time. For a deterministic time series, this will not be the case, so we can use E2(d) as a heuristic for distinguishing time series arising from random versus deterministic processes. The AFNN method consists of computing both E1(d) and E2(d) across a range of dimensions d and analyzing the results qualitatively.
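A sketch of these quantities follows, reusing the delay_embedding helper from the sketch in Section 2.1.1. It finds nearest neighbors in the sup norm with SciPy's cKDTree; unlike the published method, which substitutes the next nearest neighbor when the denominator vanishes, this sketch simply discards those points.

```python
import numpy as np
from scipy.spatial import cKDTree

def afnn(x, d_max, tau):
    """Cao's E1(d) and E2(d) for d = 1, ..., d_max - 1."""
    x = np.asarray(x, dtype=float)
    E, E_star = [], []
    for d in range(1, d_max + 1):
        Y = delay_embedding(x, d, tau)
        Y1 = delay_embedding(x, d + 1, tau)
        n = len(Y1)                        # indices usable in both dimensions
        dist, nn = cKDTree(Y[:n]).query(Y[:n], k=2, p=np.inf)
        dist, nn = dist[:, 1], nn[:, 1]    # nearest neighbor other than the point itself
        i = np.nonzero(dist > 0)[0]        # skip zero-distance neighbors (see note above)
        a = np.linalg.norm(Y1[i] - Y1[nn[i]], ord=np.inf, axis=1) / dist[i]
        E.append(a.mean())
        E_star.append(np.abs(x[i + d * tau] - x[nn[i] + d * tau]).mean())
    E, E_star = np.array(E), np.array(E_star)
    return E[1:] / E[:-1], E_star[1:] / E_star[:-1]   # E1(d), E2(d)
```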
2.2 Persistent homology

We develop the basics of simplicial homology and persistent homology needed for the remainder of this thesis, and therefore not in full generality or detail. The selection of topics closely matches the background found in [14]. For more on simplicial homology and persistent homology, see [12] and [5, 13], respectively. We fix a prime p and the finite field F_p with p elements henceforth.

2.2.1 Simplicial Homology

We begin with the notion of simplices, which can be thought of as the appropriate generalization of triangles or tetrahedrons to arbitrary dimensions. Let {v_0, ..., v_k} be a collection of k + 1 points in R^k which are affinely independent, i.e. v_1 − v_0, ..., v_k − v_0 are linearly independent.

Definition 2.2.1 (Geometric k-simplex) A k-simplex σ spanned by {v_0, ..., v_k} is the set of points x ∈ R^k such that

  x = Σ_{i=0}^{k} t_i v_i,  where Σ_{i=0}^{k} t_i = 1 and t_i ≥ 0 for all i.

That is, σ is the convex hull of {v_0, ..., v_k}.

We often use the notation σ[v_0, ..., v_k] to denote a simplex spanned by those points. Examples of k-simplices include 0-simplices, which are points; 1-simplices, which are line segments; 2-simplices, which are triangles; and 3-simplices, which are tetrahedrons. The points {v_0, ..., v_k} are called the vertices of the k-simplex σ. The dimension of a k-simplex σ is simply k. And a simplex spanned by a subset of {v_0, ..., v_k} is called a face of σ.

Definition 2.2.2 (Geometric simplicial complex) K is called a simplicial complex if it is a set of simplices such that

(i) every face of a simplex in K is also in K, and
(ii) the intersection of any two simplices in K is either empty or a face of both.

A subcomplex is then a subset of the simplices of a simplicial complex K that contains the faces of all its elements.

Typically, we are not interested in the particular embedding of a simplicial complex into Euclidean space and just want a purely combinatorial structure for computations. Hence, we consider an abstract simplicial complex that has analogous properties to the definitions above.

Definition 2.2.3 (Abstract simplicial complex) An abstract simplicial complex K is a finite collection of sets Σ such that σ ∈ Σ and ν ⊆ σ implies ν ∈ Σ.

The terminology is analogous to before: the sets σ in Σ are simplices; the dimension of a simplex σ is the cardinality of σ minus one; and a non-empty subset ν of σ is called a face of σ. A subcomplex is then a subset of K which is also an abstract simplicial complex.

We are content to use the same notation and terminology because of the following. Starting with a geometric simplicial complex K′, we can obtain an abstract simplicial complex K by only keeping the vertices of K′. We then call K′ a geometric realization of K. See [5] for details in the proof of the following theorem.

Theorem 2.2.4 (Geometric realization theorem) Every abstract simplicial complex of dimension n has a geometric realization in R^{2n+1}.

Hence, we will just say simplicial complex to refer to an abstract simplicial complex.
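The closure condition in Definition 2.2.3 is easy to check mechanically. A small illustrative sketch (not from the thesis) representing simplices as tuples of vertex labels:

```python
from itertools import combinations

def is_simplicial_complex(simplices):
    """Check the closure property: every non-empty face of a simplex is present."""
    S = {frozenset(s) for s in simplices}
    return all(frozenset(f) in S
               for s in S
               for k in range(1, len(s))
               for f in combinations(s, k))

# The boundary of a triangle (a combinatorial circle):
K = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
assert is_simplicial_complex(K)
```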
When considering sums of simplices in an abstract simplicial complex, we do so only formally, so that the operations make sense algebraically, without regard to the underlying geometric meaning.

Definition 2.2.5 (k-chains) We say c is a k-chain if it is a finite formal sum

  c = Σ_j γ_j σ_j,

with γ_j ∈ F_p and each σ_j a k-simplex in K.

We let C_k(K) be the vector space over the field F_p generated by the k-dimensional simplices of K, i.e. C_k(K) is the vector space consisting of k-chains.

Definition 2.2.6 (Boundary of a k-simplex) The boundary ∂ of a k-simplex σ is the alternating formal sum of its (k−1)-dimensional faces:

  ∂(σ) = Σ_{i=0}^{k} (−1)^i σ[v_0, ..., v̂_i, ..., v_k],

where the hat symbol over a vertex means that vertex is not present in the spanning set.

Definition 2.2.7 (Boundary of a k-chain) The boundary ∂ of a k-chain c is defined by linearly extending ∂:

  ∂(c) = Σ_j γ_j ∂(σ_j).

Remark 2.2.8 One can verify that ∂ ∘ ∂ = ∂² = 0.

Definition 2.2.9 (k-cycles) A k-chain c with ∂(c) = 0 is called a k-cycle. We denote the space of all k-cycles by Z_k, which is a subspace of C_k.

Definition 2.2.10 (k-boundaries) If a k-chain c is the boundary of a (k+1)-chain, then it is called a k-boundary. We denote the space of all k-boundaries by B_k, which is a subspace of C_k.

By Remark 2.2.8 we have that B_k ⊂ Z_k, so we define the following.

Definition 2.2.11 (Simplicial homology) The k-th simplicial homology group of K with F_p-coefficients is defined as the quotient H_k(K) = Z_k / B_k.

The ranks of the simplicial homology groups, known as Betti numbers, are of particular interest to us.

Definition 2.2.12 (Betti numbers) The rank of H_k(K) is called the k-th mod p Betti number of K, which we denote β_k(K). The prime p is usually suppressed from the notation because it will be clear from context.
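Over F_2 the boundary operator is a 0/1 matrix and β_k = (n_k − rank ∂_k) − rank ∂_{k+1}, where n_k is the number of k-simplices. The following toy sketch computes mod-2 Betti numbers this way, purely for intuition; the actual computations in this thesis use Ripser.

```python
import numpy as np
from itertools import combinations

def rank_mod2(A):
    """Rank of a 0/1 matrix over F2 by Gaussian elimination."""
    A = A.copy() % 2
    rank = 0
    for c in range(A.shape[1]):
        pivot = next((r for r in range(rank, A.shape[0]) if A[r, c]), None)
        if pivot is None:
            continue
        A[[rank, pivot]] = A[[pivot, rank]]      # swap pivot row into place
        for r in range(A.shape[0]):
            if r != rank and A[r, c]:
                A[r] ^= A[rank]                  # row reduction over F2
        rank += 1
    return rank

def betti_numbers(simplices, top_dim):
    """Mod-2 Betti numbers of an abstract simplicial complex (sorted vertex tuples)."""
    by_dim = {k: sorted(s for s in simplices if len(s) == k + 1)
              for k in range(top_dim + 2)}
    def boundary(k):                             # matrix of the map C_k -> C_{k-1}
        rows, cols = by_dim[k - 1], by_dim[k]
        A = np.zeros((len(rows), len(cols)), dtype=int)
        idx = {s: i for i, s in enumerate(rows)}
        for j, s in enumerate(cols):
            for f in combinations(s, k):         # the (k-1)-dimensional faces of s
                A[idx[f], j] = 1
        return A
    betti = []
    for k in range(top_dim + 1):
        rk = rank_mod2(boundary(k)) if k > 0 else 0
        rk1 = rank_mod2(boundary(k + 1)) if by_dim[k + 1] else 0
        betti.append(len(by_dim[k]) - rk - rk1)
    return betti

# Boundary of a triangle: one connected component, one 1-cycle.
K = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(betti_numbers(K, top_dim=1))  # [1, 1]
```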
2.2.2 Persistent homology

We now turn to persistent homology, which we develop on top of simplicial homology.

Definition 2.2.13 (Filtration) A filtration of a simplicial complex K is a nested sequence of subcomplexes

  ∅ = K_0 ⊆ K_1 ⊆ ··· ⊆ K_m = K.

Inclusions K_i ⊆ K_j for i ≤ j in the filtration induce homomorphisms f_k^{i,j} : H_k(K_i) → H_k(K_j) for each dimension k. We obtain the following sequence of homomorphisms of homology groups,

  0 = H_k(K_0) → H_k(K_1) → ··· → H_k(K_m) = H_k(K),

where the arrows are the maps f_k^{i,j} for 0 ≤ i ≤ j ≤ m. Persistent homology refers to the images of these induced homomorphisms.

Definition 2.2.14 (Persistent homology) The k-th persistent homology groups of a simplicial complex K are the images of the homomorphisms induced by a filtration of K, H_k^{i,j} = im f_k^{i,j}. The k-th persistent Betti numbers are the ranks of these groups, β_k^{i,j} = rank(H_k^{i,j}).

We say a homology class α is born at K_b if it is not in the image of f_k^{b−1,b} : H_k(K_{b−1}) → H_k(K_b). If α is born at K_b, then we say it dies entering K_d if the image of f_k^{b−1,d−1} : H_k(K_{b−1}) → H_k(K_{d−1}) does not contain the image of α but the image of f_k^{b−1,d} : H_k(K_{b−1}) → H_k(K_d) does.

Definition 2.2.15 (Persistence diagrams) A persistence diagram of homological dimension k, or k-dimensional persistence diagram, denoted dgm(k), is a multiset of points (b, d), one for every k-dimensional homology class that is born at K_b and dies entering K_d. We also adjoin the points on the diagonal ∆ = {(x, x) : x ≥ 0} to dgm(k), each with countably infinite multiplicity. If k is clear from context, we just write dgm. We refer to d − b as the lifetime or persistence of α.

Definition 2.2.16 (Maximum persistence) Let (b, d) ∈ dgm. Define pers(b, d) = d − b if (b, d) ∈ R² and pers(b, d) = ∞ otherwise. Maximum persistence, denoted mp(dgm), is defined to be

  mp(dgm) = max_{(b,d)∈dgm} pers(b, d).

When comparing two diagrams, we use the notation dgm_1 and dgm_2 for convenience, as in the following definition.

Definition 2.2.17 (Bottleneck distance) The bottleneck distance, denoted d_B, is a metric on persistence diagrams defined as

  d_B(dgm_1, dgm_2) = inf_φ sup_{x∈dgm_1} ‖x − φ(x)‖_∞,

where the infimum is taken over bijections φ : dgm_1 → dgm_2. Such bijections always exist, despite potentially unequal numbers of off-diagonal points, because we can match points with the diagonal.

We now turn to a particular kind of filtration/complex, called the Vietoris–Rips filtration/complex, or simply the Rips filtration/complex, which we will use later. First we let X ⊂ R^n be a compact set, such as a finite point cloud.

Definition 2.2.18 (Rips complex) Fix r ≥ 0. The Rips complex R_r(X) is the simplicial complex whose vertices are the points of X and whose k-simplices consist of the (k+1)-tuples of points {x_0, ..., x_k} of X with pairwise distances ‖x_i − x_j‖ ≤ r for all 0 ≤ i < j ≤ k.

Note that a higher dimensional simplex is added to the Rips complex if and only if all of its edges (1-simplices) are. Also, we can adapt this definition to any metric space.

Definition 2.2.19 (Rips filtration) Let 0 = r_0 ≤ r_1 ≤ ··· ≤ r_m. If r ≤ s, then R_r(X) ⊆ R_s(X), so we obtain the following filtration, called the Rips filtration, from the Rips complexes:

  X = R_0 ⊆ R_1 ⊆ ··· ⊆ R_m,

where R_j = R_{r_j}(X) and R_m is the largest simplicial complex with X as its vertex set. In our particular case, it is sufficient to consider only a finite set of values r_j to capture all homological changes.

Figure 2.3: Pictured (left) is a point cloud in R² at a particular distance r_j just over the value of 0.1 in the Rips filtration, with purple shaded disks of radius r showing which edges will be included in the Rips complex. Pictured (right) is the one-dimensional persistence diagram showing the 1-cycle that is born at this particular point in the filtration; it dies at a filtration value around 0.5, when the purple disks become large enough to fill in the center of the point cloud and thereby connect the corresponding vertices, killing the 1-cycle.

The persistence diagram of a point cloud X will be denoted dgm(X), where we consider the persistent homology induced by the Rips filtration on X. See Figure 2.3 for an illustration.

Since we intend to apply persistence to point clouds from experimental data, we accept that there will be measurement error and therefore rely heavily on the following stability theorem. See [4, 13] for more details. First, we define the Hausdorff distance for point clouds.

Definition 2.2.20 (Hausdorff distance) The Hausdorff distance d_H between two point clouds X, Y ⊆ R^n is

  d_H(X, Y) = max{ sup_{x∈X} inf_{y∈Y} ‖x − y‖, sup_{y∈Y} inf_{x∈X} ‖x − y‖ }.

Theorem 2.2.21 (Stability of persistence diagrams) d_B(dgm(X), dgm(Y)) ≤ 2 d_H(X, Y).

Let X be the true point cloud of some experimental process and Y be the measured point cloud, with measurement error bounded so that d_H(X, Y) ≤ ε. Then Theorem 2.2.21 says that the bottleneck distance is bounded above by 2ε, so the error in persistence is not much worse than the measurement error.
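In practice, Rips persistence is computed by software rather than by hand. A short sketch using the scikit-tda packages named in the acknowledgements (ripser for Rips persistence, persim for the bottleneck distance); the noisy-circle point cloud is an illustrative choice, not data from the thesis.

```python
import numpy as np
from ripser import ripser          # Rips persistence (scikit-tda)
from persim import bottleneck      # bottleneck distance between diagrams

# A noisy circle: one prominent 1-dimensional feature is expected.
theta = 2 * np.pi * np.random.rand(200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y = X + 0.05 * np.random.randn(*X.shape)       # a perturbed copy

dgm_X = ripser(X, maxdim=1)['dgms'][1]         # 1-dimensional diagram of X
dgm_Y = ripser(Y, maxdim=1)['dgms'][1]
print(bottleneck(dgm_X, dgm_Y))                # small, as Theorem 2.2.21 predicts
```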
CHAPTER 3

PERSISTENCE FOR MINIMUM EMBEDDING DIMENSION

3.1 Introduction

We explore the idea of tracking the persistent homology of delay embedded systems across a range of embedding dimensions to see whether this has any utility for the minimum embedding dimension problem. Tracking topological features of delay embeddings has been done before for homology [11]. More recently, this has been tried using persistent homology [6, 9] for the minimum embedding dimension problem. The claim is that, intuitively, one might expect that as the embedding dimension increases, the persistent homology of a reconstructed system might stabilize past the minimum embedding dimension, and therefore the persistence diagrams should converge in some sense, or at the very least look qualitatively similar. To push this idea further, we compute persistence diagrams for some actual examples and test for convergence experimentally using the bottleneck metric.

3.2 Method

We compute delay embeddings for a range of increasing dimensions as in [2], where AFNN is used to determine the minimum embedding dimension. The zero and one dimensional persistence diagrams are computed with F_2 coefficients, and we plot the bottleneck distances between adjacent dimensions to check for any stabilizing behavior. We compare against AFNN as the ground truth. For the E1 and E2 plots, if the values stop changing at index d, we consider d to be the minimum embedding dimension. For the bottleneck distance plots, the x-axis is the higher of the two adjacent dimensions, and the value on the y-axis represents the change in persistence going up a dimension. If the values are small after index d, we consider this to be an indication that d is the minimum embedding dimension.
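The following sketch captures this pipeline, reusing the delay_embedding helper and the ripser/persim calls from the sketches in Chapter 2; the function name and parameters are ours.

```python
import numpy as np
from ripser import ripser
from persim import bottleneck

def bottleneck_profile(x, d_max, tau, hom_dim=1, coeff=2):
    """Bottleneck distances between diagrams of adjacent embedding dimensions.

    Returns [d_B(dgm_d, dgm_{d+1}) for d = 1, ..., d_max - 1]; consistently
    small values after some d are read as evidence that d is the minimum
    embedding dimension.
    """
    dgms = []
    for d in range(1, d_max + 1):
        Y = delay_embedding(x, d, tau)
        dgms.append(ripser(Y, maxdim=hom_dim, coeff=coeff)['dgms'][hom_dim])
    return [bottleneck(dgms[i], dgms[i + 1]) for i in range(len(dgms) - 1)]
```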
3.3 Results

We consider four data sets taken from [2]: the first coordinate of a Henon attractor, a torus (i.e. a sum of two non-commensurate sines), a sum of four iterated sine maps, and a time series from the Santa Fe competition [18]. We reproduce the AFNN E1 and E2 plots for each example, so there may be small differences from the original plots in [2]. See Figures 3.1–3.8 for the relevant figures.

For the Henon attractor, E1 sharply stops increasing after dimension 2, which is considered to be the minimum embedding dimension. E2 is not always approximately 1, so we consider the embedded time series to be deterministic. The bottleneck distances are all fairly small, which is possibly consistent with the minimum embedding dimension being 2.

For the torus, the E1 values quickly stop increasing after dimension 3, which is exactly what we would expect for a standard torus. E2 is not always approximately 1, so we consider the embedded time series to be deterministic. We see that the bottleneck distances for the one dimensional diagrams have the largest jump when going to embedding dimension 3 and are fairly small afterwards, suggesting a minimum embedding dimension of 3.

The minimum embedding dimension appears to be 4 for the sum of four iterated sine maps according to E1, and E2 suggests it is deterministic as well. NaN values fail to appear in the E1 plot, which is why it may look different from the corresponding figure in [2].

NaN values also fail to appear in the E1 plot for the Santa Fe time series. For this experimental time series, the results are less clear. The minimum embedding dimension is suggested to be 7 in [2]. The bottleneck distances are possibly starting to show interesting behavior around dimension 7, but this is far from clear.

3.4 Discussion

Unfortunately, we do not see any consistent convergent behavior in the bottleneck distances. So it is not clear that tracking persistence diagrams across embedding dimensions will be useful beyond the Lorenz attractor examples in [6, 9].

There are two potential issues here. One is that since the average distance between points in R^m increases as the embedding dimension m increases, we actually expect the bottleneck distances to diverge as the dimension goes to infinity. The other is that not all systems will have a few prominent off-diagonal points in their zero or one dimensional persistence diagrams that are reflected well in the bottleneck distance, so the diagrams might not even look qualitatively similar.

For the first issue, we can try re-scaling the persistence diagrams by multiplying the birth and death coordinates by 1/√m. We try this with the Lorenz attractor from Figure 2.2; see Figure 3.9. We expect two 1-dimensional features from the reconstructed Lorenz attractor, which corresponds to what we computed. The persistence diagrams look qualitatively similar from embedding dimension 3 to 4, indicating a minimum embedding dimension of 3, which is what we expect. Normalizing shows clearer convergence, as the persistence points are artificially getting larger due to the increased average distance between points in higher dimensions. However, there is seemingly no evidence that this generalizes in a way that might be useful for determining the minimum embedding dimension of other systems.

The second issue is the main problem. It is not clear what we can say theoretically about the persistent homology of delay embeddings of arbitrary time series or signals. This motivates the next part of this thesis, where we restrict to the periodic case and focus on maximizing the connection to persistence.

Figure 3.1: E1 sharply stops increasing after dimension 2, which is considered to be the minimum embedding dimension. E2 is not always approximately 1, so we consider the embedded time series to be deterministic.

Figure 3.2: All of these bottleneck distance values are fairly small, which is possibly consistent with the minimum embedding dimension being 2.

Figure 3.3: E1 values quickly stop increasing after dimension 3, which is exactly what we would expect for a standard torus. E2 is not always approximately 1, so we consider the embedded time series to be deterministic.

Figure 3.4: The bottleneck distances for the one dimensional diagrams have the largest jump when going to embedding dimension 3 and are fairly small afterwards, suggesting a minimum embedding dimension of 3.

Figure 3.5: NaN values fail to appear in the E1 plot. The minimum embedding dimension appears to be 4 for this deterministic time series.

Figure 3.6: We do not observe any indication that the bottleneck distances are stabilizing after embedding dimension 4.

Figure 3.7: NaN values fail to appear in the E1 plot. For this experimental time series, the results are less clear. The minimum embedding dimension is suggested to be 7 in [2].

Figure 3.8: The bottleneck distances are possibly starting to show interesting behavior around dimension 7, but this is far from clear.

Figure 3.9: Lorenz attractor 1-dimensional persistence diagrams for embedding dimensions 3 and 4. Note that the diagonal is not pictured. The main focus should be on the two high persistence points, corresponding to the figure-eight shape of the Lorenz attractor.
CHAPTER 4

MINIMUM EMBEDDING DIMENSION OF PERIODIC SIGNALS

4.1 Sliding windows and persistence

We now summarize some of the main points of [14], which forms the basis of the next section. In this chapter, we only consider 1-dimensional persistence. The maximum persistence of the 1-dimensional persistence diagram is used as a measure of roundness, corresponding to the shape of a delay embedding of a periodic signal. The following notation and terminology for delay embeddings will be used for the remainder of the chapter.

Definition 4.1.1 (Sliding window embedding) Suppose that f is a function defined on an interval of R. Choose an embedding dimension M ∈ Z_{≥0} and a delay τ ∈ R_{>0}. The sliding window embedding of f based at t ∈ R into R^{M+1} is the point

  SW_{M,τ} f(t) = (f(t), f(t + τ), ..., f(t + Mτ))^T.

For a range of values of t we get the sliding window point cloud of f. The quantity Mτ is called the window size.

Let C(X, Y) denote the set of continuous functions from X to Y equipped with the sup norm, and let T = R/2πZ. The sliding window embedding induces a mapping SW_{M,τ} : C(T, R) → C(T, R^{M+1}).

Proposition 4.1.2 SW_{M,τ} : C(T, R) → C(T, R^{M+1}) is a bounded linear operator with norm ‖SW_{M,τ}‖ ≤ √(M + 1).

Since we are considering periodic functions f, it makes sense to approximate f using Fourier series. See [15] for an accessible introduction to Fourier series. Write f(t) = S_N f(t) + R_N f(t), where the first term is the N-truncated Fourier series

  S_N f(t) = Σ_{n=−N}^{N} f̂(n) e^{int}

and R_N f is the remainder. The n-th Fourier coefficient is

  f̂(n) = (1/2π) ∫_0^{2π} f(t) e^{−int} dt.

The following theorem, reprinted from [14] with a minor revision, tells us how SW_{M,τ} behaves with respect to Fourier series approximations of functions f ∈ L²(T).

Theorem 4.1.3 (Approximation) Let T ⊂ T, f ∈ C^k(T, R), X = SW_{M,τ} f(T), and Y = SW_{M,τ} S_N f(T). Then,

(i) d_H(X, Y) ≤ √2 · ‖R_N f^{(k)}‖_2 · √((N + 1)^{1−2k} / (2k − 1)) · √(M + 1),

(ii) |mp(dgm(X)) − mp(dgm(Y))| ≤ 2 d_B(dgm(X), dgm(Y)),

(iii) d_B(dgm(X), dgm(Y)) ≤ 2√2 · ‖R_N f^{(k)}‖_2 · √((N + 1)^{1−2k} / (2k − 1)) · √(M + 1).

As we take more terms in the Fourier series approximation, the remainder goes to zero, and the persistent homology of the approximation approaches that of the true sliding window point cloud.

Another result is that the sliding window point cloud has maximum persistence when the window size Mτ is proportional to the underlying frequency 2π/L, with proportionality constant M/(M + 1). Here L is the number of periods of f in [0, 2π), specifically f(t + 2π/L) = f(t). A lower bound on maximum persistence is also derived and is shown to depend on the field of coefficients used to compute persistent homology.

The approach to delay embedding in [14] is markedly different from what is normally done in the literature. Delay embeddings of periodic signals f have a clear geometric interpretation with an explicit connection to 1-dimensional persistent homology: SW_{M,τ} f lives on an M-dimensional torus embedded in R^{M+1}, so we expect at least one prominent off-diagonal point in the 1-dimensional persistence diagram, and we choose parameters to maximize persistence.

4.2 Minimum embedding dimension and persistence

We now describe a method for choosing the minimum embedding dimension for periodic signals in a way that emphasizes maximum persistence. Let f ∈ L²(T) and f ∈ C^k(T). Then

  f(t) = Σ_{n=−∞}^{∞} f̂(n) e^{int},

since the Fourier series converges by the Riemann–Lebesgue lemma [15]. Now consider the power spectrum

  S_{ff}(n) = |f̂(n)|².
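On sampled data, the power spectrum is estimated with the FFT. A short sketch (ours, not the thesis code); note that np.fft.fft sums rather than averages, so we divide by the sample count to approximate the integral definition of f̂(n). The thresholding anticipates the truncation by ε described next.

```python
import numpy as np

def power_spectrum(samples):
    """Estimate S_ff(n) = |f_hat(n)|^2 from evenly spaced samples on [0, 2*pi)."""
    fhat = np.fft.fft(samples) / len(samples)
    return np.abs(fhat) ** 2

def kept_frequencies(samples, eps):
    """The signed integer frequencies n with |f_hat(n)|^2 > eps."""
    S = power_spectrum(samples)
    freqs = np.fft.fftfreq(len(samples), d=1 / len(samples))  # 0, 1, ..., -2, -1
    return freqs[S > eps].astype(int)
```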
For a periodic signal, we can expect a few significant peaks in the power spectrum. Say there are d peaks; then the minimum embedding dimension should be M = 2d to lose no information [14].

What if we wish to discard some of the peaks in the power spectrum, say because we regard smaller values as noise? Fix ε > 0. We want to construct a new function g that only has peaks in the power spectrum above ε:

  g(t) = Σ_{n=−∞}^{∞} g_n e^{int},  with g_n = f̂(n) if |f̂(n)|² > ε and g_n = 0 otherwise.

The decay of the Fourier coefficients is related to the smoothness of the function; see [15] for more details. In particular, we have

  |f̂(n)| ≤ sup_t |f^{(k)}(t)| / |n|^k.

So if we choose

  N = [ (sup_t |f^{(k)}(t)|)^{1/k} / ε^{1/2k} ] + 1,

where [·] denotes the integer part, then g is supported on the interval [−N, N]. Define the set J ⊂ [−N, N] ∩ Z by j ∈ J if and only if |f̂(j)|² > ε. Then we can rewrite g as

  g(t) = Σ_{j∈J} f̂(j) e^{ijt}.

We can think of g as a truncation of f. Fixing ε determines how large N should be and which coefficients should be included. And we also get the embedding dimension M = 2|J|.

We now want to evaluate how much of the energy, i.e. the L² norm, of the signal is maintained when constructing g. The following two identities will be useful to this end.

Proposition 4.2.1 If f ∈ L²(T), then ‖f‖²_2 = ‖S_N f‖²_2 + ‖R_N f‖²_2.

Proposition 4.2.2 (Parseval's identity) If f ∈ L²(T), then ‖f‖²_2 = Σ_{n∈Z} |f̂(n)|².

Using Propositions 4.2.1 and 4.2.2, we can compute all of the terms in

  ‖g‖_2 / ‖f‖_2 = ‖g‖_2 / (‖S_N f‖²_2 + ‖R_N f‖²_2)^{1/2}

except ‖R_N f‖²_2, which we must estimate.

Proposition 4.2.3 (Remainder estimate) The L² norm of the remainder is bounded as follows:

  ‖R_N f‖²_2 ≤ 2 (sup_t |f^{(k)}(t)|)² (N + 1)^{1−2k} / (2k − 1).

Proof.

  ‖R_N f‖²_2 = Σ_{|n|>N} |f̂(n)|²
    ≤ Σ_{|n|>N} (sup_t |f^{(k)}(t)|)² / |n|^{2k}
    = 2 (sup_t |f^{(k)}(t)|)² Σ_{n=N+1}^{∞} 1/n^{2k}
    ≤ 2 (sup_t |f^{(k)}(t)|)² ∫_{N+1}^{∞} x^{−2k} dx
    = 2 (sup_t |f^{(k)}(t)|)² (N + 1)^{1−2k} / (2k − 1). □

Let B = 2 (sup_t |f^{(k)}(t)|)² (N + 1)^{1−2k} / (2k − 1). Since

  ‖g‖_2 / ‖f‖_2 = ‖g‖_2 / (‖S_N f‖²_2 + ‖R_N f‖²_2)^{1/2} ≥ ‖g‖_2 / (‖S_N f‖²_2 + B)^{1/2},

we can now estimate that the percentage of the energy maintained is at least ‖g‖_2 / (‖S_N f‖²_2 + B)^{1/2} × 100%. If we wish to maintain a certain percentage of the energy of the signal, we can tune ε accordingly.

Since the goal was to maximize persistence, we want to bound the bottleneck distance between the persistence diagrams associated to f and g.

Theorem 4.2.4 Let T ⊂ T, f ∈ C^k(T, R), X = SW_{M,τ} f(T), and Y = SW_{M,τ} g(T). Then

  d_B(dgm(X), dgm(Y)) ≤ 2 √(M + 1) ( √ε (2N − |J|) + √2 ‖R_N f^{(k)}‖_2 √((N + 1)^{1−2k} / (2k − 1)) ).

Proof.

  |f(t) − g(t)| = | Σ_{n∈[−N,N], n∉J} f̂(n) e^{int} + R_N f(t) |
    ≤ Σ_{n∈[−N,N], n∉J} |f̂(n)| + |R_N f(t)|
    ≤ √ε (2N − |J|) + √2 ‖R_N f^{(k)}‖_2 √((N + 1)^{1−2k} / (2k − 1)).

By Proposition 4.1.2,

  ‖SW_{M,τ} f(t) − SW_{M,τ} g(t)‖ ≤ √(M + 1) ‖f − g‖_∞
    ≤ √(M + 1) ( √ε (2N − |J|) + √2 ‖R_N f^{(k)}‖_2 √((N + 1)^{1−2k} / (2k − 1)) ).

Choosing δ greater than this last bound implies d_H(X, Y) ≤ δ. Letting δ approach its lower bound and using Theorem 2.2.21, we get the desired result. □

We now have all the ingredients to illustrate the method on numeric data, which is covered in the next section.

4.3 Application and discussion

We illustrate the method on a synthetic data set generated from

  Re( Σ_{n=1}^{5} f̂(n) e^{2int} ),

which is 2-periodic on the interval t ∈ [0, 2π). The coefficients f̂(n) are chosen uniformly at random from the unit disk in C. We sample the signal at 50 evenly spaced time points on this interval. Gaussian noise centered at 0 with standard deviation 25% of the signal amplitude is added to the sampled signal.
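A sketch of how such a signal can be generated. The disk sampling scheme and the reading of "25% of signal amplitude" as 0.25 times the largest absolute sample value are our assumptions; the thesis does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)

# f_hat(1), ..., f_hat(5) drawn uniformly from the unit disk in C
# (radius sqrt(U) with a uniform angle gives a uniform distribution on the disk).
coeffs = np.sqrt(rng.random(5)) * np.exp(2j * np.pi * rng.random(5))

t = np.linspace(0, 2 * np.pi, 50, endpoint=False)   # 50 evenly spaced samples
signal = sum(c * np.exp(2j * n * t) for n, c in enumerate(coeffs, start=1)).real

# "25% of signal amplitude" read as 0.25 times the largest absolute sample value.
noisy = signal + 0.25 * np.abs(signal).max() * rng.standard_normal(t.size)
```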
This synthetic data set is similar to one found in [14]. Cubic spline interpolation is then used on the signal to get a continuous function with two continuous derivatives. This allows us to match the theory of the previous section with the application. In particular, f ∈ C², so we use k = 2 in the appropriate bounds. Also, when performing the delay embedding, we may want to choose the delay τ small enough to require evaluating the function at time points not present in the sampling; cubic spline interpolation allows us to sidestep this problem.

Fixing ε = 10, a user defined parameter, we compute that we need N = 12 terms in the Fourier series approximation. See Figures 4.1 and 4.2 for the time series and power spectrum of the sampled signal and the truncated signal, respectively. The truncated signal is estimated to maintain at least 93% of the energy of the original signal. And the embedding dimension is determined to be M = 10. We choose τ = 2π/(120(10 + 1)) for a small enough delay. The point clouds are then centered and normalized as in [14].

The persistence diagrams are then computed with F_11 coefficients, along with the bottleneck distance d_B(dgm_1, dgm_2) = 0.112062. See Figure 4.3. The bound on d_B(dgm_1, dgm_2) implied by Theorem 4.2.4 is rather generous, since the factor 2√(10 + 1) alone is already much larger than 0.112062. In any case, what we have shown is that it is possible to filter out less significant peaks from a signal's power spectrum without altering the persistent homology much, with respect to the bottleneck distance. In doing so, we have also chosen a minimum embedding dimension.

Figure 4.1: Pictured (above) is the sampled noisy signal. The power spectrum (below) appears to have many small peaks.

Figure 4.2: The truncated power spectrum (below) corresponds to a cleaner signal, pictured (above).

Figure 4.3: The persistence diagrams are superimposed to show the matching done to compute the bottleneck distance. Here dgm_1 corresponds to the original signal and dgm_2 corresponds to the truncated signal.

CHAPTER 5

CONCLUSIONS, DISCUSSION, AND FUTURE WORK

The main conclusion of this thesis is that we can use persistent homology to study time series if we have a geometric understanding of their delay embeddings. Takens' theorem gives us a topological guarantee which, while helpful, is not sufficient for every purpose. Periodic time series have a nice geometric form when embedded, so we can use persistent homology successfully.

The theoretical results in Chapter 4 can be applied to more data sets for testing. While the theory is interesting in its own right, we would like further evidence that the bounds derived are useful in practice. Additionally, we could consider alternative ways of determining peaks in the power spectrum, such as peak-finding algorithms or statistical properties of the power spectrum.

It is still unknown what we can say theoretically about the persistent homology of time series which are not periodic or quasi-periodic. This is perhaps the most interesting idea for future work.
BIBLIOGRAPHY

[1] Andreas Galka. Topics in Nonlinear Time Series Analysis, with Implications for EEG Analysis, volume 14. World Scientific, 2000.

[2] Liangyue Cao. Practical method for determining the minimum embedding dimension of a scalar time series. Physica D: Nonlinear Phenomena, 110(1-2):43–50, 1997.

[3] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.

[4] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.

[5] Herbert Edelsbrunner and John Harer. Computational Topology: An Introduction. American Mathematical Society, 2010.

[6] Joshua Garland, Elizabeth Bradley, and James D. Meiss. Exploring the topology of dynamical reconstructions. Physica D: Nonlinear Phenomena, 334:49–59, 2016.

[7] J. P. Huke. Embedding nonlinear dynamical systems: A guide to Takens' theorem. 2006.

[8] Matthew B. Kennel, Reggie Brown, and Henry D. I. Abarbanel. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A, 45:3403–3411, 1992.

[9] Slobodan Maletić, Yi Zhao, and Milan Rajković. Persistent topological features of dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 26(5):053105, 2016.

[10] Mark J. McGuinness. The fractal dimension of the Lorenz attractor. Physics Letters A, 99(1):5–9, 1983.

[11] M. R. Muldoon, R. S. MacKay, J. P. Huke, and D. S. Broomhead. Topology from time series. Physica D: Nonlinear Phenomena, 65(1-2):1–16, 1993.

[12] James R. Munkres. Elements of Algebraic Topology. Addison-Wesley, 1984.

[13] Steve Y. Oudot. Persistence Theory: From Quiver Representations to Data Analysis, volume 209. American Mathematical Society, 2015.

[14] Jose A. Perea and John Harer. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799–838, 2015.

[15] María Cristina Pereyra and Lesley A. Ward. Harmonic Analysis: From Fourier to Wavelets, volume 63. American Mathematical Society, 2012.

[16] Tim Sauer, James A. Yorke, and Martin Casdagli. Embedology. Journal of Statistical Physics, 65(3):579–616, 1991.

[17] Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, pages 366–381. Springer, 1981.

[18] Andreas S. Weigend. Time Series Prediction: Forecasting the Future and Understanding the Past. Routledge, 2018.