DYNAMICAL SYSTEMS ANALYSIS USING TOPOLOGICAL SIGNAL PROCESSING

By Audun Myers

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mechanical Engineering – Doctor of Philosophy

2022

ABSTRACT

DYNAMICAL SYSTEMS ANALYSIS USING TOPOLOGICAL SIGNAL PROCESSING

By Audun Myers

Topological Signal Processing (TSP) is the study of time series data through the lens of Topological Data Analysis (TDA)—a process of analyzing data through its shape. This work focuses on developing novel TSP tools for the analysis of dynamical systems. A dynamical system is a term used to broadly refer to a system whose state changes in time. These systems are formally assumed to be a continuum of states whose values are real numbers. However, real-life measurements of these systems only provide finite information from which the underlying dynamics must be gleaned. This necessitates drawing conclusions about the continuous structure of a dynamical system using noisy finite samples or time series. The interest often lies in capturing qualitative changes in the system's behavior, known as bifurcations, through changes in the shape of the state space as one or more of the system parameters vary. Current literature on time series analysis aims to study this structure by searching for a lower-dimensional representation; however, the need for user-defined inputs, the sensitivity of these inputs to noise, and the expensive computational effort limit the usability of available knowledge, especially for in-situ signal processing.

This research aims to use and develop TSP tools to extract useful information about the underlying dynamical system's structure. The first research direction investigates the use of sublevel set persistence—a form of persistent homology from TDA—for signal processing with applications including parameter estimation of a damped oscillator and signal complexity measures to detect bifurcations. The second research direction applies TDA to complex networks to investigate how the topology of such complex networks corresponds to the state space structure. We show how TSP applied to complex networks can be used to detect changes in signal complexity, including distinguishing chaotic from periodic dynamics in a noise-contaminated signal. The last research direction focuses on the topological analysis of dynamical networks. A dynamical network is a graph whose vertices and edges have state values driven by a highly interconnected dynamical system. We show how zigzag persistence—a modification of persistent homology—can be used to understand the changing structure of such dynamical networks.

Copyright by AUDUN MYERS 2022

ACKNOWLEDGEMENTS

I would foremost like to express my gratitude to my advisor Dr. Firas Khasawneh for the thoughtful guidance throughout my education. He has shown me how to conduct research in a robust and thorough way. I would like to thank my family, who have always supported me through my Ph.D. by being inquisitive about my research and appreciating the work I am doing. I would like to also thank my collaborators for giving me insights into other fields of research and broadening my intellectual horizons. Lastly, this work would not have been possible without generous support from Michigan State University, the National Science Foundation, and the Air Force Office of Scientific Research.

PREFACE

Dynamical systems is a term used to broadly refer to systems whose state changes in time.
These systems are formally assumed to be a continuum of states whose values are real numbers. However, real-life measurements of these systems only provide finite information from which the underlying dynamics must be gleaned. This necessitates drawing conclusions about the continuous structure of a dynamical system using noisy finite samples or time series. The interest often lies in capturing qualitative changes in the system behavior as one or more of the system parameters vary. For example, a shift in surface pressure characteristics on airfoils as a function of the angle of attack from regular to aperiodic can indicate significant loss of lift and possibly stall conditions. Recent advances in sensor technology and computer hardware have also led to a shift towards data-driven analysis and modeling of engineered and natural systems. The datasets are obtained through either numerical simulations or experiments and often contain complex dynamics hidden in some high-dimensional structure. Current literature on time series analysis aims to study this structure by searching for a lower dimensional representation; however, the need for user-defined inputs, the sensitivity of these inputs to error, and the expensive computational effort limit the usability of available knowledge, especially for in-situ signal analysis. Additionally, many current Time Series Analysis (TSA) methods are sensitive to additive noise, which is common in experimental data.

An emerging collection of tools breathing new life into this discipline is the nascent field of Topological Signal Processing (TSP), which leverages the power of Topological Data Analysis (TDA) [157] for analyzing complex signals [42, 147, 159, 160, 163, 175–177, 202, 229, 230, 233, 243, 246]. Some of the attractive features of using TDA for signal processing include its noise-robustness, compact visualization tools, and amenability to machine learning. Therefore, enriching signal processing using TDA has the potential to reveal information that is currently not accessible by existing, standard dynamic systems methods. There have been exciting preliminary results in this field, including empirical demonstrations that these novel tools have the potential to revolutionize the field. However, despite the success shown in prior works, the fundamental science that connects TDA to the underlying dynamic systems theory remains largely unexplored.

My work is organized into four main chapters: (1) implementing sublevel set persistence for parameter estimation and time series analysis, (2) choosing optimal parameters for both state space reconstruction and permutations to be used for topological signal processing, (3) the persistent homology of complex networks, and (4) applying novel tools from TDA for analyzing dynamical networks. Each of these is introduced in the following paragraphs, and each chapter provides a thorough introduction to its subject.

The first chapter of my research is based on sublevel set persistence of single variable time series—a tool from TDA that can be applied to the time series directly. The goal of this chapter is to use the sublevel set persistence for directly estimating damping parameters of the underlying one-dimensional oscillator from the positional output time series. While sublevel set persistence is robust to additive noise, it does have noise artifacts that need to be accounted for to accurately estimate system parameters from the signal.
Therefore, my first contribution to this field was to develop a statistical analysis of the resulting persistence to separate out the significant features, which hold information about the damping characteristics and parameter values of the underlying oscillator. Another contribution of this chapter is my development of methods for calculating time series complexity using sublevel set persistence and information theory. These complexity measures are shown to provide an avenue for bifurcation detection through an increased complexity of the signal's sublevel set entropy.

The second chapter of my research studies parameter selection techniques for both state space reconstruction and permutation formations. The two parameters needed are the dimension n and delay τ. Both permutations and state space reconstruction are vital prerequisite data processing techniques used to apply TDA to study a signal in the next chapter. In this chapter we also develop novel parameter selection methods based on a topological analysis of the data through both reconstructions from sliding windows and sublevel set persistence.

In contrast to classical tools for representing time series as a point cloud, chapter three of my work studies network representations of the underlying dynamics. One of the advantages of this approach is that the size of the representation can be better controlled as a finite set, and I can leverage graph theory to research faster methods for quantifying the topology based on the resulting network. However, the representation of time series as a graph—especially in the presence of noise—is a largely open field of research, and efficient TDA computation on the resulting graphs is still in need of a solid mathematical footing. For example, questions related to the optimal parameter choices of the representation, types of detectable bifurcations, and mathematical guarantees that govern successful bifurcation identification are all wide open. Many of these optimal parameters are associated with the parameters of both permutation entropy as a time series information measure and state space reconstruction. In chapter two (Section 2) I introduce information theory, and specifically permutation entropy, with some of the most successful optimal parameter estimation methods for time series, as well as develop several novel methods based on tools from TDA. This initial research provided the needed foundations for many of the network representation tools I later use in chapter three. I also contribute to the field of complex network analysis through TDA by investigating methods for implementing weight information and complex network formation methods that best perform for the dynamic state analysis task.

The fourth chapter of research in Section 4 focuses on novel applications of topological data analysis for studying interconnected dynamical systems represented as temporal graphs. In this chapter I first show how a transportation system, as a dynamical system, can be represented as a temporal graph. I then develop a framework for applying zigzag persistence to detect structural changes in the temporal graph over time. I compare the resulting persistence diagrams to standard shape summary statistics from the graph theory literature. In this chapter I also develop a method for the analysis of complex dynamical systems using temporal graphs when only a one-dimensional signal is available.
This is done using a sliding window approach with each window represented as a complex network (e.g., the ordinal partition network). I then show how zigzag persistence can be used to study the changing structure of these graphs to detect changes in the signal and underlying dynamical system. Specifically, I show how periodic and chaotic windows can be detected for the Lorenz system exhibiting intermittency dynamics (i.e., irregular transitions from a regular to a chaotic state).

I have also included a fifth auxiliary chapter, which describes the experimental data sets and software developed in my research. Specifically, two main experimental data sets are used. The first is the magnetic pendulum, which has transitions from periodic to chaotic dynamics with a change in base excitation frequency and amplitude, making it useful for testing TSP methods used for characterizing the dynamic state of a system. The second data set is from a double pendulum tracked using a high-speed camera [165]. Many of the methods developed through my research are programmed into the TSP python software teaspoon. The various available modules for teaspoon are discussed in Section 5.2.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 SUBLEVEL SET PERSISTENCE FOR TIME SERIES ANALYSIS
  1.1 Sublevel Set Persistence
    1.1.1 Sublevel Set Persistence with Additive Noise
  1.2 Statistical Analysis of Sublevel Set Persistence
    1.2.1 Statistics of Additive Noise in the Persistence Diagram
    1.2.2 Cutoff Background
    1.2.3 Cutoff for Noise Models
    1.2.4 Cutoff and Distribution Parameter Estimation Method
    1.2.5 Signal Compensation for the Cutoff and Distribution Parameter
  1.3 Damping Parameter Identification Using Sublevel Set Persistence
    1.3.1 Sublevel Set Persistence of Damping Mechanisms
    1.3.2 Noise Compensation
    1.3.3 Method 1: Persistence Diagram Cutoff
    1.3.4 Method 2: Function Fitting to the Persistence Space
    1.3.5 Examples
    1.3.6 Results
  1.4 Sublevel Set Entropy
    1.4.1 Information Entropy Statistics
    1.4.2 Method
    1.4.3 Example
    1.4.4 Analysis on the Number of Bins
    1.4.5 Results
CHAPTER 2 PARAMETER SELECTION FOR PERMUTATION ENTROPY AND STATE SPACE RECONSTRUCTION
  2.1 Permutation Entropy
  2.2 Embedding Delay Parameter Selection Methods
    2.2.1 Frequency Approach for Embedding Delay
    2.2.2 Multi-scale Permutation Entropy for Selecting Delay
    2.2.3 Autocorrelation for Embedding Delay
    2.2.4 Mutual Information for Embedding Delay
    2.2.5 Permutation Auto-mutual Information for Selecting Delay
  2.3 Embedding Dimension Parameter Selection Methods
    2.3.1 False Nearest Neighbors for Embedding Dimension
    2.3.2 Singular Spectrum Analysis for Embedding Dimension
    2.3.3 Multi-scale Permutation Entropy for Permutation Dimension
    2.3.4 Method Comparisons and Conclusions
  2.4 Topological Methods for Delay Parameter Selection
    2.4.1 Finding τ Using SW1PerS
    2.4.2 Finding τ Using Sublevel Set Persistence
    2.4.3 Permutation Dimension
    2.4.4 Results for Topological Data Analysis Methods
CHAPTER 3 PERSISTENT HOMOLOGY OF COMPLEX NETWORKS
  3.1 Complex Networks
    3.1.1 Background
    3.1.2 Graphs
    3.1.3 Proximity and Transition Networks
  3.2 Topological Analysis of Complex Networks
    3.2.1 Persistent Homology of Complex Networks
    3.2.2 Distance Measures for Graphs
    3.2.3 Point summaries of persistence diagrams
  3.3 Examples
    3.3.1 First Example: Ordinal Partition and Coarse Grained State Space Network Comparison
    3.3.2 Second Example: Distance Method Comparison
    3.3.3 Third Example: Periodic and Chaotic Dynamics
    3.3.4 Fourth Example: The Magnetic Pendulum
  3.4 Results
    3.4.1 Dynamic State Change Detection on the Rössler System
    3.4.2 Dynamic State Detection Using Machine Learning on Persistence Diagrams
CHAPTER 4 PERSISTENT HOMOLOGY OF DYNAMICAL NETWORKS
  4.1 Background
    4.1.1 Zigzag Persistence
    4.1.2 Temporal Graphs
  4.2 Method
    4.2.1 Example
  4.3 Results
    4.3.1 Great Britain Temporal Transportation Network
    4.3.2 Temporal Ordinal Partition Network for Intermittency Detection
  4.4 Conclusion
CHAPTER 5 PERSISTENT HOMOLOGY OF DYNAMICAL NETWORKS
  5.1 Experiment: Magnetic Pendulum
    5.1.1 Mathematical Model
    5.1.2 Equipment and Experimental Design
    5.1.3 Physical Parameters and Constants
  5.2 Teaspoon: A comprehensive python package for topological signal processing
    5.2.1 Dynamical Systems Library (DynSysLib)
    5.2.2 Machine Learning Module
    5.2.3 Complex Networks Module
    5.2.4 Information Module
    5.2.5 Parameter Selection Module
APPENDICES
  APPENDIX A PERMUTATION ENTROPY PARAMETER SELECTION
  APPENDIX B SUBLEVEL SET PERSISTENCE AND DAMPING PARAMETER ESTIMATION
  APPENDIX C DYNAMICAL SYSTEMS
  APPENDIX D ADDITIONAL DIFFUSION DISTANCE ANALYSIS
BIBLIOGRAPHY

LIST OF TABLES

Table 1.1: Ratios ρ = L̄/L̃ for estimating the sample mean from the sample median, with uncertainty as three standard deviations.
Table 1.2: Constants of Eq. (1.51) for each distribution type investigated in this work with associated uncertainty from ten trials.
Table 1.3: Cutoff and parameter estimation equations for the Gaussian, uniform, Rayleigh, and exponential probability distribution functions.
Table 1.4: Constants of Eq. (1.80) for each distribution type investigated in this work with associated uncertainty from ten trials.
Table 1.5: Cutoff and parameter estimation equations for the Gaussian, uniform, Rayleigh, and exponential probability distribution functions.
Table 1.6: Quick reference to equations (or cost functions) for using sublevel set persistence to estimate damping parameters and constants.
Table 1.7: Tabulated results for sublevel set entropy of the Lorenz example.
Table 2.1: A comparison between the calculated and suggested values for the delay parameter τ. The shaded (red) cells highlight the methods that failed to provide a close match to the suggested delay.
Table 3.1: A comparison between persistence diagram point summaries M(D1), P(D1), and E′(D1) for detecting differences in the networks generated for periodic (Per.) and chaotic (Ch.) time series using both k-NN graphs and ordinal partition graphs.
Table 3.2: Accuracies of the distance methods for both ordinal partition and coarse grained state space networks.
Table 3.3: Noise robustness comparison for persistence diagram point summaries and network parameters using the ordinal partition network.
Table 5.1: Equipment used for experimental data collection.
Table 5.2: Equation of motion parameters for the simulated pendulum with associated uncertainty.
Table A.1: A comparison between the calculated and suggested values for the delay parameter τ for multiple MI approximation methods. The cells in bold highlight the methods that yielded the closest match to the suggested delay. The equal-sized partition method is described in Section A.3, Kraskov et al. methods 1 and 2 in Section A.3, and the adaptive partitioning approach in Section A.3.
Table A.2: A comparison between the calculated and suggested values for the delay parameter τ. The cells in bold show the methods that yielded the closest match to the suggested delay. The following conditions or abbreviations were used in the table: the range under PAMI results is from using the range (4 < n < 6), AP under MI is an abbreviation for adaptive partitioning, and AC is an abbreviation for autocorrelation.
Table A.3: A comparison between the calculated and suggested values for the embedding dimension n. The cells in bold show the methods that yielded the closest match to the suggested dimension.
Table C.1: Continuous and discrete dynamical systems used throughout the manuscript.
Table C.2: Available flows and maps in the dynamic systems library module.
Table C.3: Available functions, noise models, and medical data in the dynamical systems library module.
Table C.4: Parameter selection methods available in the parameter selection module for both the delay and dimension parameters.

LIST OF FIGURES

Figure 1.1: Overview of research chapters with past, current, and future works.
Figure 1.2: Example 0D sublevel set persistence from function f(t) over finite domain t ∈ [t_a, t_b] with the resulting persistence diagram on the right.
Figure 1.3: Sublevel set persistence applied to x(t) of a single variable function or time series with and without additive noise ε from N, shown in red and blue, respectively. This demonstrates the stability of persistent homology with the time series (left) with and without additive noise and the small effect on the resulting persistence diagrams (right). In addition, the light red region separates the significant features from those associated to additive noise.
Figure 1.4: Histograms h(*) of the zero mean normal distribution N(0, σ² = 1) and the resulting birth times B and death times D, which are compared to the density distributions from Eq. (1.4).
Figure 1.5: Example cutoff C_α for a persistence diagram and time-ordered lifetimes of sublevel set persistence from x(t) + N.
Figure 1.6: Additive noise probability distributions f(x) for the four models realized in this work: uniform, Gaussian, Rayleigh, and exponential.
Figure 1.7: Example time series showing sample δ_i.
Figure 1.8: Numeric function fitting of Eq. (1.51) to the mean of the median lifetime L̃ of f_i(t) for i ∈ [1, 3], where N is unit variance Gaussian additive noise with δ ∈ [0, 2] being incremented to understand the effects of signal on the median lifetime.
Figure 1.9: Demonstration of distribution parameter σ estimation of Gaussian additive noise in x(t) = A sin(πt) + N using the median lifetime with and without signal compensation as σ and σ*, respectively.
Figure 1.10: Single degree of freedom oscillator with multiple modes of energy dissipation. Energy dissipation mechanisms include Coulomb μ_c, viscous μ_v, and quadratic μ_q damping.
Figure 1.11: Example 0D sub-level set persistence from the viscously damped free response time series x(t).
Figure 1.12: Example free vibration response of a system with Coulomb damping.
Figure 1.13: Example free vibration response of a system with quadratic damping.
Figure 1.14: Sub-level set persistence applied to sample time series x(t) with and without additive noise N. This demonstrates the robustness of persistent homology with the time series (top left) with and without additive noise and the small effect on the resulting persistence diagrams (top right) and the corresponding time ordered lifetimes (bottom left).
Figure 1.15: Overview of method: starting with a time series, the sublevel set persistence is calculated. The lifetimes from the persistence diagram are then plotted as a function of their birth time. The resulting diagram is analyzed from both a statistical and function fitting perspective to estimate the damping parameters.
Figure 1.16: Example section of sampled time series x(t) with (black dots) and without (green dashed line) additive noise to demonstrate the effect of additive noise on increasing the lifetime of sublevel set persistence by approximately L′_i − L_i = ε_{v_i} + ε_{p_i} ≈ F β.
Figure 1.17: Example demonstrating the process of going from a time series x(t) with amplitude decrement and additive noise to the time ordered lifetimes of the persistence diagram with dual function fitting.
Figure 1.18: Time series x(t) sampled at 20 Hz from Eq. (1.92) with and without additive noise N from a normal distribution with standard deviation σ = 0.01.
Figure 1.19: Resulting time-ordered lifetimes plot for the viscous damping mechanism example in Fig. 1.18 with (left) the statistical analysis and (right) function fitting.
Figure 1.20: Time series x(t) sampled at 20 Hz from Eq. (1.92) with and without additive noise N from a normal distribution with standard deviation σ = 0.01.
Figure 1.21: Resulting time-ordered lifetimes plot for the experimental pendulum data (see Fig. 1.20) having an approximate Coulomb damping mechanism in the linear range with (left) the statistical analysis and (right) function fitting.
Figure 1.22: Time series x(t) sampled at 20 Hz from the simulation of a quadratically damped oscillator with and without additive noise N from a normal distribution with standard deviation σ = 0.01.
Figure 1.23: Resulting time-ordered lifetimes plot for the quadratic damping mechanism example in Fig. 1.22 with (left) the statistical analysis and (right) function fitting.
Figure 1.24: Analysis of the noise robustness of sublevel set persistence for damping parameter estimation of an oscillator with (top) Coulomb, (middle) viscous, and (bottom) quadratic damping mechanisms with (left) and without (right) noise compensation. For each damping mechanism I estimate the damping parameters using a single lifetime (One), an optimal lifetime ratio (Opt.), and function fitting (Fit.).
Figure 1.25: Effect of low sampling frequencies for the damping parameter identification methods based on sublevel set persistence for Coulomb (left), viscous (middle), and quadratic (right) damping mechanisms. Analysis shows accurate results for sampling rate f_s > 2 f_Nyquist, where f_Nyquist ≈ 1.42 Hz is the Nyquist sampling rate.
Figure 1.26: Effects of damping parameters of (left) Coulomb, (middle) viscous, and (right) quadratic damping. These parameter values range from very low damping to high or critical damping values.
Figure 1.27: Pipeline for applying entropy metrics to the sublevel set persistent homology. The sublevel set persistence diagram in (b) is calculated from the signal in (a), which is used to calculate the lifetimes that are ordered chronologically based on their birth index in (c). The lifetimes can either be used to directly calculate the approximate and sample entropy as h_a(L) and h_s(L), or are digitized into states based on the binning procedure in (d) and (e) with bin edges shown in (c). The probability of each state can be found to calculate the information entropy h. Additionally, the chronologically ordered states in (e) can be used to calculate the approximate and sample entropies h_a(S) and h_s(S), where S is the state sequence composed of states a_i ∈ A. The entropy rate h_r and average conditional entropy h̄_c can also be calculated from the Markov chain matrix in (f).
Figure 1.28: Example demonstrating sublevel set persistence of periodic (top row of figures) and chaotic (bottom row of figures) simulations of the Lorenz system. Each row shows the time series x(t) (left), sublevel set persistence diagram (middle), and binned lifetimes (right).
Figure 1.29: Further diagrams for entropy analysis of example signals in Fig. 1.28. The top row is again for the periodic signal and the bottom for chaotic. The left column is the distribution of states, the middle is the state sequence, and the right is the 1-step transition probability matrix.
Figure 1.30: Analysis on the effect of the number of bins or states on entropy values for 18 continuous and 12 discrete dynamical systems.
Figure 1.31: Spread of entropy values for periodic and chaotic dynamics using 15 bins for 12 discrete dynamical systems (maps) and 18 continuous dynamical systems (flows). The green dashed line separates periodic and chaotic entropy statistics based on a maximized accuracy for both flows and maps.
Figure 1.32: Resilience of entropy statistics to additive noise for SNR values from 10 to 50 dB for the periodic and chaotic Lorenz system simulation described in Eq. (4.3). Uncertainties are reported as the standard deviation for each SNR repeated 20 times.
Figure 1.33: Bifurcation analysis of entropy statistics for the logistic map dynamical system with r ∈ [3.2, 4.0] with step sizes of Δr = 0.001. Green highlighted regions are periodic.
Figure 1.34: Bifurcation analysis of entropy statistics for the Lorenz dynamical system with ρ ∈ [80, 190] with step sizes of Δρ = 0.1 and σ = 10 and β = 8/3. Green highlighted regions are periodic.
Figure 1.35: Computation time example for the Lorenz system (A) and logistic map (B) for each entropy statistic.
Figure 2.1: Timeline of entropy measurements for time series analysis.
Figure 2.2: Sample permutation formation for n = 3 and τ = 1.
Figure 2.3: All possible permutation configurations for n = 3.
Figure 2.4: Some possible modes of failure for selecting τ for phase space reconstruction using classical methods: (a) mutual information registering false minima as suitable delay, generated from a periodic Lorenz system; (b) mutual information being mostly monotonic and not having a distinct local minimum to determine τ, generated from EEG data [7]; and (c) autocorrelation failing from a moving average of ECG data provided by the MIT-BIH Arrhythmia Database [154].
Figure 2.5: Overview of methods investigated for automatically calculating both the delay τ and dimension n for permutation entropy.
Figure 2.6: Overview of our frequency domain approach for finding the maximum significant frequency f_max using LMS for a signal contaminated with GWN.
Figure 2.7: LMS linear regression with 45% outliers. Results match those found in [143].
Figure 2.8: (a) Theoretical PDF for GWN. (b) CDF for GWN with an example cutoff at the 99% CP.
Figure 2.9: (A) FFT of GWN with 0.035 standard deviation and zero mean with the location of the theoretical maximum of the PDF and the one-dimensional LMS regression value. (B) Distribution of GWN in the Fourier spectrum with overlapped theoretical PDF and location of the theoretical maximum of the PDF and one-dimensional LMS regression value.
Figure 2.10: (right) Resulting MPE plot for (left) 2P periodic time series with example embedding delays d0, d1, and d2.
Figure 2.11: The three regions of the MPE plot for a periodic signal: (A) redundant, (B) resonant, and (C) irrelevant.
Figure 2.12: MPE plot for the x coordinate of the Lorenz system. Additionally, points in the MPE plot with their corresponding subsampled time series are shown for the redundant, resonant, and irrelevant regions as described in Section 2.2.2.
Figure 2.13: A comparison between the calculated and suggested values for the delay parameter τ for multiple MI approximation methods. The methods investigated were the equal-sized partition method, Kraskov et al. methods 1 and 2, and the adaptive partitioning approach.
Figure 2.14: PAMI results for the sinusoidal function with n ∈ [2, 5] and τ ∈ [1, 50]. The figure shows an optimal window size τ(n − 1) ≈ 25.
Figure 2.15: A comparison between the calculated and suggested values for the delay parameter τ. The methods investigated were MI with adaptive partitions, Spearman's autocorrelation (AC), the frequency analysis, Multi-scale Permutation Entropy (MPE), and Permutation Auto-mutual Information (PAMI) with n = 5.
Figure 2.16: A comparison between the calculated and suggested values for the embedding dimension n. The methods investigated were False Nearest Neighbors (FNN), Multi-scale Permutation Entropy (MPE), and Singular Spectrum Analysis (SSA).
Figure 2.17: Example formation of a permutation sequence from the time series x(t) = 2 sin(t) with sampling frequency f_s = 20 Hz, permutation dimension n = 3, and delay τ = 40. The corresponding time-delay embedded vectors from x(t) with the permutation binnings (π1, ..., π6) in the state space are shown in the bottom figure.
Figure 2.18: Example comparing the first minima of mutual information and first maxima of multi-scale permutation entropy, which demonstrates the correspondence between the two. On the left are the n = 3 time delayed state space reconstructions with an inaccurately chosen τ = 1 and an appropriate τ = 14. The right shows the permutation distribution as τ increases and the associated multi-scale permutation entropy and mutual information plots.
Figure 2.19: Example showing three sample windows with m = 2 of increasing size, which is slid across the entire time series (periodic Rossler system), resulting in the embedded time series in R². The window size is defined as w = mτ with (left) w_s = mτ_s being too small with τ_s = 1 and an embedding shape concentrated on the diagonal line and a high periodicity score s and low L, (middle) w_o properly sized, resulting in a minimum periodicity score s and maximum L suggesting an optimal delay τ_o = 10, and (right) w_ℓ with τ = 17 being too large, resulting in a high periodicity score s and low L.
Figure 2.20: Example periodicity s and max persistence L plots for the chaotic Rossler system with associated cutoffs to determine the average τ.
Figure 2.21: Example demonstrating the process from time series x (periodic Rossler system) to sublevel set persistence diagram to time ordered lifetimes on the bottom left. Additionally, the bottom left shows a sample time period between sublevel sets as T_{B_i}.
Figure 2.22: Example demonstrating the time delay τ = 10 result for the periodic Rossler example time series shown in the top figure and the resulting n = 2 Takens' embedding.
Figure 2.23: Overview of procedure for finding the maximum significant frequency using 0-dimensional sublevel set persistence and the modified z-score for a signal contaminated with noise.
Figure 2.24: Percent of the persistence points from 0-D sublevel set persistence of the FFT of GWN using the modified z-score with the provided threshold ranging from 0 to 5.
Figure 2.25: Percent of permutations used R = N_π/n! for each example time series (see Eq. (2.30)) as the dimension is incremented.
Figure 2.26: Example showing the difference in PE (see Eq. (2.31)) for periodic and chaotic dynamic states of the Rossler system for a wide range of PE parameters.
Figure 2.27: Noise robustness analysis of the delay parameter selection using the Rossler system with incrementing additive noise. The mean and standard deviation (as error bars) of the delay parameters from 30 trials at each SNR were calculated using sublevel set persistence of the frequency domain τ_SLf, sublevel set persistence of the time domain τ_SLt, the minima of the SW1PerS score τ_PHs, the maxima of the maximum persistence τ_PHL, and mutual information τ_MI.
Figure 2.28: Signal length robustness analysis of the delay parameter selection using the Rossler system with incrementing signal length from 75 to 1000 in steps of 25. The delay parameters were calculated at each L using sublevel set persistence of the frequency domain τ_SLf, sublevel set persistence of the time domain τ_SLt, the minima of the SW1PerS score τ_PHs, the maxima of the maximum persistence τ_PHL, and mutual information τ_MI.
Figure 3.1: Comparison between ordinal partition networks generated from the x-solution of the Rössler system for both periodic (a) and chaotic (b) time series.
Figure 3.2: Example formation of a weighted transitional network as a graph (middle figure) and adjacency matrix (right figure) given a state sequence S (left figure).
Figure 3.3: Assignment of Ordinal Partition (OP) or Coarse Grained (CG) state for an example dimension 3 SSR vector.
Figure 3.4: Persistent homology of a weighted complex network. Top left shows the weighted network with the corresponding adjacency matrix to its right. Third is the distance matrix, and at the top right is the persistence diagram of one-dimensional features. The bottom row shows the filtration at critical values.
Figure 3.5: Example basic graph with corresponding shortest path distance matrix. Highlighted in red is an example shortest path from node 2 to 5 with shortest path distance 2.
Figure 3.6: Table of examples showing the lifetime L_n of the single class (r_B, r_D) in the persistence diagram for the pipeline applied to a cycle with n nodes.
Figure 3.7: Example formation of the ordinal partition (top) and coarse grained state space (bottom) networks for x(t) = sin(t) embedded into R³.
Figure 3.8: Example illustrating the issue with erroneous permutation transitions when there is additive noise and a trajectory close to the hyperplane intersection H. The three dimensional state space reconstruction (D) from the signal x(t) with and without additive noise (A) demonstrates that as the distance to the hyperdiagonal d_H (C) becomes small, undesired permutation transitions (B)–with zoomed-in section shown in (E)–occur, as shown in the orange highlighted regions.
Figure 3.9: Example demonstrating the importance of choosing an appropriate network formation method when there is additive noise in the signal. The CGSSN retains the graph structure under additive noise, but the OPN quickly loses all resemblance to the noise-free topological structure even with a small amount of additive noise. x(t) is the signal, N is additive noise, and G(x) is the graph formation function of the signal x.
Figure 3.10: Two example weighted cycle graphs of weight 10, with the bottom row having an additional edge of weight one connecting nodes 0 and 8. The persistence diagrams associated to each of the four distance methods are shown by column for both graphs.
Figure 3.11: A comparison of the resulting persistence diagrams for an OPN formed from a periodic and a chaotic signal from the Lorenz system.
Figure 3.12: Example of the method applied to experimental data with a periodic response (a). In (b) the sequence of permutations is shown for n = 6 with the associated ordinal partition network in (c). In (d) the distance matrix (using an unweighted network and shortest path distance) is shown, which was used to compute a persistence diagram with multiplicity shown in (e) and (f), respectively.
Figure 3.13: Example of the method applied to experimental data with a chaotic response (a). In (b) the sequence of permutations is shown for n = 6 with the associated ordinal partition network in (c). In (d) the distance matrix (using an unweighted network and shortest path distance) is shown, which was used to compute a persistence diagram with multiplicity shown in (e) and (f), respectively.
Figure 3.14: Rössler system bifurcation for 0.37 < a < 0.43 with steps of 0.001. Left column plots include point summaries calculated from ordinal partition networks with parameters τ = 40 and d = 6; right column plots show the same results for the k-NN networks generated from Takens' embedding with parameters τ = 4 and d = 7. The figure compares point summaries P(D1), M(D1), and E′(D1) with the Lyapunov exponent λ [19] and some common network parameters including the number of vertices N, mean out degree ⟨k⟩, and out degree variance σ².
Figure 3.15: Comparison between the (a) shortest unweighted path, (b) shortest weighted path, (c) weighted shortest path, and (d) lazy diffusion distances using a two dimensional MDS projection (random seed 42) of the bottleneck distances between persistence diagrams of the OPN for chaotic and periodic dynamics with an SVM radial basis function kernel separation.
Figure 3.16: Comparison between the (a) shortest unweighted path, (b) shortest weighted path, (c) weighted shortest path, and (d) lazy diffusion distances using a two dimensional MDS projection (random seed 42) of the bottleneck distances between persistence diagrams of the CGSSN for chaotic and periodic dynamics with an SVM radial basis function kernel separation.
Figure 3.17: Bottleneck distance stability analysis of the periodic Lorenz system (see Eq. (4.3)) with standard deviation normalized signal and bounded (ε = 6σ) Gaussian additive noise. Analysis shows stability results using Shortest Unweighted Path Distance (SUPD), Shortest Weighted Path Distance (SWPD), Weighted Shortest Path Distance (WSPD), and Diffusion Distance (DD).
Figure 3.18: Average point summaries and network parameters for varying SNRs from Gaussian noise added to time series generated from periodic and chaotic Rössler systems. For each SNR, 25 separate samples are taken to provide mean values and standard deviations, which are shown as the error bars.
Figure 4.1: Transportation networks of Great Britain for air, coach, and rail travel.
Figure 4.2: Pipeline for applying zigzag persistence to temporal networks. Begin with an unweighted and undirected temporal graph where each edge is on at a point or interval of time. Create graph snapshots using a sliding window interval over the time domain. Create a sequence of simplicial complexes from the graphs and apply zigzag persistence to the union zigzag simplicial complexes.
Figure 4.3: Example zigzag persistence applied to a simple temporal cycle graph.
Figure 4.4: Connectivity and centrality analysis on the temporal Great Britain rail network.
Figure 4.5: Zigzag persistence diagrams of the rail transportation network of Great Britain.
Figure 4.6: The x(t) solution to a simulation of the Lorenz system from Eq. (4.3) exhibiting intermittency, with example sliding windows for both periodic (blue) and chaotic (red) dynamics with their respective ordinal partition networks.
Figure 4.7: Connectivity and centrality analysis on the temporal ordinal partition network with chaotic regions of x(t) highlighted in red.
Figure 4.8: One-dimensional zigzag persistence of the temporal ordinal partition network from the x solution of the intermittent Lorenz system described in Eq. (4.3).
Figure 5.1: Rendering of the experimental setup in comparison to the reduced model, where b(t) = A sin(ωt) is the base excitation with frequency ω and amplitude A, r_cm is the effective center of mass of the pendulum, d is the minimum distance between magnets m1 = m2 = m (modeled as dipoles), and ℓ is the length of the pendulum.
Figure 5.2: A comparison between a generic, in-plane magnetic model in global coordinates and the equivalent magnetic forces in the pendulum model F_r and F_φ (see Eq. (5.4)).
Figure 5.3: Manufacturing overview with experimental setup. In (a), an exploded view of the end mass (100% infill 3D printed PLA components) is shown with the magnet press fit into the end of the pendulum. In (b), an exploded view of the linear stage controlling the vertical position of the lower magnet.
Figure 5.4: Measured repulsion force as a function of distance compared to the theoretical force in Eq. (5.4) with θ = 0. The theoretical force F_theory is based on a dipole model with a dipole moment m = 0.85 cm, which was estimated using a curve fit to the region where the magnetic thickness T ≪ r. The region of poor fit is marked for r < 0.035 m.
Figure 5.5: Free drop test comparing collected angular position data θ_data, with encoder uncertainty σ_data, and the simulated response θ_sim. As shown in the zoomed-in region, the simulated response is within the bounds of uncertainty of the actual response.
Figure 5.6: Tree structure of teaspoon.
Figure 5.7: The persistent homology of complex networks pipeline.
Figure A.1: Region N is affected by noise in the MPE plot, and region S is unaffected.
Figure A.2: A comparison between (left) unranked values and (right) ranked values for calculating correlation coefficients. Using the ranked x and y, Spearman's correlation coefficient can be used to accurately reveal existing nonlinear monotonic correlations.
Figure A.3: Example showing two different partition methods for mutual information estimation using k = 1 nearest neighbor adaptive partitioning.
Figure D.1: Numerical analysis of the maximum persistence of the cycle graph G_cycle(n) with size n when using diffusion distance with t = 2d.
Figure D.2: Comparison of max L1 and #{L1} for each system and the mean when varying t in P_t with respect to the diameter (t ∈ [d, 5d]).

CHAPTER 1
SUBLEVEL SET PERSISTENCE FOR TIME SERIES ANALYSIS

This chapter overviews my work on studying how sublevel set persistence, a tool from topological data analysis, can be leveraged for signal processing. The first application is to estimate damping parameters of a single degree of freedom system with a noisy time series as an input. The second application is for bifurcation and signal complexity analysis. First, in Section 1.1 I introduce sublevel set persistence and a novel, computationally efficient algorithm for applying it to one-dimensional signals; Section 1.2 develops the statistical analysis to separate sublevel sets associated to noise from those associated to signal; Section 1.3 shows how sublevel set persistence can be leveraged for damping parameter estimation; and in Section 1.4 I apply sublevel set persistence to signals for complexity and bifurcation detection.

Figure 1.1: Overview of research chapters with past, current, and future works.

1.1 Sublevel Set Persistence

I now provide a basic introduction to sublevel set persistence so the reader has a sufficient understanding of the method. Let us begin with the single variable function $f: \mathbb{R} \to \mathbb{R}$. Given $r \in \mathbb{R}$, I define the sublevel set below $r$ as $f^{-1}(-\infty, r]$. As the filtration parameter $r$ increases, the sublevel sets may grow but remain the same (up to homology) until a local extremum (i.e., a local minimum or maximum) is reached. If the extremum is a local minimum, then a new set is born at $r_B$; I label that set with the value $r_B$. On the other hand, if the extremum is a local maximum, two previously-existing sets are combined.
If the two sets were labeled $r_B$ and $r_B'$, with $r_B \le r_B'$ and the maximum attained at $r_D$, then, by the Elder Rule [70, p. 150], I say that the component born at $r_B'$ dies going into $r_D$. The pair $(r_B', r_D)$ is called a persistence pair. As $r$ ranges from $-\infty$ to $\infty$, the persistence diagram is the collection of all $n$ such pairs, $\mathrm{dgm}\, f = \{(b_i, d_i)\}_{i=1}^{n}$. Any unpaired births are called essential classes and are paired with a death coordinate of $\infty$; thus, $\mathrm{dgm}\, f$ is embedded in the extended plane $\overline{\mathbb{R}}^2$. The lifetime or persistence of a point $(b_i, d_i) \in \mathrm{dgm}\, f$ is defined as $\ell_i = d_i - b_i$.

In this work, the functions are only sampled on a finite domain, with the first sample at time $t_a$ and the last sample at time $t_b$. I obtain a continuous function over $[t_a, t_b]$ by using a piecewise linear interpolation between consecutive samples, and extending the function to $\pm\infty$ by extending the first (resp., last) edges to rays. Doing so allows us to define a persistence diagram that does not have critical points on the boundary of the time series. As such, I study the persistence points where both coordinates are finite, and omit persistence points that contain an unbounded coordinate.

To demonstrate persistence diagrams and sublevel set persistence, I work through a simple example for the function shown in Fig. 1.2. This function has thirteen sample points, two local minima, and two local maxima.

Figure 1.2: Example 0D sublevel set persistence from function $f(t)$ over finite domain $t \in [t_a, t_b]$ with the resulting persistence diagram on the right.

The lowest critical value of the function occurs at height $v_0$. For all $r < v_0$, $f^{-1}(-\infty, r]$ is the ray $[f^{-1}(r), \infty)$. This connected component is labeled with $-\infty$, since it is "born" at $-\infty$. Then, at height $r = v_0$, a second connected component is born. The next topological change occurs at height $r = v_1$, where a third connected component is born. The next extremum is reached when $r = p_0$. At this extremum, the sublevel set that was born at $r = v_1$ dies, while the sublevel set born at $r = v_0$ persists based on the Elder Rule. This pair $(v_1, p_0)$ is recorded in the persistence diagram. From here, the next change happens at $r = p_1$, where the second sublevel set dies and is recorded in the persistence diagram as $(v_0, p_1)$. Then, no further topological changes occur, but this sublevel set continues to grow as $r$ grows. This essential class is recorded in the persistence diagram as $(-\infty, \infty)$ and is not studied in the analysis. As shown in the persistence diagram, the point $(v_1, p_0)$ is close to the diagonal (the line $y = x$), which signifies that the sublevel set only persisted for a short range of heights $r$; on the other hand, the point $(v_0, p_1)$ is far from the diagonal, suggesting it was from a significant sublevel set.

The idea of persistence can be extended to higher dimensions, allowing for the analysis of the shape of high-dimensional data sets. However, for my work, we only need to analyze the zero-dimensional features (i.e., connected components) of a one-dimensional function. A more thorough background on TDA, and persistent homology specifically, can be found in [69, 157, 174]. Other common ways for studying time series with a similar perspective are through merge trees or dendrograms [37, 46, 128].
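Since the sweep above is entirely determined by the ordering of the sample values, it can be implemented compactly. The following is a minimal sketch of computing the 0D sublevel set persistence pairs of a sampled signal with a union-find sweep; the function name sublevel_persistence_0d and the implementation details are my own illustration, not necessarily the computationally efficient algorithm referenced in the chapter introduction. The sketch treats the first and last samples as ordinary points rather than extending them to rays, and the essential class born at the global minimum is omitted, as in the text.

    import numpy as np

    def sublevel_persistence_0d(y):
        """Sketch: 0D sublevel set persistence pairs (birth, death) of a
        sampled signal y, merging components by the Elder Rule. The
        essential class born at the global minimum never dies and is omitted."""
        n = len(y)
        parent = np.full(n, -1)  # -1 marks samples not yet in the filtration

        def find(i):  # union-find root lookup with path compression
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        pairs = []
        # raise the filtration parameter r by adding samples in value order
        for i in sorted(range(n), key=lambda k: y[k]):
            parent[i] = i
            for j in (i - 1, i + 1):  # neighbors already in the sublevel set
                if 0 <= j < n and parent[j] != -1:
                    ri, rj = find(i), find(j)
                    if ri == rj:
                        continue
                    # Elder Rule: the younger component (larger birth) dies at y[i]
                    elder, younger = (ri, rj) if y[ri] <= y[rj] else (rj, ri)
                    if y[younger] < y[i]:  # skip zero-lifetime pairs at regular points
                        pairs.append((float(y[younger]), float(y[i])))
                    parent[younger] = elder
        return pairs

Sorting the samples costs O(n log n) and the union-find sweep is nearly linear, which is part of what makes sublevel set persistence attractive for long signals.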
1.1.1 Sublevel Set Persistence with Additive Noise

I now investigate the stability of sublevel set persistence diagrams to additive noise for single variable functions. To illustrate the stability, I first take an example time series with additive noise as $x(t) + \epsilon$, where $x(t)$ is sampled at a uniform rate $f_s$ and $\epsilon$ is additive noise from the noise model $\mathcal{N}$. An example of a persistence diagram from the time series with additive noise, $\mathrm{dgm}(x + \epsilon)$, is shown in Fig. 1.3, along with the diagram without the additive noise, $\mathrm{dgm}\, x$. This example also demonstrates how a cutoff $C_\alpha$ can be used to separate the significant points in the persistence diagram from those associated to the additive noise.

Figure 1.3: Sublevel set persistence applied to $x(t)$ of a single variable function or time series with and without additive noise $\epsilon$ from $\mathcal{N}$, shown in red and blue, respectively. This demonstrates the stability of persistent homology with the time series (left) with and without additive noise and the small effect on the resulting persistence diagrams (right). In addition, the light red region separates the significant features from those associated to additive noise.

This example demonstrates that the addition of noise does not have a large effect on the position of significant sublevel sets in the persistence diagram, with the distances between significant points ($d_1$ and $d_2$) all being relatively small. This is no surprise due to the stability theorem of the bottleneck distance for persistence diagrams [49], where the bottleneck distance is defined as the minimum distance to match two persistence diagrams. For example, if I assume $d_1 > d_2 > d_3 > d_4$, then the bottleneck distance would be $d_2$. However, additive noise does introduce several points in the persistence diagram located near the diagonal with relatively small lifetimes. These noise-artifact persistence pairs are formed from the peak-valley pairs in the additive noise. This work focuses on a statistical analysis of these lifetimes to develop a method for separating the significant persistence diagram points from those of additive noise, shown as the light red region in the example persistence diagram of Fig. 1.3, through a cutoff $C_\alpha$ with $\alpha \in [0, 1]$ as the given confidence level.

As mentioned previously, there are currently methods for developing confidence sets and associated cutoffs for persistence diagrams [40, 73]. However, these methods are specific to distance-like filtrations or require a high sampling rate. Moreover, bootstrap-based techniques can be costly. Additionally, methods such as persistent entropy [10] for separating noise from significant features in a persistence diagram may not properly distinguish between the noise and significant points if the number of significant data points in the persistence diagram is relatively large compared to the amount of noise. To address all of these issues, I introduce a new statistical method for developing a confidence interval and corresponding cutoff $C_\alpha$.

The aforementioned statistical analysis is discussed in the following sections. First, in Section 1.2.1, I introduce my novel analysis of the statistics of the lifetimes in the persistence diagram from the sublevel set persistence of additive noise with a probability distribution $f(x)$. I then apply this analysis in Section 1.2.3 to several noise models commonly used or seen in real-world applications. Following this, in Section 1.2.4, I introduce a method using the persistence diagram to estimate the needed distribution parameters for calculating the cutoff. Finally, in Section 1.2.5, I investigate the use of a compensation term on the distribution parameter estimation.
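To preview how such a cutoff is used in practice, the sketch below (my own illustration, reusing the hypothetical sublevel_persistence_0d sketch from Section 1.1) thresholds the lifetimes of a noisy sinusoid. The value of C_alpha here is an arbitrary placeholder; deriving a statistically justified cutoff is precisely the subject of Section 1.2.

    import numpy as np

    # assumes the sublevel_persistence_0d() sketch from Section 1.1 is in scope
    rng = np.random.default_rng(1)
    t = np.linspace(0, 4 * np.pi, 2000)
    x = np.sin(t) + rng.normal(0.0, 0.05, t.size)  # signal plus Gaussian noise

    dgm = np.array(sublevel_persistence_0d(x))
    lifetimes = dgm[:, 1] - dgm[:, 0]

    C_alpha = 0.5  # placeholder cutoff; Section 1.2 derives a principled value
    significant = dgm[lifetimes > C_alpha]       # features of the underlying signal
    noise_artifacts = dgm[lifetimes <= C_alpha]  # near-diagonal peak-valley pairs
    print(len(significant), "significant pairs;", len(noise_artifacts), "noise pairs")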
1.2 Statistical Analysis of Sublevel Set Persistence

1.2.1 Statistics of Additive Noise in the Persistence Diagram

Before studying a time series with additive noise, $x + \epsilon : \mathbb{R} \to \mathbb{R}$, I analyze the statistics of sublevel set persistence diagrams of the noise alone. Our goal is to leverage this analysis in order to generate a cutoff in the persistence diagram that separates out the noise-artifact points in the persistence diagram for $D(x + \epsilon)$.

Statistics Background

I start with the noise, which can be thought of as a (sampled) function $\epsilon : \mathbb{R} \to \mathbb{R}$, where, for each $x \in \mathbb{R}$, the value $\epsilon(x)$ is a random variable sampled independently and identically distributed (iid) from some predefined noise distribution $\mathcal{N}$. In our noise model, there is no covariance structure between these random variables. The first step in developing a cutoff based on the persistence diagram statistics of additive noise $D(\epsilon)$ is to determine a relationship between the descriptive additive noise distribution parameters and the distribution of the lifetimes. To do this, I develop an expression for the expected lifetime of points in $D(\epsilon)$.

Let $f : \mathbb{R} \to \mathbb{R}$ and $F : \mathbb{R} \to \mathbb{R}$ be the probability density function and cumulative distribution function of $\mathcal{N}$, respectively. Let $f_B : \mathbb{R} \to \mathbb{R}$ and $f_D : \mathbb{R} \to \mathbb{R}$ be the probability density functions for the local minima and maxima (the birth and death values of the sublevel sets) from $\mathcal{N}$, respectively, and let $F_B$ and $F_D$ be the corresponding cumulative distribution functions. Based on the linearity of expectation and the definition of a lifetime as the difference between the death and birth times, the expected or mean lifetime $\mu_L$ is the difference between the expected death times $\mu_D := E(D)$ and birth times $\mu_B := E(B)$, where $B$ and $D$ are the sets of birth and death values:

$$\mu_L := \mu_D - \mu_B = \int_{-\infty}^{\infty} x \left[ f_D(x) - f_B(x) \right] dx. \quad (1.1)$$

A formal proof of this relationship is provided in Theorem B.1.1 of Appendix B.1. From Eq. (1.1), I can move forward knowing that $\mu_L$ can be defined using only expressions for $f_B(x)$ and $f_D(x)$. In other words, only the distribution of birth and death times is needed, not of the lifetimes, which would require knowing how the births and deaths are paired.

I next compute the local maxima density distribution $f_D(x)$. Let $\{x_1, x_2, \ldots, x_n\} \overset{\text{iid}}{\sim} \mathcal{N}$. Ordering the samples by their index, I look at the probability of a given sample $x_i$ being a local maximum in this sequence. Because $x_{i-1}$, $x_i$, and $x_{i+1}$ are all iid from $f(x)$, I can state that

$$f_D(x) = p(x_i)\, p(x_{i-1} < x_i)\, p(x_{i+1} < x_i) = f(x)\, F^2(x), \quad (1.2)$$

where $p(x_i) = f(x)$ and $p(x_{i-1} < x_i) = p(x_{i+1} < x_i) = F(x)$ based on the definition of a cumulative distribution function. Similarly, the local minima distribution is described as

$$f_B(x) = p(x_i)\, p(x_{i-1} > x_i)\, p(x_{i+1} > x_i) = f(x)\left[1 - F(x)\right]^2, \quad (1.3)$$

where $p(x_{i-1} > x_i) = p(x_{i+1} > x_i) = 1 - F(x)$. To use the expectation $E = \int_{-\infty}^{\infty} x\, g(x)\, dx$ on a continuous probability density function $g(x)$, it is required that $g(x)$ is a proper density function with $\int_{-\infty}^{\infty} g(x)\, dx = 1$. This requirement is used to normalize both $f_B(x)$ and $f_D(x)$ as

$$\hat{f}_B(x) = \frac{f(x)\left[1 - F(x)\right]^2}{N_B}, \qquad \hat{f}_D(x) = \frac{f(x)\, F^2(x)}{N_D}, \quad (1.4)$$

where $N_B = \int_{-\infty}^{\infty} f(x)\left[1 - F(x)\right]^2 dx$ and $N_D = \int_{-\infty}^{\infty} f(x)\, F^2(x)\, dx$.
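As a quick numerical sanity check of Eqs. (1.2)-(1.4) (my own construction, not part of the dissertation's validation), the interior local extrema of iid Gaussian samples can be compared against the normalized densities; the normalization constants are approximated here by a Riemann sum.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
eps = rng.standard_normal(100_000)        # iid draws from N(0, 1)

inner = eps[1:-1]
minima = inner[(inner < eps[:-2]) & (inner < eps[2:])]   # birth values
maxima = inner[(inner > eps[:-2]) & (inner > eps[2:])]   # death values

xs = np.linspace(-4, 4, 400)
fB = norm.pdf(xs) * (1 - norm.cdf(xs)) ** 2              # Eq. (1.3), unnormalized
fD = norm.pdf(xs) * norm.cdf(xs) ** 2                    # Eq. (1.2), unnormalized
dx = xs[1] - xs[0]
fB /= fB.sum() * dx                                      # numerical 1/N_B
fD /= fD.sum() * dx                                      # numerical 1/N_D

hB, edges = np.histogram(minima, bins=60, density=True)  # empirical birth density
centers = 0.5 * (edges[:-1] + edges[1:])
print("max birth-density error:", np.abs(np.interp(centers, xs, fB) - hB).max())
print("mean death - mean birth:", maxima.mean() - minima.mean())  # estimates mu_L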
I can further reduce $N_B$ and $N_D$ from Eq. (1.4) by substituting $f(x) = F'(x)$, which reduces the $N_D$ equation to

$$N_D = \int_{-\infty}^{\infty} F'(x)\, F^2(x)\, dx = \int_{-\infty}^{\infty} \frac{1}{3}\left(F^3(x)\right)' dx = \frac{1}{3}\, F^3(x)\Big|_{-\infty}^{\infty} = \frac{1}{3}, \quad (1.5)$$

since it is assumed that $F(\infty) = 1$ and $F(-\infty) = 0$. Similarly,

$$N_B = \int_{-\infty}^{\infty} f(x)\left[1 - F(x)\right]^2 dx = \int_{-\infty}^{\infty} f(x)\left[1 - 2F(x) + F^2(x)\right] dx = N_D + \int_{-\infty}^{\infty} F'(x)\, dx - \int_{-\infty}^{\infty} \left(F^2(x)\right)' dx = N_D + \left[F(x) - F^2(x)\right]\Big|_{-\infty}^{\infty} = N_D = \frac{1}{3}. \quad (1.6)$$

This reduces Eq. (1.4) to

$$\hat{f}_B(x) = 3 f(x)\left[1 - F(x)\right]^2, \qquad \hat{f}_D(x) = 3 f(x)\, F^2(x). \quad (1.7)$$

I now assume $f(x)$ is a Gaussian distribution to validate our expressions in Eq. (1.4). Specifically, I define the Gaussian (normal) probability density function as

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad (1.8)$$

with cumulative distribution function

$$F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]. \quad (1.9)$$

To validate the resulting expressions for $\hat{f}_B(x)$ and $\hat{f}_D(x)$ in Eq. (1.4), a numerical simulation of a normal distribution $\mathcal{N}_n(\mu = 0, \sigma^2 = 1)$ of length $n = 10^5$ was used (see Fig. 1.4). This analysis shows very close agreement between the histograms $h(\ast)$ and the distributions. From the numerical simulation, I also found the sample mean of the lifetimes from $\mathcal{N}(0, \sigma^2 = 1)$ to be $\bar{L} \approx 1.686$. Additionally, I found that $\bar{D} - \bar{B} \approx 1.689 \approx \bar{L}$. These results suggest that Eq. (1.1) and Eq. (1.4) are correct. I now move on to determine a suitable cutoff with unknown probability density $f(x)$ and cumulative distribution $F(x)$ functions.

Figure 1.4: Histograms $h(\ast)$ of the zero mean normal distribution $\mathcal{N}(0, \sigma^2 = 1)$ and the resulting birth times $B$ and death times $D$, which are compared to the density distributions from Eq. (1.4).

Now that I have shown that our expressions for the probability distributions of the minima and maxima are correct, I proceed to relate the mean lifetime $\mu_L$ to the additive noise distribution parameters. From our results in Eq. (1.7) I can now calculate the mean lifetime as

$$\mu_L = 3\int_{-\infty}^{\infty} x\, f(x)\left[F^2(x) - \left(1 - F(x)\right)^2\right] dx = 3\int_{-\infty}^{\infty} x\left[\left(F^2(x)\right)' - F'(x)\right] dx, \quad (1.10)$$

which can then be simplified using integration by parts as

$$\mu_L = 3\int_{-\infty}^{\infty} F(x)\left[1 - F(x)\right] dx. \quad (1.11)$$

1.2.2 Cutoff Background

To determine a suitable cutoff, I again start by assuming I have $n$ random samples from our noise distribution: $\mathbf{x} = \{x_1, x_2, \ldots, x_n\} \overset{\text{iid}}{\sim} \mathcal{N}$ with cumulative distribution function $F(x)$. The probability that the minimum of $\mathbf{x}$ is less than the value $a$ is

$$P(\min(\mathbf{x}) < a) = 1 - P(x_1 > a,\, x_2 > a,\, \ldots,\, x_n > a), \quad (1.12)$$

where $P(x_i > a) = 1 - F(a)$. If this relationship is extended to all $n$ realizations, the probability is

$$P(\min(\mathbf{x}) < a) = 1 - \left(1 - F(a)\right)^n. \quad (1.13)$$

Similarly, an expression for the probability of an element of $\mathbf{x}$ being greater than $b$, where $b > a$, is

$$P(\max(\mathbf{x}) > b) = 1 - \left(F(b)\right)^n. \quad (1.14)$$

Taking both of these probabilities, I can extend them to the maximum lifetime as $\max(L) \lessapprox \max(\mathbf{x}) - \min(\mathbf{x})$, which I use to generate a probability of a lifetime being greater than $b - a$ as

$$\alpha = P(\max(L) > b - a) \gtrapprox P\left(\max(\mathbf{x}) > b,\, \min(\mathbf{x}) < a\right) = \left(1 - \left[F(b)\right]^n\right)\left(1 - \left[1 - F(a)\right]^n\right), \quad (1.15)$$

where $\alpha$ is the confidence of this event occurring. If the $f(x)$ associated to $F(x)$ of Eq. (1.15) is symmetric about some mean $\mu$ such that $c = b - \mu = \mu - a$, I can reduce Eq. (1.15) to

$$\alpha = \left(1 - \left[F(c)\right]^n\right)^2 \quad (1.16)$$

since $F(b) = 1 - F(a)$ for the symmetric case. Equation (1.16) can then be solved for $c$ as

$$c = F^{-1}\left[\left(1 - \sqrt{\alpha}\right)^{1/n}\right]. \quad (1.17)$$
Additionally, I know that a cutoff should be set such that $C_\alpha = b - a = 2c$ for a distribution symmetric about some mean $\mu$, which results in the cutoff equation

$$C_\alpha = 2 F^{-1}\left[\left(1 - \sqrt{\alpha}\right)^{1/n}\right]. \quad (1.18)$$

On the other hand, if there is no symmetry in the distribution then I need a new cutoff equation. To do this, I return to our probability equation as

$$\alpha = P(\max(L) > b - a) \gtrapprox P\left(\min(\mathbf{x}) < a,\, \max(\mathbf{x}) > b\right) = \left(1 - \left[1 - F(a)\right]^n\right)\left(1 - \left[F(b)\right]^n\right). \quad (1.19)$$

However, unlike Eq. (1.18), I cannot solve Eq. (1.19) for a parameter $c$ due to there being no symmetry between $a$ and $b$ about a mean $\mu$, which means I must simplify Eq. (1.19) in some way. To do this, I assume that $P(\min(\mathbf{x}) < a) = P(\max(\mathbf{x}) > b)$, or $1 - \left[1 - F(a)\right]^n = 1 - \left[F(b)\right]^n = \sqrt{\alpha}$. I can then solve for $a$ and $b$ separately as

$$a = F^{-1}\left[1 - \left(1 - \sqrt{\alpha}\right)^{1/n}\right] \quad (1.20)$$

and

$$b = F^{-1}\left[\left(1 - \sqrt{\alpha}\right)^{1/n}\right]. \quad (1.21)$$

With $C_\alpha = b - a$ and the values of $a$ and $b$ from Eq. (1.20) and Eq. (1.21), respectively, I can solve for our general cutoff expression as

$$C_\alpha = F^{-1}\left[\left(1 - \sqrt{\alpha}\right)^{1/n}\right] - F^{-1}\left[1 - \left(1 - \sqrt{\alpha}\right)^{1/n}\right]. \quad (1.22)$$

For our application I want to have a high confidence that no outliers occur and that the cutoff captures all of the noise, so I suggest a confidence level of $\alpha = 0.1\%$, which is equivalent to a 0.1% chance that an outlier greater than the persistence diagram lifetime cutoff $C_\alpha$ (see Fig. 1.5) exists given $n$ samples.

Figure 1.5: Example cutoff $C_\alpha$ for a persistence diagram and time ordered lifetimes of sublevel set persistence from $x(t) + \mathcal{N}$.

Equations (1.18) and (1.22) depend only on the desired confidence $\alpha$, the signal length $n$, and the cumulative distribution function $F(x)$, where $F(x)$ itself carries a distribution parameter (e.g., $\sigma$ for the Gaussian distribution). I address how to estimate this parameter, if it is unknown, in Section 1.2.4. Before this, in Section 1.2.3 I demonstrate how to apply Eq. (1.18) and Eq. (1.22) for the Gaussian, uniform, Rayleigh, and exponential distributions.

1.2.3 Cutoff for Noise Models

For applying noise models to the confidence levels in Equations (1.15) and (1.16), I need to either be given the additive noise parameters or estimate them from the lifetimes. However, before this can be done, I need to understand which parameters are needed given the additive noise distribution $f(x)$. I do this analysis for the Gaussian (normal), uniform, Rayleigh, and exponential distributions as shown in Fig. 1.6.

Figure 1.6: Additive noise probability distributions $f(x)$ for the four models realized in this work: uniform, Gaussian, Rayleigh, and exponential.

Cutoff for Gaussian Noise

I start our analysis with the commonly used Gaussian distribution model. The Gaussian (normal) probability density function is defined as

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad (1.23)$$

with cumulative distribution function

$$F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]. \quad (1.24)$$

I start by solving for the inverse of Eq. (1.24) as

$$F^{-1}(u) = \sqrt{2}\,\sigma\, \operatorname{erf}^{-1}(2u - 1) + \mu. \quad (1.25)$$

Since the mean shift $\mu$ has no effect on the sublevel set lifetimes, I can ignore it and apply Eq. (1.25) with $\mu = 0$ to solve for the cutoff from Eq. (1.18) as

$$C_\alpha = 2^{3/2}\,\sigma\, \operatorname{erf}^{-1}\left[2\left(1 - \sqrt{\alpha}\right)^{1/n} - 1\right]. \quad (1.26)$$

With a full development of the statistics of sublevel set persistence for Gaussian (normal) additive noise, I am able to determine a suitable cutoff for iid noise with only the distribution parameter $\sigma$ needed.
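For concreteness, Eq. (1.26) translates directly into a few lines of Python; this is a sketch of my reading of the formula using SciPy's inverse error function, with illustrative inputs.

```python
import numpy as np
from scipy.special import erfinv

def gaussian_cutoff(sigma, n, alpha=0.001):
    """Lifetime cutoff C_alpha of Eq. (1.26) for iid Gaussian noise."""
    return 2 ** 1.5 * sigma * erfinv(2 * (1 - np.sqrt(alpha)) ** (1 / n) - 1)

# the cutoff grows slowly with the signal length n
print(gaussian_cutoff(sigma=1.0, n=1_000), gaussian_cutoff(sigma=1.0, n=100_000))
```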
Cutoff for Uniform Noise

Let $a < b \in \mathbb{R}$. The uniform distribution over the interval $[a, b]$ has the probability density function

$$f(x) = \begin{cases} \frac{1}{b - a} & x \in [a, b] \\ 0 & \text{otherwise} \end{cases} \quad (1.27)$$

with cumulative distribution function

$$F(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & x \in [a, b] \\ 1 & x > b. \end{cases} \quad (1.28)$$

I assume a distribution symmetric about zero (this assumption does not influence the resulting cutoff, since sublevel set lifetimes are unaffected by a mean shift) such that $a = -b$ and $\Delta = b - a$. This changes $F(x)$ to

$$F(x) = \begin{cases} 0 & x < -\frac{\Delta}{2} \\ \frac{2x + \Delta}{2\Delta} & x \in \left[-\frac{\Delta}{2}, \frac{\Delta}{2}\right] \\ 1 & x > \frac{\Delta}{2}. \end{cases} \quad (1.29)$$

If I now apply Eq. (1.18) to the inverse of the cumulative distribution function in Eq. (1.29), I can calculate $C_\alpha$ as

$$C_\alpha = \Delta\left[2\left(1 - \sqrt{\alpha}\right)^{1/n} - 1\right]. \quad (1.30)$$

Equation (1.30) only requires the distribution parameter $\Delta$, since $\alpha$ is chosen as desired and $n$ is the length of the time series.

Cutoff for Rayleigh Noise

The Rayleigh distribution has a probability density function over the domain $x \in [0, \infty)$ defined as

$$f(x) = \frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}, \quad (1.31)$$

with cumulative distribution function

$$F(x) = 1 - e^{-\frac{x^2}{2\sigma^2}}. \quad (1.32)$$

Since this distribution is asymmetric, I use Eq. (1.22) to calculate $C_\alpha$ as

$$C_\alpha = \sigma\left(\sqrt{-2\ln\left(1 - \left[1 - \sqrt{\alpha}\right]^{1/n}\right)} - \sqrt{-2\ln\left(\left[1 - \sqrt{\alpha}\right]^{1/n}\right)}\right), \quad (1.33)$$

where $\sigma$ is the only parameter that needs to be provided to calculate the cutoff.

Cutoff for Exponential Noise

The exponential distribution has a probability density function over the domain $x \in [0, \infty)$ defined as

$$f(x) = \lambda e^{-\lambda x}, \quad (1.34)$$

with cumulative distribution function

$$F(x) = 1 - e^{-\lambda x}, \quad (1.35)$$

where $\lambda > 0$ is the distribution parameter. This distribution is also asymmetric, so I use Eq. (1.22) to calculate $C_\alpha$ as

$$C_\alpha = -\frac{1}{\lambda}\ln\left(\left[1 - \sqrt{\alpha}\right]^{1/n} - \left[1 - \sqrt{\alpha}\right]^{2/n}\right), \quad (1.36)$$

where $\lambda$ is the only parameter that needs to be provided to calculate the cutoff.
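Since Eq. (1.22) only needs the inverse CDF, the asymmetric cutoffs above can also be evaluated generically. The sketch below (my construction, not code from this work) uses SciPy's ppf as $F^{-1}$ for the Rayleigh and exponential models; for a zero-mean symmetric distribution it reduces to Eq. (1.18).

```python
import numpy as np
from scipy import stats

def cutoff(dist, n, alpha=0.001):
    """General cutoff of Eq. (1.22) for any distribution with a ppf (inverse CDF)."""
    u = (1 - np.sqrt(alpha)) ** (1 / n)
    return dist.ppf(u) - dist.ppf(1 - u)   # C_alpha = b - a from Eqs. (1.20)-(1.21)

print(cutoff(stats.rayleigh(scale=1.0), n=10_000))  # Rayleigh with sigma = 1
print(cutoff(stats.expon(scale=1.0), n=10_000))     # exponential with lambda = 1
print(cutoff(stats.norm(scale=1.0), n=10_000))      # matches Eq. (1.26) with sigma = 1
```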
1.2.4 Cutoff and Distribution Parameter Estimation Method

If the distribution parameter is known ($\sigma$ for Gaussian distributions, $\Delta$ for uniform distributions, $\sigma$ for Rayleigh distributions, and $\lambda$ for exponential distributions), then the cutoff $C_\alpha$ can be calculated simply with the correct cutoff equation from Section 1.2.3, and the subsequent analysis may be skipped. However, for most real-world time series this parameter is not known and thus needs to be estimated. While there are some methods for estimating the additive noise parameters [54, 95, 234], I introduce a new method utilizing the relationship between the sublevel set lifetimes, from both the signal and the noise, and the additive noise distribution parameters.

To generate a theoretical relationship between the mean lifetime $\mu_L$ and the distribution parameters, I recall Eq. (1.11):

$$\mu_L = 3\int_{-\infty}^{\infty} F(x)\left[1 - F(x)\right] dx.$$

In the subsequent subsections, I show how this relationship is used for each of the four noise models analyzed in this work. However, when the signal is not pure noise, which is the case for any informative time series, the mean lifetime is heavily influenced by the lifetimes associated to significant features. To address this issue, I instead calculate the median of the lifetimes, as it is robust to up to 50% outliers (or signal, in our application), and apply a signal compensation. This brings up an assumption required for this distribution parameter estimation method to function correctly: the number of persistence diagram features associated with noise $N_n$ must be equal to or greater than the number of features from the signal $N_s$. Additionally, when $N_n$ approaches $N_s$ the cutoff becomes more conservative due to the robustness limitation of the median. To minimize this effect, in Section 1.2.5, I develop a numeric compensation multiplier which uses the persistence pairs associated to both additive noise and signal. In general, the condition $N_s < N_n$ is met if the time series is sampled at a rate sufficiently higher than the Nyquist sampling criterion $f_{\text{Nyquist}}$ and, of course, the time series has some additive noise. If these conditions are not met, I suggest the use of an alternative method to estimate the distribution parameter of the additive noise, together with its associated cutoff equation from Section 1.2.3.

For a symmetric distribution of the lifetimes, the median would be an accurate estimate of the mean. However, for most additive noise distributions (e.g., Gaussian), the distribution of the resulting sublevel set persistence lifetimes is not symmetric. Therefore, I resort to approximating the relationship between the mean and median numerically. While there are methods to estimate the mean using the median and Inter-Quartile Range (IQR), as described in [236], these are only robust for up to 25% outliers (or signal, in our application) due to the $Q_3$ upper quartile. Therefore, I use the numerically approximated ratios $\rho = \bar{L}/\tilde{L}$ provided in Table 1.1 for each of the four distributions investigated, where $\bar{L}$ is the sample mean lifetime and $\tilde{L}$ is the sample median lifetime. For each of these numeric estimates a time series of length $10^5$ was used. This numeric experiment was repeated ten times to provide a mean $\rho$ with uncertainty. This ratio can be used to estimate the mean lifetime as $\bar{L} \approx \rho\tilde{L}$.

Table 1.1: Ratios $\rho = \bar{L}/\tilde{L}$ for estimating the sample mean from the sample median, with uncertainty as three standard deviations.

Distribution: Gaussian | Uniform | Rayleigh | Exponential
$\rho = \bar{L}/\tilde{L}$: 1.154 ± 0.012 | 1.000 ± 0.010 | 1.136 ± 0.013 | 1.265 ± 0.016

Relating the Distribution Statistic to the Median Lifetime

I now apply Eq. (1.11) and $\rho$ from Table 1.1 to find relationships between the median lifetime $M_L$ and the distribution parameter used in each distribution's cutoff equation.

Normal Distribution: For estimating $\sigma$ of the Gaussian distribution, I use Eq. (1.11) and the Gaussian cumulative distribution to estimate $\mu_L$ as a function of $\sigma$. Specifically, by numerically approximating the integral in Eq. (1.11) using $x \in [-10, 10]$ with $\operatorname{len}(x) = 10^6$, I found the relationship

$$\sigma \approx \frac{\mu_L}{1.692}. \quad (1.37)$$

I then used $\rho$ to express Eq. (1.37) as a function of the median lifetime $M_L$ as

$$\sigma \approx \frac{\rho M_L}{1.692} \approx 0.680\, M_L. \quad (1.38)$$

Applying this result to Eq. (1.26) allows for a cutoff to be calculated as

$$C_\alpha \approx 1.923\,\tilde{L}\, \operatorname{erf}^{-1}\left[2\left(1 - \sqrt{\alpha}\right)^{1/n} - 1\right], \quad (1.39)$$

where $\tilde{L}$ is the sample median lifetime.

Uniform Distribution: Next, I apply Eq. (1.11) to the uniform cumulative distribution to estimate $\mu_L$ as a function of $\Delta$. Substituting Eq. (1.29) into Eq. (1.11) results in

$$\mu_L = 3\int_{-\Delta/2}^{\Delta/2} \frac{2x + \Delta}{2\Delta}\left(1 - \frac{2x + \Delta}{2\Delta}\right) dx.$$

Expanding and solving this integral results in the relationship

$$\mu_L = \frac{\Delta}{2} \implies \Delta = 2\mu_L = 2 M_L. \quad (1.40)$$

Applying this result to Eq. (1.30) allows for a cutoff to be calculated as

$$C_\alpha = 2\tilde{L}\left[2\left(1 - \sqrt{\alpha}\right)^{1/n} - 1\right]. \quad (1.41)$$
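As a sketch of how Eqs. (1.38) and (1.39) are used in practice (the lifetime values below are hypothetical placeholders, not data from this work), the median lifetime gives both the parameter estimate and the cutoff even when a few large signal lifetimes are present:

```python
import numpy as np
from scipy.special import erfinv

def gaussian_sigma_and_cutoff(lifetimes, n, alpha=0.001):
    L_med = np.median(lifetimes)
    sigma = 0.680 * L_med                                                 # Eq. (1.38)
    C = 1.923 * L_med * erfinv(2 * (1 - np.sqrt(alpha)) ** (1 / n) - 1)   # Eq. (1.39)
    return sigma, C

lifetimes = np.array([1.40, 1.52, 1.31, 1.45, 9.80, 10.2])  # hypothetical lifetimes
print(gaussian_sigma_and_cutoff(lifetimes, n=2_000))
```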
Rayleigh Distribution: For estimating $\sigma$ of the Rayleigh distribution, I again use Eq. (1.11) with the Rayleigh cumulative distribution to numerically estimate the relationship between $\mu_L$ and $\sigma$ as

$$\sigma \approx \frac{\mu_L}{1.102} \approx \frac{\rho M_L}{1.102} \approx 1.025\, M_L, \quad (1.42)$$

where the integral in Eq. (1.11) was numerically approximated using $x \in [0, 20]$ with $\operatorname{len}(x) = 10^6$. Applying this result to Eq. (1.33) allows for a cutoff to be calculated as

$$C_\alpha \approx 1.025\,\tilde{L}\left(\sqrt{-2\ln\left(1 - \left[1 - \sqrt{\alpha}\right]^{1/n}\right)} - \sqrt{-2\ln\left(\left[1 - \sqrt{\alpha}\right]^{1/n}\right)}\right). \quad (1.43)$$

Exponential Distribution: Next, I apply Eq. (1.11) to the exponential cumulative distribution function to estimate $\mu_L$ as a function of $\lambda$. Substituting Eq. (1.35) into Eq. (1.11) results in

$$\mu_L = 3\int_{0}^{\infty} \left(1 - e^{-\lambda x}\right) e^{-\lambda x}\, dx, \quad (1.44)$$

which was solved using a $u$-substitution as

$$\mu_L = \frac{3}{2\lambda} \implies \lambda = \frac{3}{2\mu_L}. \quad (1.45)$$

By then using the appropriate $\rho$ from Table 1.1 to use $M_L$ instead of $\mu_L$, I approximate $\lambda$ from the median lifetime:

$$\lambda \approx \frac{1.875}{M_L}. \quad (1.46)$$

Applying this result to Eq. (1.36) allows for a cutoff to be calculated as

$$C_\alpha \approx -0.533\,\tilde{L}\,\ln\left(\left[1 - \sqrt{\alpha}\right]^{1/n} - \left[1 - \sqrt{\alpha}\right]^{2/n}\right). \quad (1.47)$$

1.2.5 Signal Compensation for the Cutoff and Distribution Parameter

In this section, I discuss the effects of signal on the cutoff estimation methods described. In Section 1.2.4 I assumed that the time series was of the form $x(t) = \{x_1, x_2, \ldots, x_n\} \overset{\text{iid}}{\sim} \mathcal{N}$; however, in practice, I typically have some underlying informative signal $s : \mathbb{R} \to \mathbb{R}$ and a time series of the form $x(t) = s(t) + \epsilon$ with finite domain $t \in [t_a, t_b]$. The resulting sublevel sets from $s(t) + \epsilon$ are assumed to have some lifetimes from $s(t)$, with the slope of the signal having an effect on the lifetimes associated with $\mathcal{N}$. Because of these effects, I compensate the cutoff calculation and the distribution parameter estimates for a general signal. Since a general signal is, in practice, rather subjective, I move away from a theoretical analysis of the signal and instead analyze the effects of the signal experimentally.

I have partially addressed this issue of signal compensation by using the median lifetime $M_L$ instead of the mean lifetime $\mu_L$, with the median being a statistic robust to up to 50% outliers (signal, in our case). Even with the use of the median, I need to further develop a signal compensation procedure to improve the accuracy of the suggested cutoff.

Figure 1.7: Example time series showing sample $\delta_i$.

To fully understand the effects of signal on estimating the cutoff, I perform a numeric study to develop a method for adjusting the median lifetime such that $M_L(s(t) + \epsilon) \approx M_L(\mathcal{N})$. This analysis requires a new variable, which I term $\delta$: the median of the step sizes $\delta_i = x(t_{i+1}) - x(t_i)$, as shown in Fig. 1.7, where $x(t)$ is a discretely and uniformly sampled signal with a constant sampling rate $f_s$.

I now experimentally approximate the effects of signal on the median lifetime by using three "generic" signals suggested by [241] as

$$f_1(t) = t - t^3/3, \quad (1.48)$$

with $t \in [3.1, 20.4]$ and sampling rate $f_s = 20$ Hz,

$$f_2(t) = \sin(t) + \sin(2t/3), \quad (1.49)$$

with $t \in [3.1, 20.4]$ and sampling rate $f_s = 20$ Hz, and

$$f_3(t) = -\sum_{i=1}^{5} \sin((i + 1)t + i), \quad (1.50)$$

with $t \in [-10, 10]$ and sampling rate $f_s = 20$ Hz. Additionally, additive noise is included in the signal with $s(t) = A f(t) + \epsilon$, with the additive noise distribution parameter set to one (e.g., $\sigma = 1$ for Gaussian) and the signal amplitude $A$ incremented by unit steps starting from zero, so that $\delta$ is also incremented until reaching a value $\delta/\sigma = 2$. At each $\delta$ I calculate the median lifetime $\tilde{L}$ for 100 trials to provide a mean $\tilde{L}$ with uncertainty $u_L$ as one standard deviation (see Fig. 1.8 for the Gaussian additive noise example).
I then fit a function to approximate this relationship between $\delta$ and $\tilde{L}$ for each distribution type. By observation of the median lifetimes in Fig. 1.8, I experimentally found an approximate functional template:

$$\tilde{L}^* = \tilde{L}_0\, e^{-c_1\left(\frac{\delta}{\delta + \tilde{L}}\right)^{c_2}}, \quad (1.51)$$

where $\tilde{L}_0$ is the median lifetime when $\delta = 0$, i.e., when the signal is just additive noise $\mathcal{N}$.

Figure 1.8: Numeric function fitting of Eq. (1.51) to the mean of the median lifetime $\tilde{L}$ of $f_i(t)$ for $i \in [1, 3]$, where $\mathcal{N}$ is unit variance Gaussian additive noise with $\delta \in [0, 2]$ being incremented to understand the effects of signal on the median lifetime.

As shown in Fig. 1.8, the fitted function closely matches the numerically simulated means of the median lifetimes when the two constants in Eq. (1.51) were set to $c_1 \approx 0.845$ and $c_2 \approx 0.809$ for Gaussian additive noise, which were chosen using BFGS minimization of the $\ell_2$ norm cost function on the residuals when fitting to $\tilde{L}$ for all three generic functions. Another characteristic of these constants is that they are approximately independent of the additive noise distribution parameter, sampling frequency, and time series, which makes them global constants. The two constants from Eq. (1.51) are provided in Table 1.2 for the four distributions investigated in this work.

Table 1.2: Constants of Eq. (1.51) for each distribution type investigated in this work, with associated uncertainty from ten trials.

Distribution: Gaussian | Uniform | Rayleigh | Exponential
$c_1$: 0.845 ± 0.029 | 0.880 ± 0.017 | 0.726 ± 0.026 | 0.436 ± 0.036
$c_2$: 0.809 ± 0.061 | 0.639 ± 0.026 | 0.605 ± 0.054 | 0.393 ± 0.075

With these constants, I calculate a multiplicative compensation term for the signal, $R$, from Eq. (1.51) as

$$R = \frac{\tilde{L}_0}{\tilde{L}^*} = e^{c_1\left(\frac{\delta}{\delta + \tilde{L}}\right)^{c_2}}, \quad (1.52)$$

which is used to compensate for the effects of signal with $C_\alpha^* = R C_\alpha$ and $\sigma^* = R\sigma$.

Unfortunately, when $s(t)$ is unknown, the $\delta$ parameter used in Eq. (1.52) can no longer be directly calculated from the time series or sublevel set persistence diagram. To approximate $\delta$ I use the lifetimes greater than the initial, uncompensated cutoff $C_\alpha$ as

$$\delta \approx \frac{2}{n}\sum L_{C_\alpha}, \quad (1.53)$$

where $L_{C_\alpha}$ are the lifetimes greater than $C_\alpha$. To validate the accuracy of Eq. (1.52) with $\delta$ approximated from Eq. (1.53), I estimate $\sigma$ with and without the signal compensation $R$. I use a new time series $x(t) = A\sin(\pi t) + \epsilon$, with $\mathcal{N}$ a Gaussian distribution with unit variance and $A$ incremented to change $\delta \in [0, 2]$, for 100 trials at each $\delta$. As shown in Fig. 1.9, the true $\sigma = 1$, and the estimate of $\sigma$ without compensation from Eq. (1.38) shows an underestimate as $\delta$ increases, plateauing around $\delta/\sigma \approx 1$, which would yield a cutoff that may not capture all of the lifetimes associated with noise. However, the signal compensated distribution parameter $\sigma^*$ shows an accurate estimation of $\sigma$ even as $\delta$ becomes significantly large. This example demonstrates the importance of signal compensation for an accurate cutoff and distribution parameter estimation.

Figure 1.9: Demonstration of distribution parameter $\sigma$ estimation of Gaussian additive noise in $x(t) = A\sin(\pi t) + \mathcal{N}$ using the median lifetime with and without signal compensation as $\sigma$ and $\sigma^*$, respectively.
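The compensation loop of Eqs. (1.52) and (1.53) is summarized in the sketch below for Gaussian noise, with $c_1$ and $c_2$ taken from Table 1.2; the lifetimes and initial cutoff are placeholders standing in for persistence diagram output.

```python
import numpy as np

def signal_compensation(lifetimes, n, C_alpha, c1=0.845, c2=0.809):
    """Compensated cutoff C*_alpha = R * C_alpha for Gaussian noise (sketch)."""
    L_med = np.median(lifetimes)
    delta = 2.0 * lifetimes[lifetimes > C_alpha].sum() / n  # Eq. (1.53)
    R = np.exp(c1 * (delta / (delta + L_med)) ** c2)        # Eq. (1.52)
    return R * C_alpha, R

lifetimes = np.array([0.90, 1.10, 1.00, 0.95, 6.0, 5.2, 4.1])  # hypothetical values
print(signal_compensation(lifetimes, n=5_000, C_alpha=3.0))
```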
1.3 Damping Parameter Identification Using Sublevel Set Persistence

The study of damping mechanisms in the field of vibrations has always been a critical aspect of understanding the way dynamical systems behave and has been leveraged for many real-world applications. While data analysis methods for estimating these system parameters already exist, the ability of engineers and scientists to use signal processing techniques to determine these parameters is ever improving as new and more sophisticated data analysis techniques are discovered. The identification of damping mechanisms and their associated damping parameters in a real-world dynamical system is a critical tool for analyzing and predicting the dynamics [29, 84, 191, 192]. Specifically, methods for estimating the damping parameters have been used extensively in signal processing engineering with applications in structural health monitoring [32], improved predictions of mechanical response [136], biological system analysis [88, 155], and the analysis of Micro and Nano Electromechanical Systems (MEMS and NEMS) [187].

A common method for damping parameter identification is a time domain analysis of the amplitude decrement (i.e., the damping envelope). This form of analysis is often implemented for viscous damping estimation through the logarithmic decrement of peaks. Unfortunately, many systems do not have damping of this nature, or they have some non-linearity that makes the log decrement method unsuitable. Additionally, when significant noise is present in the signal, the estimation of peak values takes a degree of expertise and human evaluation, which makes damping parameter identification difficult to implement in an automatic scenario. These common issues have pushed researchers to develop automatic, noise robust methods for estimating damping parameters [102, 151].

In the past decade several such methods have been developed for identifying the system parameters of a single degree of freedom system, including damping constants. These methods are typically based on either a time domain or a frequency domain analysis (i.e., modal analysis) of the oscillator. The time domain methods for damping parameter estimation are typically based on analyzing the envelope of the free response decrement or on an energy balance approach. The envelope of the free response is commonly used for estimating either viscous damping through an exponential envelope or Coulomb (dry friction) damping through a constant decrement envelope. Additionally, systems with both Coulomb and viscous damping can be simultaneously analyzed using vibration decrements in the time domain [132].

As an alternative to analyzing the envelope, the energy loss can be studied to estimate the damping parameters through least squares fitting for forced vibrations [133, 134]. However, this approach requires a method of forcing the oscillator, which is not always available or feasible. There are also energy-balance techniques for parameter identification that do not require forcing, but rather both the position and velocity signals [142]. However, for this technique to function properly a filtering step is needed (cubic spline fitting in [142]), which is computationally cumbersome. Another approach is to use the instantaneous energy dissipation [150], but this method requires a lightly damped system, which is a significant, yet common, limitation.
There are also several other time domain methods, including a method based on areas [96], which requires viscous damping but could possibly be extended to other damping mechanisms. This method implements a numeric integration of the signal between zero crossings, making it noise robust and only requiring the position signal of the oscillator. While this could seem like an easy solution for damping parameter estimation, the task of finding zero crossings is not trivial and typically requires a filtering method, which can be computationally expensive. Another commonly used method for parameter identification is to fit a function to the time series response based on tuning parameters, but this requires an initial guess for all parameters and an optimization algorithm. Another possible method for damping parameter estimation is based on solving a parabolic-type partial differential equation for the inverse vibration problem to estimate both stiffness and damping [138]. However, this method requires both the position and velocity data and is only resilient to moderate amounts of noise.

As an alternative to a time domain analysis, frequency response methods are typically implemented by externally forcing the oscillator and measuring the phase and amplitude of the response at resonance (e.g., the half-power method [172]). However, this assumes that the range of operation is within the linear region of oscillations or that the damping mechanisms are amplitude independent. This method also requires a way of forcing the oscillator at multiple frequencies, which is not always feasible. An alternative is to analyze the frequency response of a damped oscillation through the Fourier spectrum [245]. This method has been shown to be robust to some degree of additive Gaussian noise [193]. However, it requires a least-squares estimation algorithm applied to the frequency domain of the signal, which is an additional computational expense.

To estimate the most suitable damping model, I have developed a new method that implements zero-dimensional (0D) sublevel set persistence, a tool from topological data analysis, to analyze the time domain response of a free vibration single degree of freedom oscillator with viscous, Coulomb, or quadratic damping. This novel method provides an extension of envelope analysis methods through a unique and noise robust analysis of the time domain response. This sublevel set persistence analysis method also holds the advantage of not requiring a zero mean for low damping parameters and is robust to non-stationarity in the signal. This is in comparison to many common damping parameter estimation techniques that require these conditions [38]. I show that this technique is robust for a wide range of damping parameters (including very high damping, up to a critically damped response), low sampling frequencies, and a high degree of noise contamination. Additionally, the algorithm for calculating the sublevel set persistence of one-dimensional signals has a low computational cost, being faster than the fast Fourier transform [114].

Sublevel set persistence has recently been shown to be a robust data analysis tool through applications ranging from step detection [114] to cancer histology [127]. One of the most attractive features of sublevel set persistence is its robustness to perturbations (see the stability theorem [49]).
Additionally, by using sublevel set persistence to analyze the time domain of the free responses of a damped oscillator, I will later be able to analyze the full domain (including non-linear responses) of the system, similar to the work done in analyzing MEMS [187].

The results in this work are generated from both experimental data and a numerically simulated single-degree-of-freedom spring-mass system with three common forms of damping (see Fig. 1.10): Coulomb, viscous, and quadratic. The forces caused by each of the damping mechanisms are applied to Newton's law to generate the equation of motion

$$m\ddot{x} = -kx - \mu_c N \operatorname{sgn}(\dot{x}) - \mu_v \dot{x} - \mu_q |\dot{x}|\,\dot{x}, \quad (1.54)$$

with mass $m$, spring constant $k$, and normal force $N = mg$. Here the normal force is constant, but in many applications this will not be the case, which can leave $N = f(\ddot{x}, \dot{x}, x)$, where $x$ is the position of the system and the overdots denote its time derivatives.

Figure 1.10: Single degree of freedom oscillator with multiple modes of energy dissipation. Energy dissipation mechanisms include Coulomb $\mu_c$, viscous $\mu_v$, and quadratic $\mu_q$ damping.

This work is ordered as follows. First, in Section 1.3.1 the closed form solutions (if applicable) and background information for viscous, Coulomb, and quadratic damping are summarized. Section 1.3.1 also leverages the solutions of the damped responses for use with sublevel set persistence for damping identification. With an introduction to the damping mechanisms and sublevel set persistence, in Section 1.3.2 I begin an analysis of the effects of noise on damping parameter identification using sublevel set persistence. This analysis will introduce two methods for minimizing the effects of noise: the first is based on a statistical analysis of additive noise in the persistence domain, and the second is based on a function fitting approach. In Section 1.3.5 I provide three examples demonstrating each damping mechanism. Finally, in the results section (Section 4.3), the method is applied to a wide range of damping parameters, noise levels, and sampling frequencies to determine the limitations of the method. To make replicating this work easier for readers, the Python code for automatically calculating the damping parameters and constants has been made publicly available through GitHub (github.com/Khasawneh-Lab).
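Before deriving the per-mechanism relationships, the free response of Eq. (1.54) can be simulated numerically. The sketch below uses SciPy's solve_ivp with illustrative parameter values; it is not the published GitHub implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k, g = 1.0, 20.0, 9.81
mu_c, mu_v, mu_q = 0.0, 0.5, 0.0        # viscous-only case; set others nonzero to mix
N = m * g                                # constant normal force

def rhs(t, y):
    x, v = y
    f = -k * x - mu_c * N * np.sign(v) - mu_v * v - mu_q * np.abs(v) * v  # Eq. (1.54)
    return [v, f / m]

t_eval = np.linspace(0, 20, 401)         # 20 s at roughly 20 Hz
sol = solve_ivp(rhs, (0, 20), [1.0, 0.0], t_eval=t_eval, max_step=1e-2)
x = sol.y[0]                             # positional time series for persistence analysis
print(x[:5])
```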
1.3.1 Sublevel Set Persistence of Damping Mechanisms

In this section I introduce the three damping mechanisms considered: Coulomb, viscous, and quadratic. For each form of damping, a theoretical relationship between consecutive persistence pairs is formulated and used to determine the underlying damping parameter of the system.

Viscous Damping

If the system being analyzed is assumed to be dominated by viscous damping, then the system model is reduced from Eq. (1.54) to $m\ddot{x} + kx + \mu_v\dot{x} = 0$. This linear differential equation has the closed form solution

$$x(t) = A e^{-\zeta\omega_n t}\cos(\omega_d t - \phi), \quad (1.55)$$

where the viscous damping is summarized using the damping ratio $\zeta_v = \mu_v/(2\sqrt{mk})$, the natural frequency $\omega_n = \sqrt{k/m}$, the damped natural frequency $\omega_d = \omega_n\sqrt{1 - \zeta^2}$, the phase shift $\phi$, and the initial amplitude of the time series $A$. Typically, $\zeta$ is estimated using local maxima and the log decrement method as

$$\zeta_v = \sqrt{\frac{1}{1 + \left(\frac{2\pi n}{\ln(p_{i+n}/p_i)}\right)^2}}, \quad (1.56)$$

where $p_{i+n}$ and $p_i$ denote the $(i + n)$th and $i$th peaks, respectively. Unfortunately, this method for estimating $\zeta_v$ is difficult to implement in an automatic way when noise is present, as the selection of peaks becomes difficult. Additionally, if the time series is non-stationary or does not have a zero mean, the standard logarithmic decrement method will not provide accurate damping parameter estimates. To help combat these issues I will implement sublevel set persistence and show how $\zeta_v$ can be calculated from the resulting persistence diagram.

Let us begin with a toy example of the time series and the resulting persistence diagram for viscous damping as shown in Fig. 1.11.

Figure 1.11: Example 0D sub-level set persistence from the viscously damped free response time series $x(t)$.

The $x$ and $y$ coordinates in the persistence diagram correspond to the local minima $v_n$ and maxima $p_{n+1}$ in the time series $x(t)$. From the known closed-form solution in Eq. (1.55), the values of the peaks and valleys are solved for as

$$p_i = A e^{-\zeta_v(2i\pi + \phi)/\sqrt{1 - \zeta_v^2}} \quad (1.57)$$

and

$$v_i = -A e^{-\zeta_v(2i\pi + \pi + \phi)/\sqrt{1 - \zeta_v^2}}, \quad (1.58)$$

respectively. From the peaks and valleys, or births and deaths, of the persistence pairs, their lifetimes are calculated as

$$L_i = p_{i+1} - v_i = A e^{-2i\pi\zeta_v/\sqrt{1 - \zeta_v^2}}\left(e^{-2\pi\zeta_v/\sqrt{1 - \zeta_v^2}} + e^{-\pi\zeta_v/\sqrt{1 - \zeta_v^2}}\right), \quad (1.59)$$

where $L_i$ is the lifetime of the sublevel set persistence pair $(v_i, p_{i+1})$. Repeating this lifetime calculation for the $(i + n)$th peak-valley pair results in another lifetime $L_{i+n}$, which is used to find the ratio between lifetimes as

$$\frac{L_{i+n}}{L_i} = e^{-2\pi n\zeta_v/\sqrt{1 - \zeta_v^2}}. \quad (1.60)$$

By taking this ratio, the amplitude $A$ cancels out, which allows Eq. (1.60) to be used to calculate $\zeta_v$ as

$$\zeta_v = \sqrt{\frac{1}{1 + \left(\frac{2n\pi}{\ln(L_{i+n}/L_i)}\right)^2}}. \quad (1.61)$$

From the damping ratio I can also calculate the viscous damping constant as $\mu_v = 2\zeta_v\sqrt{km}$ if the other system parameters $m$ and $k$ are known.

Another benefit of using sublevel set persistence for estimating the damping ratio is that only a single lifetime is needed. The standard method for estimating $\zeta_v$ in Eq. (1.56) needs at least two peaks to estimate the damping ratio, while only a single lifetime is needed with a slight variation of Eq. (1.61). Specifically, if I assume the time series $x(t)$ is centered about zero such that $\lim_{t\to\infty} x(t) = 0$, then I use $v_0$ and $p_1$ to calculate the damping ratio as

$$\zeta_v = \sqrt{\frac{1}{1 + \left(\frac{\pi}{\ln(-p_1/v_0)}\right)^2}}. \quad (1.62)$$

It should be noted that this method does require a first valley, which corresponds to a damping ratio $\zeta_v < 1$. If $\zeta_v \ge 1$, then the response is over-damped and the method will not work to estimate the damping ratio.
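The viscous estimates in Eqs. (1.61) and (1.62) are one-liners once the persistence quantities are in hand; the sketch below uses hypothetical lifetime and extremum values rather than output from a specific experiment.

```python
import numpy as np

def zeta_v_lifetimes(L_i, L_in, n):
    """Damping ratio from two lifetimes n periods apart, Eq. (1.61)."""
    return np.sqrt(1.0 / (1.0 + (2 * n * np.pi / np.log(L_in / L_i)) ** 2))

def zeta_v_single_pair(v0, p1):
    """Damping ratio from the first valley/peak of a zero-centered response, Eq. (1.62)."""
    return np.sqrt(1.0 / (1.0 + (np.pi / np.log(-p1 / v0)) ** 2))

print(zeta_v_lifetimes(L_i=1.542, L_in=0.583, n=3))   # lightly damped example
print(zeta_v_single_pair(v0=-0.92, p1=0.80))
```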
Coulomb Damping

To determine a method for relating the lifetimes to the Coulomb damping constant $\mu_c$ and the Coulomb damping parameter $\zeta_c$, I must first determine a theoretical expression for the response of a spring-mass-damper with only Coulomb damping. To do this, I implement the method defined in [100], while acknowledging other methods for analyzing Coulomb damping (e.g., an energy approach [74]). Let us begin by defining the equation of motion from Eq. (1.54) with $\mu_v = \mu_q = 0$, resulting in $m\ddot{x} = -kx - \mu_c N \operatorname{sgn}(\dot{x})$. This differential equation is solved by breaking the system into two different states, (1) $\dot{x} > 0$ or (2) $\dot{x} < 0$, which each result in a unique (linear) differential equation. By "stitching" these solutions together I get the solution

$$x(t) = \left(A - \frac{2\mu_c N\omega_n}{\pi k}\, t\right)\cos(\omega_n t - \phi), \quad (1.63)$$

which has a linear amplitude decrement while $\left|A - \frac{2\mu_c N\omega_n}{\pi k}\, t\right| > \mu_s N/k$, with the phase shift $\phi$ introduced from other initial conditions. If the inequality is broken at the sticking time $t_s$, then

$$x(t > t_s) = \left(A - \frac{2\mu_c N\omega_n}{\pi k}\, t_s\right)\cos(\omega_n t_s - \phi). \quad (1.64)$$

An example of this linear decrement and the sticking condition are shown in Fig. 1.12.

Figure 1.12: Example free vibration response of system with Coulomb damping.

I now leverage this closed form solution for use with sublevel set persistence. To do this I start by shifting Eq. (1.63) to $t = \tau$, where $\tau$ is the time at the first valley, or $\tau = (\phi - 1)/\omega_n$, which results in the shifted form of the equation of motion

$$x(\tau) = \left(A - \frac{2\mu_c N\omega_n}{\pi k}\,\tau\right)\cos(\omega_n\tau). \quad (1.65)$$

From Eq. (1.65), the peaks $p_i$ occur at $\tau = 2i\pi/\omega_n$ and have values of $p_i = A - 4i\mu_c N/k$, and the valleys $v_i$ occur at $\tau = \pi(2i + 1)/\omega_n$ with values of $v_i = 2(2i + 1)\mu_c N/k - A$. The lifetimes of the resulting persistence pairs are calculated as

$$L_i = p_{i+1} - v_i = 2A - \frac{(8i + 6)\mu_c N}{k}. \quad (1.66)$$

Extending Eq. (1.66) to a second persistence pair results in the lifetime $L_{i+n}$, which is used to cancel the amplitudes with $L_{i+n} - L_i = -\frac{8n\mu_c N}{k}$. This difference is then used to solve for the Coulomb damping constant as

$$\mu_c = \frac{k\left(L_i - L_{i+n}\right)}{8nN}. \quad (1.67)$$

With an expression for $\mu_c$, the Coulomb damping parameter $\zeta_c$ can be estimated independently of the other system parameters ($N$ and $k$). This parameter is the magnitude of the slope of the decrement and is solved for using Eq. (1.67) as

$$\zeta_c = \frac{2\mu_c N\omega_n}{\pi k} = \frac{\omega_n\left(L_i - L_{i+n}\right)}{4n\pi} = \frac{L_i - L_{i+n}}{2\left(t_{B_{i+n}} - t_{B_i}\right)}, \quad (1.68)$$

where $t_{B_i}$ is the time when $L_i$ was born, i.e., the time index of the local minimum.

Similar to viscous damping, I can also use a single lifetime to estimate both $\mu_c$ and $\zeta_c$. To do this, I again assume that the time series $x(t)$ is zero centered. If so, the damping constant and parameter are calculated as

$$\mu_c = -\frac{k\left(v_0 + p_1\right)}{2N} \quad (1.69)$$

and

$$\zeta_c = \frac{2\mu_c N\omega_n}{\pi k} = -\frac{\omega_n\left(v_0 + p_1\right)}{\pi} = \frac{v_0 + p_1}{t_{v_0} - t_{p_1}}, \quad (1.70)$$

where $t_{p_1}$ and $t_{v_0}$ are the time indices at the local maximum and minimum, respectively. If the damping of a system is dominated by both viscous and Coulomb damping, I suggest implementing the amplitude decrement described by Liang and Feeny [132] in combination with sublevel set persistence.
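Similarly, the Coulomb relationships in Eqs. (1.67)-(1.70) map directly to code; $k$, $N$, and the persistence quantities below are illustrative placeholders, not values from this work.

```python
import numpy as np

def mu_c_lifetimes(L_i, L_in, n, k, N):
    return k * (L_i - L_in) / (8 * n * N)              # Eq. (1.67)

def zeta_c_lifetimes(L_i, L_in, tB_i, tB_in):
    return (L_i - L_in) / (2 * (tB_in - tB_i))         # Eq. (1.68)

def mu_c_single_pair(v0, p1, k, N):
    return -k * (v0 + p1) / (2 * N)                    # Eq. (1.69)

print(mu_c_lifetimes(L_i=1.80, L_in=1.20, n=3, k=20.0, N=9.81))
print(zeta_c_lifetimes(L_i=1.80, L_in=1.20, tB_i=1.0, tB_in=5.2))
```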
Quadratic Damping

For quadratic damping, Eq. (1.54) is reduced to $m\ddot{x} = -kx - \mu_q \operatorname{sgn}(\dot{x})\,\dot{x}^2$, which is a non-linear differential equation that does not have a closed form solution. However, there is a solution for calculating the turning points of the solution $x(t)$ [75]. For estimating the damping constant $\mu_q$ and the associated parameter $\zeta_q$ I use these turning points, which are determined by first splitting the equation of motion into two states as

$$0 = \begin{cases} \ddot{x} + \frac{\mu_q}{m}\dot{x}^2 + \frac{k}{m}x, & \dot{x} > 0 \\ \ddot{x} - \frac{\mu_q}{m}\dot{x}^2 + \frac{k}{m}x, & \dot{x} < 0. \end{cases} \quad (1.71)$$

Similar to the solution method for Coulomb damping, quadratic damping requires the solution to be solved iteratively, alternating between the two possible equations of motion in Eq. (1.71) as $\operatorname{sgn}(\dot{x})$ alternates. Fay [75] uses an integrating factor to show that the differential equation

$$\ddot{x} + p(x)\dot{x}^2 + f(x) = 0 \quad (1.72)$$

has the solution form

$$\frac{y^2}{2}\mu(x) + \int_{x_0}^{x}\mu(\epsilon)\, f(\epsilon)\, d\epsilon = \frac{y_0^2}{2}\mu(x_0), \quad (1.73)$$

where $\mu(x) = e^{\int 2p(x)\, dx}$. By applying this solution to the equation of motion with $p(x) = \pm\mu_q/m$ (the $\pm$ represents the two possible conditions, with $+$ if $\dot{x} > 0$), $\mu(x) = e^{\pm 2\mu_q x/m}$, and $f(x) = kx/m$, I solve the equation as

$$\frac{\dot{x}^2}{2}\, e^{\pm 2\mu_q x/m} + \frac{k}{m}\int_{x_0}^{x} e^{\pm 2\mu_q\epsilon/m}\,\epsilon\, d\epsilon = \frac{\dot{x}_0^2}{2}\, e^{\pm 2\mu_q x_0/m}. \quad (1.74)$$

The integral is then solved using integration by parts as

$$\left(\dot{x}^2 \pm \frac{kx}{\mu_q} - \frac{km}{2\mu_q^2}\right) e^{\pm 2\mu_q x/m} = \left(\dot{x}_0^2 \pm \frac{kx_0}{\mu_q} - \frac{km}{2\mu_q^2}\right) e^{\pm 2\mu_q x_0/m}. \quad (1.75)$$

Equation (1.75) is then solved numerically and iteratively as the solution passes through $\dot{x} = 0$. However, I would like an expression for the relationship between a valley and the following peak to understand how the lifetimes decrease due to the quadratic damping mechanism. To do this I first assume any initial condition $[|x_0|, |\dot{x}_0|] \neq 0$, which will yield a solution $x(t)$ that eventually reaches a valley. I then consider the new initial condition $\mathbf{x}'_0 = [v_0, +0]$ at this first valley $v_0$ (see Fig. 1.13 for a sample response with non-zero initial conditions).

Figure 1.13: Example free vibration response of system with quadratic damping.

The velocity is positive ($\dot{x} > 0$) between this first valley $v_0$ and the next peak $p_1$. Therefore, I can use Eq. (1.75) with $+\mu_q$ to solve for the relationship between any valley and peak pair as

$$e^{\frac{2\mu_q}{m} p_{i+1}}\left(p_{i+1} - \frac{m}{2\mu_q}\right) = e^{\frac{2\mu_q}{m} v_i}\left(v_i - \frac{m}{2\mu_q}\right). \quad (1.76)$$

This relationship can be rearranged as

$$L_i = p_{i+1} - v_i = \frac{m}{2\mu_q}\ln\left(\frac{2\mu_q v_i - m}{2\mu_q p_{i+1} - m}\right). \quad (1.77)$$

After applying sublevel set persistence and generating a persistence diagram, values for the lifetimes, valleys, and peaks are known, which allows for the numerical estimation of $\mu_q$. This is done by minimizing the cost function

$$C(\mu_q) = \left[L_i - \frac{m}{2\mu_q}\ln\left(\frac{2\mu_q v_i - m}{2\mu_q p_{i+1} - m}\right)\right]^2, \quad (1.78)$$

where $C(\mu_q)$ is the cost as a function of $\mu_q$. I can now also introduce the quadratic damping parameter $\zeta_q = \mu_q/m$. Applying $\zeta_q$ to Eq. (1.78) results in

$$C(\zeta_q) = \left[L_i - \frac{1}{2\zeta_q}\ln\left(\frac{2\zeta_q v_i - 1}{2\zeta_q p_{i+1} - 1}\right)\right]^2, \quad (1.79)$$

which requires no system parameters $m$ and $k$. Equation (1.79) can also be numerically minimized to estimate $\zeta_q$.
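A sketch of the minimization of Eq. (1.79) is given below. The valley/peak pair was generated to be consistent with $\zeta_q = 0.5$ through Eq. (1.76), so the bounded search should recover approximately that value; the bounds simply bracket the expected answer and are not part of the method itself.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cost(zq, L_i, v_i, p_next):
    pred = np.log((2 * zq * v_i - 1) / (2 * zq * p_next - 1)) / (2 * zq)  # Eq. (1.79)
    return (L_i - pred) ** 2

v_i, p_next = -1.0, 0.5936               # consistent with zeta_q = 0.5 via Eq. (1.76)
L_i = p_next - v_i                       # lifetime of the pair (v_i, p_{i+1})
res = minimize_scalar(cost, bounds=(0.2, 0.8), args=(L_i, v_i, p_next),
                      method="bounded")
print("estimated zeta_q:", res.x)        # ≈ 0.5
```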
1.3.2 Noise Compensation

While I have already developed expressions for estimating the damping parameters and constants from sublevel set persistence in Section 1.3.1, I need an automatic framework for the method to be applied to real-world signals with inherent noise. To illustrate the effects of noise, let us return to the example sublevel set persistence from Fig. 1.2, but with additive noise as $x(t) + \mathcal{N}$. The persistence diagrams resulting from sublevel set persistence of the time series without, $D(x)$, and with additive noise, $D(x + \mathcal{N})$, are shown in Fig. 1.14, as well as the resulting time ordered lifetimes.

Figure 1.14: Sub-level set persistence applied to sample time series $x(t)$ with and without additive noise $\mathcal{N}$. This demonstrates the robustness of persistent homology with the time series (top left) with and without additive noise and the small effect on the resulting persistence diagrams (top right) and the corresponding time ordered lifetimes (bottom left).

This example shows that the addition of noise does not have a large effect on the position of significant sublevel sets in the persistence diagram, with the distances between significant points ($d_1$, $d_2$, $d_3$, $d_4$) all being relatively small. This is no surprise due to the stability theorem of persistence diagrams [49]. However, additive noise does introduce several points in the persistence diagram located near the diagonal with relatively small lifetimes. These noise-artifact persistence pairs are formed from the peak-valley pairs in the additive noise. For the damping parameter estimation method to function correctly, I needed to develop a method for dealing with these noise-artifact persistence pairs.

One way of removing the noise-artifact persistence pairs is to separate significant and insignificant lifetimes through a confidence interval or cutoff. While there are methods for developing cutoffs based on a confidence set for persistence diagrams [40, 73], these methods often require that the time series sampling frequency is significantly higher than the highest dominant frequency of the time series, or that the persistence diagram is generated from persistent homology and not sublevel set persistence. Both of these issues make implementing these methods difficult for persistence diagrams generated from sublevel set persistence. Additionally, methods such as persistent entropy [10] for separating noise from significant features in a persistence diagram may not properly distinguish between the noise and significant points if the number of significant data points in the persistence diagram is relatively large compared to the amount of noise. To combat both of these issues, I will introduce two methods for estimating the damping constant with additive noise using sublevel set persistence.

Figure 1.15: Overview of method: starting with a time series, the sublevel set persistence is calculated. The lifetimes from the persistence diagram are then plotted as a function of their birth time. The resulting diagram is analyzed from both a statistical and function fitting perspective to estimate the damping parameters.

The first method is based on generating a confidence level based cutoff for the persistence diagram from sublevel set persistence, which is founded on the assumed theoretical probability distribution $f(x)$ of noise in the persistence diagram developed in [11]. This assumed distribution allows for an accurate cutoff separating noise from features based on a desired confidence level $\alpha$. The second method uses a dual function fitting algorithm applied to the time ordered lifetimes diagram. Specifically, one curve is fit to the damping envelope of the lifetimes while the second is fit to the additive noise lifetimes. However, this method is only viable for viscous and Coulomb damping, as the envelope function is unknown for quadratic damping.

The aforementioned methods will be developed and discussed in the following subsections. First, in Section 1.3.3 I will provide an overview of the recently developed analysis of the statistics of the lifetimes in the persistence diagram [11] and how its resulting cutoff can be used to separate significant persistence pairs from those associated to noise in the persistence diagram. These significant persistence pairs can then be used to estimate the damping parameters as discussed in Section 1.3.1. In Section 1.3.4, I will introduce the method based on a dual curve fitting procedure in the time ordered lifetimes diagram to estimate the damping parameters.
1.3.3 Method 1: Persistence Diagram Cutoff

The first method is based on calculating a suitable cutoff to separate persistence pairs associated to additive noise from those of the signal. To do this, I implement the recently published work on estimating a suitable cutoff for the persistence diagram (and time ordered lifetimes diagram) by assuming an additive noise distribution [11]. I first overview the key results from this work. I additionally develop a noise floor compensation term to minimize the effects additive noise has on the accuracy of the estimated damping parameters. Finally, I show how the cutoff and noise floor are used to estimate the damping parameters.

Cutoff Equations

For the method developed in [11], the cutoff equations require an assumed probability distribution function for the additive noise. Due to this constraint, I provide four commonly assumed probability distributions (Gaussian, uniform, Rayleigh, and exponential) with their associated cutoff equations and approximated distribution parameters, as shown in Table 1.3. In Table 1.3, $\tilde{L}$ is the median lifetime, $n$ is the number of samples in the signal, $\alpha$ is the confidence level (usually chosen as 0.001), and $\sigma$, $\Delta$, $\sigma$, and $\lambda$ are the distribution parameters for the Gaussian, uniform, Rayleigh, and exponential distributions, respectively.

Table 1.3: Cutoff and parameter estimation equations for the Gaussian, uniform, Rayleigh, and exponential probability distribution functions.

Gaussian: $C_\alpha = 1.923\,\tilde{L}\,\operatorname{erf}^{-1}\left[2(1 - \sqrt{\alpha})^{1/n} - 1\right]$; parameter estimate $\sigma \approx 0.680\,\tilde{L}$.
Uniform: $C_\alpha = 2\tilde{L}\left[2(1 - \sqrt{\alpha})^{1/n} - 1\right]$; parameter estimate $\Delta \approx 2\tilde{L}$.
Rayleigh: $C_\alpha = 1.025\,\tilde{L}\left(\sqrt{-2\ln\left(1 - [1 - \sqrt{\alpha}]^{1/n}\right)} - \sqrt{-2\ln\left([1 - \sqrt{\alpha}]^{1/n}\right)}\right)$; parameter estimate $\sigma \approx 1.025\,\tilde{L}$.
Exponential: $C_\alpha = -0.533\,\tilde{L}\,\ln\left([1 - \sqrt{\alpha}]^{1/n} - [1 - \sqrt{\alpha}]^{2/n}\right)$; parameter estimate $\lambda \approx 1.875/\tilde{L}$.

To compensate for the effects of signal on the cutoff and parameter estimation equations, I suggest the use of the multiplicative signal compensation term $R$. This term compensates for the effects of signal with $C_\alpha^* = R C_\alpha$ and $\sigma^* = R\sigma$ and is calculated as

$$R = e^{c_1\left(\frac{\delta}{\delta + \tilde{L}}\right)^{c_2}}, \quad (1.80)$$

where the two constants $c_1$ and $c_2$ are provided in Table 1.4 and $\delta$ is approximated as

$$\delta \approx \frac{2}{n}\sum L_{C_\alpha}, \quad (1.81)$$

with $L_{C_\alpha}$ as the lifetimes greater than $C_\alpha$.

Table 1.4: Constants of Eq. (1.80) for each distribution type investigated in this work, with associated uncertainty from ten trials.

Distribution: Gaussian | Uniform | Rayleigh | Exponential
$c_1$: 0.845 ± 0.029 | 0.880 ± 0.017 | 0.726 ± 0.026 | 0.436 ± 0.036
$c_2$: 0.809 ± 0.061 | 0.639 ± 0.026 | 0.605 ± 0.054 | 0.393 ± 0.075

Noise Floor

A secondary effect of additive noise on the lifetimes associated to signal is an increase in those lifetimes, which I term the "noise floor" $F_\beta$. For example, consider the sample peak-valley pair shown in Fig. 1.16, which illustrates the original time series $x(t)$ without noise, the sampled data points of $x(t) + \mathcal{N}$, and the increase and decrease of the local maximum and minimum by approximately $\epsilon_{p_i}$ and $\epsilon_{v_i}$, respectively. From Fig. 1.16, I can approximate the original noise-free lifetime as

$$L'_i \approx L_i - \epsilon_{L_i}, \quad (1.82)$$

where $L_i$ is the lifetime associated to the signal with additive noise and $\epsilon_{L_i}$ is the uncertainty in the lifetime associated to signal from additive noise.
Figure 1.16: Example section of sampled time series $x(t)$ with and without additive noise, demonstrating the effect of additive noise on increasing the lifetime of sublevel set persistence by approximately $L_i - L'_i = \epsilon_{v_i} + \epsilon_{p_i} \approx F_\beta$.

I approximate the increase in the lifetime from this uncertainty as the noise floor $F_\beta \approx \epsilon_{L_i}$. This uncertainty will generally increase the lifetimes associated to signal and will consequently alter the calculations of the damping constants. Therefore, I approximate $F_\beta$ and reduce the measured lifetimes accordingly as $L_i - F_\beta$. It is straightforward to see that $\epsilon_L$ is distributed the same as the lifetimes associated to additive noise. Therefore, the goal is to approximate, on average, the increase in $L_i$ from additive noise using the previously derived statistics and resulting cutoff equations. Specifically, the goal is to represent the value of $F_\beta$ as a function of the number of points near the local extrema $n_e$, the assumed additive noise model, and the approximate distribution parameter from the median lifetime with signal compensation (e.g., $\sigma^*$ for Gaussian additive noise). To estimate $F_\beta$ I recycle the previously derived expressions from [11] in Table 1.3, as shown in Table 1.5. However, I must first develop a method to estimate $n_e$ and choose an appropriate confidence level $\beta$.

I first choose an appropriate confidence level $\beta$. To determine $\beta$ I consider the goal of the calculation: estimate the average increase in the lifetimes associated to signal from the additive noise near the extrema. Here, the key word is average. In comparison to the cutoff with $\alpha = 0.01$, I need a much higher confidence level for $\beta$ because the goal is not to provide a cutoff greater than the maximum of the lifetimes associated to noise, but rather the average maximum itself. Therefore, I set the probability to 50%, or $\beta = 0.5$, such that there is an equal probability of the increase in the lifetime being greater or less than the floor $F_\beta$.

Table 1.5: Noise floor equations for the Gaussian, uniform, Rayleigh, and exponential probability distribution functions.

Gaussian: $F_\beta = 2^{3/2}\sigma^*\,\operatorname{erf}^{-1}\left[2(1 - \sqrt{\beta})^{1/n_e} - 1\right]$
Uniform: $F_\beta = \Delta^*\left[2(1 - \sqrt{\beta})^{1/n_e} - 1\right]$
Rayleigh: $F_\beta = \sigma^*\left(\sqrt{-2\ln\left(1 - [1 - \sqrt{\beta}]^{1/n_e}\right)} - \sqrt{-2\ln\left([1 - \sqrt{\beta}]^{1/n_e}\right)}\right)$
Exponential: $F_\beta = -\frac{1}{\lambda^*}\ln\left([1 - \sqrt{\beta}]^{1/n_e} - [1 - \sqrt{\beta}]^{2/n_e}\right)$

With $\beta$ set to 0.5, I now need to determine $n_e$, the average number of points near the extrema of a lifetime associated to signal. I do not use the total number of data points $n$, as only the points near the extrema have a significant probability of increasing $L_i$. Since I am working with signals of the underlying form $x(t) = A\sin(t + \phi)\, e(t)$ for damped oscillators with a damping envelope $e(t)$, I develop an expression for the number of samples near an extremum using the approximate response of the signal for a lifetime $L_i > C_\alpha^*$ as

$$f(x) = -\frac{L_i}{2}\, e(t_{B_i})\sin(t), \quad (1.83)$$

with the lifetime $L_i$ born at $t_{B_i}$ and $t \in [0, 2\pi]$. I consider points to be near an extremum when

$$|\sin(t)| \ge 1 - \frac{C_\alpha^*}{2L_i}, \quad (1.84)$$

where $t \in [0, 2\pi]$. I now calculate the ratio between all $t \in [0, 2\pi]$ and the $t$ that satisfy Eq. (1.84) as

$$r_i = \frac{\left\{\max(t) - \min(t),\; |\sin(t)| \ge 1 - \frac{C_\alpha^*}{2L_i}\right\}}{2\pi}, \quad (1.85)$$

where $r_i \in [0, 1]$ and $t \in [0, 2\pi]$. $r_i$ is estimated for each $L_i$, with the average approximated as

$$r = \operatorname{median}(r_i). \quad (1.86)$$
The total number of points in the signal with the damped sinusoidal function satisfying $A e(t) > C_\alpha^*$ is estimated as

$$N = f_s\left(\max(t_B) - \min(t_B)\right), \quad (1.87)$$

where $f_s$ is the sampling frequency and $t_B$ is the set of birth times associated to lifetimes with $L_i > C_\alpha^*$. Using the total number of points associated to signal $N$ and the ratio of those points near the extrema, I now estimate the number of points near the extrema for a lifetime as

$$n_e = \frac{rN}{n_L}, \quad (1.88)$$

where $n_L$ is the number of lifetimes with $L_i > C_\alpha^*$. I can now substitute the results for $n_e$, $\beta$, and the distribution parameter into the cutoff equations from Table 1.3, as shown in Table 1.5, to calculate a noise floor $F_\beta$. As a note, the noise floor compensation does not have a major effect for relatively low levels of noise (e.g., SNR > 30 dB). However, for higher levels of noise the compensation can be critical for calculating an accurate estimate of the damping constant. The importance of the noise floor compensation will be shown in Section 4.3.

Damping Parameter Estimation

The damping parameters are estimated using the cutoff and noise floor as follows:

1. Calculate the lifetimes from the persistence diagram, $L = \alpha_D - \alpha_B$, and match them with the time indices of the lifetime minima as $t_B$. This allows for the time ordered lifetimes plot as shown in Fig. 1.15.

2. With the cutoff $C_\alpha$ known, separate the lifetimes and birth times based on $L > C_\alpha$. Adjust the lifetimes above the cutoff using the noise floor by substituting $L_i$ with $L_i - F_\beta$, $p_i$ with $p_i - F_\beta/2$, and $v_i$ with $v_i + F_\beta/2$.

3. Using the noise floor adjusted lifetimes above the cutoff and their time indices $t_B$, use the appropriate equation for estimating the damping constant for Coulomb, viscous, or quadratic damping (see the equation reference in Table 1.6). Additionally, I suggest using $i = 0$ and $n$ as the index of the lifetime closest to $0.3211\max(L)$ to minimize the effect of additive noise, as shown in [137].

A sketch of steps 1-3 for Gaussian noise follows Table 1.6.

Table 1.6: Quick reference to equations (or cost functions) for using sublevel set persistence to estimate damping parameters and constants.

Coulomb: parameter $\zeta_c = \dfrac{L_i - L_{i+n}}{2\left(t_{B_{i+n}} - t_{B_i}\right)}$; constant $\mu_c = \dfrac{k\left(L_i - L_{i+n}\right)}{8nN}$.
Viscous: parameter $\zeta_v = \sqrt{\dfrac{1}{1 + \left(\frac{2n\pi}{\ln(L_{i+n}/L_i)}\right)^2}}$; constant $\mu_v = \dfrac{2\zeta_v k}{\omega_n}$.
Quadratic: parameter cost $C(\zeta_q) = \left[L_i - \frac{1}{2\zeta_q}\ln\left(\frac{2\zeta_q v_i - 1}{2\zeta_q p_{i+1} - 1}\right)\right]^2$; constant cost $C(\mu_q) = \left[L_i - \frac{m}{2\mu_q}\ln\left(\frac{2\mu_q v_i - m}{2\mu_q p_{i+1} - m}\right)\right]^2$.
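The sketch below evaluates the Gaussian row of Table 1.5 with $\beta = 0.5$ and applies the step-2 lifetime adjustment; the inputs $\sigma^*$, $n_e$, and the lifetimes are placeholders standing in for the signal compensation and extrema-count estimates above.

```python
import numpy as np
from scipy.special import erfinv

def gaussian_noise_floor(sigma_star, n_e, beta=0.5):
    """Noise floor F_beta from the Gaussian row of Table 1.5."""
    return 2 ** 1.5 * sigma_star * erfinv(2 * (1 - np.sqrt(beta)) ** (1 / n_e) - 1)

F = gaussian_noise_floor(sigma_star=0.01, n_e=12)       # placeholder inputs
L_sig = np.array([1.542, 1.10, 0.85, 0.583])            # lifetimes above the cutoff
print(F, L_sig - F)                                     # step 2 adjustment
```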
Based on this methodology, I can use the fitting parameters of 𝑒(𝑡) to determine the damping constants. For viscous damping I estimate the envelope function as 𝑒(𝑡) = 𝑎e^{−𝑐𝑡}, where 𝑎 and 𝑐 are constant parameters. The exponent parameter 𝑐 correlates with Eq. (1.59) and Eq. (1.55) through

𝜁𝑣 = 𝑐/𝜔𝑛 = 𝑐(𝑡𝐵𝑖+𝑛 − 𝑡𝐵𝑖)/(2𝜋𝑛). (1.89)

Figure 1.17: Example demonstrating the process of going from a time series 𝑥(𝑡) with amplitude decrement and additive noise to the time-ordered lifetimes of the persistence diagram with dual function fitting.

Additionally, 𝜇𝑣 = 2𝑚𝑐. For Coulomb damping I estimate the envelope function as 𝑒(𝑡) = −𝑎𝑡 + 𝑑, where 𝑎 is the magnitude of the slope of the linear function and 𝑑 is the intercept. I use the relationship in Eq. (1.65) to calculate the Coulomb damping ratio as 𝜁𝑐 = 𝑎/2, which is extended to

𝜇𝑐 = 𝜁𝑐𝜋𝑘/(2𝑁𝜔𝑛) = 𝑎𝜋𝑘/(4𝑁𝜔𝑛). (1.90)

I have now demonstrated how the function fitting method can be used to estimate the damping parameters from the lifetime plot (example illustrated in Fig. 1.17). This method has the benefit of not needing a statistical analysis of the noise in the persistence diagram. However, it does require the extra computational step of function fitting. For function fitting I use a cost function for fitting two curves simultaneously, defined as

𝐶 = Σᵢ min([𝐿𝑖 − 𝑓𝑁(𝑡𝐵𝑖)]², [𝐿𝑖 − 𝑓𝑆(𝑡𝐵𝑖)]²), (1.91)

where the cost function 𝐶 is a function of the parameters 𝑎, 𝑏, 𝑐 for viscous damping and 𝑎, 𝑏, 𝑑 for Coulomb damping. Additionally, the subscript 𝑖 of 𝐿𝑖 and 𝑡𝐵𝑖 denotes the 𝑖th sublevel set lifetime out of all 𝑇 lifetimes. I minimize Eq. (1.91) using Python's scipy.optimize.minimize implementation of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) minimization algorithm. A required input for the BFGS algorithm is an initial guess of the unknown parameter values. For viscous damping, I suggest the following estimates: 𝑎 = max(𝐿), 𝑏 = max(𝐿)/100, and 𝑐 = ln(1/0.3299)/𝑡opt, where 𝑡opt is the birth time of the lifetime nearest to 0.3299 max(𝐿) ≠ max(𝐿). For Coulomb damping, I make the following estimates: 𝑏 = 0.1 max(𝐿), 𝑎 = max(𝐿)/𝑡opt, and 𝑑 = max(𝐿). Through simulations I have found that these initial guesses yield accurate results for a wide range of parameter values, as demonstrated in Section 4.3.

1.3.5 Examples

I will now implement the method for three examples. The first example is a simulated viscously damped oscillator, the second is an experimental single pendulum with damping dominated by the Coulomb damping mechanism, and the third is a simulated quadratically damped oscillator.

Example 1: Viscously Damped Oscillator

For the first example, the system analyzed is the free response of the viscously damped oscillator described by 𝑚𝑥̈ + 𝑘𝑥 + 𝜇𝑣𝑥̇ = 0, where 𝑚 = 1 kg, 𝑘 = 20 N/m, and 𝜇𝑣 = 0.5 Ns/m. This system is solved with initial conditions 𝑥₀ = 1 m and 𝑥̇₀ = 0 m/s as

𝑥(𝑡) = 𝑒^{−𝜁𝜔𝑛𝑡} cos(𝜔𝑑𝑡), (1.92)

where 𝜔𝑛 = √(𝑘/𝑚) ≈ 4.472 rad/s, 𝜁 = 0.05590, and 𝜔𝑑 = 4.465 rad/s.

Figure 1.18: Time series 𝑥(𝑡) sampled at 20 Hz from Eq. (1.92) with and without additive noise N from a normal distribution with standard deviation 𝜎 = 0.01.

The simulation was sampled at a rate of 𝑓𝑠 = 20 Hz for 20 seconds with additive noise N from a Gaussian distribution with a standard deviation 𝜎 ≈ 0.01 m, as shown in Fig. 1.18. Sublevel set persistent homology is applied to the time series with and without additive noise as P₀(𝑥 + N) and P₀(𝑥).
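The dual-fit cost of Eq. (1.91) maps directly onto scipy.optimize.minimize, which is named above as the minimizer. Below is a minimal sketch for the viscous envelope 𝑒(𝑡) = 𝑎e^{−𝑐𝑡} using the suggested initial guesses; the function name and inputs are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def fit_dual_viscous(L, tB):
    """Minimize the dual-fit cost of Eq. (1.91) with f_N(t) = b and
    f_S(t) = a*exp(-c*t) + b; returns the fitted (a, b, c)."""
    L, tB = np.asarray(L), np.asarray(tB)

    def cost(p):
        a, b, c = p
        fN = b                                  # flat noise model
        fS = a * np.exp(-c * tB) + b            # decaying signal envelope
        return np.sum(np.minimum((L - fN) ** 2, (L - fS) ** 2))

    # initial guesses suggested in the text (t_opt must be nonzero)
    t_opt = tB[np.argmin(np.abs(L - 0.3299 * L.max()))]
    p0 = [L.max(), L.max() / 100, np.log(1 / 0.3299) / t_opt]
    return minimize(cost, p0, method="BFGS").x
```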
The lifetimes 𝐿 and their time indices 𝑡𝐵 are then calculated from the persistence diagram and time series, respectively. As mentioned previously, the persistence diagrams with and without additive noise show only slight differences for the significant lifetimes. I can then apply both the statistics-based analysis (see left side of Fig. 1.19) and the function fitting analysis (see right side of Fig. 1.19) to the resulting lifetimes and time indices.

Figure 1.19: Resulting time-ordered lifetimes plot for the viscous damping mechanism example in Fig. 1.18 with (left) the statistical analysis and (right) function fitting.

Using the lifetimes from the persistence diagram, a cutoff 𝐶𝛼 = 0.119 is calculated using 𝛼 = 1%. To calculate the damping constant, the lifetime indices are chosen as 𝑖 = 0 and 𝑛 = 3 so that 𝐿𝑖+𝑛/𝐿𝑖 ≈ 0.3211 (𝐿₃/𝐿₀ ≈ 0.583/1.542 ≈ 0.378), as suggested in [137]. Using these lifetimes, 𝜁𝑣 is calculated from Eq. (1.61) as

𝜁𝑣 = [1 + (2𝑛𝜋/ln(𝐿𝑖+𝑛/𝐿𝑖))²]^{−1/2} = [1 + (6𝜋/ln(𝐿₃/𝐿₀))²]^{−1/2} ≈ 0.05480.

Using 𝜁𝑣 I can then calculate 𝜇𝑣 = 2𝜁𝑣√(𝑘𝑚) ≈ 0.4901. As noticed, both of these values are slightly below the theoretical values of 𝜁𝑣 = 0.05590 and 𝜇𝑣 = 0.5. To improve the estimate, I can account for a noise floor in the calculation of 𝜁𝑣 as

𝜁𝑣 = [1 + (6𝜋/ln((𝐿₃ − F)/(𝐿₀ − F)))²]^{−1/2} ≈ 0.05611,

where F ≈ 0.018 was calculated as described in Section 1.3.3. I then calculate 𝜇𝑣 = 2𝜁𝑣√(𝑘𝑚) ≈ 0.5019, which is significantly closer to the actual 𝜇𝑣 = 0.5. Accounting for the noise floor becomes more critical as the noise level increases, which will be investigated more thoroughly in Section 4.3.

For the second method, I implement the dual function fitting analysis as shown on the right side of Fig. 1.19. This analysis results in the constant 𝑐 ≈ 0.2475, which is used in Eq. (1.89) to calculate 𝜁𝑣 ≈ 0.05540. I then calculate 𝜇𝑣 ≈ 0.4955. This shows that the dual function fitting method also works well for estimating the damping constants, but the statistics-based method with noise floor compensation is slightly more accurate.

Example 2: Experimental Single Pendulum

The second example uses data collected from a free-drop experiment of a benchtop pendulum within the linear range of oscillations. The pendulum used has CAD and design documentation provided through GitHub¹ with uncertainty analysis [183]. This single pendulum has an approximate system model of the form 𝐼𝜃̈ = −𝜇𝑐 sgn(𝜃̇) − 𝑚𝑔𝑟cm𝜃, where 𝐼 = 𝐼cm + 𝑚𝑟cm² with 𝑟cm as the radius to the center of mass and 𝐼cm as the inertia about the center of mass. This equation can be compared to Eq. (1.54) with 𝜇𝑣 = 𝜇𝑞 = 0. This comparison results in the equivalences 𝑚 = 𝐼 and 𝑘 = 𝑚𝑔𝑟cm.

Figure 1.20: Collected angular position data (in radians) from free-drop tests of the experimental benchtop pendulum.

For the pendulum model it is assumed that the other damping mechanisms are negligible in comparison to the Coulomb damping. To validate this assumption, I implemented the BFGS algorithm for fitting a simulation of the model to collected free-drop data, where the three damping constants 𝜇𝑐, 𝜇𝑣, and 𝜇𝑞 were the only unknowns. This required accurate estimates for 𝑚, 𝑟cm, and 𝐼. These parameters were estimated with either a direct measurement or through SolidWorks' mass properties tool with an accurate CAD model, which resulted in values of 𝑚 ≈ 0.1231 kg, 𝐼 ≈ 0.00295 kg m², and 𝑟cm ≈ 0.128 m.
From 5 free drops, the model fitting resulted in estimated average damping parameters with uncertainties (one standard deviation) of 𝜇𝑐 = (2.56 ± 0.09) × 10⁻³, 𝜇𝑣 = (1.20 ± 0.32) × 10⁻⁴, and 𝜇𝑞 = (6.0 ± 2.2) × 10⁻⁶. These parameter values show that a large majority of the damping occurred through Coulomb damping, which substantiates the reduced model for the pendulum.

¹ https://github.com/Khasawneh-Lab/simple_pendulum

Figure 1.21: Resulting time-ordered lifetimes plot for the experimental pendulum data (see Fig. 1.20) having an approximate Coulomb damping mechanism in the linear range with (left) the statistical analysis and (right) function fitting.

The collected angular data (in radians) is shown in Fig. 1.20. Next, similar to the first example, the time-ordered lifetimes are calculated using sublevel set persistence. I can then apply both the statistics-based analysis (see left side of Fig. 1.21) and the function fitting analysis (see right side of Fig. 1.21) to the resulting lifetimes and time indices. I can now estimate the damping parameter (slope of the decrement envelope) as 𝜁𝑐 = (𝐿𝑖 − 𝐿𝑖+𝑛)/(2(𝑡𝐵𝑖+𝑛 − 𝑡𝐵𝑖)), where 𝑖 = 0 and 𝑛 = 5. This calculation results in 𝜁𝑐 ≈ 0.07909. Similarly, I can use the function fitting method, resulting in 𝑎 ≈ 0.1538, which is used to calculate 𝜁𝑐 = 𝑎/2 ≈ 0.07690. Using 𝜁𝑐 from the two methods, I can now calculate the damping constants as 𝜇𝑐 ≈ 2.65 × 10⁻³ and 𝜇𝑐 ≈ 2.58 × 10⁻³ for the statistics and function fitting methods, respectively. Both of these results fall within the uncertainty of the parameter estimated from model fitting (𝜇𝑐 = (2.56 ± 0.09) × 10⁻³), which suggests that this method for damping estimation is viable for experimental data.

Example 3: Quadratically Damped Oscillator

For the last example, and to complete the set of damping types, I will again simulate a time series, now with quadratic damping as the mechanism of energy dissipation. To do this, I simulated a response of 𝑚𝑥̈ + 𝑘𝑥 + 𝜇𝑞𝑥̇² sgn(𝑥̇) = 0 with initial conditions 𝑥₀ = 1 m and 𝑥̇₀ = 0 m/s and parameters 𝑚 = 1 kg, 𝑘 = 20 N/m, and 𝜇𝑞 = 0.5 Ns²/m². The solution was sampled for 20 seconds at a sampling rate of 20 Hz. Additionally, I added additive noise N to the time series 𝑥(𝑡) from a Gaussian distribution with a standard deviation 𝜎 ≈ 0.01 m, as shown in Fig. 1.22.

Figure 1.22: Time series 𝑥(𝑡) sampled at 20 Hz from the simulation of a quadratically damped oscillator with and without additive noise N from a normal distribution with standard deviation 𝜎 = 0.01.

Next, sublevel set persistence was applied to the time series with additive noise, and the corresponding birth times 𝑡𝐵 and lifetimes 𝐿 were recorded. A statistical analysis of the lifetimes was used to calculate a noise floor and cutoff as shown in Fig. 1.23.

Figure 1.23: Resulting time-ordered lifetimes plot for the quadratic damping mechanism example in Fig. 1.22 with (left) the statistical analysis and (right) function fitting.

By minimizing the cost functions in Eq. (1.78) and Eq. (1.79), I calculate the damping constant and parameter as 𝜇𝑞 ≈ 0.513 and 𝜁𝑞 ≈ 0.513, respectively. By comparing these values to the actual 𝜇𝑞 = 0.5 and 𝜁𝑞 = 0.5, I can see that sublevel set persistence is an accurate and automatic method for estimating quadratic damping parameters.
1.3.6 Results

In this section I provide three main results of sublevel set persistence for damping parameter identification: noise robustness, functionality at low sampling frequencies, and applicability for a wide range of damping parameters. All three of these analyses are based on estimating damping parameters from the three different damping mechanisms with damping parameters of 𝜇𝑐 = 0.05 N, 𝜇𝑣 = 0.5 Ns/m, and 𝜇𝑞 = 0.5 Ns²/m² for Coulomb, viscous, and quadratic damping, respectively. The other system parameters are set as 𝑚 = 1 kg and 𝑘 = 20 N/m with initial conditions 𝑥₀ = 1 m and 𝑥̇₀ = 0 m/s. These systems are simulated for 20 seconds at a rate of 20 Hz unless specified otherwise.

Noise Robustness

For analyzing noise robustness, I implement a sweep of the Signal-to-Noise Ratio (SNR) from 15 to 40 dB, where a low SNR signifies a high level of noise. The SNR is defined as

SNR = 20 log₁₀(𝐴signal/𝐴noise), (1.93)

where 𝐴signal = 1 m is the maximum value of the signal (based on the initial conditions) and 𝐴noise = 𝜎, with 𝜎 as the standard deviation of the additive Gaussian noise. In signal processing an SNR of 15 dB is often considered the limit for extracting useful information from a time series. At each SNR, I add Gaussian (normal distribution) noise with the specified SNR and estimate the damping constant using all three methods: single lifetime, optimal lifetime ratio, and function fitting. I compute these estimates for 100 samples at each SNR, which provides a mean and standard deviation represented as a data point with standard deviation error bars as 𝜇 ± 𝜎𝜇 (see Fig. 1.24). I also ran two variations of the parameter estimation: one with and one without noise compensation. By noise compensation I am referring to compensation of the noise floor in the damping parameter estimation as described in Section 1.3.3.

The goal of this noise robustness analysis is to determine the functional limits of each method with additive noise.

Figure 1.24: Analysis of the noise robustness of sublevel set persistence for damping parameter estimation of an oscillator with (top) Coulomb, (middle) viscous, and (bottom) quadratic damping mechanisms with (left) and without (right) noise compensation. For each damping mechanism I estimate the damping parameters using a single lifetime (One), an optimal lifetime ratio (Opt.), and function fitting (Fit.).

On the left side of Fig. 1.24 I show the results with the automated noise compensation for Coulomb, viscous, and quadratic damping, from top to bottom. The top left shows the estimated Coulomb parameters (actual 𝜇𝑐 = 0.05 N), demonstrating that both the function fitting and optimal ratio methods accurately estimate the damping parameter all the way down to an SNR of 15 dB. However, the damping estimate has a large uncertainty when using only a single lifetime. This suggests that the single lifetime method should only be used for low noise levels or a high SNR. Additionally, on the top right of Fig. 1.24 I see almost no difference between the noise compensation and no noise compensation results, suggesting that noise compensation is unnecessary for Coulomb damping parameter estimation. This is most likely because the approximately uniform increase of the signal lifetimes caused by additive noise has a minimal effect on the slope of the damping envelope. The middle row of Fig. 1.24 shows the results for viscous damping.
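The SNR sweep reduces to a small helper that scales the noise standard deviation per Eq. (1.93), assuming 𝐴signal = 1 m; the helper name is hypothetical:

```python
import numpy as np

def add_noise_at_snr(x, snr_db, A_signal=1.0, rng=None):
    """Add Gaussian noise at a target SNR per Eq. (1.93):
    SNR = 20*log10(A_signal / sigma)  =>  sigma = A_signal / 10**(SNR/20)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = A_signal / 10 ** (snr_db / 20)
    return x + rng.normal(0.0, sigma, size=len(x))

# e.g., 100 noisy realizations at each SNR in the 15-40 dB sweep:
# estimates = [estimate_zeta_viscous(...) for _ in range(100)]
```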
These results demonstrate that only the optimal lifetime ratio with noise compensation accurately estimates the damping parameter (𝜇𝑣 = 0.5 Ns/m) at high levels of noise. At slightly lower levels of noise (SNR > 25 dB), all three methods accurately estimate the damping parameters, but the function fitting method shows parameter estimation with higher accuracy. Similarly, for the no noise compensation case on the right, all three methods show accurate results for SNR > 25 dB.

For the last damping parameter 𝜇𝑞 = 0.5 Ns²/m² on the bottom row, there is no function fitting method because there is currently no closed-form solution for the damping envelope function for quadratic damping. This means only the single and optimal lifetime methods can be used. Additionally, for improved accuracy I see that noise compensation is necessary for SNR values less than approximately 30 dB. I also notice that quadratic damping estimation is more sensitive to additive noise than Coulomb and viscous damping and only has a relatively high precision for low noise levels with an SNR greater than approximately 30 dB.

Effects of Sampling Frequency

The second analysis of sublevel set persistence for damping parameter identification considers the effect of the sampling rate, to determine the minimum sampling rate at which the method continues to function accurately. For this analysis I varied the sampling frequency (originally 20 Hz) from 2 to 20 Hz. At frequencies lower than 4 Hz the sampling approaches the Nyquist rate 𝑓Nyquist = 𝜔𝑛/𝜋 ≈ 1.42 Hz, where I expect the method to fail. Additionally, I expect the accuracy will only improve for frequencies greater than 20 Hz. The additive noise level was left at an SNR of 50 dB. At each frequency an uncertainty was added to the sampling frequency, and the damping parameters were calculated for 100 samples. This allows for a mean and standard deviation on the parameter values (see Fig. 1.25).

This analysis shows that for all three damping mechanisms, low sampling frequencies approaching the Nyquist sampling rate reduce the accuracy and precision of the parameter estimation. I also conclude that both the function fitting and the optimal lifetime ratio methods have similar results. However, both the Coulomb and quadratic damping estimates show a significantly higher level of uncertainty for sampling rates less than 4 Hz, which suggests that the time series should be sampled at rates greater than twice the Nyquist rate. On the other hand, the viscous damping parameter estimation showed accurate results down to the Nyquist sampling rate.

Figure 1.25: Effect of low sampling frequencies for the damping parameter identification methods based on sublevel set persistence for Coulomb (left), viscous (middle), and quadratic (right) damping mechanisms. The analysis shows accurate results for sampling rates 𝑓𝑠 > 2𝑓Nyquist, where 𝑓Nyquist ≈ 1.42 Hz is the Nyquist sampling rate.

Effects of Damping Parameter Variation

The last analysis studies the effect of the damping parameters, to determine at what values the methods fail. For this analysis there is no additive noise, and I only consider significantly high damping parameters, since small damping should not decrease the accuracy of the optimal lifetime ratio and function fitting methods. However, at low damping parameters and high noise levels, the method based on the first, single lifetime becomes inaccurate (I do not show this result).
For Coulomb damping the damping constant 𝜇𝑐 is varied from 0.001 to 0.55 N, where at 𝜇𝑐 ≈ 0.4 N the sticking effect has a significant influence on the damping parameter estimation and causes the method to fail. For viscous damping I consider damping constants that result in damping parameters up to 𝜁𝑣 = 1.0 (i.e., critically damped), or 𝜇𝑣 ∈ [0.01, 8.5] Ns/m. At 𝜁𝑣 = 1.0 the response has no oscillations, which results in no lifetimes and sets an upper limit for viscous damping. Finally, for quadratic damping the damping constant does not have a large influence on the accuracy, which is why I chose the large damping constant range 𝜇𝑞 ∈ [0.01, 8.0] Ns²/m². Figure 1.26 shows the resulting damping constant estimates over the range of damping constants.

Figure 1.26: Effects of damping parameters of (left) Coulomb, (middle) viscous, and (right) quadratic damping. The parameter values range from very low damping to high or critical damping values.

For Coulomb damping, both the function fitting and optimal lifetime ratio methods begin to lose accuracy when the number of lifetimes decreases to one. This occurs at approximately 𝜇𝑐 = 0.2 N. Additionally, at 𝜇𝑐 ≈ 0.4 N, the sticking effect of Coulomb damping affects the single lifetime, which reduces the accuracy of the method based on a single lifetime. For viscous damping in the middle of Fig. 1.26, the function fitting method (Fit.) loses accuracy at approximately 𝜇𝑣 = 2.5 Ns/m or 𝜁𝑣 ≈ 0.3, the optimal lifetime ratio method loses accuracy at 𝜇𝑣 ≈ 6 Ns/m, and, finally, the method based on a single lifetime accurately estimates the damping constant almost all the way to 𝜁𝑣 = 1.0. For quadratic damping on the right, the damping estimation method functions accurately for the entire damping constant range. I theorize that the function fitting method loses accuracy for high levels of damping due to a lack of data points or lifetimes associated to signal for the function to fit to. This result shows the benefit of using the statistics-based method for estimating the damping ratio, since it remains functional at higher damping levels.

1.4 Sublevel Set Entropy

1.4.1 Information Entropy Statistics

Entropy is used as a summary statistic for measuring the predictability of a data source based on the probability distribution across a set of discrete states. Information entropy was first realized as Shannon entropy, which was introduced in 1948 [210]. Since then, several new forms of entropy have been popularized for time series analysis. Some examples include approximate entropy [185], sample entropy [198], and permutation entropy [14]. Additionally, information entropy can be applied to transition probability matrices from a Markov chain representation through the entropy rate or conditional entropy. However, this application of entropy requires the time series to be represented as a sequence of discrete states. In this work we show how each of these entropy statistics can be used to analyze the sublevel set persistence of a time series. The following paragraphs provide a brief introduction to each entropy measurement.

Shannon Entropy

Shannon entropy [210] is calculated using the probability distribution of a set of possible states A from the sequence of states S. Each state has its probability calculated based on frequency, with state 𝑎𝑖 having probability 𝑝(𝑎𝑖). The Shannon entropy is calculated as

𝐻(S) = − Σ_{𝑖∈𝑁} 𝑝(𝑎𝑖) log(𝑝(𝑎𝑖)), (1.94)

where 𝑁 is the number of possible states.
Shannon entropy can be normalized as

ℎ(S) = − Σ_{𝑖∈𝑁} 𝑝(𝑎𝑖) log(𝑝(𝑎𝑖)) / log(𝑁), (1.95)

with ℎ ∈ [0, 1]. If each state is equiprobable over all possible states, then the underlying dataset has a high level of uncertainty and ℎ = 1. Conversely, if ℎ = 0 then only one state has probability 𝑝(𝑎𝑖) = 1, while all others have zero probability, representing a perfectly regular dataset. A major issue for Shannon entropy, as described in the introduction, is that it does not account for the order in which the data is received. To alleviate this issue, approximate entropy was created.

Approximate Entropy

Unlike Shannon entropy, which measures predictability using a probability distribution among the states, approximate entropy [185] measures the regularity of a signal based on the sequence of states. Additionally, it does not require distinct states due to the use of the uncertainty or filtering level parameter 𝑟 when comparing sequence segments. Unfortunately the choice of an appropriate 𝑟 value is not trivial and is dependent on the application. Therefore, using a sequence of states makes the choice of 𝑟 = 0 simple. The approximate entropy is calculated using Algorithm 1.1 as follows.

Algorithm 1.1: Approximate Entropy
Input: Signal 𝑥 = [𝑥(0), 𝑥(1), . . . , 𝑥(𝑁 − 1)] with 𝑁 as the length of the signal, filter level 𝑟, and data comparison length 𝑚.
Output: Approximate entropy ℎ𝑎.
1. Form the collection of vectors 𝑉𝑚 = [𝒗𝑚(0), . . . , 𝒗𝑚(𝑁 − 𝑚)] with 𝒗𝑚(𝑖) = [𝑥(𝑖), 𝑥(𝑖 + 1), . . . , 𝑥(𝑖 + 𝑚 − 1)] ∈ ℝᵐ for each 𝑖 ∈ [0, 𝑁 − 𝑚].
2. Calculate 𝐶ᵢᵐ(𝑟) = #{𝒗𝑚(𝑗) ∈ 𝑉𝑚 | 𝑑(𝒗𝑚(𝑗), 𝒗𝑚(𝑖)) ≤ 𝑟} / (𝑁 − 𝑚 + 1), which measures the fraction of vectors within a distance 𝑟 of vector 𝒗𝑚(𝑖), with the Chebyshev (or 𝐿∞) distance function 𝑑(𝒂, 𝒃) = maxᵢ(|𝑎ᵢ − 𝑏ᵢ|), where 𝑎ᵢ ∈ 𝒂 and 𝑏ᵢ ∈ 𝒃.
3. Define Φᵐ(𝑟) = (1/(𝑁 − 𝑚 + 1)) Σ_{𝑖=0}^{𝑁−𝑚} log(𝐶ᵢᵐ(𝑟)).
4. Calculate the approximate entropy as ℎ𝑎(𝑥) = Φᵐ(𝑟) − Φᵐ⁺¹(𝑟).

The algorithm calculates the regularity of a sequence of states by comparing how many unique (up to the filtering level 𝑟) sequences of states of length 𝑚 there are. For a periodic signal there would be relatively few unique sequences and thus a low approximate entropy. In comparison, a chaotic or patternless signal would have many unique sequences and a high approximate entropy. Two major drawbacks exist for approximate entropy. The first is its high sensitivity to parameter selection [152], and the second is its need for sufficiently long data. To alleviate the latter, sample entropy was devised.

Sample Entropy

Sample entropy [198] is similar to approximate entropy in that it compares sequences of length 𝑚 with filtration level 𝑟. However, sample entropy ℎ𝑠 has the benefit of data length independence. Sample entropy is typically used for measuring signal complexity with applications in physiological time series data [198], and it is calculated as

ℎ𝑠(𝑥) = − log(𝐴/𝐵), (1.96)

where 𝐴 = #{[𝒗𝑚+1(𝑖), 𝒗𝑚+1(𝑗)] ∈ [𝑉𝑚+1, 𝑉𝑚+1] | 𝑑(𝒗𝑚+1(𝑖), 𝒗𝑚+1(𝑗)) ≤ 𝑟} and 𝐵 = #{[𝒗𝑚(𝑖), 𝒗𝑚(𝑗)] ∈ [𝑉𝑚, 𝑉𝑚] | 𝑑(𝒗𝑚(𝑖), 𝒗𝑚(𝑗)) ≤ 𝑟}. In this work we use 𝑚 = 3 by default unless otherwise stated. Sample entropy is unfortunately still sensitive to the filtering level parameter 𝑟 and is computationally demanding for large signals, as demonstrated in Section 1.4.5.
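For concreteness, Algorithm 1.1 can be written in a few lines of Python. This is a direct, unoptimized transcription (O(𝑁²) pairwise distances), not a production implementation:

```python
import numpy as np

def approximate_entropy(x, m=3, r=0.0):
    """Approximate entropy following Algorithm 1.1; r = 0 is a natural
    choice when x is a sequence of discrete states."""
    x = np.asarray(x, dtype=float)
    N = len(x)

    def phi(m):
        # all length-m windows v_m(i) for i = 0, ..., N-m
        V = np.array([x[i:i + m] for i in range(N - m + 1)])
        # Chebyshev distance between every pair of windows
        d = np.max(np.abs(V[:, None, :] - V[None, :, :]), axis=2)
        C = np.mean(d <= r, axis=1)   # fraction of windows within r of each window
        return np.mean(np.log(C))     # self-matches keep C > 0

    return phi(m) - phi(m + 1)

# e.g., approximate_entropy([0, 1, 0, 1, 0, 1], m=2) is approximately 0
# for this perfectly regular alternating sequence
```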
For more details on approximate and sample entropy we guide the reader to [61].

Permutation Entropy

Permutation entropy [14] was developed as a more computationally efficient method for calculating the complexity of a sequence in comparison to approximate and sample entropies. Permutations are the ordinal partitions of sequences of time series data. Specifically, the sequences (or state space reconstruction vectors) are defined as 𝑣𝑛,𝜏(𝑖) = [𝑥(𝑖), 𝑥(𝑖 + 𝜏), . . . , 𝑥(𝑖 + (𝑛 − 1)𝜏)], where the signal 𝑥 is discretely sampled from a data source that can be either continuous or discrete, 𝑛 is the permutation dimension, and 𝜏 is the spacing between points in the signal. Each vector 𝑣𝑛,𝜏(𝑖) can be categorized as one of 𝑛! possible permutations. Applying this procedure over all 𝑣𝑛,𝜏(𝑖) allows calculating the probability of each permutation. The Shannon entropy from Eq. (1.94) can then be used to calculate the permutation entropy as

ℎ𝑝(𝜋) = − Σ_{𝑖∈𝑛!} 𝑝(𝜋𝑖) log(𝑝(𝜋𝑖)) / log(𝑛!), (1.97)

which is normalized to the range [0, 1] using the number of possible states 𝑛!. While computationally efficient, permutation entropy does not account for amplitude as is done with sample and approximate entropy. It can therefore be sensitive to additive noise.

Markov Chain Entropies: Entropy Rate and Average Conditional Entropy

Markovian entropy statistics are calculated using a transition probability matrix. To create the transition probability matrix, a sequence of states is used to track transitions through an adjacency matrix 𝐴. For each transition from state 𝑎𝑖 to state 𝑎𝑗, 𝐴(𝑖, 𝑗) is incremented by one. The adjacency matrix 𝐴 is |𝑉| × |𝑉|, where |𝑉| is the number of states observed. The adjacency matrix is used to form a one-step transition probability matrix according to

𝑃(𝑖, 𝑗) = 𝐴(𝑖, 𝑗) / Σ_{𝑘=0}^{|𝑉|−1} 𝐴(𝑖, 𝑘). (1.98)

The probability matrix now represents the probability of transitioning from state 𝑎𝑖 to state 𝑎𝑗 in one step. This transition probability matrix serves as a stochastic model of the time series dynamics. The goal is then to quantify the predictability of this stochastic Markov chain model to calculate its complexity. The first tool for calculating its predictability is the average conditional entropy ℎ̄𝑐, which measures the average normalized Shannon entropy of the transitions out of each state. It is calculated as

ℎ̄𝑐(S) = −(1/(log(𝑁)|𝑉|)) Σ_{𝑖=0}^{|𝑉|−1} Σ_{𝑗=0}^{|𝑉|−1} 𝑃(𝑖, 𝑗) log(𝑃(𝑖, 𝑗)). (1.99)

The conditional entropy measures the model's predictability and complexity by quantifying the predictability of each state transition. If there is only one possible transition out of a state (e.g., from state 𝑠𝑖 to state 𝑠𝑗), then the conditional entropy of state 𝑠𝑖 would be zero. However, if it is possible to transition from 𝑠𝑖 to many other states, then the corresponding conditional entropy would be higher. Similar to the average conditional entropy, the entropy rate ℎ𝑟 is calculated as the normalized Shannon entropy of the transition probabilities for all states, but with a weighting of each state's entropy based on its stationary distribution 𝜇. The entropy rate is calculated as

ℎ𝑟(S) = −(1/log(𝑁)) Σ_{𝑖=0}^{|𝑉|−1} Σ_{𝑗=0}^{|𝑉|−1} 𝜇𝑖 𝑃(𝑖, 𝑗) log(𝑃(𝑖, 𝑗)), (1.100)

where we estimate 𝜇 based on the probability distribution over the states such that Σᵢ 𝜇𝑖 = 1. If the distribution is equiprobable, then the average conditional entropy is equivalent to the entropy rate. A drawback of the entropy rate and conditional entropy is that only the single-step transition probability is investigated.
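A minimal sketch of the Markov chain entropies of Eqs. (1.98)–(1.100), assuming the state sequence is a list of integers and that the number of possible states 𝑁 equals the number of observed states |𝑉| (the source normalization distinguishes the two, so this is a simplifying assumption):

```python
import numpy as np

def markov_entropies(S, n_states):
    """Entropy rate and average conditional entropy from a state sequence S
    of integers in [0, n_states)."""
    A = np.zeros((n_states, n_states))
    for i, j in zip(S[:-1], S[1:]):        # count one-step transitions
        A[i, j] += 1
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)   # Eq. (1.98)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    mu = np.bincount(S, minlength=n_states) / len(S)      # empirical distribution
    norm = np.log(n_states)
    h_cond = -plogp.sum() / (norm * n_states)             # Eq. (1.99)
    h_rate = -(mu[:, None] * plogp).sum() / norm          # Eq. (1.100)
    return h_rate, h_cond

# a perfectly periodic sequence yields (0.0, 0.0):
# markov_entropies([0, 1, 0, 1, 0, 1], n_states=2)
```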
In comparison, sample, approximate, and permutation entropy can analyze sequences of larger dimensions.

1.4.2 Method

In this work, we apply the entropy tools discussed in Section 1.4.1 to the sublevel set persistence diagram. Our method is outlined in the pipeline shown in Fig. 1.27. We begin with an oscillatory signal in Fig. 1.27 (a) and calculate the 0D sublevel set persistence diagram in Fig. 1.27 (b). Additionally, we separate persistence pairs associated with noise using a cutoff as described in Section 1.2. Next, we calculate and bin the chronologically ordered lifetimes (based on the birth times 𝑡𝐵) as shown in Fig. 1.27 (c). The chronological lifetimes are sorted based on the time index at which the persistence pair was born. At this stage, the approximate and sample entropy methods can be directly applied to the chronologically ordered lifetimes above the cutoff 𝐶𝛼 with 𝑟 = 0.1 max(𝐿). A benefit of using sublevel set persistence and the associated lifetimes to apply approximate and sample entropy is that it eliminates the need to analyze the multi-scale aspects of the signal. This is due to the sublevel sets naturally partitioning the data using the critical points of the signal. Additionally, using sublevel set persistence provides a much more compact representation of possibly lengthy time series. This reduces the computational demand of approximate and sample entropy, thus enabling in-situ analysis even for long signals.

Figure 1.27: Pipeline for applying entropy metrics to sublevel set persistent homology. The sublevel set persistence diagram in (b) is calculated from the signal in (a), which is used to calculate the lifetimes that are ordered chronologically based on their birth index in (c). The lifetimes can either be used to directly calculate the approximate and sample entropy as ℎ𝑎(𝐿) and ℎ𝑠(𝐿), or they are digitized into states based on the binning procedure in (d) and (e) with bin edges shown in (c). The probability of each state can be found to calculate the information entropy ℎ. Additionally, the chronologically ordered states in (e) can be used to calculate the approximate and sample entropies ℎ𝑎(S) and ℎ𝑠(S), where S is the state sequence composed of states 𝑎𝑖 ∈ A. The entropy rate ℎ𝑟 and average conditional entropy ℎ̄𝑐 can also be calculated from the Markov chain matrix in (f).

A procedure for mapping the lifetimes 𝐿 to a state sequence S is needed to implement the remaining entropy statistics. We use an equi-sized partitioning of the lifetimes within [𝐶𝛼, max(𝐿)] into B bins. This method allows us to represent a signal with a small set of discrete states. Using the corresponding state sequence and each state's abundance, the information entropy ℎ(S) is calculated using the probabilities of each state as shown in Fig. 1.27 (d). The state sequence S can also be used to calculate the approximate entropy ℎ𝑎(S) and sample entropy ℎ𝑠(S) as shown in Fig. 1.27 (e). The benefit of applying approximate and sample entropies to the state sequence is the simplicity of parameter selection with 𝑟 = 1, which works well for B < 30. For larger B values we suggest setting 𝑟 = 0.1B. While not shown in Fig. 1.27, approximate and sample entropy can also be applied directly to the signal in subfigure (a). However, this is significantly more computationally demanding, which will be demonstrated in Section 4.3. We can also use the state sequence S from the ordered lifetimes 𝐿 to create a transition probability matrix 𝑃, shown in Fig. 1.27 (f).
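The binning step of the pipeline reduces to a digitize call; a minimal sketch with hypothetical input names, assuming equi-sized bins on [𝐶𝛼, max(𝐿)]:

```python
import numpy as np

def lifetimes_to_states(L, tB, C_alpha, n_bins=15):
    """Bin the chronologically ordered significant lifetimes into n_bins
    equi-sized bins on [C_alpha, max(L)]; returns the integer state sequence S."""
    L, tB = np.asarray(L), np.asarray(tB)
    keep = L > C_alpha                         # drop noise-artifact pairs
    L, tB = L[keep], tB[keep]
    L = L[np.argsort(tB)]                      # chronological (birth-time) order
    edges = np.linspace(C_alpha, L.max(), n_bins + 1)
    # states 0 .. n_bins-1, one per lifetime
    return np.clip(np.digitize(L, edges) - 1, 0, n_bins - 1)
```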
The transition probability matrix, or Markov chain matrix, is then used to calculate the entropy rate ℎ𝑟 and average conditional entropy ℎ̄𝑐.

1.4.3 Example

To demonstrate the functionality of sublevel set entropy for dynamic state detection, we use the popular Lorenz dynamical system

𝑑𝑥/𝑑𝑡 = 𝜎(𝑦 − 𝑥), 𝑑𝑦/𝑑𝑡 = 𝑥(𝜌 − 𝑧) − 𝑦, 𝑑𝑧/𝑑𝑡 = 𝑥𝑦 − 𝛽𝑧, (1.101)

with 𝜎 = 10, 𝛽 = 8/3, and 𝜌 = 100 for periodic dynamics and 𝜌 = 105 for chaotic dynamics. For our analysis, we only use the 𝑥-solution of Eq. (1.101), which was simulated for 100 seconds at a sampling rate of 100 Hz. Only the last 20 seconds were used to avoid the transient response. The left column of Fig. 1.28 shows the resulting periodic (top panel) and chaotic (bottom panel) time series.

Figure 1.28: Example demonstrating sublevel set persistence of periodic (top row of figures) and chaotic (bottom row of figures) simulations of the Lorenz system. Each row shows the time series 𝑥(𝑡) (left), sublevel set persistence diagram (middle), and binned lifetimes (right).

In this example our goal is to distinguish periodic from chaotic dynamics through the sublevel set persistence. The persistence diagram on the top row of Fig. 1.28 shows that periodic signals tend to cluster points (persistence pairs) in a few locations on the persistence diagram (two points for this example). The goal is then to quantify this regularity in the persistence diagram. To do this we use the time-ordered lifetimes, which can easily be binned into B states that are above the cutoff 𝐶𝛼. For periodic signals we would expect the clustering of points in the persistence diagram to translate to a periodic sequence of states from the binned lifetimes, as shown in the top-right subfigure of Fig. 1.28. On the other hand, the chaotic signal will not have the same properties in the resulting state sequence, as shown in the bottom-right subfigure of Fig. 1.28. The non-periodic behavior of a chaotic signal causes the points in the persistence diagram to not repeat. However, there may still be clusters in the persistence diagram from a chaotic signal due to the dynamics (strange attractor) of the system, as shown in the bottom-center subfigure of Fig. 1.28.

We now apply our entropy statistics to the resulting periodic and chaotic persistence diagrams through the time-ordered lifetimes. For this example, we set B = 15 based on the binning analysis done in Section 1.4.4. The information entropy is calculated from the probability distribution of states in the state sequence S derived from the binned lifetimes, as shown in the frequency plot (see left column of Fig. 1.29). The entropy and associated probabilities result in ℎ ≈ 0.2559 for periodic and ℎ ≈ 0.6522 for chaotic dynamics. The periodic entropy is not zero since it is distributed over two states (4 and 15), and the chaotic entropy is not one because it is not equiprobable over all states. However, the large difference between the two scores shows that entropy distinguishes between the two dynamic states.

Figure 1.29: Further diagrams for entropy analysis of the example signals in Fig. 1.28. The top row is again for the periodic signal and the bottom for the chaotic. The left column is the distribution of states, the middle is the state sequence, and the right is the 1-step transition probability matrix.

The approximate and sample entropies are calculated using the state sequence shown in the middle column of Fig. 1.29 as ℎ𝑎 ≈ 0.0004 and ℎ𝑠 ≈ 0.0308 for periodic and ℎ𝑎 ≈ 0.4921 and ℎ𝑠 ≈ 0.7864 for chaotic dynamics.
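For reference, the periodic and chaotic Lorenz trajectories used in this example can be generated as follows (a minimal sketch; the initial condition and solver tolerance are assumptions, so exact trajectories may differ):

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz_x(rho, sigma=10.0, beta=8 / 3, T=100.0, fs=100.0, keep=20.0):
    """x-coordinate of the Lorenz system (Eq. 1.101), keeping only the
    last `keep` seconds to drop the transient."""
    def f(t, s):
        x, y, z = s
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]
    t = np.arange(0.0, T, 1.0 / fs)
    sol = solve_ivp(f, (0.0, T), [1.0, 1.0, 1.0], t_eval=t, rtol=1e-9)
    return sol.y[0][t >= T - keep]

x_periodic = lorenz_x(100.0)   # rho = 100: periodic response
x_chaotic = lorenz_x(105.0)    # rho = 105: chaotic response
```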
The approximate and sample entropies for the periodic signal are near zero due to the regularity in the state sequence, while the chaotic signal results in significantly higher entropy values. In Section 4.3 we will demonstrate typical approximate and sample entropy values for a variety of chaotic systems.

Table 1.7: Tabulated results for the sublevel set entropy of the Lorenz example.

Entropy | Periodic | Chaotic
Information Entropy ℎ | 0.2559 | 0.6522
Approximate Entropy ℎ𝑎 | 0.0004 | 0.4921
Sample Entropy ℎ𝑠 | 0.0308 | 0.7864
Entropy Rate ℎ𝑟 | 0 | 0.3832
Average Conditional Entropy ℎ̄𝑐 | 0 | 0.3321

The Markov chain transition probability matrix models the dynamics of the signal as a stochastic system. This modeling approach allows periodic signals with very high transition probabilities between specific states to have low state entropies and a low resulting entropy rate. Conversely, the chaotic signal has a distribution of transition probabilities between multiple states with lower probabilities. For example, the right column of Fig. 1.29 shows the transition probability matrices for the periodic and chaotic time series, where the periodic signal results in only two non-zero transitions with unit probability and the chaotic signal has transitions between multiple states with lower probabilities. The resulting entropy rate and average conditional entropy are ℎ𝑟 = 0 and ℎ̄𝑐 = 0 for periodic and ℎ𝑟 ≈ 0.3832 and ℎ̄𝑐 ≈ 0.3321 for chaotic dynamics. The entropy rate of 0 for the periodic signal is due to each transition having unit probability. The results for all of the entropy statistics are summarized in Table 1.7. Comparing the two columns of Table 1.7 illustrates that the entropy statistics based on sublevel set persistence can separate periodic and chaotic dynamics for the Lorenz system. In the next section, we further challenge our method using a large number of flows and maps, and we show that the ability of the sublevel set entropy to distinguish periodic from chaotic dynamics is evident for a variety of dynamical systems.

1.4.4 Analysis on the Number of Bins

The first result needed is an analysis of the effects of the number of bins on the entropy values for periodic and chaotic dynamics. To gain a universal understanding of these effects, we used 21 continuous dynamical systems and 15 maps (see Table C.1 in the appendix), each having both periodic and chaotic dynamics. We varied the number of bins B from 2 to 50, which demonstrated several characteristics as shown in Fig. 1.30.

Figure 1.30: Analysis of the effect of the number of bins or states on the entropy values for 18 continuous and 12 discrete dynamical systems.

First, the separation between periodic and chaotic dynamics based on the entropy values tends to plateau at approximately B = 15 bins. We also note that there seems to be very little differentiation between the entropy distributions for maps and flows, suggesting that 15 bins are appropriate for both. As such, we will use 15 bins when calculating the entropy statistics throughout the manuscript.

1.4.5 Results

The main focus of this work is on dynamic state detection, including a bifurcation analysis and robustness to noise. The first example in Section 1.4.3 uses the Lorenz system to demonstrate typical entropy values for a system with periodic and chaotic behavior. However, a global understanding of the typical distribution of entropy values for each statistic is necessary to draw conclusions on the dynamic state detection abilities.
To gain a better understanding of the distribution of entropy values for identifying the dynamic state, we use box plots with no additive noise in Section 1.4.5 for both continuous (flows) and discrete (maps) binned states. For each system, we used B = 15 bins. For approximate and sample entropy we set 𝑚 = 3 and 𝑟 = 0.1B or 𝑟 = 0.1 max(𝐿). To understand the noise robustness characteristics of the entropy statistics, in Section 1.4.5 we empirically demonstrate the effects of additive noise on the Lorenz system. This was done for each entropy value with Signal-to-Noise Ratios (SNRs) ranging from 10 dB to 60 dB. Note that in signal processing, 15 dB is typically considered the SNR limit below which it becomes challenging to extract any useful information from the signal. In Section 1.4.5 we provide a bifurcation analysis to determine how well the entropy statistics can detect changes in a system as parameters change. We show this bifurcation analysis for the logistic map and the Lorenz system. Lastly, in Section 1.4.5 we provide a computation speed analysis for the Lorenz system and logistic map to demonstrate the benefits of applying the entropy statistics to the sublevel set persistence diagram in comparison to directly applying them to the signal.

Dynamic state detection analysis

Figures 1.31a and 1.31b use box plots to demonstrate the distributions of the entropy statistics for periodic and chaotic behavior, respectively. The analysis was performed using 18 continuous and 12 discrete dynamical systems, which were simulated using the MakeData module in the Python package teaspoon [161] with the default parameters. The box-plot distributions show that the entropy statistics perform better for the discrete systems (maps) than for the flows, with less overlap between distributions. However, there is still a very clear distinction between the periodic and chaotic dynamics for both maps and flows.

Figure 1.31: Spread of entropy values for periodic and chaotic dynamics using 15 bins for (a) 18 continuous dynamical systems (flows) and (b) 12 discrete dynamical systems (maps). The green dashed line separates periodic and chaotic entropy statistics based on a maximized accuracy for both flows and maps.

Further, the distributions for maps and flows align closely and are distributed over a specific range, which allows a cutoff parameter separating periodic from chaotic dynamics to be chosen for both maps and flows. Based on the distributions we set cutoffs of 0.485 for ℎ(S), 0.100 for ℎ𝑎(S), 0.105 for ℎ𝑠(S), 0.110 for ℎ𝑎(𝐿), 0.120 for ℎ𝑠(𝐿), 0.172 for ℎ𝑟(S), and 0.130 for ℎ̄𝑐(S), which are marked in Figs. 1.31a and 1.31b using green dashed lines. These cutoffs were chosen to maximize the accuracy of the dynamic state detection for each entropy statistic. It can also be noted that applying the approximate or sample entropy to the lifetimes 𝐿 or to the state sequence S makes little difference in the entropy values. As such, there is no advantage in applying approximate or sample entropy to either the lifetimes or the state sequence from a performance standpoint.

Robustness to Additive Noise

The initial analysis in Section 1.4.5 provided a starting point for dynamic state analysis through a cutoff based on the distribution of entropy statistics. However, noise robustness must be considered to apply the sublevel set entropy statistics to real-world data. In this subsection, we determine how well these cutoffs perform for dynamic state detection in the presence of additive white noise.
To test the noise robustness, we use the Lorenz system with additive noise at SNRs ranging from 10 dB (high noise) to 60 dB (low noise). Figure 1.32 shows ℎ𝑎(S), ℎ𝑠(S), ℎ𝑎(𝐿), and ℎ𝑠(𝐿) all being the most noise robust, functioning down to an SNR of 20 dB. ℎ(S) is also moderately noise robust, with accurate separation between periodic and chaotic dynamics based on the cutoff down to an SNR of 23 dB. The Markov chain statistics, ℎ𝑟 and ℎ̄𝑐, are the least noise robust and only correctly separate periodic from chaotic dynamics with SNR values greater than 26 dB.

Figure 1.32: Resilience of the entropy statistics to additive noise for SNR values from 10 to 60 dB for the periodic and chaotic Lorenz system simulations described in Eq. (1.101). Uncertainties are reported as the standard deviation over 20 repetitions at each SNR.

We observed that these noise robustness results hold for the other dynamical systems with similar levels of noise robustness. We speculate that the noise robustness of these methods is mainly due to the stability theorem for sublevel set persistence [49]. This theorem states that the persistence diagram of a function with and without additive noise will only change linearly proportional to the additive noise level. Therefore, if the noise-artifact persistence pairs are removed using the cutoff 𝐶𝛼, then the entropy statistics on the resulting persistence pairs should only depend on the noise robustness of the entropy statistics themselves.

Bifurcation Analysis

In our initial dynamic state analysis in Figs. 1.31a and 1.31b we only looked at a single realization of chaotic and periodic signals from each system. However, it is often of interest to analyze the bifurcation behavior as one parameter varies. To determine the viability of the sublevel set entropy statistics for bifurcation analysis, we study the bifurcations in the logistic map and the Lorenz system.

Logistic Map

Our first bifurcation analysis uses the logistic map as an example discrete dynamical system. The logistic map is defined as

𝑥𝑛+1 = 𝑟𝑥𝑛(1 − 𝑥𝑛). (1.102)

For this system we increment the 𝑟 parameter from 3.2 to 4.0 in steps of 10⁻³. At each step, the system is solved for 1000 map iterations, but we only retain the last 300 iterations to avoid transients. Figure 1.33 shows each of our sublevel set entropy statistics for each 𝑟 value, and it contrasts them with the permutation entropy ℎ(𝜋), sample entropy ℎ𝑠(𝑥), and approximate entropy ℎ𝑎(𝑥) computed directly from the simulated signals. The permutations used in calculating the permutation entropy were of dimension 𝑛 = 6 with time delay 𝜏 = 1. The sample and approximate entropy used dimension 𝑚 = 3 with a filtering level of 0.2𝜎, where 𝜎 is the standard deviation of the signal.

Figure 1.33: Bifurcation analysis of the entropy statistics for the logistic map with 𝑟 ∈ [3.2, 4.0] and step size Δ𝑟 = 0.001. Green highlighted regions are periodic.

Figure 1.33 demonstrates that the sublevel set entropy statistics outperform the standard entropy tools. Specifically, all sublevel set entropies can locate the small periodic window at approximately 𝑟 ≈ 3.67, which is not identified by the standard tools. Further, permutation entropy does not provide clear drops in its value for periodic windows, whereas the sublevel set entropy statistics are approximately zero for periodic dynamics. When comparing the sublevel set entropy statistics, there is no clear distinction in performance.
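The logistic-map sweep is straightforward to reproduce; a minimal sketch, where h_of_sublevel_states is a placeholder for the sublevel set entropy pipeline of Section 1.4.2:

```python
import numpy as np

def logistic_trajectory(r, n_iter=1000, n_keep=300, x0=0.5):
    """Last n_keep iterates of the logistic map x_{n+1} = r*x_n*(1 - x_n)."""
    x = np.empty(n_iter)
    x[0] = x0
    for n in range(n_iter - 1):
        x[n + 1] = r * x[n] * (1 - x[n])
    return x[-n_keep:]

# sweep r from 3.2 to 4.0 in 1e-3 steps, collecting an entropy statistic:
r_values = np.arange(3.2, 4.0, 1e-3)
# entropies = [h_of_sublevel_states(logistic_trajectory(r)) for r in r_values]
```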
The sample entropies are almost identical, suggesting that there is little benefit in applying them to the lifetimes versus the state sequence, besides bypassing the complexity involved in parameter selection when applying them directly to the sequences. It is also important to note that the Shannon entropy of S provides more information in regards to signal complexity. Specifically, it more clearly shows bifurcations. For example, at 𝑟 ≈ 3.45 there is a period-doubling bifurcation which increases ℎ(S), while the other entropy statistics do not show any change.

Lorenz System

Our second bifurcation analysis used the Lorenz system defined in Eq. (1.101), where the 𝜌 parameter was incremented from 80 to 190 in step sizes of 0.1. The same entropy statistics from the logistic map bifurcation analysis were used for the Lorenz example. As shown in Fig. 1.34, the entropy statistics reveal bifurcations in the system, with periodic dynamics having low entropy values. Similar to the logistic map bifurcation, the standard entropy tools did not identify all of the periodic windows (e.g., at 𝜌 ≈ 112 and 𝜌 ≈ 182), while the sublevel set entropy methods identify these as periodic windows. This example demonstrates the viability of the sublevel set entropy statistics to separate periodic from chaotic windows and detect bifurcations for both maps and flows.

Figure 1.34: Bifurcation analysis of the entropy statistics for the Lorenz system with 𝜌 ∈ [80, 190], step size Δ𝜌 = 0.1, 𝜎 = 10, and 𝛽 = 8/3. Green highlighted regions are periodic.

Computation Time

We now investigate the computational speed benefits of using sublevel set persistence when calculating the sample entropy compared to its direct application to signals. When approximate, sample, and permutation entropy are applied directly to a signal of length 𝑁, all 𝑁 − 𝑚 sequences are used. However, the computational demand is significantly decreased when using the sublevel set persistence diagram. This is because the sequence of lifetimes is much shorter than the original signal. Additionally, the length 𝑁 increases proportionally with the signal's sampling rate, while the number of persistence pairs in the sublevel set persistence diagram remains constant. We demonstrate the computational demand of each entropy statistic for both the Lorenz system and the logistic map in Fig. 1.35.

Figure 1.35: Computation time example for the Lorenz system (A) and logistic map (B) for each entropy statistic.

Our computation speed analysis shows that, as expected, approximate and sample entropy applied directly to the signal as ℎ𝑎(𝑥) and ℎ𝑠(𝑥) are slower than when applied to the sequence S and lifetimes 𝐿. Specifically, for the Lorenz system with 𝑁 = 10³, ℎ𝑎(𝐿) is approximately 45 times faster than ℎ𝑎(𝑥) and ℎ𝑠(𝐿) is approximately 9 times faster than ℎ𝑠(𝑥). Further, ℎ𝑎(𝐿) is approximately twice as fast as ℎ𝑎(S), and ℎ𝑠(𝐿) and ℎ𝑠(S) are approximately equivalent in computational speed. For the logistic map, the computational times are generally larger than for a flow of the same signal length 𝑁 because oscillations occur more frequently in maps. We would also like to note that the average conditional entropy ℎ̄𝑐, entropy rate ℎ𝑟, and Shannon entropy ℎ(S) have the fastest computation speeds, making them the most suitable for in-situ applications. The computational benefit of the sublevel set entropy statistics most likely stems from the 𝑂(𝑁 log(𝑁)) algorithmic complexity of the zero-dimensional sublevel set persistence of one-dimensional signals [164].
CHAPTER 2

PARAMETER SELECTION FOR PERMUTATION ENTROPY AND STATE SPACE RECONSTRUCTION

This chapter of my research is focused on choosing the optimal delay and dimension parameters for both permutation entropy and state space reconstruction. The chapter begins by introducing information entropy and then, specifically, permutation entropy as a time series analysis tool. Following this introduction, several delay and dimension parameter selection algorithms are introduced and then compared in Section 2.3.4 to choose an optimal method. This work is based on my publication "On the Automatic Parameter Selection for Permutation Entropy" [161]. The future work section is based on work that will soon be published on relating permutation entropy to state space reconstruction, allowing tools from TDA to be used for delay parameter selection in permutation entropy.

2.1 Permutation Entropy

Permutation Entropy (PE) has its origins in information entropy, which is a tool to quantify the uncertainty in an information-based system. Information entropy was first introduced by Shannon [212] in 1948 as Shannon entropy. Specifically, Shannon entropy measures the uncertainty in future data given the probability distribution of the data types in the original, finite dataset. Shannon entropy is calculated as 𝐻𝑠(𝑛) = −Σ 𝑝(𝑥𝑖) log 𝑝(𝑥𝑖), where 𝑥𝑖 represents a data type and 𝑝(𝑥𝑖) is the probability of that data type. In recent years information entropy has been heavily applied to the time series of dynamical systems. Several new variations of information entropy have been proposed to better accommodate these applications, e.g., approximate entropy [186], sample entropy [199], and PE [15], with a timeline shown in Fig. 2.1.

Figure 2.1: Timeline of entropy measurements for time series analysis.

These methods measure the predictability of a sequence through the entropy of the relative data types. However, PE considers the ordinal position of the data through permutations, which has been shown to be effective for analyzing the dynamic state and complexity of a time series [6,16,33,62,80,81,145]. PE is also noise robust for time series of sufficient length and relatively high signal-to-noise ratios, where the signal-to-noise ratio is the ratio between the useful signal and the background noise. Alternatively, if the time series is relatively short or has a low signal-to-noise ratio, it is suggested to use a different entropy measurement such as the coarse-grained entropies [190]. PE is quantified in a similar fashion to Shannon entropy with only a change in the data type to permutations (see Fig. 2.3), which I symbolically represent as 𝜋𝑖. PE has two parameters: the permutation dimension 𝑛 and embedding delay 𝜏, which are used when selecting the permutation size and spacing, respectively. PE is sensitive to these parameters [131,201,221], and there is no accurate selection approach for all applications. This introduces the motivation for this chapter: investigate automatic methods for selecting both PE parameters. There are currently three main approaches for selecting PE parameters: (1) parameters suggested by experts for a specific application, (2) trial and error to find suitable parameters, or (3) methods developed for phase space reconstruction. I will now overview a simple example to better understand these parameters.
Bandt and Pompe [15] defined PE according to

𝐻(𝑛) = − Σ 𝑝(𝜋𝑖) log 𝑝(𝜋𝑖), (2.1)

where 𝑝(𝜋𝑖) is the probability of a permutation 𝜋𝑖 and 𝐻(𝑛) is the permutation entropy of dimension 𝑛 with units of bits when the logarithm is of base 2. The permutation entropy parameters 𝜏 and 𝑛 are used when selecting the motif size, with 𝜏 determining the time difference between two consecutive points in a uniformly sub-sampled time series and 𝑛 as the permutation length or motif dimension. To form a permutation, begin with an element 𝑥𝑖 of the series 𝑋. Using this element, the dimension 𝑛, and delay 𝜏, define the vector 𝑣𝑖 = [𝑥𝑖, 𝑥𝑖+𝜏, 𝑥𝑖+2𝜏, . . . , 𝑥𝑖+(𝑛−1)𝜏]. The corresponding permutation 𝜋𝑖 of this vector is determined using its ordinal pattern. For example, consider the third-degree (𝑛 = 3) permutation shown in Fig. 2.2. The permutation type, which categorizes the permutation, is found by first ordering the 𝑛 values of the permutation smallest to largest, and then accounting for the order received.

Figure 2.2: Sample permutation formation for 𝑛 = 3 and 𝜏 = 1, yielding the permutation (1, 0, 2).

For the given permutation in Fig. 2.2, the resulting permutation is categorized as the sequence 𝜋𝑖 = (1, 0, 2), which is one of 𝑛! possible permutations for a dimension 𝑛; see Fig. 2.3 for the other possible permutations of 𝑛 = 3.

Figure 2.3: All possible permutation configurations for 𝑛 = 3: (0,1,2), (0,2,1), (1,0,2), (2,0,1), (1,2,0), and (2,1,0).

I can normalize PE using the maximum possible PE value, which occurs when all 𝑛! possible permutations are equiprobable according to 𝑝(𝜋₁) = 𝑝(𝜋₂) = . . . = 𝑝(𝜋𝑛!) = 1/𝑛!. The resulting normalized PE is

ℎ𝑛 = −(1/log₂ 𝑛!) Σ 𝑝(𝜋𝑖) log₂ 𝑝(𝜋𝑖). (2.2)

Many domain scientists who apply PE make general suggestions for 𝑛 and 𝜏 [76,248], which can be impractical for some applications. As an example, Popov et al. [189] showed the influence of the sampling frequency on the proper selection of 𝜏. As for the dimension 𝑛, there are general suggestions [201] on how to choose its value based on the vast majority of applications having an appropriate permutation dimension in the range 3 < 𝑛 < 8. Additionally, Bandt and Pompe [15] suggest that 𝑁 ≫ 𝑛, where 𝑁 is the length of the time series. However, these general guidelines for the selection of 𝑛 (and 𝜏) do not allow for application-specific suggestions. If I assume that suitable PE parameters correspond to optimal phase space reconstruction parameters, then a common approach for selecting 𝜏 and 𝑛 is to implement one of the existing methods for estimating the optimal Takens' embedding [225] parameters. Hence, some of the common methods for determining 𝜏 include the mutual information function approach [77], the first folding time of the autocorrelation function [25,86], and phase space methods [30]. Additionally, some common phase space reconstruction methods for determining 𝑛 include box-counting [22], the correlation exponent method [86], and false nearest neighbors [110]. Although the parameters in PE have similar names to their delay reconstruction counterparts, there are innate differences between ordinal patterns and phase space reconstruction which can also lead to inaccurate 𝑛 or 𝜏 values. In spite of these differences, permutations can be viewed as a symbolic representation of regions in the phase space through a binning process: permutations partition the phase space based on the ordinal rankings of the embedded vectors.
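A minimal sketch of Eq. (2.2) in Python follows. The ordinal patterns are encoded via argsort, which is a different labeling convention from Fig. 2.2, but it yields the same entropy since only the pattern frequencies matter:

```python
import numpy as np
from math import factorial
from itertools import permutations

def permutation_entropy(x, n=3, tau=1):
    """Normalized permutation entropy (Eq. 2.2) of a 1D signal."""
    x = np.asarray(x)
    # delay-vector index offsets [0, tau, ..., (n-1)*tau]
    idx = np.arange(0, (n - 1) * tau + 1, tau)
    # one ordinal pattern per delay vector v_{n,tau}(i)
    pats = [tuple(np.argsort(x[i + idx])) for i in range(len(x) - (n - 1) * tau)]
    lookup = {p: k for k, p in enumerate(permutations(range(n)))}
    counts = np.bincount([lookup[p] for p in pats], minlength=factorial(n))
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(factorial(n))
```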
This relationship between phase space and permutations opens up the potential for some of the classic phase space reconstruction methods for selecting both 𝑛 and 𝜏 to be a plausible solution for selecting the same parameters for PE. Even with the possibility that phase space reconstruction methods for selecting 𝜏 and 𝑛 may work for choosing the synonymous parameters of PE, there are a few practical issues that preclude using parameters from time series reconstruction for PE. One issue stems from many of the methods (e.g., false nearest neighbors and mutual information) still requiring some degree of user input through either a parameter setting or user interpretation of the results. This introduces issues for practitioners working with numerous data sets or those without enough expertise in the subject area to interpret the results. Another issue that arises in practice is that the algorithmic implementation of existing time series analysis tools is nontrivial. This hinders these tools from being autonomously applied to large datasets. For example, the first minimum of the mutual information (MI) function is often used to determine 𝜏. However, in practice there are limitations to using mutual information to analyze data without operator intervention to sift through the minima and choose the first 'prominent' one. This is due to the possibility that the mutual information function can have small kinks that can be erroneously picked up as the first minimum.

Figure 2.4: Some possible modes of failure for selecting 𝜏 for phase space reconstruction using classical methods: (a) mutual information registering a false first minimum (at 𝜏 = 5) as a suitable delay, generated from a periodic Lorenz system; (b) mutual information being mostly monotonic and not having a distinct local minimum to determine 𝜏, generated from EEG data [7]; and (c) autocorrelation failing from a moving average of ECG data provided by the MIT-BIH Arrhythmia Database [154], not reaching the folding time 𝜌 = 1/𝑒 until 𝜏 = 283.

Figure 2.4a shows this situation, where the first minimum of the mutual information function for a periodic Lorenz system is actually an artifact and the actual delay should be at the prominent minimum with 𝜏 = 11. Further, the mutual information function approach may also fail if the mutual information is monotonic. This is a possibility since there is no guarantee that minima exist for mutual information [13]. An example of this mode of failure is shown in Fig. 2.4b, which was generated using EEG data [7] from a patient during a seizure. A mode of failure for the autocorrelation method can occur when the time series is non-linear or has a moving average. In this case, the autocorrelation function may reach the folding time at an unreasonably large value of 𝜏. As an example, Fig. 2.4c shows the autocorrelation not reaching the folding time of 𝜌 = 1/𝑒 until a delay of 𝜏 = 283 for electrocardiogram data provided by the MIT-BIH Arrhythmia Database [154]. The last mode of failure concerns choosing the permutation dimension 𝑛 to be equal to the embedding dimension optimized using delay embedding from time series analysis. This can lead to an overly large embedding dimension [47] (𝑛 ≫ 8), which would make the calculation of PE impractical because the number of possible permutations 𝑛! would become too large.
All of these possible modes of failure can make using classical phase space methods for selecting $\tau$ and $n$ unreliable, thus necessitating new tools or modifications that make selecting $\tau$ and $n$ for PE more robust and less user-dependent. These shortcomings lead to the problem that I address in this chapter: given a sufficiently sampled/oversampled and noisy time series $X = \{x_t\}_{t \in \mathbb{R}^+}$, how can I reliably and systematically define appropriate dimension $n$ and time delay $\tau$ values for computing the corresponding PE?

The first contribution towards answering this question is detailed in Section 2.2, which addresses the automatic selection of the time delay $\tau$. In Section 2.2.1 I combine the Least Median of Squares (LMS) approach for outlier detection with Fourier transform theory to derive a formula for the maximum significant frequency in the Fourier spectrum, under the assumption that $X$ is contaminated by Gaussian measurement noise. This formula yields a cutoff value whose only input, besides the time series, is a desired percentile from the Probability Density Function (PDF) of the Fourier spectrum. Once this value is obtained, Nyquist's sampling theorem is used to compute an appropriate $\tau$ value. The second contribution is an approach that I develop in Section 2.2.2, which uses Multi-scale Permutation Entropy (MPE) for finding $\tau$. I show how MPE can be used to find the main period of oscillation for a time series derived from a periodic system. Building upon this, I show how the method can be extended to find $\tau$ for a chaotic time series by using the first maximum in the MPE, as it satisfies Nyquist's sampling theorem. The third contribution to the automatic selection of $\tau$ is through the analysis of Permutation Auto-Mutual Information (PAMI) [135]. PAMI is an existing method for measuring the mutual information of permutations; however, I tailor this method specifically to select $\tau$ for PE. The final contribution towards answering the posited question is our evaluation of the ability of existing tools for computing an embedding dimension to provide an appropriate value for the PE parameter $n$.

Figure 2.5: Overview of methods investigated for automatically calculating both the delay $\tau$ and dimension $n$ for permutation entropy.

I compare dimension $n$ values computed from False Nearest Neighbors (FNN, Section 2.3.1), Singular Spectrum Analysis (SSA, Section 2.3.2), and MPE (Section 2.2.2). While I use existing methods for performing the FNN and SSA analyses, for the MPE-based approach I use a criterion established in prior works [201], which requires finding $\tau$ first. I made this process automatic through the selection of $\tau$ from our second contribution. This chapter is organized as follows. I first go into detail on some existing methods for selecting both $\tau$ and $n$. Specifically, in Section 2.2 I provide a detailed explanation for selecting $\tau$ using existing, automatic methods such as autocorrelation in Section 2.2.3 and Mutual Information (MI) in Section 2.2.4. Additionally, I modify and develop/tailor methods to automatically select $\tau$. These methods include a frequency approach in Section 2.2.1, MPE in Section 2.2.2, and PAMI in Section 2.2.5.
In Section 2.3 I expand on the process for selecting $n$ using False Nearest Neighbors (FNN) in Section 2.3.1 and Singular Spectrum Analysis in Section 2.3.2. In Section 2.3.3, I explain our algorithm for automatically selecting $n$ using MPE. After introducing each method, in Section 2.3.4 I contrast all of these methods and make conclusions on their viability by comparing the resulting parameters to those suggested by PE experts. An overview of the methods investigated for automatically calculating both $\tau$ and $n$ is shown in Fig. 2.5. All the functions used and developed in this work are available in Python through GitHub [161].

Figure 2.6: Overview of our frequency domain approach for finding the maximum significant frequency $f_{\max}$ using LMS for a signal contaminated with GWN.

2.2 Embedding Delay Parameter Selection Methods

The delay embedding parameter $\tau$ is used to uniformly subsample the original time series. To elaborate, consider the time series $X = \{x_i \mid i \in \mathbb{N}\}$. By applying the delay $\tau \in \mathbb{N}$, a new sub-sampled series is defined as $X(\tau) = [x_0, x_\tau, x_{2\tau}, \ldots]$. In order to obtain a stable and automatic method for estimating an optimal value of $\tau$ I investigate: a novel frequency-based analysis that I describe in Section 2.2.1, Multi-scale Permutation Entropy (MPE) (Section 2.2.2), autocorrelation (Section 2.2.3), and the Mutual Information function (MI) (Section 2.2.4). I recognize, but do not investigate, some other methods for finding $\tau$ such as diffusion maps [20] and phase space expansion [30].

2.2.1 Frequency Approach for Embedding Delay

In this section we develop a method for finding the noise floor in the Fourier spectrum using Least Median of Squares (LMS) [143]. We then use the noise floor to find the maximum significant frequency of a signal contaminated with additive Gaussian white noise (GWN). Our method is based on finding the maximum significant frequency in the Fourier spectrum and on the Nyquist sampling criterion. To motivate the development of this approach, I begin with the frequency criterion developed by Melosik and Marszalek [148], which agrees with the Nyquist sampling theorem [124] for choosing a suitable sampling frequency $f_s$ as
\[ 2 f_{\max} < f_s < 4 f_{\max}, \tag{2.3} \]
where $f_{\max}$ is the maximum significant frequency in the signal. Melosik and Marszalek [148] showed that a sampling frequency within this range is appropriate for subsampling an oversampled signal, thus mitigating the effect of temporal correlations between neighboring points in densely sampled signals. However, the automatic identification of $f_{\max}$ from an oversampled signal is not trivial. Melosik and Marszalek [148] selected a maximum significant frequency by inspecting the normalized Fourier spectrum and using a threshold cutoff of approximately 0.01 for a noise-free chaotic Lorenz system. This made visually finding the maximum frequency significantly easier but did not provide guidance on how to algorithmically find $f_{\max}$. Further, attempting to algorithmically adopt the approach suggested by Melosik and Marszalek [148] resulted in large errors, especially in the presence of a low signal-to-noise ratio. This motivated the search for an automatic, data-driven approach for identifying the noise floor, which could then be used to find the maximum significant frequency.
To do this I develop a method based on applying least median of squares to the Fourier spectrum. The assumptions inherent to our method are:

1. The time series is not undersampled. The purpose of the method is to determine a suitable delay parameter for subsampling the signal, which would be meaningless if the time series were undersampled.

2. The Fourier transform of the time series must have fewer than 50% of its points with significant amplitudes. This requirement stems from the limitations of least median of squares regression.

3. The noise in the signal is approximately GWN; otherwise, the ensuing statistical analysis becomes inapplicable. Violating this assumption can yield false peak detections, which would lead to an incorrect delay parameter.

We find suitable cutoffs for obtaining $f_{\max}$ of the signal by using the noise floor determined from the LMS fit, and compute a suitable embedding delay according to
\[ \tau = \frac{f_s}{\alpha f_{\max}}, \tag{2.4} \]
where I set $\alpha = 2$, thus agreeing with the range in Eq. (2.3) and the Nyquist sampling criterion. Figure 2.6 summarizes the frequency approach for $\tau$ with the use of our LMS method for finding the noise floor in the Fourier spectrum. This process begins with computing the Fourier spectrum of the signal, followed by fitting a 0-D (constant) LMS regression line to the noise in the Fourier spectrum. This provides statistical information about the Probability Density Function (PDF) of the noise level. The PDF is used to determine the Cumulative Distribution Function (CDF), which I use to determine a meaningful noise cutoff in the Fourier spectrum. However, it is assumed that the noise is approximately GWN for this method to hold statistical significance. This cutoff is used to separate out the highest significant frequency in the Fourier spectrum, $f_{\max}$, which is used to find a suitable embedding delay $\tau$ based on the frequency criterion in Eq. (2.4). In the following paragraphs I review our use of the LMS and the derivation of the PDF of the Fourier spectrum of GWN. I then show how to combine the LMS method with the resulting PDF expression to find a suitable noise floor cutoff and the corresponding maximum significant frequency.

Least Median of Squares: LMS [143] is a robust regression technique used when up to 50% of the data is corrupted by outliers; for our application, anything other than noise in the Fourier spectrum is considered an outlier. In comparison to the widely used least sum of squares (LS) algorithm, LMS replaces the sum with the median, which makes LMS resilient to outliers. The difference between LS and LMS is highlighted as
\[ LS: \ \min \sum_i r_i^2, \qquad LMS: \ \min \left( \operatorname{median}_i \, r_i^2 \right), \tag{2.5} \]
where $r_i$ are the residuals; the subscript $i$ on the median, like that on the sum, signifies that it is taken over all residuals. Figure 2.7 shows an example application of linear LMS regression.

Figure 2.7: LMS linear regression with 45% outliers. Results match those found in [143].

Specifically, this figure shows 110 data points drawn from the line $y = x + 1$ with added GWN of zero mean and 0.1 standard deviation. The data is corrupted with 90 outliers centered around $(3, 2)$ with a normal distribution of 1.0 along $x$ and 0.6 along $y$. Figure 2.7 shows that the linear regression results closely match the actual trend line, with the fitted line being $y = 0.998x + 1.012$ in comparison to the actual $y = x + 1$.
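The constant (0-D) variant of LMS used on the Fourier spectrum amounts to finding the midpoint of the shortest interval containing half the data (Rousseeuw's location estimator). The following is a minimal sketch under that formulation; the function name and the synthetic example mirroring Fig. 2.7 are our own.

```python
import numpy as np

def lms_constant_fit(y):
    """0-D least median of squares fit: the constant b minimizing
    median_i (y_i - b)^2, i.e. the midpoint of the shortest half sample."""
    y = np.sort(np.asarray(y, dtype=float))
    h = len(y) // 2 + 1                        # half-sample size
    widths = y[h - 1:] - y[: len(y) - h + 1]   # widths of all h-point intervals
    j = np.argmin(widths)                      # shortest interval wins
    return 0.5 * (y[j] + y[j + h - 1])

# Example: recover a level of 1.0 buried in 45% outliers.
rng = np.random.default_rng(1)
inliers = 1.0 + 0.1 * rng.standard_normal(110)
outliers = 3.0 + 1.0 * rng.standard_normal(90)
print(lms_constant_fit(np.concatenate([inliers, outliers])))  # ~1.0
```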
PDF and CDF of the magnitude of the Fast Fourier Transform of GWN: This section reviews the probability density function (PDF) and cumulative distribution function (CDF) of the Fourier Transform (FT) of white noise. Additionally, this section derives the location of the theoretical maximum of the PDF. The FT distribution of GWN [197] is described as
\[ P_{|X|}(|X|) = \frac{2|X|}{E_w \sigma_x^2} \, e^{-\frac{|X|^2}{E_w \sigma_x^2}}, \tag{2.6} \]
where $|X|$ is the magnitude of the FT of the GWN, $P_{|X|}$ is the probability density function of $|X|$, $\sigma_x$ is the standard deviation of the GWN, and $E_w$ is the window energy or number of discrete transforms taken during the FT. By setting the first derivative of $P_{|X|}$ with respect to $|X|$ equal to zero, the theoretical maximum of the PDF is located at
\[ |X|_{\max} = \sqrt{\frac{E_w \sigma_x^2}{2}}. \tag{2.7} \]

Figure 2.8: (a) Theoretical PDF for GWN. (b) CDF for GWN with an example cutoff at the 99% $CP$.

We calculate the CDF corresponding to the PDF in Eq. (2.6) by combining it with the CDF of a Rayleigh distribution as [173]
\[ CP_{|X|}(|X|) = 1 - e^{-\frac{|X|^2}{E_w \sigma_x^2}}, \tag{2.8} \]
where $CP_{|X|}$ is the cumulative probability of $|X|$.

Finding the Noise Floor: Our approach for finding the noise floor combines LMS with Eqs. (2.6) and (2.7). Specifically, I utilize LMS to obtain a constant fit to the Fast Fourier Transform (FFT) of the signal, which results in an approximate value of $|X|_{\max}$, the value of $|X|$ at the maximum of $P_{|X|}$. Using $|X|_{\max}$ from the LMS fit, I then find the standard deviation $\sigma_x$ of the distribution from Eq. (2.7), which is used to find a cutoff based on a set cumulative probability in Eq. (2.8). We begin by showing the accuracy of the LMS fit for finding $|X|_{\max}$. Our example uses GWN with a mean of zero and standard deviation of 0.035 with 1000 data points. Taking the FFT of the GWN (see Fig. 2.9A) results in the distribution shown in Fig. 2.9B. The distribution shows an LMS fit of 8.215 compared to the theoretical maximum of the PDF from Eq. (2.7) of 7.826, which is approximately 5% greater. This shows that the LMS fit accurately locates $|X|_{\max}$. Additionally, the theoretical shape of the PDF in Fig. 2.9B is shown to be very similar to the actual distribution.

Figure 2.9: (A) FFT of GWN with 0.035 standard deviation and zero mean, with the location of the theoretical maximum of the PDF and the LMS regression value. (B) Distribution of the GWN in the Fourier spectrum with the overlapped theoretical PDF and the locations of the theoretical maximum of the PDF and the LMS regression value.

Next, our approach utilizes Eq. (2.8) and the $\sigma_x$ derived from Eq. (2.7) to find the cutoff value $|X|_{\text{cutoff}}$. The $|X|_{\text{cutoff}}$ for a desired cumulative probability $CP$ is found by solving Eq. (2.8) for $|X|$ as
\[ |X|_{\text{cutoff}} = \sqrt{-E_w \sigma_x^2 \ln(1 - CP)}. \tag{2.9} \]
In order to make $|X|_{\text{cutoff}}$ robust to normalization and scaling of the FFT, I define the ratio $C$ between the suggested cutoff from Eq. (2.9) and the maximum of the PDF from Eq. (2.7) as
\[ C = \frac{|X|_{\text{cutoff}}}{|X|_{\max}} = \sqrt{-2 \ln(1 - CP)}. \tag{2.10} \]

Example Cutoff: An example of how Eqs. (2.7) and (2.9) are used is shown in Fig. 2.8, where the maximum of the PDF and the cutoff for $CP = 99\%$ are marked in Fig. 2.8a and 2.8b, respectively. For this example, I find the ratio $C$ to be approximately 3.03 for a 99% probability. In addition, I suggest a cutoff ratio $C = 6$ for signals with fewer than $10^4$ data points. This yields an expected probability of approximately $10^{-8}$ that a point in the FFT of the GWN attains a magnitude greater than $|X|_{\text{cutoff}}$.
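The whole pipeline of Fig. 2.6 (FFT, LMS noise floor, cutoff, $f_{\max}$, $\tau$) can be condensed into a few lines. The sketch below reuses `lms_constant_fit` from the sketch above and defaults to the cutoff ratio $C = 6$ suggested for short signals; it assumes a clear spectral peak exists and is illustrative rather than the released implementation.

```python
import numpy as np

def frequency_approach_delay(x, fs, C=6.0, alpha=2):
    """Sketch of the frequency approach: a 0-D LMS fit locates the noise
    PDF peak |X|_max, the cutoff is C*|X|_max (Eq. (2.10) with C = 6 for
    signals under ~1e4 points), and tau follows from Eq. (2.4)."""
    X = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X_max = lms_constant_fit(X[1:])        # noise floor estimate (skip DC)
    f_max = freqs[X > C * X_max].max()     # maximum significant frequency
    return max(1, int(fs / (alpha * f_max)))

rng = np.random.default_rng(2)
t = np.arange(0, 20, 0.01)                 # fs = 100 Hz
x = np.sin(2 * np.pi * 2 * t) + 0.1 * rng.standard_normal(t.size)
print(frequency_approach_delay(x, fs=100))  # ~25 for a 2 Hz tone
```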
Alternatively, Eq. (2.10) can be used to calculate a different value of $C$ based on the desired probability and the length of the signal.

2.2.2 Multi-scale Permutation Entropy for Selecting Delay

In this section I develop a method based on Multi-scale Permutation Entropy (MPE) to find the periodicity of a signal, which is then used to find a suitable delay parameter. MPE is a method of applying permutation entropy over a range of delays, originally for analyzing physiological time series [51]. Zunino et al. [252] showed how the first maximum in the MPE plot arises when $\tau$ matches the characteristic time delay $\tau_r$. Furthermore, the periodicity can be captured by the first dip in the MPE plot, shown in Fig. 2.10 at the location $d_2$, when the delay $\tau$ matches the characteristic time delay $\tau_r$.

Figure 2.10: (right) Resulting MPE plot for (left) a $2P$-periodic time series with example embedding delays $d_0$, $d_1$, and $d_2$.

Figure 2.10 shows embedding delays $d_0$, $d_1$, and $d_2$, calculated as $d = \tau / f_s$, as well as their corresponding locations on a normalized MPE plot. This toy MPE plot shows that the normalized MPE reaches its first maximum when the delay is roughly $d_1$, which corresponds to an approximately even distribution of permutations. A second observation, as mentioned previously, is that at $d_2$ (the first dip in the MPE plot) there is a resonance or aliasing effect caused by $\tau \approx \tau_r$, which can be used to determine the period of the time series. This is based on the embedding delay at $d_2$ causing the embedding vector size $V = d(n-1)$ to be approximately half of the periodicity $P$, which can be expressed as
\[ d_2 = \frac{1}{2} P = \frac{\tau_r}{f_s} = \frac{1}{2f}, \tag{2.11} \]
where $P$ is the main period of oscillation, $f$ is the main frequency of the time series corresponding to $P$, and $f_s$ is the sampling frequency. The dip in the permutation entropy (PE) when the condition of Eq. (2.11) is met is caused by an aliasing effect, which reduces PE through more regularity in the permutation distribution. We use the criterion of Melosik and Marszalek [148] to determine a suitable delay from the location of the first dip at $d_2$. Their criterion states that the sampling frequency must fall within the range shown in Eq. (2.3). This range led to Eq. (2.4), which is used to calculate $\tau$. However, for MPE, I substitute into Eq. (2.4) the values $f_s = 2 f \tau_r$ from Eq. (2.11) and $f_{\max} = f$. These substitutions allow Eq. (2.4) to reduce to
\[ \tau = \frac{2}{\alpha} \tau_r, \tag{2.12} \]
where $\alpha \in [2, 4]$. These simplifications show that $\tau$ depends only on the delay $\tau_r$ that causes resonance when applying MPE. However, for a chaotic time series, the dip at $\tau_r$ may not be present due to nonlinear trends. To address this issue, I will first investigate the three dominant regions of the MPE plot, which will also be located for a chaotic time series example. I will then propose a new, automatic method for selecting $\tau$ that agrees with the frequency criterion stated in Eq. (2.12). Additionally, in Section A.1 of the appendix I investigate the robustness of the method to noise, and in Section C.1 of the appendix I provide the algorithm for finding $\tau$ using MPE.

MPE Regions: Riedl et al. [201] showed that the MPE plot can be separated into three distinct regions, as described below and shown in Fig. 2.11. Region A shows a gradual increase in the permutation entropy until reaching a maximum at the transition between regions A and B.
Oversampling, i.e., a low value of $\tau$, causes the motif distribution corresponding to the permutation entropy to be heavily weighted toward the purely increasing and decreasing motifs (motifs (0,1,2) and (2,1,0) for $n = 3$ in Fig. 2.3). This effect was coined the "redundancy effect" by De Micco et al. [58], meaning that sufficiently low values of $\tau$ result in redundant motifs. However, as $\tau$ increases, the motif distribution becomes more equiprobable. When the motif probabilities reach maximum equiprobability, the permutation entropy is at a maximum, which marks the transition from region A to B. Region B shows a slight dip to the first minimum. This reduction in permutation entropy is caused by the aliasing or resonance from the value of $d$ approaching half the main period length. At the transition from B to C, the resonance is reached, which provides information on the main frequency and period of the time series. Region C has possible additional minima and maxima from additional alignments of the embedding vector $d$ with multiples of the main period. This region was referred to as the "irrelevant region" by De Micco et al. [58], due to effectively large values of $\tau$ forcing the delayed sampling frequency to fall below the Nyquist sampling rate as described by the lower bound in Eq. (2.3).

Figure 2.11: The three regions of the MPE plot for a periodic signal: (A) redundant, (B) resonant, and (C) irrelevant.

MPE Example with a Chaotic Time Series: The periodic time series above was used to show and explain the regions that develop in an MPE plot as well as an MPE-based method for determining a suitable embedding delay $\tau$. In this section I further show the applicability of this approach to chaotic signals using the $x$-coordinate of the Lorenz system as an example. I simulate the Lorenz equations
\[ \frac{dx}{dt} = \sigma(y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z, \tag{2.13} \]
with a sampling rate of 100 Hz and the parameters $\rho = 28.0$, $\sigma = 10.0$, and $\beta = 8/3$. This system was solved for 100 seconds, and only the last 15 seconds of the time series are used. Figure 2.12 shows the result of applying MPE to the simulated Lorenz system.

Figure 2.12: MPE plot for the $x$ coordinate of the Lorenz system. Additionally, points in the MPE plot with their corresponding subsampled time series are shown for the redundant, resonant, and irrelevant regions as described in Section 2.2.2.

Figure 2.12 shows similarities to Fig. 2.11, with a clear maximum at the boundary between regions A and B, albeit with no obvious minimum. Therefore, a different distinct feature needs to be used to determine $\tau_r$: I suggest using the first maximum to find $\tau$ because this delay is likely to fall within the region described by Eq. (2.12).

2.2.3 Autocorrelation for Embedding Delay

Autocorrelation is a traditional method for selecting $\tau$ for phase space reconstruction using the correlation coefficient between the time series and its $\tau$-lagged version. This method was first introduced by Box et al. [25]. Typically, the autocorrelation function is computed as a function of $\tau$ and, as a rule of thumb, a suitable delay is found when the correlation between $x(t)$ and $x(t + \tau)$ reaches the first folding time, i.e., when $\rho \leq 1/e$ [106]. The two prominent correlation techniques commonly used when implementing an autocorrelation-based approach for finding $\tau$ are Pearson correlation (see Section A.2 of the appendix) and Spearman's correlation (see Section A.2 of the appendix).
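The folding-time rule is simple to automate. The following is a minimal sketch using the Pearson variant (the Spearman version would substitute `scipy.stats.spearmanr`); the function name and test signal are our own illustrative choices.

```python
import numpy as np

def autocorr_delay(x, rho=1 / np.e):
    """Sketch: smallest lag tau at which the Pearson autocorrelation of x
    first drops to the folding time rho <= 1/e."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    for tau in range(1, len(x) // 2):
        r = np.corrcoef(x[:-tau], x[tau:])[0, 1]
        if r <= rho:
            return tau
    raise ValueError("autocorrelation never reached the folding time")

t = np.arange(0, 50, 0.01)
# For a sine with a 100-sample period, the folding time lands near tau = 19.
print(autocorr_delay(np.sin(2 * np.pi * t)))
```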
Additionally, an example demonstrating how to calculate $\tau$ using autocorrelation, and the difference between the two correlation methods, is provided in Section A.2 of the appendix.

2.2.4 Mutual Information for Embedding Delay

Mutual information (MI) can be used to select the embedding delay $\tau$ based on a minimum in the joint probability between two sequences. The mutual information between two discrete sequences was first formalized by Shannon [211] as
\[ I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}, \tag{2.14} \]
where $X$ and $Y$ are the two sequences, $p(x)$ and $p(y)$ are the marginal probabilities of the elements $x$ and $y$, and $p(x, y)$ is their joint probability. Fraser and Swinney [77] showed that for a chaotic time series the MI between the original sequence $x(t)$ and the delayed version $x(t + \tau)$ will decrease as $\tau$ increases until reaching a first minimum. At this minimum, the delay $\tau$ allows the individual data points to share a minimum amount of information, which indicates sufficiently separated data points. While this delay value was specifically developed for phase space reconstruction, it is also used for the selection of the PE parameter $\tau$. We would like to point out that, in general, there is no guarantee that local minima exist in the mutual information, which is a serious limitation of computing $\tau$ using this method. All MI methods can be applied to either ranked or unranked data. We investigate four methods for estimating $\tau$ for PE using MI: MI with equal-sized partitions, MI with adaptive partitions, and two permutation-based MI estimation methods. For details on these methods please see Section A.3 of the appendix. To determine the optimal MI approximation method for selecting $\tau$ for PE, Fig. 2.13 compares the $\tau$ values computed from each of the MI methods with the corresponding values suggested by experts. The comparison shows that the adaptive partitioning method of Section A.3 results in an accurate selection of $\tau$ for the majority of systems. We therefore use the adaptive partitioning estimation method when making comparisons to other methods. For the exact values of $\tau$ from each of the MI methods please reference Table A.1 in the appendix.

Figure 2.13: A comparison between the calculated and suggested values for the delay parameter $\tau$ for multiple MI approximation methods. The methods investigated were the equal-sized partition method, Kraskov et al. methods 1 and 2, and the adaptive partitioning approach.

2.2.5 Permutation Auto-mutual Information for Selecting Delay

As shown in Section 2.2.4, mutual information is a useful method for selecting $\tau$ for phase space reconstruction. However, it does not account for the permutation distribution when selecting $\tau$, which can lead to inaccuracies in the computed PE. To circumvent this issue, we develop a new method for selecting $\tau$ using Permutation Auto-Mutual Information (PAMI) [135], which was originally developed to detect dynamic changes in brain activity. We tailor PAMI to the selection of the permutation entropy parameter $\tau$ for the first time. This is done by measuring the joint probability between the original permutations formed when a delay of $\tau = 1$ is used and the permutations formed as $\tau$ is incremented. PAMI is defined as
\[ I_p(\tau, n) = H_{x(t,n)} + H_{x(t+\tau,n)} - H_{x(t,n),\,x(t+\tau,n)}, \tag{2.15} \]
where $H$ is the permutation entropy described in Eq. (2.1). We suggest an optimal delay $\tau$ for a given dimension $n$ when PAMI is at a minimum.
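A compressed sketch of Eq. (2.15) follows, under the simplification that the base permutations are formed with unit internal delay as described above; `ordinal_patterns` and `pami` are illustrative names, and the sinusoidal example parameters are our own.

```python
import math
import numpy as np

def ordinal_patterns(x, n):
    """Integer label of the ordinal pattern at each time index (unit delay)."""
    vecs = np.lib.stride_tricks.sliding_window_view(x, n)
    ranks = np.argsort(np.argsort(vecs, axis=1), axis=1)
    return ranks @ n ** np.arange(n)        # encode each rank tuple uniquely

def pami(x, tau, n=2):
    """Permutation auto-mutual information, Eq. (2.15):
    I_p = H(pi_t) + H(pi_{t+tau}) - H(pi_t, pi_{t+tau})."""
    labels = ordinal_patterns(np.asarray(x, dtype=float), n)
    a, b = labels[:-tau], labels[tau:]

    def H(sym):
        _, c = np.unique(sym, axis=0, return_counts=True)
        p = c / c.sum()
        return -np.sum(p * np.log2(p))

    return H(a) + H(b) - H(np.column_stack([a, b]))

t = np.linspace(0, 20 * np.pi, 2000)
scores = [pami(np.sin(t), tau, n=2) for tau in range(1, 51)]
# The method suggests tau at the (first) PAMI minimum; for n = 2 the score
# approaches zero there, which makes the minimum easy to detect.
print("suggested tau =", 1 + int(np.argmin(scores)))
```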
This delay corresponds to minimum shared information between the original permutations with $\tau = 1$ and their time-lagged counterparts. Applying this method to a simple sinusoidal function yields Fig. 2.14 for $n \in [2, 5]$ and $\tau \in [1, 50]$.

Figure 2.14: PAMI results for the sinusoidal function with $n \in [2, 5]$ and $\tau \in [1, 50]$. The figure shows an optimal window size $\tau(n-1) \approx 25$.

As shown, the window size is approximately independent of the dimension $n$, with an optimal window $\tau(n-1) \approx 25$ for the example. Through our analysis of the minimum PAMI as a function of the window size, we have developed a new method for selecting the optimal embedding window. However, we need the embedding dimension to suggest an optimal delay. Hence, we implement the common choice of $n$ ranging from $4 \leq n \leq 6$ for PE [201]. To reduce the computational demand, we suggest using permutation dimension $n = 2$ to find an optimal window size. In addition to the reduced computational demand of using $n = 2$, we found that $I_p(n = 2) \approx 0$ at the first minimum, which makes identifying this first minimum even simpler.

2.3 Embedding Dimension Parameter Selection Methods

The second parameter for permutation entropy that needs to be automatically identified is the embedding dimension $n$. The methods for determining $n$ fall into one of two categories: (1) independently determining $n$ and $\tau$, and (2) simultaneously determining $n$ and $\tau$ based on the width of the embedding window. For the first category, we investigate using the method of False Nearest Neighbors (FNN) [110] in Section 2.3.1 and Singular Spectrum Analysis (SSA) [26] in Section 2.3.2. For the second category, we contribute to the selection of $n$ by developing an automatic method using MPE in Section 2.3.3. This method combines the results for finding $\tau$ through MPE in Section 2.2.2 with the work of Riedl et al. [201]. We acknowledge that our work does not include other commonly used methods for independently calculating $n$, such as box-counting [48], the largest Lyapunov exponent [240], and Kolmogorov-Sinai entropy [182].

2.3.1 False Nearest Neighbors for Embedding Dimension

False Nearest Neighbors (FNN) is one of the most commonly used methods for geometrically determining the minimum embedding dimension $n$ for state space reconstruction [110]. For this method the time series is repeatedly embedded into a sequence of $m$-dimensional Euclidean spaces for a range of increasing values of $m$. The idea is that when the minimum embedding dimension is reached, i.e. $m \geq n$, the distance between neighboring points does not significantly change as we keep increasing $m$. In other words, the Euclidean distance $d_m(i, j)$ between the point $P_i \in \mathbb{R}^m$ and its nearest neighbor $P_j \in \mathbb{R}^m$ changes minimally when the embedding dimension increases to $m + 1$. If the dimension $m$ is not sufficiently high, then two points are false neighbors: their pairwise distance significantly increases when incrementing $m$. This change in the distance between nearest neighbors embedded in $\mathbb{R}^m$ and $\mathbb{R}^{m+1}$ is quantified using the ratio
\[ R_i = \sqrt{\frac{d_{m+1}^2(i, j) - d_m^2(i, j)}{d_m^2(i, j)}}. \tag{2.16} \]
$R_i$ is compared to the tolerance threshold $R_{\text{tol}}$ to identify false neighbors as those with $R_i > R_{\text{tol}}$. In this work, we select $R_{\text{tol}} = 15$ as used by Kennel et al. [110]. By applying this threshold over all points, we can find the number of false neighbors as a percentage, $P_{\text{FNN}}$. If there is no noise in the system, $P_{\text{FNN}}$ should reach zero when a sufficient dimension is reached.
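A compact sketch of the FNN percentage, using the identity that $d_{m+1}^2 - d_m^2$ is just the squared difference of the coordinate added in dimension $m+1$, is given below; the function name, the k-d tree shortcut, and the quasiperiodic test signal are our own choices, not the document's released code.

```python
import numpy as np
from scipy.spatial import cKDTree

def percent_fnn(x, tau, m_max=10, Rtol=15.0):
    """Sketch: percent false nearest neighbors (Eq. (2.16)) for m = 1..m_max-1."""
    x = np.asarray(x, dtype=float)
    out = []
    for m in range(1, m_max):
        L = len(x) - m * tau            # vectors usable in both R^m and R^(m+1)
        emb = np.column_stack([x[i * tau : i * tau + L] for i in range(m)])
        nxt = x[m * tau : m * tau + L]  # coordinate added by dimension m + 1
        d, j = cKDTree(emb).query(emb, k=2)   # column 0 is the point itself
        d1, j1 = d[:, 1], j[:, 1]
        good = d1 > 0
        # R_i = sqrt((d_{m+1}^2 - d_m^2) / d_m^2) = |new coord diff| / d_m
        R = np.abs(nxt[good] - nxt[j1[good]]) / d1[good]
        out.append(100.0 * np.mean(R > Rtol))
    return out

t = np.arange(0, 100, 0.1)
x = np.sin(1.3 * t) + np.sin(0.7 * t)
print(percent_fnn(x, tau=12, m_max=6))  # drops toward zero once m suffices
```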
However, with additive noise present, $P_{\text{FNN}}$ may never reach zero. Thus, it is commonly suggested to use a percent-FNN cutoff for finding a sufficient dimension $n$. We use the typically chosen cutoff $P_{\text{FNN}} < 10\%$, which is suitable for most applications when moderate noise is present.

2.3.2 Singular Spectrum Analysis for Embedding Dimension

The singular spectrum analysis method was first introduced as a tool to find trends and prominent periods in a time series [26]. Leles et al. [129] summarized the SSA procedure as (1) immersion, (2) Singular Value Decomposition (SVD), (3) grouping, and (4) diagonal averaging. Specifically, immersion embeds the time series into a dimension $L$ to form a Hankel matrix, SVD factors the resulting matrix, grouping combines the matrices that are similar in structure, and diagonal averaging reconstructs the time series using the combined matrices. The needed embedding dimension is determined from the SVD by calculating the ratio
\[ D = \frac{g_L}{g_r} \tag{2.17} \]
of the sum of the $L$th diagonal entries $g_L$ to the sum of the total diagonal entries $g_r$. When $D$ exceeds 0.9, we consider the dimension to be high enough and set $n = L$, which can then be used as the embedding dimension for permutation entropy.

2.3.3 Multi-scale Permutation Entropy for Permutation Dimension

Riedl et al. [201] showed how MPE can be used to determine an embedding dimension $n$. This method requires the embedding delay $\tau$ to be set to the length of the main period of the signal, as shown in Section 2.2.2. The theory behind the method is based on normalizing the MPE according to
\[ h_n' = \frac{H(n)}{n - 1}, \tag{2.18} \]
where $h_n'$ is the PE normalized using the embedding dimension and $H(n)$ is the PE calculated from Eq. (2.1). Riedl et al. [201] determine the embedding dimension by incrementing $n$ to find the largest corresponding normalized PE $h_n'$, with an embedding delay $\tau$ heuristically determined from the main period length. They concluded that the $h_n'$ with the highest entropy accurately accounts for the needed complexity of the time series, and therefore suggests a suitable embedding dimension. Riedl et al. [201] show how this method provides an accurate embedding dimension for the Van der Pol oscillator, the Lorenz system, and the logistic map. However, the method is not automatic due to its reliance on a heuristically chosen $\tau$. To make the process automatic, we introduce an algorithm based on Section 2.2.2 to automatically select the correct $\tau$, which we then use in conjunction with Eq. (2.18) to find the $n$ corresponding to the maximum $h_n'$. Additionally, we suggest incrementing $n$ from 3 to 8, as we have not yet found a system requiring $n > 8$ using this method.

2.3.4 Method Comparisons and Conclusions

To make conclusions about the described methods for determining $\tau$ and $n$, we made comparisons to values suggested by experts. The majority of the suggested parameters are taken from the work of Riedl et al. [201], while the parameters for the Rossler system and sine wave are from Tao et al. [227]. Figures 2.15 and 2.16 show the calculated and suggested values for $\tau$ and $n$, respectively. For the exact values of $\tau$ and $n$ from each of the parameter estimation methods please reference Tables A.2 and A.3 in the appendix, respectively. Additionally, scripts for reproducing the results found in this work are provided through Mendeley.
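Before turning to the comparisons, here is a compressed sketch of the automated MPE dimension rule of Section 2.3.3: compute $H(n)$ for each candidate $n$ and keep the $n$ maximizing $h_n' = H(n)/(n-1)$ of Eq. (2.18). The names, the delay value, and the example signal are illustrative; in the full procedure $\tau$ comes from the MPE delay selection of Section 2.2.2.

```python
import math
import numpy as np

def pe_bits(x, n, tau):
    """Unnormalized permutation entropy H(n) of Eq. (2.1), in bits."""
    x = np.asarray(x, dtype=float)
    vecs = np.array([x[i : i + n * tau : tau]
                     for i in range(len(x) - (n - 1) * tau)])
    _, counts = np.unique(np.argsort(np.argsort(vecs, axis=1), axis=1),
                          axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mpe_dimension(x, tau, n_range=range(3, 9)):
    """Pick the n maximizing h'_n = H(n)/(n - 1), per Eq. (2.18)."""
    scores = {n: pe_bits(x, n, tau) / (n - 1) for n in n_range}
    return max(scores, key=scores.get)

t = np.linspace(0, 20 * np.pi, 4000)       # 400 samples per period
print(mpe_dimension(np.sin(t), tau=400))   # tau set to one main period
```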
Figure 2.15: A comparison between the calculated and suggested values for the delay parameter $\tau$. The methods investigated were MI with adaptive partitions, Spearman's autocorrelation (AC), the frequency analysis, Multi-scale Permutation Entropy (MPE), and Permutation Auto-Mutual Information (PAMI) with $n = 5$.

Figure 2.16: A comparison between the calculated and suggested values for the embedding dimension $n$. The methods investigated were False Nearest Neighbors (FNN), Multi-scale Permutation Entropy (MPE), and Singular Spectrum Analysis (SSA).

Embedding Delay: Figure 2.15 shows the automatically computed $\tau$ in comparison to the expert-identified values for a variety of systems. These systems fall within several categories: noise, chaotic differential equations, periodic systems, nonlinear difference equations, and medical data. The methods presented in Fig. 2.15 include PAMI from Section 2.2.5, MI calculated using adaptive partitioning from Section A.3, Spearman's autocorrelation from Section 2.2.3, MPE from Section 2.2.2, and the frequency approach from Section 2.2.1. For the noise category we only investigated Gaussian white noise, and all the methods accurately suggest an embedding delay. For the second category, chaotic differential equations, mutual information approximated using adaptive partitions accurately provided suitable delay values. However, there are possible modes of failure for MI; to validate that MI is accurately selecting a value of $\tau$, we recommend also calculating $\tau$ using the frequency approach. For the third category, periodic systems, we only investigated a simple sinusoidal function. Both MPE and the frequency approach provided accurate suggestions, and we therefore suggest using both of these methods to calculate $\tau$ for periodic systems. Additionally, we do not suggest the use of MI for periodic systems, as it can have early false minima resulting in inaccurate delay selection. For difference equations we found that PAMI, autocorrelation, MPE, and the frequency approach provide accurate suggestions for the delay. Finally, when testing each method on medical data with intrinsic noise, we found that the noise-robust frequency approach yielded the best parameter selection for $\tau$. As a generalization of these results, we suggest the use of MI with adaptive partitioning when selecting $\tau$ for chaotic differential equations. For periodic systems, nonlinear difference equations, and ECG/EEG data we suggest the use of the frequency approach developed in this work. However, when applying the frequency approach to quasiperiodic time series with multiple harmonics of decreasing amplitude, the method may fail due to the delay being selected based on an insignificant high frequency; either Spearman's autocorrelation or MPE may be more suitable in this case. In general, multiple methods should be used for each system to validate that an accurate delay is selected, given the possible modes of failure of each method. Specifically, the frequency approach may fail if the noise does not have a Gaussian distribution, MI can fail if a false minimum occurs or the relationship is monotonic, and autocorrelation can fail if the time series being analyzed does not oscillate about a fixed value.

Embedding Dimension: Figure 2.16 shows the automatically computed parameter $n$ in comparison to the expert-identified values. Both MPE and FNN commonly produced parameters within the specified range for all categories. However, SSA failed to provide a consistently suitable embedding dimension $n$.
This leads to the conclusion that either MPE or FNN is a sufficient method for determining the embedding dimension for the majority of the considered applications. However, FNN may fail if the effects of noise are not correctly accounted for, which can lead to overly large embedding dimensions. These results also show that the dimension $n = 6$ works well for almost all applications.

2.4 Topological Methods for Delay Parameter Selection

The main thrust of this work is parameter selection for permutation entropy and state space reconstruction using topological methods. To do this, a goal of this work is to relate the distribution of permutations formed from a given delay $\tau$ to the state space reconstruction with the same delay. This connection will show that the time delays for permutations and for state space reconstruction are related, and establishing this relationship allows tools from TDA to be used for delay parameter selection.

Figure 2.17: Example formation of a permutation sequence from the time series $x(t) = 2\sin(t)$ with sampling frequency $f_s = 20$ Hz, permutation dimension $n = 3$, and delay $\tau = 40$. The corresponding time-delay embedded vectors from $x(t)$, with the permutation binnings $(\pi_1, \ldots, \pi_6)$ in the state space, are shown in the bottom figure.

Let me first start by redescribing the process of state space reconstruction and its similarity to permutations. As described by Takens [226], I can reconstruct an attractor that is topologically equivalent to the original attractor of a dynamical system by embedding a 1-D signal into $\mathbb{R}^n$, forming a cloud of delayed vectors $v_i = [x(t_i), x(t_{i+\tau}), x(t_{i+2\tau}), \ldots, x(t_{i+(n-1)\tau})]$ for $i \in [0, L - n\tau]$, where $L$ is the length of the discretely and uniformly sampled signal. Permutations are formed in a very similar fashion: I take the vectors $v_i$ and find their symbolic representations based on their ordinal rankings, as explained in Section 2.1. The different permutation types can be viewed as an inequality-based binning of the $\mathbb{R}^n$ vector space of the reconstructed dynamics, as shown in Fig. 2.17 for dimension $n = 3$. This provides a first intuitive understanding of the connection between permutations and state space reconstruction; however, I still need to connect the optimal $\tau$ parameter used in $v_i$ to the question of whether it is also an optimal delay $\tau$ for PE. Takens' embedding theorem states that, technically, any delay $\tau$ is suitable for reconstructing the original topology of the attractor; however, this requires unrestricted signal length and no additive noise in the signal [226]. Since these conditions are rarely found in real-world signals, a $\tau$ is chosen to unfold the attractor such that the effects of noise have a minimal effect on the topology of the reconstructed dynamics. Let us now explain what I mean by the correspondence between $\tau$ and the unfolding of the dynamics, and what effect this has on the corresponding permutations. If the delay $\tau$ is too small (e.g., $\tau = 1$ for a continuous dynamical system with a high sampling rate), the delay-embedded reconstructed attractor will be clustered around the hyper-diagonal in $\mathbb{R}^n$. Additionally, the corresponding permutations will be overwhelmingly dominated by the permutation types $\pi_1$ and $\pi_{n!}$, these being the all-increasing and all-decreasing ordinal patterns, respectively. The dominance of these two permutations for a delay $\tau$ that is too small was termed the "redundancy effect" by De Micco et al. [59].
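The redundancy effect is easy to reproduce numerically: at $\tau = 1$ nearly all delay vectors are monotone, so the ordinal bins $(0,1,2)$ and $(2,1,0)$ dominate, while a larger $\tau$ spreads the mass across bins. The following minimal sketch demonstrates this with a sine wave standing in for the Rossler solution; the function names and parameters are illustrative.

```python
import numpy as np

def takens_embedding(x, n, tau):
    """Delay vectors v_i = [x(t_i), x(t_{i+tau}), ..., x(t_{i+(n-1)tau})]."""
    return np.column_stack([x[i * tau : len(x) - (n - 1 - i) * tau]
                            for i in range(n)])

def pattern_fractions(x, n, tau):
    """Fraction of delay vectors falling in each ordinal bin pi of R^n."""
    emb = takens_embedding(np.asarray(x, dtype=float), n, tau)
    ranks = np.argsort(np.argsort(emb, axis=1), axis=1)
    pats, counts = np.unique(ranks, axis=0, return_counts=True)
    return {tuple(int(v) for v in p): c / counts.sum()
            for p, c in zip(pats, counts)}

t = np.linspace(0, 20 * np.pi, 4000)
x = np.sin(t)
# tau = 1: mass concentrates on (0,1,2) and (2,1,0), the redundancy effect;
# a larger tau gives a far more even distribution over the six bins.
print(pattern_fractions(x, n=3, tau=1))
print(pattern_fractions(x, n=3, tau=60))
```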
Figure 2.18 gives an example of this effect through the permutation distribution and the clustering about the hyper-diagonal in $\mathbb{R}^3$. This example is based on the $x$-solution of the periodic Rossler dynamical system as described in Section C.1. As the delay increases past the redundancy effect, the reconstructed attractor begins to unfold toward a shape and topology similar to those of the true attractor. Corresponding with this unfolding, as the delay increases the permutation distribution tends towards a more equiprobable distribution (see Fig. 2.18 at $\tau \approx 14$). A way of summarizing the permutation probability distribution is through PE itself, and more specifically through the analysis of Multi-scale Permutation Entropy (MsPE). Riedl et al. [200] showed that after the redundancy effect there is a suitable delay for PE, which I relate to the first maximum of the MsPE plot [161]. The MsPE plot for our periodic Rossler example is shown in Fig. 2.18. Let us also look at the MI plot as a comparison. The theory behind MI states that the first minimum of the mutual information between $x(t_i)$ and $x(t_{i+\tau})$ provides a suitable delay for state space reconstruction. A quick inspection of the MI function reveals a high degree of correlation between the MI function and the MsPE function, with the first maximum of MsPE occurring at approximately the same $\tau$ as the first minimum of MI. When the delay becomes significantly larger than the first minimum of MI or the first maximum of MsPE, the permutation distribution begins to fluctuate, as shown in Fig. 2.18. This effect was termed the "irrelevance effect" by De Micco et al. [59]. Increasing $\tau$ beyond the first minimum also correlates with, as described by Kantz and Schreiber [105], the reconstruction filling an overly large space with vectors that are already independent. Additionally, at a minimum beyond the first, Fraser and Swinney [78] showed that the reconstructed attractor shape will no longer qualitatively match the shape of the true state space.

Figure 2.18: Example comparing the first minimum of mutual information and the first maximum of multi-scale permutation entropy, demonstrating the correspondence between the two. On the left are the $n = 3$ time-delayed state space reconstructions with an inaccurately chosen $\tau = 1$ and an appropriate $\tau = 14$. On the right are the permutation distribution as $\tau$ increases and the associated multi-scale permutation entropy and mutual information plots.

I have now shown, with both an example and a qualitative analysis, that the optimal $\tau$ for permutation entropy and for state space reconstruction are correlated with the unfolding of the reconstructed attractor. While I do not provide a proof that PE and state space reconstruction use the same $\tau$, it has recently been shown that a connection between co-homology, information theory, and probability exists [18], which strengthens our qualitative analysis of this connection. In the following sections I will leverage tools from TDA to determine the optimal $\tau$ associated with the unfolding of the attractor. The methods that I have researched are an adaptation of SW1PerS [177] for delay parameter selection and two methods that estimate the dominant frequency in a signal using sublevel set persistence, which can then be used for delay parameter selection.
2.4.1 Finding $\tau$ Using SW1PerS

In this section I develop a novel method implementing persistent homology for estimating an appropriate delay for permutations and state space reconstruction. Specifically, we investigate the effects of varying $\tau \in [1, \tau_{\max}]$ on the calculation of the maximum persistence and the periodicity score from SW1PerS [180]. Perea and Harer developed SW1PerS as a TDA method for measuring periodicity in a time series; our goal is to leverage this method for determining a suitable selection of $\tau$ for permutation entropy and state space reconstruction based on the unfolding of an attractor and the associated 1-D persistent homology. SW1PerS uses 1-D persistent homology to measure how periodic, or how significant the circular shape of, an embedded time series (point cloud) is as $\tau$ increases, which corresponds to the embedding window size increasing as $w = m\tau$, with $m$ the embedding dimension of the sliding-window vector. Specifically, the sliding window $SW$ for SW1PerS is defined as
\[ SW_{m,\tau} f(N)(t) = [f(N)(t), f(N)(t + \tau), \ldots, f(N)(t + m\tau)], \tag{2.19} \]
where $f(N)(t)$ is a truncated Fourier series of the signal and $\tau$ and $m$ are, respectively, SW1PerS' embedding delay and dimension. Applying Eq. (2.19) to a sliding window of width $w$ across the domain of the time series results in a collection of vectors known as a point cloud, which lives in an $m$-dimensional Euclidean space.

Figure 2.19: Example showing three sample windows with $m = 2$ of increasing size, each slid across the entire time series (periodic Rossler system), resulting in the embedded time series in $\mathbb{R}^2$. The window size is defined as $w = m\tau$, with (left) $w_s = m\tau_s$ being too small with $\tau_s = 1$, yielding an embedding shape concentrated on the diagonal line, a high periodicity score $s$, and a low $\mathcal{L}$; (middle) $w_o$ properly sized, resulting in a minimum periodicity score $s$ and maximum $\mathcal{L}$, suggesting an optimal delay $\tau_o = 10$; and (right) $w_\ell$ with $\tau = 17$ too large, resulting in a high periodicity score $s$ and a low $\mathcal{L}$.

However, it may not be desirable to use all of the embedded vectors from Eq. (2.19) due to the $O(n^3)$ time complexity of calculating the persistent homology of a point cloud via the Vietoris-Rips complex. To improve the calculation time we use a sparse version of the point cloud, subsampling it to $n_T$ windows. We set the number of sliding windows to $n_T = 200$, which is sufficiently high to detect circular structure in the embedding. For SW1PerS, $m$ is determined based on the theory developed by Perea et al. [180], which showed that the necessary value of $m$ for reconstruction is bounded by $m \geq 2N$ (here we use $m = 2N$), where $N$ is the number of Fourier terms necessary for reconstructing the signal to some desired accuracy. In this work we automate the choice of $N$ by approximating the Fourier series using the discrete Fourier transform. To do this, we compute the normalized $\ell_2$ norm between the time series reconstructed from the truncated Fourier series and the original signal, and take the value of $N$ that yields an error within a desired threshold of $\ell_2(N) < 0.25$.
Specifically, if we let the time series $X$ be a discrete-time sampling of a piecewise smooth signal $x(t)$, then the $N$-partial sum of the Fourier series of $x(t)$ can be approximated according to
\[ f(N)(t) = \frac{1}{|X|} \sum_{k=0}^{N} \left( \sum_{j=0}^{|X|} X(j)\, e^{-2\pi i j k / T} \right) e^{2\pi i k t / T}, \tag{2.20} \]
where $X$ is the original signal, point-wise centered and normalized, and $|X|$ is the length of the signal. As a rule of thumb, $N \approx |X|/8$ yields an accurate reconstruction of $x(t)$ [8], which we use as an upper bound on $N$. The relative $\ell_2$ norm that measures the error between the time series $X$ and its reconstruction $f(N)(t)$ is given by
\[ \ell_2(N) = \frac{\left[ \sum_{j=0}^{|X|} \left( X(j) - f(N)(j) \right)^2 \right]^{1/2}}{\left[ \sum_{j=0}^{|X|} X(j)^2 \right]^{1/2}}. \tag{2.21} \]
For our application, we consider $f(N)(t)$ to be sufficiently close to $x(t)$ when we find a value of $N$ for which $\ell_2(N) < 0.25$. We chose 0.25 because it provides dimensions $m$ that are not overly large (typically $m < 10$) and it accommodates the possibility of moderate additive noise in the signal. Using the truncated Fourier series we are also able to determine an upper bound for $\tau$ using the Nyquist sampling criterion as
\[ \tau_{\max} = \frac{f_s}{2 \min(f_{\text{sig}})}, \tag{2.22} \]
where $f_{\text{sig}}$ are the $N$ significant frequencies from the truncated fast Fourier transform. We now have all the components needed to apply SW1PerS. To determine the optimal delay using persistent homology, we investigate two summaries of the resulting persistence diagrams from SW1PerS: (1) the maximum lifetime
\[ \mathcal{L} = \max\left( \operatorname{pers}(\tilde{D}_1) \right), \tag{2.23} \]
with $\tilde{D}_1$ the 1-D SW1PerS persistence diagram, and (2) the periodicity score, defined in [178] as
\[ s = 1 - \frac{r_D^2 - r_B^2}{3}, \tag{2.24} \]
where $r_B$ and $r_D$ are the birth and death times associated with $\max(\operatorname{pers}(\tilde{D}_1))$. We then calculate these point summaries for each $\tau \in [1, \tau_{\max}]$ to generate the vectors $\vec{s}$ and $\vec{\mathcal{L}}$ of periodicity scores and persistence maxima, respectively. To demonstrate the functionality of this method, let us implement a simple example using the periodic Rossler system (see Fig. 2.19). This example uses three different window sizes for embedding dimension $m = 2$ (this dimension was chosen for visualization purposes) and $\tau = 1$, 10, and 17, to show the resulting scores for a small, optimal, and overly large window size, respectively. Figure 2.19 shows that the optimal window size at $\tau_o = 10$ results in a maximum $\mathcal{L}$ and a minimum $s$ over the range $\tau \in [1, \tau_{\max}]$, where $\tau_{\max} = 20$ from the truncated Fourier spectrum. This suggests that an appropriate delay for both state space reconstruction via Takens' embedding and permutation entropy is $\tau = 10$.

Figure 2.20: Example periodicity $s$ and maximum persistence $\mathcal{L}$ plots for the chaotic Rossler system, with the associated cutoffs used to determine the average $\tau$.

For a chaotic time series, choosing $\tau$ from the minimum or maximum of $s$ and $\mathcal{L}$ is not as trivial as in the example of Fig. 2.19. Specifically, due to the nonlinear behavior of a chaotic time series there may not be a clear, single minimum as for the periodic Rossler system, but rather two or more local minima of similar prominence. To approximate the average minimum and select an associated delay $\tau$, we use heuristic cutoffs $C_s$ and $C_\mathcal{L}$, defined as $C_s = \frac{1}{2}[\max(s) + \min(s)]$ and $C_\mathcal{L} = \frac{1}{2}[\max(\mathcal{L}) + \min(\mathcal{L})]$. Specifically, we choose $\tau$ as the average $\tau$ over the region where $s \leq C_s$ or $\mathcal{L} \geq C_\mathcal{L}$. To demonstrate this method we use a chaotic response of the Rossler system and calculate the two cutoffs as shown in Fig. 2.20.
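A compressed sketch of the per-$\tau$ scoring is given below, using the ripser package for the Vietoris-Rips persistence. It is illustrative only: a sine stands in for the Rossler solution, the pointwise centering and unit normalization that SW1PerS prescribes are applied (the factor 3 in Eq. (2.24) assumes this normalization), and all names and parameter values are our own.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def sliding_window_cloud(x, m, tau, n_windows=200):
    """Subsampled sliding-window point cloud of Eq. (2.19) (points in R^m)."""
    n_pts = len(x) - (m - 1) * tau
    idx = np.linspace(0, n_pts - 1, min(n_windows, n_pts)).astype(int)
    return np.array([x[i : i + m * tau : tau] for i in idx], dtype=float)

def L_and_s(cloud):
    """Maximum 1-D persistence L (Eq. (2.23)) and periodicity score s
    (Eq. (2.24)) after pointwise centering and normalization."""
    cloud = cloud - cloud.mean(axis=1, keepdims=True)
    cloud = cloud / (np.linalg.norm(cloud, axis=1, keepdims=True) + 1e-12)
    dgm1 = ripser(cloud, maxdim=1)["dgms"][1]
    if len(dgm1) == 0:
        return 0.0, 1.0
    b, d = max(dgm1, key=lambda bd: bd[1] - bd[0])
    return d - b, 1.0 - (d**2 - b**2) / 3.0

# 100 samples per period; the quarter-period delay (tau = 25) gives the
# roundest loop, hence the largest L and the smallest s.
t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t)
for tau in (1, 25, 50):
    L, s = L_and_s(sliding_window_cloud(x, m=3, tau=tau))
    print(f"tau = {tau:2d}:  L = {L:.3f}  s = {s:.3f}")
```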
For the chaotic Rossler system, this procedure results in an average delay of $\tau = 12$ from the points with $\mathcal{L}$ greater than $C_\mathcal{L}$ and, likewise, $\tau = 12$ from the points with $s$ less than $C_s$. This demonstrates that selecting the average $\tau$ above or below the cutoffs results in a similar $\tau$ for both periodic and chaotic time series.

2.4.2 Finding $\tau$ Using Sublevel Set Persistence

In this section our goal is to leverage sublevel set persistence for the selection of $\tau$ for both state space reconstruction and permutation entropy. Specifically, we aim to automate the frequency analysis method [149] for selecting $\tau$ by analyzing both the time and frequency domains of the signal using sublevel set persistence. The method developed by Melosik and Marszalek [149] uses the maximum significant frequency $f_{\max}$ and the sampling frequency $f_s$ to select an appropriate $\tau$ as
\[ \tau = \frac{f_s}{\alpha f_{\max}}, \tag{2.25} \]
where $\alpha \in [2, 4]$, with $\alpha = 2$ corresponding to the Nyquist sampling rate and $\alpha > 4$ producing oversampling. Since this method was developed using the Nyquist sampling rate, we inherit its associated assumptions of a continuous, band-limited signal. This frequency-based approach was founded on the requirements for suitable delays for the 0-1 test for chaos and on a heuristic comparison between the Lorenz attractor and a delay reconstruction of the Lorenz attractor. The heuristic comparison showed that this frequency approach actually provided more accurate delay parameter selections for state space reconstruction than the mutual information function when trying to replicate the shape of the attractor. Unfortunately, a major drawback of this method is the nontrivial selection of $f_{\max}$. In Melosik and Marszalek's original work [149] the maximum frequency was manually selected using a Fast Fourier Transform (FFT) spectrum, normalized to $[0, 1]$, with a cutoff of approximately 0.01, which does not address the possibility of additive noise. In our previous work [161] we approximated the maximum "significant" frequency in a time series using the FFT and defining a power spectrum cutoff based on the statistics of additive noise in the FFT. An issue with this method for nonlinear time series is that the Fourier spectrum does not easily yield the maximum "significant" frequency for chaotic time series, even with an appropriately selected cutoff for ignoring additive noise. Additionally, the method was only developed for Gaussian white noise (GWN) contamination of the original time series. To improve the selection of the maximum frequency, in this section we develop two novel methods based on 0-D sublevel set persistence, which we chose for its computational efficiency and stability for true peak selection [49, 115]. The first method is based on a time domain analysis of the sublevel set lifetimes, and the second implements a frequency domain analysis using sublevel set persistence and the modified $z$-score; both are detailed in the remainder of Section 2.4.2.

Time Domain Approach: The first approach we implement for estimating the maximum significant frequency of a signal is based on a time domain analysis of the sublevel set persistence. This process uses the time-ordered lifetimes from the sublevel set persistence diagram. We previously introduced time-ordered lifetimes and a cutoff separating the sublevel sets associated with noise in [11]. Here we use those methods and results to find the times at which all the significant sublevel sets are born.
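For a 1-D signal, the 0-D sublevel set persistence used here reduces to a short union-find pass over the samples: each local minimum births a component, and components merge (the younger one dying) at saddle values. The following minimal sketch implements this elder rule; it is our own illustrative code, not the released implementation of [11] or [161].

```python
import numpy as np

def sublevel_persistence_0d(y):
    """0-D sublevel set persistence of a sampled 1-D function y.
    Returns (birth, death) pairs; the essential class of the global
    minimum is reported with death = max(y)."""
    order = np.argsort(y)          # sweep samples from lowest value upward
    comp = {}                      # sample index -> representative root
    pairs = []

    def find(i):
        while comp[i] != i:
            comp[i] = comp[comp[i]]
            i = comp[i]
        return i

    for i in order:
        comp[i] = i
        for j in (i - 1, i + 1):   # connect to already-alive neighbors
            if j in comp:
                a, b = find(i), find(j)
                if a != b:
                    # elder rule: the younger (higher-born) minimum dies here
                    young, old = (a, b) if y[a] > y[b] else (b, a)
                    if y[young] < y[i]:
                        pairs.append((y[young], y[i]))
                    comp[young] = old
    roots = {find(i) for i in comp}
    pairs.extend((y[r], float(np.max(y))) for r in roots)
    return pairs

# Two oscillations -> two prominent lifetimes plus noise-scale pairs.
t = np.linspace(0, 4 * np.pi, 400)
y = np.sin(t) + 0.05 * np.random.default_rng(3).standard_normal(400)
print(sorted(sublevel_persistence_0d(y), key=lambda bd: bd[0] - bd[1])[:3])
```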
Figure 2.21 shows a resulting time-ordered lifetimes plot, where the time between two adjacent lifetimes is defined as $T_{B_i}$.

Figure 2.21: Example demonstrating the process from the time series $x$ (periodic Rossler system) to the sublevel set persistence diagram to the time-ordered lifetimes on the bottom left. Also marked on the bottom left is a sample time period between sublevel sets, $T_{B_i}$.

If we use $T_{B_i}$ as an approximation of a period in the time series, then we can calculate the associated frequencies as $f_i = 1/T_{B_i}$ Hz. Looking at the distribution of the $f_i$, the maximum "significant" frequency can be approximated using the 75% quantile of the distribution of frequencies as $f_{\max} \approx Q_{75}(f)$. This quantile allows a few outlying high frequencies to occur without having a significant effect on the estimate of the maximum frequency. Applying this method to the periodic Rossler system results in $\tau = 10$, with the corresponding state space reconstruction for $n = 2$ shown in Fig. 2.22.

Figure 2.22: Example demonstrating the time delay $\tau = 10$ result for the periodic Rossler example time series shown in the top figure and the resulting $n = 2$ Takens' embedding.

This suggested delay is very similar to that of mutual information ($\tau = 12$). This result suggests that the time-domain analysis for selecting the maximum frequency and the corresponding delay should accurately suggest an appropriate delay for permutation entropy and state space reconstruction.

Figure 2.23: Overview of the procedure for finding the maximum significant frequency using 0-dimensional sublevel set persistence and the modified $z$-score for a signal contaminated with noise.

Fourier Spectrum Approach: In this section we present a novel TDA-based approach for finding the noise floor in the Fourier spectrum, and hence the maximum significant frequency $f_{\max}$ to be used for selecting $\tau$ for PE through Eq. (2.25). Specifically, we show how 0-dimensional sublevel set persistence, a tool from TDA discussed in Section 1.1, can be used to find the significant lifetimes and associated frequencies in the frequency spectrum. While it would be ideal to analyze the theoretical distribution of the sublevel set lifetimes of the FFT of a random process, this is not a trivial task. There have been studies on pushing forward probability distributions into the persistence domain [3, 4, 104], but it is difficult to obtain a theoretical cutoff value in persistence space. Therefore, instead of an in-depth statistical analysis of these distributions, we use the modified $z$-score. Specifically, we separate the noise lifetimes from the significant lifetimes using the modified $z$-score, which allows us to find the noise floor and the maximum significant frequency via a cutoff. This process for finding the cutoff and the associated maximum frequency is illustrated in Fig. 2.23. The following paragraphs give an overview of the modified $z$-score and the cutoff analysis.

Modified $z$-score: The modified $z$-score $z_m$ is essential to understanding the techniques used for isolating noise from a signal [209].
The standard score, commonly known as the $z$-score, uses the mean and the standard deviation of a dataset to assign a score to each data point and is defined as
\[ z = \frac{x - \mu}{\sigma}, \tag{2.26} \]
where $x$ is a data point, and $\mu$ and $\sigma$ are the mean and standard deviation of the dataset, respectively. The $z$-score is commonly used to identify outliers in a dataset by rejecting points above a set threshold, expressed in terms of how many standard deviations away from the mean are acceptable. Unfortunately, the $z$-score is susceptible to outliers itself, with neither the mean nor the standard deviation being robust against outliers [130]. This led Hampel [91] to develop the modified $z$-score as an outlier detection method that is itself robust to outliers. The logic behind the modified $z$-score, or median absolute deviation (MAD) method, is grounded in the use of the median instead of the mean. The MAD is calculated as
\[ \text{MAD} = \operatorname{median}(|x - \tilde{x}|), \tag{2.27} \]
where $x$ is a dataset and $\tilde{x}$ is the median of the dataset. The MAD is substituted for the standard deviation in Eq. (2.26). To complete the modified $z$-score, Iglewicz and Hoaglin [99] suggested additionally substituting the mean with the median. The resulting modified $z$-score is then quantified as
\[ z_m = 0.6745 \, \frac{x - \tilde{x}}{\text{MAD}}, \tag{2.28} \]
where the value 0.6745 was suggested in [99]. We can now use the modified $z$-score $z_m$ to evaluate the "significance" of each point in the sublevel set persistence diagram of the Fourier spectrum. A threshold for separating noise in the persistence domain is discussed in the following paragraph.

Threshold and Cutoff Analysis: To determine the noise floor in the normalized Fast Fourier Transform (FFT) spectrum, we compute the 0-dimensional sublevel set persistence of the FFT. This yields relatively short lifetimes for the noise, while the prominent peaks, which represent the actual signal, have comparatively long lifetimes or high persistence. To separate the noise from the outliers we calculate the modified $z$-score of the lifetimes in the persistence diagram. We can then determine whether a lifetime is associated with noise or signal based on a $z_m$ cutoff $D$, labeling a lifetime as significant (an outlier) if $z_m > D$. Iglewicz and Hoaglin [99] suggest a threshold of $D = 3.5$ based on an analysis of 10,000 random-normal observations. However, since we apply both the FFT and 0-D sublevel set persistence to the original signal, it is appropriate to determine whether this cutoff is suitable for our application. To do this we used a signal of 10,000 random-normal observations, applied the FFT, and then calculated the 0-D sublevel set lifetimes as the data to analyze using the modified $z$-score. For an accurate cutoff we would expect all of the lifetimes to be labeled as noise with $z_m < D$, since the observations are composed of pure noise. As shown in Fig. 2.24, a threshold of approximately $D = 4.8$ labels all of the lifetimes as noise; this threshold was rounded up to 5 for simplicity. We can now define a cutoff based on the labeling of each lifetime from the modified $z$-score as $\text{Cutoff} = \max(\text{lifetime}_{\text{noise}})$, and find the maximum significant frequency $f_{\max}$ as the highest frequency in the Fourier spectrum with an amplitude greater than the specified cutoff. For this method to function accurately, some additive noise is required in the time series.
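Equations (2.27) and (2.28) and the $D = 5$ labeling rule transcribe directly into code. In the sketch below, the Rayleigh-distributed mock lifetimes are a synthetic stand-in for the 0-D sublevel set lifetimes of an FFT, and the function names are our own.

```python
import numpy as np

def modified_z_scores(lifetimes):
    """Modified z-score of Eq. (2.28) via the MAD of Eq. (2.27)."""
    lifetimes = np.asarray(lifetimes, dtype=float)
    med = np.median(lifetimes)
    mad = np.median(np.abs(lifetimes - med))
    return 0.6745 * (lifetimes - med) / mad

def noise_cutoff(lifetimes, D=5.0):
    """Label lifetimes with z_m > D as significant; the cutoff is the
    largest lifetime still labeled as noise (D = 5 as chosen above)."""
    z = modified_z_scores(lifetimes)
    noise = np.asarray(lifetimes)[z <= D]
    return noise.max() if noise.size else 0.0

rng = np.random.default_rng(4)
lifetimes = np.concatenate([rng.rayleigh(0.05, 500), [2.0, 3.5]])  # + 2 peaks
print(noise_cutoff(lifetimes))  # well below the two significant lifetimes
```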
For this method to function accurately, it is required that there is some additive noise in the time series. To accommodate this, additive Gaussian noise with a Signal-to-Noise Ratio of 30 dB is added to the time series before calculating the FFT.

Figure 2.24: Percent of the persistence points from the 0-D sublevel set persistence of the FFT of GWN labeled as noise using the modified 𝑧-score, with the threshold ranging from 0 to 5.

If we apply this method to the example periodic Rossler system time series, we find a suggested delay of 𝜏 = 5. In comparison to mutual information, this delay is approximately half as large as it should be. However, we will investigate its accuracy on several other systems in Section 4.3 to make conclusions on the functionality of this method for selecting 𝜏.

2.4.3 Permutation Dimension

In this section we will show that, contrary to the delay selection, the dimension for permutation entropy is not related to that of Takens' embedding. Additionally, we will provide a simple method for selecting an appropriate permutation dimension based on the permutation distribution. The goal of permutation entropy is to differentiate between the complexity of a time series when there is a dynamic state change (e.g., periodic compared to chaotic), so the dimension should be chosen large enough to capture these changes. To accomplish this, we require that the permutations of the time series do not occupy all of the possible permutations, but rather only a fraction of them when an appropriate delay is selected. This criterion is set so that a change can be captured by an increase or decrease in the number of permutations and their associated probabilities. Because of this, we suggest a dimension where, at most, only 50% of the permutations are used. However, it may be more suitable to select a dimension where a lower percentage is used (e.g., 10%). To begin this method for determining whether the dimension is high enough to capture the time series complexity, we define 𝑁𝜋 as the number of permutation types whose probability is significant. Specifically, we consider the probability of a permutation to be significant if the number of occurrences of permutation 𝜋 is greater than 10 percent of the maximum number of occurrences of any permutation type for dimension 𝑛. The permutation delay 𝜏 was selected from the expert-suggested values provided in [161, 200]. We can now express our needed dimension as the ratio and inequality

𝑁𝜋/𝑛! ≤ 𝑅, (2.29)

where 𝑅 = 0.50 for the suggested maximum 50% criterion. To compare this dimension to standard Takens' embedding tools for selecting 𝑛, we will implement four examples:

𝑥₁(𝑡) = 𝑡/10,
𝑥₂(𝑡) = sin(𝑡),
𝑥₃(𝑡) = sin(𝑡) + sin(𝜋𝑡),
𝑥₄(𝑡) = N(𝜇 = 0, 𝜎² = 1), (2.30)

where 𝑡 ∈ [0, 100] with a sampling rate of 20 Hz and N is Gaussian noise. By applying Eq. (2.29) to the time series in Eq. (2.30), we can suggest dimensions of 2, 4, 6, and 7 for time series 𝑥𝑖(𝑡) with 𝑖 ∈ [1, 4], respectively, as shown in Fig. 2.25. In comparison to Takens' embedding, for time series 𝑥₂(𝑡) dimension 𝑛 = 2 would be sufficient, but if this dimension were used for permutation entropy, no increase in complexity could be detected. Additionally, this result suggests an upper bound on the dimension for permutation entropy of 𝑛 ≈ 9, as the ratio in Eq. (2.29) is approximately 0 for dimensions 𝑛 > 9. As a rule of thumb from this result, a dimension of 8 would be suitable for almost all applications, but it would be optimal to minimize the dimension to reduce the computation time of PE.
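A minimal sketch of the dimension criterion in Eq. (2.29); the 10% occurrence rule and the 𝑅 threshold follow the text, while the helper names and the sinusoid example are my own:

```python
import numpy as np
from math import factorial

def permutation_ratio(x, n, tau):
    """Compute N_pi / n!, where N_pi counts permutations occurring more than
    10% as often as the most common permutation (the significance rule above)."""
    L = len(x) - tau * (n - 1)
    patterns = np.array([np.argsort(x[i:i + tau * n:tau]) for i in range(L)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    N_pi = np.sum(counts > 0.10 * counts.max())
    return N_pi / factorial(n)

# Smallest dimension with at most 50% of the permutations used (R = 0.5);
# for a sinusoid this should land near the n = 4 suggested by Fig. 2.25.
x = np.sin(np.linspace(0, 100, 2000))
n = next(d for d in range(2, 10) if permutation_ratio(x, d, tau=10) <= 0.5)
```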
In Section 4.3 we will show the resulting suggested dimensions using this method for a wide variety of dynamical systems.

Figure 2.25: Percent of permutations used, 𝑅 = 𝑁𝜋/𝑛!, for each example time series (see Eq. (2.30)) as the dimension is incremented.

2.4.4 Results for Topological Data Analysis Methods

In this section we provide the results of the parameter selection methods. First, in Section 2.4.4, we calculate the delay parameter for a wide variety of dynamical systems and data sets using mutual information and the automatic TDA-based methods described in this manuscript. Unfortunately, the optimal parameters cannot be decided based on a simple entropy value comparison, since there is no direct equivalence between PE and other entropy approximations of a signal, such as Kolmogorov-Sinai (KS) entropy, with only a bound between the two as KS ≤ PE [107]. Therefore, to determine the accuracy of the automatically selected PE parameters we implement two other methods of comparison. The first is a comparison to expert-suggested parameters for a wide variety of systems (see Section 2.4.4). The second is a comparison to optimal parameters based on having a significant difference between the PE of two different states for each system. Of course, the second method requires a system model or data set with two different states for comparison, which is not typically available, but it does allow for an approximation of optimal PE parameters for these systems. These comparisons are discussed in Section 2.4.4. The second half of the results, in Sections 2.4.4 and 2.4.4, analyzes the robustness of the automatic TDA-based PE parameter selection methods to additive noise contamination and signal length requirements, respectively.

Parameter Value Comparison for Common Dynamical Systems To determine a range of approximately optimal PE parameters, we quantify the difference between PE values for a wide range of delays and dimensions, with the difference for a given 𝜏 and 𝑛 calculated as

Δℎ𝑛(𝜏) = ℎ𝑛^(Ch.)(𝜏) − ℎ𝑛^(Pe.)(𝜏), (2.31)

where the superscripts Ch. and Pe. represent the PE calculation on the chaotic and periodic time series for the given dynamical system. The specific parameters used to generate periodic and chaotic responses for each system are described in the Appendix, Section C.1. If we apply Eq. (2.31) to the Rossler system for 𝜏 ∈ [1, 15] and 𝑛 ∈ [3, 10], we find that Δℎ𝑛(𝜏) is significant when 𝜏 ∈ [9, 15] and 𝑛 ∈ [6, 10], as shown in Fig. 2.26. However, as mentioned previously in Section 2.4.3, dimensions greater than 8 can be computationally expensive.

Figure 2.26: Example showing the difference in PE (see Eq. (2.31)) for periodic and chaotic dynamic states of the Rossler system for a wide range of PE parameters.

We consider this range, where Δℎ𝑛(𝜏) is relatively large, as the range of optimal PE parameters to compare against. We repeated this process for finding the optimal parameter ranges for PE, using a procedure similar to this Rossler example, for the systems shown in Table A.2.

Table 2.1: A comparison between the calculated and suggested values for the delay parameter 𝜏. The shaded (red) cells highlight the methods that failed to provide a close match to the suggested delay.
Cat.           | System             | State | PH-s | PH-L | SL-t | SL-f | MI | n(R=0.5) | n(R=0.1) | Sugg. τ | Sugg. n | Ref.  | Opt. τ | Opt. n
Noise Models   | Gauss.             | -     | 1    | 1    | 1    | 1    | 3  | 7        | 8        | 1       | 3-6     | [200] | -      | -
               | Uniform            | -     | 1    | 1    | 1    | 1    | 3  | 7        | 8        | -       | -       | -     | -      | -
               | Rayleigh           | -     | 1    | 1    | 1    | 1    | 2  | 7        | 8        | -       | -       | -     | -      | -
               | Expon.             | -     | 1    | 1    | 1    | 1    | 2  | 7        | 8        | -       | -       | -     | -      | -
Cont. Flows    | Lorenz             | Per.  | 13   | 11   | 11   | 7    | 11 | 5        | 6        | 10      | 5-7     | [200] | 8-17   | 5-10
               |                    | Cha.  | 12   | 13   | 12   | 9    | 12 | 5        | 7        |         |         |       |        |
               | Rossler            | Per.  | 10   | 10   | 10   | 8    | 11 | 5        | 6        | 9       | 6       | [228] | 9-15   | 6-10
               |                    | Cha.  | 12   | 12   | 12   | 10   | 12 | 5        | 6        |         |         |       |        |
               | Bi-direct. Rossler | Per.  | 19   | 17   | 16   | 9    | 15 | 5        | 6        | 15      | 6-7     | [200] | 11-22  | 6-10
               |                    | Cha.  | 18   | 16   | 16   | 15   | 17 | 5        | 6        |         |         |       |        |
               | Mackey Glass       | Per.  | 7    | 7    | 6    | 3    | 8  | 5        | 6        | 10      | 4-8     | [253] | 6-12   | 4-8
               |                    | Cha.  | 7    | 7    | 7    | 4    | 9  | 5        | 7        |         |         |       |        |
               | Chua Circuit       | Per.  | 16   | 17   | 17   | 11   | 19 | 5        | 6        | 20      | 5       | [213] | 16-24  | 5-10
               |                    | Cha.  | 37   | 52   | 17   | 19   | 19 | 5        | 7        |         |         |       |        |
               | Coupled Ross.-Lor. | Per.  | 8    | 8    | 8    | 7    | 9  | 4        | 6        | 8       | 3-10    | [222] | 5-11   | 4-9
               |                    | Cha.  | 12   | 10   | 8    | 5    | 10 | 5        | 7        |         |         |       |        |
               | Double Pendul.     | Per.  | 16   | 16   | 17   | 11   | 18 | 4        | 5        | -       | -       | -     | 8-20   | 5-10
               |                    | Cha.  | 13   | 12   | 10   | 8    | 14 | 6        | 7        |         |         |       |        |
Period. Funct. | Periodic           | -     | 12   | 12   | 13   | 24   | 16 | 4        | 5        | 15      | 4       | [228] | -      | -
               | Quasi              | -     | 45   | 46   | 25   | 49   | 26 | 6        | 7        | -       | -       | -     | -      | -
Maps           | Logistic           | Per.  | 1    | 1    | 1    | 1    | 3  | 4        | 5        | 1-5     | 4-7     | [200] | 1-4    | 3-6
               |                    | Cha.  | 1    | 1    | 1    | 1    | 16 | 4        | 6        |         |         |       |        |
               | Henon              | Per.  | 2    | 2    | 1    | 1    | 3  | 4        | 5        | 1-2     | 2-16    | [200] | 1-5    | 5-8
               |                    | Cha.  | 1    | 1    | 1    | 1    | 16 | 6        | 7        |         |         |       |        |
Med. Data      | ECG                | Cont. | 9    | 9    | 22   | 7    | 17 | 5        | 6        | 10-32   | 3-7     | [139] | 6-23   | 5-7
               |                    | Arrh. | 13   | 13   | 15   | 6    | 15 | 5        | 6        |         |         |       |        |
               | EEG                | Cont. | 19   | 18   | 1    | 3    | 6  | 8        | 8        | 1-3     | 3-7     | [200] | 2-6    | 4-7
               |                    | Seiz. | 10   | 4    | 12   | 4    | 10 | 5        | 7        |         |         |       |        |

Here PH-s and PH-L are the delays from 1-D persistent homology (the minima of the SW1PerS score and the maxima of the maximum persistence), SL-t and SL-f are from sublevel set persistence of the time and frequency domains, MI is the first minima of mutual information, n(R=0.5) and n(R=0.1) are the dimensions from the permutation statistics of Eq. (2.29), Sugg. τ and n are the expert-suggested values, and Opt. τ and n are the optimal parameter ranges.

To verify our TDA-based methods for determining 𝜏, Table A.2 compares our results for a wide variety of systems to both the first minima of the mutual information function and expert suggestions, including several listed by Riedl et al. [200]. The table also shows the resulting permutation dimensions suggested from the permutation statistics described in Section 2.4.3 for both 𝑅 = 0.1 and 𝑅 = 0.5 from Eq. (2.29). For these systems we have also included, where applicable, the delay and dimension parameter estimates for both periodic and chaotic responses to validate each method's robustness to chaos and non-linearity. However, for the medical data category we instead included a healthy/control and an unhealthy (arrhythmia for ECG and seizure for EEG) response as substitutes for a periodic and chaotic response, respectively. A detailed description of each dynamical system or data set used, including parameters for periodic and chaotic responses, is provided in the Appendix. In Table A.2 we have highlighted the methods that failed to provide an accurate delay 𝜏 in red. We will now go through the methods and highlight the advantages and drawbacks, as well as general suggestions for which method to use based on the category.

Noise Models: We only have one expert suggestion of parameters for the noise models category, which is for Gaussian white noise (Gauss.) with 𝜏 = 1 and 𝑛 ∈ [3, 6]. In regards to the delay, all TDA-based methods show an accurate selection of 𝜏 = 1; however, the suggestion of 𝜏 = 3 from Mutual Information (MI) is slightly higher than suggested. We found that the expert-suggested dimensions of 3 to 6 are significantly lower than the minimum dimension suggested by our permutation statistics method of 𝑛 = 7. As mentioned in Section 2.4.3, we believe it is necessary for the number of permutations used to be at most 50% of all the permutations available, which corresponds to a dimension 𝑛 = 7 for Gaussian noise. From this logic we can conclude that a suitable dimension should actually be at least 𝑛 = 7 if any increase in the time series complexity is expected. If only decreases in complexity are expected, then a dimension of 𝑛 = 6 may be suitable.
Continuous Flows: The next category is continuous flows described by systems of non-linear differential equations. As shown in Table A.2, both the time-domain analysis via sublevel set persistence and mutual information provide accurate delay suggestions for all of the examples. The 1-D persistent homology methods discussed in Section 2.4.1 also provide an accurate delay for every system except the chaotic Chua circuit. This failure was most likely due to an inaccurate selection of the maximum significant frequency and associated 𝜏max. We can also conclude that the frequency-domain analysis using sublevel set persistence consistently provided delays that were too small. In regards to the dimension, the suggested dimensions from the permutation statistics agreed with the dimensions suggested by experts for all of the continuous flow systems. This suggests that the method of selecting a dimension for permutation entropy described in Section 2.4.3 is accurate for simulations of continuous differential equations.

Periodic Functions: For periodic functions, including a simple sinusoidal function (periodic) and two incommensurate sinusoidal functions (quasiperiodic), our results in Table A.2 show that all methods, including mutual information, provide accurate selections of 𝜏 except the Fourier spectrum analysis via sublevel sets. This method results in a significantly high suggestion for 𝜏. In regards to the dimension selection, our results using the permutation statistics method described in Section 2.4.3 agree with the expert-suggested minimum dimension of 𝑛 = 4.

Maps: When selecting the delay parameter for permutations and Takens' embedding for maps, we found that all of the topological methods suggested accurate delay parameters, while the standard mutual information method selected overly large delay parameters when the maps exhibit a chaotic state. Therefore, we suggest the use of one of the topological methods when estimating the delay parameter for maps. For the permutation dimension we found suggested dimensions of 𝑛 ∈ [4, 7], in comparison to the expert-suggested dimensions ranging from 2 to 16. While the range suggested by the permutation statistics described in Section 2.4.3 falls within the range suggested by experts, the experts' range is too broad. Specifically, a dimension greater than 9 can be computationally cumbersome, and a dimension lower than 4 would not show significant differences for dynamic state changes. Therefore, we suggest the use of our narrower range of dimensions, 𝑛 ∈ [5, 6], for maps, which agrees with our optimal PE parameter range.

Medical Data: The medical data used in this study inherently has some degree of additive noise, which provides a first glimpse into the noise robustness of the delay parameter selection methods investigated. However, a more thorough investigation will be provided in Section 2.4.4. From our analysis, we disagree with the expert-suggested delay of 𝜏 ∈ [1, 3], and instead suggest the delay selected from either mutual information or the time-domain analysis of sublevel set persistence. The general selection of delays between 1 and 3 does not account for the large variation in possible sampling rates. If a small delay is used in conjunction with a high sampling rate, an inaccurate delay could be selected, resulting in indistinguishable permutation entropy values as the dynamic
In regards, to the permutation dimension 𝑛, we believe that a more appropriate dimension, in comparison to the values suggested by experts, should range between 5 and 7 for medical data applications. Robustness to Additive Noise To determine the noise robustness of the delay parameter selection methods investigated in this work we will use an example time series. Specifically, we will use the 𝑥 solution to the periodic Rossler system. We will use additive Gasussian noise N (𝜇 = 0, 𝜎 2 ), where 𝜎 is determined from the Signal-to-Noise Ratio (SNR). The SNR is a measurement of how much noise there is in the signal with units of decibels (dB)and is calculated as   𝐴signal SNRdB = 20 log10 , (2.32) 𝐴noise where 𝐴signal and 𝐴noise are the Root-Mean-Square (RMS) amplitudes of the signal and additive noise, respectively. If we manipulate Eq. (2.32) we can solve for 𝐴noise as SNRdB 𝐴noise = 𝐴signal 10− 20 . (2.33) Because 𝑥(𝑡) is a discrete sampling from a continous system with 𝑡 = [𝑡1 , 𝑡2 , . . . , 𝑡 𝑁 ], we calculate 𝐴signal as v u t 𝑁 1 ∑︁ 𝐴signal = ¯ 2, [𝑥(𝑡𝑖 ) − 𝑥] (2.34) 𝑁 𝑖=1 where 𝑥¯ is the mean of 𝑥 and is subtracted from 𝑥(𝑡) to center the signal about zero. with 𝐴noise calculated, we set the additive noise standard deviation as 𝜎 = 𝐴noise . We applied a sweep of the SNR from 1 to 40 in increments of 1 with each SNR being repeated 2 for 30 unique realizations of the noise N (0, 𝐴noise 2 ). For each realization of 𝑥(𝑡) + N (0, 𝐴noise ) the delay parameters were calculated using all 5 methods: sublevel set persistence of the frequency domain 𝜏SLf , sublevel set persistence of the time domain 𝜏SLt , the minima of SW1PerS score 𝜏PHs , the maxima of the maximum persistence 𝜏PHL , and mutual information 𝜏MI . The mean and standard deviation of the 30 trials at each SNR were calulcated for each method as shown in Fig. 2.27. Figure 2.27 shows that the sublevel set persistence methods fail to provide an accurate delay 𝜏 in comparison to the expert suggested delay 𝜏exp. = 9 when SNR < 10 dB. While this does show a limit 111 Figure 2.27: Noise robustness analysis of the delay parameter selection using the Rossler system with incriminating additive noise. The mean and standard deviation as error bars of the delay parameters from 30 trials at each SNR were calculated using sublevel set persistence of the frequency domain 𝜏SLf , sublevel set persistence of the time domain 𝜏SLt , the minima of SW1PerS score 𝜏PHs , the maxima of the maximum persistence 𝜏PHL , and mutual information 𝜏MI . for the sublevel set persistence methods, SNR values below 10 dB are uncommon since this level of noise contamination is not considered acceptable for a signal with a rule-of-thumb requirement of SNR > 15 dB. However, the 1-D Persistent homology methods and mutual information provide accurate delay parameter selection down to an SNR of 2 dB. Robustness to Signal Length A common issue with signal processing and time series analysis methods is their limited functionality with smaller sets of data available, which has been used to analyze the sentitivty of the delay parameter selection [63]. Here we will investigate the limitations of these methods in the face of short time series. We will do this analysis by incrementing the length of the time series with the PE parameters calculated at each increment. For our analysis we will again use the Rossler system. Specifically, we incremented the length of the signal from 𝐿 = 75 to 1000 in steps of 25 (see Fig. 2.28. 
Robustness to Signal Length A common issue with signal processing and time series analysis methods is their limited functionality with smaller sets of data, which has been used to analyze the sensitivity of the delay parameter selection [63]. Here we investigate the limitations of these methods in the face of short time series. We do this analysis by incrementing the length of the time series, with the PE parameters calculated at each increment. For our analysis we again use the Rossler system. Specifically, we incremented the length of the signal from 𝐿 = 75 to 1000 in steps of 25 (see Fig. 2.28). However, if this type of analysis is not available for the data set being analyzed, it is commonly suggested for time series analysis applications to have a data length of 𝐿 = 4000 for continuous dynamical systems and 𝐿 = 500 for maps [250]. In Fig. 2.28 we see that all of the methods reach an accurate value of 𝜏, in comparison to the expert-suggested 𝜏 = 9, when the time series is at least 125 data points long. An important note is that this result is not general for all continuous dynamical systems.

Figure 2.28: Signal length robustness analysis of the delay parameter selection using the Rossler system, with the signal length incremented from 75 to 1000 in steps of 25. The delay parameters were calculated at each 𝐿 using sublevel set persistence of the frequency domain 𝜏SLf, sublevel set persistence of the time domain 𝜏SLt, the minima of the SW1PerS score 𝜏PHs, the maxima of the maximum persistence 𝜏PHL, and mutual information 𝜏MI.

The required length of the signal will vary depending on the sampling rate of the time series. To determine a general requirement for the methods, we repeated this analysis for all of the systems shown in Table A.2. Our results found that, in general, 𝐿 ≥ 15𝜏 is needed for selecting an appropriate PE and state space reconstruction delay 𝜏 using the TDA-based methods described in this manuscript.

CHAPTER 3
PERSISTENT HOMOLOGY OF COMPLEX NETWORKS

This chapter of my research investigates methods for mapping time series data into discrete complex networks whose topology can be used to infer meaningful characteristics about the underlying dynamics of the system. The topology of these complex networks is measured using persistent homology.

Figure 3.1: Comparison between ordinal partition networks generated from the 𝑥-solution of the Rossler system for both periodic (a) and chaotic (b) time series.

These networks have the potential to provide new insights into the systems driving the time series outputs. For instance, periodic time series tend to create transitional networks with an overarching circular structure, while those arising from chaotic systems have a seemingly unorganized state transition entanglement (see, for example, the OPNs in Fig. 3.1). Further, networks can provide an efficient approach for approximating the topological entropy of low-dimensional chaotic systems [205]. However, practitioners often only have access to standard network analysis tools to quantify the resulting outputs, such as centrality measures or average path length, and these measures can only do so much to quantify the overarching structure of the graph. The power of combining network approaches to signal processing with TDA is the potential for novel methods that encode the overall structure of the network in a quantifiable, robust manner.
My work is the first to bring the tools of TDA to these networks. My work [162] provides a novel combination of persistent homology and network methods to yield a compressed, multi-scale representation of complex networks that can distinguish between dynamic states such as periodic and chaotic behavior. Applying a filtration to the simplicial complex enables us to track the changes in homology classes over the course of the filtration through a persistence diagram. The persistence diagram encodes information about the loop structures and corresponding periodicity of the signal. I then extract existing as well as new geometric and entropy-based point summaries from the persistence diagram. I can also make direct comparisons between persistence diagrams using distance measures and multi-scale projections. In [162], I showed that persistence-based point summaries yield a clearer distinction of the dynamic behavior, compared to traditional statistics, for a variety of simulated dynamical systems and electrocardiogram and electroencephalogram data sets. Additionally, I showed that the persistence-based point summaries are more robust to noise than existing graph-based scores. In Section 3.1 I introduce the field of complex network representations of signals and the complex networks I use. Section 3.2 overviews how persistent homology is applied to the resulting networks, including the various distances that can be used as well as summary statistics. Several examples are provided in Section 3.3 to demonstrate the procedure for forming the complex networks as well as the correct application of persistent homology per application. In Section 3.4, I provide the results from analyzing the complex networks using persistent homology.

3.1 Complex Networks

Network representations of time series generally fall within three categories: proximity networks, visibility graphs, and transitional networks. These types of complex networks are discussed in the following paragraphs. Proximity networks are formed from proximity conditions in the reconstructed state space. Examples include the 𝑘-Nearest Neighbors (𝑘-NN) graph [118] and recurrence networks [68] (which are essentially the network underlying the Vietoris-Rips complex of the point cloud). For proximity networks, the graph representation includes all points in the state space reconstruction as part of the vertex set. When studying the shape of these networks with TDA-based tools, careful consideration is needed in the selection of 𝑘 or 𝜖 to generate a graph with the expected topology. Additionally, because each point in the state space serves as a vertex, there are no speed gains in computing persistent homology in comparison to the original state space reconstruction, since the size of the simplicial complex remains the same in both representations. While proximity networks encode the dynamics of the signal into their structure, they do not store temporal information. Transitional networks partition a time series {𝑥(𝑡)} such that there is a vertex set of states {𝑠𝑖} for each visited state and an edge for temporal transitions between states. The resulting transitional network constitutes a finite state space 𝐾 = {𝑠𝑖}𝑖∈ℕ, where 𝐾 is compact and every map 𝜙 : 𝐾 → 𝐾 is continuous. One interpretation of a topological system on a finite state space is as a finite graph where the edges describe the action of 𝜙, i.e., if there is a directed edge from vertex 𝑖 to vertex 𝑗, then 𝜙(𝑖) = 𝑗.
Therefore, the transitional networks I obtain from a time series are topological systems, and they lend themselves to further analysis within the framework of topological dynamics. The two most common transitional networks for time series analysis are the ordinal partition network (OPN) [146] and the Coarse Grained State Space Network (CGSSN) [31, 237, 239]. Both of these transitional networks are formed by first reconstructing the state space through Takens' embedding as 𝜒 = {Xᵢ = (𝑥ᵢ, 𝑥ᵢ₊𝜏, …, 𝑥ᵢ₊(𝑑−1)𝜏)} ⊂ ℝ^𝑑. The OPN is generated by defining states from the lexicographic order of the ordinal ranking of Xᵢ. This method of partitioning the state space results in a vertex set of states given by the 𝑑! possible permutations Π = (𝜋₁, …, 𝜋_𝑑!) representing the regions of ℝ^𝑑 separated by hyperplanes; see the example across the top of Fig. 3.7. Similarly, using the same example signal, the CGSSN is shown along the bottom of Fig. 3.7. The CGSSN is formed by defining a set of states as 𝑑-orthotopes that partition the state space occupied by Xᵢ in a data-driven manner. For the example shown in Fig. 3.7 I defined 8 equal-sized cubes (3-orthotopes) that represent the possible states, where the temporal transitions between states are tracked to add edges in the corresponding network. Both of these examples demonstrate the periodic structure of the embedding being encoded into a cyclic network structure. The visibility graph [5, 89, 120-123, 140, 141, 167, 242, 249], an idea taken from computational geometry [57], is defined by including a vertex for each data point, and including an edge between vertices if a line can be drawn between the two which does not pass below any other data point; see [168] for a review. The visibility graph is closely related to the sublevel set persistence computed directly on the time series rather than on the Takens' embedding. As my focus for this work is related to building upon the strong theory developed for the Takens' embedding, I do not expect to utilize these constructions at this stage of the work. Additionally, visibility graphs, unfortunately, do not lend themselves well to analysis with persistent homology due to the lack of periodic cycle structure (e.g., loops) associated with regular dynamics. As such, I will not be investigating them.

3.1.1 Background

State Space Reconstruction Takens' theorem forms one of the theoretical foundations for the analysis of time series corresponding to nonlinear, deterministic dynamical systems [226] and is often used to form complex networks. It states that, in general, it is possible to obtain an embedding of the attractor of a deterministic dynamical system from one-dimensional measurements of the system's evolution in time. The embedding of the signal is commonly known as the State Space Reconstruction (SSR). An embedding is a smooth map Ψ : 𝑀 → 𝑁 between the manifolds 𝑀 and 𝑁 that diffeomorphically maps 𝑀 to 𝑁. Specifically, assume that the state of a system is described for any time 𝑡 ∈ ℝ by a point x on an 𝑚-dimensional manifold 𝑀 ⊆ ℝ^𝑑. The flow for this system is given by a map 𝜙𝑡(x) : 𝑀 × ℝ → 𝑀 which describes the evolution of the state x for any time 𝑡. In reality, I typically do not have access to x, but rather have measurements of x via an observation function 𝛽(x) : 𝑀 → ℝ. The observation function has a time evolution 𝛽(𝜙𝑡(x)), and in practice it is often a one-dimensional, discrete and equi-spaced time series of the form {𝛽𝑛}𝑛∈ℕ.
Although the state x can lie in a higher dimension, the time series {𝛽𝑛} is one-dimensional. Nevertheless, Takens' theorem states that by fixing an embedding dimension 𝑑 ≥ 2𝑚 + 1, where 𝑚 is the dimension of a compact manifold 𝑀, and a time lag 𝜏 > 0, the map Φ_{𝜙,𝛽} : 𝑀 → ℝ^𝑑 given by

Φ_{𝜙,𝛽}(x) = (𝛽(x), 𝛽(𝜙(x)), …, 𝛽(𝜙^{𝑑−1}(x))) = (𝛽(x𝑡), 𝛽(x𝑡₊𝜏), 𝛽(x𝑡₊₂𝜏), …, 𝛽(x𝑡₊(𝑑−1)𝜏)),

is an embedding of 𝑀, where 𝜙^{𝑑−1} is the composition of 𝜙 with itself 𝑑 − 1 times and x𝑡 is the value of x at time 𝑡. Theoretically, any time lag 𝜏 can be used if the noise-free data is of infinite precision; however, in practice, the choice of 𝜏 is important in the delay reconstruction. The other component in Takens' embedding is the embedding dimension 𝑑, which must be large enough to unfold the attractor. If this dimension is not sufficient, then some points can falsely appear to be neighbors due to the projection of the attractor onto a lower dimension. The appropriate method for selecting both of these parameters is thoroughly described in Chapter 2.

3.1.2 Graphs

A graph 𝐺 = (𝑉, 𝐸) is a collection of vertices 𝑉 with edges 𝐸 = {𝑢𝑣} ⊆ 𝑉 × 𝑉. In this paper, I assume all graphs are simple (no loops or multi-edges) and undirected. The complete graph on the vertex set 𝑉 is the graph with all edges included, i.e. 𝐸 = {𝑢𝑣 | 𝑢 ≠ 𝑣 ∈ 𝑉}. I will reference a few special graphs. The cycle graph on 𝑛 vertices is the graph 𝐺 = (𝑉, 𝐸) with 𝑉 = {𝑣₁, ⋯, 𝑣𝑛} and 𝐸 = {𝑣𝑖𝑣𝑖₊₁ | 1 ≤ 𝑖 < 𝑛} ∪ {𝑣𝑛𝑣₁}; i.e., it forms a closed path (cycle) where no repetition occurs except for the starting and ending vertex. The complete graph on 𝑛 vertices is the graph 𝐺 = (𝑉, 𝐸) with 𝑉 = {𝑣₁, ⋯, 𝑣𝑛} and 𝐸 = {𝑣𝑖𝑣𝑗 | 𝑖 ≠ 𝑗}. That is, it is the graph with 𝑛 vertices where all possible edges are included. I will also work with weighted graphs, 𝐺 = (𝑉, 𝐸, 𝜔), where 𝜔 : 𝐸 → ℝ gives a weight for each edge in the graph. In this paper, I assume all weights are non-negative, 𝜔 : 𝐸 → ℝ≥0. Given an ordering of the vertices 𝑉 = {𝑣₁, ⋯, 𝑣𝑛}, a graph can be stored in an adjacency matrix A, where entry A𝑖𝑗 = 1 if there is an edge 𝑣𝑖𝑣𝑗 ∈ 𝐸 and 0 otherwise. This can be edited to store the weighting information by setting A𝑖𝑗 = 𝜔(𝑣𝑖𝑣𝑗) if 𝑣𝑖𝑣𝑗 ∈ 𝐸 and 0 otherwise. A path 𝛾 in a graph is an ordered collection of non-repeated vertices 𝛾 = 𝑢₀𝑢₁ ⋯ 𝑢𝑘 where 𝑢𝑖𝑢𝑖₊₁ ∈ 𝐸 for every 𝑖. The length of the path is the number of edges used, namely len(𝛾) = 𝑘 in the above notation. The distance between two vertices 𝑢 and 𝑣 is the minimum length of all paths from 𝑢 to 𝑣 and is denoted 𝑑(𝑢, 𝑣). Given an ordering of the vertices, this information can be stored in a distance matrix D where D𝑖𝑗 = 𝑑(𝑣𝑖, 𝑣𝑗). Thus an unweighted graph 𝐺 = (𝑉, 𝐸) gives rise to a weighted complete graph on the vertex set 𝑉 by setting the weight 𝜔(𝑢𝑣) = 𝑑(𝑢, 𝑣).

3.1.3 Proximity and Transition Networks

Proximity Network: 𝑘-Nearest Neighbor Graph Given a collection of points in ℝ^𝑑, the 𝑘-nearest neighbor graph, or 𝑘-NN graph, is a commonly used method to build a graph. Fix 𝑘 ∈ ℤ≥0. The (undirected) 𝑘-NN graph has a vertex set in 1-1 correspondence with the point cloud, so I abuse notation and write 𝑣𝑖 for both the point 𝑣𝑖 ∈ ℝ^𝑑 and the vertex 𝑣𝑖 ∈ 𝑉. An edge 𝑣𝑖𝑣𝑗 is included if 𝑣𝑖 is among the 𝑘 nearest neighbors of 𝑣𝑗. When required, I can give a weighting for this graph by setting 𝜔(𝑣𝑖𝑣𝑗) = ∥𝑣𝑖 − 𝑣𝑗∥.
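As an illustration (not the dissertation's implementation), a symmetric distance-weighted 𝑘-NN graph can be built from a point cloud, such as an SSR, with scikit-learn:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_graph(points, k):
    """Undirected k-NN graph weighted by Euclidean distance."""
    A = kneighbors_graph(points, n_neighbors=k, mode="distance")  # directed k-NN
    return A.maximum(A.T)  # symmetrize: keep uv if either u or v lists the other

# Example: k-NN graph of a circle point cloud (a stand-in for a periodic SSR).
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
points = np.column_stack([np.cos(theta), np.sin(theta)])
A = knn_graph(points, k=4)  # sparse weighted adjacency matrix
```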
Transition Networks: Ordinal Partition and Coarse Grained State Space Networks For a graph 𝐺 = (𝑉, 𝐸) with an ordering of the vertices 𝑉 = {𝑣₁, ⋯, 𝑣𝑛}, the graph can be stored in an adjacency matrix A where the weighting information is stored by setting A𝑖𝑗 = 𝑤(𝑣𝑖, 𝑣𝑗) if 𝑣𝑖𝑣𝑗 ∈ 𝐸 and 0 otherwise. Transitional networks are generated from a graph formation technique for time series data. They are formed through a chronologically ordered sequence of symbols or states. For time series analysis, these states are mapped from the measurement signal. Specifically, I first use a state space reconstruction and then assign a symbolic representation to each vector in the SSR. Our definition of the state space reconstruction is slightly different for discretely sampled time series data 𝑥 = [𝑥₁, 𝑥₂, …, 𝑥𝐿], with 𝐿 the number of samples from the signal, assuming the signal was sampled at uniform time stamps 𝑡 = [𝑡₁, 𝑡₂, …, 𝑡𝐿] with sampling frequency 𝑓𝑠. An SSR vector of a discretely sampled signal is defined as

𝑋𝑖 = [𝑥𝑖, 𝑥𝑖₊𝜏, 𝑥𝑖₊₂𝜏, …, 𝑥𝑖₊𝜏(𝑛−1)], (3.1)

with 𝑖 ∈ ℤ ∩ [1, 𝐿 − 𝜏(𝑛 − 1)] and 𝜏 ∈ ℤ. To form a symbolic sequence from the time series data, we implement a function that maps each SSR vector to a symbol from an alphabet A of possible symbols, 𝑓 : 𝑋𝑖 → 𝑠𝑗, where 𝑠𝑗 ∈ A. In this work we consider the symbols of the alphabet to be integers such that 𝑠𝑖 ∈ A = ℤ ∩ [1, 𝑁], where 𝑁 is the number of possible symbols. Applying this mapping over all embedding vectors we get a symbol sequence 𝑆 = [𝑠₁, 𝑠₂, …, 𝑠_{𝐿−𝜏(𝑛−1)}]. The symbol sequence 𝑆 forms a transitional network by considering a graph 𝐺 = (𝑉, 𝐸). We represent the graph using an adjacency matrix A of size 𝑁 × 𝑁. We add edges to the graph via the symbolic transitions, with an edge between row 𝑠𝑖 and column 𝑠𝑗 when there is a transition from 𝑠𝑖 to 𝑠𝑗. This is represented in the adjacency matrix by incrementing the value of A_{𝑠𝑖,𝑠𝑗} by one for each transition between 𝑠𝑖 and 𝑠𝑗, where A begins as a zero matrix. We set the total number of transitions between two nodes 𝑠𝑖 and 𝑠𝑗 as the edge weight 𝑤(𝑠𝑖, 𝑠𝑗). To better illustrate this process, take the example of a simple cycle shown in Fig. 3.2. In this example we take the symbol or state sequence 𝑆 on the left side of Fig. 3.2 with symbols in the alphabet A = [1, 2, 3, 4] and create the network in the middle of Fig. 3.2. This network is represented as a directed and weighted adjacency matrix on the right side of Fig. 3.2. With an understanding of transitional networks and their formation, I next introduce two commonly used methods for assigning symbolic representations to the SSR vectors.

Figure 3.2: Example formation of a weighted transitional network as a graph (middle figure) and adjacency matrix (right figure) given a state sequence 𝑆 (left figure).
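A minimal sketch of the adjacency matrix construction just described, mirroring the Fig. 3.2 example (the symbol values are illustrative):

```python
import numpy as np

def transition_adjacency(S, N):
    """Weighted, directed adjacency matrix of a transitional network.

    S : chronologically ordered integer symbols in [1, N]; each consecutive
        pair (s_i, s_j) increments the edge weight A[s_i, s_j] by one.
    """
    A = np.zeros((N, N), dtype=int)
    for si, sj in zip(S[:-1], S[1:]):
        A[si - 1, sj - 1] += 1  # shift symbols to 0-based matrix indices
    return A

# A simple cycle through four states, as in Fig. 3.2.
A = transition_adjacency([1, 2, 3, 4, 1, 2, 3, 4, 1], N=4)
```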
The ordinal partition network [146, 216] provides a relatively simple method to assign symbolic representations to the SSR vectors to form a transition network. This construction arose as a generalization of the concept of permutation entropy [14]. The basic idea of the OPN construction is to replace each SSR vector 𝑋𝑖 with a permutation 𝜋, where the vector 𝑋𝑖 is assigned to a permutation based on the sorted order of its coordinates. Specifically, the permutation 𝜋 is the one of the 𝑛! possible permutations for which 𝑥(𝑡 + 𝜋(0)𝜏) ≤ 𝑥(𝑡 + 𝜋(1)𝜏) ≤ ⋯ ≤ 𝑥(𝑡 + 𝜋(𝑛−1)𝜏), where 𝜋(𝑖) is the permutation value at index 𝑖; see the top row (OP) of Fig. 3.3 for an example. Then the OPN is built with a vertex set of the encountered permutations in the sequence 𝑆, with an edge included if the ordered point cloud passes from one permutation to the other.

Figure 3.3: Assignment of Ordinal Partition (OP) or Coarse Grained (CG) state for an example dimension-3 SSR vector.

The coarse grained state space network is created by partitioning the space occupied by the SSR into discrete 𝑛-dimensional hypercubes. This is done by first digitizing the SSR vectors using a digitization function 𝜓(𝜒𝑖, 𝐵), where 𝐵 is the monotonically increasing vector of bin edges that discretizes the vector's coordinates into 𝑏 bins. We do this using an equal-sized binning method. Specifically, the binning 𝐵 needs to encapsulate the entire range of signal values such that max(𝑥) ≤ 𝐵(𝑏) and min(𝑥) ≥ 𝐵(1). Let us assume our binning scheme has a total of 𝑏 bins such that our digitized 𝜒𝑖 is defined as

𝑝𝑖 = 𝜓(𝜒𝑖, 𝐵) = [𝑝𝑖(1), 𝑝𝑖(2), …, 𝑝𝑖(𝑛)], (3.2)

where 𝑝𝑖(𝑗) is the index of the bin that 𝜒𝑖(𝑗) falls in, with 𝐵(𝑝𝑖(𝑗)) < 𝜒𝑖(𝑗) ≤ 𝐵(𝑝𝑖(𝑗) + 1). We now have our digitized SSR vectors 𝑝𝑖, which can be assigned a unique symbolic representation as

𝑠𝑖 = Σ_{𝑗=1}^{𝑛} (𝑝𝑖(𝑗) − 1) 𝑏^{𝑛−𝑗}, (3.3)

where 𝑠𝑖 ∈ [0, 𝑏^𝑛 − 1] for a total of 𝑏^𝑛 possible states. This symbolic assignment is computationally efficient since it does not require a comparison to a bank of possible states, as is required with ordinal partition networks. An example assignment is shown in the bottom CG row of Fig. 3.3 with 𝑏 = 8 and 𝑛 = 3.
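The two symbolic assignments can be sketched as follows: `op_symbol` returns the ordinal pattern of an SSR vector, and `cg_symbol` implements the base-𝑏 encoding of Eq. (3.3). The helper names are my own, and the bin edges are assumed to span the signal range.

```python
import numpy as np

def op_symbol(X):
    """Ordinal partition state: the permutation sorting the SSR vector."""
    return tuple(np.argsort(X))

def cg_symbol(X, edges):
    """Coarse grained state via Eq. (3.3): digitize each coordinate into one
    of b bins, then encode the bin indices as a base-b integer in [0, b^n - 1]."""
    b = len(edges) - 1                        # number of bins
    p = np.clip(np.digitize(X, edges), 1, b)  # bin index p_i(j) per coordinate
    n = len(X)
    return sum(int(p[j] - 1) * b ** (n - 1 - j) for j in range(n))

# A dimension-3 SSR vector with b = 8 equal-sized bins on [-1, 1], as in Fig. 3.3.
X = np.array([0.2, -0.6, 0.9])
edges = np.linspace(-1, 1, 9)
print(op_symbol(X), cg_symbol(X, edges))
```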
Given a weighted graph 𝐺 = (𝑉, 𝐸, 𝜔) and 𝑎 ∈ ℝ, I set 𝐾𝑎 = {𝜎 ∈ 𝐾(𝐺) | 𝜔(𝑢𝑣) ≤ 𝑎 for all 𝑢 ≠ 𝑣 ∈ 𝜎}. Since 𝐾𝑎 ⊆ 𝐾𝑏 for 𝑎 ≤ 𝑏, this can be viewed as a filtration 𝐾𝑎₁ ⊆ 𝐾𝑎₂ ⊆ ⋯ ⊆ 𝐾𝑎𝑁 for any collection 𝑎₁ ≤ 𝑎₂ ≤ ⋯ ≤ 𝑎𝑁. In particular, for this paper, I will build a filtration from an unweighted graph 𝐺 by the following procedure. First, construct the pairwise distance matrix for the vertices of 𝐺 using shortest paths. This can be viewed as a weighting on the complete graph with the same vertex set as 𝐺. Thus, it induces a filtration on the complete simplicial complex 𝐾 where the 1-skeleton of 𝐾𝑎 includes edges between any pair of vertices 𝑢 and 𝑣 for which 𝑑(𝑢, 𝑣) ≤ 𝑎. See Fig. 3.4 for an example.

Homology Traditional homology [92, 158] counts the number of structures of a particular dimension in a given topological space, which in our context will be a simplicial complex. In this context, the structures measured can be connected components (0-dimensional structure), loops (1-dimensional structure), voids (2-dimensional structure), and higher-dimensional analogues as needed. For the purposes of this paper, I will only ever need 0- and 1-dimensional persistent homology, so I provide the background necessary in these contexts. Further, as a note for the expert, I always assume homology with ℤ₂ coefficients, which removes the need to be careful about orientation. I start by describing homology. Assume I am given a simplicial complex 𝐾. Say the 𝑑-dimensional simplices in 𝐾 are denoted 𝜎₁, ⋯, 𝜎ℓ. A 𝑑-dimensional chain is a formal sum of the 𝑑-dimensional simplices, 𝛼 = Σ_{𝑖=1}^{ℓ} 𝑎𝑖𝜎𝑖. I assume the coefficients 𝑎𝑖 ∈ ℤ₂ = {0, 1} and addition is performed mod 2. For two chains 𝛼 = Σ_{𝑖=1}^{ℓ} 𝑎𝑖𝜎𝑖 and 𝛽 = Σ_{𝑖=1}^{ℓ} 𝑏𝑖𝜎𝑖, we have 𝛼 + 𝛽 = Σ_{𝑖=1}^{ℓ} (𝑎𝑖 + 𝑏𝑖)𝜎𝑖. The collection of all 𝑑-dimensional chains forms a vector space denoted 𝐶𝑑(𝐾). The boundary of a given 𝑑-simplex is

𝜕𝑑(𝜎) = Σ_{𝜏≺𝜎, dim(𝜏)=𝑑−1} 𝜏.

That is, it is the formal sum of the faces of exactly one lower dimension. If dim(𝜎) = 0, that is, if 𝜎 is a vertex, then I set 𝜕𝑑(𝜎) = 0. The boundary operator 𝜕𝑑 : 𝐶𝑑(𝐾) → 𝐶𝑑−1(𝐾) is given by

𝜕𝑑(𝛼) = 𝜕𝑑(Σ_{𝑖=1}^{ℓ} 𝑎𝑖𝜎𝑖) = Σ_{𝑖=1}^{ℓ} 𝑎𝑖𝜕𝑑(𝜎𝑖).

A 𝑑-chain 𝛼 ∈ 𝐶𝑑(𝐾) is a cycle if 𝜕𝑑(𝛼) = 0; it is a boundary if there is a (𝑑+1)-chain 𝛽 such that 𝜕𝑑+1(𝛽) = 𝛼. The group of 𝑑-dimensional cycles is denoted 𝑍𝑑(𝐾); the boundaries are denoted 𝐵𝑑(𝐾). In particular, any 0-chain is a 0-cycle since 𝜕₀(𝛼) = 0 for any 𝛼. A 1-chain is a 1-cycle iff the 1-simplices (i.e., edges) with a coefficient of 1 form a closed loop. It is a fundamental exercise in homology to see that 𝜕𝑑𝜕𝑑+1 = 0 and therefore that 𝐵𝑑(𝐾) ⊆ 𝑍𝑑(𝐾). The 𝑑-dimensional homology group is 𝐻𝑑(𝐾) = 𝑍𝑑(𝐾)/𝐵𝑑(𝐾). An element of 𝐻𝑑(𝐾) is called a homology class and is denoted [𝛼] for 𝛼 ∈ 𝑍𝑑(𝐾), where [𝛼] = {𝛼 + 𝜕(𝛽) | 𝛽 ∈ 𝐶𝑑+1(𝐾)}. I say that the class is represented by 𝛼, but note that any element of [𝛼] can be used as a representative, so this choice is by no means unique. In the particular case of 0-dimensional homology, there is a unique class in 𝐻₀(𝐾) for each connected component of 𝐾. For 1-dimensional homology, I have one homology class for each "hole" in the complex.
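As a small numerical illustration of these definitions (not the computational pipeline used later), the Betti numbers of a hollow triangle can be read off from the rank of its ℤ₂ boundary matrix:

```python
import numpy as np

def rank_mod2(M):
    """Rank of a binary matrix over Z2 via Gaussian elimination."""
    M = (M % 2).astype(np.uint8).copy()
    rank = 0
    for c in range(M.shape[1]):
        pivots = [r for r in range(rank, M.shape[0]) if M[r, c]]
        if not pivots:
            continue
        M[[rank, pivots[0]]] = M[[pivots[0], rank]]  # move a pivot row into place
        for r in range(M.shape[0]):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]                      # row reduction mod 2
        rank += 1
    return rank

# Hollow triangle: 3 vertices, 3 edges (ab, bc, ca), no 2-simplex.
# The boundary matrix d1 sends each edge to the sum of its endpoint vertices.
d1 = np.array([[1, 0, 1],
               [1, 1, 0],
               [0, 1, 1]])
r1 = rank_mod2(d1)     # rank of the boundary map, here 2
betti0 = 3 - r1        # dim Z0 - dim B0 = vertices - rank(d1): one component
betti1 = (3 - r1) - 0  # dim Z1 = edges - rank(d1); rank(d2) = 0, so one loop
print(betti0, betti1)  # -> 1 1
```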
Persistent homology We next look to a more modern viewpoint of homology which is particularly useful for data analysis: persistent homology. In this case, we study a changing simplicial complex and encode this information via the changing homology. In explaining persistence, we will follow the example of Fig. 3.4 for the setting used in this work, where the input data is a weighted network.

Figure 3.4: Persistent homology of a weighted complex network. The top left shows the weighted network, with the corresponding adjacency matrix to its right. Third is the distance matrix, and at the top right is the persistence diagram of one-dimensional features. The bottom row shows the filtration at critical values.

A filtration of a simplicial complex 𝐾 is a collection of nested simplicial complexes 𝐾₁ ⊆ 𝐾₂ ⊆ ⋯ ⊆ 𝐾𝑁 = 𝐾. See the bottom row of Fig. 3.4 for an example of a filtration. In this work, we will be focused on the following filtration, which arises from a finite metric space; in our case, this is given as a pairwise distance matrix D ∈ ℝ^{𝑛×𝑛}_{≥0}, obtained from a weighted graph as described in Sec. 3.2.2. Set the vertex set to be 𝑉 = {1, ⋯, 𝑛} and, for a fixed 𝑎 ∈ ℝ, let 𝐾𝑎 = {𝜎 ⊂ 𝑉 | D(𝑢, 𝑣) ≤ 𝑎 for all 𝑢 ≠ 𝑣 ∈ 𝜎}. This can be thought of as the clique complex on the graph with edges given by all pairs of vertices with distance at most 𝑎. Further, since 𝐾𝑎 ⊆ 𝐾𝑏 for 𝑎 ≤ 𝑏, this construction gives rise to a filtration 𝐾𝑎₁ ⊆ 𝐾𝑎₂ ⊆ ⋯ ⊆ 𝐾𝑎𝑁 for any collection 𝑎₁ ≤ 𝑎₂ ≤ ⋯ ≤ 𝑎𝑁. Fix a dimension 𝑑. For any inclusion of one simplicial complex into another, 𝐿 ↪ 𝐾, there is an induced map on the 𝑑-chains 𝜄 : 𝐶𝑑(𝐿) → 𝐶𝑑(𝐾) by simply viewing any chain in the small complex as one in the larger. Less obviously, this extends to a map on homology 𝜄∗ : 𝐻𝑑(𝐿) → 𝐻𝑑(𝐾) by sending [𝛼] ∈ 𝐻𝑑(𝐿) to the class in 𝐻𝑑(𝐾) with the same representative. That this is well defined is a non-trivial exercise in the definitions [92]. Putting this together, given a filtration 𝐾𝑎₁ ⊆ 𝐾𝑎₂ ⊆ ⋯ ⊆ 𝐾𝑎𝑁 there is a sequence of linear transformations on the homology 𝐻𝑑(𝐾𝑎₁) → 𝐻𝑑(𝐾𝑎₂) → ⋯ → 𝐻𝑑(𝐾𝑎𝑁). A class [𝛼] ∈ 𝐻𝑑(𝐾𝑎𝑖) is said to be born at 𝑎𝑖 if it is not in the image of the map 𝐻𝑑(𝐾𝑎𝑖₋₁) → 𝐻𝑑(𝐾𝑎𝑖). The same class dies at 𝑎𝑗 if [𝛼] ≠ 0 in 𝐻𝑑(𝐾𝑎𝑗₋₁) but [𝛼] = 0 in 𝐻𝑑(𝐾𝑎𝑗). In the case of 0-dimensional persistence, this feature encodes the appearance of a new connected component at 𝐾𝑎𝑖 that was not there previously and which merges with an older component entering 𝐾𝑎𝑗. For 1-dimensional homology, this is the appearance of a loop structure that likewise fills in entering 𝐾𝑎𝑗. The persistence diagram encodes this information as follows. For each class that is born at 𝑎𝑖 and dies at 𝑎𝑗, the persistence diagram has a point in ℝ² at (𝑎𝑖, 𝑎𝑗). Because several features can appear and disappear at the same times, we allow for repeated points at the same location. For this reason, a persistence diagram is often denoted as a multiset of its off-diagonal points, 𝐷 = {(𝑏₁, 𝑑₁), ⋯, (𝑏𝑘, 𝑑𝑘)}. See the top right of Fig. 3.4 for an example. Note that the farther a point is from the diagonal, the longer that class persisted in the filtration, which signifies large-scale structure. The lifetime or persistence of a point 𝑥 = (𝑏, 𝑑) in a persistence diagram 𝐷 is given by pers(𝑥) = |𝑏 − 𝑑|. It is often of interest to investigate only the 𝑑-dimensional features of a persistence diagram, which we represent as 𝐷𝑑.
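A minimal sketch of the pipeline of Fig. 3.4, assuming the `ripser` package from scikit-tda for the Vietoris-Rips persistence (any Rips software would work) and NetworkX for the shortest path distances:

```python
import networkx as nx
import numpy as np
from ripser import ripser

# Unweighted cycle graph; its shortest path distances define the finite metric.
G = nx.cycle_graph(10)
D = np.asarray(nx.floyd_warshall_numpy(G))  # pairwise shortest path distances

# 0- and 1-dimensional persistence of the induced clique (Rips) filtration.
dgm0, dgm1 = ripser(D, distance_matrix=True, maxdim=1)["dgms"]
# For a 10-cycle, dgm1 holds a single point born at 1 that dies at ceil(10/3) = 4.
```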
3.2.2 Distance Measures for Graphs

We next look at four different ways to define a distance between pairs of vertices given an input (weighted) graph. In each case, we generate a distance matrix D where entry D(𝑎, 𝑏) gives the associated distance between vertices 𝑎 and 𝑏.

Shortest Unweighted Path Distance The first method, the shortest unweighted path distance, ignores the weighting information entirely, using only the number of edges needed to get from vertex 𝑎 to vertex 𝑏. Specifically, D(𝑎, 𝑏) is the number of steps it takes to transition from 𝑎 to 𝑏 through the shortest path. See the example of Fig. 3.5. The shortest path distance is calculated using the NetworkX implementation of Dijkstra's algorithm [66] with the unweighted adjacency matrix.

Figure 3.5: Example basic graph with corresponding shortest path distance matrix. Highlighted in red is an example shortest path from node 2 to 5 with shortest path distance 2.

Shortest Weighted Path Distance The second method, the shortest weighted path distance, similarly uses only the number of edges between vertex 𝑎 and 𝑏 as the path distance. However, the weight information is incorporated through the choice of the path. This is done by choosing the path with the lowest summed weight of all paths between 𝑎 and 𝑏. To make it such that the path with the largest weights is used, the inverse of the edge weights is used when calculating the shortest path. Again, this distance is calculated using the NetworkX implementation of Dijkstra's algorithm [66], but with the inverse of the weighted adjacency matrix.

Weighted Shortest Path The third method, the weighted shortest path, is very similar to the second method. The only variation is that the sum of the edge weights along the path is used as the distance. The path used is found using the inverse of the edge weights, similar to the second method.

Diffusion Distance The fourth method for computing distances is the diffusion distance; for more details we direct the reader to [50]. This is computed using the transition probability matrix P of the graph, where P(𝑎, 𝑏) is the probability of transitioning to vertex 𝑏 in the next step given that you are currently at 𝑎. Given the weighted, undirected adjacency matrix A, the transition probability matrix is calculated as

P(𝑖, 𝑗) = A(𝑖, 𝑗) / Σ_{𝑘=1}^{|𝑉|} A(𝑖, 𝑘).

This formulation of the probability matrix only has transition probabilities greater than zero for one-step neighbors of 𝑖. However, the transition probabilities for non-adjacent neighbors of node 𝑖 can be calculated using a random walk and the diffusion process. A random walk is a sequence of nodes visited (𝑎₁, 𝑎₂, …) in 𝑡 steps, where the selection of the next node is based on the transition probabilities. It is a classic exercise to show that, given P, the probability of transitioning to vertex 𝑏 from vertex 𝑎 in 𝑡 random walk steps is P^𝑡(𝑎, 𝑏). The diffusion distance is a measure of the degree of connectivity of two nodes in a connected graph after 𝑡 steps using the lazy transition probability matrix P̃^𝑡, based on the possible random walks of length 𝑡, and is calculated as

𝑑𝑡(𝑎, 𝑏) = √( Σ_{𝑐∈𝑉} (1/d(𝑐)) [P̃^𝑡(𝑎, 𝑐) − P̃^𝑡(𝑏, 𝑐)]² ), (3.4)

where d is the degree vector of the graph with d(𝑖) as the degree of node 𝑖, and P̃ is the lazy transition probability matrix, in which the initially zero diagonal of P is set such that P̃ = (I + P)/2. In other words, there is an equal probability of staying at or leaving node 𝑖 in a single step. Applying the diffusion distance to all node pairs results in the distance matrix D𝑡. Consider the diffusion distance for two nodes having a connected path with high transition probability edges, or many random walk paths connecting the two; then the diffusion distance between them will be low.
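A sketch of the diffusion distance of Eq. (3.4); here the weighted row sums serve as both the transition normalization and the degree vector d, which is an assumption on my part:

```python
import numpy as np

def diffusion_distance_matrix(A, t):
    """All-pairs diffusion distance of a weighted, undirected adjacency matrix A
    after t lazy random walk steps (the text suggests d < t < 3d, with d the
    unweighted shortest path diameter of the graph)."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]                    # transition probability matrix
    P_lazy = 0.5 * (np.eye(len(A)) + P)     # lazy walk: equal stay/leave chance
    Pt = np.linalg.matrix_power(P_lazy, t)  # t-step transition probabilities
    diff = Pt[:, None, :] - Pt[None, :, :]  # P~^t(a, c) - P~^t(b, c) for all a, b
    return np.sqrt(np.einsum("abc,c->ab", diff ** 2, 1.0 / deg))
```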
However, if two vertices are only connected through a single, low probability edge transition, arising from a possible perturbation in the graph, then their diffusion distance will be large. A common example implementing the diffusion distance is based on assigning P as a function of the proximity of nodes. Using this formulation of the transition probability, it is possible to cluster the data based on the distances, as demonstrated in [50]. Moreover, due to the natural transitions that occur in transitional complex networks, the diffusion distance is a natural solution for incorporating edge weight data into the distance measurement. It is important to mention the sensitivity of the diffusion distance D𝑡 to the selection of the number of walk steps 𝑡. We used an empirical study of 23 continuous dynamical systems to determine the optimal 𝑡 such that a periodic signal creates a significant point in the persistence diagram representing the cycle. More details on this analysis are available in the Appendix, Section D.2. We found an optimal value of 𝑑 < 𝑡 < 3𝑑, where 𝑑 is the diameter of the graph. Specifically, the diameter is measured as the maximum shortest unweighted path between any two vertices. Intuitively, this value of 𝑡 seems suitable since it allows for a transition probability between all nodes in the graph; i.e., if 𝑡 ≥ 𝑑 then there is a probability of transitioning between every node pair in a random walk of length 𝑡.

3.2.3 Point summaries of persistence diagrams

A common issue with persistence diagrams is that they are notoriously difficult to work with as a summary of data. While they are quantitative in nature, determining differences in structure, such as "has a point far from the diagonal," is often a qualitative procedure. Metrics for persistence diagrams exist, namely the bottleneck and 𝑝-Wasserstein¹ distances; however, these objects are not particularly easy to work with in a statistical or machine learning context. Thus, I will pass to working with the simplest of featurizations, namely point summaries of a given diagram, which I also call scores.

¹ This metric is closely related to, but not the same as, the eponymous metric from probability theory.

Maximum persistence The first very simple but extremely useful point summary is the maximum persistence. Given a persistence diagram 𝐷, the maximum persistence is simply

maxpers(𝐷) = max_{𝑥∈𝐷} pers(𝑥).

While this is obviously a very lossy point summary for a persistence diagram, it is quite useful in that, particularly for applications where the existence of a large circle is of interest, it often does what I need. See, e.g., [112, 232].

Periodicity Score Next, I set out to build a point summary which I can use to measure the similarity of our weighted graph to a cycle graph, and which is independent of the number of nodes.

Figure 3.6: Table of examples showing the lifetime 𝐿𝑛 of the single class (𝑟𝐵, 𝑟𝐷) in the persistence diagram for the pipeline applied to a cycle with 𝑛 nodes.

If 𝐺′ is an unweighted cycle graph with 𝑛 vertices, then, following the procedure of Fig. 3.4 using the shortest path metric, there is exactly one cycle, which is born at 1 and fills in at ⌈𝑛/3⌉. See the examples of Fig. 3.6. This means the persistence diagram 𝐷′ has exactly one point, (1, ⌈𝑛/3⌉), and so I denote the maximum persistence of this diagram as

𝐿𝑛 = maxpers(𝐷′) = ⌈𝑛/3⌉ − 1.

Then, assume I am given another unweighted graph 𝐺 with |𝑉| = 𝑛 and persistence diagram 𝐷. I define the network periodicity score as

𝑃(𝐷) = 1 − maxpers(𝐷)/𝐿𝑛. (3.5)
This score is an extension of the periodicity score in [177] to unweighted networks, and it has the property that 𝑃(𝐷) ∈ [0, 1], with 𝑃(𝐷) = 0 iff the input graph 𝐺 is a cycle graph.

The ratio of the number of homology classes to the graph order The next point summary I define is

𝑀(𝐷) = |𝐷|/|𝑉|, (3.6)

which is the reciprocal of the ratio between the number of vertices in the network |𝑉|, i.e., the order of the graph, and the number of classes in the persistence diagram |𝐷|. I can think of this number as an approximation of the reciprocal of the number of vertices in each class; however, this is only an approximation because some classes in the 1-D persistence diagram may share vertices in the network. Note that for a network with 𝑛 nodes, the 0-dimensional persistence diagram will always have 𝑛 − 1 points, and so this metric is not particularly useful there. In this paper, I only use this summary for 1-dimensional persistence diagrams. The logic behind this heuristic is that for a periodic signal I would expect to see a small number of 1-D homology classes in comparison to a chaotic time series. Therefore, for two networks of similar order but with different dynamic behavior, i.e., one is chaotic and one is periodic, the ratio 𝑀(𝐷) for the periodic time series will be smaller than for its chaotic counterpart.

Normalized Persistent Entropy Persistent entropy is a method for calculating the entropy of the lifetimes of the points in a persistence diagram, inspired by Shannon entropy. This summary function, first given by Chintakunta et al. [45], is defined as

𝐸(𝐷) = − Σ_{𝑥∈𝐷} (pers(𝑥)/ℒ(𝐷)) log₂(pers(𝑥)/ℒ(𝐷)), (3.7)

where ℒ(𝐷) = Σ_{𝑥∈𝐷} pers(𝑥) is the sum of the lifetimes of the points in the diagram. I cannot easily compare this value across different diagrams with different numbers of points. To deal with this issue, I provide the following normalization heuristic. Specifically, I normalize 𝐸 as

𝐸′(𝐷) = 𝐸(𝐷)/log₂(ℒ(𝐷)). (3.8)

This normalization allows for an accurate measurement of the entropy even when there are few significant lifetimes. A sketch collecting these point summaries in code is given below, just before the first example.

3.3 Examples

This section overviews several examples applying transitional networks to time series data. Namely, I provide applications of both ordinal partition and coarse grained state space networks that highlight the limitations and benefits of each. Further, I show the benefits of incorporating weight information. Lastly, I show how these networks can capture the topology of the underlying state space of the time series. This is done for both synthetic and experimental data.

Figure 3.7: Example formation of the ordinal partition (top) and coarse grained state space (bottom) networks for 𝑥(𝑡) = sin(𝑡) embedded into ℝ³.

In this work we choose 𝜏 using the method of multi-scale permutation entropy as suggested in [160], since we are forming permutations to construct the OPN. While an appropriate embedding dimension 𝑛 for the state space reconstruction may be sufficient, it may not be high enough to capture the complexity of the time series. To alleviate this issue, Bandt and Pompe [14] suggested using higher dimensions (e.g., 𝑛 ∈ [4, 10]) to allow for 𝑛! different states to better capture the complexity of the time series. In this work we will use a dimension 𝑛 = 6 unless otherwise stated.

3.3.1 First Example: Ordinal Partition and Coarse Grained State Space Network Comparison

This first example compares the ordinal partition and coarse grained state space networks in terms of noise robustness.
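Before the first example, here is the promised minimal sketch collecting the point summaries of Section 3.2.3, assuming a persistence diagram is stored as a numpy array with one (birth, death) pair per row:

```python
import numpy as np

def pers(D):
    """Lifetimes |b - d| of the points of a diagram D (rows of (birth, death))."""
    return np.abs(D[:, 1] - D[:, 0])

def periodicity_score(D, n):
    """Network periodicity score P(D), Eq. (3.5), for a graph on n vertices."""
    L_n = np.ceil(n / 3) - 1              # maximum persistence of an n-cycle
    return 1.0 - pers(D).max() / L_n

def class_ratio(D, n):
    """M(D), Eq. (3.6): number of classes over the graph order."""
    return len(D) / n

def normalized_persistent_entropy(D):
    """E'(D), Eqs. (3.7)-(3.8): Shannon-style entropy of normalized lifetimes."""
    life = pers(D)
    p = life / life.sum()
    E = -np.sum(p * np.log2(p))           # Eq. (3.7)
    return E / np.log2(life.sum())        # Eq. (3.8)
```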
Let us first start with a simple demonstrative example, shown in Fig. 3.7, of how the ordinal partition and coarse grained state space networks are related. The example is from embedding a simple sinusoidal function into dimension 𝑛 = 3, creating a circle structure in the state space reconstruction. Both networks are created by covering the space occupied by the state space reconstruction. For ordinal partition networks, the set of all permutations of dimension 𝑛 gives a cover of ℝ^𝑛, with permutation 𝜋𝑖 representing a subspace of ℝ^𝑛 given by the intersection of the 𝑛(𝑛−1)/2 pairwise inequalities. An example of these inequality planes and their intersections for a three-dimensional embedding is shown on the top OPN route of Fig. 3.7. Coarse grained state space networks create a cover using a set of 𝑛-dimensional hypercubes. These eight cubes are equal-sized for the example in the bottom of Fig. 3.7. Both network formation techniques capture the periodic structure of the state space reconstruction with resulting cycle graphs.

Figure 3.8: Example illustrating the issue of erroneous permutation transitions when there is additive noise and a trajectory close to the hyperplane intersection 𝐻. The three-dimensional state space reconstruction (D) from the signal 𝑥(𝑡) with and without additive noise (A) demonstrates that as the distance to the hyperdiagonal 𝑑𝐻 (C) becomes small, undesired permutation transitions (B), with a zoomed-in section shown in (E), occur, as shown in the orange highlighted regions.

Robustness to Noise During my work with ordinal partition networks I discovered that they are not particularly resilient to noise. Indeed, one can think of the ordinal partition network as being the 1-skeleton of the nerve of a particular closed cover of the state space, delineated by the hyperplanes 𝑥𝑖 ≤ 𝑥𝑗. Consequently, when noise is injected into the system, there are superfluous transitions when nearing one of these boundaries. This effect becomes even more prominent near an intersection of multiple hyperplanes. For example, consider the signal and its embedding into ℝ³ in Fig. 3.8. As the distance to the hyperdiagonal 𝑑𝐻 becomes small, there is a significant increase in seemingly superfluous transitions between permutations 𝜋 (highlighted in orange in Fig. 3.8). This issue is even more exaggerated when the embedded signal is consistently close to the hyperdiagonal, which produces networks from which no useful network topology can be extracted (e.g., see the signal and far right OPN in Fig. 3.9). This issue can be partly alleviated by including the weight information, as the most probable transitions between permutations should still have the highest weights. However, these superfluous transitions can become too severe when the state space reconstruction passes near the hyperdiagonal. For example, Fig. 3.9 shows the OPN and CGSSN for the signal with and without noise. This example clearly demonstrates that the CGSSN is the best choice for this signal, with only very minor changes in its shape, while the OPN loses all resemblance of the noise-free network. This loss in structure is due to the signal's reconstruction passing along the hyperdiagonal. While the OPN loses its structure, the CGSSN does not. This stability is due to there being no hyperdiagonal between states: at most, only 8 possible states intersect at a single point, and none along an edge.
This helps preserve the structure of the network when there is additive noise, as it is not possible for the state to superfluously transition more than eight states away if the amplitude of the noise is smaller than the edge length of the hypercube states.

Figure 3.9: Example demonstrating the importance of choosing an appropriate network formation method when there is additive noise in the signal. The CGSSN retains the graph structure under additive noise, but the OPN quickly loses all resemblance to the noise-free topological structure even with a small amount of additive noise. 𝑥(𝑡) is the signal, N is the additive noise, and 𝐺(𝑥) is the graph formation function of the signal 𝑥.

While this example highlighted a limitation of the OPN, the OPN does have benefits over the coarse grained state space network. Specifically, the ordinal partition network does not need to adapt to the amplitude of the data as the CGSSN does. Additionally, it has fewer parameters, with only 𝑛 and 𝜏 to be selected, while the CGSSN requires an additional number-of-bins parameter 𝑏.

Figure 3.10: Two example weighted cycle graphs of weight 10, with the bottom row having an additional edge of weight one connecting nodes 0 and 8. The persistence diagrams associated to each of the four distance methods are shown by column for both graphs.

3.3.2 Second Example: Distance Method Comparison

To compare my original work [162] using the naive shortest unweighted path distance to the weight-incorporating shortest path and diffusion distances, let us look at a simple example that highlights the previously mentioned issue of the unweighted shortest path not accounting for weight information. In Fig. 3.10 there are two graphs: on the top is a cycle graph with edge weights of 10, and on the bottom is the same cycle graph but with an additional single perturbation edge of weight 1 added between nodes 0 and 8. This edge could be caused by additive noise, a perturbation to the underlying dynamical system, or simply a falsely added state transition in the OPN formation procedure.

If we implement the shortest unweighted path distance for calculating the persistent homology of the cycle graph, we get a single significant point in the resulting persistence diagram, as shown in the top left persistence diagram of Fig. 3.10. However, adding the single, low-weighted edge splits the cycle, and the persistence diagram using the shortest unweighted path distance has two significant points (see the bottom left diagram of Fig. 3.10). This is due to the edge weight information being discarded when using the unweighted shortest path distance.

In comparison to the shortest unweighted path distance, the second, third, and fourth columns of Fig. 3.10 show the persistence diagrams for both graphs using the shortest weighted path, weighted shortest path, and diffusion distances, respectively. For all three of these distance methods there is only a single one-dimensional point in the persistence diagrams for both graphs. Additionally, both the shortest weighted path and weighted shortest path yield identical persistence diagrams for both graphs. This is because the shortest weighted path between any two vertices never uses the edge between vertices 0 and 8. For the diffusion distance we also have only a single point in the persistence diagram for one-dimensional features.
This is because the diffusion distance uses the weight information: the distance between nodes 0 and 8 is not significantly changed by the addition of the perturbation edge connecting them, since the edge has a low weight relative to the cycle and the transition probability distributions between vertices 0 and 8 remain dissimilar. For calculating the diffusion distance in this example we used 𝑡 = 2𝑑 walk steps, with 𝑑 the shortest path diameter of the graph.

This example demonstrates the importance of incorporating weight information when calculating the persistent homology of a complex network. The possibility of such low-weight edges is evident in Fig. 3.8, where noise-associated state transitions occur near state intersections.

3.3.3 Third Example: Periodic and Chaotic Dynamics

The third example qualitatively demonstrates that persistence of OPNs (similar results can be shown for CGSSNs) can detect the dynamic state of a signal as either periodic or chaotic. The example signal used here is from the Lorenz system defined as

$$\frac{dx}{dt} = \sigma(y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = xy - \beta z. \tag{3.9}$$

The system was simulated with a sampling rate of 100 Hz and system parameters 𝜎 = 10.0, 𝛽 = 8.0/3.0, and 𝜌 = 180.1 for a periodic response or 𝜌 = 181.0 for a chaotic response. The system was solved for 100 seconds, with only the last 20 seconds used to avoid transients.

Figure 3.11: A comparison of the resulting persistence diagrams for an OPN formed from a periodic and a chaotic signal from the Lorenz system.

Figure 3.11 shows the resulting Lorenz simulation signals 𝑥(𝑡) for periodic (top row of figure) and chaotic (bottom row of figure) dynamics with the corresponding ordinal partition state sequence 𝑆 using dimension 𝑛 = 6 and 𝜏 = 17 selected using multi-scale permutation entropy [160], the OPN, and the persistence diagram. For this example I used the diffusion distance with 𝑡 = 2𝑑 walk steps. The results demonstrate that the persistence diagram of one-dimensional features 𝐷1 for a periodic signal tends to have one or a few significant points, representing the cyclic nature of the signal. On the other hand, 𝐷1 for the chaotic signal has many significant points, representing the entanglement of the OPN. The other distance methods demonstrate similar behavior when comparing the resulting persistence diagrams from periodic and chaotic dynamics.

3.3.4 Fourth Example: The Magnetic Pendulum

To demonstrate the method applied to experimental data, I use a time series obtained from the angular position 𝜃(𝑡) of the magnetic pendulum experiment shown in Fig. 5.1 and described in Section 5.1, with base excitation amplitude 𝐴 = 0.08 m and frequency 𝜔 = 1.5 Hz. This forcing amplitude results in the periodic time series shown in Fig. 3.12-(a). The resulting permutation sequence as well as the unweighted, undirected network are shown in Figs. 3.12-(b) and (c), respectively.

Figure 3.12: Example of the method applied to experimental data with a periodic response (a). In (b) the sequence of permutations is shown for 𝑛 = 6, with the associated ordinal partition network in (c). In (d) the distance matrix (using an unweighted network and shortest path distance) is shown, which was used to compute the persistence diagram and its lifetime multiplicity shown in (e) and (f), respectively.

The network exhibits a rather simple structure with one large loop, two smaller loops, and two insignificantly small loops.
The distance between nodes is shown through a shortest-path distance matrix (see Fig. 3.12-(d)). With the distance matrix known, the persistence diagram is obtained, which summarizes the loops as 1-D features with lifetimes of [12, 8, 8, 1, 1]. Additionally, a histogram is used to show the lifetime multiplicity, i.e., how many points are overlaid at each location of the persistence diagram. The periodicity score was calculated as 𝑃(𝐷) ≈ 0.61 and the persistent entropy as 𝐸′(𝐷) ≈ 0.45 using the lifetimes in Fig. 3.12-(f).

To make a fair comparison, the same process shown in Fig. 3.12 is applied to a time series generated from a base excitation with 𝐴 = 0.085 m and frequency 𝜔 = 1.5 Hz, which results in a chaotic response. The resulting network from the permutation sequence is shown in Fig. 3.13-(a). It is clear that the network from the chaotic time series shows significantly more loops with, in general, smaller loop sizes. The size and quantity of these loops are shown in the persistence diagram of the network and the lifetimes (with multiplicity) in Figs. 3.13-(b) and (c), respectively. The periodicity score was calculated as 𝑃(𝐷) ≈ 0.95 and the persistent entropy as 𝐸′(𝐷) ≈ 0.90. This example shows how persistent homology of complex networks can be used to detect a change in complexity of a time series from experimental data.

Figure 3.13: Example of the method applied to experimental data with a chaotic response: (a) the ordinal partition network, (b) the persistence diagram, and (c) the lifetime multiplicity.

3.4 Results

This section compares the persistence-based point summaries and the standard network scores, and illustrates the ability of these scores to detect dynamic state changes. Specifically, I compare the point summaries 𝑀(𝐷1), 𝑃(𝐷1), and 𝐸′(𝐷1) to some commonly used network quantitative characteristics such as the mean out degree ⟨𝑘⟩, the out degree variance 𝜎², and the number of vertices 𝑁. These comparisons are shown in Section 3.4.1 for a family of trajectories from the Rössler system, and the end of Section 3.4.1 tabulates the different scores for a variety of dynamical systems. In Section 3.4.2 I contrast the noise robustness of our approach to the standard network scores for ordinal partition networks.

3.4.1 Dynamic State Change Detection on the Rössler System

Letting the parameter 𝑎 in the Rössler system vary over the range 0.37 < 𝑎 < 0.43 in evenly spaced steps and setting 𝛽 = 2 and 𝛾 = 4, I obtain 1201 time series of length 1000 seconds for the state 𝑥. I only retain the last 400 seconds of the simulation to allow the trajectory to settle on an attractor. For the construction of the corresponding 𝑘-NN networks, I sample the time series at 2 Hz in order to capture a sufficient number of oscillations while avoiding overly large point clouds for computing persistence. For the Takens' embedding I use the mutual information function approach and the nearest neighbor method, respectively, to choose the parameters 𝜏 = 4 and 𝑑 = 7. For constructing the ordinal partition networks I use the higher sampling frequency of 20 Hz, and I use MPE to select 𝜏 = 40 and 𝑑 = 6.
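Since ordinal partition networks are used throughout the remaining results, a minimal sketch of their construction (Bandt-Pompe permutation assignment followed by transition edges) is given below. It assumes only numpy and networkx and is illustrative rather than the exact implementation used for the reported results.

```python
import numpy as np
import networkx as nx

def ordinal_partition_network(x, n=6, tau=40):
    """Unweighted, undirected ordinal partition network of a time series x."""
    # ordinal pattern (permutation) of each delay-embedded vector
    m = len(x) - (n - 1) * tau                   # number of embedding vectors
    perms = [tuple(np.argsort(x[i:i + n * tau:tau])) for i in range(m)]
    # nodes are the observed permutations; edges are transitions pi_i -> pi_{i+1}
    G = nx.Graph()
    G.add_nodes_from(set(perms))
    G.add_edges_from((perms[i], perms[i + 1])
                     for i in range(len(perms) - 1)
                     if perms[i] != perms[i + 1])  # drop self-transitions
    return G

# usage: a periodic signal should produce a graph close to a cycle graph
t = np.linspace(0, 20 * np.pi, 4000)
G = ordinal_partition_network(np.sin(t), n=6, tau=40)
print(G.number_of_nodes(), G.number_of_edges())
```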
I found that the higher sampling rate for ordinal partition networks, and the resulting longer time series, is not an issue because the maximum number of vertices does not depend on the length of the time series, but rather on the motif dimension 𝑑 and the time series complexity. Furthermore, a higher sampling rate tends to improve the detection of periodic and chaotic time series for ordinal partition networks.

The resulting point summaries were found for both ordinal partition networks (left column plots of Fig. 3.14) and 𝑘-NN networks of the Takens' embedding (right column plots of Fig. 3.14). The top two graphs in Fig. 3.14 show the bifurcation diagram depicting the local extrema of 𝑥 and the Lyapunov exponent [19], respectively. The periodic regions (shown as the regions between vertical, dashed, green lines with a solid green line below) were identified by investigating the bifurcation diagram and the Lyapunov exponent plots.

For the ordinal partition networks, the left column plots of Figure 3.14 show a significant drop in all six scores for the large periodic window corresponding to approximately 0.409 ≤ 𝑎 ≤ 0.412. There are also less pronounced drops in these scores for the other, shorter periodic windows. These drops are especially evident for ⟨𝑘⟩, 𝐸′(𝐷1), and 𝑃(𝐷1), where the scores significantly decrease in comparison to their surrounding values. However, some scores such as ⟨𝑘⟩ are not normalized, e.g., so that 0 ≤ ⟨𝑘⟩ ≤ 1. Given one time series, and not a parameterized set of series, this makes it difficult or even impossible to distinguish between chaotic and periodic regions. On the other hand, the normalized scores that I introduce in this work, 𝐸′(𝐷1) and 𝑃(𝐷1), suggest periodic regions when 𝐸′(𝐷1) < 0.5 and 𝑃(𝐷1) < 0.75. It should be noted that the difference between chaotic and periodic regions, as shown in Section 3.4.2, starts degrading as noise levels increase.

Figure 3.14: Rössler system bifurcation for 0.37 < 𝑎 < 0.43 using 1201 evenly spaced values of 𝑎. Left column plots include point summaries calculated from ordinal partition networks with parameters 𝜏 = 40 and 𝑑 = 6; right column plots show the same results for the 𝑘-NN networks generated from the Takens' embedding with parameters 𝜏 = 4 and 𝑑 = 7. The figure compares the point summaries 𝑃(𝐷1), 𝑀(𝐷1), and 𝐸′(𝐷1) with the Lyapunov exponent 𝜆 [19] and some common network parameters, including the number of vertices 𝑁, mean out degree ⟨𝑘⟩, and out degree variance 𝜎².

Table 3.1: A comparison between the persistence diagram point summaries 𝑀(𝐷1), 𝑃(𝐷1), and 𝐸′(𝐷1) for detecting differences in the networks generated from periodic (Per.) and chaotic (Ch.) time series, using both 𝑘-NN graphs from the Takens' embedding and ordinal partition graphs (OPN). Each cell lists the periodic / chaotic value (Per./Ch.).

| System/Data | Ref. | 𝑘-NN: 𝐸′(𝐷1) | 𝑘-NN: 𝑀(𝐷1) | 𝑘-NN: 𝑃(𝐷1) | OPN: 𝐸′(𝐷1) | OPN: 𝑀(𝐷1) | OPN: 𝑃(𝐷1) |
|---|---|---|---|---|---|---|---|
| Chua Circuit | C.1 | 0.00 / 0.80 | 0.001 / 0.19 | 0.54 / 0.89 | 0.21 / 0.72 | 0.051 / 0.19 | 0.42 / 0.88 |
| Lorenz | C.1 | 0.04 / 0.84 | 0.005 / 0.16 | 0.64 / 0.93 | 0.18 / 0.95 | 0.026 / 0.36 | 0.28 / 0.96 |
| Rössler | C.1 | 0.00 / 0.85 | 0.001 / 0.18 | 0.50 / 0.94 | 0.00 / 0.89 | 0.036 / 0.28 | 0.33 / 0.85 |
| Coupled Lorenz-Rössler | C.1 | 0.00 / 0.82 | 0.003 / 0.16 | 0.46 / 0.94 | 0.00 / 0.87 | 0.033 / 0.35 | 0.56 / 0.92 |
| Bi-directional Rössler | C.1 | 0.00 / 0.76 | 0.004 / 0.13 | 0.55 / 0.87 | 0.25 / 0.91 | 0.064 / 0.29 | 0.40 / 0.92 |
| Mackey-Glass | C.1 | 0.00 / 0.67 | 0.001 / 0.07 | 0.56 / 0.93 | 0.30 / 0.96 | 0.077 / 0.37 | 0.25 / 0.93 |
| Logistic Map | C.1 | NA | NA | NA | 0.00 / 0.93 | 0.125 / 0.70 | 0.00 / 0.91 |
| Hénon Map | C.1 | NA | NA | NA | 0.00 / 0.88 | 0.111 / 0.48 | 0.00 / 0.96 |
| ECG | C.1 | 0.95 / 0.86 | 0.282 / 0.14 | 0.97 / 0.97 | 0.82 / 0.89 | 0.268 / 0.45 | 0.92 / 0.97 |
| EEG | C.1 | 0.96 / 0.94 | 0.627 / 0.33 | 0.99 / 0.98 | 0.89 / 0.84 | 0.513 / 0.31 | 0.97 / 0.93 |

For the 𝑘-NN Takens' embedding networks, the right column plots of Figure 3.14 show a significant drop in 𝑃(𝐷1), 𝑀(𝐷1), and 𝐸′(𝐷1) during periodic windows. However, for the traditional graph scores ⟨𝑘⟩ and 𝜎² this drop does not clearly correspond to the beginning and end of the periodic window. Further, for the smaller periodic windows interspersed with the chaotic regions, I found that ⟨𝑘⟩, 𝜎², and 𝑀(𝐷1) are too noisy to identify the dynamic state changes in these areas. In contrast, our scores 𝑃(𝐷1) and 𝐸′(𝐷1) retain the ability to distinguish between dynamic regimes, and for 𝑘-NN networks of the Takens' embedding I suggest tagging the time series as periodic when 𝐸′(𝐷1) < 0.5 and 𝑃(𝐷1) < 0.7.

Tabulated Results This section uses a variety of dynamical systems to validate the observations made for the Rössler system in Section 3.4.1 related to the point summaries 𝐸′(𝐷1), 𝑀(𝐷1), and 𝑃(𝐷1) introduced in Section 3.2.3. The results for each system when using ordinal partition networks and the 𝑘-NN network from the Takens' embedding are provided side by side in Table 3.1. The model and time series information for all of these systems are provided in Appendix C.1. The table can be categorized into three types of dynamical systems: (1) systems of differential equations (Chua circuit, Lorenz, Rössler, coupled Lorenz-Rössler, bi-directional Rössler, and Mackey-Glass equations), (2) discrete-time dynamical systems (logistic map and Hénon map), and (3) ECG and EEG signals. The paragraphs below discuss the results for each of these systems.

Systems of Differential Equations: As shown in Table 3.1, our point summaries from both networks yield distinguishable differences between periodic and chaotic time series. The 𝑘-NN graph results in Table 3.1 show that periodic time series have 𝐸′(𝐷1) < 0.5, 𝑀(𝐷1) < 0.15, and 𝑃(𝐷1) < 0.7. Similarly, the ordinal partition graph scores in Table 3.1 show that periodic time series have 𝐸′(𝐷1) < 0.5, 𝑀(𝐷1) < 0.07, and 𝑃(𝐷1) < 0.75.

Discrete Dynamical Systems: The results for the discrete dynamical systems in Table 3.1 show distinguishable differences between periodic and chaotic maps when using ordinal partition networks. Takens' embedding was not applied to the discrete dynamical systems, and only the ordinal partition network results are reported here because working with these networks is more natural for maps.

EEG and ECG Results: The point summary results from the real-world data sets (ECG and EEG) in Table 3.1 have inherent noise, which causes the differences between the compared states to be less significant, as shown in Fig. 3.18. The 𝑘-NN graph results in Table 3.1 do not show a significant difference between the two groups for either the ECG or the EEG data.
This is most likely due to the sensitivity of the Takens' embedding to noise and perturbations. However, I did find a difference between epileptic and healthy patients through the networks formed by ordinal partitions for ECG [153] and EEG [7] data. Section 3.4.2 discusses the effect of additive noise on the point summaries in more detail. As a note, there have been other methods for characterizing EEG data using TDA and persistent entropy [184], but our method differs from prior works because I apply persistent homology to the generated networks.

In this section we discuss the empirical results on the dynamic state detection capabilities and stability of the persistent homology of ordinal partition networks using the distance methods for incorporating weight information.

3.4.2 Dynamic State Detection Using Machine Learning on Persistence Diagrams

To determine the viability of the persistence diagram for categorizing the dynamic state of a signal using the persistent homology of the shortest weighted path, weighted shortest path, and diffusion distances compared to the shortest unweighted path distance, we use a lower-dimensional projection of the persistence diagrams. Specifically, we implemented the Multi-Dimensional Scaling (MDS) projection to two dimensions using the bottleneck distance matrix for our 23 systems (see Table C.1 for a list). These systems were simulated using the dynamical systems module of the Python package Teaspoon, with details on the simulations provided in Appendix C. We then use a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel to delineate periodic and chaotic dynamics based on the two-dimensional MDS projection. The SVM fit was done using the default parameters of the SKLearn SVM package in Python.

Figure 3.15: Comparison between the (a) shortest unweighted path, (b) shortest weighted path, (c) weighted shortest path, and (d) lazy diffusion distances using a two-dimensional MDS projection (random seed 42) of the bottleneck distances between persistence diagrams of the OPN for chaotic and periodic dynamics with an SVM radial basis function kernel separation.

I generate results separating persistence diagrams from periodic and chaotic dynamics using the following graph distances: shortest unweighted path, shortest weighted path, weighted shortest path, and diffusion distance. These distances are used when defining the distance matrix from which the persistent homology of the complex network is calculated. In the following paragraphs I apply this machine learning analysis to both the OPNs and the CGSSNs.

Machine Learning on the Ordinal Partition Network's Persistent Homology The results for the OPN using the shortest unweighted path (Fig. 3.15 a), shortest weighted path (Fig. 3.15 b), weighted shortest path (Fig. 3.15 c), and diffusion distance (Fig. 3.15 d) are shown in Fig. 3.15. The average and standard deviation of the accuracy for each distance method are provided as the percent accuracy in Table 3.2. These accuracy statistics were generated using random seeds 1 to 100. Based on this initial analysis it is clear that the diffusion distance significantly outperforms the other distance methods, with an accuracy of 95.0% ± 0.9% in comparison to the second best accuracy of 89.5% using the weighted shortest path.
The worst performance was from the shortest unweighted path distance, which has an accuracy of 80.3% for this random seed (42). We theorize that one reason for the increased performance of the diffusion distance is how it tends to normalize the scale of the persistence diagram. Specifically, when comparing the 23 dynamical systems, the maximum lifetimes for 𝑡 = 2𝑑 walk steps range from 0.08 to 0.21 with a mean of 0.147 and a standard deviation of 0.042, or 28.6% of the average. In comparison, the maximum lifetimes for the shortest unweighted path distance range from 2 to 24 with an average of 9.38 and a standard deviation of 6.36, or 67.8% of the average. This demonstrates that the persistence diagrams from the diffusion distance calculation tend to be more consistent in magnitude. We can further show this relationship using the cycle graph 𝐺cycle(𝑛), where the number of nodes 𝑛 is increased from 2 to 500 and the maximum persistence is calculated for each graph (see Appendix Section D.1). In comparison to the shortest path distances, this result shows that the persistence of the cycle graph does not continue to grow with a larger cycle graph when using the diffusion distance and instead trends toward a plateau. Overall, none of the distances in combination with the ordinal partition networks were able to accurately separate 100% of the periodic from the chaotic persistence diagrams.

Table 3.2: Accuracies of the distance methods for both ordinal partition and coarse grained state space networks.

| Network | Distance Method | Percent Accuracy (%) |
|---|---|---|
| OPN | Shortest unweighted path | 80.7 ± 1.5 |
| OPN | Shortest weighted path | 88.9 ± 0.0 |
| OPN | Weighted shortest path | 88.9 ± 0.0 |
| OPN | Lazy diffusion distance | 95.0 ± 0.9 |
| CGSSN | Shortest unweighted path | 98.1 ± 0.0 |
| CGSSN | Shortest weighted path | 100.0 ± 0.0 |
| CGSSN | Weighted shortest path | 98.1 ± 0.0 |
| CGSSN | Lazy diffusion distance | 100.0 ± 0.0 |

Machine Learning on the Coarse Grained State Space Network's Persistent Homology I next repeat the previous SVM analysis on the coarse grained state space network. As mentioned previously, the CGSSN has better stability qualities than the OPN and thus may be better able to distinguish between dynamic states. Further, the CGSSN takes into account the amplitude of the state space vectors, which is information discarded when creating OPNs. For this analysis we used 𝑏 = 12 bins and 𝑛 = 4 for generating the CGSSNs for all of the systems. An appropriate delay was selected using the multi-scale permutation entropy method.

The resulting SVM separations are shown in Fig. 3.16 for random seed 42. The average and standard deviation of the accuracy for each distance method are provided as the percent accuracy in Table 3.2. These accuracy statistics were generated using random seeds 1 to 100. These results show that all of the distances applied to the CGSSN outperformed the OPN alternative. Specifically, both the shortest weighted path and diffusion distances achieved 100% accuracy for separating dynamics based on the persistence diagrams, while the shortest unweighted path and weighted shortest path both had 98.1% accuracy. I again theorize this is due to the CGSSN taking into account the state space vector amplitude information that is discarded by the OPN.
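The classification pipeline used for both network types above can be sketched as follows. It assumes the persim and scikit-learn packages, takes precomputed one-dimensional persistence diagrams with dynamic state labels as input, and the tiny diagrams at the end are hypothetical placeholders rather than the 23 systems studied here.

```python
import numpy as np
import persim
from sklearn.manifold import MDS
from sklearn.svm import SVC

def classify_dynamics(diagrams, labels, seed=42):
    """Bottleneck distances -> 2-D MDS projection -> RBF-kernel SVM."""
    n = len(diagrams)
    D = np.zeros((n, n))                        # pairwise bottleneck distances
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = persim.bottleneck(diagrams[i], diagrams[j])
    # metric MDS on the precomputed distance matrix
    xy = MDS(n_components=2, dissimilarity="precomputed",
             random_state=seed).fit_transform(D)
    # separate periodic (0) from chaotic (1) points in the projection
    clf = SVC(kernel="rbf").fit(xy, labels)
    return clf.score(xy, labels)                # accuracy on the projection

# toy usage: two "periodic-like" and two "chaotic-like" diagrams
dgms = [np.array([[0.0, 1.0]]), np.array([[0.0, 0.9]]),
        np.array([[0.0, 0.2], [0.1, 0.5], [0.2, 0.4]]),
        np.array([[0.0, 0.3], [0.1, 0.6]])]
print(classify_dynamics(dgms, labels=[0, 0, 1, 1]))
```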
Figure 3.16: Comparison between the (a) shortest unweighted path, (b) shortest weighted path, (c) weighted shortest path, and (d) lazy diffusion distances using a two-dimensional MDS projection (random seed 42) of the bottleneck distances between persistence diagrams of the CGSSN for chaotic and periodic dynamics with an SVM radial basis function kernel separation.

Stability Analysis One drawback to using MDS in our setting is that it cannot be used for true supervised learning, as data points not in the original training set cannot be assigned a projection after the fact. We can at least analyze how sensitive the bottleneck distance between persistence diagrams is to differences in the input time series, showing that the results are resilient to noise. While we would like to provide a stability proof in the spirit of [49], such an investigation is outside the scope of this work.

Figure 3.17: Bottleneck distance stability analysis of the periodic Lorenz system (see Eq. (4.3)) with a standard-deviation-normalized signal and bounded (𝜀 = 6𝜎) Gaussian additive noise. The analysis shows stability results using the Shortest Unweighted Path Distance (SUPD), Shortest Weighted Path Distance (SWPD), Weighted Shortest Path Distance (WSPD), and Diffusion Distance (DD).

Instead, we use an empirical study of the stability of the bottleneck distance using the same systems with periodic signals (both dissipative autonomous and driven). Specifically, we tested the stability by adding bounded Gaussian noise to the signal. The noise had Signal-to-Noise Ratios (SNR) from ∞ (no noise) to 15 dB (extremely noisy). The additive noise followed a zero-mean Gaussian distribution that was truncated at three standard deviations from the mean, setting 𝜖 = 6𝜎. To make a fair comparison between the distance methods in terms of stability and sensitivity to noise, we normalize the bottleneck distance as

$$d_B^*(D_1, D_1^{\epsilon}) = \frac{d_B(D_1, D_1^{\epsilon})}{\frac{1}{2} \sum_{x \in D_1} \mathrm{pers}(x)}, \tag{3.10}$$

where 𝑑𝐵 is the bottleneck distance function and 𝐷1 and 𝐷1𝜖 are the noise-free and noise-contaminated one-dimensional persistence diagrams, respectively.

Figure 3.17 provides a demonstrative example of the effects of noise and the stability of the persistence diagram for the Lorenz system. The persistence diagrams as 𝜖 is increased are drawn overlaid in Fig. 3.17 (b). In Fig. 3.17, we see the bottleneck distance from the noise-free diagram to the noise-contaminated diagram as the noise amplitude 𝜖 is increased. In the case of the Lorenz system, all four distance methods are stable, with an approximately linear change in the bottleneck distance with respect to the noise level 𝜖 for small levels of noise (SNR greater than approximately 25 dB). Additionally, 𝑑∗𝐵 tends to plateau at SNRs below approximately 18 dB. This is due to the minimum pairing between diagrams matching to the diagonal. It is also clear that the shortest weighted path distance is significantly less sensitive to additive noise, with only slight changes in its normalized bottleneck distance as 𝜖 is increased.

Figure 3.18: Average point summaries and network parameters for varying SNRs from Gaussian noise added to time series generated from periodic and chaotic Rössler systems. For each SNR, 25 separate samples are taken to provide mean values and standard deviations, which are shown as the error bars.
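As a companion to this stability study, the normalization of Eq. (3.10) can be sketched in a few lines; persim supplies the bottleneck distance, and the two diagrams below are hypothetical stand-ins for the noise-free and noise-contaminated diagrams.

```python
import numpy as np
import persim

def normalized_bottleneck(D1, D1_eps):
    """Normalized bottleneck distance d*_B of Eq. (3.10)."""
    d_B = persim.bottleneck(D1, D1_eps)
    half_total_pers = 0.5 * (D1[:, 1] - D1[:, 0]).sum()  # (1/2) sum of pers(x)
    return d_B / half_total_pers

# usage: noise shortens the dominant loop and adds a short-lived loop
D1 = np.array([[1.0, 13.0]])                   # noise-free diagram
D1_eps = np.array([[1.0, 11.5], [2.0, 2.5]])   # noise-contaminated diagram
print(normalized_bottleneck(D1, D1_eps))
```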
Some of the characteristics seen in the Lorenz system appear to be consistent across the other 22 systems; see the Appendix for similar figures for the remaining systems. The shortest weighted path distance tends to be the least sensitive to additive noise. Additionally, the bottleneck distance tends to plateau at approximately 20 dB for most systems. Most importantly, all of the distance methods tend to have an approximately linear relationship between 𝑑∗𝐵 and 𝜖 for low levels of noise (SNR ≥ 25 dB). These results empirically demonstrate that the persistence diagram is stable in this setting for limited levels of additive noise.

One characteristic that tends to be highly dependent on the system is the sensitivity of the shortest unweighted path, weighted shortest path, and diffusion distances to additive noise. For some systems (e.g., the Rabinovich-Fabrikant attractor), the weighted shortest path distance is the least sensitive to high levels of additive noise, while in other systems (e.g., the Thomas cyclically symmetric attractor) the weighted shortest path distance is the most sensitive to additive noise. In most systems the diffusion distance and the shortest unweighted path are comparably sensitive to additive noise.

Effects of Additive Noise I investigate the noise robustness of the point summaries in comparison to some common network parameters: the mean out degree ⟨𝑘⟩, the out degree variance 𝜎², and the number of vertices 𝑁. The ordinal partition networks are based on time series from the Rössler system with parameters 𝑏 = 2.0, 𝑐 = 4.0, and either 𝑎 = 0.41 or 𝑎 = 0.43 for a periodic or chaotic response, respectively. To make comparisons of the noise robustness, I add Gaussian noise to the signal and calculate the point summaries and network parameters at various Signal-to-Noise Ratios (SNR) for both the periodic and chaotic Rössler systems. The chosen SNR values were all the integers from 1 to 50, and at each SNR value I obtain 25 realizations of noisy signals. To determine the 68% confidence interval at each SNR, I repeat the calculation of the point summaries and network parameters for all noise realizations at each SNR level, and I set the confidence interval to 𝑥̄(𝑆𝑁𝑅) ± 𝑠(𝑆𝑁𝑅), where 𝑥̄(𝑆𝑁𝑅) and 𝑠(𝑆𝑁𝑅) are the sample average and sample standard deviation, respectively, at a specific SNR value. Figure 3.18 shows the mean values and confidence intervals for each SNR.

To assess the ability of the point summaries to assign a distinguishing score to a periodic versus a chaotic system in the presence of noise, I check for an overlap in the confidence intervals for the periodic and chaotic results at each SNR. If for a particular point summary there is an overlap between the scores for the periodic and the chaotic time series, then that point summary is not effective in distinguishing the dynamics at that specific SNR. Table 3.3 summarizes the noise robustness by providing the lowest SNR at which each point summary and network parameter no longer has an overlap between the periodic and chaotic confidence intervals. This result shows a lower distinguishing SNR for the persistence-based point summaries than for the mean out degree ⟨𝑘⟩ and variance 𝜎². Another trend that should be noted is the reduction in the difference between periodic and chaotic time series for high levels of noise. This should be taken into account when applying the point summaries to real-world data with intrinsic noise.
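A minimal sketch of the confidence-interval overlap test behind Table 3.3 is given below; the synthetic scores are placeholders for the 25 noise realizations per SNR, and the helper name lowest_distinguishing_snr is mine rather than from any package.

```python
import numpy as np

def lowest_distinguishing_snr(snrs, scores_per, scores_ch):
    """Smallest SNR from which the periodic and chaotic 68% confidence
    intervals (mean +/- one standard deviation) never overlap again.

    scores_per, scores_ch : arrays of shape (len(snrs), n_realizations)
    """
    lo_p = scores_per.mean(1) - scores_per.std(1)
    hi_p = scores_per.mean(1) + scores_per.std(1)
    lo_c = scores_ch.mean(1) - scores_ch.std(1)
    hi_c = scores_ch.mean(1) + scores_ch.std(1)
    separated = (hi_p < lo_c) | (hi_c < lo_p)      # no interval overlap
    for k, snr in enumerate(snrs):
        if separated[k:].all():
            return snr
    return None

# toy usage: the intervals separate once the SNR is high enough
rng = np.random.default_rng(1)
snrs = np.arange(1, 51)
per = 0.3 + rng.normal(0.0, 10.0 / snrs[:, None], size=(50, 25))
ch = 0.9 + rng.normal(0.0, 10.0 / snrs[:, None], size=(50, 25))
print(lowest_distinguishing_snr(snrs, per, ch))
```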
Table 3.3: Noise robustness comparison for the persistence diagram point summaries and network parameters using ordinal partition networks.

| Point Summary / Network Parameter | Lowest Distinguishing SNR |
|---|---|
| 𝐸′(𝐷1) | 14 |
| 𝑀(𝐷1) | 19 |
| 𝑃(𝐷1) | 20 |
| ⟨𝑘⟩ | 29 |
| 𝜎² | 29 |
| 𝑁 | 8 |

CHAPTER 4

PERSISTENT HOMOLOGY OF DYNAMICAL NETWORKS

A dynamical system is any system whose future state is dependent on the current state. Many real-world dynamical systems are simulated using approximate models. Standard dynamical system models occupy a wide range of applications, from population models [144] to aeronautical dynamics [204]. A common characteristic of a dynamical system is that its behavior can change with a system parameter, a phenomenon known as a bifurcation. For example, the airflow over an aircraft's wing can change from laminar to turbulent with a change in the angle of attack, resulting in stall [208, 251]; the load on a power-grid system can push a line to fail, causing a cascading failure and blackout [206, 207]; or a change in atmospheric chemistry can cause severe weather [93]. Capturing the characteristic changes of a dynamical system through a measurement signal is critical in detecting, predicting, and possibly preventing some of these catastrophic failures. Outside of detecting imminent events, many other important characteristics of a system are studied through the lens of dynamics. These include population models transitioning from stable values to chaotic oscillations based on environmental factors [144], economic bubbles showing dynamics with bimodal distribution bifurcations [67], and chaotic fluctuations in power-grid dynamics through period-doubling [103].

A common avenue to study these systems is through time series or signals, which are widely utilized to analyze real-world dynamical system bifurcations. For example, a change in measured biophysical signals can indicate upcoming health problems [87, 166, 195], or a change in the vibratory signals of machines or structures can be the harbinger of imminent failure [12, 218]. Time series typically originate from measurements of real-life systems, and they provide only finitely sampled information from which the underlying dynamics must be gleaned. Time series analysis offers many useful foundational tools for bifurcation and dynamic state analysis, such as frequency spectrum analysis [24, 64] and autocorrelation [194].

While time series analysis tools can be leveraged for bifurcation detection and dynamic state analysis, many complex and high-dimensional dynamical systems and their corresponding measurements can more naturally be represented as complex networks. For example, there are dynamical systems models for social networks [215], disease spread dynamics [98], manufacturer-supplier networks [244], power grid networks [206], and transportation networks [56]. These dynamical system models demonstrate how dynamical networks can represent highly complex real-world systems. Many important characteristics of a dynamical network can be extracted from the data. These include the source and rate of disease spread as well as predictions of future infections [72], weak branches in supply chains and possible failures [169, 244], changes in infrastructure to avoid cascading failures in power grids [206, 219], optimal routing in transportation networks (finding a minimum-time route between locations) [17], fault analysis (detecting transportation disruptions) in transportation networks [224], and flow pattern analysis (visualization) [90].
If the studied system only has a single one-dimensional signal output, we can still represent the dynamical system as a temporal network. This is done using complex network representations of windowed sections of the time series data to visualize how the graph structure of the windowed time series changes. Examples of graph formation techniques for time series include k-nearest-neighbors networks [118], epsilon-recurrence networks [101], coarse-grained state-space networks [216, 217], and ordinal partition networks [146]. In this work we only use the data to construct the evolving networks; as such, we will refer to them as temporal graphs [94]. While a complex dynamical system typically drives the temporal graphs, the underlying equations of motion are unknown.

Temporal graph data is commonly represented using attributed information on the edges for the time intervals or instances in which the edges are active [41, 97]. Using this attributed information, we can represent the graph in several ways [238]. In this work we will first represent the data in the standard attributed temporal graph structure and then use the graph snapshots approach. The graph snapshots represent the temporal graph as a sequence of static graphs 𝐺0, 𝐺1, ..., 𝐺𝑛.

The standard network analysis tools for studying temporal networks often include measures such as centrality or flow measures [23], temporal clustering for event detection [53, 156, 247], and connectedness [108]. However, these tools do not account for higher-dimensional structures (e.g., loops as a one-dimensional structure). It may be important to account for evolving higher-dimensional structures in temporal networks to better understand the changing structure. For example, a highly connected network may have only one connected component with no clear clusters, but the number of loops within the network may still detect the change.

To study the evolving higher-dimensional structures within a temporal network, we leverage zigzag persistence [35] from the field of Topological Data Analysis (TDA) [34]. TDA is typically used to study point cloud data through its flagship tool, persistent homology. Persistent homology, colloquially referred to as persistence, encodes structure by analyzing the changing shape of a simplicial complex (a higher-dimensional generalization of a network) over a filtration (a nested sequence of subcomplexes). It should be noted that the majority of these applications utilize a relatively standard pipeline to construct this filtration. Namely, given point cloud data embedded in R𝑛 as input, the Vietoris-Rips (VR) complex is constructed at multiple distance filtration values. The VR complex is generated for incremented filtration values such that the result is a nested sequence of simplicial complexes. The homology of the point cloud data can then be measured for each simplicial complex, and the homology classes that persist over a broader range of filtration values are considered significant. We provide a more detailed introduction in Section 4.1. It is also possible to apply this framework to graph data using geodesic distance measures such as the shortest path, as done in [162].

Unfortunately, the standard persistent homology pipeline does not account for temporal information. To account for temporal changes, we use zigzag persistence. Instead of measuring the shape of static point cloud data through a distance filtration, zigzag persistence measures how long a structure persists through a sequence of changing simplicial complexes.
For example, in [233] the Hopf bifurcation is detected through zigzag persistence (i.e., a loop is detected through the one-dimensional zigzag persistence diagram). The zigzag persistence algorithm incorporates the two essential characteristics of temporal graphs we are looking to study: namely, the temporal and the structural information stored within a temporal network.

In this work, we will use zigzag persistence to visualize these changes. Zigzag persistence compactly represents both temporal and structural changes using a persistence diagram. The persistence diagram is a two-dimensional summary of persistent homology. The resulting persistence diagram is commonly analyzed through a qualitative analysis, standard one-dimensional statistical summaries, or machine learning via a vectorization of the persistence diagram.

Organization We start in Section 4.1 with an introductory background on persistent homology and zigzag persistence. Following this, we introduce the two systems we study. The first is a dataset collected over one week from the Great Britain transportation system. The second is an intermittent Lorenz system simulation, where we generate a temporal network through complex networks of sliding windows. Next, in Section 4.2, we overview the general pipeline for applying zigzag persistence to temporal graph data. We couple this explanation with a demonstrative toy example. In Section 4.3 we apply zigzag persistence to our two examples and show how the resulting persistence diagrams help visualize the underlying dynamics in comparison to standard temporal network analysis techniques.

4.1 Background

4.1.1 Zigzag Persistence

A limitation of the standard application of persistent homology is that it requires the simplicial complexes to be nested, with each complex contained in the next. This directionality restricts the applications, since simplices cannot be removed from the filtration once added, which is required by many real-world datasets. This issue is alleviated by zigzag persistence [35, 36], which allows the subset directions to zigzag as

$$K_0 \leftrightarrow K_1 \leftrightarrow K_2 \leftrightarrow \ldots \leftrightarrow K_n, \tag{4.1}$$

where there is not necessarily a filtration parameter for the ordered simplicial complexes; the direction of each arrow is determined by which complex is a subset of the other. However, it is possible to force the directions to zigzag by creating a simplicial complex 𝐾𝑖,𝑖+1 that contains both 𝐾𝑖 and 𝐾𝑖+1 as subsets, as shown in Eq. (4.2):

$$K_0 \hookrightarrow K_{0,1} \hookleftarrow K_1 \hookrightarrow K_{1,2} \hookleftarrow K_2 \hookrightarrow \ldots \hookleftarrow K_{n-1} \hookrightarrow K_{n-1,n} \hookleftarrow K_n. \tag{4.2}$$

We can now determine when homology features are born and die based on the zigzag persistence. We again track this with a persistence diagram consisting of persistence pairs (𝑏𝑖, 𝑑𝑖). However, 𝑏𝑖 and 𝑑𝑖 are now the times or indices at which the homology class was born and died, instead of filtration values. If there are times associated with the indices, then the time value can be used in place of the index. Additionally, the complexes 𝐾𝑖,𝑖+1 are given half-step indices (e.g., 𝑖 + 0.5), or the average time between the two adjacent complexes can be used. This work uses the times associated with the simplicial complexes instead of the indices. For more details, an example demonstrating zigzag persistence on a temporal graph is provided in Section 4.2.1.

4.1.2 Temporal Graphs

A temporal graph is a graph structure that incorporates information on when edges and/or nodes are present in the graph. In this work we only use the case of temporal information attributed to the edges.
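In practice, zigzag persistence computations of this kind can be carried out with the Dionysus 2 Python package, which accepts a list of simplices together with the times at which each simplex alternately enters and leaves the complex. The toy example below is only a sketch of that interface, not the exact computation used later in this chapter.

```python
import dionysus as d

# a triangle: the vertices persist, the edges and the 2-cell come and go
simplices = [[0], [1], [2], [0, 1], [1, 2], [0, 2], [0, 1, 2]]
# times[i] lists when simplex i alternately appears and disappears
times = [[0.0], [0.0], [0.0],        # vertices enter at t = 0 and stay
         [1.0], [1.0], [1.0, 4.0],   # edge (0, 2) is removed at t = 4
         [2.0, 3.0]]                 # the 2-cell exists only on [2, 3)

f = d.Filtration(simplices)
zz, dgms, cells = d.zigzag_homology_persistence(f, times)
for dim, dgm in enumerate(dgms):
    for p in dgm:
        print(dim, p)   # birth-death pairs per homology dimension
```

Here the loop of the triangle is born at 𝑡 = 1, is filled by the 2-cell on [2, 3), reappears at 𝑡 = 3, and dies when the edge (0, 2) is removed at 𝑡 = 4.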
We apply zigzag persistence to two main temporal networks, described in the subsequent subsections. The first is the Great Britain transportation network, and the second is the temporal ordinal partition network.

Great Britain Multi-layered Temporal Transportation Network We use temporal networks created from the Great Britain (GB) temporal transportation dataset [79] for the air, rail, and coach transportation methods. This data provides the destinations (nodes) and connections (edges) for public transportation in GB. Additionally, the departure and arrival times are provided to allow for a temporal analysis. This temporal data was collected for one week. The graphs constructed without the use of temporal information are shown in Fig. 4.1, where the destinations are overlaid on a map of GB. As shown, the network's destinations encompass both cities and remote towns, as well as the connections between them. As such, the network's structure encodes the transportation connectivity.

Figure 4.1: Transportation networks of Great Britain for (a) air, (b) coach, and (c) rail travel.

In Section 4.2 we introduce our method for generating snapshots for different time intervals over the entire week during which the transportation data was collected.

Temporal Ordinal Partition Network Ordinal partition networks [146] are a graph representation of time series data based on permutation transitions. As such, they encapsulate the state space structure of the underlying system. While we only use the ordinal partition network in this work, there are several other transitional complex networks from time series data for which a similar analysis could be done. These include 𝑘-nearest-neighbors [118], epsilon-recurrence [101], and coarse-grained state-space networks [216, 217].

The ordinal partition network is formed by first generating a sequence of permutations from the time series 𝑥 = [𝑥0, 𝑥1, 𝑥2, ..., 𝑥𝑛] using a permutation dimension 𝑚 and delay 𝜏. These are the same permutations used in the permutation entropy information statistic [14]. In this work we choose 𝑚 = 6 and select 𝜏 using the multi-scale permutation entropy method as suggested in [160]. We generate the sequence of permutations by assigning each vector embedding

$$v_i = [x_i, x_{i+\tau}, x_{i+2\tau}, \ldots, x_{i+(m-1)\tau}] = [v_i(0), v_i(1), \ldots, v_i(m-1)]$$

to one of the 𝑚! possible permutations. We assign the permutation $\pi_i = [\pi_i(0), \ldots, \pi_i(m-1)] \in \mathbb{Z}^m$ based on the ordinal pattern of 𝑣𝑖 such that $v_i(\pi(0)) \le v_i(\pi(1)) \le v_i(\pi(2)) \le \ldots \le v_i(\pi(m-1))$. Using the sequence of permutations $\Pi = [\pi_0, \pi_1, \ldots, \pi_{n-(m-1)\tau}]$, we can form a graph 𝐺(𝐸, 𝑉) by setting the vertices 𝑉 to be all permutations used, with edges for the transitions from 𝜋𝑖 to 𝜋𝑖+1. We do not add weight or directionality to the graph in this formation. However, we include the index 𝑖 and the corresponding time at which each edge is activated as temporal data for the graph. For more details on the ordinal partition network, we direct the reader to [146, 162].

4.2 Method

To apply zigzag persistence to temporal graphs, we need a process as outlined in the pipeline shown in Fig. 4.2. This process needs to take a temporal graph to a sequence of snapshot graphs, which can then be represented as zigzagging subset simplicial complexes. This procedure then allows for the application of zigzag persistence. We begin with a dataset as a temporal graph where each edge has intervals or instances in time representing when the edge is active.
Figure 4.2: Pipeline for applying zigzag persistence to temporal networks. Begin with an unweighted and undirected temporal graph where each edge is active at a point or interval of time. Create graph snapshots using a sliding window interval over the time domain. Create a sequence of simplicial complexes from the graphs and apply zigzag persistence to the zigzag of union simplicial complexes.

Graph snapshots 𝐺𝑖 are generated using a sliding window technique on the temporal information. The sliding window for graph snapshot 𝐺𝑖 is defined as $SW_i(w, t_i^{SW})$ with width 𝑤 and centered at time $t_i^{SW}$. The sliding windows can also be set to overlap by choosing window times such that $t_{i+1}^{SW} - t_i^{SW} \le w$. We further need to include union windows for the use of zigzag persistence; these are defined as 𝐺𝑖,𝑖+1 and are generated from the union of two adjacent sliding windows $SW_i \cup SW_{i+1}$.

From the graph snapshots of the sliding windows and their unions, we create a sequence of simplicial complexes using a Vietoris-Rips (VR) complex with distance filtration value 𝑟. The choice of an appropriate 𝑟 depends on the application, but in general we suggest 1 ≤ 𝑟 ≤ 3. The VR complex 𝐾𝑖 for each 𝐺𝑖 is generated using the unweighted and undirected shortest path distance between nodes and the filtration value 𝑟. If 𝑟 = 1, the original graph is returned, with only the edges filled in as 1-dimensional simplices. Similarly, higher 𝑟 values fill in 𝑟-dimensional simplices. Choosing higher 𝑟 values for generating simplicial complexes results in small higher-dimensional features not being represented in the persistence. For example, if 𝑟 = 2 and there is a 3-node cycle subgraph in the graph, the cycle would be filled in with a 2-simplex. This would result in the cycle not being present in the one-dimensional homology. We use the resulting sequence of simplicial complexes to calculate zigzag persistence and study the changing structure of the temporal graph. In the following simple example, shown in Fig. 4.3, we describe the method in more detail and show how to interpret the resulting zigzag persistence diagram.

4.2.1 Example

In this example, we demonstrate how to use zigzag persistence to measure the changing structure of a simple 5-node cycle graph as edges are added and removed based on the temporal information. Figure 4.3a shows the temporal information of the simple cycle graph as the intervals on each edge. The sliding windows for this example are created with width 𝑤 = 1 and centers $t_i^{SW} = i + 0.5$ such that the windows are the non-overlapping intervals $SW_i = [i, i+1]$. For each window a graph snapshot 𝐺𝑖 is created, where 𝐺𝑖 is the edge-induced subgraph containing an edge if the window $SW_i$ overlaps with that edge's interval. The union graphs 𝐺𝑖,𝑖+1 are also created, using the union of adjacent sliding windows $SW_i \cup SW_{i+1} = [i, i+2]$. By using the union subgraphs we have $G_i \subset G_{i,i+1}$ and $G_{i+1} \subset G_{i,i+1}$.

Figure 4.3: Example of zigzag persistence applied to a simple temporal cycle graph. (a) Edge intervals with sliding windows highlighted (alternating blue-red) and the corresponding graphs and union graphs above; (b) zigzag persistence diagram for both 𝐻0 and 𝐻1.

To calculate the zigzag persistence for this example we created VR complexes 𝐾𝑖 and 𝐾𝑖,𝑖+1 for each graph 𝐺𝑖 and union graph 𝐺𝑖,𝑖+1, respectively, using the unweighted and undirected shortest path distance with distance filtration value 𝑟 = 1. Setting 𝑟 = 1 creates the simplicial complex equivalent to the graph.
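Before continuing the walkthrough, the snapshot and union construction just described can be sketched as follows; the helper assumes networkx, takes (u, v, t) edge activation events as input, and the toy events at the end are loosely inspired by, not identical to, the example of Fig. 4.3.

```python
import networkx as nx

def graph_snapshots(edge_events, centers, width):
    """Sliding-window snapshots G_i and unions G_{i,i+1} of a temporal graph.

    edge_events : list of (u, v, t) tuples, one per edge activation time
    centers     : window centers t_i^SW;  width : window width w
    """
    def window_graph(lo, hi):
        G = nx.Graph()
        G.add_edges_from((u, v) for (u, v, t) in edge_events if lo <= t <= hi)
        return G

    windows = [(c - width / 2, c + width / 2) for c in centers]
    snapshots = [window_graph(lo, hi) for (lo, hi) in windows]
    # union windows span adjacent snapshots, so G_i and G_{i+1} are subgraphs
    unions = [window_graph(windows[i][0], windows[i + 1][1])
              for i in range(len(windows) - 1)]
    return snapshots, unions

# toy usage: a 5-node cycle with one edge activation per unit of time
events = [(0, 1, 0.5), (1, 2, 1.5), (2, 3, 2.5), (3, 4, 3.5), (4, 0, 4.5)]
snaps, unions = graph_snapshots(events, centers=[0.5, 1.5, 2.5, 3.5, 4.5],
                                width=1.0)
print([g.number_of_edges() for g in snaps])
```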
At the end of the sliding windows, we consider the graph empty and set the death of any remaining homology features to the end time of the last window (i.e., 𝑡 = 10 for this example). The resulting zigzag persistence diagram is shown in Fig. 4.3b. This persistence diagram shows the zero-dimensional and one-dimensional features as 𝐻0 and 𝐻1, respectively.

There are two zero-dimensional features, at persistence pairs (1, 3) and (0.5, 10). The feature with persistence pair (0.5, 10) was born first at 𝐺0, which occurred at 𝑡 = 0.5, as the first connected component. The second component, and persistence pair, appears in 𝐺0,1 at time 𝑡 = 1. Both of these components persist until 𝐺2,3 at 𝑡 = 3, where, based on the elder rule, the first-born feature persists and the later-born component dies. This explains the persistence pair (1, 3), with the component born at 𝐺0,1 dying at the merging of components in 𝐺2,3. The first-born component continues to persist until the last window. Based on our definition, we set the death of this feature to the end of the last window, giving the second persistence pair (0.5, 10). The one-dimensional feature (the cycle represented in 𝐻1) is present twice in the persistence diagram. This is due to it first appearing in 𝐺3,4 and then disappearing at 𝐺4, with the first corresponding persistence pair at (4, 4.5). The cycle then reappears at 𝐺5,6 and again disappears at 𝐺7, corresponding to the second persistence pair at (6, 8.5). This example demonstrates how zigzag persistence captures the changing structure of temporal graphs in multiple dimensions. We could also capture higher-dimensional structures using the persistence diagram, but we do not investigate them in this work.

4.3 Results

To demonstrate the functionality of zigzag persistence for analyzing temporal graphs, we use two examples. The first is an analysis of the transportation data from Great Britain discussed in Section 4.1. The second is a simulated dataset from the Lorenz system that exhibits intermittency, a dynamical system phenomenon where the dynamic state transitions from periodic to chaotic at irregular intervals. We compare our results for both examples to some standard network tools for analyzing temporal networks. Namely, we compare two connectivity statistics and three centrality statistics.

The two connectivity statistics analyze the Connected Components (CCs). The first CC statistic is the number of connected components 𝑁𝑐𝑐, which provides a simple shape summary of the graph snapshots through the number of disconnected subgraphs. The second statistic is the average size of the connected components 𝑆̄𝑐𝑐, which provides insight into how significant the components are for each graph snapshot. The second type of statistic concerns centrality measures. The three centrality measures we use are the averaged and standardized degree centrality 𝐶̄𝑑, betweenness centrality 𝐶̄𝑏, and closeness centrality 𝐶̄𝑐. The degree centrality measures the number of edges connected to a node, the betweenness centrality measures how often a node is used in all possible shortest paths, and the closeness centrality measures how close a node is to all other nodes through the shortest paths. For details on the implementation of each centrality measure, we direct the reader to [125].

4.3.1 Great Britain Temporal Transportation Network

From the Great Britain transportation data discussed in Section 4.1, we created temporal graphs for the air, rail, and coach transportation methods.
We created these temporal graphs using the sliding window technique for graph snapshots introduced in Section 4.2. For the sake of brevity, in this section we only show the results of applying zigzag persistence to the temporal rail network. Results for the other two networks (air and coach) are provided in the appendix and show similar behavior. We set the sliding windows to a width of 𝑤 = 20 minutes. We chose this window size based on the average wait time being 7 minutes and 7 seconds, with a standard deviation of 7 minutes and 24 seconds, from a collected sample [235]. Additionally, we used an overlap of 50% between adjacent windows. To create simplicial complexes from the graph snapshots, we used a distance filtration of 𝑟 = 1.

Figure 4.4: Connectivity and centrality analysis of the temporal Great Britain rail network.

As a first approach to understanding the dynamics of this graph, we implement the standard centrality and connectivity statistics, as shown in Fig. 4.4. The standard tools show us the general daily trends. Specifically, all the connectivity and centrality measures increase during peak travel hours. However, further information is difficult to glean from these statistics.

Figure 4.5: Zigzag persistence diagrams of the rail transportation network of Great Britain: (a) full rail travel network, (b) zero-dimensional zigzag persistence, and (c) one-dimensional zigzag persistence.

On the other hand, in Fig. 4.5 the zigzag persistence provides us with much more information. It also shows the daily trends, but it additionally conveys through 𝐻0 that a main connected component persists for the first six days and a second component for the last day. This provides an understanding of the long-term connectivity of this component that was not present in the standard statistics. Further, 𝐻1 captures that travel loops form during peak travel times and persist daily.

4.3.2 Temporal Ordinal Partition Network for Intermittency Detection

Using a sliding window technique, we can represent ordinal partition networks as temporal graphs. However, instead of each edge having a set of intervals associated with it, as in the example in Section 4.2, each edge has time instances at which it is active. The instances are based on when a transition between unique permutations occurs. For example, the transition from 𝜋𝑖 to 𝜋𝑖+1 occurring at time 𝑡𝑖 is active at that moment in time 𝑡𝑖. If the sliding window overlaps with an edge's activation instance, we add that edge to the sliding window's graph. We will show how this procedure can be used to detect chaotic and periodic windows in a signal exhibiting intermittency (i.e., irregular transitions between periodic and chaotic dynamics). The signal is the 𝑥 solution of the simulated Lorenz system defined as

$$\frac{dx}{dt} = \sigma(y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = xy - \beta z, \tag{4.3}$$

with system parameters 𝜎 = 10.0, 𝛽 = 8.0/3.0, and 𝜌 = 166.15 for a response with type 1 intermittency [188]. We simulated the system with a sampling rate of 100 Hz for 500 seconds, with only the last 70 seconds used. We set the sliding windows for generating graph snapshots to have a width of 𝑤 = 10𝜏 and 80% overlap between adjacent windows. For each window, we generated ordinal partition networks using 𝜏 = 30 and 𝑛 = 6, where 𝜏 was selected using the multi-scale permutation entropy method [160].

The resulting signal 𝑥(𝑡) from simulating the Lorenz system in Eq. (4.3) is shown in Fig. 4.6, with example ordinal partition networks generated for a chaotic window highlighted in red and a periodic window highlighted in blue. These sample graph snapshots show that the structure of the ordinal partition network changes significantly depending on the dynamic state of the window's time-series segment. Further, we expect to see little change in the graph structure while the window slides along a periodic region of 𝑥(𝑡), compared to significant changes when it overlaps with a chaotic region.

Figure 4.6: The 𝑥(𝑡) solution of the Lorenz system simulation from Eq. (4.3) exhibiting intermittency, with example sliding windows for both periodic (blue) and chaotic (red) dynamics and their respective ordinal partition networks.

We show the standard connectivity and centrality measures of the graph snapshots in Fig. 4.7. The number of components 𝑁𝑐𝑐 is constant due to the nature of the ordinal partition network, where the sequence of permutation transitions creates a chain of connected edges. As such, there is no structural information in the number of components. However, the size of the components does increase during the chaotic windows. This increase is due to, in general, more unique permutations, and thus more nodes, being used in a chaotic signal compared to a periodic one. Of the centrality statistics, only the average closeness centrality shows an apparent increase during chaotic regions. The increase in centrality is most likely due to the chaotic regions producing a more highly connected graph, as demonstrated in the chaotic window and corresponding network of Fig. 4.6. While these statistics do provide some insight into the changing dynamics, they do not show how the higher-dimensional structure of the graph evolves through the sliding windows and graph snapshots.

Figure 4.7: Connectivity and centrality analysis of the temporal ordinal partition network, with chaotic regions of 𝑥(𝑡) highlighted in red.

In comparison to the standard statistics, 𝐻1 in Fig. 4.8 shows a persistent loop structure that persists between the chaotic windows, which is representative of the periodic nature. Further, 𝐻1 shows that the chaotic windows characteristically have many low-lifetime persistence pairs.

Figure 4.8: One-dimensional zigzag persistence of the temporal ordinal partition network from the 𝑥 solution of the intermittent Lorenz system described in Eq. (4.3).

This is in line with the results in [162], which showed that ordinal partition networks from chaotic signals tend to have persistence diagrams with many features in 𝐻1 in comparison to their periodic counterparts. These additional insights show that zigzag persistence provides a helpful perspective for analyzing temporal graphs that is not possible with standard statistics.

4.4 Conclusion

In this work we studied how to effectively apply zigzag persistence to temporal graphs. Zigzag persistence provides a unique perspective when studying the evolving structure of a temporal graph by tracking both the standard lower-dimensional features (e.g., connected components) and higher-dimensional features (e.g., loops and voids). We showed the benefits of using zigzag persistence on two examples: the Great Britain transportation network and ordinal partition networks. Our results showed that the zero- and one-dimensional zigzag persistence provided insights into the structure of the temporal graph that were not easily gleaned from standard centrality and connectivity statistics.
We believe zigzag persistence could also be leveraged to study other temporal graphs, including flock behavior models (e.g., the Vicsek model) and the emergence of coordinated motion, power grid dynamics and the topological characteristics of cascading failures, and supplier-manufacturer networks through the effects of trade failures on production and consumption. Future work to improve this method would involve an analysis of how to choose an optimal window size and overlap, a method to incorporate edge weights and directionality, and the inclusion of temporal information on both the nodes and edges. It would also be worth investigating higher-dimensional features (e.g., voids through 𝐻2).

CHAPTER 5

PERSISTENT HOMOLOGY OF DYNAMICAL NETWORKS

This auxiliary chapter of my research introduces some of the data sets used throughout my research and the software packages developed. Namely, the two main experimental data sets are from a magnetic single pendulum (see Section 5.1) and a tracked double pendulum [165]. I did not include the extensive double pendulum documentation in this document; however, the open-source publication is available [165]. Throughout my research project I have also been contributing to and developing the website documentation for teaspoon, which is an open-source topological signal processing package available through Python.

5.1 Experiment: Magnetic Pendulum

Note: a Computer Aided Design (CAD) model and design document for the pendulum used for the experimental section of this manuscript is available through GitHub at https://github.com/Khasawneh-Lab/simple_pendulum.

The driven magnetic pendulum is a well-known system that exhibits chaos [117, 214, 231]. Therefore, I designed and built a magnetic pendulum apparatus and utilized the ordinal partition embedding and TDA to characterize the dynamics of the resulting signals. In this section I derive a simplified equation of motion using Lagrange's approach. The design, manufacturing, and equipment used for the experiment are also explained. Additionally, I describe our methods for estimating and measuring the constants that appear in the equation of motion.

5.1.1 Mathematical Model

I begin by deriving the equations of motion for the physical system shown in Fig. 5.1. Let the total mass of the rotating components be 𝑀, the distance from the rotation center 𝑂 to the mass center of the rotating assembly be 𝑟cm, and the mass moment of inertia of the rotating components about their mass center be 𝐼cm. Further, assume that the magnetic interactions are well approximated by a dipole model with 𝑚1 = 𝑚2 = 𝑚 representing the magnitudes of the dipole moments.

Figure 5.1: Rendering of the experimental setup in comparison to the reduced model, where 𝑏(𝑡) = 𝐴 sin(𝜔𝑡) is the base excitation with frequency 𝜔 and amplitude 𝐴, 𝑟𝑐𝑚 is the effective center of mass of the pendulum, 𝑑 is the minimum distance between the magnets 𝑚1 = 𝑚2 = 𝑚 (modeled as dipoles), and ℓ is the length of the pendulum.

To develop the equation of motion, I use Lagrange's equation (Eq. (5.9)), so the potential energy 𝑉, kinetic energy 𝑇, and non-conservative moments 𝑅 are needed. In this analysis the damping moments and the moments generated from the magnetic interaction are treated as non-conservative. The potential and kinetic energy are defined as

$$T = \frac{1}{2} M |\vec{v}_{cm}|^2 + \frac{1}{2} I_{cm}\,\dot{\theta}^2, \qquad V = -M g r_{cm}\cos(\theta), \tag{5.1}$$

where $\vec{v}_{cm}$ is the velocity of the mass center given by

$$\vec{v}_{cm} = r_{cm}\dot{\theta}\left[\cos(\theta)\,\hat{\epsilon}_x + \sin(\theta)\,\hat{\epsilon}_y\right] + A\cos(\omega t)\,\hat{\epsilon}_x. \tag{5.2}$$
In Eq. (5.2), the 𝐴 cos(𝜔𝑡) term is introduced by the base excitation 𝑏(𝑡) in the 𝑥 direction, where 𝐴 is the amplitude, 𝜔 is the frequency, and 𝜖ˆ𝑥 and 𝜖ˆ𝑦 are the unit vectors in the 𝑥 and 𝑦 directions, respectively.

The non-conservative moments are caused by the energy lost to damping. For our analysis, I consider three possible mechanisms of energy dissipation: Coulomb damping 𝜏𝑐, viscous damping 𝜏𝑣, and quadratic damping 𝜏𝑞. I chose to use all three mechanisms of damping due to previous work on damping estimation for a pendulum similar to the one I used [183]. These three moments are defined as

$$\tau_c = \mu_c\,\mathrm{sgn}(\dot{\theta}), \qquad \tau_v = \mu_v\,\dot{\theta}, \qquad \tau_q = \mu_q\,\dot{\theta}^2\,\mathrm{sgn}(\dot{\theta}), \tag{5.3}$$

where 𝜇𝑐, 𝜇𝑣, and 𝜇𝑞 are the coefficients for Coulomb, viscous, and quadratic damping, respectively.

To begin the derivation of the torque induced from the magnetic interaction 𝜏𝑚, consider two in-plane magnets as shown on the left side of Fig. 5.2. The red side of the magnet in the figure represents its north pole. From this representation, the magnetic force acting on each magnet is calculated as

$$F_r = \frac{3\mu_o m^2}{4\pi r^4}\left[2c(\phi-\alpha)\,c(\phi-\beta) - s(\phi-\alpha)\,s(\phi-\beta)\right], \qquad F_\phi = \frac{3\mu_o m^2}{4\pi r^4}\,s(2\phi-\alpha-\beta), \tag{5.4}$$

where 𝑚1 and 𝑚2 are the magnetic moments, 𝜇𝑜 is the magnetic permeability of free space, and 𝑐(∗) = cos(∗) and 𝑠(∗) = sin(∗). Equation (5.4) assumes that the cylindrical magnets used in the experiment can be approximated as dipoles. I later show that this assumption is satisfactory in Fig. 5.4 of Section 5.1.3.

Figure 5.2: A comparison between a generic, in-plane magnetic model in global coordinates and the equivalent magnetic forces in the pendulum model 𝐹𝑟 and 𝐹𝜙 (see Eq. (5.4)).

These magnetic forces are then adapted to the physical pendulum as shown on the right side of Fig. 5.2, with 𝛼 = 𝜋/2 and 𝛽 = 𝜋/2 − 𝜃. Additionally, 𝜙 and 𝑟 are calculated from 𝜃, 𝑑, and ℓ from Fig. 5.1 as

$$\phi = \frac{\pi}{2} - \arcsin\!\left(\frac{\ell}{r}\sin(\theta)\right), \quad \text{and} \tag{5.5}$$

$$r = \sqrt{\left[\ell\sin(\theta)\right]^2 + \left[d + \ell(1-\cos(\theta))\right]^2}. \tag{5.6}$$

The moment induced by the magnetic interaction is then

$$\tau_m = \ell F_r \cos(\phi-\theta) - \ell F_\phi \sin(\phi-\theta). \tag{5.7}$$

Using 𝜏𝑚 from Eq. (5.7) and the non-conservative torques from Eq. (5.3), 𝑅 is defined as

$$R = \tau_c + \tau_v + \tau_q + \tau_m. \tag{5.8}$$

Finally, the equation of motion for the base-excited magnetic single pendulum is found by substituting the above expressions into Lagrange's equation and noting that 𝐿 = 𝑇 − 𝑉:

$$\frac{\partial}{\partial t}\left(\frac{\partial L}{\partial \dot{\theta}}\right) - \frac{\partial L}{\partial \theta} + R = 0. \tag{5.9}$$

Equation (5.9) was symbolically manipulated to express it in state space format using Python's Sympy package. Then, the system was simulated at a frequency of 𝑓𝑠 = 60 Hz using Python's odeint function from the Scipy library.

5.1.2 Equipment and Experimental Design

The experimental setup was manufactured by extending the capabilities of a previously manufactured simple pendulum [183]. To increase the nonlinearity, in-plane magnets were added on the base as well as at the end of the pendulum. To justify assuming the permeability of free space 𝜇0, any ferromagnetic material within the vicinity was removed, which made the use of 3D-printed components critical. In Fig. 5.3 an overview of the utilized 3D-printed components is shown. Specifically, Figs. 5.3 (a) and (b) show exploded views of the end mass of the pendulum and the linear stage for controlling the distance 𝑑, respectively. The magnets used are two approximately identical rare-earth (neodymium) N52 permanent magnets with a radius and length of 6.35 mm (1/4").
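A minimal sketch of this symbolic-to-numeric workflow is shown below for a simplified version of the model: only viscous damping is kept, the magnetic torque is omitted, and the inertia, excitation amplitude, and initial conditions are placeholder assumptions rather than the measured constants.

```python
# Sketch of the Sympy-to-odeint workflow under simplifying assumptions
# (viscous damping only, no magnetic torque; I_cm, A, and the initial
# conditions are illustrative placeholders).
import numpy as np
import sympy as sp
from scipy.integrate import odeint

t = sp.symbols('t')
M, g, r_cm, I_cm, A, w, mu_v = sp.symbols('M g r_cm I_cm A omega mu_v', positive=True)
theta = sp.Function('theta')(t)

# Kinetic and potential energy, following Eqs. (5.1) and (5.2)
vx = r_cm * theta.diff(t) * sp.cos(theta) + A * sp.cos(w * t)
vy = r_cm * theta.diff(t) * sp.sin(theta)
T = M * (vx**2 + vy**2) / 2 + I_cm * theta.diff(t)**2 / 2
V = -M * g * r_cm * sp.cos(theta)
L = T - V

# Lagrange's equation (5.9) with only the viscous moment from Eq. (5.3)
eom = sp.diff(L.diff(theta.diff(t)), t) - L.diff(theta) + mu_v * theta.diff(t)
theta_dd = sp.solve(eom, theta.diff(t, 2))[0]

# Substitute plain symbols so the expression can be lambdified
th, thd = sp.symbols('th thd')
expr = theta_dd.subs(theta.diff(t), thd).subs(theta, th)
params = {M: 0.1038, g: 9.81, r_cm: 0.188, mu_v: 1.5e-5,
          I_cm: 0.01, A: 0.01, w: 3 * np.pi}  # I_cm and A are guesses
f = sp.lambdify((th, thd, t), expr.subs(params))

fs = 60  # simulation frequency (Hz)
t_num = np.linspace(0, 30, 30 * fs)
sol = odeint(lambda s, ti: [s[1], f(s[0], s[1], ti)], [0.5, 0.0], t_num)
```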
Figure 5.3: Manufacturing overview with experimental setup. In Fig. (a), an exploded view of the end mass (100% infill 3D-printed PLA components) is shown with the magnet press fit into the end of the pendulum. In Fig. (b), an exploded view of the linear stage controlling the vertical position of the lower magnet is shown.

Table 5.1 provides the item, description, and manufacturer for all of the experimental equipment used to collect the rotational data from the magnetic single pendulum under base excitation.

Table 5.1: Equipment used for experimental data collection.

Item               Description        Manufacturer
Shaker             113 Electro-Seis   APS
DC Power Supply    Model 1761         BK Precision
Accelerometer      Model 352C22       Piezotronics
Rotary Encoder     UCD-AC005-0413     Posital
Data Acquisition   USB-6356           Nat. Inst.
PC                 OptiPlex 7050      Dell

5.1.3 Physical Parameters and Constants

To estimate the magnetic dipole moment 𝑚 of the cylindrical magnets used (see Fig. 5.3), I performed an experiment similar to the one described in [85]. When the distance between the magnets is less than a critical value 𝑟𝑐, modeling the magnets as dipoles can lead to large errors since the dipole model does not accurately approximate the repulsive force between the magnets. This distance was estimated as 𝑟𝑐 = 0.035 m (see Fig. 5.4). Additionally, in the region where 𝑟 > 𝑟𝑐, the theoretical force curve, which scales as 𝑟⁻⁴, was fit to the measurements to estimate the magnetic dipole moment as 𝑚 = 0.85 Cm.

Figure 5.4: Measured repulsion force as a function of distance compared to the theoretical force in Eq. (5.4) with 𝜃 = 0. The theoretical force 𝐹theory is based on the dipole model with a dipole moment 𝑚 = 0.85 Cm, which was estimated using a curve fit to the region where the magnet thickness 𝑇 ≪ 𝑟. The region of poor fit is marked for 𝑟 < 0.035 m.

The other parameter values as well as their uncertainties (when applicable) are provided in Table 5.2; these are in reference to Fig. 5.1. Most of these parameters were estimated either using SolidWorks or through multiple direct measurements.

Table 5.2: Equation of motion parameters for the simulated pendulum with associated uncertainties.

Parameter (units)   Value          Uncertainty (±𝜎)
𝑑 (m)               0.36           0.005
ℓ (m)               0.208          0.005
𝑔 (m/s²)            9.81           -
𝑀 (kg)              0.1038         0.005
𝑟cm (m)             0.188          -
𝜔 (rad/s)           3𝜋             -
𝜇0 (H/m)            1.257 × 10⁻⁶   -
𝑚 (Cm)              0.85           -
𝜇𝑐 (-)              0.002540       0.000020
𝜇𝑣 (-)              0.000015       0.000003
𝜇𝑞 (-)              0.000151       0.000020

To validate the parameters, an experiment and a simulation of a free drop of the pendulum are compared. The resulting angle 𝜃(𝑡) is shown in Fig. 5.5, which shows a very similar response between simulation and experiment. Additionally, the simulation is within the bounds of uncertainty of the encoder 𝜎data = 1° as shown in the zoomed-in region of Fig. 5.5.

Figure 5.5: Free drop test comparing the collected angular position data 𝜃data (with encoder uncertainty 𝜎data) and the simulated response 𝜃sim. As shown in the zoomed-in region, the simulated response is within the bounds of uncertainty of the actual response.
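For illustration, the 𝑟⁻⁴ fit used above to estimate the dipole moment can be sketched as follows, with synthetic force data standing in for the measurements of Fig. 5.4.

```python
# Sketch of the r**-4 dipole-moment fit; Eq. (5.4) at theta = 0 reduces to the
# coaxial-dipole repulsion below, and the data here is a synthetic stand-in.
import numpy as np
from scipy.optimize import curve_fit

mu_0 = 1.257e-6  # permeability of free space

def dipole_force(r, m):
    # Repulsive force between coaxial dipoles: F = 3*mu_0*m**2 / (2*pi*r**4)
    return 3 * mu_0 * m**2 / (2 * np.pi * r**4)

rng = np.random.default_rng(0)
r_data = np.linspace(0.04, 0.12, 30)  # fit only the region r > r_c = 0.035 m
F_data = dipole_force(r_data, 0.85) * (1 + 0.05 * rng.standard_normal(30))
(m_hat,), _ = curve_fit(dipole_force, r_data, F_data, p0=[1.0])
print(m_hat)  # recovers a dipole moment near 0.85
```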
5.2 Teaspoon: A comprehensive python package for topological signal processing

Topological signal processing is a newly emerging field with an ever-growing collection of tools. Using Topological Data Analysis (TDA) for signal processing allows for an analysis of the underlying shape of a time series. These methods are well backed by theory [177, 203] and have shown success in numerous application areas including machining dynamics [111–114, 116], finance [82, 83], and gene expression [21, 179]. Here I present the python package teaspoon, which provides state-of-the-art topological signal processing tools as well as wrappers for available persistent homology software. While some TDA-based packages exist for Python (e.g., Scikit-TDA and Giotto-TDA), the teaspoon package specifically provides modules designed to tackle questions related to signal processing and time series analysis from the viewpoint of topology. In comparison, other existing packages are designed for more general TDA applications.

In the teaspoon package there are currently five main modules: dynamical systems, machine learning, complex networks, information, and parameter selection, with several sub-modules for each as shown in Fig. 5.6. The dynamical systems library currently hosts 60 dynamical systems including maps, flows, and collected data sets. The machine learning library contains code for numerous persistence diagram featurization and kernel methods. Specifically, this module includes the template function featurization methods described in [181, 232] as well as persistence landscapes [28], persistence images [1], Carlsson coordinates [2], persistence paths and signatures [43, 44], and the multi-scale kernel method [196]. The complex networks module contains code to represent a time series as a network using ordinal partitions [146] or 𝑘 nearest neighbors [118]. This module also provides several methods for calculating distances between nodes based on the adjacency matrix, which allows for the calculation of the persistent homology of the resulting networks. The information theory module implements entropy-based functions for signal processing and persistence diagram analysis. Lastly, the parameter selection module currently provides multiple algorithms for the automatic selection of the delay 𝜏 and dimension 𝑛 parameters for state space reconstruction and permutation entropy.

In this work, I outline the features available in each module as well as features that will be added in the future. The goal of this package is to provide a range of topological signal processing tools in one unified framework. Additionally, for most of these modules, further documentation and examples of the functions are provided on the teaspoon documentation webpage¹.

Figure 5.6: Tree structure of teaspoon.

5.2.1 Dynamical Systems Library (DynSysLib)

The dynamical systems library (DynSysLib) is a teaspoon module that provides a wide selection of dynamical system simulation models, many of which are from [220]. Most of the available dynamical systems are able to exhibit both periodic and chaotic responses. In general, these systems can be separated into three categories: (1) flows, (2) maps, and (3) collected data. A full list of the available dynamical systems is provided in Tables C.2 and C.3 of the appendix. The module has a single function, DynamicSystems, which allows for a wide range of simulation control: the user can specify as little as the system of interest and the desired dynamical state (chaotic or periodic), or provide detailed simulation parameters such as initial conditions, system parameters, and solution time. The function output is the resulting time series response. For details on the default parameters used, equations of motion, and examples, please see the teaspoon documentation webpage¹.

¹ http://elizabethmunch.com/code/teaspoon
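As a usage illustration, a minimal call might look like the following; the import path and keyword names are assumptions, since only the function name DynamicSystems is fixed above, so the documentation webpage should be consulted for the exact interface.

```python
# Hypothetical usage of the DynSysLib module; the import path and keyword
# names are illustrative assumptions, not a confirmed API.
from teaspoon.MakeData.DynSysLib.DynSysLib import DynamicSystems

# Minimal call: name the system of interest and the desired dynamic state;
# defaults are used for initial conditions, parameters, and solution time.
ts = DynamicSystems(system='rossler', dynamic_state='chaotic')
```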
5.2.2 Machine Learning Module

In this section, I describe the machine learning module in teaspoon. The machine learning module provides automated feature matrix generation and classification, and it is suitable for applications where persistence diagrams can be computed. There are three main files inside the module: Base.py, feature_functions.py, and PD_Classification.py. Here, I will explain the necessary functions in each of these files and show how to use them to perform machine learning using Topological Data Analysis (TDA).

Parameter Buckets The parameter bucket is a tool to hold all necessary parameters for the featurization functions as well as the classification algorithms. This includes parameters such as the classification algorithm, the size of the test set, and the desired persistence diagram featurization method. The parameter buckets are implemented as classes in the Base.py file. The basic structure is implemented as the class ParameterBucket; however, there are two more specialized classes, InterPolyParameters and TentParameters, that are dedicated to the parameters of the template functions introduced in Ref. [181]. These parameter buckets also have the functionality to use the template function featurization on localized regions of the persistence diagrams, using an adaptive partitioning method described in Ref. [232]. The rest of the parameter buckets are used for other featurization methods. The LandscapesParameterBucket is for persistence landscapes [28], which requires an input for the landscape number that will be used to generate the feature matrix. The CL_ParameterBucket is used to set parameters for classification using persistence images [1], Carlsson coordinates [2], persistence paths and signatures [43, 44], and the kernel method [196].

Featurization The file feature_functions.py contains functions that compute the topological features mentioned above. The F_ and CL_ prefixes indicate that the corresponding functions are designed for featurization and classification, respectively. First, for the template featurizations, there are two main functions, tents and interp_polynomial. These functions compute the collection of template functions based on a grid formed using parameters from the corresponding parameter buckets. In addition to these, there is the PLandscape class, which uses the PLandscapes function to compute the persistence landscapes for a given persistence diagram [28]. This class has an option to define L_number, which returns specific landscapes in an array. The output of PLandscapes is a dictionary that includes all landscapes, the total number of landscapes, and the desired landscapes if the user defines L_number. The PLandscape class can also plot persistence landscapes. If the user does not define the desired landscapes to plot, all landscapes will be plotted. F_Landscape uses persistence landscapes to compute the feature matrix as explained in Ref. [147]. The inputs of the function are the persistence landscapes and the parameter bucket object explained above.

The second featurization method is persistence images. I utilized the PersistenceImages package (https://gitlab.com/csu-tda/PersistenceImages) to compute persistence images. F_Image takes the persistence diagrams, the pixel size, the variance of the Gaussian distribution, the numbers of persistence diagrams whose images will be plotted, and a transfer learning option. If the transfer learning option is set to true, a second set of persistence diagrams should be provided; feature matrices are then computed for both sets of diagrams.

Carlsson coordinates are the third featurization method [2]. There are five coordinates that depend on the birth and death times in the persistence diagrams. F_CCoordinates takes persistence diagrams and computes these five features. It has a second input, 𝐹𝑁, that defines how many features will be computed. Feature vectors are generated using the $\sum_{i=1}^{F_N}\binom{5}{i}$ combinations of the five coordinates. F_CCoordinates returns these feature vectors, the number of combinations, and the combinations in a list.

Another featurization method is persistence paths and signatures [43, 44]. The F_PSignature function computes signatures on persistence landscapes. The first two levels of signatures are currently coded in the function. The inputs are the persistence landscapes and the number of the landscape that will be used to compute the signatures. It then returns the feature matrix to be used in classification.

The final featurization method is the kernel method for persistence diagrams. KernelMethod computes the kernel between two given persistence diagrams. It also has a sigma input, which is a variable in the formula of the kernel given in Ref. [196]. After computing pairwise kernels between the diagrams, the result can be used as a pre-defined kernel in a Support Vector Machine (SVM) algorithm for classification.

Classification Classification functions are embedded in PD_Classification. Most of the functions take feature functions and a parameter bucket object as input. They divide the given feature matrix into a training set and a test set with respect to the test size defined in the parameter bucket. Classification can be performed using four classification algorithms: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB). For the kernel method, the LibSVM package [39] is utilized to insert the pre-computed kernel matrix for classification. Additionally, the featurization methods can be used to create feature vectors compatible with any scikit-learn classification algorithm.

I also include the option of transfer learning in classification for most of the featurization methods, except the kernel method. In this type of classification, a classifier is trained on one data set and tested on another. One can refer to Ref. [171] for more details about transfer learning. When the user sets transfer learning to true in the parameter bucket, the feature functions will be computed for the training and test persistence diagrams separately. In both classification types, the training and test sets are generated randomly 10 times. The mean classification score, the standard deviation for the training and test sets, and the total runtime for the classification are given as output.
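To make the overall flow concrete, the following generic sketch builds a feature matrix from persistence diagrams and classifies it; the summary features and the random diagrams are illustrative stand-ins, not teaspoon's actual featurization functions, and scikit-learn stands in for PD_Classification.

```python
# Generic sketch of the feature-matrix-to-classification flow; the diagram
# summaries below are illustrative stand-ins (not the five Carlsson
# coordinates) and the data and labels are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def diagram_features(dgm):
    # dgm is an (n, 2) array of (birth, death) pairs
    life = dgm[:, 1] - dgm[:, 0]
    return [dgm[:, 0].mean(), life.mean(), life.max(), life.sum(), len(dgm)]

rng = np.random.default_rng(1)
diagrams = [np.sort(rng.random((20, 2)), axis=1) for _ in range(100)]
labels = rng.integers(0, 2, 100)  # placeholder class labels

X = np.array([diagram_features(d) for d in diagrams])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.33)
clf = SVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))
```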
5.2.3 Complex Networks Module

The teaspoon module provides the Python implementation of the algorithms used in [162], which provides methods for analyzing the dynamic state of a time series based on the persistent homology of network representations of the time series. The general pipeline, as shown in Fig. 5.7, is as follows: (1) represent the time series as a network as described in Section 5.2.3, (2) generate a distance matrix from the undirected and weighted adjacency matrix as described in Section 5.2.3, and (3) apply 1-D persistent homology to the distance matrix. Persistence diagram point summaries can then be generated to analyze the dynamic state of the underlying time series.

Figure 5.7: The persistent homology of complex networks pipeline.
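Assembled end to end, the pipeline might look like the following sketch; the import path and function signatures are assumptions built around the function names introduced in the following subsections, and the ripser package stands in for the persistence step.

```python
# Hypothetical end-to-end sketch of the pipeline; import paths and argument
# names are assumptions rather than the exact teaspoon API.
import numpy as np
from ripser import ripser
from teaspoon.SP.network import permutation_sequence, AdjacenyMatrix_OP, DistanceMatrix

t = np.linspace(0, 40, 4000)
ts = np.sin(t) + np.sin(2 * t)  # example time series

S = permutation_sequence(ts, n=6, tau=30)   # (1) permutation sequence
A = AdjacenyMatrix_OP(S, n=6)               # (1) ordinal partition network
D = DistanceMatrix(A)                       # (2) distances between all nodes
dgm1 = ripser(D, distance_matrix=True, maxdim=1)['dgms'][1]  # (3) 1-D persistence
```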
Network Representations of Time Series There are currently two available algorithms in the complex networks module to represent a time series as a complex network. Specifically, these are 𝑘 Nearest Neighbor (𝑘-NN) networks [118] and ordinal partition networks [146]. For the implementation of these algorithms I use the adjacency matrix as the graph data structure.

For the ordinal partition network, a permutation sequence first needs to be generated using the function permutation_sequence, which requires a time series, the permutation dimension 𝑛, and the delay 𝜏. For selecting the dimension and delay I suggest using the parameter selection module. Using the permutation sequence, the resulting adjacency matrix is formed using the AdjacenyMatrix_OP function, which creates edges in the graph based on permutation transitions.

Two steps are required to generate 𝑘-NN networks. First, the time series needs to have its state space reconstructed through Takens' embedding, which is done through the function Takens_Embedding. This function requires the time series, the embedding dimension, and the delay. The dimension and delay can be selected using the parameter selection module. Next, the 𝑘-NN are found using the k_NN function by specifying 𝑘, which has a default of 𝑘 = 4. Using the list of neighbors, an adjacency matrix is formed using the Adjacency_KNN function by treating each embedded vector as a node and adding edges when two nodes are 𝑘-NN. The next step in the pipeline is to define distances between nodes in the network based on the adjacency matrix, which is discussed in the subsequent section.

Distance Matrix Two steps are required to assign distances between nodes in a network: (1) apply an edge weight algorithm to represent distances for adjacent nodes, and (2) implement a distance algorithm for non-adjacent nodes. For the first step I provide the following edge weight functions: unweighted, inverse, and difference. Specifically, the unweighted option sets all edge weights to 1, the inverse option sets each weight to its element-wise reciprocal, and the difference option finds the maximum edge weight and sets each new edge weight as the difference between the maximum edge weight and that edge's weight. The second step requires a method for defining distances between non-adjacent nodes. To do this I offer two options: the shortest-path distance and the effective network resistance [71]. Both of these steps are implemented through the DistanceMatrix function.

5.2.4 Information Module

The information theory module currently provides three functions for information entropy calculations. The first two are the permutation entropy [14] and the multi-scale permutation entropy, implemented as PE and MsPE, respectively. Permutation entropy has been shown to be a useful tool for analyzing signal complexity and has very few requirements for its application. The third function is the persistent entropy [9], implemented through the function PersistentEntropy, which calculates the entropy of a persistence diagram given its lifetimes.

5.2.5 Parameter Selection Module

The parameter selection module provides code for the functions used in [161] and [11] for automatically calculating the dimension 𝑛 and delay 𝜏 parameters for both permutation entropy and Takens' embedding (state space reconstruction). For details on each of the methods, please refer to their respective publications, as some are more suitable for nonlinear time series or have specific time series requirements. A comprehensive list of the available methods is provided in Table C.4.
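Since permutation entropy underlies both the information module and the MsPE-based parameter selection, a self-contained sketch of the basic computation [14] is given below; teaspoon's implementation may differ in its options and normalization.

```python
# Self-contained sketch of permutation entropy (Bandt and Pompe [14]);
# teaspoon's PE function may differ in options and normalization.
import numpy as np
from math import factorial

def permutation_entropy(ts, n=3, tau=1, normalize=True):
    # Map each delay vector to its ordinal pattern (permutation)
    N = len(ts) - (n - 1) * tau
    patterns = [tuple(np.argsort(ts[i:i + n * tau:tau])) for i in range(N)]
    # Relative frequency of each observed permutation
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    H = -np.sum(p * np.log2(p))
    return H / np.log2(factorial(n)) if normalize else H

x = np.sin(2 * np.pi * np.linspace(0, 10, 1000))
print(permutation_entropy(x, n=6, tau=30))  # low values indicate regular dynamics
```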
APPENDICES

APPENDIX A

PERMUTATION ENTROPY PARAMETER SELECTION

A.1 MPE Effects of Noise

Figure A.1: Region N is affected by noise in the MPE plot, and region S is unaffected.

Effects of Noise We found that the main advantage of using MPE for determining the embedding delay is its robustness to noise. Noise on an MPE plot has minimal effects on regions B and C from Fig. 2.11, while only significantly affecting region A, as shown in Fig. A.1. Furthermore, depending on the signal-to-noise ratio, there will only be an effect at the beginning of region A. Figure A.1 shows the first region N where noise is affecting the permutation entropy. The effect of noise causes the MPE plot to start at a maximum and decrease to a local minimum. When the time delay becomes large enough, the permutations are no longer influenced by the noise, causing this minimum. We found that the location of the minimum is based on the condition

$$m_{\mathrm{avg}}\,\tau_N \approx A_{\mathrm{noise}}\,f_s, \tag{A.1}$$

where 𝑚avg is the average of the absolute value of the slope, 𝐴noise is approximately the maximum amplitude of the noise, and 𝜏𝑁 is the value of 𝜏 large enough to surpass the noise amplitude. We derived this condition from the need for, on average, $|f(t) - f(t+\tau)| > A_{\mathrm{noise}}$. This shows that MPE is robust to noise as long as the noise amplitude does not exceed the amplitude of the signal.

A.2 Autocorrelation Methods and Example

Pearson Correlation The Pearson correlation coefficient 𝜌𝑥𝑦 ∈ [−1, 1] measures the linear correlation of two time series 𝑥 and 𝑦. Using these two data sets, the correlation coefficient is calculated as

$$\rho_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}. \tag{A.2}$$

The possible values of 𝜌𝑥𝑦 represent the relationship between the two data sets, where 𝜌𝑥𝑦 = 1 represents a perfect positive linear correlation, 𝜌𝑥𝑦 = 0 represents no linear correlation, and 𝜌𝑥𝑦 = −1 represents a perfect negative linear correlation. However, Pearson correlation is limited because it only detects linear correlations. This limitation is somewhat alleviated by using Spearman's correlation, which operates on the ordinal rankings of the two time series instead of their numeric values.

Spearman's Correlation Spearman's correlation is also calculated using Eq. (A.2) with the substitution of 𝑥 and 𝑦 by their ordinal rankings. This substitution allows nonlinear correlation trends to be detected as long as the correlation is monotonic. To demonstrate the difference, Fig. A.2 shows two sequences 𝑥 and 𝑦 calculated from 𝑦 = 𝑥⁴ with 𝑥 ∈ [0, 10]. Using this example, the Pearson correlation is calculated as 𝜌 ≈ 0.86, while Spearman's ranked correlation yields 𝜌 = 1.0. This result demonstrates how Spearman's correlation coefficient accurately detects the nonlinear, monotonic correlation between 𝑥 and 𝑦, whereas Pearson correlation may miss it.

Figure A.2: A comparison between (left) unranked values and (right) ranked values for calculating correlation coefficients. Using the ranked 𝑥 and 𝑦, Spearman's correlation coefficient can be used to accurately reveal existing nonlinear monotonic correlations.

Autocorrelation Example We can use the concept of correlation to select a delay 𝜏 by calculating the correlation coefficient using Eq. (A.2) between a time series and its 𝜏-lagged version. As an example, take the time series 𝑥(𝑡) = sin(2𝜋𝑡), with 𝑡 ∈ [0, 5] and a sampling frequency of 100 Hz. This results in a suggested delay 𝜏 = 20 at the first folding time using both Spearman's and Pearson correlation.
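A minimal sketch of this delay selection for the sine wave example is given below; taking the first folding time as the first lag where the correlation drops below 1/e is an assumption made for illustration.

```python
# Sketch of autocorrelation-based delay selection for the example above;
# the folding criterion (first lag with correlation below 1/e) is an
# illustrative assumption.
import numpy as np
from scipy.stats import pearsonr, spearmanr

fs = 100
t = np.arange(0, 5, 1 / fs)
x = np.sin(2 * np.pi * t)

def first_folding_delay(x, corr=pearsonr, threshold=1 / np.e):
    for tau in range(1, len(x) - 1):
        if corr(x[:-tau], x[tau:])[0] < threshold:
            return tau
    return None

print(first_folding_delay(x))                  # approximately 20
print(first_folding_delay(x, corr=spearmanr))  # Spearman's version
```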
A.3 MI Methods

MI using Equal-sized Partitions For the calculation of MI, the joint and independent probabilities of the original 𝑥(𝑡) and time-lagged 𝑥(𝑡 + 𝜏) time series are needed. However, since 𝑥 is a discrete time series, we approximate these probabilities using bins, which segment the range of the series into discrete groups. The simplest method for approximating the probabilities with this discretization is to use equal-sized bins. However, the size of these bins depends on the number of bins 𝑘. We investigated various methods for estimating an appropriate number of bins using the length of the time series 𝑁. These methods include the common square-root choice $k = \lceil\sqrt{N}\rceil$, Sturges' formula [223] $k = \lceil\log_2(N)\rceil + 1$, and the Rice Rule [126] $k = \lceil 2N^{1/3}\rceil$. After comparing each method using a variety of examples, we found that Sturges' formula provided the best results for selecting 𝜏 for PE using MI.

MI using Adaptive Partitions Darbellay and Vajda [55] introduced a multistep, adaptive partitioning scheme to select appropriate bin sizes in the observation space formed by the plane of 𝑥(𝑡) and 𝑥(𝑡 + 𝜏). Their method is often considered state-of-the-art for estimating the mutual information function [119]. In this approach, the bins are created recursively: in the first function call, the space of the signal and its 𝜏-lagged version is divided into an equal number of 2D bins. Then a chi-squared test is used to test the null hypothesis that the data within the newly created bins are independent. Any segment that fails the test is further divided until the resulting sub-segments contain independent data (or a certain number of divisions is satisfied). Using this partitioning method, the MI is calculated using Eq. (2.14).

Kraskov MI Kraskov et al. [119] developed a method for approximating the MI using entropy estimates with partition sizes based on 𝑘-nearest neighbors. Specifically, the method begins by expressing the MI in terms of entropy [52] as

$$I(X;Y) = H(X) + H(Y) - H(X,Y), \tag{A.3}$$

where 𝐻 is the Shannon entropy. Next, 𝐻(𝑋) is approximated with digamma functions, but the probability densities of 𝑋 and 𝑌 still need to be estimated. To do this, adaptive partitions using the 𝑘-nearest neighbors are formed. Specifically, Kraskov et al. developed two different partitioning methods with similar results. The first method uses the maximum Chebyshev distance to the 𝑘 = 1 nearest neighbor 𝑗 to form square bins, as shown in Fig. A.3-a, and the second method, shown in Fig. A.3-b, uses rectangular partitions based on the horizontal and vertical distances to the 𝑘 = 1 nearest neighbor 𝑗.

Figure A.3: Example showing two different partition methods, (a) and (b), for mutual information estimation using 𝑘 = 1 nearest neighbor adaptive partitioning.

To continue with the example shown in Fig. A.3, the probability density is estimated using the strips formed from these bins. To highlight the difference, Fig. A.3-a shows a horizontal strip of width 𝜖(𝑖) encapsulating 𝑛𝑥(𝑖) = 2 points (the strip does not include the point 𝑖), while in Fig. A.3-b only 𝑛𝑥(𝑖) = 1 point is enclosed. Using these probability density approximations and the digamma function 𝜓, the MI between 𝑋 and 𝑌 can be estimated.
Using the partitioning method shown in Fig. A.3-a, the MI is estimated as

$$I^{(1)}(X;Y) = \psi(k) - \left[\psi(n_x + 1) + \psi(n_y + 1)\right] + \psi(N). \tag{A.4}$$

Using the partitioning method shown in Fig. A.3-b, the MI is estimated as

$$I^{(2)}(X;Y) = \psi(k) - \frac{1}{k} - \left[\psi(n_x) + \psi(n_y)\right] + \psi(N). \tag{A.5}$$

A.4 Tabulated PE Parameters

Table A.1: A comparison between the calculated and suggested values for the delay parameter 𝜏 for multiple MI approximation methods. The methods that yielded the closest match to the suggested delay are the ones of interest. The equal-sized partition method, Kraskov et al. methods 1 and 2, and the adaptive partitioning approach are all described in Section A.3.

System                   Equal-sized   Kraskov     Kraskov     Adaptive     Suggested   Ref.
                         Partitions    Method 1    Method 2    Partitions   Delay τ
White Noise              1             3           3           1            1           [201]
Lorenz                   13            9           9           9            10          [201]
Rossler                  14            13          11          9            9           [227]
Bi-directional Rossler   16            14          14          15           15          [201]
Mackey-Glass             7             8           7           7            1 to 700    [201]
Sine Wave                4             17          13          1            15          [227]
Logistic Map             5             8           11          5            1 to 5      [201]
Henon Map                12            15          13          8            1 to 5      [201]
ECG                      22            16          9           8            1 to 4      [201]
EEG                      6             5           5           5            1 to 3      [201]

Table A.2: A comparison between the calculated and suggested values for the delay parameter 𝜏 for the traditional methods (MI and AC) and the modified or proposed methods (Freq. App., MPE, and PAMI). The following conditions or abbreviations are used: the ranges under the PAMI results come from using the range 4 ≤ 𝑛 ≤ 6, AP under MI is an abbreviation for adaptive partitioning, and AC is an abbreviation for autocorrelation.

Category     System                   MI using   Spearman's   Freq.   MPE   PAMI          Suggested   Ref.
                                      AP         AC           App.          (4 ≤ n ≤ 6)   Delay (τ)
Noise        White Noise              1          1            1       1     1             1           [201]
Chaotic      Lorenz                   9          15           6       17    5 to 9        10          [201]
Diff. Eq.    Rossler                  9          12           7       19    6 to 10       9           [227]
             Bi-directional Rossler   15         12           7       20    6 to 10       15          [201]
             Mackey-Glass             7          5            3       8     2 to 4        1 to 700    [201]
Periodic     Sine Wave                1          10           21      16    5 to 8        15          [227]
Nonlinear    Logistic Map             5          1            1       1     1             1 to 5      [201]
Diff. Eq.    Henon Map                8          1            1       1     1             1 to 5      [201]
Medical      ECG                      8          21           2       13    1 to 2        1 to 4      [201]
Data         EEG                      5          4            1       4     2 to 4        1 to 3      [201]

Table A.3: A comparison between the calculated and suggested values for the embedding dimension 𝑛 for the traditional methods (FNN and SSA) and the modified method (MPE).

Category     System                   FNN   SSA   MPE   Suggested Dim. (n)   Ref.
Noise        White Noise              4     23    5     3 to 7               [201]
Chaotic      Lorenz                   3     4     5     5 to 7               [201]
Diff. Eq.    Rossler                  4     4     4     6                    [227]
             Bi-directional Rossler   4     4     4     6 to 7               [201]
             Mackey-Glass             4     6     4     4 to 8               [201]
Periodic     Sine Wave                4     2     3     4                    [227]
Nonlinear    Logistic Map             4     3     5     2 to 16              [201]
Diff. Eq.    Henon Map                4     2     5     3 to 10              [201]
Medical      ECG                      7     8     5     3 to 7               [201]
Data         EEG                      5     11    6     3 to 7               [201]

APPENDIX B

SUBLEVEL SET PERSISTENCE AND DAMPING PARAMETER ESTIMATION

In Appendix B, we provide an omitted proof and algorithm. Specifically, we have included the theorem showing the relationship between the mean lifetime and the mean birth and death times of a persistence diagram, as well as the algorithm for calculating the sublevel set persistence diagram.

B.1 Proof of Expected Lifetime Equation

The following proof supports a claim made in Section 1.2.1. In what follows, we will use the notation 𝜇𝑆 to denote the expected value of the distribution over the multi-set 𝑆.

Theorem B.1.1 (Expected Lifetime). Let $\mathcal{D} = \{(b_i, d_i)\}_{i=1}^{n}$ be a persistence diagram.
Let 𝐵, 𝐷, and 𝐿 be the multi-sets of birth times, death times, and lifetimes, respectively. Then, the average lifetime is 𝜇𝐿 = 𝜇𝐷 − 𝜇𝐵.

Proof. By definition, $B = \{b_i\}_{i=1}^{n}$, $D = \{d_i\}_{i=1}^{n}$, and $L = \{d_i - b_i\}_{i=1}^{n}$. By the definition of the mean and of 𝐿, the mean lifetime is

$$\mu_L = \frac{1}{n}\sum_{i=1}^{n}(d_i - b_i). \tag{B.1}$$

Splitting the sum into two separate sums and using the commutative property of addition, we get

$$\mu_L = \frac{1}{n}\sum_{i=1}^{n}(d_i - b_i) = \frac{1}{n}\sum_{i=1}^{n} d_i - \frac{1}{n}\sum_{i=1}^{n} b_i = \mu_D - \mu_B, \tag{B.2}$$

where the last equality is by the definition of 𝜇𝐷 and 𝜇𝐵. Thus, we conclude that 𝜇𝐿 = 𝜇𝐷 − 𝜇𝐵.

APPENDIX C

DYNAMICAL SYSTEMS

C.1 Dynamic State Analysis System Models

The following 18 continuous and 12 discrete dynamical systems were used throughout this work. For details on their equations of motion and system parameters, we direct the reader to the MakeData module in the python package teaspoon [161].

Table C.1: Continuous and discrete dynamical systems used throughout the manuscript.

Autonomous Continuous Dynamical Systems: Lorenz, Rossler, Double Pendulum, Diffusionless Lorenz Attractor, Complex Butterfly, Chen's System, ACT Attractor, Rabinovich-Fabrikant Attractor, Halvorsen's Cyclically Symmetric Attractor, Burke Shaw Attractor, Rucklidge Attractor, WINDMI

Driven Continuous Dynamical Systems: Driven Van der Pol Oscillator, Shaw Van der Pol Oscillator, Forced Brusselator, Ueda Oscillator, Duffing Van der Pol Oscillator, Base Excited Magnetic Pendulum

Discrete Dynamical Systems: Logistic Map, Henon Map, Sine Map, Tent Map, Ricker's Population Map, Gauss Map, Sine Circle Map, Lozi Map, Tinkerbell Map, Holmes Cubic Map, Kaplan-Yorke Map, Gingerbread Man Map

C.2 All Available Dynamic System Models

Table C.2: Available flows and maps in the dynamical systems library module.

Dissipative Flows: Lorenz Att., Rossler Att., Chua Circuit, Coupled Lorenz-Rossler, Coupled Rossler-Rossler, Double Pendulum, Diffusionless Lorenz Att., Complex Butterfly, Chen's Att., Hadley Att., ACT Att., Rabinovich-Fabrikant Att., Rigid Body Feedback, Moore-Spiegel Osc., Thomas Att., Halvorsen's Att., Burke-Shaw Att., Rucklidge Att., WINDMI, Simple Quadratic Flow, Simple Cubic Flow, Simple Piecewise Flow, Double Scroll

Conservative Flows: Simple Pendulum, Nose-Hoover Osc., Labyrinth Chaos, Henon-Heiles Osc.

Driven Dissipative Flows: Driven Pendulum, Driven Van der Pol Osc., Shaw Van der Pol Osc., Forced Brusselator, Ueda Osc., Duffing's Two-well Osc., Duffing Van der Pol Osc., Rayleigh-Duffing Osc.

Maps: Logistic, Henon, Sine, Tent, Linear Congruent, Ricker's Pop., Gauss, Cusp, Pincher's, Sine-circle, Lozi, Delayed Logistic, Tinkerbell, Burgers, Holmes, Kaplan-Yorke

Table C.3: Available functions, noise models, and medical data in the dynamical systems library module.

Functions: Sine, Incommensurate Sine
Noise Models: Gaussian, Uniform, Rayleigh
Medical Data: Electrocardiogram, Electroencephalogram

Table C.4: Parameter selection methods available in the parameter selection module for both the delay and dimension parameters.
Algorithm                             Reference(s)     Dimension or Delay
Mutual Information                    [78, 161]        Delay
Autocorrelation                       [25, 161]        Delay
Frequency Analysis                    [11, 149, 161]   Delay
Multi-scale Permutation Entropy       [161, 200]       Delay
Permutation Auto-mutual Information   [135, 161]       Delay
SW1PerS                               [11, 177]        Delay
False Nearest Neighbors               [109, 161]       Dimension
Multi-scale Permutation Entropy       [161, 200]       Dimension
Singular Spectrum Analysis            [27, 161]        Dimension

APPENDIX D

ADDITIONAL DIFFUSION DISTANCE ANALYSIS

D.1 Persistence of Cycle Graph

The cycle graph on 𝑛 vertices is the graph 𝐺 = (𝑉, 𝐸) with 𝑉 = {𝑣1, · · · , 𝑣𝑛} and 𝐸 = {𝑣𝑖𝑣𝑖+1 | 1 ≤ 𝑖 < 𝑛} ∪ {𝑣𝑛𝑣1}; i.e., it forms a closed path (cycle) where no repetitions occur except for the starting and ending vertices. If we increase the number of nodes from 2 to 500 and calculate the maximum persistence or maximum lifetime, we find that it quickly reaches a maximum of 𝐿1 = 0.216 at 𝑛 = 32 and then steadily declines, seeming to approach a plateau, as shown in Fig. D.1. This is in comparison to the unweighted shortest path distance of the cycle graph, which has a maximum persistence of ⌈𝑛/3⌉ − 1 as shown in [162].

Figure D.1: Numerical analysis of the maximum persistence of the cycle graph 𝐺cycle(𝑛) with size 𝑛 when using the diffusion distance with 𝑡 = 2𝑑.

D.2 Analysis on Random Walk Steps

In this section we vary the number of random walk steps 𝑡 with respect to the graph diameter 𝑑 to determine how many steps are suitable for calculating the persistent homology based on the diffusion distance. We vary 𝑡/𝑑 from 1 to 5 as shown in Fig. D.2. To decide on the optimal 𝑡, we calculate the maximum lifetime and the number of persistence pairs in each resulting persistence diagram for each of the 23 dynamical systems investigated in this work. Additionally, the average of both the maximum lifetime and the number of lifetimes is plotted in Fig. D.2.

Figure D.2: Comparison of max 𝐿1 and #{𝐿1} for each system and the mean when varying 𝑡 in 𝑃𝑡 with respect to the diameter (𝑡 ∈ [𝑑, 5𝑑]).

Based on each system's maximum lifetimes, a suitable value for 𝑡 should be greater than 𝑑, since 𝑡 must be large enough that each system reaches a maximum of max(𝐿1). We can also note that the number of persistence pairs or lifetimes in the persistence diagram does not stabilize for the majority of systems until approximately 𝑡 = 2𝑑/3. This again supports a minimum suggested 𝑡 > 𝑑. The only downside of larger values of 𝑡 is that the maximum lifetime tends to diminish, as shown in the max(𝐿1) figure. Therefore, we conclude that a suitable 𝑡 should be within the range 𝑑 < 𝑡 < 3𝑑. In this work we chose 𝑡 = 2𝑑.

BIBLIOGRAPHY

REFERENCES

[1] Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35, 2017.

[2] Aaron Adcock, Erik Carlsson, and Gunnar Carlsson. The ring of algebraic functions on persistence bar codes. Homology, Homotopy and Applications, 18(1):381–402, 2016.

[3] Robert J. Adler, Omer Bobrowski, Matthew S. Borman, Eliran Subag, and Shmuel Weinberger. Persistent homology for random fields and complexes. In Institute of Mathematical Statistics Collections, pages 124–143. Institute of Mathematical Statistics, 2010.

[4] Robert J. Adler, Omer Bobrowski, and Shmuel Weinberger. Crackle: The homology of noise.
Discrete & Computational Geometry, 52(4):680–704, aug 2014.

[5] Mehran Ahmadlou and Hojjat Adeli. Visibility graph similarity: A new measure of generalized synchronization in coupled dynamic systems. Physica D: Nonlinear Phenomena, 241(4):326–332, feb 2012.

[6] José M. Amigó, Roberto Monetti, Thomas Aschenbrenner, and Wolfram Bunk. Transcripts: An algebraic approach to coupled time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(1):013105, mar 2012.

[7] Ralph G. Andrzejak, Klaus Lehnertz, Florian Mormann, Christoph Rieke, Peter David, and Christian E. Elger. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), nov 2001.

[8] Nakhlé H Asmar. Partial differential equations with Fourier series and boundary value problems. Courier Dover Publications, 2016.

[9] N. Atienza, L. M. Escudero, M. J. Jimenez, and M. Soriano-Trigueros. Persistent entropy: a scale-invariant topological statistic for analyzing cell arrangements.

[10] Nieves Atienza, Rocio Gonzalez-Diaz, and Matteo Rucco. Persistent entropy for separating topological features from noise in vietoris-rips complexes. Journal of Intelligent Information Systems, 52(3):637–655, jul 2017.

[11] Audun D. Myers, Firas A. Khasawneh, and Brittany T. Fasy. Separating persistent homology of noise from time series data using topological signal processing. arXiv:2012.04039 [math.AT], 2020.

[12] Onur Avci, Osama Abdeljaber, Serkan Kiranyaz, Mohammed Hussein, Moncef Gabbouj, and Daniel J. Inman. A review of vibration-based damage detection in civil structures: From traditional methods to machine learning and deep learning applications. Mechanical Systems and Signal Processing, 147:107077, jan 2021.

[13] Massoud Babaie-Zadeh and Christian Jutten. A general approach for mutual information minimization and its application to blind source separation. Signal Processing, 85(5):975–995, may 2005.

[14] Christoph Bandt and Bernd Pompe. Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88(17), apr 2002.

[15] Christoph Bandt and Bernd Pompe. Permutation entropy: a natural complexity measure for time series. Physical review letters, 88(17):174102, 2002.

[16] Aurelio F. Bariviera, Luciano Zunino, and Osvaldo A. Rosso. An analysis of high-frequency cryptocurrencies prices dynamics using permutation-information-theory quantifiers. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7):075511, jul 2018.

[17] Hannah Bast, Daniel Delling, Andrew Goldberg, Matthias Müller-Hannemann, Thomas Pajor, Peter Sanders, Dorothea Wagner, and Renato F. Werneck. Route planning in transportation networks.

[18] Pierre Baudot and Daniel Bennequin. Topological forms of information. AIP Publishing LLC, 2015.

[19] Giancarlo Benettin, Luigi Galgani, Antonio Giorgilli, and Jean-Marie Strelcyn. Lyapunov characteristic exponents for smooth dynamical systems and for hamiltonian systems: A method for computing all of them. part 2: Numerical application. Meccanica, 15(1):21–30, mar 1980.

[20] T. Berry, J. R. Cressman, Z. Gregurić-Ferenček, and T. Sauer. Time-scale separation from diffusion-mapped delay coordinates. SIAM Journal on Applied Dynamical Systems, 12(2):618–649, jan 2013.

[21] Jesse Berwald and Marian Gidea. Critical transitions in a model of a genetic regulatory system. Mathematical Biosciences & Engineering, 11(4):723–740, 2014.

[22] A Block, W Von Bloh, and HJ Schellnhuber.
Efficient box-counting determination of generalized fractal dimensions. Physical Review A, 42(4):1869, 1990.

[23] Stephen P. Borgatti. Centrality and network flow. Social Networks, 27(1):55–71, jan 2005.

[24] L. Borkowski and A. Stefanski. FFT bifurcation analysis of routes to chaos via quasiperiodic solutions. Mathematical Problems in Engineering, 2015:1–9, 2015.

[25] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[26] David S Broomhead and Gregory P King. Extracting qualitative dynamics from experimental data. Physica D: Nonlinear Phenomena, 20(2-3):217–236, 1986.

[27] David S Broomhead and Gregory P King. Extracting qualitative dynamics from experimental data. Physica D: Nonlinear Phenomena, 20(2-3):217–236, 1986.

[28] Peter Bubenik. Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16:77–102, 2015.

[29] John Butterworth, Jin Hee Lee, and Barry Davidson. Experimental determination of modal damping from full scale testing. In 13th world conference on earthquake engineering, volume 310, pages 1–15, 2004.

[30] Th Buzug and G Pfister. Optimal delay time and embedding dimension for delay-time coordinates by analysis of the global static and local dynamical behavior of strange attractors. Physical review A, 45(10):7073, 1992.

[31] Andriana S. L. O. Campanharo, M. Irmak Sirer, R. Dean Malmgren, Fernando M. Ramos, and Luís A. Nunes Amaral. Duality between time series and networks. PLoS ONE, 6(8):e23378, aug 2011.

[32] M S Cao, G G Sha, Y F Gao, and W Ostachowicz. Structural damage identification using damping: a compendium of uses and features. Smart Materials and Structures, 26(4):043001, mar 2017.

[33] Yinhe Cao, Wen-wen Tung, JB Gao, Vladimir A Protopopescu, and Lee M Hively. Detecting dynamical changes in time series using the permutation entropy. Physical review E, 70(4):046217, 2004.

[34] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, January 2009. Survey.

[35] Gunnar Carlsson and Vin de Silva. Zigzag persistence. Foundations of Computational Mathematics, 10(4):367–405, apr 2010.

[36] Gunnar Carlsson, Vin de Silva, and Dmitriy Morozov. Zigzag persistent homology and real-valued functions. In Proceedings of the 25th annual symposium on Computational geometry. ACM Press, 2009.

[37] Gunnar Carlsson, Jackson Gorham, Matthew Kahle, and Jeremy Mason. Computational topology for configuration spaces of hard disks. Physical Review E, 85(1), jan 2012.

[38] M. J. Casiano. Extracting damping ratio from dynamic data and numerical solutions. NASA Technical Reports, 2016.

[39] Chih-Chung Chang and Chih-Jen Lin. LIBSVM. ACM Transactions on Intelligent Systems and Technology, 2(3):1–27, apr 2011.

[40] Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. On the bootstrap for persistence diagrams and landscapes. arXiv preprint arXiv:1311.0376, 2013.

[41] Xiaoying Chen, Chong Zhang, Bin Ge, and Weidong Xiao. Temporal query processing in social network. Journal of Intelligent Information Systems, 49(2):147–166, dec 2016.

[42] Yang Chen, Harish Chintakunta, Le Xie, Yuliy M. Baryshnikov, and P. R. Kumar. Persistent-homology-based detection of power system low-frequency oscillations using PMUs. In 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, dec 2016.

[43] Ilya Chevyrev and Andrey Kormilitzin.
A Primer on the Signature Method in Machine Learning. 2016.

[44] Ilya Chevyrev, Vidit Nanda, and Harald Oberhauser. Persistence paths and signature features in topological data analysis.

[45] Harish Chintakunta, Thanos Gentimis, Rocio Gonzalez-Diaz, Maria-Jose Jimenez, and Hamid Krim. An entropy-based persistence barcode. Pattern Recognition, 48(2):391–401, feb 2015.

[46] S. Chowdhury and F. Mémoli. Convergence of hierarchical clustering and persistent homology methods on directed networks. ArXiv, abs/1711.04211, 2017.

[47] Yu-Min Chung, Chuan-Shen Hu, Yu-Lun Lo, and Hau-Tieng Wu. A persistent homology approach to heart rate variability analysis with an application to sleep-wake classification. arXiv preprint arXiv:1908.06856, 2019.

[48] Septima Poinsette Clark. Estimating the fractal dimension of chaotic time series. Lincoln Laboratory Journal, 3(1), 1990.

[49] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, dec 2006.

[50] Ronald R. Coifman and Stéphane Lafon. Diffusion maps. 21(1):5–30, jul 2006.

[51] Madalena Costa, Ary L Goldberger, and C-K Peng. Multiscale entropy analysis of complex physiologic time series. Physical review letters, 89(6):068102, 2002.

[52] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012.

[53] Joseph Crawford and Tijana Milenković. ClueNet: Clustering a temporal network based on topological similarity rather than denseness. PLOS ONE, 13(5):e0195993, may 2018.

[54] S. Czesla, T. Molle, and J. H. M. M. Schmitt. A posteriori noise estimation in variable data sets. Astronomy & Astrophysics, 609:A39, jan 2018.

[55] G.A. Darbellay and I. Vajda. Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45(4):1315–1321, may 1999.

[56] Bin Ran and David Boyce. Modeling Dynamic Transportation Networks. Springer Berlin Heidelberg, 2012.

[57] Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag, 2008.

[58] Luciana De Micco, Juana Graciela Fernández, Hilda A Larrondo, Angelo Plastino, and Osvaldo A Rosso. Sampling period, statistical complexity, and chaotic attractors. Physica A: Statistical Mechanics and its Applications, 391(8):2564–2575, 2012.

[59] Luciana De Micco, Juana Graciela Fernández, Hilda A Larrondo, Angelo Plastino, and Osvaldo A Rosso. Sampling period, statistical complexity, and chaotic attractors. Physica A: Statistical Mechanics and its Applications, 391(8):2564–2575, 2012.

[60] Cecil Jose A. Delfinado and Herbert Edelsbrunner. An incremental algorithm for Betti numbers of simplicial complexes on the 3-sphere. Computer Aided Geometric Design, 12(7):771–784, 1995.

[61] Alfonso Delgado-Bonal and Alexander Marshak. Approximate entropy and sample entropy: A comprehensive tutorial. Entropy, 21(6):541, may 2019.

[62] Bin Deng, Li Liang, Shunan Li, Ruofan Wang, Haitao Yu, Jiang Wang, and Xile Wei. Complexity extraction of electroencephalograms in Alzheimer's disease with weighted-permutation entropy. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(4):043105, apr 2015.

[63] Varad Deshmukh, Elizabeth Bradley, Joshua Garland, and James D. Meiss. Using curvature to select the time lag for delay reconstruction. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(6):063143, jun 2020.

[64] T. Detroux, L. Renson, L. Masset, and G. Kerschen.
The harmonic balance method for bifurcation analysis of large-scale nonlinear mechanical systems. Computer Methods in Applied Mechanics and Engineering, 296:18–38, nov 2015.

[65] Tamal K Dey and Yusu Wang. Computational Topology for Data Analysis. Cambridge University Press, 2021.

[66] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.

[67] Andrey Dmitriev, Victor Dmitriev, Oleg Sagaydak, and Olga Tsukanova. The application of stochastic bifurcation theory to the early detection of economic bubbles. Procedia Computer Science, 122:354–361, 2017.

[68] Reik V Donner, Yong Zou, Jonathan F Donges, Norbert Marwan, and Jürgen Kurths. Recurrence networks—a novel paradigm for nonlinear time series analysis. New Journal of Physics, 12(3):033025, mar 2010.

[69] Herbert Edelsbrunner and John Harer. Persistent homology: a survey. Contemporary mathematics, 453:257–282, 2008.

[70] Herbert Edelsbrunner and John Harer. Computational Topology - an Introduction. American Mathematical Society, 2010.

[71] W. Ellens, F.M. Spieksma, P. Van Mieghem, A. Jamakovic, and R.E. Kooij. Effective graph resistance. Linear Algebra and its Applications, 435(10):2491–2506, nov 2011.

[72] Jessica Enright and Rowland Raymond Kao. Epidemics on dynamic networks. Epidemics, 24:88–97, sep 2018.

[73] Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, Aarti Singh, et al. Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339, 2014.

[74] Temple H. Fay. Coulomb damping. International Journal of Mathematical Education in Science and Technology, 43(7):923–936, oct 2012.

[75] Temple H. Fay. Quadratic damping. International Journal of Mathematical Education in Science and Technology, 43(6):789–803, sep 2012.

[76] Birgit Frank, Bernd Pompe, Uwe Schneider, and Dirk Hoyer. Permutation entropy improves fetal behavioural state classification based on heart rate analysis from biomagnetic recordings in near term fetuses. Medical and Biological Engineering and Computing, 44(3):179, 2006.

[77] Andrew M Fraser and Harry L Swinney. Independent coordinates for strange attractors from mutual information. Physical review A, 33(2):1134, 1986.

[78] Andrew M. Fraser and Harry L. Swinney. Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2):1134–1140, feb 1986.

[79] Riccardo Gallotti and Marc Barthelemy. The multilayer temporal network of public transport in great britain. Scientific Data, 2(1), jan 2015.

[80] Joshua Garland, Tyler Jones, Michael Neuder, Valerie Morris, James White, and Elizabeth Bradley. Anomaly detection in paleoclimate records using permutation entropy. Entropy, 20(12):931, 2018.

[81] Joshua Garland, Tyler R Jones, Elizabeth Bradley, Michael Neuder, and James WC White. Climate entropy production recorded in a deep antarctic ice core. arXiv preprint arXiv:1806.10936, 2018.

[82] Marian Gidea. Topological data analysis of critical transitions in financial networks. In E. Shmueli, B. Barzel, and R. Puzis, editors, 3rd International Winter School and Conference on Network Science NetSci-X 2017, Springer Proceedings in Complexity. Springer, Cham, 2017.

[83] Marian Gidea and Yuri Katz. Topological data analysis of financial time series: Landscapes of crashes. Physica A: Statistical Mechanics and its Applications, 491:820–834, 2018.

[84] C. Gontier, M. Smail, and P.E. Gautier. A time domain method for the identification of dynamic parameters of structures.
Mechanical Systems and Signal Processing, 7(1):45–56, jan 1993.

[85] Manuel I González. Forces between permanent magnets: experiments and model. European Journal of Physics, 38(2):025202, dec 2016.

[86] Peter Grassberger and Itamar Procaccia. Measuring the strangeness of strange attractors. Physica D: Nonlinear Phenomena, 9(1-2):189–208, 1983.

[87] Aixia Guo, Bettina F. Drake, Yosef M. Khan, James R. Langabeer II, and Randi E. Foraker. Time-series cardiovascular risk factors and receipt of screening for breast, cervical, and colon cancer: The guideline advantage. PLOS ONE, 15(8):e0236836, aug 2020.

[88] T. C. Gupta. Identification and experimental validation of damping ratios of different human body segments through anthropometric vibratory model in standing posture. Journal of Biomechanical Engineering, 129(4):566–574, dec 2006.

[89] Gregory Gutin, Toufik Mansour, and Simone Severini. A characterization of horizontal visibility graphs and combinatorics on words. Physica A: Statistical Mechanics and its Applications, 390(12):2421–2428, jun 2011.

[90] Jürgen Hackl and Bryan T. Adey. Estimation of traffic flow changes using networks in networks approaches. Applied Network Science, 4(1), may 2019.

[91] Frank R Hampel. The influence curve and its role in robust estimation. Journal of the american statistical association, 69(346):383–393, 1974.

[92] Allen Hatcher. Algebraic Topology. Cambridge University Press, 2002.

[93] Assaf Hochman, Pinhas Alpert, Tzvi Harpaz, Hadas Saaroni, and Gabriele Messori. A new dynamical systems perspective on atmospheric predictability: Eastern mediterranean weather regimes as a case study. Science Advances, 5(6), jun 2019.

[94] Petter Holme and Jari Saramäki. Temporal networks. Physics Reports, 519(3):97–125, oct 2012.

[95] J. Hu, J.B. Gao, and K.D. White. Estimating measurement noise in a time series by exploiting nonstationarity. Chaos, Solitons & Fractals, 22(4):807–819, nov 2004.

[96] Fang-Lin Huang, Xue-Min Wang, Zheng-Qing Chen, Xu-Hui He, and Yi-Qing Ni. A new approach to identification of structural damping ratios. Journal of Sound and Vibration, 303(1-2):144–153, jun 2007.

[97] Silu Huang, Ada Wai-Chee Fu, and Ruifeng Liu. Minimum spanning trees in temporal graphs. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, may 2015.

[98] Ismail Husein, Herman Mawengkang, Saib Suwilo, and Mardiningsih. Modeling the transmission of infectious disease in a dynamic network. Journal of Physics: Conference Series, 1255(1):012052, aug 2019.

[99] Boris Iglewicz and David Hoaglin. How to detect and handle outliers, volume 16 of The ASQC Basic References in Quality Control: Statistical Techniques (Edward F. Mykytka, editor), 1993.

[100] Daniel J. Inman. Engineering Vibration. Pearson, 2014.

[101] Rinku Jacob, K. P. Harikrishnan, R. Misra, and G. Ambika. Weighted recurrence networks for the analysis of time-series data. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 475(2221):20180256, jan 2019.

[102] N. Jaksic and M. Boltezar. An approach to parameter identification for a single-degree-of-freedom dynamical system based on short free acceleration response. Journal of Sound and Vibration, 250(3):465–483, 2002.

[103] W. Ji and V. Venkatasubramanian. Hard-limit induced chaos in a fundamental power system model. International Journal of Electrical Power & Energy Systems, 18(5):279–295, jun 1996.

[104] Matthew Kahle and Elizabeth Meckes.
[104] Matthew Kahle and Elizabeth Meckes. Limit theorems for Betti numbers of random simplicial complexes. Homology, Homotopy and Applications, 15(1):343–374, 2013.
[105] Holger Kantz and Thomas Schreiber. Nonlinear Time Series Analysis. Cambridge University Press, nov 2003.
[106] Holger Kantz and Thomas Schreiber. Nonlinear Time Series Analysis. Cambridge University Press, 2004.
[107] Karsten Keller, Teresa Mangold, Inga Stolz, and Jenna Werner. Permutation entropy: New ideas and challenges. Entropy, 19(3):134, mar 2017.
[108] David Kempe, Jon Kleinberg, and Amit Kumar. Connectivity and inference problems for temporal networks. Journal of Computer and System Sciences, 64(4):820–842, jun 2002.
[109] Matthew B. Kennel, Reggie Brown, and Henry D. I. Abarbanel. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A, 45(6):3403–3411, mar 1992.
[110] Matthew B. Kennel, Reggie Brown, and Henry D.I. Abarbanel. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A, 45(6):3403, 1992.
[111] Firas A. Khasawneh and Elizabeth Munch. Stability determination in turning using persistent homology and time series analysis. In Proceedings of the ASME 2014 International Mechanical Engineering Congress & Exposition, November 14-20, 2014, Montreal, Canada, 2014. Paper no. IMECE2014-40221.
[112] Firas A. Khasawneh and Elizabeth Munch. Chatter detection in turning using persistent homology. Mechanical Systems and Signal Processing, 70-71:527–541, 2016.
[113] Firas A. Khasawneh and Elizabeth Munch. Utilizing Topological Data Analysis for Studying Signals of Time-Delay Systems, pages 93–106. Springer International Publishing, Cham, 2017.
[114] Firas A. Khasawneh and Elizabeth Munch. Topological data analysis for true step detection in periodic piecewise constant signals. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, 474(2218):20180027, oct 2018.
[115] Firas A. Khasawneh, Elizabeth Munch, and Jose A. Perea. Chatter classification in turning using machine learning and topological data analysis. In Tamas Insperger, editor, 14th IFAC Workshop on Time Delay Systems TDS 2018: Budapest, Hungary, 28–30 June 2018, volume 51, pages 195–200, 2018.
[116] Firas A. Khasawneh, Elizabeth Munch, and Jose A. Perea. Chatter classification in turning using machine learning and topological data analysis. IFAC-PapersOnLine, 51(14):195–200, 2018.
[117] Giorgi Khomeriki. Parametric resonance induced chaos in magnetic damped driven pendulum. Physics Letters A, 380(31-32):2382–2385, 2016.
[118] Alexander Khor and Michael Small. Examining k-nearest neighbour networks: Superfamily phenomena and inversion. Chaos: An Interdisciplinary Journal of Nonlinear Science, 26(4):043101, apr 2016.
[119] Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6), jun 2004.
[120] L. Lacasa, B. Luque, J. Luque, and J. C. Nuño. The visibility graph: A new method for estimating the Hurst exponent of fractional Brownian motion. EPL (Europhysics Letters), 86(3):30001, may 2009.
[121] L. Lacasa, A. Nuñez, É. Roldán, J. M. R. Parrondo, and B. Luque. Time series irreversibility: a visibility graph approach. The European Physical Journal B, 85(6), jun 2012.
[122] Lucas Lacasa, Bartolo Luque, Fernando Ballesteros, Jordi Luque, and Juan Carlos Nuño. From time series to complex networks: The visibility graph. Proceedings of the National Academy of Sciences, 105(13):4972–4975, mar 2008.
[123] Lucas Lacasa and Raul Toral. Description of stochastic and chaotic series using visibility graphs. Physical Review E, 82(3), sep 2010.
[124] H.J. Landau. Sampling, data transmission, and the Nyquist rate. Proceedings of the IEEE, 55(10):1701–1706, 1967.
[125] Andrea Landherr, Bettina Friedl, and Julia Heidemann. A critical review of centrality measures in social networks. Business & Information Systems Engineering, 2(6):371–385, oct 2010.
[126] David Lane, Joan Lu, Camille Peres, Emily Zitek, et al. Online statistics: An interactive multimedia course of study, 2008. Retrieved January 29, 2009.
[127] Peter Lawson, Andrew B. Sholl, J. Quincy Brown, Brittany Terese Fasy, and Carola Wenk. Persistent homology for the quantitative evaluation of architectural features in prostate cancer histology. Scientific Reports, 9(1), feb 2019.
[128] Hyekyoung Lee, Hyejin Kang, M. K. Chung, Bung-Nyun Kim, and Dong Soo Lee. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12):2267–2277, dec 2012.
[129] Michel C.R. Leles, João Pedro H. Sansão, Leonardo A. Mozelli, and Homero N. Guimarães. Improving reconstruction of time-series based in singular spectrum analysis: A segmentation approach. Digital Signal Processing, 77:63–76, 2018.
[130] Christophe Leys, Christophe Ley, Olivier Klein, Philippe Bernard, and Laurent Licata. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4):764–766, 2013.
[131] Duan Li, Zhenhu Liang, Yinghua Wang, Satoshi Hagihira, Jamie W. Sleigh, and Xiaoli Li. Parameter selection in permutation entropy for an electroencephalographic measure of isoflurane anesthetic drug effect. Journal of Clinical Monitoring and Computing, 27(2):113–123, dec 2012.
[132] J. W. Liang and B. F. Feeny. Identifying Coulomb and viscous friction from free-vibration decrements. Nonlinear Dynamics, 16(4):337–347, 1998.
[133] Jin-Wei Liang and Brian F. Feeny. Balancing energy to estimate damping parameters in forced oscillators. Journal of Sound and Vibration, 295(3-5):988–998, aug 2006.
[134] Jin-Wei Liang and Brian F. Feeny. Balancing energy to estimate damping in a forced oscillator with compliant contact. Journal of Sound and Vibration, 330(9):2049–2061, apr 2011.
[135] Zhenhu Liang, Yinghua Wang, Gaoxiang Ouyang, Logan J. Voss, Jamie W. Sleigh, and Xiaoli Li. Permutation auto-mutual information of electroencephalogram in anesthesia. Journal of Neural Engineering, 10(2):026004, feb 2013.
[136] R.M. Lin and J. Zhu. Model updating of damped structures using FRF data. Mechanical Systems and Signal Processing, 20(8):2200–2218, nov 2006.
[137] Jared A. Little and Brian P. Mann. Optimizing logarithmic decrement damping estimation via uncertainty analysis. In Special Topics in Structural Dynamics & Experimental Techniques, Volume 5, pages 19–22. Springer International Publishing, jun 2019.
[138] Chein-Shan Liu. Identifying time-dependent damping and stiffness functions by a simple and yet accurate method. Journal of Sound and Vibration, 318(1-2):148–165, nov 2008.
[139] Tiebing Liu, Wenpo Yao, Min Wu, Zhaorong Shi, Jun Wang, and Xinbao Ning. Multiscale permutation entropy analysis of electrocardiogram. Physica A: Statistical Mechanics and its Applications, 471:492–498, 2017.
[140] Bartolo Luque, Lucas Lacasa, Fernando J. Ballesteros, and Alberto Robledo. Feigenbaum graphs: A complex network perspective of chaos. PLOS ONE, 6(9):1–8, sep 2011.
[141] Bartolo Luque, Lucas Lacasa, Fernando J. Ballesteros, and Alberto Robledo. Analytical properties of horizontal visibility graphs in the Feigenbaum scenario. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(1):013109, mar 2012.
[142] B.P. Mann and F.A. Khasawneh. An energy-balance approach for oscillator parameter identification. Journal of Sound and Vibration, 321(1-2):65–78, mar 2009.
[143] Desire L. Massart, Leonard Kaufman, Peter J. Rousseeuw, and Annick Leroy. Least median of squares: a robust method for outlier and model error detection in regression and calibration. Analytica Chimica Acta, 187:171–179, 1986.
[144] Robert M. May. Chaos and the dynamics of biological populations. Nuclear Physics B - Proceedings Supplements, 2:225–245, nov 1987.
[145] Michael McCullough, Michael Small, Thomas Stemler, and Herbert Ho-Ching Iu. Time lagged ordinal partition networks for capturing dynamics of continuous dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(5):053101, 2015.
[146] Michael McCullough, Michael Small, Thomas Stemler, and Herbert Ho-Ching Iu. Time lagged ordinal partition networks for capturing dynamics of continuous dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(5):053101, may 2015.
[147] Melih C. Yesilli, Firas A. Khasawneh, and Andreas Otto. Topological feature vectors for chatter detection in turning processes. arXiv:1905.08671, 2019.
[148] Michał Melosik and W. Marszalek. On the 0/1 test for chaos in continuous systems. Bulletin of the Polish Academy of Sciences Technical Sciences, 64(3):521–528, 2016.
[149] Michał Melosik and W. Marszalek. On the 0/1 test for chaos in continuous systems. Bulletin of the Polish Academy of Sciences Technical Sciences, 64(3):521–528, 2016.
[150] Craig Meskell. A decrement method for quantifying nonlinear and linear damping parameters. Journal of Sound and Vibration, 296(3):643–649, sep 2006.
[151] T. Mimura and A. Mita. Automatic estimation of natural frequencies and damping ratios of building structures. Procedia Engineering, 188:163–169, 2017.
[152] Luis Montesinos, Rossana Castaldo, and Leandro Pecchia. On the use of approximate entropy and sample entropy with centre of pressure time-series. Journal of NeuroEngineering and Rehabilitation, 15(1), dec 2018.
[153] George B. Moody and Roger G. Mark. MIT-BIH arrhythmia database, 1992.
[154] George B. Moody and Roger G. Mark. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001.
[155] John R. Moore and Douglas A. Maguire. Natural sway frequencies and damping ratios of trees: concepts, review and synthesis of previous studies. Trees - Structure and Function, 18(2):195–203, mar 2004.
[156] Pablo Moriano, Jorge Finke, and Yong-Yeol Ahn. Community-based event detection in temporal networks. Scientific Reports, 9(1), mar 2019.
[157] Elizabeth Munch. A user’s guide to topological data analysis. Journal of Learning Analytics, 4(2), 2017.
[158] James R. Munkres. Elements of Algebraic Topology. Addison Wesley, 1993.
[159] Audun Myers and Firas A. Khasawneh. Delay parameter selection in permutation entropy using topological data analysis. arXiv:1905.04329 [physics.data-an], 2019.
[160] Audun Myers and Firas A. Khasawneh. Dynamic state analysis of a driven magnetic pendulum using ordinal partition networks and topological data analysis. In Volume 7: 32nd Conference on Mechanical Vibration and Noise (VIB). American Society of Mechanical Engineers, aug 2020.
[161] Audun Myers and Firas A. Khasawneh. On the automatic parameter selection for permutation entropy. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(3):033130, mar 2020.
[162] Audun Myers, Elizabeth Munch, and Firas A. Khasawneh. Persistent homology of complex networks for dynamic state detection. arXiv preprint arXiv:1904.07403, 2019.
[163] Audun D. Myers, Firas A. Khasawneh, and Brittany T. Fasy. Separating persistent homology of noise from time series data using topological signal processing. December 2020.
[164] Audun D. Myers, Firas A. Khasawneh, and Brittany T. Fasy. ANAPT: Additive noise analysis for persistence thresholding. Foundations of Data Science, 0(0):0, 2022.
[165] Audun D. Myers, Joshua R. Tempelman, David Petrushenko, and Firas A. Khasawneh. Low-cost double pendulum for high-quality data collection with open-source video tracking and analysis. HardwareX, 8:e00138, oct 2020.
[166] Suraj K. Nayak, Arindam Bit, Anilesh Dey, Biswajit Mohapatra, and Kunal Pal. A review on the nonlinear dynamical system analysis of electrocardiogram signal. Journal of Healthcare Engineering, 2018:1–19, 2018.
[167] Angel Nuñez, Lucas Lacasa, Eusebio Valero, Jose Patricio Gómez, and Bartolo Luque. Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos, 22(07):1250160, jul 2012.
[168] Angel M. Nuñez, Lucas Lacasa, Jose Patricio Gomez, and Bartolo Luque. Visibility algorithms: A short review. New Frontiers in Graph Theory, pages 119–152, 2012.
[169] Philip Nuss, T.E. Graedel, Elisa Alonso, and Adam Carroll. Mapping supply chain risk by network analysis of product platforms. Sustainable Materials and Technologies, 10:14–22, dec 2016.
[170] S. Y. Oudot. Persistence theory: from quiver representations to data analysis, volume 209 of AMS Mathematical Surveys and Monographs. American Mathematical Society, 2015.
[171] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, oct 2010.
[172] George A. Papagiannopoulos and George D. Hatzigeorgiou. On the use of the half-power bandwidth method to estimate damping in building structures. Soil Dynamics and Earthquake Engineering, 31(7):1075–1079, jul 2011.
[173] Athanasios Papoulis and S. Unnikrishna Pillai. Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education, 2002.
[174] Jose A. Perea. A brief history of persistence.
[175] Jose A. Perea. Persistent homology of toroidal sliding window embeddings. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, mar 2016.
[176] Jose A. Perea. Topological time series analysis. Notices of the American Mathematical Society, ??(05):1, may 2019.
[177] Jose A. Perea, Anastasia Deckard, Steve B. Haase, and John Harer. SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics, 16(1), aug 2015.
[178] Jose A. Perea, Anastasia Deckard, Steve B. Haase, and John Harer. SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics, 16(1):257, 2015.
[179] Jose A. Perea, Anastasia Deckard, Steve B. Haase, and John Harer. SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics, 16(1), 2015.
[180] Jose A. Perea and John Harer. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799–838, 2015.
[181] Jose A. Perea, Elizabeth Munch, and Firas A. Khasawneh. Approximating continuous functions on persistence diagrams using template functions.
[182] Yakov Borisovich Pesin. Characteristic Lyapunov exponents and smooth ergodic theory. Uspekhi Matematicheskikh Nauk, 32(4):55–112, 1977.
[183] David Petrushenko and Firas A. Khasawneh. Uncertainty propagation of system parameters to the dynamic response: An application to a benchtop pendulum. In Volume 4B: Dynamics, Vibration, and Control. American Society of Mechanical Engineers, nov 2017.
[184] Marco Piangerelli, Matteo Rucco, and Emanuela Merelli. Topological classifier for detecting the emergence of epileptic seizures. 2016.
[185] Steven M. Pincus. Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences, 88(6):2297–2301, 1991.
[186] Steven M. Pincus. Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences, 88(6):2297–2301, 1991.
[187] Pavel M. Polunin, Yushi Yang, Mark I. Dykman, Thomas W. Kenny, and Steven W. Shaw. Characterization of MEMS resonator nonlinearities using the ringdown response. Journal of Microelectromechanical Systems, 25(2):297–303, apr 2016.
[188] Yves Pomeau and Paul Manneville. Intermittent transition to turbulence in dissipative dynamical systems. Communications in Mathematical Physics, 74(2):189–197, jun 1980.
[189] Anton Popov, Oleksii Avilov, and Oleksii Kanaykin. Permutation entropy of EEG signals for different sampling rate and time lag combinations. In Signal Processing Symposium (SPS), 2013, pages 1–4. IEEE, 2013.
[190] Alberto Porta, Vlasta Bari, Andrea Marchi, Beatrice De Maria, Paolo Castiglioni, Marco di Rienzo, Stefano Guzzetti, Andrei Cividjian, and Luc Quintin. Limits of permutation-based entropies in assessing complexity of short heart period variability. Physiological Measurement, 36(4):755–765, mar 2015.
[191] M. Prandina, J. E. Mottershead, and E. Bonisoli. Damping identification in multiple degree-of-freedom systems using an energy balance approach. Journal of Physics: Conference Series, 181:012006, aug 2009.
[192] Marco Prandina, John E. Mottershead, and Elvio Bonisoli. An assessment of damping identification methods. Journal of Sound and Vibration, 323(3-5):662–676, jun 2009.
[193] Fengyong Qian, Shuhung Leung, Yuesheng Zhu, Waiki Wong, Derek Pao, and Winghong Lau. Damped sinusoidal signals parameter estimation in frequency domain. Signal Processing, 92(2):381–391, feb 2012.
[194] Thomas Quail, Alvin Shrier, and Leon Glass. Predicting the onset of period-doubling bifurcations in noisy cardiac systems. Proceedings of the National Academy of Sciences, 112(30):9358–9363, jul 2015.
[195] Rangayyan. Biomedical Signal Analysis, 2nd edition. John Wiley & Sons, 2015.
[196] Jan Reininghaus, Stefan Huber, Ulrich Bauer, and Roland Kwitt. A stable multi-scale kernel for topological machine learning. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, jun 2015.
[197] Mark A. Richards. The discrete-time Fourier transform and discrete Fourier transform of windowed stationary white noise. Technical report, Georgia Institute of Technology, 2013.
[198] Joshua S. Richman and J. Randall Moorman. Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6):H2039–H2049, 2000.
[199] Joshua S. Richman and J. Randall Moorman. Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6):H2039–H2049, 2000.
[200] M. Riedl, A. Müller, and N. Wessel. Practical considerations of permutation entropy. The European Physical Journal Special Topics, 222(2):249–262, jun 2013.
[201] M. Riedl, A. Müller, and N. Wessel. Practical considerations of permutation entropy. The European Physical Journal Special Topics, 222(2):249–262, 2013.
[202] Michael Robinson. Topological Signal Processing. Springer, 2014.
[203] Michael Robinson. Topological Signal Processing. Springer, 2014.
[204] G. Rohith and Nandan K. Sinha. Routes to chaos in the post-stall dynamics of higher-dimensional aircraft model. Nonlinear Dynamics, 100(2):1705–1724, apr 2020.
[205] Konstantinos Sakellariou, Thomas Stemler, and Michael Small. Estimating topological entropy using ordinal partition networks. Physical Review E, 103(2):022214, feb 2021.
[206] Benjamin Schäfer, Dirk Witthaut, Marc Timme, and Vito Latora. Dynamically induced cascading failures in power grids. Nature Communications, 9(1), may 2018.
[207] Benjamin Schäfer and G. Cigdem Yalcin. Dynamical modeling of cascading failures in the Turkish power grid. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(9):093134, sep 2019.
[208] N. V. Semionov, Yu. G. Yermolaev, A. D. Kosinov, A. N. Semenov, B. V. Smorodsky, and A. A. Yatskikh. The effect of small angle of attack on the laminar-turbulent transition in boundary layer on swept wing at Mach number M=2. In AIP Conference Proceedings, 2017.
[209] Songwon Seo. A review and comparison of methods for detecting outliers in univariate data sets. PhD thesis, University of Pittsburgh, 2006.
[210] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, jul 1948.
[211] Claude E. Shannon, Warren Weaver, and Arthur W. Burks. The mathematical theory of communication. 1951.
[212] Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3–55, 2001.
[213] He Shaobo, Sun Kehui, and Wang Huihai. Modified multiscale permutation entropy algorithm and its application for multiscroll chaotic systems. Complexity, 21(5):52–58, nov 2014.
[214] Azad Siahmakoun, Valentina A. French, and Jeffrey Patterson. Nonlinear dynamics of a sinusoidally driven pendulum in a repulsive magnetic field. American Journal of Physics, 65(5):393–400, 1997.
[215] B. Skyrms and R. Pemantle. A dynamic model of social network formation. Proceedings of the National Academy of Sciences, 97(16):9340–9346, aug 2000.
[216] Michael Small. Complex networks from time series: Capturing dynamics. In 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013). IEEE, may 2013.
[217] Michael Small, Jie Zhang, and Xiaoke Xu. Transforming time series into complex networks. In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 2078–2089. Springer Berlin Heidelberg, 2009.
[218] Hoon Sohn and Charles Reed Farrar. Damage diagnosis using time series analysis of vibration signals. Smart Materials and Structures, 10:446–451, 2001.
[219] Saleh Soltan, Dorian Mazauric, and Gil Zussman. Cascading failures in power grids. In Proceedings of the 5th International Conference on Future Energy Systems. ACM, jun 2014.
[220] J. Sprott. Chaos and time-series analysis. Choice Reviews Online, 41(06):41–3492, feb 2004.
[221] Matthäus Staniek and Klaus Lehnertz. Parameter selection for permutation entropy measurements. International Journal of Bifurcation and Chaos, 17(10):3729–3733, oct 2007.
[222] Matthäus Staniek and Klaus Lehnertz. Parameter selection for permutation entropy measurements. International Journal of Bifurcation and Chaos, 17(10):3729–3733, oct 2007.
[223] Herbert A. Sturges. The choice of a class interval. Journal of the American Statistical Association, 21(153):65–66, 1926.
[224] Kashin Sugishita and Yasuo Asakura. Vulnerability studies in the fields of transportation and complex networks: a citation network analysis. Public Transport, 13(1):1–34, sep 2020.
[225] Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, pages 366–381. Springer, 1981.
[226] Floris Takens. Detecting strange attractors in turbulence. In David Rand and Lai-Sang Young, editors, Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366–381. Springer Berlin Heidelberg, 1981.
[227] Mei Tao, Kristina Poskuviene, Nizar Alkayem, Maosen Cao, and Minvydas Ragulskis. Permutation entropy based on non-uniform embedding. Entropy, 20(8):612, 2018.
[228] Mei Tao, Kristina Poskuviene, Nizar Alkayem, Maosen Cao, and Minvydas Ragulskis. Permutation entropy based on non-uniform embedding. Entropy, 20(8):612, 2018.
[229] Joshua Tempelman. Chaos detection with persistent homology, 2020.
[230] Joshua R. Tempelman, Audun Myers, Jeffrey T. Scruggs, and Firas A. Khasawneh. Effects of correlated noise on the performance of persistence based dynamic state detection methods. In Volume 7: 32nd Conference on Mechanical Vibration and Noise (VIB). American Society of Mechanical Engineers, aug 2020.
[231] Vy Tran, Eric Brost, Marty Johnston, and Jeff Jalkio. Predicting the behavior of a chaotic pendulum with a variable interaction potential. Chaos: An Interdisciplinary Journal of Nonlinear Science, 23(3):033103, sep 2013.
[232] Sarah Tymochko, Elizabeth Munch, Jason Dunion, Kristen Corbosiero, and Ryan Torn. Using persistent homology to quantify a diurnal cycle in Hurricane Felix.
[233] Sarah Tymochko, Elizabeth Munch, and Firas A. Khasawneh. Using zigzag persistent homology to detect Hopf bifurcations in dynamical systems. Algorithms, 13(11):278, oct 2020.
[234] Krzysztof Urbanowicz and Janusz A. Hołyst. Noise-level estimation of time series using coarse-grained entropy. Physical Review E, 67(4), apr 2003.
[235] M. van Hagen. Waiting experience at train stations. PhD thesis, University of Twente, apr 2011.
[236] Xiang Wan, Wenqian Wang, Jiming Liu, and Tiejun Tong. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14(1), dec 2014.
[237] Minggang Wang and Lixin Tian. From time series to complex networks: The phase space coarse graining. Physica A: Statistical Mechanics and its Applications, 461:456–468, nov 2016.
[238] Yishu Wang, Ye Yuan, Yuliang Ma, and Guoren Wang. Time-dependent graphs: Definitions, applications, and algorithms. Data Science and Engineering, 4(4):352–366, sep 2019.
[239] Tongfeng Weng, Jie Zhang, Michael Small, Rui Zheng, and Pan Hui. Memory and betweenness preference in temporal networks induced from time series. Scientific Reports, 7(1), feb 2017.
[240] Alan Wolf, Jack B. Swift, Harry L. Swinney, and John A. Vastano. Determining Lyapunov exponents from a time series. Physica D: Nonlinear Phenomena, 16(3):285–317, 1985.
[241] G.R. Wood and B.P. Zhang. Estimation of the Lipschitz constant of a function. Journal of Global Optimization, 8(1), jan 1996.
[242] Hui Xiong, Pengjian Shang, and Jiayi He. Nonuniversality of the horizontal visibility graph in inferring series periodicity. Physica A: Statistical Mechanics and its Applications, 534:122234, nov 2019.
[243] Boyan Xu, Christopher J. Tralie, Alice Antia, Michael Lin, and Jose A. Perea. Twisty Takens: A geometric characterization of good observations on dense trajectories.
[244] Mengkai Xu, Srinivasan Radhakrishnan, Sagar Kamarthi, and Xiaoning Jin. Resiliency of mutualistic supplier-manufacturer networks. Scientific Reports, 9(1), sep 2019.
[245] Jiawei Xue and Ruipeng Diao. A frequency domain interpolation method for damping ratio estimation. In 2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014). IEEE, nov 2014.
[246] Melih C. Yesilli, Sarah Tymochko, Firas A. Khasawneh, and Elizabeth Munch. Chatter diagnosis in milling using supervised learning and topological features vector. In 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, dec 2019.
[247] Jingyi You, Chenlong Hu, Hidetaka Kamigaito, Kotaro Funakoshi, and Manabu Okumura. Robust dynamic clustering for temporal networks. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. ACM, oct 2021.
[248] Hong Zhang and Xuncheng Liu. Analysis of parameter selection for permutation entropy in logistic chaotic series. In Intelligent Transportation, Big Data & Smart City (ICITBS), 2018 International Conference on, pages 398–402. IEEE, 2018.
[249] J. Fang Zhang and Z. Gang Shao. Complex networks from Lévy noise. Indian Journal of Physics, 85(9):1425–1432, sep 2011.
[250] Jianye Zhang and Peng Zhang. Time Series Analysis Methods and Applications for Flight Data. Springer Berlin Heidelberg, 2017.
[251] Yang Zhang, Zhou Zhou, Kelei Wang, and Xu Li. Aerodynamic characteristics of different airfoils under varied turbulence intensities at low Reynolds numbers. Applied Sciences, 10(5):1706, mar 2020.
[252] Luciano Zunino, Miguel C. Soriano, Ingo Fischer, Osvaldo A. Rosso, and Claudio R. Mirasso. Permutation-information-theory approach to unveil delay dynamics from time-series analysis. Physical Review E, 82(4):046212, 2010.
[253] Luciano Zunino, Miguel C. Soriano, Ingo Fischer, Osvaldo A. Rosso, and Claudio R. Mirasso. Permutation-information-theory approach to unveil delay dynamics from time-series analysis. Physical Review E, 82(4):046212, 2010.