ASSESSMENT OF FUNCTIONAL CONNECTIVITY IN THE HUMAN BRAIN: MULTIVARIATE AND GRAPH SIGNAL PROCESSING METHODS By Marisel Villafañe-Delgado A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Electrical Engineering — Doctor of Philosophy 2017 ABSTRACT ASSESSMENT OF FUNCTIONAL CONNECTIVITY IN THE HUMAN BRAIN: MULTIVARIATE AND GRAPH SIGNAL PROCESSING METHODS By Marisel Villafañe-Delgado Advances in neurophysiological recording have provided a noninvasive way of inferring cognitive processes. Recent studies have shown that cognition relies on the functional integration or connectivity of segregated specialized regions in the brain. Functional connectivity quantifies the statistical relationships among different regions in the brain. However, current functional connectivity measures have certain limitations in the quantification of global integration and characterization of network structure. These limitations include the bivariate nature of most functional connectivity measures, the computational complexity of multivariate measures, and graph theoretic measures that are not robust to network size and degree distribution. Therefore, there is a need of computationally efficient and novel measures that can quantify the functional integration across brain regions and characterize the structure of these networks. This thesis makes contributions in three different areas for the assessment of multivariate functional connectivity. First, we present a novel multivariate phase synchrony measure for quantifying the common functional connectivity within different brain regions. This measure overcomes the drawbacks of bivariate functional connectivity measures and provides insights into the mechanisms of cognitive control not accountable by bivariate measures. Following the assessment of functional connectivity from a graph theoretic perspective, we propose a graph to signal transformation for both binary and weighted networks. This provides the means for characterizing the network structure and quantifying information in the graph by overcoming some drawbacks of traditional graph based measures. Finally, we introduce a new approach to studying dynamic functional connectivity networks through signals defined over networks. In this area, we define a dynamic graph Fourier transform in which a common subspace is found from the networks over time based on the tensor decomposition of the graph Laplacian over time. Copyright by MARISEL VILLAFAÑE-DELGADO 2017 To my parents and NC. v TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Functional Connectivity and Cognitive Control . . . . . . . . . . . . . 1.2 Methods for Quantifying Functional Connectivity . . . . . . . . . . . . 1.3 Organization and Contributions of this Thesis . . . . . . . . . . . . . . 1.3.1 Multivariate Phase Synchrony and Hyperdimensional Geometry 1.3.2 Graph to Signal Transform for Weighted Graphs . . . . . . . . . 1.3.3 Dynamic Graph Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 3 . 4 . 6 . 7 . 8 . 10 Chapter 2 Background . . . . . . . . . . . . . . . . 2.1 Bivariate Time-Frequency Phase Synchrony . . . 2.2 Graph Theory . . . . . . . . . . . . . . . . . . . 2.3 Cognitive control experiment . . . . . . . . . . . 2.3.1 Participants . . . . . . . . . . . . . . . . 2.3.2 Experiment . . . . . . . . . . . . . . . . 2.3.3 EEG Data Acquisition and Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 12 13 15 15 15 15 Chapter 3 Hypertorus Multivariate Phase Synchrony . . . . . . . . . . . . . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 S-estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Hyperspherical Phase Synchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Proposed Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Hyperspherical Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 Hypertorus Synchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 Computational Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Statistical assessment of HTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.1 Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . dS2 . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.3 Correction of Bias in HT 3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6.1 Assessment of robustness to noise of multivariate phase synchrony measures 3.6.2 Effect of number of oscillators on the multivariate synchrony measures . . . 3.6.3 Kuramoto Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6.4 Rössler oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6.5 Assessment of topographical sensitivity . . . . . . . . . . . . . . . . . . . 3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 17 20 20 21 24 24 28 30 31 31 34 37 39 39 40 43 45 50 61 vi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 Graph to Signal Transform Based on the Resistance Distance and its applications to Functional Connectivity Networks . . . . . . . . . . . . Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Graph Entropy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 Graph to Signal Transform based on Classical Multidimensional Scaling . Graph to Signal Transformation Based on the Resistance Distance Matrix . . . . 4.3.1 Resistance distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 Classical Multidimensional Scaling based on the Resistance Distance . . 4.3.3 Reconstruction of the original graph . . . . . . . . . . . . . . . . . . . . 4.3.4 Perturbation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.5 Illustration of Graph to Signal Transform . . . . . . . . . . . . . . . . . 4.3.5.1 Binary graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.5.2 Weighted graphs . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.5.3 Reconstruction of Weighted Networks . . . . . . . . . . . . . . 4.3.5.4 Robustness to network anomalies . . . . . . . . . . . . . . . . Small-world network characterization . . . . . . . . . . . . . . . . . . . . . . . Graph Entropy based on the graph to signal transform . . . . . . . . . . . . . . . Event detection in temporal networks . . . . . . . . . . . . . . . . . . . . . . . . 4.6.1 Tensor Decompositions for Temporal Networks . . . . . . . . . . . . . . 4.6.2 Graph to signal transform based event detection . . . . . . . . . . . . . . Characterization of Functional Connectivity Networks . . . . . . . . . . . . . . . 4.7.1 Assessment of Graph Information Theoretic Measures in Functional Connectivity Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 5 Dynamic Graph Fourier Transform . . . . . . . . 5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Graph Signal Processing . . . . . . . . . . . . . . 5.1.2 Tucker Decomposition . . . . . . . . . . . . . . . 5.2 Dynamic Graph Fourier Transform on Temporal Networks 5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Simulations . . . . . . . . . . . . . . . . . . . . . 5.3.2 Dynamic Functional Connectivity Networks . . . . 5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 63 69 69 70 72 72 74 75 77 80 80 83 85 86 88 90 95 95 97 99 . 102 . 103 . . . . . . . . . 106 108 108 111 112 113 114 117 126 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 127 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 vii LIST OF TABLES Table 3.1: Multivariate synchrony (mean±st.dev.) in networks of Rössler oscillators. 49 Table 3.2: Multivariate synchrony (mean±st.dev.) for different number of oscillators containing two subnetworks of three oscillators. . . . . . . . . . . . . 50 Table 3.3: Multivariate synchrony (mean±st.dev.) for a network consisting of 12 oscillators for different number of subnetworks composed of three oscillators. 50 Table 3.4: Statistical significance of error-correct responses obtained from PLV and HTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 Table 4.1: Reconstruction errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Table 4.2: Estimated small-world parameters. . . . . . . . . . . . . . . . . . . . . . 101 Table 4.3: Small-world measure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Table 4.4: Graph entropy from cognitive control FCNs. . . . . . . . . . . . . . . . . 103 Table 4.5: Correlations between graph entropy and behavioral measures from cognitive control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Table 5.1: Performance of the proposed LT and LA . . . . . . . . . . . . . . . . . . . 116 Table 5.2: MSE of the smoothness of f1 from LT and LA . . . . . . . . . . . . . . . 116 Table 5.3: MSE of the smoothness of f2 from LT and LA . . . . . . . . . . . . . . . 117 (t) (t) viii LIST OF FIGURES Figure 3.1: dS (solid lines) and true HTS (dashed lines) for synchrony values of 0, HT 0.20, 0.40, 0.60, 0.80 and 0.99. M = 4 oscillators were simulated with phases θi distributed as V M(0, κ). . . . . . . . . . . . . . . . . . . . . . 33 Figure 3.2: (a) Theoretical upper bounds for the variance of HTS; and (b) empirical variance of HTS as a function of sample size in a network of M = 4 oscillators for different synchronization levels in the Von Mises distribution in (3.35). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 Figure 3.3: dS2 (solid lines) and true HT S2 (dashed lines) for true HT S values of HT 0, 0.20, 0.40, 0.60, 0.80 and 0.99. . . . . . . . . . . . . . . . . . . . . . . 38 Figure 3.4: dS2 (solid lines) and true HT S2 (dashed lines) for true HT S Variance of HT values of 0, 0.20, 0.40, 0.60, 0.80 and 0.99. . . . . . . . . . . . . . . . . 38 Figure 3.5: Multivariate synchrony for a network of highly synchronized sinusoidal oscillators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Figure 3.6: Effect of number of oscillators on S-estimator and HTS for mean synchrony values of 0.1, 0.3, 0.5, and 0.7. . . . . . . . . . . . . . . . . . . . 42 Figure 3.7: Effect of number of oscillators (M) on the eigenvalues for a true multivariate synchrony value of 0.4, a) M = 4; b) M = 8; c) M = 12; and d) M = 16. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Figure 3.8: Comparison of mean and standard deviation of multivariate synchrony (HTS and S-estimator) within a Kuramoto network with Kc = 2. . . . . . . 45 Figure 3.9: Eight Rössler networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Figure 3.10: SynAmps2-64 EEG system and network under test. . . . . . . . . . . . . 52 Figure 3.11: Topographical plots showing the sensitivity of HTS and S-estimator at various SNR levels. a), c) and e) HTS, 20 dB, 10 dB and 0 dB, respectively; b), d) and f) S-estimator, 20 dB, 10 dB and 0 dB, respectively. . . . 53 Figure 3.12: Error-Correct topographical sensitivity of multivariate synchrony in intervals of 25 ms and theta band [4-8 Hz]. (a), (c), (e) and (g) Error-Correct HTS difference. (b), (d), (f) and (h) Error-Correct S-estimator difference. . 56 ix Figure 3.13: Topographical plots of p-values from t-test investigating the difference of multivariate synchrony from HTS between intervals. Note: Black regions correspond to lower p-values. . . . . . . . . . . . . . . . . . . . . . . . . 57 Figure 3.14: Topographical plots of multivariate synchrony from HTS in the 25-75 ms interval. (a) Error responses; (b) Correct responses; (c) Error-Correct responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 Figure 3.15: ROC curves for HTS and S-Estimator. Probability of detection is based on the multivariate synchrony among FCz and its neighbors whereas the probability of false alarm is based on the multivariate synchrony around CPz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Figure 3.16: Correlation coefficient between (a) PES, (b) PEA and error-correct multivariate synchrony difference computed using HTS in the ERN interval 25-75 ms, for each electrode. . . . . . . . . . . . . . . . . . . . . . . . . 60 Figure 3.17: Topographical distribution of p-values obtained from the correlation coefficient between (a) PES, (b) PEA and error-correct multivariate synchrony difference computed using HTS in the ERN interval 25-75 ms, for each electrode. Black refers to more significant. . . . . . . . . . . . . . . . . . 61 Figure 4.1: Signal representation of a ring lattice network composed of N = 128 nodes. Top: Resistance distance; (a) K = 2; (b) K = 10. Bottom: Distance D; (c) K = 2; (d) K = 10. . . . . . . . . . . . . . . . . . . . . . . . 82 Figure 4.2: Erdős-Rènyi network signal representation; (a) and (b), Resistance distance, p = 0.2 and p = 0.5, respectively; (c) and (d), Distance D, p = 0.2 and p = 0.5, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Figure 4.3: Signals constructed from a weighted stochastic block network with probability of attachment p = 0.3 using the resistance distance matrix R. (a) First three components corresponding to a network with 3 blocks; (b) First four components corresponding to a network with 4 blocks. . . . . . 84 Figure 4.4: Signals constructed from a weighted Small-World network consisting of N = 128 nodes and average degree K = 6. (a) Rewiring probability p = 0.1; (b) Rewiring probability p = 0.7. . . . . . . . . . . . . . . . . . . . 85 Figure 4.5: Error of the magnitude spectrum. (a) and (b) Ring lattice network with average degree K equal to 4, 16, and 64 consisting of N = 128 and N = 256 nodes, respectively; (c) Stochastic block network, 3 clusters and probability of attachment p = 0.1, p = 0.3, p = 0.5. . . . . . . . . . . . . . . . 87 x Figure 4.6: Error of the magnitude spectrum from a ring lattice network (a) and a stochastic block network (b) with one anomalous edge whose weight ranged in the intervals [0.8, 1.2], [0.6, 1.4], [0.4, 1.6], [0.2, 1.8], and [0, 2]. . 88 Figure 4.7: Estimated probability of rewiring pr in weighted small-world networks. Weights of the small world structure are uniformly distributed in the interval [0, 1] and noise values are uniformly distributed in (a) [0, 1], (b) [0, 0.25]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Figure 4.8: Power spectrum of the first graph signal for: (a) Ring network with K = 4; (b) Small world network with pr = 0.01, pr = 0.05, and pr = 0.1; (c) Erdős-Rènyi network with p = 0.1, p = 0.3, and p = 0.6; (d) Stochastic Block network with Ck = 2, Ck = 6, and Ck = 10 clusters, N = 300 nodes for all networks. The frequency axis limits are adjusted in order to better illustrate the spectrum. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Figure 4.9: Comparison of graph entropy measures for (a) Ring network, K = 2, 4, 8, 16, 32 (I f V1 , α = 0.98 and α = 1.03); (b) Small world network, K = 4 and probability of rewiring pr ranging from 0.0001 to 1 (I f V1 , α = 0.95, and α = 1.05); (c) Erdős-Rènyi network, p from 0.05 to 1 in increments of 0.05 (I f V1 , α = 0.95, and α = 1.1); (d) Stochastic Block network, Ck = 3, 5, 7, 9 (I f V1 , α = 0.95, and α = 1.1), and N = 300 nodes for all networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 Figure 4.10: Computation of graph divergence between (a) small-world network with K = 4 and p = 0.0001 and another small-world network with increasing p; (b) Stochastic Block network with 3 blocks and p = 0.9 and another Stochastic Block with 3 blocks and different probability of attachment p. . 95 Figure 4.11: Detection of an event consisting of a Small-World network whose probability of attachment p changes from that of the default network (p = 0.01) at t = 31. ROCs are constructed from the proposed method (blue) and adjacency matrix based method (red) for (a) p = 0.05; (b) p = 0.1; (c) p = 0.15; (d) p = 0.2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Figure 4.12: Magnitude Spectrum for each signal obtained through network to signal transformation for (a) Error responses; (b) Correct responses. . . . . . . . 101 Figure 5.1: Tensor unfolding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 Figure 5.2: DGFT of a ring network with N = 100 nodes and K = 4 over T = 80 seconds. The graph signal is composed of different components over time, which are extracted by the proposed method. . . . . . . . . . . . . . 115 Figure 5.3: dGFT of the ERN dFCN over the interval [-25 ms, 125 ms]. . . . . . . . . 118 xi Figure 5.4: dGFT of the CRN dFCN over the interval [-25 ms, 125 ms]. . . . . . . . . 118 Figure 5.5: Topoplots from filtered signals in the low, medium and high graph frequency bands. Interval: [-25 ms, 0 ms]. . . . . . . . . . . . . . . . . . . . 120 Figure 5.6: Topoplots from filtered signals in the low, medium and high graph frequency bands. Interval: [0 ms, 25 ms]. . . . . . . . . . . . . . . . . . . . 121 Figure 5.7: Topoplots from filtered signals in the low, medium and high graph frequency bands. Interval: [25 ms, 50 ms]. . . . . . . . . . . . . . . . . . . 122 Figure 5.8: Topoplots from filtered signals in the low, medium and high graph frequency bands. Interval: [50 ms, 75 ms]. . . . . . . . . . . . . . . . . . . 123 Figure 5.9: Topoplots from filtered signals in the low, medium and high graph frequency bands. Interval: [75 ms, 100 ms]. . . . . . . . . . . . . . . . . . . 124 Figure 5.10: Topoplots from filtered signals in the low, medium and high graph frequency bands. Interval: [100 ms, 125 ms]. . . . . . . . . . . . . . . . . . 125 xii Chapter 1 Introduction Cognition and perception are founded on the coordinated activity of neural populations communicating among different specialized brain regions [1]. Neurons that synchronously oscillate in the low and high frequency provide the fundamental mechanism for information transfer [2], allowing coordinated activity in the normally functioning brain [3], [4], [5]. This neural coordination is spatiotemporally dynamic [6], and the oscillatory synchronization among different regions is dynamically adjusted based on the cognitive task [3]. Furthermore, cognitive dysfunctions such as schizophrenia, epilepsy, autism, Alzheimer’s disease, and Parkinson’s disease have been related to abnormalities in neuronal synchronization [1], [7]. Brain functionality has been argued to be based on functional segregation and integration [8], [9]. Functional segregation establishes that specialized activity occurs due to segregated neuronal populations within dedicated brain regions [10]. Functional integration, on the other hand, consists of the combination of multiple distributed regions and serves as the basis for coherent cognition and behavior [10]. The development of brain imaging techniques has provided the means to non-invasively infer patterns of neural activity in the human brain. Among those techniques are electroencephalography (EEG) and magnetoencephalography (MEG), which measure the scalp electric and magnetic fields generated by electrical activity of neural assemblies composed of thousands of neurons in 1 the cortex, respectively [11]. Both techniques provide a high temporal resolution (in the order of milliseconds), but lack good spatial resolution. In addition, these methodologies are sensitive to current sources taking place in different locations of the cortex. Particularly, EEG is more sensitive to secondary currents (volume), whereas MEG is more sensitive to primary current sources [11]. Another popular neuroimaging technique is functional Magnetic Resonance Imaging (fMRI), which measures the changes in blood flow and oxygenation in the brain [12]. It provides an excellent spatial resolution (in the order of few millimeters), at the expense of a slower temporal resolution (in the order of seconds). The advancements in fMRI studies have had a great impact on the study of task-based activation and resting state networks. Brain connectivity encompasses three categories: anatomical, effective and functional connectivity. Anatomical connectivity refers to the physical interconnections among neurons or neuronal elements and can vary depending on the time-scale that it is observed, being quasi-stationary during short-time scales, whereas it is more dynamic at long-temporal scales, due to plasticity. Functional connectivity has been defined as the statistical dependencies among remote neurophysiological events [13]. It does not make any assumptions regarding the underlying structural connections nor the directionality relationships among the regions being assessed. Functional connectivity, as opposed to structural connectivity, can be dynamic at both short and long temporal scales. Effective connectivity, on the other hand, describes the causal relationships between different regions in the brain [13]. In this thesis, we focus on the assessment of functional connectivity, as it has been shown to contribute to the understanding of neural functions in cognition [14]. Functional connectivity was initially defined as the temporal coherence among different neurons, measured by the crosscorrelation of spike trains [15], [16]. Furthermore, anomalies in the functional connections between certain regions are indicative of cognitive dysfunctions, including Alzheimer’s Disease [17] 2 and schizophrenia [18]. 1.1 Functional Connectivity and Cognitive Control Cognitive control is thought to be the foundation of intelligent behavior [18], and it is the mechanism that allows for performance adjustments under activities such as perceptual selection, novel information, and realization of errors [19], [20]. An important question in neuroscience is how the brain adjusts behavior by monitoring the performance under certain activities [21]. It has been hypothesized that cognitive control relies on the information carried by neural signals associated to control, which allows the appropriate selections of actions [22] and has particular influence in psychopathology and self-monitoring [23]. Brain areas involved in cognitive control are the anterior cingulate cortex (ACC), dorsal medial prefrontal cortex (mPFC), and some regions in the parietal lobes [19], [23], [24]. Evoked-related potentials (ERPs) related to cognitive control include the N2, feedback-related negativity (FRN), conflict-related negativity (CRN), and error-related negativity (ERN) [19]. Among these, the ERN is an indicator of cognitive control occurring when the individual performs a behavioral error [25], reaching its maximum (negative) amplitude within 25-75 ms after the error. It has been shown that the ERN exhibits its maximum potential at central and frontal-central regions [25]. Based on dipole modeling, it is suggested that the ERN has a medial frontal generator, potentially the ACC [20], [21], [22]. Hall et al. [22] reported that the ERN response amplitude reflects reduced self-monitoring associated with psychopathology and is linked to higher scores on a selfreport measure. Other studies have related a reduction in the amplitude of ERN to acute alcohol intoxication [23] and increases in ERN amplitude are related to the degree of error responses [24]. In addition, the ERN amplitude is attenuated in patients that experience damage in the dorsal ACC 3 [24]. Various functional connectivity resting state fMRI studies have related abnormalities in the functional relationships among brain regions associated with cognitive control, which includes the dorsal ACC, dorsal PFC and regions of the parietal lobe [25], to attention deficit hyperactivity disorder (ADHD) [26]. In an EEG study, Cavanagh et al. [27] have found increased synchronization between electrodes in the medial prefrontal cortex (mPFC) and the lateral prefrontal cortex (lPFC). Furthermore, their results suggest that neural oscillations between mPFC and lPFC might be the underlying mechanism of functional communication involving networks related to action monitoring and cognitive control. It is suggested that errors indicate the action monitoring network that there is a need for increased cognitive control [27]. Recently, Cavanagh et al. [19] showed that the theta band oscillations in the frontal regions are responsible determining the need of cognitive control and adjustments for a given task [19]. 1.2 Methods for Quantifying Functional Connectivity Methods for quantifying functional connectivity include linear correlation, mutual information, coherence, phase-locking value (PLV) and pairwise phase consistency [28]. The most commonly implemented methods are linear correlation and coherence, which are only sensitive to the linear interactions between time series, and thus cannot account for the nonlinear relationships reflected between electrophysiological time series [28]. Mutual information measures suffer from the estimation of the distribution and associated problems such as the size of histogram bins [28]. In order to quantify both linear and nonlinear relationships in the brain signals, phase synchronization, as defined in the context of two chaotic oscillators, has emerged as an alternative method for the assessment of functional connectivity and is quantified through PLV [29], [30], [31]. PLV 4 as a measure for functional connectivity was introduced by [32], and it estimates the synchrony between two signals by looking at the circular variance of their phase difference across trials. In comparison to other linear and nonlinear methods, PLV has been shown to be more sensitive to nonlinear effects [33]. In addition, this metric has contributed to the assessment of brain rhythms and their related cognitive processes, for example, alpha, beta, delta and theta in the low-frequency bands and gamma bands in the higher frequencies [28], [29], [32], [34]. Although PLV is a promising measure for quantifying functional connectivity, it is still limited by its bivariate nature. Specifically, it does not provide information regarding the integration across multiple regions in the brain. In addition, functional connectivity results from bivariate measures are difficult to interpret and computationally expensive for systems with a large number of regions. In order to overcome these drawbacks, researchers have proposed multivariate phase synchrony measures [35], [36], [37], [38]. Multivariate phase synchrony aims to quantify the global connectivity among a group of oscillators. Previously proposed multivariate measures include the S-Estimator [35], [36] and hyperspherical phase synchrony (HPS) [37], [38], but those methods present some drawbacks such as being computationally complex and having poor topographical sensitivity, respectively. On the other hand, graph theory has provided the means for characterizing the functional connections in the brain, which is a complex dynamic system [39]. Functional connectivity networks are constructed by considering the different brain regions or electrodes/sensors as nodes and the relationships between different nodes, quantified by bivariate functional connectivity measures such as PLV and correlation, as edges. In this manner, functional connectivity networks can take advantage of the widely available set of techniques for characterizing complex networks. In terms of brain networks, these measures have been grouped as measures of functional segregation and functional integration [10]. Measures of functional segregation include the clustering coefficient, 5 transitivity, and modularity [10]. On the other hand, measures of functional integration include the characteristic path length and the global efficiency. By computing measures that characterize network structure, such as the small-world measure and the degree distribution, it has been shown that functional connectivity networks exhibit features of complex networks, including the small-world network [39], [40], and both small-world and scale-free networks [41]. Although graph-theoretic measures have contributed greatly to the advancements in the study of functional connectivity networks, these measures present some drawbacks. Measures employed in the characterization of network structure such as the mean clustering coefficient, the characteristic path length, and the global efficiency may be affected by certain characteristics of the network. Examples include how nodes with low degree affect the clustering coefficient and the dependence of the characteristic path length and the global efficiency of the shortest path between nodes, when networks may rely on other mechanisms than the shortest path for communication. 1.3 Organization and Contributions of this Thesis In this thesis, we present novel techniques that aim to overcome some of the drawbacks previously mentioned in the quantification of functional connectivity and the assessment of functional connectivity networks. Extending bivariate functional connectivity measures, we introduce a multivariate phase synchrony measure based on hyperdimensional geometry. Along the lines of graph theory, we introduce a graph to signal transform and a dynamic graph Fourier transform as alternative methods in the study of functional connectivity networks. 6 1.3.1 Multivariate Phase Synchrony and Hyperdimensional Geometry Since cognitive processes involve the coordinated activity of multiple regions, functional connectivity measures that account for global integration are needed. Multivariate phase synchrony accounts for the global synchronization of a group of oscillators. In addition, it allows for the quantification of the connectivity structure with a single number instead of a matrix of pairwise values, which means that we can assess the global functional connectivity within a neighborhood of a particular region and represent it with a single number. Furthermore, it provides the means for quantifying the integration of large-scale synchronization and functional connectivity. In Chapter 3, we introduce a novel multivariate phase synchrony measure in order to overcome the drawbacks of bivariate measures such as PLV in the quantification of functional connectivity among multiple brain regions. The proposed measure overcomes drawbacks of current multivariate phase synchrony measures, such as being computationally efficient, robust to noise, and providing excellent topographical sensitivity. This method, referred to as HyperTorus Synchrony (HTS), is based on defining phase differences within a group of oscillators on a flat hyperdimensional torus, which can be considered as an extension of PLV since it is defined as the Cartesian product of unit circles. This novel measure also accounts for the dependency of the multivariate synchrony in the ordering of the phase differences observed in a recently proposed multivariate phase synchrony, HPS [37], [42]. In this chapter, we show that the proposed HTS is equivalent to an extended version of HPS which includes the coordinates of circles with varying radii and thus is independent of the ordering of the phase differences. This chapter provides an extensive mathematical characterization of the measure as well as the statistical properties of the proposed estimator. It is shown that the proposed measure is more robust to noise, is not affected by the number of oscillators, and possesses an excellent topographical 7 sensitivity. Simulation results motivate the use of HTS for the quantification of global functional connectivity in brain networks. In addition, in this chapter we assess the global functional connectivity in cognitive control. It is of particular interest to study the difference between the multivariate synchrony of error responses and correct responses during the ERN interval and its topographical distribution. In order to accomplish this, we study the topographical connectivity by computing the multivariate synchrony in the neighborhood of each electrode and assigning this value to the location of the electrode. In this way, it is possible to construct a multivariate synchrony topographical map, which allows to quantify the integration of multiple regions across the brain. A comparison of HTS to a conventional multivariate synchrony measure, the S-estimator, reveals high error minus correct differences within fronto-lateral and medial-central regions from HTS. In addition, by computing the multivariate synchrony from HTS over different time intervals we identify significant changes in the connectivity within central regions. Furthermore, by summarizing the global synchronization within a region by a single number at each electrode we can correlate functional connectivity and behavioral measures from the experiment, such as the post-error slowing and post error accuracy. In particular, our results show that frontal-central synchrony from HTS is associated with adaptive behavioral adjustments after errors, and suggest that multivariate synchrony from HTS provides the means for quantifying the functional integration of regions engaged in cognitive control. 1.3.2 Graph to Signal Transform for Weighted Graphs In addition to bivariate and multivariate measures of functional connectivity, graph theoretic methods have been widely used in the study and characterization of functional connectivity networks. However, these methods have some drawbacks such as their dependence on node degree and shortest path distance. In Chapter 4, we introduce a method for graph to signal transform which is 8 applicable to both binary and weighted networks. In recent years, the relationship between graphs and signals has been exploited, providing great contributions to the analysis of complex networks. In particular, graph to signal transformations provide the means for obtaining signals that contain the structural information of the networks [43], [44], [45]. Such methods can overcome some of the drawbacks that traditional graph measures face such as the dependence on node degree and shortest path distance, and furthermore, facilitate the implementation of signal processing measures over networks by applying them directly to the signals from the networks. Applications of the graph to signal transform include the assessment of temporal networks [46] and graph filtering [47]. However, all of the previously proposed methods are only applicable to binary networks and it is not possible to apply them directly to the assessment of functional connectivity networks, which are generally weighted. In order to apply the graph to signal transformation to functional connectivity networks, we propose to employ the resistance distance matrix as an alternative distance in the classical multidimensional algorithm (CMDS) [44]. We show how the proposed method serves to characterize both binary and weighted networks and its characterization analytically. Based on the signals obtained from the transformation we propose a series of approaches to study the networks. First, we propose a method for characterizing small-world networks based on the spectral centroid of the signals. As illustrated, the proposed method is more accurate in the estimation of the probability of attachment and the average degree when compared to the traditional small-world measure. In addition, we introduce graph information theoretic measures that account for the information content of the networks. Quantifying the network’s entropy and divergence between networks is important as these provide insights into the network’s information content [48]. In this work, we propose a graph entropy measure and a graph divergence based on the magnitude spectra of the signals from the graphs. This method is novel in the sense that the network’s information is quantified from its 9 signals and does not depend on any graph theoretic measure nor rely on any arbitrary parameters as current graph information measures do. Finally, we propose to use the spectra of the signals from the networks in the detection of events in temporal networks. This method employs a tensor constructed from the magnitude spectra over time and uses the temporal mode of the tensor for detecting the events. 1.3.3 Dynamic Graph Fourier Transform Following the study of the relationships between networks and signals for quantifying their structural properties, it is possible to learn from the networks by combining both the networks and the signals recorded at each node. This provides an alternative way for studying how the human brain functionally integrates the activity from various regions during cognitive processes. The recent field of signal processing over graphs [49] or graph signal processing aims to analyze signals defined over irregular domains, such as signals indexed by the nodes of a graph. This is opposed to traditional signal processing, which aims to analyze signals defined over regular domains, like time and frequency. In recent years, there has been a significant advancement in the definition of concepts widely used in traditional signal processing specifically designed for signals over graphs, such as the graph Fourier transform (GFT) and wavelet transforms, and other concepts such as sampling. In Chapter 5, we are interested in extending the study of functional connectivity networks by considering techniques from graph signal processing. In particular, we are interested in computing the GFT of the EEG signals defined over the electrodes. However, all of the current transforms in graph signal processing consider static networks, which is not the case for functional connectivity networks which change over time depending on the underlying cognitive processes. In order to compute the graph Fourier transform on dynamic networks, a common subspace needs to be found 10 across time. Recently, a dynamic graph Fourier transform [50] which finds a common subspace based on Grassmann manifolds was proposed. However, the accuracy of the common subspace is compromised as the time span increases. Alternatively, in [51] the authors consider the networks averaged over time, which compromises the time-varying structure of the temporal networks. In this chapter, we propose a dynamic Graph Fourier transform (dGFT) whose common subspace is found through the tensor decomposition of the graph Laplacian over time. By computing a dynamic graph Fourier transform over functional connectivity networks from the cognitive control experiment, it is possible to quantify the graph spectral activity from error and correct trials which suggests a higher structural organization during errors within the ERN interval when compared to correct trials. 11 Chapter 2 Background In this chapter we review the background on bivariate time-frequency phase synchrony, graph theory, and describe the EEG cognitive control experiment. 2.1 Bivariate Time-Frequency Phase Synchrony Functional connectivity networks are constructed from the bivariate time-frequency PLV, based on the Reduced Interference Rihaczek time-frequency distribution (RID-Rihaczek) as proposed in [30]. For a signal xi , define Ci (t, ω) to be its complex RID-Rihaczek time-frequency distribution, given by Z Z Ci (t, ω) =   θτ (θ τ)2 exp( j )Ai (θ , τ)e− j(θt+τω) dτdθ , exp − σ 2 | {z } | {z } Choi-Williams kernel (2.1) Rihaczek kernel where Ai (θ , τ) is the ambiguity function of xi : Z Ai (θ , τ) = τ τ xi (u + )xi∗ (u − )e jθ u du. 2 2 12 (2.2) The time-varying phase of the signal xi is computed as  Ci (t, ω) . Φi (t, ω) = arg |Ci (t, ω)|  (2.3) The phase difference between two signals x1 and x2 can be computed as  C1 (t, ω) C2∗ (t, ω) Φ1,2 (t, ω) = arg . |C1 (t, ω)| |C2 (t, ω)|  (2.4) The PLV between two signals x1 and x2 as a function of time and frequency [52] is defined by   1 N PLV1,2 (t, ω) = ∑ exp jΦk1,2 (t, ω) N k=1 q = hcos Φk1,2 (t, ω)i2 + hsin Φk1,2 (t, ω)i2 , (2.5) where N corresponds to the total number of trials or realizations of the signal, Φk1,2 (t, ω) is the phase difference between x1 and x2 as defined by (2.4) for the kth trial and h·i denotes averaging over trials. For each trial k, the phase difference Φk1,2 (t, ω) defines a vector on the unit circle. Thus, PLV evaluates the circular variance of the unit vectors across trials. PLV approaches 1 if the phase differences over trials exhibit small variation and approaches 0 if there is no synchrony over trials. 2.2 Graph Theory An undirected graph G = (V, E) is defined by a set of N nodes, vi ∈ V , and a set of M edges, ei j , i, j ∈ {1, . . . , N}. The relationships between the nodes of the graph is represented by the adja13 cency matrix A = [Ai j ], for binary graphs, and W = [Wi j ] for weighted graphs. In binary graphs, Ai j = 1 when nodes i and j are connected and Ai j = 0 when the nodes are not connected. For weighted graphs, Wi j represents the weight of the edge between nodes i and j and equals zero when i = j. The degree matrix ∆ is defined as the diagonal matrix with entries ∆ii = ∑Nj=1 Ai j , j6=i where ∆ii is the degree of node vi . Similarly, the degree matrix ∆w for weighted networks has N diagonal entries ∆w ii = ∑ j=1 Wi j . j6=i For binary graphs, the combinatorial Laplacian L is defined as L = ∆ − A. The entries of L are given by      ∆ii , i = j ,      Li j = −1, (i, j) ∈ E,         0, otherwise, (2.6) where ∆ii is the degree of node vi . Similarly, for weighted graphs the Laplacian is defined as Lw = ∆w − W. Another important matrix in graph theory is the incidence matrix C, which is a N × M matrix, where N is the total number of nodes and M is the total number of edges in the graph. For undirected graphs, the entries Ci j are equal to 1 in the case that vertex vi and edge e j are incident and equal to zero otherwise. The network small-worldness for binary networks is given by σ = networks as σ w = w Cw /Crand w Lw /Lrand C/Crand L/Lrand and for weighted [10], [53]. C and Crand are the clustering coefficients of the network and a random network, respectively. Similarly, L and Lrand are the characteristic path lengths for the analyzed network and a random network, respectively. 14 2.3 Cognitive control experiment In this section, we describe the EEG dataset from the cognitive control experiment assessed in this thesis. 2.3.1 Participants Data from nineteen subjects, all native English speakers and undergraduate students from Michigan State University, were extracted from a study of the relationship between the ERN and individual differences [54]. Subjects participated for course credit. 2.3.2 Experiment The experiment consisted of a letter version of the Eriksen flanker task [55] which involved correctly identifying the target letter, located at the center of a five-letter string. The target was either congruent (e.g., MMMMM) or incongruent (e.g., NNMNN) with the flanker letters. Subjects pressed a determined mouse button to identify the center letter. The total time for each trial was 135 ms. Flanker letters were presented 35 ms before the target-letter onset and then the five letters remained on the screen for 100 ms. During the inter-trial intervals, ranging from 1,200 1,700 ms, a fixation cross was presented. The experiment consisted of six blocks of 80 trials and the letters comprising the strings differed between blocks. Also, the mouse button assigned to each target letter was reversed at the middle of each block. 2.3.3 EEG Data Acquisition and Pre-processing Electroencephalographic activity was recorded by the ActiveTwo system (BioSemi, Amsterdam, The Netherlands). 64 Ag-AgCl electrodes were embedded in a stretch Lycra cap according to the 15 10/20 system. Also, two electrodes were located on the left and right mastoids. Electrooculogram activity generated by eye movements and blinks was recorded at electrode FP1 and three electrodes located underneath the left pupil and on the left and right outer canthi. All signals were sampled at 512 Hz by the BioSemi s ActiView software. Offline analyses were completed on BrainVision Analyzer 2 (BrainProducts, Gilching, Germany). Electrode recordings were re-referenced to the mean of the mastoids and then band-pass filtered between 0.1 and 30 Hz, with 12 dB/octave roll off. Eye movement artifacts were corrected by the regression method provided by [56]. Epochs of response-locked signals were taken 200 ms preceding the response onset and the subsequent 800 ms. Trials with a voltage difference greater than 200V within it, a voltage step greater than 50V between adjacent sampling points or voltage difference less than 0.5 mV within it were rejected. EEG signals were processed using the Current Source Density Toolbox (CSD) for volume conduction correction. The number of error trials ranged from 20 to 61 (36.78 ± 13.72, meanst.dev.) and the same number of correct responses, per subject, were chosen randomly for the current analyses. 16 Chapter 3 Hypertorus Multivariate Phase Synchrony 3.1 Introduction Coordinated time-varying interactions are fundamental in dynamical systems, ranging from a few coupled elements to complex networks. Examples of systems of coupled oscillators occur widely in nature and engineering such as circadian rhythms [57], neuroscience [58], flashing fireflies [59], coupled Josephson junctions [60], the Millenium Bridge [61], and others [62], [63], [64], [65]. In the stochastic sense, synchronization has been defined as an adjustment of rhythms of oscillating objects due to their weak interaction [66] and this adjustment can be described in terms of phase locking and frequency entrainment. Phase locking or phase synchrony between two oscillators occurs when the generalized phase difference, Φi, j (t, ω) = |Φi (t, ω) − Φ j (t, ω)| mod 2π < constant, at time t and frequency ω [67], [68]. Two steps are needed for quantifying phase synchrony. First, instantaneous phase of each signal is estimated at a particular frequency of interest through methods such as the Hilbert transform, complex wavelet transform [69], empirical mode decomposition [42], [70], [71], [72], [73] or the recently proposed RID-Rihaczek complex time-frequency distribution [30], [74]. In the second step, the amount of synchrony is quantified through either the entropy of the distribution of the phase differences or mean phase coherence, also known as PLV2.1, which computes the circular variance of the relative phase [28], [75]. Although bivariate 17 PLV has been widely used, it has various disadvantages for the study of large and complex networks. First, PLV does not provide information about the common integrating structure among the ensemble of oscillators. Second, for large data sets multiple computations of pairwise PLV increase computational costs. Recently, phase synchronization of a group of oscillators, which is referred to as global or multivariate phase synchronization, has been of interest for understanding the group dynamics and characteristic behavior of complex networks [72], [76], [77], [78], [79]. Contrary to the bivariate phase synchrony, multivariate synchrony captures the global synchronization patterns quantifying the degree of interactions within a group of oscillators. In addition, multivariate synchrony methods provide a single number, rather than a matrix of pairwise synchrony values. One of the earliest approaches to multivariate synchrony analysis was global field synchronization (GFS) proposed by Koenig et al. [77]. GFS first transforms the time series data to the frequency domain and then quantifies the scatter of the multivariate data through the eigenvalues of the covariance matrix of the sine and cosine coefficients of the Fourier transform. This measure inherently assumes the stationarity of the data and cannot capture time-varying aspects of synchrony. Moreover, this method quantifies synchrony as the instance when the phases of the two signals are exactly the same and does not take into account the case of constant phase difference. Knyazeva et al. [80] proposed another simple measure, the multivariate phase synchrony (MPS), defined as the mean phase synchrony averaged across the observation samples. Rudrauf et al. [81], on the other hand, proposed an alternative approach to quantifying phase synchrony through frequency locking by exploiting the relationship between phase and frequency and identifying continuous periods of identical instantaneous frequency. Similarly, in [72] the idea of cointegration is used to define multivariate phase synchrony. However, this method can only identify phase synchrony in a nonstatistical sense and is not reliable in the case of noisy signals. 18 More recently, methods inspired by random matrix theory (RMT) and spectral graph theory were proposed. These methods first compute the bivariate synchrony and then perform cluster analysis through eigendecomposition of the bivariate synchrony matrix as proposed by Allefeld et al. [82]. Initial work in this area focused on perceiving the oscillators as constituting a single cluster to which they participate in different degrees [83]. The existence of a single synchronization cluster is not a reasonable assumption since most complex networks usually consist of multiple clusters. In order to address this limitation, approaches based on the eigenvalue decomposition of the pairwise bivariate synchronization matrix have been proposed [84], [85]. However, it has recently been shown in cases where there are clusters of similar strength that are slightly synchronized with each other, the assumed one-to-one correspondence between eigenvectors and clusters is not realistic [86]. In order to capture the connectivity structure with a single number, Saito et al. [87] quantified global synchrony through the entropy of the eigenspectrum of the covariance or bivariate connectivity matrix. This measure was then generalized by Stam et al. [35] and others as the S-measure [36], [76], [88]. This measure uses the principle of time-delay embedding and indicates how strongly channel k at a given time is synchronized to all other channels. Similar to other methods in nonlinear dynamics, it requires the selection of different parameters, such as a threshold and the time-lag, and is computationally expensive. Recently, HPS was introduced as an alternative method to directly measure the multivariate phase synchronization among a group of oscillators [37], [38]. HPS generalizes bivariate synchrony, where the phase difference between two time series is mapped onto a unit circle, by mapping the N − 1 phase differences between consecutive oscillators onto an N-dimensional space parameterized by hyperspherical coordinates [37]. HPS is advantageous over the S-estimator thanks to its reduced computational complexity and robustness to noise [38]. However, as we show in Sec19 tion 3.3, we found that this estimator was highly dependent on the ordering of the phase differences parameterizing the hypersphere. In this chapter, we propose a novel measure to estimate the multivariate phase synchrony in a hyperdimensional coordinate system and address the shortcomings of HPS. Two complementary approaches are developed to quantify the circular variance of phase differences among multiple oscillators in a high dimensional space. In the first approach, we extend the hyperspherical coordinate system used in HPS to include redundancies, i.e. x and y coordinates of circles with varying radii, such that the ordering of the phases is not important. In the second approach, we propose a new mapping of the phase differences to a high-dimensional flat torus and compute the magnitude of the mean phase vector in this new geometry resulting in the hypertorus phase synchrony (HTS). We then show the equivalence of these two metrics, provide analytical bounds on the bias and variance of HTS and show bias correction for HTS squared. We compared the performance of HTS and the S-estimator on simulated networks of chaotic oscillators for sensitivity to coupling strength and network structure. 3.2 3.2.1 Background S-estimator The S-estimator at time t and frequency ω is computed as  S(t, ω) = 1 +  ∑ λm log(λm ) M m=1 log(M) , (3.1) where λm , m = 1, . . . , M are the eigenvalues of the bivariate synchronization matrix {PLVi, j (t, ω)}, i, j = 1, . . . , M, and M is the total number of oscillators in the network [35, 36]. S-estimator is 20 equivalent to 1 minus the entropy of the normalized eigenvalues of the PLV matrix. This measure equals to 1 when all the oscillators are pairwise highly synchronized. In that case, all the entries in the PLV matrix will be equal to one and thus only one eigenvalue will be equal to one. On the other hand, when all the oscillators in the network are not pairwise synchronized the PLV matrix is full rank and its eigenvalues are uniformly distributed, maximizing the entropy and resulting in zero multivariate synchrony. 3.3 Hyperspherical Phase Synchrony In this section, we describe the problem of HPS, which was published on [89]. Bivariate phase synchrony is based on the circular variance of the two-dimensional direction vectors on a unit circle (1-sphere), obtained by mapping the phase differences {Φk1,2 (t, ω)}k=1,...,N , where N is the total number of trials, between the two time-series onto a Cartesian coordinate system. If the circular variance of these direction vectors is low, the time-series are said to be locked to each other. HPS proposed in [37] is an extension of this idea to the multivariate case. Define k θ1k (t, ω), θ2k (t, ω), . . . , θM−1 (t, ω) (3.2) as the (M − 1) angular coordinates at time t and frequency ω for the kth trial, where θik (t, ω) = Φki (t, ω) − Φki+1 (t, ω) is the phase difference between the ith and (i + 1)th time series within a group of M oscillators. These (M − 1) angular coordinates are mapped onto an M-dimensional space by forming direction vectors in an M-dimensional hyperspherical coordinate system. For any natural number M, an (M − 1)-sphere of radius r is defined as the set of points in (M)-dimensional 21 Euclidean space which are at distance r from a central point, where the radius r may be any positive real number. The set of coordinates in an M-dimensional space, γ1 , γ2 , . . . , γM , that define an (M − 1)-sphere is represented by M r2 = ∑ (γi − ci )2 , (3.3) i=1 where c = [c1 , . . . , cM ] is the center point and r is the radius. In [37], r = 1 and the center point is the origin. k (t, ω)] Using the (M − 1) angular coordinates, a direction vector Γk (t, ω) = [γ1k (t, ω), . . . , γM can be formed by mapping the angular coordinates (θ1 , ..., θM−1 ) on a unit (M − 1)-sphere as: γ1k (t, ω) = cos (θ1k (t, ω)), γ2k (t, ω) = sin (θ1k (t, ω)) × cos (θ2k (t, ω)), γ3k (t, ω) = sin (θ1k (t, ω)) × sin (θ2k (t, ω)) × cos (θ3k (t, ω)), .. . k k k γM−1 (t, ω) = sin (θ1k (t, ω)) × . . . × sin (θM−2 (t, ω)) × cos (θM−1 (t, ω)), k k k and γM (t, ω) = sin (θ1k (t, ω)) × . . . × sin (θM−2 (t, ω)) × sin (θM−1 (t, ω)). (3.4) Based on this mapping HPS is defined as 1 N k HPS(t, ω) = ∑ Γ (t, ω) , N k=1 (3.5) 2 where HPS(t, ω) is the multivariate synchronization value at time t and frequency ω, k.k2 is the Euclidean norm and N is the number of trials. In the case of perfect multivariate phase synchro22 nization of the network, HPS is equal to 1 and it equals 0 when the oscillators are independent. Note that HPS is equivalent to PLV for a network consisting of two signals. In this case, M = 2 and from (3.4) the direction vector Γk (t, ω) = [γ1k (t, ω), γ2k (t, ω)], where γ1k (t, ω) = cos(θ1k (t, ω)) and γ2k (t, ω) = sin(θ1k (t, ω)). Hence, (3.5) is equivalent to (2.5). It can be shown that the HPS defined based on the coordinate system in (3.4) is dependent on the ordering of the phase differences θi (t, ω). This dependency will result in unstable HPS values and lead to incorrect interpretation of the multivariate synchrony. To illustrate this problem, we show the derivation of the HPS value for the case of three oscillators (M = 3). The rotating vectors in (3.4) can be written as, γ1k (t, ω) = cos (θ1k (t, ω)), γ2k (t, ω) = sin (θ1k (t, ω)) × cos (θ2k (t, ω)), γ3k (t, ω) = sin (θ1k (t, ω)) × sin (θ2k (t, ω)). (3.6) For simplicity, we further assume that we have only two trials with angular coordinates (or phase differences) {θ11 (t, ω), θ21 (t, ω)} and {θ12 (t, ω), θ22 (t, ω)}, respectively. The corresponding HPS given in (3.5) reduces to HPS(t, ω) = = v !2 !2 !2 u N N N 1u t ∑ γ1k (t, ω) + ∑ γ2k (t, ω) + ∑ γ3k (t, ω) N k=1 k=1 k=1 q 1 2 + 2 cos (θ11 (t, ω)) cos (θ12 (t, ω)) + 2 sin (θ11 (t, ω)) sin (θ12 (t, ω)) cos (θ21 (t, ω) − θ22 (t, ω)). 2 (3.7) In order to show that HPS is dependent on the ordering of the phase differences, we recalculate 23 the HPS with reordered angular coordinates {θ21 (t, ω), θ11 (t, ω)} and {θ22 (t, ω), θ12 (t, ω)}, HPS(t, ω) = 1 2 q 2 + 2 cos (θ21 (t, ω)) cos (θ22 (t, ω)) + 2 sin (θ21 (t, ω)) sin (θ22 (t, ω)) cos (θ11 (t, ω) − θ12 (t, ω)). (3.8) It is clear that (3.7) and (3.8) are not equivalent except in the case of perfect synchrony, i.e., θ11 (t, ω) = θ21 (t, ω), θ12 (t, ω) = θ22 (t, ω), θ11 (t, ω) = θ12 (t, ω) and θ21 (t, ω) = θ22 (t, ω). Therefore, the ordering of the phase differences θik (t, ω) plays a major role in calculating the corresponding HPS values. Thus, a modification of this definition is required to address this problem. In addition, in order to capture global phase information, we will replace the previously defined pairwise phase differences for HPS by the phase difference between the phase of each oscillator and the phase of the resultant vector of the remaining oscillators [90], given by       M k k k θi (t, ω) = Φi (t, ω) − arg ∑ exp( jΦm (t, ω)) .   m=1  (3.9) m6=i 3.4 3.4.1 Proposed Solution Hyperspherical Approach In this section, we propose a solution to the phase ordering problem encountered in HPS. This approach is based on the analysis of the hyperspherical coordinate system given in (3.4). The coordinates in (3.4) are equivalent to x coordinates of a rotating circle with varying radii. For example, γ1k (t, ω) is the x coordinate of a vector on the unit circle at angular position θ1k (t, ω), while γ2k (t, ω) is the x coordinate of a vector on a circle with radius sin(θ1k (t, ω)) at angular position 24 θ2k (t, ω). Similar analysis applies to the remaining θik (t, ω)s. Thus, every γik (t, ω) is just the x k coordinate of a vector on a circle with radius ri (t, ω) = ∏i−1 j=1 sin(θ j (t, ω)), for i = 2, 3, ..., M and with a phase θik (t, ω). The equation for ri (t, ω) shows that as i increases, γik (t, ω) will have less impact on the overall synchrony. This means that the choice of the first phase difference, θ1k (t, ω), will have a high impact on the measured synchrony. Eq. (3.4) may also be interpreted as follows. Every γik (t, ω) is the x projection of the y coordik (t, ω) on the x-axis with a phase θ k (t, ω), i.e. define x and y coordinates nate of the previous γi−1 i of the rotating vector for each trial k as γxk1 (t, ω) = cos (θ1k (t, ω)), γyk1 (t, ω) = sin (θ1k (t, ω)), γxk2 (t, ω) = sin (θ1k (t, ω)) × cos (θ2k (t, ω)), γyk2 (t, ω) = sin (θ1k (t, ω)) × sin (θ2k (t, ω)), γxk3 (t, ω) = sin (θ1k (t, ω)) × sin (θ2k (t, ω)) × cos (θ3k (t, ω)), γyk3 (t, ω) = sin (θ1k (t, ω)) × sin (θ2k (t, ω)) × sin (θ3k (t, ω)), .. . k γxkM (t, ω) = sin (θ1k (t, ω)) × . . . × sin (θM−1 (t, ω)) × cos (θMk (t, ω)), k γykM (t, ω) = sin (θ1k (t, ω)) × . . . × sin (θM−1 (t, ω)) × sin (θMk (t, ω)), (3.10) where the phases θik (t, ω)s are defined as in (3.9) and the superscripts x and y refer to the projection coordinates. We can also rewrite (3.10) as, 25 γxk1 (t, ω) = cos (θ1k (t, ω)), γxk2 (t, ω) = γyk1 (t, ω) × cos (θ2k (t, ω)), γxk3 (t, ω) = γyk2 (t, ω) × cos (θ3k (t, ω)), .. . γxkM (t, ω) = γykM−1 (t, ω) × cos (θMk (t, ω)), γykM (t, ω) = γykM−1 (t, ω) × sin (θMk (t, ω)). (3.11) k Eq. (3.11) reveals that the radius rik (t, ω) = ∏i−1 j=1 sin(θ j (t, ω)), for i = 2, 3, ..., M is just the y coordinate of the previous γyki−1 (t, ω). This recursive structure is the cause of the ordering problem. To solve this problem, we propose to consider both the x and y coordinates for all oscillators. By computing the l2 norm, dik (t, ω), of the direction vectors for each oscillator i using the coordinates γxki (t, ω) and γyki (t, ω), we end up with the following norms, d1k (t, ω) = 1, d2k (t, ω) = sin (θ1k (t, ω)), d3k (t, ω) = sin (θ1k (t, ω)) × sin (θ2k (t, ω)), .. . k k dM (t, ω) = sin (θ1k (t, ω)) × . . . × sin (θM−1 (t, ω)), (3.12) 26 k or simply dik (t, ω) = rik (t, ω) = ∏i−1 j=1 sin(θ j (t, ω)) for i = 2, 3, ..., M. Thus, in order to get rid of the dependency on the phase ordering, we propose to normalize γxki (t, ω) and γyki (t, ω) by dik (t, ω). This will result in unit radius for all i. Therefore, the modified multivariate phase synchrony measure is given by N 1 k √ ∑ D (t, ω) , HPS(t, ω) = N × M k=1 2 where Dk (t, ω)  (3.13)  = d k (t,ω) , d k (t,ω) , . . . , d k (t,ω) , d k (t,ω) . As the l2 norm of each vector Dk (t, ω) in the γxk1 (t,ω) γyk1 (t,ω) 1 γxkM (t,ω) γykM (t,ω) M 1 M √ above equation is equal to M, to make the definition and range of HPS consistent with PLV (see √ (2.5)) we normalize HPS by M. The modified measure given in (3.13) can be rewritten as v u 1 u HPS(t, ω) = √ t N M N γ k (t, ω) ∑ dxk1(t, ω) k=1 1 !2 By noting that cos (θik (t, ω)) = γyk1 (t, ω) N + ∑ d k (t, ω) k=1 γxki (t,ω) dik (t,ω) 1 !2 N γ k (t, ω) + · · · + ∑ xkM k=1 dM (t, ω) and sin (θik (t, ω)) = γyki (t,ω) dik (t,ω) !2 N + γykM (t, ω) ∑ d k (t, ω) k=1 !2 (3.14) . M we can write the modified HPS as q 1 HPS(t, ω) = √ PLV12 (t, ω) + · · · + PLVM2 (t, ω) M s 1 M = ∑ PLVi2(t, ω), M i=1 (3.15) where PLVi2 quantifying the synchronization of each oscillator with respect to a common reference 27 angle with θik (t, ω) as defined in (3.9) and PLVi is given by   1 N PLVi (t, ω) = ∑ exp jθik (t, ω) N k=1 q hcos θik (t, ω)i2 + hsin θik (t, ω)i2 . = (3.16) The maximum value of HPS(t, ω) is 1, when there is complete phase synchronization among oscillators. On the other hand, HPS(t, ω) is theoretically 0 when the oscillators are independent. 3.4.2 Hypertorus Synchrony Results found in the previous section can alternatively be derived from an alternative mapping: the Cartesian product of unit circles parameterized by phase differences as given in (3.9). In a network consisting of M oscillators, consider a phase θik (t, ω) that parameterizes the unit circle S1 ⊂ R2 by    the angular coordinates S1 = cos θik , sin θik | 0 ≤ θik ≤ 2π . Let another unit circle S1 ⊂ R2 n     o be parameterized by the angular coordinates S1 = cos θ jk , sin θ jk | 0 ≤ θ jk ≤ 2π . The Cartesian product S1 × S1 defines the manifold 2 1 1 T = S ×S = n         o k k k k k k cos θi , cos θ j , sin θi , sin θ j | 0 ≤ θi , θ j ≤ 2π , (3.17) which is embedded in R4 . The M-dimensional flat torus Tm ⊂ R2m is the manifold Tm = S1 × 2 + y2 = 1. It is parameterized by x = cos (θ (t, ω)) and · · · × S1 defined by x12 + y21 = ... = xM i i M yi = sin (θi (t, ω)) [91]. A Riemannian metric g on a n-dimensional manifold M defines an inner product between tangent vectors in each tangent space Tp M for every point p ∈ M [91]. A Riemannian manifold 28 (M, g) is a differentiable manifold equipped with a Riemannian metric [92]. Thus, for every point p in (M, g) the length of any tangent vector X ∈ Tp M is given by |X|= hX, Xi1/2 [91]. The Cartesian product between two Riemannian manifolds (M1 , g1 ) and (M2 , g2 ) is equipped with the product metric g = g1 ⊕ g2 , which is defined as [91] g (X1 + X2 ,Y1 +Y2 ) = g (X1 +Y1 ) + g (X2 +Y2 ) , (3.18) where Xi ,Yi ∈ Tpi Mi and T(p1 ,p2 ) (M1 × M2 ) = Tp1 M1 ⊕ Tp2 M2 . A torus Tm is locally isometric to Euclidean space, meaning that every point on Tm has a neighborhood that is isometric to an open set in Rm [91], which results in a manifold whose curvature is zero everywhere and its tangent spaces are identical to the manifold [93]. Hence, Tm is a flat Riemannian manifold equipped with the Euclidean metric [94]. For a group of M oscillators, vector (3.19) lies in Tm k Γk (t, ω) = [x1k , yk1 , ..., xM , ykM ], (3.19)   where xik = cos θik (t, ω) , yki = sin θik (t, ω) , and θik (t, ω) is a phase difference as defined in (3.9) for the kth trial. HT S(t, ω) can then be defined as HT S(t, ω) = N 1 √ k ∑ Γk (t, ω) k2 , N M k=1 29 (3.20) HT S(t, ω) = = = = v !2 !2 !2 u N N N 1 u t k k k √ ∑ cos θ1 (t, ω) + ∑ sin θ1 (t, ω) + · · · + ∑ cos θM (t, ω) + N M k=1 k=1 k=1 q 1 k (t, ω)i2 + hsin θ k (t, ω)i2 √ hcos θ1k (t, ω)i2 + hsin θ1k (t, ω)i2 + · · · + hcos θM M M q 1 √ PLV12 (t, ω) + · · · + PLVM2 (t, ω) M s 1 M ∑ PLVi2 (t, ω), M i=1 !2 N ∑ k (t, ω) sin θM k=1 (3.21) where M is the number of oscillators and N is the total number of trials. HTS can be re-expressed as shown in (3.21), which is equivalent to (3.15). Throughout the rest of this article we will use HTS to refer to both approaches. 3.4.3 Computational Complexity HTS involves the computation of M PLVs, with complexity O(M log n) per time-frequency point [95], where n is the number of points used in the fast Fourier transforms (ffts) in the computation of the time-frequency distribution (usually equal or greater than the length of the signal). The computation of one square root has complexity O(m2 ) when computed through the Newton-Raphson Method [96], where m corresponds to the minimum of the number of bits from the two numbers being multiplied (32 or 64 bits for double precision). Thus, the total computational complexity of HTS is O(M log n) + O(m2 ). On the other hand, the computational complexity of the S-estimator  relies on the computation of M2 PLVs for the construction of the synchronization matrix and its   eigendecomposition. Computing M2 PLVs has a complexity of O( M2 log n), which can be approximated as O( M2 log n) for large M. The eigendecomposition of the synchronization matrix has complexity O(M 3 ) [97]. Thus, the total computational complexity of the S-estimator is O( M2 log n) + O(M 3 ). Therefore, the proposed metric is computationally more efficient than the S-estimator. 30 3.5 Statistical assessment of HTS dS(t, ω) In this section, we assessed the asymptotic properties of the expected value and variance of HT in the absence of synchrony as well as for different levels of synchrony. Finally, as previously done dS2 and evaluated its variance empirically. for PLV, we found an unbiased estimator for HT 3.5.1 Bias Due to its dependence on PLV, the proposed measure also exhibits a bias which is dependent on the number of trials. We will first illustrate this dependency by assuming a Von Mises distribution for phase differences. The Von Mises distribution V M(θ , κ) is the most common model for circular data [90]. It is defined by the reference direction, θ , and its dispersion about that direction, κ. Its probability density function is given by f (θ ) = [2πI0 (κ)]−1 exp [κ cos (θ − µ)] , 0 ≤ θ < 2π, 0 ≤ κ < ∞, where I0 (κ) = (2π)−1 R 2π 0 (3.22) exp [κ cos (φ − µ)] is the modified Bessel function of order zero [90]. dS, Figure 3.1 illustrates the theoretical and experimental multivariate synchrony, HTS and HT respectively, for different levels of synchronization in a network consisting of M = 4 oscillators. Here we are assuming that the phase differences in (3.15) and (3.21) are equally distributed according to V M(0, κ) for simplicity and various levels of synchrony are obtained by varying the concentration parameter κ. As observed, the bias of HTS depends on the underlying distribution 31 of the angles θi , bias being the most prominent in the absence of synchrony, or when θi is uniformly distributed. In addition, the bias is dependent on the sample size and results based on small sample sizes should be interpreted carefully. dS. A lower bound In this thesis, we further assessed the bounds on the bias and variance of HT on bias can be found from the inequality for arithmetic and quadratic means [98] as PLV1 (t, ω) + · · · + PLVM (t, ω) ≤ M s PLV12 (t, ω) + · · · + PLVM2 (t, ω) , M (3.23) d (t, ω) ∈ [0, 1]. where the absolute value in the original inequality is no longer required since PLV An upper bound can be found as s PLV12 (t, ω) + · · · + PLVM2 (t, ω) PLV1 (t, ω) + · · · + PLVM (t, ω) √ ≤ . M M (3.24) dS(t, ω) can be found as Thus, the lower and upper bounds on the expected value of HT "s # 1 M d2 ∑ PLV i (t, ω) M i=1 " # 1 M d ≥ E ∑ PLV i(t, ω) M i=1 i 1 M hd E PLV (t, ω) , = i ∑ M i=1 i h d E HT S(t, ω) = E (3.25) and " # h i M 1 dS(t, ω) ≤ E √ ∑ PLV d i (t, ω) , E HT M i=1 32 (3.26) respectively. 1 Actual HTS − − − True HTS 0.8 HTS 0.6 0.4 0.2 0 0 50 100 Sample Size 150 200 dS (solid lines) and true HTS (dashed lines) for synchrony values of 0, 0.20, 0.40, Figure 3.1: HT 0.60, 0.80 and 0.99. M = 4 oscillators were simulated with phases θi distributed as V M(0, κ). h i d (t, ω) in the absence of synchrony has been previously An approximate value for E PLV i h i h dS(t, ω) can be bounded as d (t, ω) ≈ √1 [30]. Hence, E HT found to be E PLV N i rM h 1 dS(t, ω) ≤ √ ≤ E HT . N N (3.27) Eq. (3.27) shows that in a network in which all oscillators are independent, the minimum dS can attain is inversely proportional to the square root of the total number possible value that HT d . On the other hand, its upper bound is directly of trials or observations, as previously found for PLV proportional to the number of oscillators. h i d (t, ω) , or the mean resultant length as known in the circular Asymptotic results for E PLV statistics community, have been found for the Von Mises distribution with mean direction θ = 0 33 and concentration parameter κ > 0 [99] as h i d (t, ω) = A(κ) + 1 , E PLV 2Nκ where A(κ) = I1 (κ) I0 (κ) (3.28) and Ii (κ) is the modified Bessel function of the first kind of the ith order. Considering all M oscillator’s θi s to be identically distributed and substituting (3.28) into (3.25) and (3.26) yields    h i √  1 1 d A (κ) + ≤ E HT S(t, ω) ≤ M A (κ) + . 2Nκ 2Nκ 3.5.2 (3.29) Variance dS(t, ω) we will define HT dS2 (t, ω) as In order to find an upper bound on the variance of HT M dS2 (t, ω) = 1 ∑ PLV d 2i (t, ω). HT M i=1 (3.30) Taking expectation on both sides yields # 1 M d2 ∑ PLV i (t, ω) M i=1 i 1 M hd2 = E PLV (t, ω) , ∑ i M i=1 h i dS2 (t, ω) = E E HT " (3.31) 2 d (t, ω) where the linearity property of expectation has been employed. The expected value of PLV is a well known expression [34], [99], given by 34  h i 1  2 1 d i (t, ω) = + 1 − E PLV PLVi2 (t, ω). N N (3.32) Thus, substituting (3.32) in (3.31) yields  h i 1  2 1 d E HT S (t, ω) = + 1 − HT S2 (t, ω). N N (3.33) dS(t, ω) in the absence of synchrony can be found as An upper bound on the variance of HT i2 i h   h dS(t, ω) dS2 (t, ω) − E HT Var HT\ S(t, ω) = E HT     1 1 2 1 2 + 1− HT S (t, ω) − √ ≤ N N N 1 = 1 − HT S2 (t, ω). N (3.34) dS can Thus, in the absence of synchrony, the maximum possible value that the variance of HT attain is 1. In the case of V M(0, θ ) the upper bound for the variance is       1  1 1 1 2 2 d + 1− HT S (t, ω) − A (κ) + 2A (κ) + . Var HT S(t, ω) ≤ N N 2Nκ 4N 2 κ 2 where phase differences θi are drawn from V M(0, κ). 35 (3.35) 0.04 HTS = 0.20 HTS = 0.40 HTS = 0.60 HTS = 0.80 HTS = 0.99 0.035 Variance 0.03 0.025 0.02 0.015 0.01 0.005 0 20 40 60 80 100 120 Sample Size 140 160 180 200 dS(t, ω). (a) Upper bounds for variance of HT −3 8 x 10 HTS = 0.20 HTS = 0.40 HTS = 0.60 HTS = 0.80 HTS = 0.99 7 Var(HTS) 6 5 4 3 2 1 0 20 40 60 80 100 120 Sample Size 140 160 180 200 dS(t, ω). (b) Empirical variance of HT Figure 3.2: (a) Theoretical upper bounds for the variance of HTS; and (b) empirical variance of HTS as a function of sample size in a network of M = 4 oscillators for different synchronization levels in the Von Mises distribution in (3.35). dS(t, ω) and its empirical Figure 3.2 (a) and (b) show the upper bounds on the variance of HT variance, respectively, for various levels of synchrony in a network consisting of M = 4 oscillators. 36 dS(t, ω) decreases as the number of trials or observations From Figure 3.2 (a), the variance of HT increase as well as when the global synchronization increases. Figure 3.2 (b) shows the variance dS(t, ω) obtained empirically for various levels of synchrony. It is observed that the empirical of HT variance follows similar trends as those obtained by the upper bound, without attaining it. 3.5.3 dS2 Correction of Bias in HT dS2 (t, ω) As in the case of PLV [34, 100], it is straightforward to find an unbiased estimator of HT dS(t, ω). Eq. (3.33) suggests that the bias in HT dS arises from the bias in PLV d. rather than for HT dS2 (t, ω) can be found as An unbiased estimator for HT 2 dSUB HT (t, ω) =  1 d2 HT S (t, ω) × N − 1 . N −1 (3.36) d 2 (t, ω) in (3.31) by its unbiased estimator This result is obtained similarly by substituting PLV previously found in [34], [100] d 2i(UB) (t, ω) = PLV = N−1 N 2 ∑ ∑ cos (φim (t, ω) − φin (t, ω)) N(N − 1) n=1 m=n+1  1 d2 PLV i (t, ω) × N − 1 . N −1 (3.37) 37 1 Actual HTS2 − − − True HTS2 0.8 HTS2 0.6 0.4 0.2 0 0 50 100 Sample Size 150 200 dS2 (solid lines) and true HT S2 (dashed lines) for true HT S values of 0, 0.20, 0.40, Figure 3.3: HT 0.60, 0.80 and 0.99. 0.012 HTS = 0 HTS = 0.20 HTS = 0.40 HTS = 0.60 HTS = 0.80 HTS = 0.99 Var(HTS2hat) − − − Var(HTS2 ) 0.01 UB Var(HTS2) 0.008 0.006 0.004 0.002 0 20 40 60 80 100 120 Sample Size 140 160 180 200 dS2 (solid lines) and true HT S2 (dashed lines) for true HT S values of 0, Figure 3.4: Variance of HT 0.20, 0.40, 0.60, 0.80 and 0.99. 2 dS2 (t, ω) and HT dSUB Figs. 3.3 and 3.4 illustrate the expected value and variance of HT (t, ω) for d 2 , the variance of the unbiased estimator various synchrony values. As previously reported for PLV 38 is slightly higher than that of the biased estimator for small sample sizes. 3.6 Results In this section, results of multivariate phase synchronization on simulated and EEG data are presented. First, the robustness to noise of proposed measure is evaluated on a network of highly synchronized sinusoidal oscillators. Next, we asses the effect of the number of oscillators on the synchrony measures. In addition, we evaluate the multivariate synchrony of a network of Kuramoto oscillators with varying coupling strengths. Next, the sensitivity to global coupling is evaluated for various Rössler oscillators, followed by the assessment of the topographical sensitivity of the proposed measure. Finally, we present a detailed analysis of global connectivity in the cognitive control experiment. 3.6.1 Assessment of robustness to noise of multivariate phase synchrony measures In the first simulation, the performance of HTS was evaluated for various signal-to-noise ratio (SNR) levels in a network of synchronized cosine oscillators and compared to S-estimator. A network of 8 sinusoidal oscillators with constant phase differences is defined as   (i − 1) + ηi (t), i = 1, . . . , 8, t = 1, 2, . . . , 512, xi (t) = cos 80πt + π 8 (3.38) where ηi (t) is independent white Gaussian noise, with SNR between -20 and 30 dB in steps of 2 dB, for a total of N=200 trials and the signal length equal to T = 512 samples. S-estimator and HTS were computed by first estimating the instantaneous phase at the frequency of interest, 40 Hz, 39 through Hilbert transform and then evaluating the synchrony through (3.1) and (3.20). S-estimator T T ¯ S = 1 ∑t=1 HT S(t) and S̄ = T1 ∑t=1 S(t), where and HTS were then averaged over time to obtain HT T T = 512. Figure 3.5 shows the average multivariate synchrony, for HTS and S-estimator as a function of SNR in the network of sinusoidal oscillators given by (3.38). Both, HTS and S-estimator result in multivariate synchrony equal to 1 for high SNR. However, HTS is more robust to noise as the S-estimator shows a sharper decrease for SNR values less than 10 dB. At low SNR, S-estimator approaches 0 whereas HTS remains constantly high due to bias in the PLV estimator [30], [34], [100]. 1 Multivariate Synchrony HTS S−estimator 0.8 0.6 0.4 0.2 0 −30 −20 −10 0 10 SNR (dB) 20 30 40 Figure 3.5: Multivariate synchrony for a network of highly synchronized sinusoidal oscillators. 3.6.2 Effect of number of oscillators on the multivariate synchrony measures In order to show how the number of oscillators in a network affects the accuracy of the estimators, S-estimator was computed for four simulated bivariate synchronization matrices whose off-diagonal entries were uniformly distributed between [0.05, 0.15], [0.25, 0.35], [0.45, 0.55] and 40 [0.65, 0.75]. The HTS was computed by letting the PLVs in 3.16 to take values from the same four intervals as for the S-estimator. These simulation were repeated 100 times. The number of oscillators in the network was incremented in powers of 2, starting at 4 up to 256. In a network where all of the oscillators have similar bivariate pairwise synchrony, we would expect the multivariate synchrony to be around that value. As shown on Figure 3.6, the multivariate synchrony obtained from the S-estimator varies depending on the number of oscillators in the network. In general, the S-estimator results in a lower synchronization value than expected, where the largest differences occur at higher synchrony. HTS, on the other hand, is not affected by the number of oscillators since it is computed through the l2 -norm and gives synchrony values close to the true value. In order to understand the sensitivity of the S-estimator to the number of oscillators, Figure 3.7 shows the normalized eigenvalues of a synchronization matrix whose off-diagonal entries are close to 0.4, for 4, 8, 12 and 16 oscillators. As Figure 3.7 suggests, increasing the number of oscillators results in a reduction of the entropy. Also, as the number of oscillators increases the largest normalized eigenvalue decreases. These findings show how the S-estimator is affected by the number of oscillators and results in lower multivariate synchrony than the true global synchronization in the network, potentially due to approximating the entropy through a finite number of samples. These results suggest that multivariate synchrony using the S-estimator could be underestimated unless the total number of oscillators in the network is high. 41 0.7 Multivariate Synchrony − S−estimator ... HTS 0.1 0.3 0.5 0.7 0.8 0.6 0.5 0.4 0.3 0.2 0.1 0 4 8 16 32 64 128 256 Number of Oscillators Figure 3.6: Effect of number of oscillators on S-estimator and HTS for mean synchrony values of 0.1, 0.3, 0.5, and 0.7. 42 1 0.8 0.8 0.6 0.6 value value 1 0.4 0.2 0.2 0 0.4 1 1.5 2 2.5 eigenvalue 3 3.5 0 4 1 2 3 1 1 0.8 0.8 0.6 0.6 0.4 7 8 0.4 0.2 0.2 0 6 (b) 𝑀 = 8 value value (a) 𝑀 = 4 4 5 eigenvalue 0 2 4 6 eigenvalue 8 10 0 12 0 5 (c) 𝑀 = 12 10 eigenvalue 15 20 (d) 𝑀 = 16 Figure 3.7: Effect of number of oscillators (M) on the eigenvalues for a true multivariate synchrony value of 0.4, a) M = 4; b) M = 8; c) M = 12; and d) M = 16. 3.6.3 Kuramoto Model In order to evaluate the performance of the proposed measure as a function of coupling strength, we computed the multivariate synchrony in a large network of coupled oscillators as presented by Kuramoto [101]. Kuramoto model describes a system consisting of multiple oscillators with different natural frequencies which synchronize to a common frequency after their coupling exceeds a certain threshold [102]. This model has been used to describe many physical phenomena, ranging from unicellular organisms [103] to the neurosciences [74], [104]. Phase dynamics governing the cooperative synchronization among M oscillators are given by dφi K = ωi + dt M M ∑ sin(φ j − φi), j=1 43 (3.39) where φi corresponds to the phase of the ith oscillator, ωi is its natural frequency and K corresponds to the coupling strength, which is equal among all oscillators. The natural frequency of each oscillator is chosen randomly from a Lorentzian distribution given by g (ω) = γ h i, π γ 2 + (ω − ωo )2 (3.40) with mean ωo and width γ. Kuramoto found that oscillators are desynchronized until K exceeds a critical value Kc = 2γ. Exceeding Kc separates the oscillators into two groups: one that contributes to the synchronization of the system and another whose natural frequencies come from the tails of the distribution and contribute to desynchronization of the system [105]. As K increases, the group of synchronized oscillators increases until all oscillators are synchronized. A network consisting of M = 64 oscillators was simulated and the time-varying phases φi (t) were solved numerically via Runge-Kutta with a time step of ∆t = 0.0078 s, which results in a sampling frequency of 128 Hz. The natural frequencies of each oscillator are drawn from a Lorentzian distribution as given by (3.40) where ωo = 40 rad/s and γ = 1. This results in a Kc = 2γ = 2. The signal length was 2048 samples, and the first 500 samples were discarded to avoid transients. Figure 3.8 shows multivariate synchrony estimated from HTS and the S-estimator as K increased from 0 to 9 in increments of 0.5. We expect to observe low synchrony for K < Kc with a sudden increase in synchrony after Kc . When K = 0 multivariate synchrony from both S-estimator and HTS is greater than 0, which indicates bias on the estimators when phases come from an uniform distribution. On the other hand, HTS is more sensitive to the increase of global synchronization for K = Kc = 2 compared to the S-estimator. The standard deviation of both estimators is maximal around Kc [104], with S-estimator showing less variance than HTS since it is a weighted 44 average of all bivariate PLVs, obtained from the eigendecomposition. Finally, when the system is fully synchronized HTS approaches 1 as expected. 1 Multivariate Synchrony 0.9 HTS S−estimator 0.8 0.7 0.6 0.5 0.4 0.3 0.2 −2 0 2 4 K 6 8 10 Figure 3.8: Comparison of mean and standard deviation of multivariate synchrony (HTS and Sestimator) within a Kuramoto network with Kc = 2. 3.6.4 Rössler oscillator In order to test multivariate synchrony under different network configurations we used a Rössler oscillator model. Rössler oscillators describe a system of weakly coupled self-sustained stochastic oscillators [106]. We modeled a network consisting of 6 oscillators coupled through their x-dimension [107]. Eight different configurations are considered, illustrated in Fig 3.9. It is expected that networks 1 and 2 will exhibit low synchrony, and network 8 will result in multivariate synchrony close to 1. Dynamics governing the networks under study are given by 45 ξ˙j   Ẋ j      = Y˙j      Z˙ j  " Ẋ j = −ω jY j − Z j + ∑ εi j Xi − X j  i6= j   = ˙ −ω j X j − aY j Y j =   Z˙ j = b + Xj − c Zj #   + σ η j   ,    (3.41) where i, j = 1, 2, ..., 6, a = 0.35, b = 0.2, c = 10, ω1 = 1.05, ω2 = 1.03, ω3 = 1.01, ω4 = 0.99, ω5 = 0.97, ω6 = 0.95, εi j = ε ji = 0.5, σ = 1.5 and η j is white Gaussian noise. The differential equations were solved by the Runge-Kutta method at a time step of 0.067 seconds. Simulations were repeated 200 times, for a signal length of 2000 samples and sampling frequency of 15 Hz. Table 1 compares multivariate synchrony evaluated using HTS and S-estimator for each of the eight Rössler networks presented in Figure 3.9. The second and third columns show results for HTS and S-estimator (mean±st.dev.) computed according to (3.20) and (3.1), respectively. Multivariate synchrony values obtained from both measures are comparable and align with our expectations for all networks. For both methods, the multivariate synchrony results for each network is significantly different from that obtained from a null network in which none of the oscillators is connected, i.e. εi j = 0, (Wilcoxon rank sum test, p<0.01). The two networks differ in their behavior only for networks 5 and 6. In the case of network 5, multivariate synchrony obtained from HTS is higher than that from network 6, whereas it is the opposite for S-estimator. In network 5, four out of six oscillators are all interconnected with only 46 two isolated oscillators contributing to low synchrony. Since HTS relies on the root mean square of PLVs with one PLV computed between each oscillator and the mean phase, there will be only two PLVs with low synchrony. On the other hand, in network 6 although there are two sub-networks that are fully synchronized these are not interconnected and hence the global synchrony of the network should not be as high as in network 5 as indicated by HTS. This result is also observed from the unbiased squared HTS and S-estimator, as shown in the fourth and fifth columns of Table 2 is computed as in (3.36) and S2 is obtained by using unbiased 1, respectively. Here, HT SUB UB PLV 2 as in (3.37). Note that network 4 also contains 6 connections and results in higher synchrony than networks 5 and 6. This is due to the indirect connections that emerge when oscillators are interconnected through a third oscillator. 47 1 1 2 3 6 5 5 4 1 3 5 1 2 4 (f) Network 6 1 2 3 5 3 5 4 6 2 6 (e) Network 5 1 4 (d) Network 4 3 5 3 5 4 6 2 6 (c) Network 3 1 4 (b) Network 2 2 6 3 6 (a) Network 1 1 2 2 3 6 5 4 4 (h) Network 8 (g) Network 7 Figure 3.9: Eight Rössler networks. In order to assess the effect of the number of oscillators in the computed synchrony values, we 48 constructed two subnetworks consisting of 3 oscillators each (as in Figure 3.9 (f)) and increased the number of oscillators in the network to 9 and 12. Table 2 shows the results for both HTS and S-estimator as the number of oscillators increases. Note that the first case, 6 oscillators, is the same as network 6 in Figure 3.9. For both methods, as the number of oscillators increases the multivariate synchrony decreases as there are more non-synchronized oscillators in the network. This trend is also observed from the unbiased estimators of HT S2 and S2 . Finally, we assessed the effect of the number of subnetworks on the multivariate synchrony measures. Table 3 shows the results for HTS, S, their squared unbiased estimators for different number of subnetworks of three oscillators in a network of 12 oscillators. As expected, increasing the number of subnetworks increases the multivariate synchrony for both estimators. Table 3.1: Multivariate synchrony (mean±st.dev.) in networks of Rössler oscillators. Network HTS 2 HT SUB S 2 SUB 1 0.226±0.023 0.267±0.004 0.051±0.010 0.254±0.002 2 0.612±0.028 0.474±0.016 0.375±0.034 0.384±0.009 3 0.944±0.029 0.851±0.059 0.892±0.055 0.765±0.086 4 0.985±0.000 0.945±0.001 0.971±0.000 0.907±0.001 5 0.800±0.009 0.580±0.012 0.640±0.015 0.523±0.007 6 0.765±0.027 0.673±0.011 0.586±0.042 0.610±0.005 7 0.980±0.001 0.929±0.002 0.960±0.002 0.881±0.004 8 0.999±0.000 0.996±0.000 0.998±0.000 0.993±0.000 49 Table 3.2: Multivariate synchrony (mean±st.dev.) for different number of oscillators containing two subnetworks of three oscillators. Number of HTS S 2 HT SUB 2 SUB Oscillators 6 0.765±0.027 0.673±0.011 0.585±0.042 0.610±0.005 9 0.682±0.017 0.457±0.011 0.463±0.024 0.372±0.009 12 0.613±0.015 0.348±0.008 0.373±0.018 0.263±0.006 Table 3.3: Multivariate synchrony (mean±st.dev.) for a network consisting of 12 oscillators for different number of subnetworks composed of three oscillators. Number of HTS S 2 HT SUB 2 SUB Subnetworks 3.6.5 1 0.531±0.009 0.252±0.009 0.278±0.010 0.156±0.006 2 0.613±0.015 0.348±0.008 0.373±0.018 0.263±0.006 3 0.735±0.031 0.506±0.010 0.538±0.018 0.398±0.010 4 0.757±0.012 0.594±0.013 0.571±0.025 0.492±0.012 Assessment of topographical sensitivity The topographical sensitivity of multivariate phase synchrony measures is evaluated on a network consisting of 58 chaotic non-identical Colpitts oscillators [36]. Colpitts oscillators have been employed in the assessment of topographical sensitivity since they are chaotic non-symmetrical oscillators that generate irregular sinusoidal signals similar to EEG [108]. Eq. (3.42) describes the (i) dynamics of oscillator i. In this network, oscillators are coupled through x2 , and Ci j indicates the coupling from oscillator j to oscillator i; k = 0.5, and g, Q and α are chosen randomly between 50 the intervals [4.006, 4.428], [1.342, 1.483], and [0.949, 0.999], respectively. (i) dx1 dt = (i) g (i) [α(e−x2 − 1) + x3 ] Q(1 − k) = 58 (i) g (i) ( j) (i) [(1 − α)(e−x2 − 1) + x3 ] + ∑ Ci j (x2 − x2 ) Qk j=1 (i) dx2 dt (i) dx3 dt = − Qk(1 − k) (i) 1 (i) (i) [(x1 − x2 ) − x3 ]. g Q (3.42) A network consisting of bidirectional connections among electrodes 9, 10, 11, 18, 19, 20, 27, 28, and 29 with the nonlinear dynamics described (3.42) was simulated for 100 repetitions. Here, coefficients Ci j are set to zero for all i and j except for i, j = {9, 10, 11, 18, 19, 20, 27, 28, 29}, corresponding to electrodes F1, FZ, F2, FC1, FCZ, FC2, C1, CZ, and C2 in Figure 3.10. For this network, it is expected that electrode FCz will show the highest degree of multivariate synchrony. This network is simulated via the Heun method with δt = 0.04 s. The signals are 100 seconds-long, the first 40 seconds are discarded to remove transients and signals are down-sampled to a sampling frequency of 12.5 Hz. In addition, white Gaussian noise was added and multivariate synchrony was assessed for SNR levels of 0 dB, 10 dB and 20 dB. S-estimator and HTS were computed at each electrode, for each time and frequency point among each electrodes first nearest neighbors, resulting in groups of five electrodes. For example, for this simulated network the S-estimator and HTS at electrode FCZ consider electrodes FZ, FC1, FCZ, FC2, and CZ. Figure 3.11 shows the topographical plots for HTS and S-Estimator for the three SNR levels considered. Overall, HTS and S-estimator result in similar topographical maps, demonstrating good topographical sensitivity. Under additive noise, HTS results in higher multivariate synchrony among the electrodes comprising the simulated network, being more robust to noise when com- 51 pared to S-estimator. Figure 3.10: SynAmps2-64 EEG system and network under test. 52 HTS S-estimator 20 dB (a) HTS 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0 0 (b) S-estimator HTS 1 10 dB S-estimator 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 (d)0 S-estimator (c) HTS HTS 1 0 dB 0 S-estimator 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4 0.3 0.2 0.1 0.3 0.2 0.1 0 0 (e) HTS (f) S-estimator Figure 3.11: Topographical plots showing the sensitivity of HTS and S-estimator at various SNR levels. a), c) and e) HTS, 20 dB, 10 dB and 0 dB, respectively; b), d) and f) S-estimator, 20 dB, 10 dB and 0 dB, respectively. 53 First, we were interested in comparing the proposed multivariate measure with respect to conventional bivariate PLV. For this purpose, we compared the multivariate synchrony among electrodes FCz, F5 and F6 with the pairwise synchrony between FCz and F5 and FCz and F6 obtained from both error and correct responses for the 25-75 ms time interval, as this time interval contains the peak of the ERN and has been used in prior phase synchrony studies [29], [27]. These electrode pairs were selected as they are commonly used in assessing error-correct synchronization differences during cognitive control [27], [109]. The statistical significance of the difference between error and correct synchrony was investigated by performing a t-test for HTS and a two-way ANOVA for the two PLVs. Results are shown in Table 1 and indicate that HTS is more sensitive to the difference between error and correct responses across the central and lateral frontal regions when compared to PLV. This example shows the benefit of computing multivariate synchronization over bivariate pairs. Table 3.4: Statistical significance of error-correct responses obtained from PLV and HTS. Synchrony p-value PLV: FCz-F5 0.0209 PLV: FCz-F6 HTS 1.019e-10 Topographical connectivity for EEG data is investigated following the same definition of electrode neighborhoods as in Section 3.6.5. HTS and S-estimator were computed for both error and correct responses separately, at each electrode as described in Section 3.6.5, obtaining a multivariate synchrony value for each time and frequency point and electrode for each subject. Next, for each subject, the multivariate synchrony was averaged over different time intervals of interest, and frequency bins corresponding to the theta band (4-8 Hz) at each electrode. Here, we inves54 tigated the dynamic nature of functional connectivity by looking at multivariate synchronization over different post-response time intervals. Figure 3.12 shows the topographical distribution of multivariate synchrony for error minus correct responses averaged over subjects estimated from HTS and S-estimator for four time intervals: 0-25 ms, 25-50 ms, 50-75 ms and 75-100 ms, and in the theta frequency band. From these figures, it is observed that HTS results in higher synchrony for error-correct difference for the frontal and central electrodes when compared to the centralparietal electrodes for all intervals. The S-estimator, on the other hand, does not indicate much variation across time and the error-correct synchrony differences are close to zero for most brain regions. For HTS, time intervals of 25-50 ms and 50-75 ms show moderate increase in synchrony in the central-frontal regions compared to the other time windows. 55 0 – 25 ms (a) HTS (b) S-estimator 25 - 50 ms (c) HTS (d) S-estimator 50 - 75 ms (e) HTS (f) S-estimator 75 - 100 ms (g) HTS (h) S-estimator Figure 3.12: Error-Correct topographical sensitivity of multivariate synchrony in intervals of 25 ms and theta band [4-8 Hz]. (a), (c), (e) and (g) Error-Correct HTS difference. (b), (d), (f) and (h) Error-Correct S-estimator difference. 56 In order to investigate the significance of changes across time, t-tests were performed between consecutive time intervals. Figure 3.13 shows the topographical distribution of p-values obtained for each electrode for different comparisons. Clearly, there is a significant change in multivariate synchrony in the central electrodes between the first and second time intervals, i.e. 0-25 ms, and 25-50 ms (Figure 3.13 (a)), no significant change between 25-50 ms and 50-75 ms (Figure 3.13 (b)), and finally a small change in central electrodes between 50-75 ms and 75-100 ms (Figure 3.13 (c)). (a) [0, 25]-[25, 50] (b) [25, 50]-[50, 75] (c) [50, 75]-[75, 100] Figure 3.13: Topographical plots of p-values from t-test investigating the difference of multivariate synchrony from HTS between intervals. Note: Black regions correspond to lower p-values. Since the intervals 25-50 ms and 50-75 ms show the largest error-correct differences based on Figure 3.12, we next focus on the time interval 25-75 ms and look at the topographical distribution of HTS for error and correct responses, separately. As Figure 3.14 shows, HTS yields increased synchronization for the central and lateral frontal regions for the error response (Figure 3.14 (a)) whereas there is no topographical differentiation of synchrony for the correct response (Figure 3.14 (b)). The proposed HTS measure replicates these findings and further identifies regionally increased synchronization on error trials, compared to correct, in the medial-lateral and frontalcentral areas. Unlike conventional bivariate measures which require the computation of synchro- 57 nization between multiple pairwise regions, the proposed measure can summarize the global synchronization within a region by a single number providing an ease of interpretation. (a) Error (b) Correct (c) Error-Correct Figure 3.14: Topographical plots of multivariate synchrony from HTS in the 25-75 ms interval. (a) Error responses; (b) Correct responses; (c) Error-Correct responses. In addition, a Receiver Operating Characteristic (ROC) curve was constructed in order to compare the performance of HTS and the S-estimator in the detection of multivariate synchrony during the ERN interval in the theta band. The probability of detection and false alarm were defined as the ratio at which the average multivariate synchrony over the ERN interval and theta band in electrodes FCz and CPz, respectively, exceeded a threshold. Figure 3.15 shows the ROC curves for both estimators. The area under the curve (AUC) for each estimator was computed, resulting in AUCHT S = 0.8601 and AUCS−estimator = 0.7936. Thus, as observed, HTS exceeds S-estimator in the detection of multivariate synchronization in the frontal-central regions during the ERN in the theta band indicating that HTS is more sensitive to detecting the difference in synchronization between the frontal-central region and the central-parietal region. 58 1 HTS S−Estimator Probability of Detection 0.8 0.6 0.4 0.2 0 0 0.2 0.4 0.6 Probability of False Alarm 0.8 1 Figure 3.15: ROC curves for HTS and S-Estimator. Probability of detection is based on the multivariate synchrony among FCz and its neighbors whereas the probability of false alarm is based on the multivariate synchrony around CPz. To further examine the functional, behavioral significance of the increased HTS synchrony identified following errors, compared to corrects, we computed correlations between the errorcorrect HTS synchrony and behavioral adjustments observed after mistakes post-error slowing (PES) calculated as correct response time (RT) following error responses correct RT following correct responses, and post-error accuracy (PEA) calculated as accuracy following error responses accuracy following correct responses. Although the functional significance of PES remains controversial i.e., some believe it represents the cautious slowing to ensure effective responding following mistakes, whereas others argue it represents off-task orienting to rare mistakes PEA is more clearly an adaptive response following errors (see [110] for a review). Figure 3.16 shows the topographical distribution of these correlations. Across frontal and central regions of interest, synchrony was inversely related to PES and positively related to PEA. The negative correlations between multivariate synchrony and PES were more broadly distributed whereas those between 59 multivariate synchrony and PEA were more localized to the central and left frontal electrodes. Figure 3.17 illustrates the topographical distribution of p-values (two-tailed) from the correlations obtained from PES (Figure 3.17 (a)) and PEA (Figure 3.17 (b)). For each behavioral condition, a distribution of correlations pΓ (γ) is constructed and a p-value for each electrode is obtained as p − valuei = 2min{P(Γ ≤ γi ), P(Γ ≥ γi )} where γi denotes the correlation coefficient at electrode i. As shown in Figure 3.17 (a), the inverse relationship between HTS and PES is most significant in the right frontal (F6) and parietal regions (P6), whereas as shown in Figure 3.17 (b) correlation between HTS and PEA is most significant around the left -central (FC3, FC5, C3) and frontal regions (AF3). (a) PES (b) PEA Figure 3.16: Correlation coefficient between (a) PES, (b) PEA and error-correct multivariate synchrony difference computed using HTS in the ERN interval 25-75 ms, for each electrode. 60 (a) PES 0.1 0.1 0.09 0.09 0.08 0.08 0.07 0.07 0.06 0.06 0.05 0.05 0.04 0.04 0.03 0.03 0.02 0.02 0.01 0.01 0 (b) PEA 0 Figure 3.17: Topographical distribution of p-values obtained from the correlation coefficient between (a) PES, (b) PEA and error-correct multivariate synchrony difference computed using HTS in the ERN interval 25-75 ms, for each electrode. Black refers to more significant. 3.7 Conclusions In this chapter, we presented a novel time-frequency measure of multivariate phase synchrony based on a hyperdimensional coordinate system. This measure has been derived from both a hyperspherical coordinate system and from the Cartesian product of unit circles. The proposed measure has been shown to be advantageous over a widely used multivariate measure, the S-estimator, in estimating the global synchrony in simulated systems of coupled oscillators and in neurophysiological signals. In particular, it was shown that the proposed method is a direct measure of global synchrony which overcomes the drawbacks of multivariate synchrony methods based on the bivariate PLV. First, it was shown, from a simulation in Rössler oscillators, that the proposed measure provides information about the underlying structure of the network, otherwise misinterpreted from the S-estimator. Second, the proposed measure is computationally efficient since it does not require the computation of all pairwise synchrony values in a network nor the eigendecomposition 61 of a connectivity matrix. The application of this method to EEG data showed that it is more sensitive to the increase of multivariate synchrony among electrodes in the central-frontal region during error responses in an ERN experiment compared to the S-estimator. Furthermore, HTS was shown to be more sensitive to spatial changes in multivariate synchrony compared to S-estimator and to be a better predictor of error-correct differences in the error monitoring experiment compared to traditional bivariate PLV synchrony metrics. The proposed measure can be implemented using instantaneous phase estimates obtained from the Hilbert transform, the Wavelet transform and, with some limitations, the Hilbert-Huang transform in addition to the RID-Rihaczek distribution. Thus, the proposed measure of multivariate synchrony is a promising tool for the assessment of the global integration in dynamic complex networks. 62 Chapter 4 Graph to Signal Transform Based on the Resistance Distance and its applications to Functional Connectivity Networks 4.1 Introduction Complex networks arise in a wide variety of systems such as biological, computational and social. In biology, for example, protein-protein interactions constitute protein interaction networks in which proteins represent nodes and their interactions are represented by edges [111]. Another example of complex networks is the Internet, for which nodes may represent computers, or webpages [112]. In the context of social networks, users are represented by nodes and their connections to other users are represented by edges [113]. Over the last decade, complex network theory has contributed significantly to the study of functional connectivity networks (FCNs), in particular in the assessment of functional integration and segregation [10]. Specifically, graph theoretic measures such as the shortest path length and clustering coefficient have helped to characterize small-world networks [114], and the degree distribution has been utilized to characterize scale-free networks 63 [115]. In particular, it has been shown that FCNs of the healthy population exhibit small-world properties [40], [116]. Despite the contributions of graph theoretic and information theoretic methods to the characterization of FCNs, these methods possess certain drawbacks. The first problem is the sensitivity of graph theoretic measures to network size. An example is the clustering coefficient, which quantifies the ratio of the number of triangles around a node and the maximum number of edges that can be connected to it [117]. Therefore the mean clustering coefficient can be unfairly affected by nodes with a low degree [117]. This would have an effect on other measures such as the small-world measure which depends on the clustering coefficient, particularly in brain networks constructed from electrophysiological modalities where the number of nodes tends to be small [118]. Another problem with graph theoretic measures is their non-uniqueness. An example is the small-world measure, which relies on the clustering coefficient normalized by the clustering coefficient of a random network. Such a normalization may affect the small-world measure as two very different network structures may have similar clustering coefficients [119]. Finally, another problem with graph theoretic measures is the mismatch between the measure and the flow of information in the underlying network, especially for weighted networks such as FCNs. For example, FCNs may not necessarily rely on shortest paths for communication between the nodes, and measures like the characteristic path length and the global efficiency are unable to capture this type of connectivity patterns [10], [120]. Alternatively, a complementary set of methods for network analysis has been proposed through transforming graphs into signals in order to take advantage of signal processing methods in the analysis [44], [43], [46]. By transforming graphs into signals it is possible to apply traditional signal processing techniques on those signals in order to extract information from the networks and overcome some of the shortcomings of graph measures. Both probabilistic and determinis64 tic methods have been proposed to transform networks into signals. In [121], a transformation of networks based on random walk theory has been proposed to show that the transformed signals reveal mixing patterns of the network. In another recent work, Girault et al. [122] proposed a semi-supervised learning method for graph to signal mapping which results in smooth signals from graphs. However, stochastic methods do not provide the means for recovering the network once they are transformed into signals. Shimada [44] and Haraguchi [43] formulated a deterministic method based on classical multidimensional scaling (CMDS), allowing the transformation of complex binary networks into time series. Under this transformation, the vertices of the network correspond to time indices for the time series [123]. It was shown that lattice networks transform to sinusoids and Watt-Strogatz networks transform to random signals [44]. Recently, Hamon et al. [124], [125] have extended this method to the analysis of temporal networks, with an application to a network of face-to-face contacts revealing significant subnetworks. However, all of these approaches have focused on binary graphs, and therefore have limited applicability to weighted networks that arise in neuroscience. In order to transform both binary and weighted graphs into signals, we propose to use the resistance distance of a connected graph as the distance matrix for CMDS. The resistance distance of a graph was proposed in [126] in the context of chemistry. The resistance distance between two nodes corresponds to the equivalent resistance between them, considering the graph as an electric circuit [126] whose edges represent resistors inversely proportional to the edge weight. An advantage of the resistance distance over other graph distances, such as the shortest path distance, is that the resistance distance takes into account the global structure of the graph and hence reflects information about multiple paths. Moreover, the resistance distance can be obtained from the pseudoinverse of the Laplacian matrix of the graph and is a valid distance matrix [127]. Therefore, it is an alternative distance matrix for CMDS. 65 Transforming graphs into signals from CMDS results in a total of N signals or components, corresponding to each one of the N nodes in the network. We are interested in quantifying structural information of the networks based on their signals. In this work, we propose a graph entropy measure based on the spectrum of the signals obtained from the transformation. Graph information theoretic measures, such as graph entropy, are important as they allow for the quantification of the structural information of the networks. Quantifying the information content of FCNs through graph information measures has been limited. In one such study, Sato et al. [128] characterized FCNs from Attention Deficit Hyperactivity Disorder (ADHD) based on the graph spectral entropy of FCNs. In addition, Takahashi et al. [129] introduced the Jensen-Shannon divergence between graph spectra to compare networks from ADHD. Current graph entropy measures rely on the spectral distribution of the graph adjacency matrix or other graph-related matrices [130], and methods that consider the probability distribution on the graph vertices [131]. However, these methods present some drawbacks. In particular, spectral graph entropy may fail to discriminate the network structure among networks sharing similar spectra [132]. Moreover, graph entropy measures based on the probability distribution on the graph vertices rely heavily on the method for estimating the graph-vertex probability distributions. One method is based on partitioning the vertex set of binary graphs [133], [134]. Under this approach, vertices are clustered into sets of identical vertices based on their local and non-local degree-dependencies and a probability is assigned to each partition based on the total number of vertices in that partition divided by the total number of vertices [135]. However, this method relies on arbitrary parameters, which can be found in an optimal sense if there is a ground truth. Another method is based on an information functional for undirected and connected graphs [131]. In this method, the information functionals are function of the sets denoted j-sphere for each vertex, which is the set of vertices that are j edges apart from the current vertex where the distance is quantified by the shortest path. In this chapter, we propose 66 a graph entropy and divergence measure by implementing traditional information-theoretic measures on the spectrum of the signals obtained from the graph to signal transformation, and both methods are parameter free. It is shown that this method allows for the quantification of network structural information and discrimination between distinct cognitive network structures. As the signals obtained from this graph transformation convey structural information of the networks, we consider them for event detection in temporal networks. Previously, Hamon et al. [124] used nonnegative matrix factorization of the spectra of the signals for characterizing the structure of temporal networks. By doing this, it is possible to characterize the network structure over time, which may remain ambiguous if assessed from graph theoretic measures solely. Another important problem in the study of temporal networks is the detection of events, which occur as a deviation of the network structure from its usual structure at a particular instance. Due to the evolving nature of temporal networks in many applications, detecting abrupt changes in the structure of temporal networks is of great importance. In particular, the connections among the nodes of the network may change with time or some network components may be removed over time [136]. Previously proposed methods for event detection in temporal networks include [136] distance based methods, probabilistic model based methods, and subspace estimation-based methods. In distance based methods, the distance between two graphs is computed at each time point and thus anomalous instances are extracted from this time series. Probabilistic model based methods account for the deviations from models of the graph spectrum, which indicate the presence of an event. Subspace estimation based methods, such as singular value decomposition and tensor decompositions, track the singular values over time as well as the reconstruction error for identifying events in the temporal networks. On the other hand, several methods have been proposed in computer science and sensor networks literature, with applications to video and wireless sensors. In addition, work on the analysis of temporal networks has focused on the computation of graph theoretic measures 67 based on the graph adjacency matrix at each time point and then constructing time series for each feature [137]. However, it has been argued that this approach may fail when activity patterns are discontinuous and when there are abrupt changes in the network [138]. In this work, we focus on tensor decompositions since they exploit the underlying relationships of multiway data [139], as opposed to matrix decomposition methods, such as PCA and Independent Component Analysis (ICA). In this chapter, we propose to form a tensor based on the spectra of the signals obtained from the graph to signal transformation at each time point and detect abrupt structural changes in the temporal network at a particular time. Our proposed method is compared to the tensor decomposition based on the graph adjacency matrices over time. Both methods correctly detect the change points in time when significant changes occur, however, the proposed method is more sensitive to those changes. Finally, we employ the proposed graph to signal transformation for characterizing functional connectivity network structure. The proposed measure is applied to the electroencephalogram (EEG) data described in Chapter 2. Previous works have suggested that functional connectivity networks are small-world networks [140]. Furthermore, a recent work on weighted small-world measures have shown increased small-world characteristics in functional connectivity networks during error-related negativity [116]. In this chapter, we assess the small-worldness of functional connectivity networks during ERN by computing the correlation between the spectrum of the signals transformed and the spectrum of a small-world network for different parameters of smallworldness. We show that the signals obtained from graphs contain structural information that allows us to demonstrate the structural differences between different experimental conditions. In particular, we show how the structure of functional connectivity networks from different conditions in a cognitive control experiment can be characterized with the proposed method. This chapter is organized as follows. Section 4.2 presents a background on graph entropy 68 measures and CMDS for graph to signal transform of binary networks. Next, the proposed graph to signal transform based on the resistance distance matrix is presented in Section 4.3 as well as characterizations of this transform. Simulation results comparing the proposed method to the previously defined distance measures are shown in Section 4.3.5.1. In 4.4 Next, we present the proposed graph entropy measures based on the signals obtained from the transform in Section 4.5, followed by event detection in Section 4.6 and the characterization of functional connectivity networks in Section 4.7. Finally, conclusions are discussed in Section 4.8. 4.2 4.2.1 Background Graph Entropy Measures Previously proposed graph entropy measures to quantify the structural information of the graph include information functionals evaluated at the local node level [131], and the spectrum of the adjacency matrix A or the Laplacian [130]. Let d(u, v) be the shortest path between nodes u and v and σ (v) = maxu∈V d(u, v) be the eccentricity of the graph. Define ρ(G) = maxv∈V σ (v) as the diameter of the graph and S j (vi ) = {v ∈ V |d(vi , v) = j}, j ≥ 1, as the j-sphere of node vi . In this work, we compare our proposed method to two information functionals. The first information functional is defined as f V1 (vi ) = α c1 |S1 (vi )|+c2 |S2 (vi )|+···+cρ(G) |Sρ(G) (vi )| , where ck > 0, 1 ≤ k ≤ ρ(G), α > 0, and V1 identifies it as the first information functional. This functional is a function of the j-sphere for each node. Node probabilities are defined based on this functional as pV1 (vi ) = f V1 (vi ) N f V1 (v ) ∑ j=1 j . Based on pV1 , the graph entropy of G is given as |V | I f V1 (G) = − ∑ pV1 (vi ) log(pV1 (vi )). i=1 69 (4.1) The second graph entropy measure that is commonly used is obtained from the spectrum of the adjacency matrix A [130]. In this case, the information functional is defined as fiV2 = |λi |, where λi is the ith eigenvalue of the adjacency matrix. The node probability is defined as pVi 2 = |λi | , ∑Cj=1 |λ j | where λ1 , ..., λC correspond to the non-zero eigenvalues of the adjacency matrix. Then, the graph entropy is defined as C I f V2 (G) = − ∑ pV2 (vi ) log(pV2 (vi )). (4.2) i=1 This functional is equivalent to the Von Neumann entropy of a graph if we consider the eigenvalues of the normalized Laplacian [141], [142]. Finally, the one-dimensional structural information of a connected, undirected graph has been defined based on the degree distribution as [143] N ∆ii ∆ii log , 2M i=1 2M H 1 (G) = − ∑ (4.3) where ∆ii corresponds to the degree of the ith node, N is the total number of nodes in the network, and M is the total number of edges in the network. 4.2.2 Graph to Signal Transform based on Classical Multidimensional Scaling The transformation of graphs into signals is based on CMDS. CMDS is a data reduction algorithm whose objective is to find a low-dimensional representation of the data while preserving the Euclidean distances between points [144]. In particular, for our application of transforming graphs into signals, the aim is to obtain coordinate vectors that preserve a given distance [43]. The first 70 step in the algorithm is to entry-wise square the distance matrix D and double center it as 1 B = − JN D(2) JN , 2 (4.4) 0 where D(2) = D ◦ D is the entry-wise squared Euclidean distance matrix, JN = IN − N1 1N 1N is a centering matrix, IN is an N × N identity matrix, 1N is a N × 1 vector of ones, and 0 de- notes the transpose. B is a positive semidefinite matrix with rank(B) = C, C ≤ N. Therefore, B has C positive eigenvalues, and N − C eigenvalues equal to zero. The next step is to per 1   1 0 0 0 form the spectral factorization of B, resulting in B = P Λ P = PΛ 2 × PΛ 2 = XX , where p √ √ 1 Λ = diag(λ1 , λ2 , . . . λC ), and Λ 2 = diag( λ1 , λ2 , . . . λC ), correspond to the nonzero eigenvalues of B, with λ1 ≥ λ2 ≥ · · · ≥ λC , P ∈ RN×C , and X ∈ RN×C . Based on X, a total of C signals of length N corresponding to the columns of X are obtained. The ith signal xi ∈ RN×1 is defined as the ith column of X with i = 1, 2, . . . ,C. In this chapter, we will refer to xi (n) as components and signals interchangeably. In order to preserve the positive definiteness of B, the matrix D needs to be a valid distance matrix and conditionally negative definite. In previous works [44], [124], CMDS has been implemented in the transformation of binary networks and the distance D is based on the binary adjacency matrix A as      0,      Di j = 1,         γ, i = j, ai j = 1 and i 6= j, (4.5) ai j = 0 and i 6= j, where γ is a parameter that guarantees the conditional negative definiteness of D. In this manner, the coordinate vectors preserve the adjacency relationship of the nodes [43]. In [46], Hamon et. al 71 found the upper bound of γ to be q N N−2 . It is important to note that D only provides information about whether the vertices are connected or not, but not about the distance between vertices. 4.3 Graph to Signal Transformation Based on the Resistance Distance Matrix 4.3.1 Resistance distance In order to extend the graph to signal transform based on CMDS to weighted graphs, we consider the resistance distance matrix of a graph, denoted as R. The resistance distance was introduced by Klein and Randic [126] as an alternative to the shortest path distance for applications in chemistry. It is inspired by basic circuit theory, where each edge on the graph represents a resistor with value 1 wi j [145]. The following definitions of the resistance distance apply to both binary and weighted networks. For simplicity, the notation will follow that of binary networks. The resistance distance matrix R ∈ RN×N of a complete graph with N nodes is given by 0 0 R = Z̃1N 1N + 1N 1N Z̃ − 2Z, (4.6) 0 where Z = (L + N1 1N 1N )−1 and Z̃ = diag(Z11 , Z22 , ..., ZNN ). In addition, R is a conditionally negative matrix and therefore has only one positive eigenvalue. Each entry Ri j in R corresponds to the squared Euclidean distance between vertices i and j [127]. In a connected graph, Ri j ≤ d(i, j), where d(i, j) is the shortest path distance, and equality holds when there is only one path between i and j [146]. The resistance distance is a valid distance measure, and satisfies the following properties [147]: 72 Ri j ≥ 0 f or all i, j with equality i f and only i f i = j, Ri j = R ji , Ri j + R jk ≥ Rik . (4.7) The resistance distance between vertices i and j, Ri j , is defined through the Moore-Penrose pseudo inverse of L, L† [44], as Ri j = Lii† + L†j j − 2Li†j . (4.8) Ri j = (ei − e j )T L† (ei − e j ), (4.9) Equivalently, Ri j can be computed as where ei and e j are N × 1 vectors of zeros with 1 in the ith and jth index, respectively. In addition, the resistance distance is related to random walks on the graph, where the Ri j is proportional to the expected commute time of a random walk on the graph [146] and is given by Ri j = 1 [E(T i j ) + E(T ji )], 2|E| (4.10) where T i j is the number of transitions from vertex i to vertex j and |E| is the cardinality of the edge set. The resistance distance also provides a measure of network robustness. In particular, the effective resistance, also known as the Kirchoff index, provides information regarding to how robust the 73 network is [146], [148], with a small effective resistance value indicating a robust network. The effective resistance distance is defined as N Re f f = ∑ Ri j = N 1≤i≤ j≤N 1 ∑ λk = N tr(L†), (4.11) k=2 where λi , i = 2, ..., N correspond to the eigenvalues of L. Finally, Re f f is related to the network criticality γ, γ = 2Re f f [146], [149]. 4.3.2 Classical Multidimensional Scaling based on the Resistance Distance Using R, (4.4) can be reexpressed as 1 B = − JN R J N , 2 (4.12) to obtain the signals X for both binary and weighted networks. Note that since the entries of R correspond to squared Euclidean distances, it is not necessary to square the entries of R for the computation of B. As described in [43], if we denote the ith point in an m-dimensional Euclidean space as x̃i = 0 0 (x̃i1 , x̃i2 , . . . , x̃im ) , i = 1, . . . , N and define the coordinate matrix as X̃ = (x̃1 , x̃2 , . . . , x̃N ) , then the 0 0 0 0 0 Euclidean distance matrix in (4.12) is given as R = diag(X̃X̃ 1N 1N ) + 1N 1N diag(X̃X̃ ) − 2X̃X̃ . Then, (4.12) can be reexpressed as 74 1 B = − JN R J N 2 0 0 0 0 0 1 = − JN [diag(X̃X̃ 1N 1N ) + 1N 1N diag(X̃X̃ ) − 2XX ] JN 2 (4.13) 0 = JN X̃X̃ JN 0 = XX , (4.14) where X = JX̃ is the matrix whose columns correspond to the signals from the network, as described in Section 4.2.2. 4.3.3 Reconstruction of the original graph If the signals X are not distorted, then in principle the resistance distance matrix R can be recovered from the signals through the computation of the squared Euclidean distance between the points C R̂i j = ∑ (xc(i) − xc( j))2 , (4.15) c=1 where R̂ is the estimated R, C corresponds to the total number of components and xc (i) and xc ( j) correspond to the ith and jth entries of the cth component. It is possible to recover the original adjacency matrix from R̂, for both weighted and binary graphs as follows. First, we introduce τ as [147] τi = 2 − R̂i, j , W j∈V (i) i j ∑ (4.16) where V (i) denotes the set of vertices adjacent to i where adjacency is defined as Wi j 6= 0. For the next step, since R is nonsingular [127], we consider the following expression of R̂−1 75 which follows from the inverse of Euclidean distance matrices [150]: 0 1 1 R̂−1 = − L̂ + 0 ττ . 2 τ R̂τ (4.17) From (4.17), the Laplacian matrix L̂ is estimated as L̂ = −2 (R̂−1 − 0 1 ττ ). τ R̂τ 0 (4.18) Given an estimate of the Laplacian matrix, the degree matrix ∆ˆ is estimated as the diagonal matrix whose elements are the diagonal entries of L̂ ∆ˆ = diag(L̂11 , L̂22 , ..., L̂NN ). (4.19) Finally, the weighted adjacency matrix Ŵ is computed as Ŵ = ∆ˆ − L̂. (4.20) An alternative procedure for network reconstruction is as follows. It has been previously shown that R is an Euclidean Distance matrix if and only if B = − 12 JRJ is a positive definite matrix [150]. If R is an invertible Euclidean Distance matrix, then R−1 = −Y + uuT , where Y is a positive definite matrix, rank(Y) = N − 1, and 1T Y = 0, that is, the sum of its rows is zero. As shown in [150], Y corresponds to the Moore-Penrose pseudoinverse of B. Therefore, L̂† can be computed from R̂ as 1 L̂† = − JR̂J. 2 76 (4.21) Finally, ∆ˆ and Ŵ are estimated as before. 4.3.4 Perturbation Analysis In this section, we describe analytically the effect of small perturbations such as addition and removal of edges to the network on the transformed signals. Specifically, we are interested in describing how (4.12) is affected in both cases and its effect on the graph to signal transformation. We begin with the case edges are added to the network, where the perturbed adjacency matrix is given as W̃ = Wo + Wδ , (4.22) where Wo is the original adjacency matrix, Wδ is an adjacency matrix with the new edges and W̃ is the perturbed adjacency matrix. The following results apply to both weighted and binary graphs. We will follow the notation of weighted graphs for simplicity. The Laplacian of the perturbed network can be expressed in terms of the original network and the new links. As described by [149], the graph Laplacian can be reexpressed in terms of the graph incidence matrix as L = CWE CT = ∑ WiEj ei j eTij , (4.23) i, j where C ∈ RN×M is the graph incidence matrix, ei j is a column vector whose ith entry equals 1 and the jth entry equals -1, WE ∈ RM×M is a diagonal matrix whose entries are the M graph edges, and M is the total number of edges in the network. When edges are added to the network, (4.23) results in 77 L̃ = C̃W̃E C̃T = C̃o WEo C̃To + C̃δ WEδ C̃Tδ = Lo + Lδ (4.24) where Lo is the Laplacian matrix of the original network and Lδ is the Laplacian matrix of the subnetwork corresponding to the perturbation. The Moore-Penrose pseudoinverse of (4.24) was found [149] to be L̃† = L†o − L†o Lδ (I + L†o Lδ )−1 L†o . (4.25) This expression is valid as long as the addition of edges does not increase the rank of L̃ more than the rank of Lo . A similar expression is obtained for the case when edges are removed from the network. In this case, the perturbed adjacency matrix is expressed as W̃ = Wo − Wδ , (4.26) where now Wδ is a matrix whose entries Wδ i j are identical to the edges ei j to be removed from Wo and zero otherwise. Thus, the Moore-Penrose pseudoinverse of the Laplacian can now be expressed as L̃† = L†o + L†o Lδ (I + L†o Lδ )−1 L†o , (4.27) where the only difference between (4.25) and (4.27) is the sign and the same conditions on the 78 rank of L hold. Based on (4.25) and (4.27) we can express (4.12) for the case when edges are added or removed, respectively. We show only the derivation for the former scenario, since the only variant in the case of missing edges is a sign. We begin by recalling that Ri j = Lii† + L†j j − 2Li†j . Denote R̃i j the resistance distance between edges i and j for the perturbed network. Based on (4.25), we can express R̃i j when edges are added as R̃i j = [L†o +L†o Lδ (I+L†o Lδ )−1 L†o ]ii +[L†o +L†o Lδ (I+L†o Lδ )−1 L†o ] j j −2[L†o +L†o Lδ (I+L†o Lδ )−1 L†o ]i j , (4.28) where [ · ]i j denotes the i, j entry of the matrix L̃. Let M = L†o Lδ (I + L†o Lδ )−1 L†o . Rearranging (4.28) we obtain † † R̃i j = L1ii + L1† j j − 2L1i j − (Mii + M j j − 2Mi j ), = Roij − RM ij , where Roij refers to the resistance distance of the original network and RM i j is the resistance distance defined from the perturbation matrix M. Now, we can define an alternative matrix B̃ as 79 1 B̃ = − JN R̃JN 2 1 = − JN [Ro − RM ]JN 2 1 1 = − JN Ro JN + JN RM JN 2 2 = Bo − Bδ . (4.29) Therefore, we can express the matrix B for CMDS based on the resistance distance matrix when edges are added/removed to the network in terms of the original network and the new set of links. The previous procedure can be similarly carried out to obtain an expression for the case when edges are removed as B̃ = Bo + Bδ . 4.3.5 Illustration of Graph to Signal Transform In this section, we first compare the graph to signal transformation based on the distance D and R for binary networks. Next, the reconstruction of weighted networks is assessed. Following this, the robustness of the proposed method is assessed for various types of network anomalies. 4.3.5.1 Binary graphs Signals from binary graphs based on the resistance distance matrix are first compared to those obtained from CMDS based on the distance matrix D. First, we simulate two ring networks with N = 128 nodes and average degree K = 2 and K = 10. Figure 4.1 shows the results obtained from both methods. As expected, the signals based on the resistance distance matrix are sinusoidal signals (Figure 4.1 (a) and Figure 4.1 (b)) similar to the signals obtained from D (Figure 4.1 (c) and Figure 4.1 (d)). Figure 4.1 (a) and Figure 4.1 (c) show the first three components obtained 80 from R and D, respectively, for an average degree K equal to 2, and similarly Figure 4.1 (b) and Figure 4.1 (d) for K = 10. From these figures, it is observed that the amplitude of signals obtained from R is inversely proportional to the average degree, K, with the maximum amplitude when K = 2 and reduced amplitude when K = 10. This can be explained in terms of the resistance distance. It has been shown previously that the pairwise resistance Ri j decreases when edges are added or weights are increased in the network [146]. In addition, it has been shown that the signals obtained from ring lattice networks are sinusoids [44], since the eigenvectors of circulant matrices come from the Fourier matrix. Since the resistance distance matrix of a binary ring network is circulant, the signals obtained from R follow the same rule. Increasing K results in a reduction on the signal’s amplitude. Suppose that K1 ≤ K2 , then by properties of the resistance distance adding (2) (1) edges between vertices i and j causes Ri j ≤ Ri j . Since B is a Gramian matrix, it follows that √ (2) (1) Bi j ≤ Bi j [151]. By Weyl’s Theorem, λm1 ≤ λm2 [146]. Since the signals Xm (t) = λm cos( −2πmt N ) [44], and the eigenvectors cos( −2πmt N ) follow from the Fourier matrix for a ring lattice network, which are independent of the entries of the circulant matrix [152], increasing the node degree K results in a reduction of the signal’s amplitude. 81 K = 10 1 1 0.5 0.5 Amplitude Amplitude K=2 0 −0.5 −1 −1 50 Vertex Number (a) K=2 100 0 −0.5 0 50 Vertex Number (b) K = 10 100 0 50 Vertex Number (d) 100 0.5 Amplitude 0.5 Amplitude 0 −0.5 0 Component 1 Component 2 Component 3 0 50 Vertex Number (c) 0 −0.5 100 Figure 4.1: Signal representation of a ring lattice network composed of N = 128 nodes. Top: Resistance distance; (a) K = 2; (b) K = 10. Bottom: Distance D; (c) K = 2; (d) K = 10. Next, we compared both methods in an Erdős-Rènyi binary graph for probabilities of attachment p equal to 0.2 and 0.5. For the original distance matrix D, the signals are random signals, as previously found [46], with amplitudes bounded within the same range for all values of p (Figure 4.2 (c) and Figure 4.2 (d)). On the other hand, signals estimated from R still exhibit a random structure, with peaks that are inversely proportional to p (Figure 4.2 (a) and Figure 4.2 (b)). The location of these peaks correspond to the nodes with smallest degree, i.e. the largest peak occurs in the first component and corresponds to the node with the smallest degree. In terms of the resistance distance, a node with small degree will have a high resistance distance between it and the remaining nodes in the network. The reduction of the peak’s amplitude as the probability of attachment increases follows the previous discussion based on Weyl’s Theorem. 82 p = 0.5 0.3 0.3 0.2 0.2 Amplitude Amplitude p = 0.2 0.1 −0.1 0 50 Vertex Number (a) p = 0.2 100 0.3 0.3 0.2 0.2 0.1 0.1 Amplitude Amplitude 0.1 0 0 −0.1 Component 1 Component 2 Component 3 0 −0.1 −0.3 −0.3 100 100 0 50 Vertex Number (d) 100 −0.1 −0.2 50 Vertex Number (c) 50 Vertex Number (b) p = 0.5 0 −0.2 0 0 Figure 4.2: Erdős-Rènyi network signal representation; (a) and (b), Resistance distance, p = 0.2 and p = 0.5, respectively; (c) and (d), Distance D, p = 0.2 and p = 0.5, respectively. 4.3.5.2 Weighted graphs The proposed method was also assessed on a weighted stochastic block model consisting of 200 vertices and with fixed probability of attachment, p = 0.3. The weights are assigned randomly, uniformly distributed in the interval [0, 1]. Figure 4.3 (a) and Figure 4.3 (b) show the signal representations of stochastic block networks with 3 and 4 clusters, respectively. It can be observed from these figures that the first K − 1 components reveal the total number of clusters, and the K th component is an impulse. In addition, the size of each cluster can be inferred from the support of the constant regions in the first K − 1 components. 83 3 Clusters 4 Clusters 0.35 0.35 Component 1 Component 2 Component 3 0.3 0.25 0.3 0.2 Amplitude Amplitude 0.2 0.15 0.1 0.05 0.15 0.1 0.05 0 0 −0.05 −0.05 −0.1 Component 1 Component 2 Component 3 Component 4 0.25 0 50 100 150 Vertex Number (a) −0.1 200 0 50 100 150 Vertex Number (b) 200 Figure 4.3: Signals constructed from a weighted stochastic block network with probability of attachment p = 0.3 using the resistance distance matrix R. (a) First three components corresponding to a network with 3 blocks; (b) First four components corresponding to a network with 4 blocks. In addition to the stochastic block network, we investigated the graph to signal transformation of a weighted small-world network. Figure 4.4 shows the signals resulting from a small-world network with average degree K = 6, and composed of N = 128 vertices. As seen in Figure 4.4 (a), for a network with a low rewiring probability, p = 0.1, the resulting signals are sinusoidal signals plus some noise, which increases as p increases (Figure 4.4 (b)). This is consistent with previous works on binary networks [44], where it has been shown that the small-world network is equivalent to a ring network plus noise. 84 p = 0.7 0.6 0.4 0.4 0.2 0.2 Amplitude Amplitude p = 0.1 0.6 0 0 −0.2 −0.2 −0.4 −0.4 0 50 100 Vertex Number (a) 150 Component 1 Component 2 Component 3 0 50 100 Vertex Number (b) 150 Figure 4.4: Signals constructed from a weighted Small-World network consisting of N = 128 nodes and average degree K = 6. (a) Rewiring probability p = 0.1; (b) Rewiring probability p = 0.7. 4.3.5.3 Reconstruction of Weighted Networks In this section, the reconstruction of networks based on the procedure introduced in Section 4.3.3 is evaluated. The first network considered is a stochastic block model consisting of N = 150 nodes, with probability of attachment p = 0.3 and 4 clusters. This network is constructed and recovered from its signals X. The second network is a Erdős-Rènyi network consisting of N = 200 nodes with probability of attachment p = 0.5. Networks were reconstructed for a total of 100 simulations and the reconstruction error is based on the normalized Frobenius norm of the difference between the reconstructed and original adjacency matrices, 1 N(N−1) kW − ŴkF . Table 4.1 shows the reconstruc- tion errors for both networks. This indicates that the proposed reconstruction approach proposed in Section 4.3.3 allows to reconstruct the networks reliably, up to minimal numerical error. Table 4.1: Reconstruction errors Network Error (mean ± st.dev.) Stochastic Block 2.04 × 10−6 ± 1.43 × 10−6 Erdos-Renyi 2.00 × 10−6 ± 2.46 × 10−6 85 4.3.5.4 Robustness to network anomalies In this section, the robustness of the signals obtained from the resistance distance matrix to network anomalies such as missing edges and anomalous edges is assessed. For all the anomalies, the error is computed as the normalized Frobenius norm of the difference between the magnitude spectrum of the original graph and the anomalous graph, 1 N(N−1) kM − M̂kF . By computing the error based on the magnitude spectra of all components instead of the actual signals we avoid misleading error computations that might arise in cases such as ring networks. For ring networks, the corresponding signals are sinusoids and for certain types of anomalies the resulting degraded signal is close to zero, which may result in a small error. First, we assess the robustness of the method to missing edges. For a weighted ring lattice and a stochastic block network, the weights were uniformly distributed between 0 and 1. Edges are removed at random, while ensuring that the network remains connected. Figure 4.5 (a) and (b) show the error in the signal’s spectra for a ring lattice network with N = 128 and N = 256 nodes, respectively, and the percentage of missing edges ranging from 5% to 20% in increments of 5%. A total of 100 simulations were performed. Figure 4.5 (c) shows the errors for a stochastic block network with 3 clusters as the probability of attachment is varied, which is equivalent to adding more edges to the network, and the percentage of missing edges ranged from 5% to 50% in increments of 5%. As shown in this figure, networks with higher probability of attachment, resulting in more connections, are more robust to the removal of edges. 86 Ring Lattice Network, N = 128 Ring Lattice Network, N = 256 0.06 Stochastic Block Network, N = 150 0.06 K=4 K = 16 K = 64 0.014 K=4 K = 16 K = 64 0.05 0.05 0.04 0.04 0.03 0.03 0.02 0.02 0.012 p = 0.1 p = 0.3 p = 0.5 0.01 Error 0.008 0.006 0.004 0.01 0 −10 0.01 0 10 20 Missing edges (%) (a) 30 0 −10 0.002 0 10 20 Missing edges (%) (b) 30 0 −20 0 20 40 Missing edges (%) (c) 60 Figure 4.5: Error of the magnitude spectrum. (a) and (b) Ring lattice network with average degree K equal to 4, 16, and 64 consisting of N = 128 and N = 256 nodes, respectively; (c) Stochastic block network, 3 clusters and probability of attachment p = 0.1, p = 0.3, p = 0.5. Next, we investigate the robustness of the method when certain edges are anomalous by varying their weight within a certain range. We simulated a weighted ring lattice and a stochastic block network with N = 128 nodes, whose edge weights are uniformly distributed within the range [0.75, 1.25]. The ring network has an average degree K = 16, and the stochastic block network has 3 blocks with a probability of attachment p = 0.4. The weight of an anomalous subset of edges is then taken from different amplitude ranges: Range 1: [0.8, 1.2], Range 2: [0.6, 1.4], Range 3: [0.4, 1.6], Range 4: [0.2, 1.8], and Range 5: [0, 2]. Figure 4.6 (a) and Figure 4.6 (b) show the error of the magnitude spectrum for the ring lattice network and the stochastic block network, respectively. For both networks, the error increases proportionally to the range of possible values for the anomalous edges and the percentage of anomalous edges. 87 −3 −3 x 10 2.25 4.2 x 10 4 2.2 3.8 3.6 Error Error 2.15 3.4 3.2 2.1 3 2.8 Range 1 Range 2 Range 3 Range 4 Range 5 2.6 2.4 0 2 4 6 8 % of anomalous edges (a) 10 Range 1 Range 2 Range 3 Range 4 Range 5 2.05 12 0 2 4 6 8 % of anomalous edges (b) 10 12 Figure 4.6: Error of the magnitude spectrum from a ring lattice network (a) and a stochastic block network (b) with one anomalous edge whose weight ranged in the intervals [0.8, 1.2], [0.6, 1.4], [0.4, 1.6], [0.2, 1.8], and [0, 2]. 4.4 Small-world network characterization In this section, we propose to characterize the network structure based on the spectrum of the signals X. We focus on small-world networks and compare our proposed method to the conventional small-world measure for estimating the small-world parameters. Humpires et al. [53] proposed to use the small-world measure for estimating the probability of rewiring, pr , of real networks with small-world characteristics. They propose to find the pr that minimizes the error e(K, pr , N) = |σ (K, pr , N) − σtest |, where σ (K, pr , N) is the small-world measure of a simulated small-world network with known degree K, probability of rewiring pr , and N nodes, and σtest is the small-world measure computed from the network under test. However, this estimation method may be affected by various factors related to the small-world measure, as discussed in the introduction. In this section, we propose to estimate the small-world parameters by correlating the spectral centroid of the signals X with the spectral centroid of a baseline network. The spectral centroid is the first moment of the normalized power spectral distribution of the signal. Specifically, we denote 88 the vector of spectral centroids from a baseline simulated network as cB = (cB1 , cB2 , ..., cBN ), where cBi refers to the spectral centroid of the ith signal. Similarly, we denote the vector of spectral centroids test test of the test network as ctest = (ctest 1 , c2 , ..., cN ). Thus, the parameters K and pr of the small-world network under test are found empirically to maximize the correlation coefficient between the two vectors of centroids, ρ(K, pr , N) = corr(cB ,ctest ). Weighted small-world networks with K = 8 and varying pr values were generated 100 times, and then converted into fully connected networks by adding uniformly distributed noise. The weights of the small-world structure were uniformly distributed between [0, 1], while the noise values were uniformly distributed in [0, 1] for Fig. 4.7 (a), and in [0, 0.25] for Fig. 4.7 (b). The results in Fig. 4.7 show that the proposed method is more accurate than the small-world measure in estimating the probability of rewiring, especially for low pr within the small-world region. The small-world measure is dependent on the path length and clustering coefficient, which change as more links are added to the network, whereas the spectra of the signals obtained from the graph to signal transformation reflect the underlying small-world structure and is more robust to small changes in the network structure. Estimated p r 1 1 SW measure Spectral Centroid Gound Truth 0.5 0 10-4 0.5 10-2 0 10-4 100 10-2 pr pr (a) (b) 100 Figure 4.7: Estimated probability of rewiring pr in weighted small-world networks. Weights of the small world structure are uniformly distributed in the interval [0, 1] and noise values are uniformly distributed in (a) [0, 1], (b) [0, 0.25]. 89 4.5 Graph Entropy based on the graph to signal transform In this section, we propose a measure for quantifying the structural information of graphs based on the signals obtained from the networks. Since the signals X convey structural information of the networks, these signals provide alternative distributions to base the graph information theoretic measures on. Specifically, we consider the spectrum and the energy of the signals for computing the Shannon entropy of the networks. The proposed method does not depend on the selection of any parameter nor any graph theoretic measure unlike the previously introduced measures. The magnitude spectrum of the ith signal is defined as Mi [ f ] = |F{xi }|2 , where F denotes the − discrete Fourier transform, F{xi } = ∑N n=1 xi [n]e j2πnk N . We denote M ∈ RN×C as the matrix whose columns contain the magnitude spectrum of all signals. The normalized power spectrum of the ith signal is computed as Pi [ f ] = Mi [ f ] b(N−1)/2c ∑ f =0 Mi [ f ] , where f = 0, 1, ..., b(N − 1)/2c corresponds to discrete frequency bins [153]. We propose to compute the graph entropy based on the normalized power spectrum of xi [n], i = 1, 2, ..., C̃, where we consider the C̃ signals with highest energy. This parameter is selected empirically similar to the selection of the total number of factors in Principal Components Analysis (PCA). We propose to use the normalized power spectrum rather than the original signals for entropy computation since computing the Shannon entropy directly on the signals does not necessarily provide information about the network’s structural content. For example, for the ring network, the corresponding signals are pure sinusoids 4.3.5 [44], [45], with almost uniform histograms and thus resulting in high entropy. On the other hand, the power spectrum of a sine wave is well localized at a particular frequency (Fig. 4.8 (a)) thus its Shannon entropy is theoretically zero. This is consistent with the intuition that a ring network is deterministic and thus should exhibit low entropy. 90 4 Power Spectrum 8 4 x 10 8 6 6 4 4 2 2 0 0 5 10 x 10 p=0.01 p=0.05 p=0.1 0 0 15 10 20 (a) 0.25 Power Spectrum 30 (b) 14 p=0.1 p=0.3 p=0.6 0.2 Ck=2 12 Ck=6 10 0.15 8 0.1 6 Ck=10 4 0.05 0 2 0 50 100 0 150 Frequency (Hz) (c) 0 10 20 30 40 50 Frequency (Hz) (d) Figure 4.8: Power spectrum of the first graph signal for: (a) Ring network with K = 4; (b) Small world network with pr = 0.01, pr = 0.05, and pr = 0.1; (c) Erdős-Rènyi network with p = 0.1, p = 0.3, and p = 0.6; (d) Stochastic Block network with Ck = 2, Ck = 6, and Ck = 10 clusters, N = 300 nodes for all networks. The frequency axis limits are adjusted in order to better illustrate the spectrum. The normalized graph entropy for the ith signal is then defined as b(N−1)/2c 1 Pi [ f ] log(Pi [ f ]), Hi = − log(b(N − 1)/2c) f∑ =0 (4.30) where i = 1, 2, ..., C̃ [153]. Since (4.30) refers to the Shannon entropy, it is bounded as 0 ≤ Hi ≤ log(N 2 /2). Similar to Shannon entropy, the lower bound is reached when the distribution is an impulse, and the upper bound occurs when the distribution is uniform. In terms of graph structures, the lower bound corresponds to the ring lattice and the upper bound to a random network. As noted in Fig. 4.8, random networks transform into random signals with a high peak inversely proportional to the probability of attachment p, and hence its spectra are random for all p. In order 91 to account for the variation in the network entropy as the probability of attachment in random networks varies, we introduce the energy of the signals. We propose to weight each entropy term √ i k1 √1 , 1], using the fact that kxk2 ≤ kxk1 ≤ Nkxk2 . defined in (4.30) by weights wi = √kx , w ∈ [ i Nkx k N i 2 These weights are normalized across signals as w̃i = √ wi , C̃kwk2 where w = (w1 , w2 , ..., wC̃ ). We define the weighted graph spectral entropy (GSE) as C̃ GSE = ∑ w̃i Hi . (4.31) i=1 This definition of network entropy is independent of graph theoretic measures and the eigenspectrum of the adjacency matrix. Another information theoretic measure of interest in graph theory is graph divergence. In order to assess network dissimilarities, we define the graph Kullback-Leibler divergence based on the graph signal spectrum. First, we define the power spectral density based on the first signal of the graph Gi as PGi ( f ). Next, we define the graph Kullback-Leibler divergence between graph G2 and G1 as b(N−1)/2c DKL (PG1 ||PG2 ) = ∑ f =0 PG1 ( f ) log2 PG1 ( f ) . PG2 ( f ) (4.32) where similar to Kullback-Leibler divergence, DKL (PG1 ||PG2 ) ≥ 0. Since the Kullback-Leibler divergence is non-symmetric, we employ the Jensen-Shannon divergence, J-divergence, defined as J(PG1 , PG2 ) = DKL (PG1 ||PG2 ) + DKL (PG2 ||PG1 ) . 2 (4.33) We compare the structural entropy obtained from the graph entropies previously defined (I f V1 , 92 I f V2 , and H 1 ) and the proposed graph-signal based spectral entropy for various networks. Four networks were considered: ring lattice networks with degree K = 2, 4, 8, 16, and 32, a small-world network with K = 8 and probability of attachment p varying from 0.0001 to 1, a random network with probability of attachment p from 0.1 to 1 in increments of 0.05, and a stochastic block network with p = 0.1 and number of blocks Ck = 3, 5, 7, 9. All networks have N = 300 nodes and the entropies were computed for 200 simulations. Fig. 4.9 shows the graph entropy for the four networks. As expected, GSE is close to zero for the ring lattice network (Fig.4.9 (a)), whereas the other three entropy measures remain high for all K. In the case of the small-world network (Fig. 4.9 (b)), increasing pr increases the network structural complexity, and GSE reflects this change in the network structure. Information functionals increase as a function of the probability of rewiring as well, but remain high for low pr . For small pr , the network is close to a ring and hence it has low structural complexity. On the other hand, as pr increases, the network becomes more random and thus its entropy increases. Random networks have high entropy for different probability of attachment p (Fig. 4.9 (c)). Finally, Fig. 4.9 (d) shows the results for a stochastic block network. As reflected by GSE, as the number of clusters increases, the entropy increases. As observed from the different network structures, the information functionals are not consistent at quantifying the structural complexity in the network. In addition, these require prior knowledge of the network structure in order to select the parameter α. On the other hand, the proposed graph spectral entropy quantifies the network structural information and is sensitive to structural changes in the networks. 93 1 1 0.8 Info1 Entropy Entropy Info1 Info2 0.5 SE GSE 0.6 0.4 0.2 0 0 0 10 20 Average Degree K (a) 0 30 0.2 0.4 0.6 0.8 1 Probability of Rewiring, pr (b) 1 1 Entropy Entropy 0.8 0.95 0.6 0.4 0.9 0.2 0 0 0.2 0.4 0.6 0.8 1 3 Probability of Attachment, p (c) 4 5 6 7 8 9 Number of Clusters (d) Figure 4.9: Comparison of graph entropy measures for (a) Ring network, K = 2, 4, 8, 16, 32 (I f V1 , α = 0.98 and α = 1.03); (b) Small world network, K = 4 and probability of rewiring pr ranging from 0.0001 to 1 (I f V1 , α = 0.95, and α = 1.05); (c) Erdős-Rènyi network, p from 0.05 to 1 in increments of 0.05 (I f V1 , α = 0.95, and α = 1.1); (d) Stochastic Block network, Ck = 3, 5, 7, 9 (I f V1 , α = 0.95, and α = 1.1), and N = 300 nodes for all networks. Next, graph divergence between two different binary networks is computed. In the first case, the divergence between two small-world networks, one with mean degree K = 4 and p = 0.0001 and the other with the same mean degree but different p is assessed. The second case considers the divergence between two stochastic block networks with 3 blocks, the first one with p = 0.9 and the second one with varying levels of p. Figure 4.10 (a) and Figure 4.10 (b) show the results for divergence for the small-world and the stochastic block network, respectively. In the small-world network, as p increases, the network becomes less similar to the default network and hence an increase in the divergence is expected. Similarly, in the stochastic block network, as p decreases, the network becomes more random and thus deviates from the default network. 94 8 7 J−Divergence 6 N=100 N=300 N=500 0.8 0.7 5 0.6 4 0.5 3 0.4 2 0.3 1 0.2 0 0.1 −1 −0.5 0 0.5 0 0 1 p (a) 0.5 p (b) 1 Figure 4.10: Computation of graph divergence between (a) small-world network with K = 4 and p = 0.0001 and another small-world network with increasing p; (b) Stochastic Block network with 3 blocks and p = 0.9 and another Stochastic Block with 3 blocks and different probability of attachment p. 4.6 Event detection in temporal networks In this section, we present a method for event detection in temporal networks based on the proposed graph to signal transform. We first introduce background on the tensor decomposition based on the network’s adjacency matrix. Next, we describe the proposed method and compare its performance to traditional tensor decompositions based on the adjacency matrix. 4.6.1 Tensor Decompositions for Temporal Networks Tensor analysis provides a useful tool for revealing the underlying relationships of multilinear data and can be thought of as an extension of PCA and Singular Value Decomposition (SVD) from vector to higher order data. Two major methods for factor analysis of multilinear data are Canonical Polyadic Decomposition or Parallel Factor (CANDECOMP/PARAFAC) and Tucker decomposition [154]. CANDECOMP/PARAFAC is useful in applications where it is desired to factor the data into components that are easily interpretable, such as rank-1 terms, whereas Tucker decompo95 sition is used more often for compression and low-rank projections [139]. In this work, we focus on the PARAFAC decomposition for extracting the temporal profile of dynamic networks since it facilitates its interpretation. Let X ∈ R I1 ×I2 ×···×IN be a Nth-order tensor. The PARAFAC decomposition approximates the tensor X as the linear combination of R rank-1 tensors, expressed as [139] X ≈ R (1) ∑ λr ur (N) (2) ◦ ur ◦ · · · ◦ ur , (4.34) r=1 where ◦ denotes the outer product. Alternatively, the decomposition in (4.34) can be expressed as X ≈ D ×1 U(1) ×2 U(2) ×3 · · · ×N U(N) , (4.35) where D = diag(λ1 , λ2 , ..., λR ), and U(i) ∈ RIi ×R , i = 1, . . . , N, are the loading matrices. In the analysis of temporal networks based on the adjacency matrix, a tensor Xa ∈ RN×N×T is constructed, where Xa (:, :,t) = A(t) and Xa (:, :,t) = W(t) , for binary and weighted networks, respectively, and A(t) ∈ RN×N and W(t) ∈ RN×N are binary and weighted adjacency matrices at time t, t = 1, . . . , T , respectively. The tensor Xa can be decomposed as Xa ≈ R (1) ∑ λr ur (2) (3) ◦ ur ◦ ur , (4.36) r=1 (1) (2) where ur ∈ RN×1 and ur ∈ RN×1 are the same in the case of undirected networks and contain (3) information regarding the node’s connectivity, and ur ∈ RT ×1 is the temporal factor. 96 4.6.2 Graph to signal transform based event detection In this section, we propose a tensor decomposition method based on the network’s signals for event detection in temporal networks. Let X(t) ∈ RN×C be the set of signals obtained through CMDS (4.4) at time t. In order to obtain a better insight into the structure of the graph at each time point, we consider the spectra of the graph signals at each time point. We denote the magnitude (t) spectrum of X(t) as M(t) , and Mi ∈ RN×1 , i = 1, . . . ,C, to be the magnitude spectrum of the signal xi (n) from the network at time t. We then form a tensor XS ∈ RF×C×T , where XS (:, :,t) = M(t) , and F corresponds to the total number of frequency bins, equal to d N2 e considering only the positive frequencies, C is the total of components obtained from CMDS, and T is the total number of time points. In order to understand the interactions between the different components across time and frequency, we propose to decompose the tensor XS as Xs ≈ R (1) ∑ λr ur (2) (3) ◦ ur ◦ ur , (4.37) r=1 (1) (2) (3) where ur ∈ RF×1 is the spectral factor, ur ∈ RC×1 is the components factor, and ur ∈ RT ×1 is the temporal factor. The tensor decomposition in (4.37) is performed via the MATLAB Tensor Toolbox Version 2.6 [155], [156]. The algorithm enforces nonnegative constraints on the factors and is based on the multiplicative updates of the Nonnegative Matrix Factorization in [157]. The rank of the decomposition, R, is selected according to the core consistency [158]. We compared the proposed method and the tensor decomposition discussed in Section 4.6.1 in the detection of sudden changes in the structure of temporal networks. We generated a temporal weighted small-world network whose edge weights were uniformly distributed between 0.5 and 0.7. The networks consisted of 40 nodes and with p = 0.01 for the whole duration of 51 time points, 97 and created an event by increasing p only at t = 31. Four different values of p were considered: p = 0.05, p = 0.1, p = 0.15 and p = 0.2. A null network with the same properties but without an abrupt event is also constructed for detection analysis, meaning that its size, probability of attachment p and edge weights distribution is the same for all t. For each simulation and network, a total of 200 repetitions were performed, and an Receiver Operating Curve (ROC) for different p values is constructed. In order to construct the ROC, we compared the amplitude of the first (3) (3) component of the temporal mode, u1null (t) and u1alt (t), at t = 31, corresponding to the null and anomalous network, respectively, to a given threshold. A true detection is defined if the amplitude (3) of u1alt (31) is greater than the threshold, whereas a false alarm is identified if the amplitude of (3) u1null (31) is greater than the threshold. The resulting ROCs for each p are shown in Figure 4.11. It can be observed that for all p, the proposed method (blue) yields a larger area under the curve (AUC) compared to that obtained from the adjacency matrices solely (red) indicating the method’s ability to detect sudden changes in the network. Since the magnitude spectrum of the signals does not change considerably as long as the network structure is the same, the temporal mode of the tensor XS will reflect only true structural deviations. On the other hand, the entries of the adjacency matrix are distributed within an interval of edge weights and the temporal mode of the tensor Xa is sensitive to deviations that do not necessarily correspond to structural changes. 98 p = 0.05, AUC1 = 0.97995, AUC2 = 0.6069 p = 0.1, AUC1 = 0.9637, AUC2 = 0.65925 1 PD PD 1 0.5 0 0 0.2 0.4 0.6 0.8 0.5 0 1 PF (a) p = 0.15, AUC1 = 0.9873, AUC2 = 0.7037 0.5 0 0.2 0.4 0.6 0.8 1 PF (b) p = 0.2, AUC1 = 0.98605, AUC2 = 0.7636 0 0.2 0.4 1 PD PD 1 0 0 0.2 0.4 0.6 0.8 0.5 0 1 PF (c) 0.6 0.8 1 PF (d) Figure 4.11: Detection of an event consisting of a Small-World network whose probability of attachment p changes from that of the default network (p = 0.01) at t = 31. ROCs are constructed from the proposed method (blue) and adjacency matrix based method (red) for (a) p = 0.05; (b) p = 0.1; (c) p = 0.15; (d) p = 0.2. 4.7 Characterization of Functional Connectivity Networks In this section, we propose to characterize the network structure based on the spectrum of the signals of functional connectivity networks from the cognitive control experiment described in Chapter 2. Functional connectivity networks are constructed based on the bivariate phase-locking value (PLV) described in Chapter 2 between pairs of electrodes, for both error and correct responses. A network for each subject was constructed by averaging the PLV over the frequency bins corresponding to the theta band, 4-8 Hz, and the ERN interval, 25 − 75 ms. These networks were transformed into signals by using (4.12). The magnitude of the Fourier transform of each component for error, Me , and correct, Mc , responses are shown in Figure 4.12 (a) and Figure 4.12 99 (b), respectively. As observed in Figure 4.12 (a), the first components of error responses exhibit high energy concentrated in the low frequencies and this energy shifts towards higher frequencies as the component number increases. On the other hand, there is no clear trend in the spectrum corresponding to correct responses. This suggests that functional connectivity networks from error responses have a more organized structure than that of correct responses, which suggests to follow a random, less organized structure. Next, we assess the similarity of the signal spectra obtained from the error and correct functional connectivity networks to that of a small-world network following the approach proposed in Section 4.4. For both error and correct responses, we computed the spectral centroid for each signal and computed its correlation with that of a small world network for different average degrees K and probabilities of rewiring pr . Table 4.2 shows the estimated parameters (mean ± st.dev.) for different time intervals. As observed, FCNs from error responses are characterized with small pr and K, characteristic of small-world networks, while CRN networks have higher pr and K, indicative of increased randomness. On the other hand, as observed in Table 4.3, the small-world measure does not reflect such difference. Previous works have reported increased small-worldness for ERN compared to CRN [116]. Therefore, the proposed approach can serve as an alternative method for the characterization of FCN structure in distinct cognitive states, and furthermore, estimate network parameters. 100 Table 4.2: Estimated small-world parameters. Estimated small-world parameters Interval -25 - 0 0 - 25 25-50 50-75 75-100 100-125 k pr ERN 0.0113 ± 0.0152 0.0060 ± 0.0044 0.0119 ± 0.0213 0.0085 ± 0.0137 0.0065 ± 0.0065 0.007 ± 0.0111 CRN 0.2897 ± 0.3602 0.3707 ± 0.3910 0.3474 ± 0.3636 0.3533 ± 0.3801 0.3622 ± 0.3897 0.2734 ± 0.3740 ERN 2.1111 ± 0.4714 2±0 2±0 2±0 2±0 2±0 CRN 9.2222 ± 6.9668 13 ± 6.5530 9.7778 ± 6.8561 10.2222 ± 7.9377 11.7778 ± 6.9583 12.5556 ± 6.2046 Frequency (Hz) 0.3 25 20 0.2 15 0.1 10 5 0 10 20 30 40 50 (a) Frequency (Hz) 0.3 25 20 0.2 15 0.1 10 5 0 10 20 30 40 50 (b) Figure 4.12: Magnitude Spectrum for each signal obtained through network to signal transformation for (a) Error responses; (b) Correct responses. 101 Table 4.3: Small-world measure. Interval -25 - 0 0 - 25 25 - 50 50-75 75-100 100-125 4.7.1 σ ERN 1.165 ± 0.2827 1.1635 ± 0.2823 1.1689 ± 0.2823 1.1653 ± 0.2826 1.166 ± 0.2828 1.1671 ± 0.2829 CRN 1.1677 ± 0.2833 1.1662 ± 0.2829 1.1717 ± 0.2844 1.1681 ± 0.2833 1.1687 ± 0.2835 1.1698 ± 0.2836 Assessment of Graph Information Theoretic Measures in Functional Connectivity Networks Table 4.4 shows the entropy results (mean ± st.dev.) for the ERN and CRN FCNs over six different time intervals. For all intervals, FCNs from correct responses show higher entropy than FCNs from error responses and this difference is significant for all intervals (p < 0.05, Wilcoxon rank-sum test). This is consistent with the fact that the error-related negativity is associated with increased synchronization which results in less random networks and hence lower network entropy. In addition, the slight increase in the network entropy within the ERN interval (25-75 ms) can be related to results from a previous study [159], where it has been shown that ERN is associated with increased segregation within the FCN resulting in more clusters, i.e. higher entropy in the network organization. In addition, we correlate the entropy results of each subject with behavioral measures relevant to the cognitive control experiment. In particular, we considered the post-error slowing (PES) and post-error accuracy (PEA). PES is computed as correct response time after error responses minus correct response time after correct responses, and PEA is computed as the accuracy after error responses minus the accuracy after correct responses. Table 4.5 shows the correlation between 102 Table 4.4: Graph entropy from cognitive control FCNs. Entropy (mean ± st.dev.) Interval (ms) ERN CRN -25-0 0.8511 ± 0.0197 0.8738 ± 0.0072 0-25 0.8575 ± 0.0138 0.8726 ± 0.0073 25-50 0.8594 ± 0.0130 0.8724 ± 0.0078 50-75 0.8542 ± 0.0157 0.8713 ± 0.0095 75-100 0.8510 ± 0.0168 0.8738 ± 0.0090 100-125 0.8513 ± 0.0169 0.8723 ± 0.0066 p-value 0.0001 0.0042 0.0048 0.0090 0.0003 0.0002 Table 4.5: Correlations between graph entropy and behavioral measures from cognitive control. PES Interval (ms) ERN CRN -25-0 0.1867 0.0325 0-25 -0.0879 -0.0626 25-50 0.3358 -0.2744 50-75 0.2772 -0.0518 75-100 0.3807 0.4019 100-125 0.2976 -0.2697 PEA ERN CRN -0.1086 0.0041 -0.269 0.054 -0.0406 0.1959 -0.1373 0.0299 -0.1183 0.1128 0.2504 -0.128 the FCN entropy and behavioral measures over six different time intervals. As observed, PES is positively correlated with the graph entropy of error FCNs over the ERN interval, while the FCN entropy of correct responses is negatively correlated. This result follows previous studies showing an inverse relationship between PES and increased synchrony in the error-related activity [160], [161], and hence, PES is directly proportional to the network entropy during the ERN. 4.8 Conclusions In this chapter, a new network to signal transformation based on the resistance distance has been proposed for both binary and weighted networks. This is the first deterministic graph to signal transform proposed for both binary and weighted networks. This transform is also shown to guarantee the reconstruction of the networks from their signals. Transforming graphs into signals 103 provides the benefit of applying traditional signal processing measures to these signals in order to assess certain properties of the networks. Along those lines, the proposed graph to signal transform served as the basis for the introduction of graph information theoretic measures, an approach for small-world network characterization, and an event detection approach in temporal networks proposed in this chapter. First, we showed theoretical properties of the proposed method and the resistance distance matrix. The graph to signal transformation of various well-known network structures was illustrated for both binary and weighted networks. For binary networks, the proposed method reveals structural attributes of the graphs not perceivable by a previously proposed distance matrix. Furthermore, analysis of perturbations in the network and how these are reflected in the proposed graph to signal transform were presented as well. In addition, through simulations, we demonstrated the behavior of our proposed technique to network anomalies. Second, an approach for the computation of the structural information content of graphs using the network to signal transform was introduced. The proposed method considers the normalized power spectrum of the graph signal with the highest energy as a probability distribution to be employed in the computation of Shannon entropy and Kullback-Leibler divergence. This method is advantageous over current graph information theoretic measures due to its independence from arbitrary parameters. The results from simulated networks illustrate that the proposed method yields a reliable characterization of the structural information of the graph. Furthermore, it allows for the quantification of the structural information of functional connectivity networks and reflects differences between two cognitive states during the particular time interval of interest. In addition, we introduced a graph-signal transformation based approach for detecting events in temporal networks. The method is based on factor analysis of a tensor formed from the spectra of the different signal components across time, where the factors along the time mode reveal the 104 change to the network structure. We compared the temporal factors from the tensor decomposition of the proposed approach to the factors from the tensor decomposition of the adjacency matrices over time. By comparing the first component of the temporal mode at the time where the event occurred, we showed from ROC analysis that the proposed method results in a higher detection rate of abrupt structural changes in temporal networks when compared to the tensor decomposition of adjacency matrices over time. Lastly, it was shown how the proposed method can characterize the structural properties of functional connectivity networks under different cognitive states. Following a priori knowledge suggesting that functional connectivity networks behave as small-world networks, it was shown how the spectral centroid of the functional connectivity networks signals from the proposed graph to signal transform correlate to the spectral centroid of the signals from small-world networks, for different network parameters. From these results, it was shown that functional connectivity networks from error responses are highly correlated to a small-wold network, whereas the networks for the correct responses are less correlated. 105 Chapter 5 Dynamic Graph Fourier Transform Recent research in signal processing over graphs has provided the tools for processing signals defined on irregular domains such as graphs [49]. In many applications, such as social networks, sensor networks, energy networks, and brain networks, among others, signals lie on the set of vertices of the network. Other fields where data is defined on irregular domains include data defined on manifolds and irregularly shaped domains [162], such as cells in histological images [163], and data based on point clouds. Recently, it has been shown [164] how signal processing methods adapted to signals on graphs such as filtering and Fourier transform defined in the context of signal processing over graphs provide insights about learning processes in the brain. Various transforms from signal processing have been adapted to the graph domain to analyze the spectral content of signals over graphs. The first is the graph Fourier transform (GFT), which aims to compute the Fourier transform of a signal defined on the vertices of a graph by employing a basis obtained from the network's adjacency matrix [165] or Laplacian matrix [49]. Another transform defined on graph signals is the windowed graph Fourier transform [166], which considers the nonstationarity of the graph signals and transforms them to the vertex-frequency domain. In order to define a windowed graph Fourier transform over graph signals, in [166] the authors define generalized convolution, translation, and modulation operators for signals on graphs. In addition, a wavelet transform for graph signals has been developed in [162], known as spectral graph wavelets 106 since it is based on spectral graph theory. By doing this, scaling is defined from the eigenfunctions of the graph Laplacian and avoid its computation over irregular domains. Recently, the joint timevertex Fourier transform [167] was proposed for graph signals evolving over time. The joint timevertex Fourier transform is found by first computing the GFT along the graph dimension and the discrete Fourier transform along the time-domain. In addition, the dynamic graph wavelet transform [168] has been proposed for the case when the time-vertex domain is dynamic. However, in all of these approaches the underlying graphs are stationary. In some applications, such as functional connectivity networks in the brain, the underlying network structure varies over time [29], [169]. This requires the adaptation of the previously mentioned graph signal transforms in order to consider the nonstationary network structure. For example, in the case of the graph Fourier transform, the adjacency matrix or the Laplacian matrix of the network changes for each time instance, and a unique spectral representation is not possible. Therefore, there is no unique definition of frequency across time as the graph evolves. One alternative would be averaging of the adjacency matrix or the Laplacian [51]. However, averaging does not necessarily find the optimal subspace across time. This problem has been previously addressed by defining a common Laplacian across time, where a common subspace was found by means of Grassmann manifolds [50]. There, the authors used this common subspace in the definition of a dynamic graph Fourier transform. However, the accuracy of a common subspace is compromised as the number of time points increases. In addition, when the network structure is nonstationary, the size of the window in which the common subspace is found should be determined by the characteristics of the network. In this chapter, we propose a dynamic graph Fourier transform (dGFT) for which a common subspace estimate is found by means of tensor decomposition. The temporal network adjacency matrices or Laplacian matrices over time constitute a 3-way tensor. The Tucker decomposition of 107 this tensor results in orthonormal component matrices which define the basis of the time-varying Laplacian operator. The obtained basis and the corresponding subspace are optimal in the sense of finding the best low-rank approximation to the Laplacians across time. This chapter is organized as follows. Section 5.1 introduces background on graph signal processing and tensor decompositions. Section 5.2 presents the proposed tensor based dGFT. Section 5.3 presents results on simulated graph signals and dynamic functional connectivity networks (dFCNs) constructed from cognitive control EEG experiment. Section 5.4 presents the conclusions and future work. 5.1 5.1.1 Background Graph Signal Processing A graph signal f : V → R is defined on the vertices of the graph G. It is represented by a vector f ∈ RN×1 , and the ith element of this vector corresponds to the signal at vertex vi [165]. Thus, signal amplitudes at each node define a graph signal, and it is indexed by the graph nodes. Since the signals are defined over the graph nodes, the underlying network structure plays an important role in the definition of transformations of the signals f. In particular, the adjacency matrix or the graph Laplacian are employed in the analysis of graph signals. In this work, we focus on the graph Laplacian. The Laplacian L is a positive semidefinite and real matrix and thus has a complete set of orthonormal eigenvectors {ul }l=0,1,...,N−1 , and eigenvalues {λl }l=0,1,...,N−1 , 0 = λ0 ≤ λ1 ≤ · · · ≤ λN−1 . Therefore, it admits the eigendecomposition L = UΛUT , 108 (5.1) where U = [u0 , u1 , . . . , uN−1 ], and Λ = diag {λ0 , λ1 , . . . , λN−1 }. The spectrum of the graph Laplacian has been widely used in applications such as clustering and spectral matching. Recently, the Laplacian eigenvectors {ul }l=0,1,...,N−1 have been proposed as the Fourier basis for the graph Fourier transform [49]. As in the classical Fourier analysis, the eigenvectors of L provide a notion of frequency since, for connected graphs, the eigenvector corresponding to the smallest eigenvalue is constant, equal to √1 . N As the frequency λi increases, the eigenvectors oscillate more rapidly. The graph Fourier transform (GFT) of f defined on the graph vertices V is given by [49] N fˆ(λl ) = hf, ul i = ∑ f (i)u∗l (i), (5.2) i=1 where ul , l = 0, 1, . . . , N −1 correspond to the eigenvectors of the graph Laplacian, and the frequencies are indexed by its corresponding eigenvalues. The inverse graph Fourier transform (iGFT) is obtained by N−1 f (i) = ∑ fˆ(λl )ul (i). (5.3) l=0 In matrix form, the graph Fourier transform and its inverse are given as f̂ = UT f, (5.4) f = Uf̂, (5.5) and respectively. 109 The graph Laplacian is the discrete difference operator, and for every vector y ∈ RN it satisfies yT Ly = 1 N Wi j (yi − y j )2 , 2 i,∑ j=1 (5.6) where Wi j is the (i, j) entry of the graph adjacency matrix. In the context of graph signal processing, the graph Laplacian quadratic form, S2 (f) = fT Lf is referred to as the smoothness of the signal f with respect to the Laplacian L. The GFT can be interpreted in terms of the smoothness of the graph signals the GFT of a smooth signal will occupy the low frequencies λ in the spectrum, and will have a small S2 (f). This occurs when two neighbor vertices are connected by a edge with large weight and the signal f at those vertices has similar values. In order to filter the graph signals, low-pass ĥLk , band-pass ĥBk and high-pass ĥHk graph filters are defined as ĥLk = I{k