COMMUNITY DETECTION IN TEMPORAL MULTI-LAYER NETWORKS

By

Esraa Mustafa Al-sharoa

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering - Doctor of Philosophy

2019

ABSTRACT

COMMUNITY DETECTION IN TEMPORAL MULTI-LAYER NETWORKS

By

Esraa Mustafa Al-sharoa

Many real world systems and relational data can be modeled as networks or graphs. With the availability of large amounts of network data, it is important to be able to reduce the network's dimensionality and extract useful information from it. A key approach to network data reduction is community detection. The objective of community detection is to summarize the network by a set of modules, where the similarity within the modules is maximized while the similarity between different modules is minimized. Early work in graph based community detection methods focused on static or single-layer networks. This type of network is usually considered an oversimplification of many real world complex systems, such as social networks, where there may be different types of relationships that evolve with time. Consequently, there is a need for a meaningful representation of such complex systems. Recently, multi-layer networks have been used to model complex systems where the objects may interact through different mechanisms. However, there is a limited amount of work on community detection methods for dynamic and multi-layer networks.

In this thesis, we focus on detecting and tracking the community structure in dynamic and multi-layer networks. Two particular applications of interest are considered: temporal social networks and dynamic functional connectivity networks (dFCNs) of the brain. In order to detect the community structure in dynamic single-layer and multi-layer networks, we have developed methods that capture the structure of these complex networks. In Chapter 2, a low-rank + sparse estimation based evolutionary spectral clustering approach is proposed to detect and track the community structure in temporal networks. The proposed method decomposes the network into low-rank and sparse parts and simultaneously obtains smooth cluster assignments by minimizing the subspace distance between consecutive time points. The effectiveness of the proposed approach is evaluated on several synthetic and real social temporal networks and compared to existing state-of-the-art algorithms. As the method developed in Chapter 2 is limited to dynamic single-layer networks and can only take a limited amount of historic information into account, a tensor-based approach is developed in Chapter 3 to detect the community structure in dynamic single-layer and multi-layer networks. The proposed framework is used to track the change points as well as identify the community structure across time and multiple subjects of dFCNs constructed from resting state functional magnetic resonance imaging (rs-fMRI) data. The dFCNs are summarized into a set of FC states that are consistent over time and subjects. The detected community structures are evaluated using a consistency measure. In Chapter 4, an information-theoretic approach is introduced to aggregate the dynamic networks and identify the time points that are topologically similar in order to combine them into a tensor. The community structure of the reduced network is then detected using a tensor based approach similar to the one described in Chapter 3.
In Chapter 5, a temporal block spectral clustering framework is introduced to detect and track the community structure of multi-layer temporal networks. A set of intra- and inter-layer adjacency matrices is constructed and combined to create a set of temporal supra-adjacency matrices. In particular, both the connections between nodes of the network within a time window, i.e., intra-layer adjacency, and the connections between nodes across different time windows, i.e., inter-layer adjacency, are taken into account. The community structure is then detected by applying spectral clustering to these supra-adjacency matrices. The proposed approach is evaluated on dFCNs constructed from rs-fMRI across time and subjects, revealing dynamic connectivity patterns between the resting state networks (RSNs).

This work is dedicated to my beloved parents, Mustafa Al-sharoa and Shama Al-kofahi, who have been the source of inspiration, encouragement and unconditional love during all times. To my soulmate and lovely husband, Mahmood Al-khassaweneh, who has been supporting and encouraging me to make my dreams come true. Thank you for being in my life and believing in me. To my precious children, Ahmad, Ghadaq, Alaya and Tabarak. You are the source of my happiness, strength and success. I love you to the moon and beyond. To my family in law, sisters, brothers and friends. Thank you for your prayers, kind words and support.

ACKNOWLEDGMENTS

I would like to thank my advisor, Prof. Selin Aviyente; without her guidance, valuable comments and input this dissertation would not have been possible. Many thanks to Prof. Aviyente for being ready to help at all times. In addition, I would like to thank my committee members, Prof. John Deller, Prof. Hayder Radha and Dr. Mark Iwen, for their useful comments and guidance. A very special thanks goes to the Schlumberger Foundation, Faculty for the Future Fellowship, who supported me through my PhD. I am very grateful to them. I would also like to thank my husband, Mahmood Al-khassaweneh, and my children, Ahmad, Ghadaq, Alaya and Tabarak. With their encouragement, support and love I was able to finish this dissertation. Finally, I would like to thank the best role models in my life, my parents Mustafa Al-sharoa and Shama Al-kofahi, who have been showering me with their love and prayers. They instilled in me the will to learn and build a bright future. Thanking them will never be enough.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1  Introduction
  1.1 Graph Theory
  1.2 Community Detection in Static Networks
    1.2.1 Graph Cut problem and Spectral Clustering
    1.2.2 Modularity Based Clustering
  1.3 Background on Tensors
  1.4 Community Detection in Dynamic Networks
  1.5 Community Detection in Dynamic Multi-layer Networks
  1.6 Dynamic Functional Connectivity Networks of the Brain
  1.7 Organization and Contributions of this Thesis

Chapter 2  Detecting and Tracking Community Structure in Temporal Networks: A Low-Rank + Sparse Estimation Based Evolutionary Clustering Approach
  2.1 Introduction
  2.2 Background
    2.2.1 Spectral Clustering
    2.2.2 Low-rank + Sparse matrix approximation
    2.2.3 Chordal Subspace Distance Measure
    2.2.4 Variation of Information
  2.3 Methods: low-rank + sparse estimation based evolutionary spectral clustering (LS-ESC)
    2.3.1 Problem Formulation
    2.3.2 Distance Between Successive Subspaces
    2.3.3 Problem Solution
    2.3.4 Tracking the community structure over time
    2.3.5 Detecting the number of clusters
  2.4 Results
    2.4.1 Simulated temporal networks
      2.4.1.1 Simulated binary temporal networks
      2.4.1.2 Simulated weighted temporal networks
    2.4.2 Real social network: MIT Reality Mining Dataset
    2.4.3 Real social network: Primary school network
      2.4.3.1 Primary school data set description and temporal network construction
      2.4.3.2 Detecting and tracking the community structure in the primary school temporal network (PSTN)
  2.5 Conclusions

Chapter 3  Tensor Based Temporal and Multi-layer Community Detection
  3.1 Introduction
  3.2 Background
    3.2.1 Quality Metrics to Determine the Number of Clusters
  3.3 Methods
    3.3.1 Spectral Clustering
    3.3.2 3D-Windowed Tensor Approach (3D-WTA)
    3.3.3 Running Time Tensor Approach (RTTA)
    3.3.4 4D-Windowed Tensor Approach (4D-WTA)
    3.3.5 Tracking the Community Structure Over Time
  3.4 Results
    3.4.1 Simulated Temporal Networks
      3.4.1.1 Description of the Data sets
      3.4.1.2 Comparison of different quality metrics
      3.4.1.3 Comparison between 3D-WTA and state of the art algorithms
      3.4.1.4 Comparison between 3D-WTA and 4D-WTA
      3.4.1.5 Flocks of Boids Dataset
    3.4.2 MIT Reality Mining Dataset
    3.4.3 Application to dFCN From EEG Dataset
    3.4.4 Application to dFCN From rs-fMRI Dataset
      3.4.4.1 Data Preprocessing and Functional Connectivity Networks Construction
    3.4.5 Dynamic Functional Connectivity Network Construction
      3.4.5.1 Resting State Networks (RSNs)
      3.4.5.2 Detecting the Community Structure in rs-fMRI
      3.4.5.3 Community structure of each FC state
      3.4.5.4 State transition profiles
      3.4.5.5 Consistency of the RSNs across time
      3.4.5.6 Comparison Between 3D-WTA and 4D-WTA
  3.5 Conclusions

Chapter 4  Detecting Brain Dynamics During Resting State: A Tensor Based Evolutionary Clustering Approach
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Data and Preprocessing
    4.2.2 Dynamic Functional Connectivity Networks (dFCNs)
      4.2.2.1 Dynamic Functional Connectivity Network Construction
      4.2.2.2 Surrogate Data Generation for Testing Statistical Significance of the dFCN
    4.2.3 Reducibility of Brain dFCNs
    4.2.4 Dynamic Network Clustering
      4.2.4.1 HOOI Clustering Based Approach
      4.2.4.2 Choosing the Number of Clusters
  4.3 Results
    4.3.1 dFCN Reduction
    4.3.2 FC States Summarization
  4.4 Conclusions

Chapter 5  Temporal Block Spectral Clustering for Multi-layer Temporal Functional Connectivity Networks
  5.1 Introduction
  5.2 Methods
    5.2.1 Construction of multi-layer temporal networks
      5.2.1.1 Construction of a temporal network from signals
      5.2.1.2 Construction of multi-layer temporal network
    5.2.2 Temporal block spectral clustering approach (TBSCA)
  5.3 Results
    5.3.1 Simulated networks
    5.3.2 Community Detection in dynamic functional connectivity networks (dFCNs)
      5.3.2.1 Data and preprocessing
      5.3.2.2 Dynamic Community structure
      5.3.2.3 Consistency of the RSNs across time
  5.4 Conclusions

Chapter 6  Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work

APPENDIX

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Parameters of the simulated binary networks from the Girvan and Newman benchmark for six different data sets and the selected values of the regularization parameters.

Table 2.2: Parameters of the truncated Gaussian distribution used to generate the edges of the simulated weighted temporal networks for three different data sets, the ground truth of the nodes' community membership for the generated networks and the selected values of the regularization parameters.

Table 2.3: Different classes and number of teachers and students participating in the primary school data from each class.

Table 3.1: Parameters of the truncated Gaussian distribution used to generate the edges of the simulated networks for four different datasets, and the ground truth of the nodes' community membership for the generated networks.

Table 3.2: Parameters of the truncated Gaussian distribution used to generate the edges of the simulated networks for the 4D simulated data, and the ground truth of the nodes' community membership for the generated network.

Table 3.3: The ROIs belonging to the different subnetworks in resting state. The numbers in the table refer to the ROIs defined in Table 1.

Table 4.1: The ROIs that define the RSNs. The numbers in the table refer to the ROIs defined in Table 1.

Table 4.2: Key Attributes of the Tensors Constructed from the Reduced dFCN.

Table 1: List of the ROIs in the AAL 90 template. The odd and even indices refer to the left and right brain hemisphere ROIs, respectively.

LIST OF FIGURES

Figure 1.1: An example of a dynamic multi-layer network: The network consists of 6 nodes, 3 time points and 3 layers. Solid lines represent the edges between the nodes in the same layer while dashed lines represent the edges between nodes across layers.

Figure 2.1: VI comparison between LS-ESC, AFFECT and DYNMOGA for simulated binary networks from the Girvan and Newman benchmark with the parameters presented in Table 2.1. LS-ESC scored the lowest VI over time compared to the other methods: (a) Data set 1; (b) Data set 2; (c) Data set 3; (d) Data set 4; (e) Data set 5; and (f) Data set 6.
Figure 2.2: Comparison between LS-ESC and state of the art algorithms for the different weighted networks in Table 3.1: (a)-(c) The cost function calculated by LS-ESC; (d)-(f) The detected number of clusters (DNOC) for the simulated networks using different algorithms. LS-ESC estimated the correct number of clusters over time and the cost function reflected the correct change points; (g)-(i) Comparison of variation of information (VI) between LS-ESC, AFFECT and GenLov for the simulated networks. LS-ESC outperforms the other methods and scored the lowest variation of information over time.

Figure 2.3: Cost function calculated by LS-ESC for the MIT Reality Mining network. The changes in the cost function coincide with six important dates during the school year (red dashed lines).

Figure 2.4: Detected community structure of the MIT Reality Mining temporal network at different times: (a) Week 5; (b) Week 15; (c) Week 27; and (d) Week 39.

Figure 2.5: Cost function calculated by LS-ESC for the primary school temporal network for the: (a) first day, (b) second day. Red dashed lines refer to the time during the day.

Figure 2.6: Detected community structure in the first day of the primary school temporal network at different times: (a) morning, (b) break, (c) lunch and (d) end of the school day.

Figure 2.7: Detected community structure in the second day of the primary school temporal network at different times: (a) morning, (b) break, (c) lunch and (d) end of the school day.

Figure 3.1: Detected number of clusters (DNOC) by 3D-WTA for the data sets in Table 3.1 using different metrics, including: Eigengap (EG), modularity (Q), multi-resolution modularity Qγ and asymptotical surprise (AS).

Figure 3.2: Comparison between 3D-WTA and state of the art algorithms for the different data sets in Table 3.1: (a)-(d) The normalized cost function calculated by 3D-WTA; (e)-(h) The detected number of clusters (DNOC) for the simulated networks using different algorithms. 3D-WTA estimated the correct number of clusters over time and the normalized cost function reflected the correct change points represented by the pink dashed lines; (i)-(l) Comparison of variation of information (VI) between 3D-WTA, AFFECT, PCM-NC, PCQ-NC, TC and GenLov for the simulated networks. 3D-WTA scored the lowest variation of information over time compared to the other methods.

Figure 3.3: Variation of information (VI) comparison between the detected community structure by 3D-WTA and 4D-WTA and the ground truth for simulated data.

Figure 3.4: (a) The normalized cost function and the detected number of clusters for the Flocks of Boids dataset using different algorithms. 3D-WTA results in the best estimate of the number of clusters, especially after the change point; (b) Rand index comparison between 3D-WTA, AFFECT, PCM and PCQ for the Flocks of Boids dataset.

Figure 3.5: Detected Change Points for Reality Mining Data.

Figure 3.6: Detected Change Points for ERN Data.
Figure 3.7: Community structure for the ERN networks obtained by RTTA: (a) Pre-ERN, (b) ERN, (c) Post-ERN.

Figure 3.8: A flowchart of the application of 4D-WTA to rs-fMRI: (a) Time series from each ROI; (b) dFCNs constructed across time and subjects using Pearson correlation coefficient; (c) After random subsampling across time and subjects, 4 dimensional FC tensors are constructed for each sample. Each tensor consists of 30 random time points from each subject; (d) Tucker decomposition of each FC tensor; (e) Community structure found by 4D-WTA for each sample; (f) The detected community structure is summarized into 5 functional connectivity states defined through community detection using k-means.

Figure 3.9: (a) Distribution of weights across time for all samples, (b) Distribution of weights across subjects for all samples.

Figure 3.10: The similarity among the community structures obtained by 4D-WTA across the 200 subsamples across time and subjects. The similarity between the different community structures is quantified by variation of information. Samples with similar community structure are grouped together by k-means.

Figure 3.11: State transition profiles for subjects 4, 7, 9 and 12.

Figure 3.12: Variation in RSNs' consistency across subjects calculated by (a) 3D-WTA, (b) PCQ-NC, (c) GenLov, (d) 4D-WTA.

Figure 3.13: Full and circular view of the community structure of the FC states found by 4D-WTA: (a) State 1 with 9 clusters; (b) State 2 with 4 clusters; (c) State 3 with 8 clusters; (d) State 4 with 7 clusters and (e) State 5 with 6 clusters.

Figure 4.1: Jensen-Shannon divergence between all pairs of time points in the dFCN.

Figure 4.2: The dendrogram resulting from hierarchical clustering and the corresponding values of q. The dashed red line identifies the cut point of the dendrogram from the second peak of q.

Figure 4.3: The detected community structure for the FC States (1-10). For each state, (left) the optimized weighted average of the adjacency matrices within the state. Each cluster is denoted by the main ROIs detected within the cluster. The clusters shown in the circular graphs (right) refer to the different RSNs defined in literature, and the connections between them are detected by the proposed algorithm. The number of clusters for the FC states varies between 3 and 4.

Figure 5.1: Comparison between the detected community structure by TBSCA, BNLSC and GenLov and the ground truth for the synthetic data set in terms of variation of information. The number of clusters changes from 3 to 2 at time point 21, and from 2 to 3 at time point 41.

Figure 5.2: Agreement between the detected community structures across time for subject 1 measured by variation of information (VI). The dashed lines refer to the detected change points.

Figure 5.3: Detected community structure from selected change points. The ROIs in different networks reconfigured their community membership across time. Visualized by circularGraph toolbox [1].
Figure 5.4: Agreement between the detected community structures across time for all subjects measured by variation of information (VI).

Figure 5.5: Consistency of different RSNs across subjects calculated from the detected community structure by TBSCA.

LIST OF ALGORITHMS

Algorithm 2.1: LS-ESC
Algorithm 3.2: 3D-Windowed Tensor Approach (3D-WTA)
Algorithm 3.3: Running Time Tensor Approach (RTTA)
Algorithm 3.4: TrackU
Algorithm 3.5: 4D-Windowed Tensor Approach (4D-WTA)
Algorithm 4.6: HOOI clustering based approach

Chapter 1

Introduction

Many real world systems can be modeled as complex networks. The objects in these systems are defined as the nodes or vertices of the graph, while the interactions between these objects define the edges of the graph. With the growth of network data, it is important to be able to reduce the dimensionality and extract useful information from these networks. Examples of real systems that are naturally structured as networks include social [2], biological [3] and communication systems [4]. One of the most important characteristics of these networks is the community structure, where the nodes are organized into a set of modules. One common way to understand the community structure in these networks is through community detection techniques. The objective of graph based community detection is to partition a set of nodes into smaller subsets (modules or communities) such that nodes within the same module are similar to each other and are densely connected, while they are different from the nodes in the other modules.

Since many complex systems can be modeled as graphs or networks, various graph based community detection techniques have been developed. However, most of these methods are limited to static networks, i.e., the communities do not change with time. As many real world systems exhibit variation in their community structure over time, there is a growing need to develop algorithms that detect the community structure and track its evolution across time [5]. Other complex systems might have different types of relationships between their entities, and these systems can be modeled as multi-layer networks [6]. Multi-layer networks consider relationships between nodes within layers, i.e., intra-layer edges, and across layers, i.e., inter-layer edges. In particular, multi-layer networks consider all possible edges between nodes within and across layers. Consequently, it is important to develop community detection algorithms that can detect and track the community structure in multi-layer networks in order to understand these systems correctly.

In this thesis, we focus on developing community detection approaches for two types of networks: dynamic (temporal) single-layer networks and dynamic multi-layer networks. In dynamic single-layer networks, the network topology changes over time as the nodes might come and go or the interactions between them might change across time. For instance, we can model a group of users on Facebook and their interactions by a dynamic network, where the interactions between them might vary across time as shown in the first row of Fig. 1.1.
In multi-layer networks, the objects in the network have different types of interactions between them and each layer corresponds to one type of these interactions. For instance, a group of users on Facebook may also have accounts on Twitter and Instagram, and interactions between these users can occur through the different accounts, which reflects the multiple relationships between them. In dynamic multi-layer networks, the relationships between the objects vary both by the type of the relationship and with time. For example, the interactions between the same group of users on Facebook, Twitter and Instagram may change over time as well as across the different layers, and this can be represented by a dynamic multi-layer network as shown in Fig. 1.1.

Figure 1.1: An example of a dynamic multi-layer network: The network consists of 6 nodes, 3 time points and 3 layers. Solid lines represent the edges between the nodes in the same layer while dashed lines represent the edges between nodes across layers.

1.1 Graph Theory

Let G = {V, E, A} be an undirected weighted graph with a set of vertices or nodes V = {v_1, ..., v_n} and a set of edges, E, that represents the pairwise weights between the nodes. The weighted adjacency matrix, A ∈ R^{n×n}, is symmetric and its elements represent the similarity between each pair of nodes. For binary graphs, A_{ij} ∈ {0, 1}, and for weighted graphs, A_{ij} ∈ [0, 1]. The degree of a vertex v_i ∈ V is defined as $d_i = \sum_{j=1}^{n} A_{ij}$, and the degree matrix, D, is defined as the diagonal matrix with {d_1, ..., d_n} on its diagonal [7, 8].

For a dynamic network with M time points, a set of weighted and undirected graphs is constructed to represent the network over time as {G(t_m)}, where t_m ∈ {t_1, t_2, ..., t_M}. The graph at each time point is represented by the adjacency matrix A(t_m) ∈ R^{n×n}.

1.2 Community Detection in Static Networks

Early work in community detection has focused on static networks. Different clustering approaches have been proposed for community detection, including hierarchical clustering, partitional clustering [9] [7] and modularity optimization approaches [10, 11]. In hierarchical clustering approaches, a hierarchical structure is constructed using a distance matrix and visualized through a dendrogram. Hierarchical clustering is usually implemented by agglomerative or divisive algorithms. In agglomerative algorithms, clustering starts with individual nodes as separate clusters; then, at each step, the closest pairs of clusters are merged until all of the nodes form one big cluster. On the other hand, divisive algorithms start with all nodes as one cluster; then, at each step, the clusters are divided until each cluster contains only one node [12, 13]. Under partitional algorithms, squared-error based algorithms and graph-theoretic algorithms have been introduced to assign the objects to r clusters. Squared-error based algorithms rely on optimizing a criterion function in order to produce the clusters. k-means clustering, the most popular squared-error algorithm, relies on a cost function that minimizes the within-cluster sum of squares (intra-cluster distances) and maximizes the between-cluster sum of squares (inter-cluster distances). Graph-theoretic algorithms partition the graph using a cut cost that depends on the spectral properties of the graph, i.e., the eigenvalues and eigenvectors of the adjacency matrix [7].
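As a concrete illustration of the agglomerative strategy described above, the short sketch below clusters a toy weighted graph by converting similarities into distances and cutting the resulting dendrogram into two communities. It is only a minimal example: the 6-node graph, the 1 − A distance convention and the average-linkage choice are illustrative assumptions, not part of this thesis.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy weighted adjacency: two 3-node groups with weak links between them.
blk = np.ones((3, 3)) - np.eye(3)
A = np.block([[blk, 0.1 * np.ones((3, 3))],
              [0.1 * np.ones((3, 3)), blk]])

# Agglomerative hierarchical clustering works on a distance matrix; one simple
# (purely illustrative) choice is to turn high similarity into small distance.
Dist = 1.0 - A
np.fill_diagonal(Dist, 0.0)

Z = linkage(squareform(Dist), method="average")   # merge the closest clusters step by step
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 communities
print(labels)                                     # e.g. [1 1 1 2 2 2]
```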
Under modularity optimization methods, the modularity measures the deviation of the within community edges from the expected value of the same quantity in a network with the same community partitions but random connections between the vertices [11]. Even though the aforementioned methods are limited to static networks, recent work has extended these methods to handle time varying networks. More discussion follows in Sec. 1.4.

1.2.1 Graph Cut problem and Spectral Clustering

Over the past decades, several methods have been proposed for graph partitioning. One of the simplest ways to partition a graph, G = {V, E, A}, is to solve the mincut problem. The mincut problem can be solved by choosing the partition C = {C_1, ..., C_r}, with C_l ∩ C_{l'} = ∅ for l ≠ l' and C_1 ∪ ... ∪ C_r = V, that minimizes the following objective function:

\[
\mathrm{cut}(C_1, \ldots, C_r) = \frac{1}{2} \sum_{l=1}^{r} A(C_l, \bar{C}_l), \tag{1.1}
\]

where $\bar{C}_l$ is the complement of $C_l$ and $A(C_l, \bar{C}_l) = \sum_{i \in C_l,\, j \in \bar{C}_l} A_{ij}$ for two disjoint sets $C_l, \bar{C}_l \subset V$. The mincut problem can be solved efficiently when r = 2, as discussed in [14]. However, solving this problem might result in unbalanced clusters, as it might separate one single vertex from the rest of the graph and consider it as a separate cluster. In order to overcome this problem, normalized versions of the cut function have been proposed. The two most commonly used ones are RatioCut [15] and normalized cut, Ncut [16]. In RatioCut, the size of a cluster in the graph is measured by the number of vertices, |C_l|, whereas in Ncut the size is measured by the total weight of the within-cluster edges, vol(C_l). These functions are defined as:

\[
\mathrm{RatioCut}(C_1, \ldots, C_r) = \frac{1}{2} \sum_{l=1}^{r} \frac{A(C_l, \bar{C}_l)}{|C_l|}, \tag{1.2}
\]

\[
\mathrm{Ncut}(C_1, \ldots, C_r) = \frac{1}{2} \sum_{l=1}^{r} \frac{A(C_l, \bar{C}_l)}{\mathrm{vol}(C_l)}. \tag{1.3}
\]

Minimizing both of these functions leads to balanced clusters, normalized by the number of vertices or the edge weights, respectively. However, adding the balancing conditions, |C_l| and vol(C_l), makes the cut problem NP-hard, as discussed in [17]. Spectral clustering offers an efficient solution to the relaxed versions of these problems. Relaxing the RatioCut leads to unnormalized spectral clustering and relaxing the Ncut leads to normalized spectral clustering.

Spectral clustering maps the relationship between the nodes in a graph into a lower dimensional subspace. Given a network with adjacency matrix A ∈ R^{n×n}, spectral clustering solves the following trace optimization problem [7],

\[
\min_{U \in \mathbb{R}^{n \times r}} \; \mathrm{Trace}\!\left(U^{\dagger} \Phi U\right), \quad \text{s.t. } U^{\dagger} U = I, \tag{1.4}
\]

where Φ = I − A_N is the normalized Laplacian matrix with A_N = D^{-1/2} A D^{-1/2} the normalized adjacency matrix [18, 19]. The normalized Laplacian matrix is a square, symmetric and positive semi-definite matrix. Consequently, all the eigenvalues, {λ_1, λ_2, ..., λ_n}, of Φ are real and non-negative, with 0 as the smallest eigenvalue. The solution to the optimization problem in Eq. (1.4) is found by choosing U as the matrix that contains the r eigenvectors that correspond to the smallest r eigenvalues of Φ. The community structure is then determined by applying k-means to the matrix U [7].

Since we are concerned with community detection in dynamic networks, one approach is to apply standard spectral clustering at each time point individually. However, this approach is very sensitive to instantaneous noise and produces inaccurate clustering results for noisy networks.
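The normalized spectral clustering procedure summarized by Eq. (1.4) — form the normalized Laplacian, keep the eigenvectors of its r smallest eigenvalues, then run k-means on their rows — can be sketched in a few lines of NumPy/scikit-learn. The toy two-community graph and the function name below are illustrative assumptions; this is not the implementation used in later chapters.

```python
import numpy as np
from sklearn.cluster import KMeans

def normalized_spectral_clustering(A, r, seed=0):
    """Cluster the nodes of a weighted undirected graph with adjacency A into r groups."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_N = D_inv_sqrt @ A @ D_inv_sqrt          # normalized adjacency D^{-1/2} A D^{-1/2}
    Phi = np.eye(A.shape[0]) - A_N             # normalized Laplacian of Eq. (1.4)
    eigvals, eigvecs = np.linalg.eigh(Phi)     # eigh returns ascending eigenvalues
    U = eigvecs[:, :r]                         # eigenvectors of the r smallest eigenvalues
    return KMeans(n_clusters=r, n_init=10, random_state=seed).fit_predict(U)

# Toy graph: two 3-node communities weakly connected to each other.
blk = np.ones((3, 3)) - np.eye(3)
A = np.block([[blk, 0.05 * np.ones((3, 3))],
              [0.05 * np.ones((3, 3)), blk]])
print(normalized_spectral_clustering(A, r=2))   # e.g. [0 0 0 1 1 1], up to label permutation
```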
1.2.2 Modularity Based Clustering

Modularity based clustering is one of the most popular optimization based clustering techniques. This approach relies on maximizing the modularity function across all possible partitions [11, 20]. Modularity can be defined as

\[
Q = \frac{1}{2E} \sum_{c \in C} \sum_{i,j \in V_c} \left[ A_{ij} - \frac{D_i D_j}{2E} \right], \tag{1.5}
\]

where $2E = \sum_{i,j=1}^{n} A_{ij}$ is the total edge weight, A_{ij} is the ijth entry of the adjacency matrix, D_i is the degree of node i, C = {C_1, ..., C_r} is the set of clusters and V_c is the set of nodes in the cth cluster. Modularity measures the deviation of the within-community edges in the network from the expected value of the same quantity in a network with the same community partitions but random connections between the vertices [11]. However, there are several drawbacks to modularity based clustering, such as its computational cost, the selection of the null model, its preference for bigger clusters and the possibility of converging to a sub-optimal solution [20, 21].

1.3 Background on Tensors

Tensors are multi-dimensional arrays that can be used to provide a compact representation for higher-order data and preserve the multi-linear relationships in the data. Different approaches have been developed to summarize and extract meaningful structure from higher-order data using tensors. The advantages of using tensor analysis over two-dimensional analysis for higher-order data have been discussed in great detail in [22-24].

Tensor matricization: A J-order tensor is denoted as X ∈ R^{I_1×I_2×···×I_J}, where x_{i_1,i_2,...,i_J} corresponds to the (i_1, i_2, ..., i_J)th element of the tensor X. A J-order tensor can be transformed into a matrix by reordering its elements in a process called matricization or unfolding. The mode-j matricization of a tensor X ∈ R^{I_1×I_2×···×I_J} is denoted as X_{(j)}. A tensor element (i_1, i_2, ..., i_J) is mapped to a matrix element (i_j, k), where

\[
k = 1 + \sum_{l=1,\, l \neq j}^{J} (i_l - 1) K_l \quad \text{with} \quad K_l = \prod_{q=1,\, q \neq j}^{l-1} I_q. \tag{1.6}
\]

j-mode product: The j-mode product of a tensor X ∈ R^{I_1×···×I_j×···×I_J} and a matrix U ∈ R^{K×I_j} is denoted as X ×_j U and has size I_1 ×···× I_{j-1} × K × I_{j+1} ×···× I_J, with its elements defined as

\[
(\mathcal{X} \times_j U)_{i_1 \ldots i_{j-1}\, k\, i_{j+1} \ldots i_J} = \sum_{i_j=1}^{I_j} x_{i_1 \ldots i_J}\, u_{k i_j}, \tag{1.7}
\]

which can be expressed in terms of unfolded tensors as

\[
\mathcal{C} = \mathcal{X} \times_j U \;\Leftrightarrow\; C_{(j)} = U X_{(j)}. \tag{1.8}
\]

The j-rank: The rank of the jth mode of a tensor X ∈ R^{I_1×I_2×···×I_J} is denoted as rank_j(X) and is defined as the column rank of X_{(j)}. The n-rank of X is then defined as the collection of the ranks of the mode matrices X_{(j)}, j = 1, ..., J, and can be expressed as

\[
n\text{-rank}(\mathcal{X}) = \left\{\mathrm{rank}(X_{(1)}), \mathrm{rank}(X_{(2)}), \ldots, \mathrm{rank}(X_{(J)})\right\}. \tag{1.9}
\]

Tensor factorization: Matrix factorization methods such as singular value decomposition (SVD) and principal component analysis (PCA) have been extended to tensors through parallel factor analysis (PARAFAC) and Tucker decomposition [25]. Tucker decomposition provides orthogonal subspace information along each mode of the tensor, where the tensor is decomposed into a core tensor multiplied by an orthogonal component matrix along each mode. A J-mode tensor X ∈ R^{I_1×I_2×···×I_J} can be written using the Tucker model as follows:

\[
\mathcal{X} = \mathcal{Y} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_J U^{(J)}, \tag{1.10}
\]

where Y ∈ R^{I_1×I_2×···×I_J} is the core tensor and U^{(j)} ∈ R^{I_j×I_j}, for j = 1, 2, ..., J, is the orthogonal component matrix along the jth mode. Tucker decomposition is commonly implemented through high-order SVD (HOSVD) [26] or higher order orthogonal iteration (HOOI) [27].
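A minimal NumPy sketch of mode-j matricization and the j-mode product may help make Eqs. (1.6)-(1.8) concrete. It assumes the column-major (Fortran-order) unfolding convention implied by Eq. (1.6) and uses 0-based mode indices; the tensor sizes are arbitrary placeholders.

```python
import numpy as np

def unfold(X, j):
    """Mode-j matricization X_(j): mode j becomes the rows, Eq. (1.6) column ordering."""
    return np.reshape(np.moveaxis(X, j, 0), (X.shape[j], -1), order="F")

def mode_product(X, U, j):
    """j-mode product X x_j U, Eq. (1.7): contract mode j of X with the rows of U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, j)), 0, j)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 3))    # a 3-order tensor with I1=4, I2=5, I3=3
U = rng.standard_normal((2, 5))       # U in R^{K x I2}, K=2, acting on mode j=1 (0-based)

C = mode_product(X, U, 1)             # result has size 4 x 2 x 3
# Eq. (1.8): the unfolded result equals U times the unfolded input.
assert np.allclose(unfold(C, 1), U @ unfold(X, 1))
```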
HOSVD performs SVD on the unfolded matrix X_{(j)} along each mode of the tensor independently [28]. On the other hand, HOOI performs alternating optimization to find the best projection matrices.

HOOI is an iterative algorithm used to compute a low rank approximation of a tensor. For a J-mode tensor X, let r_1, r_2, ..., r_J be a set of integers satisfying r_j ≤ I_j for j = 1, ..., J. The best rank-(r_1, r_2, ..., r_J) approximation of the tensor X is achieved by finding a set of matrices U^{(j)} ∈ R^{I_j×r_j} with orthonormal columns, for j = 1, 2, ..., J, and a core tensor Y ∈ R^{r_1×r_2×···×r_J} that satisfy the following least squares optimization problem,

\[
\min_{U^{(1)}, U^{(2)}, \ldots, U^{(J)},\, \mathcal{Y}} \; \left\| \mathcal{X} - \mathcal{Y} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_J U^{(J)} \right\|_F^2. \tag{1.11}
\]

For fixed component matrices, the optimal core tensor is given by

\[
\mathcal{Y} = \mathcal{X} \times_1 U^{(1)T} \times_2 U^{(2)T} \cdots \times_J U^{(J)T}, \tag{1.12}
\]

with U^{(j)T} U^{(j)} = I, where I ∈ R^{r_j×r_j} is the identity matrix. HOOI relies on an alternating least squares approach to solve the best rank-(r_1, r_2, ..., r_J) approximation problem, and it solves a set of restricted optimization problems successively as follows,

\[
\min_{U^{(j)}} \; \left\| \mathcal{X} - \mathcal{Y} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_J U^{(J)} \right\|_F^2, \tag{1.13}
\]

where the optimization is performed over the jth matrix U^{(j)} with the latest optimized values of the other U^{(p)}, p ≠ j, substituted in (1.11) and (1.12). A more detailed description can be found in [27].
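The HOOI iteration described above — HOSVD initialization followed by alternating SVD updates of each factor matrix and the core update of Eq. (1.12) — can be sketched as follows. This is a generic illustration of the algorithm in [27] with arbitrary tensor sizes and ranks, not the code used in this thesis.

```python
import numpy as np

def unfold(X, j):
    # Mode-j matricization (same convention as Eq. (1.6)).
    return np.reshape(np.moveaxis(X, j, 0), (X.shape[j], -1), order="F")

def multi_mode_product(X, mats, transpose=False):
    # Apply U^(j) (or its transpose) along every mode j for which a matrix is given.
    for j, U in enumerate(mats):
        if U is None:
            continue
        M = U.T if transpose else U
        X = np.moveaxis(np.tensordot(M, X, axes=(1, j)), 0, j)
    return X

def hooi(X, ranks, n_iter=10):
    """Best rank-(r1,...,rJ) Tucker approximation of X via HOOI, Eqs. (1.11)-(1.13)."""
    # HOSVD initialization: leading left singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(X, j), full_matrices=False)[0][:, :r]
         for j, r in enumerate(ranks)]
    for _ in range(n_iter):
        for j, r in enumerate(ranks):
            # Project X on all modes except j, then update U^(j) from the j-unfolding.
            others = [Uk if k != j else None for k, Uk in enumerate(U)]
            W = multi_mode_product(X, others, transpose=True)
            U[j] = np.linalg.svd(unfold(W, j), full_matrices=False)[0][:, :r]
    core = multi_mode_product(X, U, transpose=True)      # core update, Eq. (1.12)
    return core, U

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 8, 6))
core, U = hooi(X, ranks=(3, 3, 2))
X_hat = multi_mode_product(core, U)                      # Y x_1 U^(1) ... x_J U^(J)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))     # relative approximation error
```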
1.4 Community Detection in Dynamic Networks

The community structure of most real world systems varies across time. For instance, in social networks, the community structure can change with time as people leave or join different communities [29, 30]. As a result, evolutionary clustering approaches have been implemented to detect the community structure in this type of network. Evolutionary clustering provides a framework to cluster the network at each time point such that it preserves the network structure at the current time and the cluster assignments change smoothly across time.

Over the past decade, different evolutionary clustering approaches have been developed to detect the community structure of dynamic networks. These approaches include simple extensions of static clustering methods [31, 32], statistical modeling based methods [33], tensor based methods [34] and modularity based methods [35].

Extensions of static clustering methods such as k-means and agglomerative hierarchical clustering have been developed in [31] for dynamic networks. A cost function that considers both the cluster membership at the current time and the deviation from the previous time point was introduced. In k-means evolutionary clustering, this deviation is quantified through a distance measure between the centroids of clusters from consecutive time points. However, this distance measure is sensitive to variation in the historic centroids, since it depends on matching the current centroid with its peer from recent history. Similarly, in evolutionary hierarchical clustering, the merging step of the tree nodes at the current time is affected by the historic community assignment. In [32], the k-means evolutionary clustering approach was extended using two spectral clustering based frameworks: preserving cluster quality (PCQ) and preserving cluster membership (PCM). In both frameworks, the clustering depends on a cost function to guarantee temporal smoothness. In PCQ, the temporal smoothness is defined as the cost of applying the current clustering results to the previous data point; consequently, the current clustering results are rejected if they disagree with recent history. On the other hand, PCM considers the projection distance between the current and previous clustering results, which is not a proper metric [36], to measure the difference between the current and historic clusterings. Moreover, the suggested temporal cost function requires a priori knowledge about the changes in the community structure and depends on the choice of a changing parameter.

Under statistical modeling based methods, the authors in [33] assumed a statistical model for the evolution of the adjacency matrix and introduced an adaptive forgetting factor for evolutionary clustering and tracking (AFFECT). In AFFECT, static clustering methods were performed after smoothing the proximity between objects over time. However, AFFECT depends on a stochastic block model for the adjacency matrix and has higher computational complexity compared to static clustering due to the estimation of the forgetting factor. Under modularity based methods, a statistical method based on greedy modularity optimization has been proposed in [35] to identify the changing modular organization in temporal multiplex networks. Yet this approach has some drawbacks, as it depends on modularity optimization, as discussed in [20]. Under tensor based methods, a non-negative tensor factorization approach was introduced in [34]. In this approach, the community structure of the temporal network is detected using PARAFAC, where all the successive adjacency matrices in the temporal network are represented by a single 3-way tensor. However, this representation of the temporal network may not reflect the community structure of the network at different time points.

1.5 Community Detection in Dynamic Multi-layer Networks

Most real world complex systems cannot be simply represented by static networks. In many disciplines, the systems are represented by dynamic networks [5] that evolve over time, and the community structure has to be detected across time. As a result, different evolutionary clustering approaches have been introduced, as discussed in Section 1.4. Moreover, many of these systems can be comprised of multiple types of relationships and can be modeled as multi-relational or multi-layer networks [6]. In multi-layer networks, the entities or nodes in the network exhibit different types of relationships, i.e., within-layer and across-layer relationships, and these relationships should be taken into account to understand the correct underlying structure of these networks [37]. For instance, social networks can be modeled as multi-layer networks, where the nodes correspond to the same group of individuals across all layers, while the edges within each layer correspond to intra-layer relationships and the edges across layers correspond to inter-layer relationships [37-39]. Another example is the air transportation network [40], where the nodes correspond to the airports and the edges correspond to the direct flights between them. Each layer then corresponds to a different airline. More recently, the need to combine the concepts of dynamic networks and multi-layer networks has increased, resulting in a new representation for complex systems as dynamic multi-layer networks. Dynamic multi-layer networks provide a complete representation for systems with various types of relationships that evolve across time.
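For computation, a single time point of such a multi-layer network is often flattened into a supra-adjacency matrix whose diagonal blocks hold the intra-layer edges and whose off-diagonal blocks hold the inter-layer edges; sets of such matrices are what the temporal framework of Chapter 5 later feeds to spectral clustering. The sketch below assembles one such matrix from random placeholder weights; the sizes and the random inter-layer couplings are purely illustrative, not the data-driven construction proposed in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 5, 3                        # nodes per layer and number of layers (illustrative)

def random_sym(n, rng):
    W = rng.uniform(0, 1, (n, n))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    return W

# One time point of a multi-layer network: an intra-layer adjacency per layer and
# an inter-layer adjacency for every ordered pair of distinct layers.
intra = [random_sym(n, rng) for _ in range(L)]
inter = {(a, b): rng.uniform(0, 1, (n, n)) for a in range(L) for b in range(L) if a != b}

# Supra-adjacency matrix: layers stacked into an (nL x nL) block matrix whose diagonal
# blocks are the intra-layer edges and whose off-diagonal blocks are inter-layer edges.
S = np.zeros((n * L, n * L))
for a in range(L):
    S[a*n:(a+1)*n, a*n:(a+1)*n] = intra[a]
    for b in range(L):
        if a != b:
            S[a*n:(a+1)*n, b*n:(b+1)*n] = inter[(a, b)]
S = (S + S.T) / 2                  # keep the supra-adjacency symmetric
```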
The aforementioned social network example can be remodeled as a dynamic multi-layer network when considering the changes in the relationships between individuals across time.

Over the past years, few methods have been developed to detect the community structure in dynamic multi-layer networks. In [35], the authors introduced a general framework that encompasses time-evolving, multi-scale and multi-layer networks to determine the community structure in multi-slice networks. These networks include a set of adjacency matrices that represent the variation in connectivity between nodes across time, the variation in the types of relationships across layers, or the variation of the same network across multiple scales. The framework relies on a generalization of the modularity optimization problem for detecting the community structure of a single network. Since this framework depends on modularity, it has multiple drawbacks, including its computational cost, null model selection and the possibility of convergence to a non-optimal solution, as discussed in [20]. Moreover, this approach only considers the inter-layer edges between each node and itself and ignores the inter-layer edges between different nodes across layers. Another evolutionary clustering approach to mine and track dynamic multi-layer networks is introduced in [41]. The proposed framework relies on a multi-objective optimization representation and the solution is obtained by a genetic algorithm. This approach has a high computational cost and is impractical for detecting the community structure of large networks. Furthermore, this approach does not consider any inter-layer edges between nodes across layers.

In this thesis, we address the problem of community detection in dynamic multi-layer networks. In particular, we focus on analyzing brain functional connectivity by modeling it as a dynamic network across multiple subjects. This representation allows a comprehensive understanding of the dynamic behavior of brain functional connectivity and its variation across subjects.

1.6 Dynamic Functional Connectivity Networks of the Brain

With the advancement of neuroimaging technology, it is now possible to collect neuronal data from the human brain across different modalities, spatial and temporal resolutions, such as electroencephalography (EEG) and fMRI data [42-44]. One of the major ways of analyzing this multidimensional neurophysiological data has been to construct functional connectivity networks of the brain.
This was accomplished by investigating the functional connectivity of the brain networks from fMRI during resting state. FC between dif- ferent brain regions has been quantified through the computation of Pearson’s correlation between regions of interest (ROIs) [54–56]. Increased interest in studying resting state fMRI (rs-fMRI) has shown that the different brain regions of the motor cortex are active even in the absence of a motor task [57]. Many of the current studies on rs-fMRI reported a set of networks that appear consis- tently in healthy subjects such as default mode network, visual network, somatomotor network, executive control network and auditory network [58, 59]. Early work on FCN analysis relied on static network representation. However, recent studies revealed evidence for dynamic nature of FCNs [51–53], [60–62] Different methods have been developed to study the dynamics of the functional connectivity networks (dFCNs). These methods include clustering based methods [55], [63–67], statistical state modeling methods [68, 69] and subspace analysis based methods [56], [70–72], [73]. 14 Under clustering based methods, different algorithms have been developed assuming that the network is at one distinct state at each time point, and the states reoccur across time. In [55], an approach based on k-means clustering of dFCNs across time and subjects is introduced to identify ”FC states” from rs-fMRI data. Similarly, the authors in [63–66] used k-means clustering to ob- tain ”synchrostates” from a time-frequency dependent phase difference data. Another clustering based approach is introduced in [67], where a codebook was obtained to facilitate the description of the ”FCµstate” through Neural Gas algorithm. Under statistical state modeling, an approach to identify the network states through hierarchical clustering followed by a Hidden Markov Model (HMM) is introduced in [68]. Similarly, the authors in [69] obtained the network states and the transitions between them through independent vector analysis and Markov modeling. Under sub- space modeling methods, an approach based on principal component analysis (PCA) is introduced in [56], where it is assumed that the network states are made up of multiple building blocks, or eigenconnectivities. A similar PCA based approach is adopted in [70] using EEG data to recon- struct the principal resting state dynamics. Most of the current work focuses on extracting FC states and reduces the dimension of the dFCNs by vectorizing the connectivity matrices before extracting the FC states. This procedure may result in the loss of the topological structure of the FCN. These approaches also reduce the dFCNs into a few common basis functions or states without explic- itly identifying the topological organization within each state. Moreover, they do not necessarily identify the change points between these states. More recently, approaches that can first detect the change points in dFCN connectivity patterns and then summarize the network structure have been introduced. These include greedy algorithms using graphical approaches [74], Bayesian network modeling based models [75] and tensor based approaches [76], [73]. 15 1.7 Organization and Contributions of this Thesis In this thesis, we present novel approaches to both track and detect the community structure in dynamic multi-layer networks. 
A particular focus of this thesis is detecting and tracking the com- munity structure in temporal social networks and dynamic multi-layer functional connectivity net- works (dFCNs) of the brain. In Chapter 2, we introduce a low-rank + sparse estimation based evolutionary spectral clus- tering approach to detect and track the community structure in temporal networks. The proposed method decomposes the network into low-rank and sparse parts and obtains smooth cluster assign- ments by minimizing the subspace distance between consecutive time points. The method tries to obtain smooth cluster assignments by minimizing the subspace chordal distance between con- secutive time points. Furthermore, the proposed algorithm explicitly considers the changes in the dimensions of the consecutive subspaces over time by modifying the chordal distance. In addition, the introduced framework is robust to noise and outliers and can detect the community structure in both binary and weighted temporal networks efficiently. Therefore, unlike existing evolutionary clustering methods, the proposed algorithm does not require any prior knowledge or make assump- tions about the community structure and does not assume any particular model for the network. The proposed approach is evaluated by applying it to multiple simulated and real social temporal networks. In Chapter 3, we propose a tensor based approach using 3-way and 4-way tensors to detect the community structure of dynamic single-layer and multi-layer networks, respectively. In the dynamic single-layer network case, two approaches, windowed tensor approach (3D-WTA) and running time tensor approach (RTTA) are proposed. In 3D-WTA, a fixed length widow is used to construct the tensors that represent the network at each time point, while in RTTA, a variable 16 length window is used to construct the tensors. The proposed framework takes the history of the networks over time into account and optimizes the contribution of each time point in the network. Moreover, a normalized cost function is introduced to track the changes in the community structure over time. The proposed approach is applied to synthetic and real networks including social net- works and dFCNs. In particular, it is applied to EEG data collected during a study of error-related negativity (ERN) in order to understand the dynamic behavior of the FCN during cognitive con- trol. For dynamic multi-layer networks, 3D-WTA is extended for temporal multi-layer networks, namely 4D-WTA, to identify and track the community structure across time and layers, simultane- ously. The proposed algorithm is applied to study the temporal evolution of communities in FCNs constructed across different regions of interests (ROIs) in the brain from rs-fMRI. In 4D-WTA, the contribution of each subject and time point within the framework is optimized. Moreover, the dFCNs are summarized into a set of FC states defined through their community structure that repeat over time and subjects. The detected community structures are evaluated through a con- sistency measure and the dynamic behavior of the FCNs is compared to the results from prior work. In Chapter 4, we address the issue of community detection in dynamic networks by considering the ability to reduce the network and aggregate the time points that are topologically similar to achieve more compact and efficient community detection. The proposed approach consists of two steps. 
The first step relies on an information-theoretic function to aggregate the dynamic networks and identify the time points that possess similar topological structure in order to combine them into a tensor. In the second step, a tensor based spectral clustering approach similar to the one proposed in Chapter 3 is applied to identify the community structure of the constructed tensors. The detected community structure is summarized as a set of connectivity patterns or states that reoccur across time. A particular application of this approach is the analysis of resting state FCNs of the brain from fMRI signals. A set of FC states is extracted from the FCNs and evaluated by comparing it to the findings of previous studies.

In Chapter 5, we propose a temporal block spectral clustering framework to detect and track the community structure of multi-layer temporal networks. The main contribution of the proposed approach is the construction of data-driven within- and across-layer adjacency matrices. The proposed approach considers all possible edges between nodes within and across layers to construct intra- and inter-layer adjacency matrices, unlike existing methods that ignore inter-layer edges or consider only the inter-layer edges between a node and itself, i.e., multiplex networks. In particular, a sliding window correlation approach is used to model the pairwise temporal relationships between all nodes within and across time windows. A set of intra- and inter-adjacency matrices is constructed and combined to create a set of temporal supra-adjacency matrices. The community structure is then determined by applying spectral clustering to these supra-adjacency matrices. The proposed algorithm is evaluated on simulated networks and on dFCNs from rs-fMRI across time and subjects and compared to results reported in previous studies.

Finally, Chapter 6 summarizes the contributions of this thesis and suggests future work to improve the current approaches.

Chapter 2

Detecting and Tracking Community Structure in Temporal Networks: A Low-Rank + Sparse Estimation Based Evolutionary Clustering Approach

2.1 Introduction

Many real world systems and relational data can be modeled as networks or graphs, where the system's objects and the interactions between them are represented by the vertices and edges of a graph, respectively. With the availability of large amounts of network data, it is important to be able to reduce the network's dimensionality and extract useful information from it. A key approach to network data reduction is community detection [77]. Early work in graph based community detection methods has focused on static networks [2] [7]. This type of network is usually considered an oversimplification, as many real world complex systems exhibit variation in their community structure over time. Consequently, there is a growing need to develop algorithms that detect the community structure and track its evolution across time [5].

Over the past decade, various methods have been proposed to detect the community structure in temporal networks. Evolutionary clustering approaches provide a framework to cluster the network over time, such that the network structure at each time point is preserved and the cluster assignments change smoothly across time. These approaches include extensions of static clustering methods [31] [32], statistical modeling based methods [33], tensor based methods [34] [78], modularity based methods [35] and multi-objective optimization genetic algorithms [79].
Extensions of static clustering methods to temporal networks include k-means and agglomerative hierarchical clustering approaches [31]. In [32], k-means evolutionary clustering approach was extended to two spectral evolutionary clustering based frameworks: preserving cluster quality (PCQ) and pre- serving cluster membership (PCM). The clustering in both frameworks depends on a cost function to guarantee temporal smoothness. However, this cost function requires a priori knowledge about the community structure of the underlying network. For statistical modeling algorithms, a statisti- cal model for the evolution of the adjacency matrix was assumed [33] and an adaptive forgetting factor for evolutionary clustering and tracking (AFFECT) was introduced. In AFFECT, the prox- imity between objects was smoothed over time followed by a static clustering method. AFFECT assumes a stochastic block model for the adjacency matrix and has higher computational complex- ity compared to static clustering due to the estimation of the forgetting factor. Under tensor based methods, non-negative PARAFAC tensor factorization community detection approaches were in- troduced in [34] and [78] where the temporal network is represented by a single 3-way tensor. In [78], PARAFAC decomposition and k-means are used to learn the network generative models and detect the clusters, respectively. However, this framework requires a priori knowledge about the ground truth clusters in the temporal network, including the number of the clusters, the nodes that belong to each cluster and the presence of the clusters across time, to determine the significant 20 clusters and evaluate them. Modularity based methods are based on greedy modularity optimiza- tion to identify the changing modular organization [35]. This approach has some drawbacks such as computational cost, null model selection, preference for bigger clusters and the possibility to converge to a sub-optimal solution as discussed in [20]. Multi-objective optimization genetic al- gorithms consider a multi-objective approach (DYNMOGA) [79] that consist of two objectives: Snapshot cost and temporal cost. The snapshot cost relies on computing four quality measures, including modularity, conductance, normalized cut and community score. On the other hand, the temporal cost relies on minimizing the normalized mutual information between consecutive parti- tions. However, DYNMOGA has high computational complexity that increases with the number of generations. Moreover, DYNMOGA performs well in binary networks whereas its performance declines for weighted networks, especially when the networks are fully connected. Although a variety of methods have been developed to detect the community structure in tem- poral networks, there are still some problems and limitations associated with these methods. First, some of the existing approaches are limited to binary networks [34]. Second, some of the current algorithms require a priori knowledge or make assumptions about the community structure or the adjacency matrix [33] [35] [78]. Third, some of the existing algorithms such as [79] have a high computational complexity. Finally, many of the existing algorithms are not robust to outliers which leads to the detection of inaccurate community structure. Consequently, there is a need to develop new approaches to overcome these shortcomings. 
In this chapter, a low-rank + sparse estimation based evolutionary spectral clustering approach is proposed to detect and track the community structure in temporal weighted and binary networks. The major advantages of the proposed approach over existing methods are fivefold. First, the proposed method can detect and track the community structure in both weighted and binary temporal networks without any adjustments to the optimization problem. Second, the extraction of the low-rank part of the adjacency matrix at each time point removes the sparse noise and outliers in the network, which improves the robustness of the method. Third, minimizing the subspace distance between consecutive time points results in smooth cluster assignments across time, i.e. it preserves the community structure at the current time point and guarantees that the structure evolves smoothly across time. Fourth, a cost function defined through the subspace distance measure provides a way to track the changes in the community structure across time. Finally, the proposed approach does not require a priori knowledge or make any assumptions about the community structure, unlike some existing methods.

2.2 Background

2.2.1 Spectral Clustering

A popular approach for detecting the community structure in static networks is spectral clustering, which maps the relationship between the nodes in a graph to a lower dimensional subspace. Given a network with adjacency matrix A ∈ R^{n×n}, spectral clustering solves the following trace minimization problem [7],

min_{U ∈ R^{n×r}} Tr(U^† Φ U),  s.t.  U^† U = I_r,    (2.1)

where † is the transpose operator and Φ = I − D^{-1/2} A D^{-1/2} is the normalized Laplacian matrix with D as the diagonal degree matrix [19]. Φ is always a positive semi-definite matrix for undirected graphs. The solution to the optimization problem is the matrix U containing the r eigenvectors that correspond to the smallest r eigenvalues of Φ. The community structure is then determined by applying k-means to the matrix U [7]. The number of eigenvectors, r, corresponds to the number of clusters in the network [7].

2.2.2 Low-rank + Sparse matrix approximation

Different methods have been proposed to find a low-rank approximation of a noise corrupted matrix A ∈ R^{n1×n2}. For example, principal component analysis (PCA) finds a low-rank approximation by solving the following optimization problem [80], [81], [82]:

min_{L ∈ R^{n1×n2}} ||A − L||_F^2,  s.t.  rank(L) ≤ r,    (2.2)

where L ∈ R^{n1×n2} is the low-rank approximation of A. The solution to this problem is given in terms of the singular value decomposition (SVD) of the matrix A. Let A = Q_1 Σ Q_2^† be the SVD of A, where Q_1 ∈ R^{n1×n1} and Q_2 ∈ R^{n2×n2} are the left and right singular vector matrices, respectively, and Σ ∈ R^{n1×n2} is a diagonal matrix with the singular values on its diagonal. The rank-r matrix L is given as L = Q̃_1 Σ̃ Q̃_2^†, where Q̃_1 ∈ R^{n1×r}, Q̃_2 ∈ R^{n2×r} and Σ̃ ∈ R^{r×r} are the top-r singular vector and singular value matrices, respectively. Even though PCA is the most commonly used technique for identifying a low-rank approximation of a corrupted observation matrix, it is known that its performance declines when the corruptions are non-Gaussian. In order to address this issue, robust PCA (RPCA) has been introduced to recover the low-rank approximation from data corrupted by non-Gaussian noise [83].
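Before turning to RPCA, the PCA solution of Eq. (2.2) can be illustrated with the following short NumPy sketch (illustrative only; the function and variable names are not part of the thesis):

import numpy as np

def lowrank_pca(A, r):
    """Rank-r approximation of A from the truncated SVD, i.e. the solution of Eq. (2.2)."""
    # Full (thin) SVD: A = Q1 @ diag(s) @ Q2^T
    Q1, s, Q2t = np.linalg.svd(A, full_matrices=False)
    # Keep the top-r singular vectors/values and rebuild the low-rank matrix
    return Q1[:, :r] @ np.diag(s[:r]) @ Q2t[:r, :]

# Example: a noisy matrix whose underlying structure has rank 2
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 2))
A = B @ B.T + 0.05 * rng.standard_normal((50, 50))
L = lowrank_pca(A, r=2)
print(np.linalg.matrix_rank(L))   # prints 2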
RPCA decomposes the matrix A into a low-rank component, L, and a sparse component, S, by solving the following optimization problem:

min_{L, S ∈ R^{n1×n2}} ||L||_* + µ ||S||_1,  s.t.  L + S = A,    (2.3)

where ||L||_* is the nuclear norm of L, ||S||_1 is the l1-norm of S and µ > 0 is the regularization parameter. The unconstrained version of the problem in Eq. (2.3) is given by:

min_{L, S ∈ R^{n1×n2}} ||L||_* + µ ||S||_1 + (γ/2) ||A − L − S||_F^2,    (2.4)

which can be solved efficiently by an iterative SVD soft-thresholding algorithm [83].

2.2.3 Chordal Subspace Distance Measure

Let B be a subspace of dimension r1 in R^n, and C another subspace of dimension r2 in R^n, where r1, r2 ≤ n. The chordal distance, which defines a metric on subspaces of different dimensions, is defined as [84]:

d^2_Ch(B, C) = |r2 − r1| + Σ_{i=1}^{min{r1,r2}} (sin θ_i)^2,    (2.5)

where θ_i represents the ith principal angle between B and C, which quantifies how close the two subspaces are geometrically.

2.2.4 Variation of Information

In order to evaluate the performance of a community detection algorithm, the degree of agreement between different partitions of the same network should be determined. Throughout this chapter, the variation of information metric introduced in [85] is used for this purpose. Variation of information (VI) between two partitions C̄ = (C_1, C_2, . . . , C_r) and C̄' = (C'_1, C'_2, . . . , C'_{r'}), with r and r' clusters respectively, is defined as:

VI(C̄, C̄') = H(C̄ | C̄') + H(C̄' | C̄),    (2.6)

where H(C̄ | C̄') and H(C̄' | C̄) are conditional entropies. VI ranges between 0 ≤ VI ≤ log n, where n is the number of nodes in the network. A value of 0 means the partitions are exactly the same, and as the VI value increases, the degree of agreement between the partitions decreases.

2.3 Methods: low-rank + sparse estimation based evolutionary spectral clustering (LS-ESC)

2.3.1 Problem Formulation

Given noisy adjacency matrix observations, A(t) ∈ R^{n×n}, of a low-rank temporal network, L(t) ∈ R^{n×n}, the objective is to obtain temporally smooth community detection assignments from these noisy adjacency matrix observations. To accomplish this objective, we propose to decompose the corrupted adjacency matrix into low-rank, L(t) ∈ R^{n×n}, and sparse, S(t) ∈ R^{n×n}, components. The estimated low-rank adjacency matrix at each time point is then used to extract the corresponding subspace, i.e. cluster assignment, with the additional constraint that the communities evolve slowly over time. This results in the following optimization problem:

min_{L(t), S(t) ∈ R^{n×n}, U(t) ∈ R^{n×r}} ||L(t)||_* + µ ||S(t)||_1 + λ1 Tr(U(t)^† Φ(t) U(t)) + λ2 d^2_Ch(U(t), U(t−1)),    (2.7)
s.t.  A(t) = L(t) + S(t),  U(t)^† U(t) = I,  L(t) = L(t)^†,  L(t)_{ij} ≥ 0,  L(t)_{ii} = 0.

The different terms in the problem in Eq. (2.7) achieve the following goals:

• The first two terms represent the RPCA problem. The constraints L(t) = L(t)^†, L(t)_{ij} ≥ 0, L(t)_{ii} = 0 are added to ensure the symmetry and nonnegativity of the low-rank adjacency matrix with zeros on the diagonal.

• The third term, Tr(U(t)^† Φ(t) U(t)), with the constraint U(t)^† U(t) = I, represents the spectral clustering problem and is introduced to detect the community structure of the underlying low-rank network at time point t.
The normalized Laplacian matrix, ΦΦΦ(t), is defined in terms of the low-rank component as ΦΦΦ(t) = I− D(t)−0.5L(t)D(t)−0.5. • The last term, d2 Ch(U(t),U(t−1)), is introduced to minimize the distance between the consec- utive networks’ subspaces at time points t and t − 1. By adding this term to the optimization problem in Eq. (2.7), the community structure of the temporal network is guaranteed to be evolving smoothly over time. • µ,λ1 and λ2 > 0 are regularization parameters. 2.3.2 Distance Between Successive Subspaces In order to define the distance between consecutive subspaces, let U(t) be the matrix whose columns form a basis for the r-dimensional subspace in Rn, defined by the r eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian at time point t. Similarly, U(t−1) can be defined as the basis spanning the r1-dimensional subspace in Rn, that contains the r1 eigenvectors corre- sponding to the smallest eigenvalues of the graph Laplacian at time point (t − 1), i.e. U(t) ∈ Rn×r and U(t−1) ∈ Rn×r1. One of our main objectives is to find the subspace spanned by U(t) that is close to the subspace spanned by U(t−1), and at the same time conserves the nodes’ connectivity at time point t. Using the definition of chordal distance in Eq. (2.5) and the fact that the columns of U(t) are orthonormal, we have three different cases. In the first case, the dimensions of the subspaces corresponding to the consecutive time points are equal to each other, i.e. r = r1. In the second case, the dimension of U(t) is smaller than the dimension of U(t−1), i.e. r ≤ r1. In the third case, the dimension of U(t) is larger than the dimension 26 of U(t−1), i.e. r > r1. The chordal distance between the successive subspaces for the three cases can be simplified as follows. For the first case: Ch(U(t),U(t−1)) = d2 = i=1 (sinθi)2 (1 − (cosθi)2) r1∑ r1∑ = r1 − r1∑ = r1 − Trace(U(t)U(t)†U(t−1)U(t−1)† (cosθi)2 i=1 i=1 For the second case: ∑ ∑ i=1 r (sinθi)2 (1 − (cosθi)2) Ch(U(t),U(t−1)) = r1 − r + d2 = r1 − r + = r1 − r + r − r ∑ = r1 − r ∑ = r1 − Trace(U(t)U(t)† ˆU(t−1) ˆU(t−1)† (1 − (cosθi)2) (1 − (cosθi)2) i=1 i=1 (cid:104) ˆU(t−1) ¯U(t−1)(cid:105) , ˆU(t−1) ∈ Rn×r is the matrix that contains the first r eigenvectors where U(t−1) = of the Laplacian matrix at the previous time point, and ¯U(t−1) ∈ Rn×(r1−r) is the matrix that contains the last r1 − r eigenvectors of the Laplacian matrix at the previous time point. (2.8) (2.9) ), ), r i=1 27 For the third case: i=1 r1∑ r1∑ i=1 (sinθi)2 (1 − (cosθi)2) Ch(U(t),U(t−1)) = r − r1 + d2 = r − r1 + = r − r1 + r1 − r1∑ = r − r1∑ = r − Trace( ˆU(t) ˆU(t)†U(t−1)U(t−1)† (1 − (cosθi)2) (1 − (cosθi)2) i=1 i=1 (2.10) ), (cid:104) ˆU(t) ¯U(t)(cid:105) , ˆU(t) ∈ Rn×r1 is the matrix that contains the first r1 eigenvectors of the where U(t) = Laplacian matrix at the current time point, and ¯U(t) ∈ Rn×(r−r1) is the matrix that contains the last r− r1 eigenvectors of the Laplacian matrix at the current time point. 2.3.3 Problem Solution The solution to the optimization problem in Eq. (2.7) can be obtained by alternating direction method of multipliers (ADMM) [86] combined with proximal algorithms [87] [88] due to the non- differentiability of the nuclear and l1− norms. First, the problem in Eq. 
(2.7) is split by adding an auxiliary variable W(t) as follows: (cid:107)L(t)(cid:107)∗ + µ(cid:107)S(t)(cid:107)1 + λ1Tr(U(t)† (I − D(t)−0.5W(t)D(t)−0.5 )U(t)) (2.11) min L(t)∈Rn×n,S(t)∈Rn×n, U(t)∈Rn×r,W(t)∈Rn×n +λ2d2 ch(U(t),U(t−1)), s.t A(t) = L(t) + S(t), W(t) = L(t),U(t)†U(t) = I,W(t) = W(t)† , W (t) i j ≥ 0, W (t) ii = 0. Second, the augmented Lagrange multipliers (ALM) function is rewritten by adding additional quadratic terms to guarantee the attainment of the first two equality constraints: 28 (I − D(t)−0.5W(t)D(t)−0.5 min (cid:107)L(t)(cid:107)∗ + µ(cid:107)S(t)(cid:107)1 + λ1Tr(U(t)† L(t)∈Rn×n,S(t)∈Rn×n, U(t)∈Rn×r,W(t)∈Rn×n 1,A(t) − L(t) − S(t)(cid:105) + λ2d2 F + (cid:104)Xk 2,W(t) − L(t)(cid:105) + + i j ≥ 0, W (t) s.t U(t)†U(t) = I, W(t) = W(t)† , W (t) ch(U(t),U(t−1)) + (cid:104)Xk (cid:107)A(t) − L(t) − S(t)(cid:107)2 γ1 2 (cid:107)W(t) − L(t)(cid:107)2 F , γ2 2 ii = 0, )U(t)) (2.12) with Lagrange multipliers (dual variables) at the kth iteration defined as, Xk+1 1 = Xk 1 + γ1(A(t) − L(t)k+1 − S(t)k+1 ), Xk+1 2 = Xk 2 + γ2(W(t)k+1 − L(t)k+1 ). (2.13) (2.14) The problem in Eq. (2.12) can be solved at each time point, t, by following the steps in Algorithm 2.1 where the primal variables L(t),S(t),W(t) and U(t) are updated iteratively as follows: L(t)k+1 = prox 1 γ1+γ2 (cid:107)L(t)(cid:107)∗ (cid:18)γ1Yk 1 + γ2Yk 2 γ1 + γ2 (cid:19) , (2.15) (2.16) + Xk 1 γ1 , Yk 2 = W(t)k + where Yk 1 = A(t)−S(t)k 2 (cid:107) L−Y (cid:107)2 is the proximity operator of the convex function f where f (L) in this case is defined as f (L) = (cid:107)L(t)(cid:107)∗. Details are included in the Appendix. and prox f (Y) = argminL∈Rn×n f (L) + 1 Xk 2 γ2 F Similarly, S(t)k+1 = prox µ γ1 (cid:107)S(t)(cid:107)1 A(t) − L(t)k+1 + Xk 1 γ1 (cid:18) (cid:19) . with f (S) defined as f (S) = (cid:107)S(t)(cid:107)1 as explained in the Appendix. The update of W(t) is performed by computing the closed form solution as defined in Eq. (2.17) followed by the spectral projected gradient (SPG) [89] method as explained in the Appendix to fulfill all the constraints: W(t)k+1 = L(t)k+1 − Xk 2 γ2 + λ1 γ2 (D(t)k −0.5 ) U(t)kU(t)k† (D(t)k −0.5 ) (2.17) 29 Finally, the update of U(t) requires the solution of multiple optimization subproblems depend- ing on the dimension of the consecutive subspaces. For the first two cases defined in Section 2.3.2, U(t) is updated by solving: min λ1Tr(U(t)†ΦΦΦ(t)k+1 modU(t)), s.t U(t)†U(t) = I, (2.18) U(t) ∈Rn×r mod = ΦΦΦ(t)k+1 − λ2 λ1 where ΦΦΦ(t)k+1 ˆU(t−1) ˆU(t−1)†. For the last case, U(t) is updated by solving: min λ1Tr( ˆU(t)†ΦΦΦ(t)k+1 mod ˆU(t)) s.t λ1Tr( ¯U(t)†ΦΦΦ(t)k+1 ¯U(t)), ˆU(t) ∈Rn×r1 min ¯U(t)† ¯U(t) = I, ˆU(t)† ¯U(t) = 0, ¯U(t) ∈Rn×(r−r1) s.t ˆU(t)† ˆU(t) = I, ¯U(t)(cid:105) . (2.19) (cid:104) ˆU(t) where ΦΦΦ(t)k+1 mod = ΦΦΦ(t)k+1 − λ2 λ1 U(t−1)U(t−1)† and U(t) = The derivations are included in the Appendix. 2.3.4 Tracking the community structure over time In order to track changes in the community structure over time, a cost function is defined using the distance between consecutive subspaces as follows: CostFunction = d2 ch(U(t),U(t−1)), (2.20) where this cost represents the minimum distance achieved between consecutive subspaces. The changes in the cost function reflect the changes in the community structure. 2.3.5 Detecting the number of clusters During the application of LS-ESC, the number of clusters at each time point is determined using the asymptotical surprise (AS) metric [90]. 
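The updates of L(t) and S(t) above are proximal steps, namely singular value thresholding for the nuclear norm and entrywise soft thresholding for the l1-norm, and the smoothness term relies on the chordal distance of Eq. (2.5). A minimal NumPy/SciPy sketch of these building blocks is given below; it is illustrative only and is not the thesis implementation.

import numpy as np
from scipy.linalg import subspace_angles

def svt(Y, tau):
    """Singular value thresholding: proximal operator of tau*||.||_* (used in the L update)."""
    Q1, s, Q2t = np.linalg.svd(Y, full_matrices=False)
    return Q1 @ np.diag(np.maximum(s - tau, 0.0)) @ Q2t

def soft_threshold(Y, tau):
    """Entrywise soft thresholding: proximal operator of tau*||.||_1 (used in the S update)."""
    return np.sign(Y) * np.maximum(np.abs(Y) - tau, 0.0)

def chordal_dist_sq(U, V):
    """Squared chordal distance of Eq. (2.5) between the column spans of U and V."""
    theta = subspace_angles(U, V)                      # principal angles between the subspaces
    return abs(U.shape[1] - V.shape[1]) + np.sum(np.sin(theta) ** 2)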
During the first iterations of Algorithm 2.1, U(t)^{k+1} in Step 10 is calculated for a range of candidate numbers of eigenvectors, e.g. 2-20, each followed by k-means clustering. The correct number of clusters at time point t is then determined as the number that maximizes the AS metric [90] with respect to the underlying network and is defined as:

r : argmax_r AS(A(t), Clustering labels),    (2.21)

where Clustering labels : {C_1, C_2, . . . , C_r}. The number of clusters is then fixed for time point t until the algorithm converges. This process is repeated at each time point to update the number of clusters across time.

Algorithm 2.1 LS-ESC
Input: A(t) ∈ R^{n×n}, U(t−1) ∈ R^{n×r1}, µ, λ1, λ2, ε.
Output: L(t), S(t), U(t), Clustering labels.
1: k ← 0.
2: Initialize L(t) ← A(t), S(t) ← zeros(n, n), W(t) ← L(t).
3: γ1 ← 1, γ2 ← 1.
4: X^k_1 ← A(t) − L(t)^k − S(t)^k, X^k_2 ← W(t)^k − L(t)^k.  % Define dual variables
5: P^k_1 = ||L(t)^k||_*, P^k_2 = µ||S(t)^k||_1, P^k_3 = λ1 Tr(U(t)^{k†} Φ(t)^k U(t)^k), P^k_4 = λ2 d^2_Ch(U(t)^k, U(t−1)).  % Define primal objectives
6: while ||P^{k+1}_j − P^k_j||^2_F / ||P^k_j||^2_F > ε for j = 1, . . . , 4 and ||X^{k+1}_1 − X^k_1||^2_F / ||X^k_1||^2_F > ε and ||X^{k+1}_2 − X^k_2||^2_F / ||X^k_2||^2_F > ε do
7:   Update L(t)^{k+1} using Eq. (2.15).
8:   Update S(t)^{k+1} using Eq. (2.16).
9:   Update W(t)^{k+1} using Eq. (2.17) followed by the SPG step.
10:  Update U(t)^{k+1} by solving Eq. (2.18) or Eq. (2.19).
11:  if k = 2 then
12:    Determine the number of clusters using Eq. (2.21).
13:  end if
14:  Update X^{k+1}_1 using Eq. (2.13).
15:  Update X^{k+1}_2 using Eq. (2.14).
16:  Update P^{k+1}_1, P^{k+1}_2, P^{k+1}_3 and P^{k+1}_4 using Step 5.
17:  k ← k + 1.
18: end while
19: Apply k-means on U(t) to obtain clustering labels.

2.4 Results

In order to validate the performance of the proposed algorithm in detecting and tracking the community structure of temporal networks, the algorithm is evaluated on a set of simulated and real binary and weighted temporal networks. The performance of the proposed LS-ESC is compared to the state-of-the-art graph-based community detection algorithms including AFFECT [33], DYNMOGA [79] and generalized Louvain for multi-layer modularity maximization (GenLov) [35]. The comparison is conducted in terms of the detected number of clusters (DNOC) and the accuracy of the detected community structure across time using the VI metric.

2.4.1 Simulated temporal networks

2.4.1.1 Simulated binary temporal networks

To evaluate the performance of the proposed algorithm in detecting the community structure of binary temporal networks, the benchmark networks proposed by Girvan and Newman in [91] and implemented in [92] are adopted. In this data set, the temporal network consists of 20 time points with 128 nodes divided into 4 clusters. Each node has a fixed average degree (AD) and shares a number (z) of edges with nodes from other communities. As AD increases, the network becomes more dense, and as z decreases, the clusters become more distinct.

Data set   AD   z   NC%   µ, λ1, λ2/λ1
1          32   3   10    0.3, 0.1, 0.3
2          32   5   10    0.4, 0.1, 0.3
3          20   2   20    0.5, 0.1, 0.1
4          20   1   30    0.5, 0.1, 0.1
5          16   3   20    0.5, 0.1, 0.1
6          16   3   30    0.6, 0.1, 0.1

Table 2.1: Parameters of the simulated binary networks from the Girvan and Newman benchmark for six different data sets and the selected values of the regularization parameters.
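For illustration, a single snapshot of such a planted four-community network can be generated roughly as follows (a sketch only; the experiments use the benchmark implementation of [92], not this code, and the function name is hypothetical):

import numpy as np

def gn_snapshot(ad=16, z=3, n_nodes=128, n_comm=4, seed=0):
    """One Girvan-Newman-style snapshot: planted communities, average degree ad,
    of which z edges per node go (on average) to other communities."""
    rng = np.random.default_rng(seed)
    size = n_nodes // n_comm
    labels = np.repeat(np.arange(n_comm), size)
    p_in = (ad - z) / (size - 1)          # probability of an intra-community edge
    p_out = z / (n_nodes - size)          # probability of an inter-community edge
    same = labels[:, None] == labels[None, :]
    P = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n_nodes, n_nodes)) < P, k=1)
    A = (upper | upper.T).astype(int)     # symmetric binary adjacency, zero diagonal
    return A, labels

A, labels = gn_snapshot(ad=32, z=3)
print(A.sum(axis=1).mean())               # close to the target average degree of 32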
In order to impose nonstationarity on the community structure over time, a percentage of nodes (NC%) is moved randomly from their communities and assigned to other communities at each time step. A set of temporal networks is generated with different parameters to obtain different community structures, as summarized in Table 2.1. The regularization parameters are determined empirically and reported in Table 2.1.

The performance of the proposed approach is compared to the state-of-the-art algorithms using the VI metric in Fig. 2.1. The simulations are repeated 100 times and the VI is averaged over all simulations. As can be seen in Fig. 2.1, the proposed algorithm performs better than AFFECT and DYNMOGA in detecting the community structure over time for the different networks. The performance of DYNMOGA is found to be comparable to the proposed algorithm for data sets 1 and 4 in Fig. 2.1a and Fig. 2.1d, respectively. This is due to the fact that in data set 1, AD is high and NC% is low, i.e. the networks are densely connected and more stationary. On the other hand, in data set 4, z is low, i.e. the clusters are more distinct. However, the performance of DYNMOGA degrades as AD decreases and z and NC% increase, i.e. in cases where the network is sparse, non-stationary and less modular, such as data sets 5 and 6. It is found that AFFECT does not perform well in binary networks.

Figure 2.1: VI comparison between LS-ESC, AFFECT and DYNMOGA for simulated binary networks from the Girvan and Newman benchmark with the parameters presented in Table 2.1. LS-ESC scored the lowest VI over time compared to the other methods: (a) Data set 1; (b) Data set 2; (c) Data set 3; (d) Data set 4; (e) Data set 5; and (f) Data set 6.

2.4.1.2 Simulated weighted temporal networks

As many real-world networks are weighted, a set of weighted temporal networks is generated to evaluate the performance of the proposed approach in detecting and tracking the community structure over time. Temporal networks are generated for 60 time points with 100 nodes. Intra- and inter-cluster edges are selected from a truncated Gaussian distribution in the range of [0,1]. Different levels of sparse noise (SN%) are added to the inter-cluster edges. The parameters and the details of the community structure for these data sets are given in Table 2.2. The values of the regularization parameters are chosen empirically and reported in Table 2.2. The performance of LS-ESC and the state-of-the-art algorithms in detecting the community structure for the weighted temporal networks in Table 2.2 is compared in terms of the DNOC and VI over time in Fig. 2.2. DNOC at each time point for LS-ESC and AFFECT is determined as the number that maximizes the asymptotical surprise metric. The simulations are repeated 100 times and the cost function, DNOC and VI are averaged over all simulations. As can be seen in Fig.
2.2(a)-(c), the cost function can track the change points in the com- munity structure over time and detects changes at time points 20 and 40. Fig. 2.2(d)-(f) show that the proposed approach performs better than the other methods in detecting the correct number of clusters over time. Finally, Fig. 2.2(g)-(i) present a comparison between the different methods using the VI metric. LS-ESC outperforms the other methods in terms of detecting and tracking the correct community structure over time. Moreover, LS-ESC has better performance around the change points and detects the correct community structure faster than the other methods. The performances of AFFECT and GenLov decline as the SN% increases and the size of clusters de- creases. In particular, both methods show preference to large clusters and tend to merge small clusters together as can be seen during the time intervals (40-60), (20-40) and (20-60) in data sets 35 Figure 2.2: Comparison between LS-ESC and state of the art algorithms for the different weighted networks in Table 3.1: (a)-(c) The cost function calculated by LS-ESC, (d)-(f) The detected number of clusters (DNOC) for the simulated networks using different algorithms. LS-ESC estimated the correct number of clusters over time and the cost function reflected the correct change points; (g)-(i) Comparison of variation of information (VI) between LS-ESC, AFFECT and GenLov for the simulated networks. LS-ESC outperforms other methods and scored the lowest variation of information over time. 1, 2 and 3, respectively. The results for DYNMOGA are not presented in these figures as it does not perform well on weighted networks. 36 0204060Time Point (a) 00.511.5Cost Function0204060Time Point (d) 345DNOC0204060Time Point (g) 00.020.040.060.080.1VI0204060Time Point (b) 00.511.5Cost Function0204060Time Point (e) 22.533.544.5DNOCAFFECTGenLovLS-ESC0204060Time Point (h) 00.020.040.060.080.1VI0204060Time Point (c) 00.511.5Cost Function0204060Time Point (i) 00.020.040.060.080.1VI0204060Time Point (f) 456DNOCDataSet 3DataSet 2DataSet 1 Data Sets Time points 1-20 Time points 21-40 Time points 41-60 1 2 3 µintra,σintra, µiner,σinter Nodes in each cluster C Number of clusters SN%, µ, λ1, λ2 λ1 0.5,0.3,0.2,0.1 C1(1− 30), C2(31− 45), C3(46− 60),C4(61− 80) C5(81− 100) 0.5,0.2,0.1,0.1 0.5,0.3,0.2,0.1 C1(1− 30), C2(31− 60), C3(61− 80),C4(81− 100) C1(1− 60), C2(61− 80), C3(81− 100) 5 4 3 10%, 0.3, 0.1, 0.1 10%, 0.3, 0.1, 0.1 10%, 0.3, 0.1, 0.1 µintra,σintra, µiner,σinter Nodes in each cluster C C1(1− 30), C2(31− 60), C3(61− 80),C4(81− 100) 0.7,0.3,0.2,0.1 0.7,0.2,0.3,0.2 0.7,0.3,0.2,0.1 C1(1− 60), C2(61− 80), C3(81− 100) C1(1− 30), C2(31− 60), C3(61− 80),C4(81− 100) Number of clusters SN%, µ, λ1, λ2 λ1 4 3 4 20%, 0.5, 0.1, 0.1 20%, 0.5, 0.1, 0.1 20%, 0.5, 0.1, 0.1 µintra,σintra, µiner,σinter Nodes in each cluster C C1(1− 30), C2(31− 60), C3(61− 80),C4(81− 100) 0.8,0.2,0.3,0.2 0.6,0.1,0.2,0.1 0.8,0.2,0.3,0.2 C1(1− 30), C2(31− 45), C3(46− 60), C4(61− 80) C5(81− 100) C1(1− 15), C2(16− 30), C3(31− 45),C4(46− 60) C5(61− 80), C6(81− 100) Number of clusters SN%, µ, λ1, λ2 λ1 4 5 6 30%, 0.25, 0.1, 0.1 30%, 0.2, 0.1, 0.1, 30%, 0.2, 0.1, 0.1 Table 2.2: Parameters of the truncated Gaussian distribution used to generate the edges of the simulated weighted temporal networks for three different data sets, the ground truth of the nodes’ community membership for the generated networks and the selected values of the regularization parameters. 
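For illustration, one time step of such a weighted network can be generated roughly as follows. This is a sketch only: the actual generator is the one described above and parameterized in Table 2.2; here the truncation is implemented by clipping to [0,1], and injecting sparse noise by replacing a random fraction SN% of the inter-cluster entries is an assumption made for this sketch.

import numpy as np

def weighted_snapshot(sizes, mu_intra, sd_intra, mu_inter, sd_inter, sn=0.1, seed=0):
    """One weighted snapshot: block-structured edge weights drawn from Gaussians clipped
    to [0,1], with a fraction sn of inter-cluster entries replaced by sparse noise."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    intra = rng.normal(mu_intra, sd_intra, (n, n))
    inter = rng.normal(mu_inter, sd_inter, (n, n))
    A = np.where(labels[:, None] == labels[None, :], intra, inter)
    # sparse outliers on a random fraction sn of the inter-cluster entries
    noise_mask = (labels[:, None] != labels[None, :]) & (rng.random((n, n)) < sn)
    A[noise_mask] = rng.random(noise_mask.sum())
    A = np.clip((A + A.T) / 2.0, 0.0, 1.0)    # symmetrize and truncate to [0,1]
    np.fill_diagonal(A, 0.0)
    return A, labels

A, labels = weighted_snapshot([30, 30, 20, 20], 0.5, 0.3, 0.2, 0.1, sn=0.1)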
2.4.2 Real social network: MIT Reality Mining Dataset

This data set was collected between August 2004 and June 2005 by recording the cell phone activity of 94 students and staff at MIT [93]. This data set can be represented as a binary dynamic network across 46 time points, where each time point refers to a week during the school year. Two dominant clusters are assumed at each time point. The LS-ESC approach is applied to this data set to detect the community structure across time. The cost function is used to track the changes in the community structure over time, as shown in Fig. 2.3. The changes in the cost function coincide with six important dates during the school year, namely the beginning and end of the Fall and Spring semesters, Thanksgiving and spring breaks. The detected community structures for selected time points are shown in Fig. 2.4. These time points are selected to present the detected community structures at the beginning of the Fall semester, Thanksgiving break, 2 weeks after the beginning of the Spring semester and the end of the Spring semester. The black frames represent the expected ground truth and the different colors represent the communities detected by LS-ESC. As can be seen from Fig. 2.4a, small clusters are detected at the beginning of the Fall semester, indicating fewer interactions among students during this time. On the other hand, bigger clusters are detected during Thanksgiving break and the Spring semester, indicating more interactions between participants during these times, as shown in Fig. 2.4b and Fig. 2.4c. Finally, two big clusters are detected at the end of the Spring semester, as shown in Fig. 2.4d. However, these clusters are less dense than the ones detected during the semester, which indicates that the interactions between participants decrease toward the end of the semester.

Figure 2.3: Cost function calculated by LS-ESC for the MIT Reality Mining network. The changes in the cost function coincide with six important dates during the school year (red dashed lines).

2.4.3 Real social network: Primary school network

In order to validate the performance of the proposed approach in detecting and tracking the community structure in real weighted temporal networks, a high-resolution data set that describes a temporal social network in a primary school is used [94]. This data set studies the relation between contact patterns among children in a primary school and the propagation of diseases [95].

2.4.3.1 Primary school data set description and temporal network construction

This data set is collected by the SocioPatterns collaboration (http://www.sociopatterns.org) during two consecutive days in October 2009: Thursday, October 1st from 8:45 am to 5:20 pm and Friday, October 2nd from 8:30 am to 5:05 pm. The data set is collected using wearable sensors that sense face-to-face proximity interactions (about 1 to 1.5 meters) between individuals. The sensing system has a temporal resolution of 20 seconds and the proximity interactions are detected over consecutive 20-second-long time intervals. The school consists of five grades, where each grade is divided into two sections. The total number of participants in the data set is 242, including 232 students and 10 teachers. Details about the students in the different grades are presented in Table 2.3.
Details about the students in the different grades are presented in Table 39 051015202530354045Weeks00.20.40.60.811.21.41.61.8Cost functionCost function calculated by LS-ESC for reality mining data setSpring term beginsThanksgivingSpring term endsFall term endsSpring breakFall term begins (a) (c) (b) (d) Figure 2.4: Detected community structure of the MIT reality mining temporal network at different times: (a) Week 5; (b) Week 15; (c) Week 27; and (d) Week 39. 40 Detected community structure by LS-ESC during week 5102030405060708090102030405060708090Detected community structure by LS-ESC during week 15102030405060708090102030405060708090Detected community structure by LS-ESC during week 27102030405060708090102030405060708090Detected community structure by LS-ESC during week 39102030405060708090102030405060708090 Class name 1A 1B 2A 2B 3A 3B 4A 4B 5A 5B Teachers Number of partici- pants 23 25 23 26 23 22 21 23 22 24 10 Table 2.3: Different classes and number of teachers and students participating in the primary school data from each class. 2.3. The school day runs from 8:30 am 4:30 pm, with lunch break from 12:00 pm to 2:00 pm and two breaks of 20− 25 minutes around 10:30 am and 3:30 pm. Only two or three classes have breaks at the same time. Lunch is served in a common canteeen in two consecutive time periods and a common playground located outside the building is used during the other two breaks. A more detailed description of the data set can be found in [94]. In order to analyze the primary school data set, two temporal networks are constructed to present the temporal interactions between students and teachers for two consecutive days. In each network, the raw sensor data is averaged over time intervals of approximately 13 minutes, i.e. each time step of the constructed temporal network represents the average interactions between individuals over 13 minutes. Each time step is represented by an undirected weighted adjacency matrix A(t) ∈ Rn×n with n = 242 nodes. This results in two temporal networks with T = 40 time points representing each day. An important advantage of this network is that since it reflects the interactions in a primary school, a set of communities are expected to represent the different classes in the school. Moreover, the classes, lunch and break times are known which provides an expected ground truth about the community structure and how it changes over time. For instance, face-to-face interactions are expected to be higher between children in the same class during class times while the interactions between children from different classes are expected to increase during lunch or break times. 41 2.4.3.2 Detecting and tracking the community structure in the primary school temporal network (PSTN) The proposed LS-ESC approach is applied to PSTN for both days to detect and track the commu- nity structure over time. In both days, the detected number of communities in the network varies between 4 and 15 communities. The proposed cost function is calculated for both days and presented in Fig. 2.5(a) and (b). As it can be seen in Fig. 2.5, both cost functions follow a similar pattern during the day where the changes in the cost function reflect the changes in the community structure corresponding to breaks and lunch times. In particular, the biggest variation in the cost functions corresponds to lunch times where face-to-face proximity interactions between students from different classes increase. 
The detected community structures at selected time points are shown in Fig.2.6 and Fig. 2.7. These time points are selected to present the detected community structures in the morning, break, lunch and end of the day. Teachers are represented by the last ten nodes in red, the black frames represent the different classes and the different colors represent the communities detected by LS- ESC. As it can be seen from Fig.2.6 and Fig. 2.7, the detected community structures in the morning and at the end of the school day, i.e. Fig. 2.6(a) and (d) and Fig. 2.7 (a) and (d), represent the different classes in the school where the face-to-face proximity interactions are high between children from the same classes. Also, teachers are members of these communities during these times. These results agree with the activity pattern of the school since these times represent class times. On the other hand, bigger clusters that consist of more than one class are detected during break and lunch times as shown in Fig. 2.6 and Fig. 2.7(b) and (c), respectively, which reflects the true activity pattern during these times. 42 2.5 Conclusions In this chapter, a new low-rank + sparse estimation based evolutionary spectral clustering approach is proposed to detect and track the community structure in temporal networks. In particular, the proposed method decomposes the network into low-rank and sparse parts and obtains smooth cluster assignments by minimizing the subspace distance between consecutive time points, simul- taneously. The estimated low-rank adjacency matrix is used for clustering and the subspaces are defined through spectral embedding. Furthermore, a cost function to track changes in the com- munity structure across time is introduced through the subspace distance measure. The algorithm is validated on multiple simulated and real weighted and binary temporal networks and compared to the existing state-of-the-art evolutionary clustering algorithms. The simulation results show that the proposed algorithm can adapt to the changing community structure and detect the correct community structure over time. The proposed approach detects and tracks the community structure in both weighted and binary temporal networks using the same optimization problem efficiently unlike some existing methods that are limited to either binary or weighted networks. Moreover, LS-ESC approach is robust to noise and outliers, non-stationarity and abrupt changes in the structure of the temporal network due to the estimation of the low-rank part of the adjacency matrix at each time point. 43 (a) (b) Figure 2.5: Cost function calculated by LS-ESC for the primary school temporal network for the: (a) first day, (b) second day. Red dashed lines refer to the time during the day. 44 024681012Cost function calculated by LS−ESCTimeCost function10:0011:0012:0002:0003:0004:0005:00Break TimeBreak Time01:00Lunch Time02468101214TimeCost functionCost function calculated by LS−ESC for the second day 10:00 11:00 05:00 04:00 03:00 02:00 12:00Break TimeLunch TimeBreak Time 01:00 Figure 2.6: Detected community structure in the first day of the primary school temporal network at different times: (a) Morning, (b) break, (c) lunch and (d) end of the school day. 
45 Detected community structure by LS−ESC at 9:11 amDetected community structure by LS−ESC at 10:55 amDetected community structure by LS−ESC at 1:05 pmDetected community structure by LS−ESC at 4:55 pm(b)(a)(c)(d)1A 1B 2A3A3B4A4B5A5BTeachers Figure 2.7: Detected community structure in the second day of the primary school temporal net- work at different times: (a) Morning, (b) break, (c) lunch and (d) end of the school day. 46 Detected community structure by LS−ESC at 9:35 amDetected community structure by LS−ESC at 10:40 amDetected community structure by LS−ESC at 1:00 pmDetected community structure by LS−ESC at 4:25 pm(a)(c)(d)(b) Chapter 3 Tensor Based Temporal and Multi-layer Community Detection 3.1 Introduction Networks are commonly used to model the relationships and interactions between people or objects such as in social, biological, and communication networks. A lot of research has been done in detecting the community structure of networks, especially for static networks [96]. However, in many applications, the community structure changes over time. In these applications, the system is represented by a dynamic network [5]. The topology of dynamic networks change over time, since nodes and edges might come and go. In other applications, the nodes of each graph might have different types of relationships that evolve with time, and this variation should be taken into account to understand the correct community structure of these networks [37]. In this case, the system is represented by a dynamic multi-layer network [41]. Over the past decade, multiple approaches have been proposed to investigate the community structure in dynamic networks. Yet, community detection algorithms for dynamic multi-layer networks are very limited. Standard spectral clustering can be used to analyze these networks at each time point or layer separately, but this approach is very sensitive to instantaneous noise. 47 In recent years, different approaches have been developed to detect the community structure of dynamic networks [31–34]. These approaches include simple extensions of static clustering meth- ods, statistical modeling based methods and tensor based methods. Extensions of static clustering methods such as k-means and agglomerative hierarchical clustering approaches were developed in [31] for dynamic networks. A cost function that considers both the cluster membership at the current time and the deviation from the previous time point was introduced. In [32], two spectral clustering based frameworks were introduced: preserving cluster quality (PCQ) and preserving cluster membership (PCM). In both frameworks, the clustering depends on a cost function to guar- antee temporal smoothness. However, this cost function requires a priori knowledge about the number of clusters and depends on the choice of a changing parameter. Under statistical modeling based methods, [33] assumed a statistical model for the evolution of the adjacency matrix and an adaptive forgetting factor for evolutionary clustering and tracking (AFFECT) was introduced. In AFFECT, static clustering methods were used after smoothing the proximity between objects over time. Under tensor based methods, a non-negative tensor factorization approach was introduced in [34]. This approach used PARAFAC to detect the community structure in temporal networks, where a single 3-way tensor was constructed to represent all the successive adjacency matrices in the temporal network. 
Over the past years, few methods have been developed to detect the community structure in dynamic multi-layer networks. In [35], the authors introduced a general framework that encom- passes time-evolving, multi-scale and multi-layer networks to determine the community structure in multi-slice networks. These networks include a set of adjacency matrices that represents the vari- ation in connectivity between nodes across time, or variation in the types of relationships across layers, or variation of the same network across multiple scales. The framework relies on a gener- alization of the modularity optimization problem for detecting the community structure of a single 48 network. Since this framework depends on modularity, it has multiple drawbacks including the computational cost, null model selection and the possibility of the convergence to a non-optimal solution as discussed in [20]. Another evolutionary clustering approach to mine and track dynamic multi-layer networks is introduced in [41]. The proposed framework relies on a multi-objective optimization representation and the solution is obtained by a genetic algorithm. This approach has a high computational cost and impractical in the case of detecting the community structure for large networks. In this chapter, we propose a tensor based approach using 3-way and 4-way tensors to detect the community structure of dynamic single-layer and dynamic multi-layer networks, respectively. In the case of dynamic networks, two approaches, 3-dimensional windowed tensor approach (3D- WTA) and running time tensor approach (RTTA) are proposed. In 3D-WTA, a fixed length sliding window is used to construct the tensors, while a variable length sliding widow is used to con- struct the tensors in RTTA. In the case of dynamic multi-layer networks, we extend 3D-WTA to 4-dimensional windowed tensor approach (4D-WTA) to optimize the contribution of the multiple layers and time points to the community structure, simultaneously. The main contributions of the proposed work are five-fold. First, a tensor based approach that extends spectral clustering to tem- poral and multi-layer networks is introduced. The proposed approach simultaneously takes the past history across time or time and multiple layers into account to obtain a common community structure for the network. Moreover, unlike simple averaging based methods, the proposed ap- proach optimizes the contribution of each layer and time point within the framework. Second, the proposed approach preserves the topological structure of the networks by using tensors to repre- sent the network across time and layers. Third, the proposed framework offers a way to track the changes in the community structure over time. In particular, a normalized cost function is intro- duced to track the changes in the community structure over time. Fourth, the proposed framework, 49 unlike AFFECT, is model-free and it tracks and identifies the dynamic community structure across time and layers without making any assumptions about the changes in the community structure like PCM and PCQ the temporal clustering (TC) framework in [78]. Fifth, the proposed work can be applied to investigate different types of networks, such as social and biological networks. An application of interest in our work is studying and analyzing the dynamic functional connectivity networks of the brain during resting state and cognitive control, with a major focus on resting state fMRI (rs-fMRI). 
In particular, the dFCNs are summarized into a set of functional connectivity patterns or states that repeat over time. For each subject, each time point in the dFCN is repre- sented by a FC state. Unlike previous work [55], [56] and [97], the FC states are defined through a common community structure. Furthermore, transition profiles between different states are ob- tained for each subject to show the variance in the dynamic brain activity across subjects during rest. A consistency measure is employed to quantify the dynamics of the different subnetworks in rs-fMRI. 3.2 Background 3.2.1 Quality Metrics to Determine the Number of Clusters Different methods and quality metrics have been utilized to determine the number of clusters and to quantify the quality of a community structure when the ground truth is unknown [98,99]. Some of the most commonly used metrics include eigengap criterion [7], [100], modularity [11] and asymptotical surprise [90]. In the eigengap criterion, the number of clusters, r, for a given normalized similarity matrix, AN, with eigenvalues λ1 ≥ λ2 . . . ≥ λn is determined by min{r : λi − λi+1 ≤ δ ,for all i > r}. The threshold δ is determined as the 0.95 quantile of the largest eigengap obtained from the eigenvalues 50 of a null model generated from random networks with the same degree distribution and weights as the original network [100]. community detection methods [30, 33, 92]. Modularity is defined as Q = 1 Another commonly used metric is modularity [11], which has been used as a quality metric in [Ai j − DiD j 2E ] Ai j is the total edge weight, Di is the degree of node i, ¯C = {C1, . . . ,Cr} is the set of clusters and Vc is the set of nodes in the cth cluster. This metric measures the deviation where 2E = n ∑ i, j=1 2E ∑ c∈ ¯C ∑ i, j∈Vc fraction of the within community edges in the network from the expected value of the same quan- tity in the network with the same community partitions but random connections between the ver- tices [11]. However, modularity has been shown to suffer from the resolution and null model selection problems [101]. The addition of a tunable parameter namely a resolution parameter γ is suggested in [102] to address this issue and the definition of the modularity is modified as Qγ = 1 2E ∑ c∈ ¯C ∑ i, j∈Vc [Ai j − γ DiD j 2E ], where Qγ is the multi-resolution modularity metric. An approach to calculate the resolution parameter is suggested in [103]. Despite the fact that the addition of γ improves the performance of modularity, it still does not provide a reliable solution with the inappropriate choice of the null model [20]. In order to overcome the limitations of modularity, asymptotical surprise is proposed in [90]. Asymptotical surprise is defined as Sa = 2E DKL(q||(cid:104)q(cid:105)), where 2E is the sum of edge weights in the adjacency matrix, q = Eint edge weights and (cid:104)q(cid:105) = pint 2E is the observed ratio of the intra-cluster edge weights to the total p is the expected ratio of intra-cluster edge weights with pint as the total sum of possible intra-cluster edge weights. DKL is the Kullback-Leibler divergence [104]. This quality metric compares the distribution of the nodes and edges in the detected communities in a certain network with respect to a null model. The community structure with the highest score is then selected. As discussed in [90], [105] and [98], this metric is not affected by the resolution limit unlike modularity. 
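As an illustration of the modularity score defined above, the following NumPy sketch (illustrative only) computes Q_γ for a given weighted adjacency matrix and partition; setting γ = 1 recovers the standard modularity:

import numpy as np

def modularity(A, labels, gamma=1.0):
    """(Multi-resolution) Newman modularity Q_gamma of a partition of a weighted graph."""
    degrees = A.sum(axis=1)
    two_e = degrees.sum()                      # 2E: total edge weight (both directions)
    same = labels[:, None] == labels[None, :]  # indicator that i and j share a cluster
    expected = gamma * np.outer(degrees, degrees) / two_e
    return ((A - expected) * same).sum() / two_e

# Example: two 3-node cliques joined by a single edge
A = np.zeros((6, 6))
A[:3, :3] = 1; A[3:, 3:] = 1; A[0, 3] = A[3, 0] = 1
np.fill_diagonal(A, 0)
print(round(modularity(A, np.array([0, 0, 0, 1, 1, 1])), 3))   # prints 0.357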
51 3.3 Methods 3.3.1 Spectral Clustering In graph theory, a common method to detect the community structure in static networks is spectral clustering, as it represents the relationship between the nodes in the graph in a lower dimensional subspace. Spectral clustering solves the following trace optimization problem [7, 8], min U∈Rm×r Trace U†ΦΦΦU , s.t U†U = I, (3.1) where, ΦΦΦ = I − AN is the normalized Laplacian matrix. AN = D− 1 2 is the normalized adjacency matrix, where AN ∈ Rn×n, n is the number of nodes and D is the diagonal degree matrix 2 AND− 1 which is defined as Dii = n ∑ j=1 ai j. Spectral clustering problem can also be written as, (cid:16) (cid:16) (cid:17) (cid:17) max U∈Rn×r Trace U†ANU , s.t U†U = I, (3.2) or since AN is positive semi-definite, (3.2) can be written as, (cid:107) U†ANU (cid:107)2 F , max U∈Rn×r s.t U†U = I. (3.3) The solution to the optimization problems in (3.1), (3.2) and (3.3) is found by choosing U as the matrix that contains the r eigenvectors that correspond to the smallest eigenvalues of ΦΦΦ or the largest eigenvalues of AN. The community structure is then determined by applying k-means to the matrix U [7]. 52 3.3.2 3D-Windowed Tensor Approach (3D-WTA) In clustering dynamic or temporal networks at time t, it is important to take the recent history of the network into account so that the resulting community structure is smooth across time. One way to accomplish this is to employ a weighted average of the adjacency matrices within a time window of length L and then optimize the conventional spectral clustering cost function over this weighted average. This results in an optimization over both the subspace vectors matrix, U(t), and the weight vector, W (t). The optimization problem in (3.3) can then be rewritten as, (cid:32) L ∑ l=1 (cid:107) U(t)† max U(t),W (t) (cid:33) w(t) l A(l) N U(t) (cid:107)2 F , ,s.t U(t)†U(t) = I, W (t) ≥ 0,(cid:107) W (t) (cid:107)2= 1, (3.4) (cid:105) = N where wl are the weights for the adjacency matrices within a time window of length L. (cid:104) In the proposed approach, 3D-WTA, we build a third order tensor, X (t) ∈ Rn×n×L A(t−L+1) matrix A(l) , . . . ,A(t) N N . A sliding fixed size tensor is constructed across time and (3.4) can be reformulated in , where the lth frontal slice of the tensor X (t) is the normalized adjacency terms of Tucker decomposition as follows, (cid:107) X (t) ×1 U(t)† ×2 U(t)† ×3 W (t)† (cid:107)2 F , s.t U(t)†U(t) = I,(cid:107) W (t) (cid:107)2= 1, (3.5) max U(t),W (t) where U(t) is the basis along the first and second modes and W (t) corresponds to the weighting vector across time. Since the networks are undirected, the components along the first two modes are the same. The optimization in (3.5) can be solved by higher-order orthogonal iteration (HOOI), where in each iteration either the matrix U(t) or the vector W (t) is fixed to optimize the other one. This approach is summarized in the pseudo code in Algorithm 3.2. In this approach, we ini- 53 tialize the eigenvectors matrix using HOSVD for the initial tensor and then update this estimate using HOOI. For the remaining time points, in order to reduce the computational cost, the eigen- vectors matrix at time point t is initialized using the estimated optimal eigenvectors matrix from the preceding time point (t − 1), U(t−1). N , . . . ,A(M) Algorithm 3.2 3D-Windowed Tensor Approach (3D-WTA) Input: A(1) Output: Clustering Labels, Change points. 1: for t = L : M do 2: N ,L. Construct a similarity tensor , . . . ,A(t) X (t) = . 
N Unfold the tensor along mode 3 to get X(t) (3). if t = L then N (cid:104) A(t−L+1) (cid:105) else Initialize U(t) Initialize U(t) 0 by HOSVD of X (t). 0 by Ut−1. 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: end if i = 0 Calculate W (t) Compute ¯A = Obtain U(t) if (cid:107) U(t) i+1 as the dominant left singular vector of X(t) (3) t ∑ l=t−L+1 (W (t) i+1)lA(l) N . i+1 by eigenvalue decomposition of ¯A. F > ε then i (cid:107)2 i+1 − U(t) i = i + 1. goto line 10. 13: 14: 15: 16: 17: 18: 19: end for 20: Track the change points using (3.11). end if Normalize the columns of U(t). Apply k−means on U(t) to obtain clustering labels. (cid:16) i ⊗ U(t) U(t) i (cid:17) . 3.3.3 Running Time Tensor Approach (RTTA) (cid:104) N ···A(t) A(1) N (cid:105) In this approach, the tensor at time point t is built by taking all history into account, where X (t) = . This approach estimates the optimal eigenspace that represents the network at time 54 t in a similar fashion to 3D-WTA. The main differences are in the tensor size and initialization of the matrix U(t). This approach uses a window size that grows with time. Consequently, the tensor constructed at each time point has a different size. Moreover, at each time point the initial eigenvectors matrix is an updated version of the previous one. This update was motivated by the fact that the tensor at time t is the same as the tensor at time (t − 1) except for one time slice. Therefore, the matrix U(t) is adaptively updated based on the new adjacency matrix. We used TrackU algorithm [106, 107] to update the eigenvectors matrix. The RTTA is described in Algorithm 3.3, and the eigenvectors matrix update is described in Algorithm 3.4. N . . .A(M) Algorithm 3.3 Running Time Tensor Approach (RTTA) Input: A(1) N , starttime, f orgetting f actor = β . Output: Clustering Labels, Change points. 1: for t = starttime : M do 2: N . . .A(t) A(1) Construct a similarity tensor X (t) = N Unfold the tensor along mode 3 to get X(t) (3). if t = starttime then (cid:104) (cid:105) . 0 by HOSVD of X (t) 1 . 0 =TrackU(U(t−1),S(t−1),A(t) N ,β ). 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: Initialize U(t) else Initialize U(t) end if i = 0 Calculate W (t) Compute ¯A = (cid:16) i ⊗ U(t) U(t) i (cid:17) . i+1 as the dominant left singular vector of X(t) (3) t ∑ l=1 (W (t) i+1)lA(l) N . i+1 by eigenvalue decomposition of ¯A as, ¯A = U(t) F > ε then i+1S(t) i+1 (cid:16) U(t) i+1 (cid:17)† . Obtain U(t) if (cid:107) U(t) i+1 and S(t) i (cid:107)2 i+1 − U(t) i = i + 1. goto Line 10. 13: 14: 15: 16: 17: 18: 19: end for 20: Track the change points using the normalized cost function given in (3.11). end if Normalize the columns of U(t). Apply k-means on U(t) to obtain clustering labels. 55 Algorithm 3.4 TrackU Input: Eigenvectors matrix: U, Eigenvalues matrix: S, Adjacency matrix at time point t: A(t) N , forgetting factor: β . Output: Updated eigenvectors matrix. 1: CA =size(AN,2), CU =size(U,2), s=diag(S). 2: for i = 1 : CA do ¯b = AN(:,i). 3: for j = 1 : CU do 4: 5: ¯b j. y j = u† j s(t) j = β s j + y2 j. Compute the error: e j = ¯b j − y ju j. u j = u j + 1 y je j. s j ¯b j+1 = ¯b j − y ju j. end for 9: 10: 11: end for 12: Orthogonalize U using Gram-Schmidt. 6: 7: 8: 3.3.4 4D-Windowed Tensor Approach (4D-WTA) In the case of four dimensional networks, where graphs are built over time and across multi-layer or multi-subjects, we want to track the changes in the community structure at time t across all layers based on the past history. 
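Before moving to the four-dimensional extension, the following NumPy sketch illustrates the alternating updates at the core of 3D-WTA. It is a simplified, illustrative version only: HOSVD initialization, the column normalization of U and the subsequent k-means step of Algorithm 3.2 are omitted, and taking the absolute value of the weight vector to resolve the SVD sign ambiguity is an added assumption.

import numpy as np

def wta_step(slices, U):
    """One alternating update: refresh the time weights w, then the subspace U.
    `slices` is a list of L normalized adjacency matrices; U is n x r with orthonormal columns."""
    # Mode-3 unfolding of the tensor: each row is one vectorized adjacency matrix.
    X3 = np.stack([A.ravel() for A in slices])            # L x n^2
    # Project onto the current subspace, then take the dominant left singular vector as w.
    M = X3 @ np.kron(U, U)                                # L x r^2
    w = np.abs(np.linalg.svd(M, full_matrices=False)[0][:, 0])
    # Weighted average of the slices; its top-r eigenvectors become the new U.
    A_bar = sum(wl * Al for wl, Al in zip(w, slices))
    vals, vecs = np.linalg.eigh(A_bar)
    r = U.shape[1]
    return vecs[:, -r:], w                                # eigenvectors of the largest eigenvalues

def wta_subspace(slices, r, n_iter=20):
    """Iterate the updates from a random orthonormal start (HOSVD initialization omitted)."""
    n = slices[0].shape[0]
    U = np.linalg.qr(np.random.default_rng(0).standard_normal((n, r)))[0]
    for _ in range(n_iter):
        U, w = wta_step(slices, U)
    return U, w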
In a similar fashion to the 3D case, we can build a weighted average of the adjacency matrices within a time window of length L across all layers and time, then optimize the conventional spectral clustering cost function over this weighted average. This results in an optimization over the subspace vectors matrix, U(t), the weighting vector across layers, W (t) s , and the weighting vector over time points within a window, W (t) w . The optimization problem in Eq. (3.3) can then be rewritten as, (cid:32) K L (cid:107) U(t)† ∑ max U(t),W (t) k=1 s s.t U(t)†U(t) = I, W (t) s ,W (t) w w(t) sk ∑ wl A(l,k) w(t) w ≥ 0, (cid:107) W (t) , W (t) l=1 N U(t) (cid:107)2 F , (cid:107)2= 1, (cid:107) W (t) w (cid:107)2= 1, (3.6) (cid:33) s where w(t) wl are the weights for the adjacency matrices within a time window of length L, w(t) sk are 56 the weights across layers and K is the total number of layers. In the proposed 4D-WTA, we build a fixed size fourth order tensor, X (t) ∈ Rn×n×L×K = slice of the tensor X (t) is the adjacency matrix A(l,k) at the time point l for subject k. (3.6) can be , where the (l,k)th , . . . ,A(t,1:K) N N N (cid:104) A(t−L+1,1:K) (cid:105) reformulated in terms of Tucker decomposition as follows, (cid:107) X (t) ×1 U(t)† ×2 U(t)† ×3 W (t) w † ×4 W (t) s † (cid:107)2 F , (3.7) max U(t),W (t) s s.t U(t)†U(t) = I, (cid:107) W (t) ,W (t) w w (cid:107)2= 1, (cid:107) W (t) s (cid:107)2= 1, where U(t) is the basis along the first and second modes, W (t) w corresponds to the weighting vector across time and W (t) s is the weighting vector across layers. The components along the first two modes are the same, since the networks are undirected. The problem in (3.7) can be solved by HOOI. This approach is summarized in the pseudo code in Algorithm 3.5. In this approach, the pro- jection matrices for the first tensor are initialized using HOSVD and then updated using HOOI. For the remaining time points, the eigenvectors matrix, U(t), and the weighting vector over time, W (t) w , are initialized from the estimated optimal eigenvectors matrix and time weighting vector at the preceding time point (t − 1), U(t−1) and W (t−1) , respectively. w 3.3.5 Tracking the Community Structure Over Time To track the changes in the community structure using the proposed approaches, we define a cost function that quantifies the quality of the cluster assignment. For 4D-WTA, this cost function is defined as (cid:107) X (t) ×1 U† ×2 U(t)† ×3 W (t) † (cid:107)F and satisfies the following inequality, † ×4 W (t) w s Cost = (cid:107) X (t) ×1 U(t)† ×2 U(t)† ×3 W (t) w (cid:107)2(cid:107) W (t) ≤ (cid:107) X (t) (cid:107)F(cid:107) U(t) (cid:107)2 2(cid:107) W (t) † ×4 W (t) (cid:107)2, w s s † (cid:107)F (3.8) 57 N N ,L. , . . . ,A(M,1:K) Algorithm 3.5 4D Windowed Tensor Approach (4D-WTA) Input: A(1,1:K) Output: Clustering Labels, Change points. 1: for t = L : M do 2: Construct a similarity tensor X (t) = Unfold the tensor along mode 3 and mode 4 to get X(t) if t = L then (cid:104) A(t−L+1,1:K) , . . . ,A(t,1:K) (cid:105) N N . Initialize U(t) Initialize U(t) 0 and W (t) 0 by U(t−1) and W (t) w,0 by HOSVD of X(t). w,0 by W (t−1) w . (3) and X(t) (4), respectively. s,i+1 as the dominant left singular vector of X(t) (4) w,i+1 as the dominant left singular vector of X(t) (3) (cid:16) (cid:16) (cid:17) i ⊗W (t) . i ⊗ U(t) w,i i (cid:17) . 
3.3.5 Tracking the Community Structure Over Time

To track the changes in the community structure using the proposed approaches, we define a cost function that quantifies the quality of the cluster assignment. For 4D-WTA, this cost function is defined as \| X^{(t)} \times_1 U^{(t)\dagger} \times_2 U^{(t)\dagger} \times_3 W_w^{(t)\dagger} \times_4 W_s^{(t)\dagger} \|_F and satisfies the following inequality,

\text{Cost} = \| X^{(t)} \times_1 U^{(t)\dagger} \times_2 U^{(t)\dagger} \times_3 W_w^{(t)\dagger} \times_4 W_s^{(t)\dagger} \|_F
\le \| X^{(t)} \|_F \, \| U^{(t)} \|_2^2 \, \| W_w^{(t)} \|_2 \, \| W_s^{(t)} \|_2,   (3.8)

which can be further simplified to,

\text{Cost} = \| X^{(t)} \times_1 U^{(t)\dagger} \times_2 U^{(t)\dagger} \times_3 W_w^{(t)\dagger} \times_4 W_s^{(t)\dagger} \|_F \le \| X^{(t)} \|_F,   (3.9)

since \| W_w^{(t)} \|_2 = 1, \| W_s^{(t)} \|_2 = 1 and \| U^{(t)} \|_2^2 = \lambda_{\max}(U^{(t)\dagger}U^{(t)}), where \lambda_{\max} is the largest eigenvalue of U^{(t)\dagger}U^{(t)}, which equals 1 since U^{(t)\dagger}U^{(t)} = I. Therefore, dividing both sides of (3.9) by \| X^{(t)} \|_F yields a normalized cost function that always lies between 0 and 1,

\text{Cost}_{4D,\text{Normalized}} = \frac{\| X^{(t)} \times_1 U^{(t)\dagger} \times_2 U^{(t)\dagger} \times_3 W_w^{(t)\dagger} \times_4 W_s^{(t)\dagger} \|_F}{\| X^{(t)} \|_F}.   (3.10)

Similarly, we define the normalized cost function used to detect the changes in the community structure with 3D-WTA and RTTA as,

\text{Cost}_{3D,\text{Normalized}} = \frac{\| X^{(t)} \times_1 U^{(t)\dagger} \times_2 U^{(t)\dagger} \times_3 W^{(t)\dagger} \|_F}{\| X^{(t)} \|_F}.   (3.11)

Changes in the normalized cost functions in Eqs. (3.10) and (3.11) reflect changes in the community structure. Both cost functions are computed based on the detected community structure at each time point and quantify the agreement between the detected community structure and the underlying network structure. The change points are then determined using a change detection algorithm [108], [109]. In this algorithm, the normalized cost function is partitioned by choosing random change points. The total deviation of all points within each partition from the empirical mean is calculated. A total residual error is then obtained by summing these deviations across all partitions. These steps are repeated by varying the locations of the change points until the residual error is minimized.

In 3D-WTA and RTTA, the number of clusters at each time point t is determined by repeating the algorithm for different numbers of clusters, usually 2-10, and the number of clusters that maximizes the asymptotical surprise metric is taken as the best representation of the underlying network. Similarly, in 4D-WTA the algorithm is repeated over a different number of clusters. The asymptotical surprise of the adjacency matrix for each layer and time point is then computed and averaged across all layers. Finally, the number of clusters that maximizes the average asymptotical surprise is selected as the number of clusters for all layers at that time point.
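The normalized cost functions in Eqs. (3.10) and (3.11) amount to projecting the tensor onto the estimated factors and dividing by its Frobenius norm. The short NumPy sketch below is ours, for illustration only, and assumes the factors have already been estimated.

```python
import numpy as np

def normalized_cost_4d(X, U, w_w, w_s):
    """Normalized cost of Eq. (3.10): project the 4-way tensor X (n x n x L x K)
    onto the subspace U and the weight vectors w_w, w_s, then divide by the
    tensor norm so the value lies in [0, 1]."""
    core = np.einsum('ijlk,ia,jb,l,k->ab', X, U, U, w_w, w_s)
    return np.linalg.norm(core) / np.linalg.norm(X)

def normalized_cost_3d(X, U, w):
    """Normalized cost of Eq. (3.11) for a 3-way tensor (n x n x L)."""
    core = np.einsum('ijl,ia,jb,l->ab', X, U, U, w)
    return np.linalg.norm(core) / np.linalg.norm(X)
```

Tracking the sequence of these values over t and feeding it to the change detection procedure described above yields the change points.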
3.4 Results

3.4.1 Simulated Temporal Networks

3.4.1.1 Description of the Data Sets

Four simulated data sets with different community structures and different noise levels are generated to validate the performance of the proposed 3D-WTA. For each data set, a dynamic network is generated for 60 time points with 100 nodes. Intra- and inter-cluster edges are selected from a truncated Gaussian distribution in the range [0,1]. The parameters and the details of the community structures for these data sets are given in Table 3.1. The SNR of each network is defined as the ratio of the total power of intra-cluster edges to the total power of inter-cluster edges and can be calculated as:

\text{SNR} = 10 \log_{10} \frac{\frac{1}{t_{M'} - t_m + 1} \sum_{t=t_m}^{t_{M'}} \sum_{i,j \in \text{intra}} \big(A_{ij}^{(t)}\big)^2}{\frac{1}{t_{M'} - t_m + 1} \sum_{t=t_m}^{t_{M'}} \sum_{i,j \in \text{inter}} \big(A_{ij}^{(t)}\big)^2},   (3.12)

where t_m and t_{M'} refer to the start and end time points in the network, respectively, and the term (t_{M'} − t_m + 1) is the total number of time points in the network.

In order to introduce variability across the different data sets, we have further varied the structure of the networks for data sets 1 and 2 during the intervals 21-40 and 1-20, respectively. For these networks and time intervals, the large cluster with 60 nodes is generated such that the nodes close to the diagonal of the cluster have stronger connections than the nodes on the off-diagonal. In particular, all the edges close to the diagonal of the cluster are randomly selected from a truncated Gaussian distribution with µintra and σintra, while 60% of the edges on the off-diagonal of the cluster are selected from a truncated Gaussian distribution with µinter and σinter and the remaining edges are selected from the distribution with µintra and σintra.
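Assuming the community assignment is fixed within the interval over which the SNR is evaluated, Eq. (3.12) can be computed along the following lines (an illustrative sketch with our own naming, not code from the thesis):

```python
import numpy as np

def snr_db(A_seq, labels):
    """SNR of a simulated dynamic network as in Eq. (3.12): ratio of the total
    power of intra-cluster edges to that of inter-cluster edges, averaged over
    time. A_seq has shape (T, n, n); labels holds the ground-truth community
    of each node, assumed constant over the T time points considered."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]          # intra-cluster mask
    off_diag = ~np.eye(labels.size, dtype=bool)
    intra = same & off_diag
    inter = ~same
    A_seq = np.asarray(A_seq, dtype=float)
    p_intra = (A_seq[:, intra] ** 2).sum(axis=1).mean()  # average intra power
    p_inter = (A_seq[:, inter] ** 2).sum(axis=1).mean()  # average inter power
    return 10.0 * np.log10(p_intra / p_inter)
```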
Table 3.1: Parameters of the truncated Gaussian distribution used to generate the edges of the simulated networks for the four data sets, and the ground-truth community membership of the nodes.
Data set 1:
  Time points 1-20: (µintra, σintra, µinter, σinter) = (0.7, 0.4, 0.4, 0.2), SNR = −1.69 dB, 4 clusters: C1(1−30), C2(31−60), C3(61−80), C4(81−100).
  Time points 21-40: (0.5, 0.3, 0.2, 0.2), SNR = 3.32 dB, 3 clusters: C1(1−60), C2(61−80), C3(81−100).
  Time points 41-60: (0.7, 0.4, 0.4, 0.2), SNR = −1.77 dB, 4 clusters: C1(1−30), C2(31−60), C3(61−80), C4(81−100).
Data set 2:
  Time points 1-20: (0.5, 0.4, 0.3, 0.2), SNR = 1.839 dB, 3 clusters: C1(1−60), C2(61−80), C3(81−100).
  Time points 21-40: (0.5, 0.2, 0.2, 0.2), SNR = −1.289 dB, 5 clusters: C1(1−20), C2(21−40), C3(41−60), C4(61−80), C5(81−100).
  Time points 41-60: (0.5, 0.4, 0.3, 0.2), SNR = −0.643 dB, 4 clusters: C1(1−40), C2(41−60), C3(61−80), C4(81−100).
Data set 3:
  Time points 1-20: (0.6, 0.4, 0.2, 0.2), SNR = 1.233 dB, 4 clusters: C1(1−30), C2(31−60), C3(61−80), C4(81−100).
  Time points 21-40: (0.4, 0.2, 0.2, 0.1), SNR = 0.1628 dB, 5 clusters: C1(1−30), C2(31−45), C3(46−60), C4(61−80), C5(81−100).
  Time points 41-60: (0.6, 0.4, 0.2, 0.2), SNR = −1.191 dB, 6 clusters: C1(1−15), C2(16−30), C3(31−45), C4(46−60), C5(61−80), C6(81−100).
Data set 4:
  Time points 1-20: (0.8, 0.5, 0.3, 0.2), SNR = 0.0339 dB, 4 clusters: C1(1−30), C2(31−60), C3(61−80), C4(81−100).
  Time points 21-40: (0.6, 0.2, 0.2, 0.3), SNR = −4.541 dB, 8 clusters: C1(1−15), C2(16−30), C3(31−45), C4(46−60), C5(61−70), C6(71−80), C7(81−90), C8(91−100).
  Time points 41-60: (0.8, 0.5, 0.3, 0.2), SNR = 0.0388 dB, 4 clusters: C1(1−30), C2(31−60), C3(61−80), C4(81−100).

Figure 3.1: Detected number of clusters (DNOC) by 3D-WTA for the data sets in Table 3.1 using different metrics, including: eigengap (EG), modularity (Q), multi-resolution modularity (Qγ) and asymptotical surprise (AS).

3.4.1.2 Comparison of different quality metrics

For 3D-WTA, a 3-mode tensor, X^(t) ∈ R^{100×100×4}, is constructed at each time point, and 3D-WTA is applied with a window length of 4. For each data set, the networks are simulated 100 times. In order to choose the best quality metric for determining the number of clusters, the different metrics discussed in Section 3.2.1 are employed. The detected number of clusters (DNOC) for the four different data sets by 3D-WTA using the eigengap criterion (EG) with δ calculated as suggested in [100], modularity (Q) [11], multi-resolution modularity (Qγ) with γ calculated as suggested in [103] and asymptotical surprise (AS) [90] are presented in Fig. 3.1.

As can be seen in Fig. 3.1, the best quality metric in determining the number of clusters over time is asymptotical surprise. With the EG criterion, the number of clusters is detected more accurately when the networks have a high SNR and large clusters with all the nodes strongly connected within these clusters. On the other hand, the modularity metric performs better than the multi-resolution modularity for the different data sets. However, both fail to detect the number of clusters correctly when the community structure is noisy with small clusters. Consequently, asymptotical surprise is selected to quantify the quality of the detected community structure and to determine the number of clusters in the rest of this chapter.

3.4.1.3 Comparison between 3D-WTA and state-of-the-art algorithms

The performance of the proposed approach is compared to other state-of-the-art community detection algorithms, including AFFECT [33], normalized cut PCM (PCM-NC), normalized cut PCQ (PCQ-NC) [32], temporal clustering (TC) [78] and generalized Louvain (GenLov) for multi-layer modularity maximization with the value of γ calculated as described in [103], using the variation of information metric. The TC framework is implemented for the different data sets with different numbers of generative models and the number that detects the best community structure is presented (5, 7, 8 and 7 are the numbers of generative models that performed best for data sets 1, 2, 3 and 4, respectively). The GenLov algorithm detected the correct community structure best with γ = 1. A range of coupling coefficients has been tested with GenLov and the best performance is achieved with coupling coefficient ω = 1 [35], [110]. The number of clusters at each time point for 3D-WTA, AFFECT, PCM-NC and PCQ-NC is determined as the number that maximizes the asymptotical surprise metric. On the other hand, the ground truth community structures are given to the TC algorithm a priori to evaluate the community structure over time.

As can be seen from Fig. 3.2 (a)-(d), the normalized cost function reflects the changes in the networks, where change points are detected around time points 20 and 40. The proposed algorithm is able to detect the correct number of clusters over time better than the other methods, as shown in Fig. 3.2 (e)-(h). The variation of information scores achieved by the different methods are given in Fig. 3.2 (i)-(l). The results indicate that the proposed algorithm outperforms the other algorithms in terms of detecting the correct community structure over time and can correct itself faster to capture the true community structure after the change points. As seen in Fig. 3.2 (e)-(l), the performance of AFFECT, PCM, PCQ and GenLov declines with increasing noise level and decreasing cluster size.
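Since the variation of information (VI) is the main figure of merit in these comparisons, a small self-contained sketch of the metric is included below for reference; it uses the natural logarithm, and the values plotted in the figures may additionally be normalized, which is not shown here.

```python
import numpy as np

def variation_of_information(labels_a, labels_b):
    """Variation of information between two partitions of the same node set.
    Lower is better; 0 means the partitions are identical."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = labels_a.size
    vi = 0.0
    for a in np.unique(labels_a):
        in_a = labels_a == a
        p_a = in_a.sum() / n
        for b in np.unique(labels_b):
            n_ab = np.logical_and(in_a, labels_b == b).sum()
            if n_ab == 0:
                continue
            p_ab = n_ab / n
            p_b = (labels_b == b).sum() / n
            # VI = H(A|B) + H(B|A), accumulated over the joint distribution
            vi -= p_ab * (np.log(p_ab / p_a) + np.log(p_ab / p_b))
    return vi
```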
3.4.1.4 Comparison between 3D-WTA and 4D-WTA

In order to compare 3D-WTA with 4D-WTA, a simulated dynamic network for 60 time points with 64 nodes across 20 layers is generated. The edges of the graphs of the simulated dynamic network are taken from a truncated Gaussian distribution in the range [0,1], with the specifications shown in Table 3.2. For evaluating 3D-WTA, the networks at each time point are averaged across layers. A 3-mode tensor, X^(t) ∈ R^{64×64×4}, is constructed at each time point, with a window length of 4. Moreover, 4D-WTA is applied to the same dynamic network, where a 4-mode tensor, X^(t) ∈ R^{64×64×4×20}, is constructed at each time point, with a window length of 4. This experiment is repeated 100 times. A comparison between the community structure detected by 3D-WTA and 4D-WTA and the network's ground truth in terms of variation of information is shown in Fig. 3.3. The variation of information is calculated by comparing the clustering results with the ground truth for each layer separately and then averaged across all layers. The results show that 4D-WTA performs better than 3D-WTA in detecting the community structure over time. This indicates that 4D-WTA captures the true community structure better than 3D-WTA since it captures the variations across layers, whereas 3D-WTA loses this variability due to averaging the networks across layers.

Figure 3.2: Comparison between 3D-WTA and state-of-the-art algorithms for the different data sets in Table 3.1: (a)-(d) The normalized cost function calculated by 3D-WTA, (e)-(h) The detected number of clusters (DNOC) for the simulated networks using different algorithms. 3D-WTA estimated the correct number of clusters over time and the normalized cost function reflected the correct change points, represented by the pink dashed lines. (i)-(l) Comparison of variation of information (VI) between 3D-WTA, AFFECT, PCM-NC, PCQ-NC, TC and GenLov for the simulated networks. 3D-WTA scored the lowest variation of information over time compared to the other methods.

Figure 3.3: Variation of information (VI) comparison between the community structure detected by 3D-WTA and 4D-WTA and the ground truth for simulated data.
Table 3.2: Parameters of the truncated Gaussian distribution used to generate the edges of the simulated networks for the 4D simulated data, and the ground truth of the nodes' community membership for the generated network.
Layers 1−5:
  Time points 1-20: (µintra, σintra, µinter, σinter) = (0.2, 0.1, 0.1, 0.1), clusters: C1(1−22), C2(23−44), C3(45−64).
  Time points 21-40: (0.2, 0.2, 0.1, 0.1), clusters: C1(1−22), C2(23−64).
  Time points 41-60: (0.2, 0.2, 0.1, 0.1), clusters: C1(1−22), C2(23−44), C3(45−64).
Layers 6−15:
  Time points 1-20: (0.4, 0.1, 0.1, 0.1), clusters: C1(1−22), C2(23−44), C3(45−64).
  Time points 21-40: (0.2, 0.1, 0.1, 0.1), clusters: C1(1−22), C2(23−64).
  Time points 41-60: (0.1, 0.1, 0.1, 0.1), clusters: C1(1−22), C2(23−44), C3(45−64).
Layers 16−20:
  Time points 1-20: (0.2, 0.1, 0.1, 0.1), clusters: C1(1−25), C2(26−49), C3(50−64).
  Time points 21-40: (0.2, 0.1, 0.1, 0.1), clusters: C1(1−25), C2(26−64).
  Time points 41-60: (0.4, 0.1, 0.1, 0.1), clusters: C1(1−25), C2(26−49), C3(50−64).

3.4.1.5 Flocks of Boids Dataset

This data set represents the natural flocking behavior of birds [33], [111]. The data set is represented by a dynamic network across 40 time points. The network at each time point consists of 100 boids and the community structure changes over time by changing the flock membership of one boid at each time step. The adjacency matrices across a window at time point t were represented by a 3-mode tensor. The proposed 3D-WTA algorithm was applied to this dataset to detect the community structure and track the changes over time. The number of clusters at each time point was detected using modularity, and the normalized cost function in (3.11) was used to track the changes in the community structure over time. The boids are arranged into 4 clusters until time point 18, then they rearrange into 2 clusters until the end. The change points and the number of clusters shown in Fig. 3.4(a) were detected using 3D-WTA with L = 8. In Fig. 3.4(b), the Rand index of the proposed algorithm and of other state-of-the-art evolutionary clustering methods are compared. The results show that our algorithm performs almost the same as AFFECT and better than PCQ and PCM in terms of Rand index. On the other hand, the proposed algorithm performs better than the existing methods in terms of detecting the changes in community structure and the number of clusters over time. Moreover, the proposed algorithm is robust over multiple runs with respect to tracking changes and the number of clusters over time.

3.4.2 MIT Reality Mining Dataset

This data set represents the cell phone activity of 94 students and staff at the MIT Media Laboratory. It was collected between August 2004 and June 2005 [93]. The data set is represented by a dynamic network across 46 time points, where each time point refers to a week during the school year. Each time point was represented by a 3-mode tensor. Both algorithms were applied to this data to track the changes over time. We assumed that there were two dominant clusters at each time point, and we used the normalized cost function in (3.11) to track the changes in the community structure over time. The detected change points correspond to the end of the Fall semester, the start of the Spring semester, spring break and the end of the Spring semester. The change points shown in Fig. 3.5 were detected by the WTA with α = 11.
3.4.3 Application to dFCN From EEG Dataset

The proposed tensor tracking approach is applied to a set of connectivity graphs constructed from EEG data containing the error-related negativity (ERN). The ERN is a brain potential response that occurs following performance errors in a speeded reaction time task, usually 25-75 ms after the response. Previous work [112] indicates that there is increased coordination between the lateral prefrontal cortex (lPFC) and medial prefrontal cortex (mPFC) within the theta frequency band (4-8 Hz) and the ERN time window. EEG data from 63 channels was collected in accordance with the 10/20 system on a Neuroscan Synamps2 system (Neuroscan, Inc.), sampled at 128 Hz, from 91 subjects. The task was a common speeded-response letter (H/S) flanker task, where error response-locked trials from each subject were utilized. The EEG data are pre-processed by the spherical spline current source density (CSD) waveforms to sharpen event-related potential (ERP) scalp topographies and reduce volume conduction [113].

We constructed average connectivity networks at each time point across subjects to represent the adjacency matrix of the connectivity graphs. Individual connectivity networks were constructed by using a previously introduced phase synchrony measure [114]. This results in a dynamic network with 63 nodes and 256 time points. We applied the proposed RTTA to detect the community structure over time. Starting from time point 30, a 3-mode tensor was built at each time point. The change points were detected by the normalized cost function in (3.11), as shown in Fig. 3.6. As can be seen in Fig. 3.6, there is an increase in the cost function around −78 ms, right before the response time. Similarly, there is a decrease in the cost function around 400 ms, corresponding well with the end of the ERN waveform. The number of clusters was determined by the eigengap criterion and the community structures detected for these three time periods are given in Fig. 3.7. While the community structures for the pre- and post-ERN intervals are similar, there is increased segregation during the ERN, especially in the frontal-central brain regions. This is aligned with the increased coordination between lPFC and mPFC during this time.

3.4.4 Application to dFCN From rs-fMRI Dataset

3.4.4.1 Data Preprocessing and Functional Connectivity Network Construction

The rs-fMRI dataset used in this chapter is the Bangor rs-fMRI dataset, which is part of the 1000 Functional Connectomes Project [115]. The dataset includes 20 healthy subjects (males, aged between 19−38). The rs-fMRI data was collected using a 3T magnet scanner from all subjects while their eyes were open. The TR equals 2 s, yielding 34 slices and 265 time points. The pre-processing was done using the SPM12 [116] and CONN [117] functional connectivity toolboxes. The structural images were registered to the mean functional image and spatially normalized to MNI space for each subject. Slice timing and motion correction were performed on every functional image. The functional images were regionally parcellated by the automated anatomical labeling (AAL) atlas [118] and smoothed by spatial convolution with a 4-mm FWHM Gaussian kernel. Confounds such as motion parameters obtained from realignment and BOLD signals obtained from white matter and CSF masks were regressed out, and band-pass (0.0084-0.09 Hz) temporal filtering was applied to the functional images of each subject.
3.4.5 Dynamic Functional Connectivity Network Construction

The dynamic functional connectivity networks are constructed for each subject by computing the Pearson correlation coefficient, ρ, between different ROIs (90 ROIs: 45 ROIs in the left and the other 45 in the right hemisphere, listed in Table 1) using a sliding window approach with a rectangular window of length equal to 60 TRs and an overlap of 1 TR [60], [55], [56]. For each subject, a sequence of weighted and undirected graphs {A^(t)}_{t_1,...,t_M} is constructed with A^(t)_{ij} = ρ^(t)_{ij}, where A^(t) ∈ R^{n×n} is the adjacency matrix representing the network at time step t, n is the total number of nodes (ROIs) and M is the total number of time points.

3.4.5.1 Resting State Networks (RSNs)

In order to study the brain dynamics over time, our analysis focuses on the temporal dynamics of functional connectivity communities. The detected communities will be evaluated with respect to well-known resting state networks, including the default mode network (DMN), auditory network (AN), visual network (VN), somatomotor network (SMN), bilateral limbic network (BiN), subcortical network (SCN) and cognitive control network (CCN), as summarized in Table 3.3 [55]. Each subnetwork is defined by a group of ROIs that are commonly categorized to be part of these networks. The purpose of applying the proposed approach to the dFCN is to determine the functional organization of the brain and evaluate the obtained functional communities with respect to these commonly defined resting state networks.

Table 3.3: The ROIs belonging to the different subnetworks in resting state. The numbers in the table refer to the ROIs defined in Table 1.
  VN: (43−56)
  AN: (29, 30, 79−82)
  BiN: (21, 22, 27, 28, 37−42, 83, 84, 87, 88)
  DMN: (3−6, 23−26, 31−36, 65−68, 85, 85)
  SMN: (1, 2, 17−20, 57−60, 63, 64, 69, 70)
  SCN: (71−78)
  CCN: (61, 62, 89, 90, 7−16)

3.4.5.2 Detecting the Community Structure in rs-fMRI

The proposed community detection approaches were applied to dFCNs constructed from the rs-fMRI dataset described in Section 4.2.2.1. 90 ROIs were extracted after the preprocessing procedure for each subject. Networks built using Pearson correlation are positive semi-definite and consequently can be used as A_N in (3.2) and (3.3). 3D-WTA is applied to each subject separately to track and detect the community structure over time and to analyze variability across subjects. At each time point, a 3-mode tensor is constructed using a sliding window with a length of 30, X^(t) ∈ R^{90×90×30}. For each subject, a total of 177 tensors are constructed. The constructed tensors are approximated by Tucker decomposition as X^(t) ≈ C ×1 U ×2 U ×3 W, where C ∈ R^{r×r×1} is the core tensor, U ∈ R^{90×r} is the orthogonal low-rank component matrix along the connectivity mode and W ∈ R^{30×1} is the weighting vector across the time mode. r is the rank, which corresponds to the number of clusters. The number of clusters at each time point is determined using the asymptotical surprise metric and varied between 2 and 10 clusters.

For the application of 4D-WTA to rs-fMRI, the fact that not all subjects have the same community structure within the same time window is taken into account. Recent studies in the dynamic rs-fMRI literature have revealed groups of reproducible patterns of dFCNs, named FC states [55], [119]. As a result, the possibility that dFCNs across subjects may cycle through a small number of distinct FC states over time has been hypothesized and investigated [56], [71], [120], [121].
Following what has been proposed in the literature, 4D-WTA is applied to the rs-fMRI data to extract common network connectivity patterns, or connectivity states, across time and subjects. The main difference between our approach and what is proposed in the literature is that the states are identified through their community structures and not just through co-activation patterns. It is also important to note that the goal of 4D-WTA is to achieve data reduction rather than to track the dynamics of FC within each subject, unlike 3D-WTA. Following the preliminary steps shown in Fig. 3.8, the application of 4D-WTA to the rs-fMRI can be summarized as:

1. Subsample 30 random time points across all subjects, similar to [55], and form a 4D tensor, X^(sample) ∈ R^{90×90×30×20}. This subsampling process was repeated 200 times.
2. Apply 4D-WTA to detect the community structure of the tensor constructed from this random sampling across subjects and time. The tensor corresponding to each random subsampling is approximated by Tucker decomposition following lines 4-16 of Algorithm 3.5.
3. Determine the number of clusters for each subsampling using asymptotical surprise.
4. Use U from Eq. (3.7) to obtain the common community structure of the subsampling.
5. Determine the similarity between the community structures from different subsamples using the variation of information metric and construct a 200×200 similarity matrix.
6. Apply k-means clustering to the similarity matrix from Step (5) to identify the samples with similar community structure, i.e., the network states.

After performing Step 2, the optimized weights for the different time points within a time window are found to be very close to each other, with a mean of 0.1824 and a standard deviation of 0.0074. However, the optimized weights across subjects show more variation, with a mean of 0.2155 and a standard deviation of 0.0597. This is consistent with the variations of functional connectivity networks across subjects. The distributions of the weights across time and subjects for all samples are shown in Fig. 3.9(a) and (b), respectively. From Fig. 3.9, it can be noticed that the distributions are highly concentrated, indicating that all time points and subjects contributed almost equally to the clustering results.

In the application of 4D-WTA to the rs-fMRI, the detected number of clusters for the different subsamplings varied between 4 and 10. After applying k-means clustering to identify the samples with similar community structure, 5 clusters that correspond to the FC states shown in Fig. 3.10 are obtained. The optimal number of states is determined using the elbow criterion. The community structure that corresponds to each FC state is determined as the community structure of the centroid of each cluster found by k-means. The community structure corresponding to the detected FC states is compared to the dynamic functional connectivity network of each subject. For each subject and at each time point, the state that best represents that time point in terms of the asymptotical surprise metric is assigned as the community structure at that time point. Finally, a state transition profile that tracks the community structure is obtained for each subject. The obtained state transition profiles for the different subjects followed different patterns, which reflects the variation in brain activity across subjects during resting state, as can be seen in Fig. 3.11.
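A schematic sketch of Steps 1, 5 and 6 of this pipeline is given below. Here `detect_communities` and `partition_distance` are placeholders for the 4D-WTA routine (Steps 2-4, Algorithm 3.5) and the variation of information; they are assumptions of this illustration rather than functions defined in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_states(dfcn, detect_communities, partition_distance,
                   n_samples=200, n_timepoints=30, n_states=5, seed=None):
    """Schematic version of the subsampling pipeline above. dfcn has shape
    (n_rois, n_rois, n_windows, n_subjects); detect_communities(X) should
    return one label per ROI for a subsampled tensor X, and
    partition_distance(a, b) a dissimilarity between two partitions."""
    rng = np.random.default_rng(seed)
    n_windows = dfcn.shape[2]
    partitions = []
    for _ in range(n_samples):
        idx = rng.choice(n_windows, size=n_timepoints, replace=False)
        partitions.append(detect_communities(dfcn[:, :, idx, :]))  # Steps 1-4
    # Step 5: pairwise dissimilarity between the subsample partitions
    D = np.zeros((n_samples, n_samples))
    for a in range(n_samples):
        for b in range(a + 1, n_samples):
            D[a, b] = D[b, a] = partition_distance(partitions[a], partitions[b])
    # Step 6: group subsamples whose community structures are similar
    state_of_sample = KMeans(n_clusters=n_states, n_init=10).fit_predict(D)
    return state_of_sample, partitions
```

As in the thesis, the clustering in Step 6 operates directly on the rows of the pairwise (dis)similarity matrix.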
In the next part of this subsection, the community structure corresponding to each FC state and the state transition profiles for the different subjects are discussed. Moreover, the dynamic behavior of the different RSNs is quantified through the consistency measure. Finally, a comparison between 3D-WTA and 4D-WTA in terms of tracking brain dynamics is presented.

3.4.5.3 Community structure of each FC state

After applying 4D-WTA to the rs-fMRI, the community structures of the FC states that are persistent across time and subjects are extracted, as shown in Fig. 3.13. First, the detected FC states are evaluated based on their frequency of occurrence across time and subjects. This is achieved by calculating the average occurrence rate (AR%) across time and subjects, as shown on the top of each state in Fig. 3.13. For each subject, the occurrence rate of each state is calculated as the number of times this state occurred divided by the total number of time points. The occurrence rate of each state across subjects is then averaged to get the AR%. As shown in Fig. 3.13(b), State 2 has the highest AR% at 25%. This state consists of one large cluster containing the DMN and BiN. Furthermore, there is another large cluster that reflects the connectivity between other RSNs, such as the SMN, AN and SCN. A similar state was observed previously in [55]. The other states, States 1 and 3−5, have AR% between 13% and 22%.

A major network of interest during resting state is the DMN, as this network is known to be activated during rest and deactivated during task-related events [122], [123]. Across the different states, we observe that different ROIs of the DMN are clustered together in some states and clustered with other RSNs in other states. For example, in States 4 and 5, the DMN module includes the main ROIs of the DMN, namely the posterior cingulate gyrus (PCG), precuneus (PCUN), anterior cingulate gyrus (ACG), angular gyrus (AG) and superior frontal gyrus, including the dorsolateral (SFGdor), medial (SFGmed) and medial orbital (ORBsupmed) parts. On the other hand, in State 2, the DMN is assigned to the same community with ROIs from the BiN, mainly the hippocampus (HIP) and parahippocampal gyrus (PHG). In fact, HIP and PHG are known to play an important role in long-term memory [124, 125], and previous work considered them as part of the DMN [126], [55]. On the other hand, in States 1 and 3, the PCUN, PCG, ACG, SFGmed and the middle frontal gyrus (MFG) from the CCN form a cluster, while the remaining ROIs of the DMN are clustered with other RSNs.
Furthermore, the VN cluster includes the ROIs of the VN and the superior parietal gyrus (SPG) across States 1, 2 and 5, and this agrees with the results reported in [128]. Another observation across the states is related to the dynamics of the SCN. As can be seen in Fig. 3.13(b)-(d), the ROIs of the SCN exhibit variation in their community membership. In State 1, a complete separation between subcortical and cortical ROIs is observed. Moreover, during this state the ROIs of the DMN join other ROIs from different RSNs to form small clusters. This observation suggests that State 1 represents a light sleep state, which is commonly observed in rs-fMRI, as in [55].

3.4.5.4 State transition profiles

In order to understand the dynamics of the FCNs, we studied the state transition profiles for the different subjects over time. These profiles reflect two important phenomena. First, the state transition profiles followed relatively different patterns over time, suggesting variation in brain activity across subjects during resting state. Moreover, for some subjects the brain may fluctuate between a small number of states and not necessarily occupy all the detected FC states. Second, we observed that the community structure did not fluctuate quickly between states over time. State transition profiles for selected subjects are shown in Fig. 3.11.

3.4.5.5 Consistency of the RSNs across time

In order to calculate the consistency of a subnetwork, we first calculated the recruitment coefficient defined in [129] for each ROI within the subnetwork. The ROI's recruitment coefficient is defined as the probability that this ROI is in the same cluster as the other ROIs of its own subnetwork at each time point. The subnetwork consistency is then calculated as the average of the recruitment coefficients of all ROIs that belong to the subnetwork.

After the application of 3D-WTA and 4D-WTA to the rs-fMRI FCNs, the consistency of the RSNs across time and subjects was computed to quantify their dynamic behavior. The variation in consistency across the different RSNs is shown in Fig. 3.12(a) for 3D-WTA and Fig. 3.12(d) for 4D-WTA. The VN, SCN, SMN and AN have relatively higher consistency across subjects, implying that the ROIs within each of these networks are strongly connected. In other words, these networks experience less variability over time. On the other hand, the DMN, CCN and BiN are less consistent over time, indicating that these RSNs are more dynamic. By comparing the variation in consistency calculated by 4D-WTA and 3D-WTA, we observe that the consistency results of 3D-WTA are less representative, since the variation in consistency across the different RSNs is very similar. Thus, it is hard to reach a definitive conclusion about the dynamics of these RSNs using 3D-WTA.

In order to compare the results of 3D-WTA and 4D-WTA with other state-of-the-art algorithms, the algorithms that performed best on the simulated data are selected. These methods are PCQ-NC and GenLov. The variation in consistency across all subjects for the different RSNs is shown in Fig. 3.12(b) for PCQ-NC and Fig. 3.12(c) for GenLov. From these figures, it can be seen that the consistency results from both methods follow similar patterns that align with the results from both 3D-WTA and 4D-WTA. However, 3D-WTA and 4D-WTA show less variation in the consistency of the different RSNs across subjects, indicating the stability of our methods compared to existing algorithms.
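A minimal sketch of this consistency computation, assuming a label matrix with one community assignment per ROI and time point, is shown below. The function name is ours, and the AAL indices in Table 3.3 are 1-based, so they would need to be shifted by one before use.

```python
import numpy as np

def subnetwork_consistency(labels, rois):
    """Consistency of one resting-state subnetwork via recruitment coefficients.
    labels has shape (n_timepoints, n_rois): community label of every ROI at
    every time point; rois lists the (0-based) ROI indices of the subnetwork."""
    rois = np.asarray(rois)
    recruit = []
    for i in rois:
        others = rois[rois != i]
        # fraction of (time point, other ROI) pairs sharing ROI i's community
        same = labels[:, others] == labels[:, [i]]
        recruit.append(same.mean())
    return float(np.mean(recruit))
```

For example, the VN consistency of one subject would be obtained as subnetwork_consistency(labels, np.arange(43, 57) - 1) under 0-based indexing.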
3.4.5.6 Comparison Between 3D-WTA and 4D-WTA

In order to study the degree of agreement between the community structures detected by 3D-WTA and 4D-WTA for the different subjects, the variation of information is calculated between the clustering results found by 4D-WTA and the clustering results found by 3D-WTA for each subject separately. The VI averaged across time and subjects has a mean of 0.46 and a standard deviation of 0.0507. These results indicate that, in general, there is high agreement between the group and individual community structures.

3.5 Conclusions

In this chapter, we introduced a new tensor based framework to detect and track the community structure in temporal and temporal multi-layer networks using 3-way and 4-way tensors, respectively. The framework is implemented through two approaches to detect the community structure in temporal networks: 3D-WTA and RTTA. In 3D-WTA, a fixed-length sliding window is used to construct the tensors, while a variable-length sliding window is used to construct the tensors in RTTA. On the other hand, we extended 3D-WTA to 4D-WTA to detect the community structure in temporal multi-layer networks. In the proposed framework, either a 3-way or a 4-way tensor is constructed at each time point by taking the past history across time, or across time and layers, into account. The community structure is detected by finding a common subspace through Tucker decomposition of the constructed tensors. Moreover, the proposed algorithms optimize the contribution of each time point and layer within the framework. Furthermore, a normalized cost function is introduced to track the changes in the community structure over time. The algorithms were tested on multiple simulated and real data sets and compared to existing state-of-the-art algorithms. The application of the proposed framework to simulated and real social networks shows that the proposed algorithms are able to detect the correct community structure and the changes in the number of clusters over time, even in noisy networks.

A particular application of interest of the proposed work is to detect and track the community structure dynamics in FCNs of the brain during cognitive control or resting state, with the major focus on FCNs from rs-fMRI. The application to functional connectivity brain networks from EEG data revealed changes in the community structure across time that aligned well with the experimental setting and prior work on cognitive control.

In the application of the proposed framework to the dFCN from rs-fMRI, the dFCNs are summarized into a set of FC states that repeat over time and subjects. The detected FC states are defined through their community structure. Moreover, state transition profiles which reflect the variance in the dynamic brain behavior across subjects during resting state are obtained. The detected community structures were evaluated through the consistency measure. From this measure, it is clear that the VN, AN, SCN and SMN are the most stationary RSNs, whereas the DMN, CCN and BiN reconfigure themselves more frequently across time. In particular, the DMN showed the highest dynamic activity over time, as expected.

In addition, the results of 3D-WTA and 4D-WTA showed high agreement in the community structure in terms of variation of information. However, the community structure detected by 4D-WTA reflected the RSNs' dynamics and evolution across time better than 3D-WTA.
This is due to the fact that 4D-WTA uses more data, from both time and subjects, and preserves the topological structure of the FCNs.

Future work will consider the extension of this framework to higher-order tensor type data clustering across different modes. Moreover, we will consider reducing the computational cost of the algorithms by reducing the size of the constructed tensors. This can be achieved by adding extra terms or constraints to the objective functions. The major goal of this modification will be to select the time points or layers that can be included in the framework. A possible selection criterion could depend on selecting the time points or layers that are sparser than others, as this will guarantee the selection of the time points or layers with less noisy community structure.

Figure 3.4: (a) The normalized cost function and the detected number of clusters for the Flocks of Boids dataset using different algorithms. 3D-WTA results in the best estimate of the number of clusters, especially after the change point. (b) Rand index comparison between 3D-WTA, AFFECT, PCM and PCQ for the Flocks of Boids dataset.

Figure 3.5: Detected change points for the Reality Mining data.

Figure 3.6: Detected change points for the ERN data.

Figure 3.7: Community structure for the ERN networks obtained by RTTA: (a) Pre-ERN, (b) ERN, (c) Post-ERN.

Figure 3.8: A flowchart of the application of 4D-WTA to rs-fMRI: (a) Time series from each ROI; (b) dFCNs constructed across time and subjects using the Pearson correlation coefficient; (c) After random subsampling across time and subjects, 4-dimensional FC tensors are constructed for each sample, where each tensor consists of 30 random time points from each subject; (d) Tucker decomposition of each FC tensor; (e) Community structure found by 4D-WTA for each sample; (f) The detected community structure is summarized into 5 functional connectivity states defined through community detection using k-means.

Figure 3.9: (a) Distribution of weights across time for all samples, (b) Distribution of weights across subjects for all samples.

Figure 3.10: The similarity among the community structures obtained by 4D-WTA across the 200 subsamples across time and subjects. The similarity between the different community structures is quantified by variation of information. Samples with similar community structure are grouped together by k-means.

Figure 3.11: State transition profiles for subjects 4, 7, 9 and 12.
Figure 3.12: Variation in RSNs' consistency across subjects calculated by (a) 3D-WTA, (b) PCQ-NC, (c) GenLov, (d) 4D-WTA.

Figure 3.13: Full and circular view of the community structure of the FC states found by 4D-WTA: (a) State 1 with 9 clusters; (b) State 2 with 4 clusters; (c) State 3 with 8 clusters; (d) State 4 with 7 clusters and (e) State 5 with 6 clusters.

Chapter 4
Detecting Brain Dynamics During Resting State: A Tensor Based Evolutionary Clustering Approach

4.1 Introduction

With the advancement of neuroimaging technology, researchers are able to collect digital data for the brain across different modalities and spatial and temporal resolutions. With the growth in neuroimaging data, it is now possible to map human brain activity as a complex network. In particular, the nodes of the network correspond to the different regions of the brain and the edges are quantified through the interactions between these regions. In recent years, different approaches have focused on understanding the functional mechanisms of the human brain. A particular field of interest is understanding the spontaneous activity of the brain during resting state. This has been accomplished by investigating the functional connectivity of brain networks from functional magnetic resonance imaging (fMRI) during resting state. fMRI is a noninvasive neuroimaging modality that measures the changes of the blood oxygenation level-dependent (BOLD) signal. FC is defined as the statistical dependency between different brain regions and has been quantified through the computation of Pearson's correlation between regions of interest (ROIs) [54–56]. Increased interest in studying resting state fMRI (rs-fMRI) has shown that the different brain regions of the motor cortex are active even in the absence of a motor task [57]. Since that time, rs-fMRI has been widely used as one of the major tools to investigate and explore FC [58, 59, 130].

Many of the current studies on rs-fMRI report a set of functional connectivity networks that appear consistently in healthy subjects, such as the default mode network, visual network, somatomotor network, executive control network and auditory network [58, 59]. A major network of interest during resting state is the default mode network (DMN), which consists mainly of the posterior cingulate gyrus, precuneus, anterior cingulate gyrus, angular gyrus and medial frontal gyrus. The DMN has attracted a lot of attention since it reflects the default neural activity and is deactivated during task-related events [122, 123, 131].
Early work on rs-fMRI assumed stationarity of functional connectivity; however, recent studies have revealed evidence for the dynamic nature of FC during resting state [53, 60]. Different methods have been developed to study the dynamics of the functional connectivity networks (FCNs). These methods include clustering based methods [55], statistical state modeling methods [68] and subspace analysis based methods [56]. Most of the current work focuses on extracting FC states and reduces the dimension of the dynamic FCNs (dFCNs) by vectorizing the connectivity matrices before extracting the FC states. This procedure may result in the loss of the topological structure of the FCN. These approaches also reduce the dFCNs into a few common basis functions or states without explicitly identifying the topological organization within each state. A well-known method to study the network organization is community detection [132]. Community detection algorithms have attracted a lot of attention as the detected structure reflects the topological relationships within the network [50, 133]. However, most of the current work on community detection focuses on detecting the community structure of static networks, such as standard spectral clustering [7].

Over the past decade, different evolutionary clustering approaches have been developed to detect the community structure of dynamic networks. These approaches include simple extensions of static clustering methods [31, 32], statistical modeling based methods [33], tensor based methods [34, 134] and modularity based methods [35]. Extensions of static clustering methods such as k-means and agglomerative hierarchical clustering were developed in [31] for dynamic networks, where a cost function that considers both the cluster membership at the current time and the deviation from the previous time point was introduced. In [32], two spectral clustering based frameworks were introduced: preserving cluster quality (PCQ) and preserving cluster membership (PCM). In both frameworks, the clustering depends on a cost function to guarantee temporal smoothness. However, this cost function requires a priori knowledge about the changes in the community structure and depends on the choice of a changing parameter. Under statistical modeling based methods, the authors in [33] assumed a statistical model for the evolution of the adjacency matrix, and an adaptive forgetting factor for evolutionary clustering and tracking (AFFECT) was introduced. In AFFECT, static clustering methods are performed after smoothing the proximity between objects over time. However, AFFECT depends on a stochastic block model for the adjacency matrix and requires a higher computational cost compared to static clustering to estimate the forgetting factor. Under tensor based methods, a non-negative tensor factorization approach was introduced in [34]. In this approach, the community structure of the temporal network is detected using PARAFAC, where all the successive adjacency matrices in the temporal network are represented by a single 3-way tensor. However, this representation of the temporal network may not preserve the community structure of the different time points. Under modularity based methods, a statistical method based on greedy modularity optimization was proposed in [35] to identify the changing modular organization in temporal multiplex networks, yet this approach has some drawbacks since it depends on modularity optimization, as discussed in [20].
In this chapter, we introduce a two-step dynamic community detection algorithm to identify the network community structure of the dFCN of the brain. First, we introduce an information-theoretic function to reduce the dFCN across time and identify the time points that carry similar topological information in order to combine them into a multi-way array, or tensor. In the second step of the algorithm, a tensor based approach is developed to identify the community structure of the reduced dFCN over time and summarize it in a set of "FC patterns" or "FC states".

The contributions of the proposed work are four-fold. First, the proposed approach relies on quantum theory principles to reduce the whole dFCN into a few distinct network states that have similar topological information. This approach identifies the time points with similar network organization; unlike simple averaging methods, it optimizes the contribution of each time point within each state through an optimized weighted average, where the common subspace and the weights across time are optimized simultaneously. Second, the proposed approach preserves the topological structure of the time points in each state, where the FC matrices from the original dFCN are used to identify the community structure within the state, unlike most recent studies that vectorize the network before clustering. Third, the proposed approach detects and tracks the community structure of the dFCN across time without any assumptions about the network structure, the number of clusters or the community membership across time. Finally, the proposed approach provides evidence for the dynamic behavior of the RSNs and reveals the fluctuations in the community membership of the different ROIs.

4.2 Materials and Methods

4.2.1 Data and Preprocessing

The data used in this chapter are from the Bangor rs-fMRI dataset. This dataset is part of the 1000 Functional Connectomes Project [115]. The dataset includes 20 healthy subjects (males, aged between 19−38). The rs-fMRI data was collected using a 3T magnet scanner from all subjects while their eyes were open. The TR equals 2 s, yielding 34 slices and 265 time points. The pre-processing was done using the SPM12 [116] and CONN [117] functional connectivity toolboxes. The structural images were registered to the mean functional image and spatially normalized to MNI space for each subject. Slice timing and motion correction were performed on every functional image. The functional images were regionally parcellated by the automated anatomical labeling (AAL) atlas [118] and smoothed by spatial convolution with a 4-mm FWHM Gaussian kernel. Confounds such as motion parameters obtained from realignment and BOLD signals obtained from white matter and CSF masks were regressed out, and band-pass (0.008-0.09 Hz) temporal filtering was applied to the functional images of each subject.

In order to study the brain dynamics over time, we focus our analysis on some well-known resting state networks (RSNs), including the default mode network (DMN), auditory network (AN), visual network (VN), somatomotor network (SMN), bilateral limbic network (BiN), subcortical network (SCN) and cognitive control network (CCN). These networks are commonly defined by a set of ROIs. In this study, the set of ROIs categorized within these RSNs is summarized in Table 4.1 [55, 56].
Table 4.1: The ROIs that define the RSNs. The numbers in the table refer to the ROIs defined in Table 1.
  VN: (43−56)
  AN: (79−82)
  BiN: (21, 22, 27, 28, 37−42, 83, 84, 87, 88)
  DMN: (3−6, 23−26, 31−36, 65−68, 85, 85)
  SMN: (1, 2, 17−20, 57−60, 63, 64, 69, 70)
  SCN: (71−78)
  CCN: (61, 62, 89, 90, 7−16, 29, 30)

4.2.2 Dynamic Functional Connectivity Networks (dFCNs)

4.2.2.1 Dynamic Functional Connectivity Network Construction

The dynamic functional connectivity networks are constructed for each subject by computing the Pearson correlation coefficient, ρ, between different ROIs (90 ROIs: 45 ROIs in the left and 45 ROIs in the right hemisphere, listed in Table 1) using a sliding window approach with a rectangular window of length equal to 60 TRs and an overlap of 1 TR [55, 56, 60]. For each subject, a sequence of weighted and undirected graphs {A^(t)_k}_{t_1,...,t_M} is constructed with A^(t)_k(i,j) = ρ^(t)_k(i,j), where A^(t)_k ∈ R^{n×n} is the adjacency matrix representing the network at time step t for subject k, n is the total number of nodes (ROIs) and M is the total number of time points. The final dFCN at each time point is averaged across all subjects [135–137] to obtain {A^(t)}_{t_1,...,t_M}, where A^(t)(i,j) = (1/K) Σ_{k=1}^{K} ρ^(t)_k(i,j) and A^(t) ∈ R^{n×n}.

4.2.2.2 Surrogate Data Generation for Testing Statistical Significance of the dFCN

Before applying the proposed clustering approach to the dFCN, we first apply thresholding to the connectivity matrices to obtain sparse adjacency matrices with nonzero edges corresponding to the significant connections. We generated surrogate data [138] from the ROI time series and kept all edges that were significantly different from the null model.

For each ROI, surrogate data was generated using the iterative amplitude-adjusted Fourier transform (IAAFT) [139]. For each subject, we created 100 surrogate time series for each ROI. We then computed the Pearson correlation coefficient between each pair of ROIs using a sliding rectangular window of length 60 TRs and an overlap of 1 TR. This process was repeated for all 100 surrogate time series. For each pair of ROIs, i and j, and for each window, the correlation values obtained from the null model were normally distributed with a sample mean of µ^(t)(i,j) and a standard deviation of σ^(t)(i,j). For each edge, A^(t)_k(i,j), in the original dFCN of each subject, we calculated the absolute z-score as z^(t)_k(i,j) = |A^(t)_k(i,j) − µ^(t)(i,j)| / σ^(t)(i,j) and removed all the edges with z^(t)_k(i,j) < 3, which corresponds to a statistical test with a significance of 99.8%. This yields a z-score thresholded network {B^(t)_k}_{t_1,...,t_M} for each subject. The final z-score thresholded dFCN, {B^(t)}_{t_1,...,t_M}, is constructed at each time point by averaging {B^(t)_k}_{t_1,...,t_M} across subjects, where B^(t)(i,j) = (1/K) Σ_{k=1}^{K} z^(t)_k(i,j).

4.2.3 Reducibility of Brain dFCNs

The first step in our proposed approach is the reduction of the dFCN. The goal of the reduction step is to cluster the time points that possess similar structure into a common partition to identify the distinct network states. The z-score thresholded dFCN, {B^(t)}_{t_1,...,t_M}, constructed from the surrogate data is used to reduce the original dFCN. The reduction method relies on quantum theory principles to group similar time points by maximizing the distinguishability between the reduced temporal network and the corresponding fully aggregated graph, defined as B_agg = Σ_{m=1}^{M} B^(t_m). A detailed description and derivation of the method can be found in [140].
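Assuming the surrogate correlation matrices for a given window have already been computed, the edge-wise z-score thresholding of Section 4.2.2.2 reduces to a few lines. The sketch below is illustrative, uses our own function names, and does not include the IAAFT surrogate generation itself.

```python
import numpy as np

def zscore_threshold(A, A_surr, z_thresh=3.0):
    """Edge-wise significance test against a surrogate null model. A is one
    windowed connectivity matrix (n x n) and A_surr holds the matching
    surrogate connectivity matrices, shape (n_surrogates, n, n).
    Returns the z-scored, thresholded matrix B."""
    mu = A_surr.mean(axis=0)                   # null mean per edge
    sigma = A_surr.std(axis=0, ddof=1)         # null standard deviation per edge
    z = np.abs(A - mu) / np.maximum(sigma, 1e-12)
    B = np.where(z >= z_thresh, z, 0.0)        # keep only significant edges
    np.fill_diagonal(B, 0.0)                   # no self-connections
    return B
```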
The method can be summarized in three main steps: (i) computation of the Jensen-Shannon divergence, D_JS, between all pairs of time points in the temporal network, (ii) classical hierarchical clustering of the M time points and (iii) calculation of the relative entropy, q, which is used as the quality function of the resulting partition. The partition that maximizes q is chosen.

In the first step, D_JS is computed between all pairs of time points. It is defined in terms of the Von Neumann entropy of each graph, B^(t),

h(B^{(t)}) = -\mathrm{tr}\big(L_{B^{(t)}} \log_2 L_{B^{(t)}}\big) = -\sum_{i=1}^{n} \lambda_i^{(t)} \log_2 \lambda_i^{(t)},   (4.1)

where L_{B^{(t)}} = (D_{B^{(t)}} - B^{(t)}) / \sum_{i,j=1}^{n} B^{(t)}(i,j) is the rescaled Laplacian of graph B^(t), which has all the properties of a density matrix, {λ^(t)_1, λ^(t)_2, . . . , λ^(t)_n} are the eigenvalues of L_{B^(t)}, tr is the trace of a matrix and D_{B^(t)} is the diagonal degree matrix corresponding to B^(t). The dissimilarity between any pair of time points in the temporal network can be measured by computing D_JS between them. Given the rescaled Laplacians of a pair of time points, L_{t_i} and L_{t_j}, we can compute a new density matrix L = (1/2)(L_{t_i} + L_{t_j}); D_JS is then calculated as,

D_{JS}(L_{t_i} \| L_{t_j}) = h(L) - \tfrac{1}{2}\big(h(L_{t_i}) + h(L_{t_j})\big).   (4.2)

In the second step, classical hierarchical clustering of the M time points is performed using D_JS between all pairs of time points as the dissimilarity measure. At each step of the algorithm, the time points that are separated by the smallest D_JS are aggregated by summing up the corresponding adjacency matrices. The distances between the resulting new pairs of the reduced temporal network are then updated. This procedure is repeated M − 1 times until the fully aggregated graph, B_agg, is formed. As a result of the hierarchical clustering algorithm, we obtain a dendrogram associated with the aggregation procedure.

In the third step, the relative entropy, q, is used as the quality function to quantify the distinguishability between the fully aggregated graph, B_agg, and the reduced temporal network, G_red, where G_red : {G^(1), G^(2), . . . , G^(M̄)}, with M̄ ≤ M. The matrix G^(α) is either one of the z-score thresholded matrices or the sum of two or more of them, with α = 1, . . . , M̄. The quality function of the reduced network is defined as,

q(G_{red}) = 1 - \frac{\bar{H}(G_{red})}{h(B_{agg})},   (4.3)

where H̄(G_red) is the average Von Neumann entropy of the reduced z-score thresholded dynamic network. Finally, the partition that maximizes the relative entropy is chosen, in which the z-score thresholded matrices of the time points that carry similar information are clustered together.

The output of the reduction step is a set of multidimensional arrays, or tensors, where the slices of each tensor consist of adjacency matrices from the original dFCN. The adjacency matrices included within each tensor are marked and selected from the original dFCN through the reduction step as they carry similar topological information. The reduced dFCN can be defined as Y : {Y^(1), . . . , Y^(M̄)} with M̄ ≤ M, where Y^(τ) ∈ R^{n×n×L^(τ)_T} with τ = 1, . . . , M̄, and L^(τ)_T is the number of time points included in the tensor Y^(τ), which corresponds to the third dimension of the tensor and might vary between tensors. For each tensor, Y^(τ), we define L^(τ) as a vector of indices of the time points from the original dFCN included in the tensor. The lth frontal slice of the 3-mode tensor, Y^(τ)(:,:,l), is the normalized adjacency matrix A_N^{L^(τ)(l)}, where l ∈ L^(τ).
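Equations (4.1) and (4.2) can be evaluated directly from the rescaled Laplacians. The small NumPy sketch below is ours for illustration and assumes weighted graphs with zero diagonal, so that the rescaled Laplacian has unit trace.

```python
import numpy as np

def _density_matrix(B):
    """Rescaled Laplacian of a weighted graph with zero diagonal; its
    eigenvalues are nonnegative and sum to one, like a density matrix."""
    B = np.asarray(B, dtype=float)
    L = np.diag(B.sum(axis=1)) - B
    return L / B.sum()

def _entropy_of_density(L):
    lam = np.linalg.eigvalsh(L)
    lam = lam[lam > 1e-12]                     # convention: 0 * log2(0) = 0
    return float(-(lam * np.log2(lam)).sum())

def von_neumann_entropy(B):
    """Eq. (4.1): entropy of the eigenvalues of the rescaled Laplacian."""
    return _entropy_of_density(_density_matrix(B))

def jensen_shannon(Bi, Bj):
    """Eq. (4.2): Jensen-Shannon divergence between two network snapshots."""
    Li, Lj = _density_matrix(Bi), _density_matrix(Bj)
    return _entropy_of_density(0.5 * (Li + Lj)) - 0.5 * (
        _entropy_of_density(Li) + _entropy_of_density(Lj))
```

The resulting pairwise D_JS matrix is what feeds the hierarchical clustering in the second step.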
4.2.4 Dynamic Network Clustering

4.2.4.1 HOOI Clustering Based Approach

The evolutionary community structure of the reduced dFCN, Y, can be determined by identifying the community structure for each partition, Y^{(τ)}. Each Y^{(τ)} can be represented as a multidimensional array, and the optimal subspace U^{(τ)} that describes the collection of adjacency matrices can be found through spectral clustering of the weighted average of the adjacency matrices. This results in an optimization over both the subspace matrix, U^{(τ)}, and the weight vector, W^{(τ)}, for each partition. The optimization problem in (3.3) can then be rewritten for each tensor as

max_{U^{(τ)}, W^{(τ)}}  || U^{(τ)†} ( Σ_{l∈L^{(τ)}} w_l^{(τ)} A^N_{L^{(τ)}(l)} ) U^{(τ)} ||_F^2,   s.t.  U^{(τ)†} U^{(τ)} = I,  W^{(τ)} ≥ 0,  || W^{(τ)} ||_2 = 1,    (4.4)

where w_l^{(τ)} are the weights for the adjacency matrices within Y^{(τ)} and L^{(τ)} refers to the vector of indices of the time points from the original dFCN included in the tensor. The first two modes of the tensor refer to the connectivity and the third mode refers to time. A variable size set of tensors is constructed to represent the reduced dFCN across time, and (4.4) can be reformulated in terms of a Tucker decomposition for each tensor as follows:

max_{U^{(τ)}, W^{(τ)}}  || Y^{(τ)} ×_1 U^{(τ)†} ×_2 U^{(τ)†} ×_3 W^{(τ)†} ||_F^2,   s.t.  U^{(τ)†} U^{(τ)} = I,  || W^{(τ)} ||_2 = 1,    (4.5)

where U^{(τ)} is the basis along the first and second modes and W^{(τ)} corresponds to the weighting vector along the time mode. Since the networks are undirected, the components along the first two modes are the same. The optimization in (4.5) can be solved by higher-order orthogonal iteration (HOOI), where in each iteration either the matrix U^{(τ)} or the vector W^{(τ)} is fixed to optimize the other one. This approach is summarized in the pseudo code in Algorithm 4.6.

Algorithm 4.6 HOOI clustering based approach.
Input: Y : {Y^{(1)}, ..., Y^{(M̄)}}, L = {L^{(1)}, ..., L^{(M̄)}}.
Output: Clustering labels.
1: for τ = 1 : M̄ do
2:    Initialize U_0^{(τ)} by HOSVD of Y^{(τ)}, i = 0.
3:    Unfold the tensor Y^{(τ)} along mode 3 to get Y^{(τ)}_{(3)}.
4:    Calculate W_{i+1}^{(τ)} as the dominant left singular vector of Y^{(τ)}_{(3)} (U_i^{(τ)} ⊗ U_i^{(τ)}).
5:    Compute Ā = Σ_{l∈L^{(τ)}} (W_{i+1}^{(τ)})_l A^N_{L^{(τ)}(l)}, where A^N_{L^{(τ)}(l)} = Y^{(τ)}(:,:,l).
6:    Obtain U_{i+1}^{(τ)} by eigenvalue decomposition of Ā.
7:    if || U_{i+1}^{(τ)} − U_i^{(τ)} ||_F^2 > ε then
8:        i = i + 1, go to line 4.
9:    end if
10:   Use modularity to determine the number of clusters.
11:   Normalize the columns of U^{(τ)}.
12:   Apply k-means on U^{(τ)} to obtain the clustering labels.
13: end for

4.2.4.2 Choosing the Number of Clusters

The number of clusters for each state was determined by modularity [10]. In the proposed approach, we calculated the modularity of the detected community structure for each tensor using the adjacency matrices of the time points included in that tensor separately. Then, an average modularity across all of the time points within that FC state was calculated, and the number of clusters that maximized the average modularity was chosen.
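As a rough numerical sketch of the alternating updates in Algorithm 4.6 for a single tensor, the following assumes the slices have already been normalized, uses scikit-learn's KMeans, and normalizes the rows of U before k-means (a common spectral-clustering variant); in practice the rank r would be selected with the modularity criterion of Section 4.2.4.2. The helper names and initialization are illustrative assumptions rather than the exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def hooi_cluster(Y, r, eps=1e-6, max_iter=50):
    """Y: (n, n, L) stack of normalized adjacency slices for one FC state; r: number of clusters."""
    n, _, L = Y.shape
    w = np.ones(L) / np.sqrt(L)                      # simple initialization of the time weights
    U = np.zeros((n, r))
    for _ in range(max_iter):
        A_bar = np.tensordot(Y, w, axes=([2], [0]))  # weighted average of the slices
        vals, vecs = np.linalg.eigh(A_bar)
        U_new = vecs[:, -r:]                         # r dominant eigenvectors of A_bar
        # W update: dominant left singular vector of the mode-3 unfolding times (U kron U)
        Y3 = Y.reshape(n * n, L).T                   # (L, n*n) mode-3 unfolding
        M = Y3 @ np.kron(U_new, U_new)               # (L, r*r)
        w = np.linalg.svd(M)[0][:, 0]
        w = np.abs(w) / np.linalg.norm(w)            # keep weights nonnegative with unit norm
        if np.linalg.norm(U_new - U) ** 2 < eps:
            U = U_new
            break
        U = U_new
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalize before k-means
    labels = KMeans(n_clusters=r, n_init=10).fit_predict(U)
    return labels, w
```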
4.3 Results

4.3.1 dFCN Reduction

The first step of the proposed approach is to combine the time points that carry similar topological information, so that the time points with similar FC patterns are grouped together. Consequently, the dimension of the dFCN is reduced while the distinguishability between the reduced dFCN and the fully aggregated graph is maximized.

We performed the reducibility analysis for the subject-averaged dFCN following the procedures explained in Sections 4.2.2.2 and 4.2.3. The Jensen-Shannon distance matrix and the quality function of the temporal network obtained after each step of the hierarchical clustering algorithm are shown in Fig. 4.1 and Fig. 4.2, respectively.

Figure 4.1: Jensen-Shannon divergence between all pairs of time points in the dFCN.

Figure 4.2: The dendrogram resulting from hierarchical clustering and the corresponding values of q. The dashed red line identifies the cut point of the dendrogram from the second peak of q.

From the dendrogram shown in Fig. 4.2, the reduced dFCN consists of 12 FC states represented by 12 tensors. The time points included in each tensor carry similar information. As can be seen from Fig. 4.2, we cut the dendrogram using the quality function curve. The cut point is chosen as the second peak of the quality function, since the partition resulting from the first peak produced 18 tensors and some of these tensors corresponded to fewer than 5 time points.

4.3.2 FC States Summarization

The HOOI based clustering approach is applied to analyze the brain functional connectivity dynamics. A set of 12 tensors is constructed corresponding to the reduced dFCN. The third dimension of each tensor represents the number of time points included in that partition. The proposed approach is applied to each tensor to detect its community structure. The constructed tensors were approximated by Tucker decomposition as Y^{(τ)} ≈ C^{(τ)} ×_1 U^{(τ)} ×_2 U^{(τ)} ×_3 W^{(τ)}, where C^{(τ)} ∈ R^{r×r×1} is the core tensor, U^{(τ)} ∈ R^{90×r} is the orthogonal low-rank component matrix along the connectivity mode and W^{(τ)} ∈ R^{L_T^{(τ)}×1} is the weighting vector along the time mode. The rank r corresponds to the number of clusters. The number of clusters for each tensor is determined using modularity, as explained in Section 4.2.4.2, and varied between 3 and 4.

State  Tensor Size   Time Points (seconds)                          Reoccurrence percentage
1      90×90×16      (120−140), (154−162)                           7.8%
2      90×90×19      (141−153), (163−188)                           9.2%
3      90×90×25      (189−206), (246−256), (282−298), (301−302)     12%
4      90×90×32      (207−245), (257−281), (299−300)                15.5%
5      90×90×19      (303−314), (330−342), (386−396)                9.2%
6      90×90×20      (315−329), (356−358), (452−470)                9.7%
7      90×90×19      (343−355), (397−422)                           9.2%
8      90×90×13      (360−384)                                      6.3%
9      90×90×14      (423−451)                                      6.7%
10     90×90×29      (471−503), (505−530)                           14%

Table 4.2: Key Attributes of the Tensors Constructed from the Reduced dFCN.

In this chapter, we are investigating the whole brain dynamics by detecting the community structure of the brain over time. This is achieved by summarizing the dynamic behavior of the brain through a set of “FC states”. As the detected community structure across some states is the same, the final resulting FC states are summarized into 10 states. Table 4.2 summarizes the key attributes of the tensors constructed to detect the community structure of the dFCN states.

Fig. 4.3 shows the detected community structures for all of the FC states. For each state, the optimally weighted average of the adjacency matrices within the state is given along with the cluster assignments (left). The clusters are labeled using the major ROIs detected within the cluster, i.e., a cluster might contain ROIs from multiple RSNs, but it is labeled by the most dominant RSN within the cluster.
Moreover, the circular graph (right) shows the detected community structures using the circularGraph toolbox [1] and the connections between the ROIs of the different RSNs.

The extracted FC states are analyzed with respect to two aspects. First, the frequency of occurrence of each FC state is computed across time. Second, the variability in the community membership of the core ROIs corresponding to different RSNs is investigated across FC states and the reconfiguration of the community membership of the ROIs is analyzed.

The different FC states are first analyzed based on their frequency of occurrence. States 4 and 10 are the two states with the highest reoccurrence percentage, with relative frequencies of occurrence of 15.5% and 14%, respectively. In State 4, a cluster which includes most of the core ROIs known to belong to the DMN is detected. State 10, on the other hand, has a large SMN cluster that also includes ROIs of the SCN, CCN and BiN. The reoccurrence percentage of the remaining states varied between 6.3% and 9.7%, as shown in Table 4.2.

Regarding the variability in the community membership of the ROIs, we observed that some ROIs have a relatively stable community membership across the multiple states, i.e., these ROIs were clustered together across the different states. For example, the ROIs of the VN revealed a consistent community membership across all states, similar to the results in [141]. On the other hand, the VN expanded to include the superior parietal gyrus (SPG) during States 1, 2, 5, 6, 8, 9 and 10. Similar findings were reported in [128], where innovation-driven co-activation patterns (iCAPs) from rs-fMRI were extracted and an iCAP consisting of ROIs of the VN and the SPG was identified.

Another set of ROIs that clustered consistently together across all states are the core ROIs of the SMN, including the paracentral lobule (PCL), precentral gyrus (PreCG), postcentral gyrus (PoCG) and supplementary motor area (SMA). Even though these ROIs were clustered together across all states, the SMN was extended to include ROIs from the SCN in States 1−4, 9 and 10, as previously noted in [55]. Also, the HIP is included within the SMN across States 1−4, 6 and 10. In contrast, States 5 and 8 exhibited a separation of the ROIs of the SMN and SCN. Moreover, we observed that the ROIs of the AN, the superior temporal gyrus (STG) and Heschl gyrus (HES), were co-activated with the SMN across all states, which agrees with the results found in [128]. Similar to the other RSNs, some of the ROIs of the SCN were relatively stable, such as the putamen (PUT) and pallidum (PAL), whereas the thalamus (THA) and caudate (CAU) show variation in their community membership between the DMN, SCN and SMN [128, 142].

Furthermore, the ROIs of the BiN exhibited a dynamic behavior across the different states. For instance, in States 5 and 7−9, the hippocampus (HIP) and parahippocampal gyrus (PHG) are included in the DMN; as these ROIs are known to play an important role in long-term memory [124, 125], previous studies have included them as part of the DMN [126]. During States 1−3, the HIP and AMYG joined the SMN while the PHG joined the DMN. On the other hand, the ROIs HIP, PHG and AMYG are clustered together during State 10.

As the DMN is a major network of interest during resting state [122, 123], we will present some interesting features of the detected DMN cluster across the different states.
In State 4, which scored the highest reoccurrence percentage, a large DMN cluster is detected. This cluster consists of the core ROIs of the DMN, including the precuneus (PCUN), posterior cingulate gyrus (PCG), anterior cingulate gyrus (ACG), angular gyrus (AG), superior frontal gyrus (SFG), middle frontal gyrus (MFG), inferior frontal gyrus (IFG) and middle temporal gyrus (MTG). Most of the ROIs of the DMN were stable across all states except for the PCUN. The PCUN exhibited different community memberships across the FC states, and it is known to play a crucial role in the DMN [127, 143]. In particular, it is affiliated with the DMN cluster only in State 4. On the other hand, a consistent co-activation between the PCUN and the ROIs of the VN in States 1, 3, 5, 6 and 10 is revealed, as can be seen in Fig. 4.3, and this agrees with previously reported results in [55, 127], where a FC state that included the PCUN and the ROIs of the VN was identified. Moreover, States 7−9 revealed a cluster with co-activation between the PCUN and the ROIs of the SCN and CCN, similar to the results in [55, 144].

By studying the community structures detected across the different FC states, we observe that some RSNs show a relatively stationary behavior, while other RSNs show a dynamic behavior. From the community structures shown in Fig. 4.3, it can be seen that the VN and AN are more stationary across the states, where the ROIs of these networks are clustered together across multiple states and experienced less variability in their community membership. On the other hand, the DMN, SMN, SCN, BiN and CCN exhibited a dynamic behavior across multiple states, as most of the ROIs of these RSNs keep changing their community membership across states. Moreover, in some states the aforementioned RSNs joined each other to form bigger clusters. For instance, the SMN, SCN, AN and BiN formed one cluster during States 1, 2, 4, 6, 9 and 10, and the DMN and ROIs from the CCN and SCN formed one cluster during States 2, 5 and 6.

4.4 Conclusions

In this chapter, we introduce a two-step dynamic community detection algorithm to investigate the whole brain FC dynamics during resting state. The first step in the proposed approach relies on an information-theoretic function to reduce the dimensionality of the dFCN across time. This reduction step identifies the time points that possess similar topological information across time and combines them as a tensor. The second step introduces a tensor based clustering approach to detect the community structure of the reduced dFCN and summarizes it into a set of FC states that reoccur across time. The extracted FC states provide evidence for the temporal evolution of the brain dFCN, as the ROIs of the different RSNs reconfigure their community membership across time. The extracted FC states are evaluated through the frequency of reoccurrence across time and the variability in the community membership of the ROIs across the different states. The results show that some RSNs are more dynamic across time than others. In particular, we observed that the VN and AN are the most stationary networks across states, whereas the DMN, SCN, BiN, CCN and SMN reconfigure their community membership more often across the different states. Moreover, the results show that the FC state that is repeated most frequently across time is State 4, which has a cluster that includes most of the core ROIs of the DMN. This result provides further evidence for the important role played by the DMN during resting state.
The proposed approach makes some key contributions to the field of dFCN analysis. First, the proposed algorithm depends on quantum theory principles to reduce the dFCN across time into a set of FC states. Second, the proposed approach uses a tensor representation for the extracted partitions to detect the community structure across them, and this representation preserves the topological structure of the dFCN. Third, the proposed tensor based clustering approach optimizes the contribution of each time point within each state. Finally, the proposed approach detects and tracks the community structure and its evolution in dFCNs without any prior assumptions.

Future work will focus on different extensions of the proposed approach. First, the reduction of the dFCN will be performed across time and subjects in order to construct a reduced multi-layer temporal FCN. Second, FC states will be extracted across time and subjects and will be used to evaluate and compare the variability in the dFCN community structure between groups and individuals.

Figure 4.3: The detected community structure for the FC States (1−10). For each state, (left) the optimized weighted average of the adjacency matrices within the state. Each cluster is denoted by the main ROIs detected within the cluster. The clusters shown in the circular graphs (right) refer to the different RSNs defined in the literature, and the connections between them are detected by the proposed algorithm. The number of clusters for the FC states varies between 3 and 4.

Chapter 5

Temporal Block Spectral Clustering for Multi-layer Temporal Functional Connectivity Networks

5.1 Introduction

A network representation is useful for describing the structure of a large variety of complex systems in the biological, social and physical sciences [145]. Traditional studies of networks generally assume that nodes are connected to each other by a single type of static edge. However, this assumption is almost always an oversimplification, and it ignores important network dynamics such as time dependence. Multi-layer networks explicitly incorporate multiple channels of connectivity in a system. A fundamental aspect of describing multi-layer networks is defining and quantifying the interconnectivity between different layers. One common approach to characterizing these connections is community detection.

While different community detection approaches have been developed for static networks [7, 9–11], there is a growing need to develop algorithms that detect the community structure in multi-layer networks [6, 37]. Examples of multi-layer community detection methods include temporal clustering approaches [31–33], subspace learning based methods [146], stochastic block model (SBM) based multi-layer community detection methods [147, 148] and tensor based methods [149, 150]. More recently, a multi-objective optimization based genetic algorithm [41] was proposed for detecting and tracking community structure in dynamic multi-layer networks. However, none of the aforementioned methods take the inter-layer couplings into account, and they are thus mere extensions of multi-view data clustering approaches.

Recently, modularity optimization and spectral clustering based methods for multi-layer networks with inter-layer connections have been proposed to address this limitation [35, 151]. In [35], a modularity optimization framework to determine the community structure in multi-slice networks is introduced.
Since this framework depends on modularity, it has multiple drawbacks, as discussed in [20]. Moreover, the approach is limited to the case where each node is connected only to itself in the adjacent time windows, i.e., multiplex networks with the inter-layer coupling set to a constant parameter. In [151], a spectral clustering method to detect the community structure in multi-layer networks with inter-layer couplings is introduced. However, this approach is limited as it constructs a single block adjacency matrix to represent all the graphs in the network and cannot capture the temporal variation in the community structure.

Therefore, there is an increasing need to look at the dynamic community structure of multi-layer temporal networks with unconstrained inter-layer or inter-temporal connectivity [6, 152–154]. In this chapter, a framework to detect and track the community structure in multi-layer temporal networks (MLTNs) is introduced. In the proposed approach, the temporal relationships between all node pairs within and across time windows are modeled using a sliding window correlation approach. In this manner, non-trivial intra- and inter-adjacency matrices are constructed, resulting in temporal supra-adjacency matrices. The dynamic community structure is detected by applying spectral clustering to these temporal block supra-adjacency matrices.

This work makes significant contributions to the field of dynamic community detection and differs from existing multi-layer community detection methods in some key ways. First, unlike the work in [35], the inter-layer adjacency matrices across time are not limited to the interactions between each node and itself across time and consider the interactions across all nodes and all time points. Second, unlike the work in [151] that constructs a single block matrix for the whole network, the community structure at each time point is updated by taking the history of the networks into account.

A particular focus of this chapter is community detection in dynamic functional connectivity networks (dFCNs) of the brain. More specifically, the dFCNs are modeled for each subject as a MLTN where the different brain regions correspond to the different nodes and the different time windows correspond to the different layers.

5.2 Methods

In this chapter, a temporal block spectral clustering approach is introduced to detect and track the community structure in multi-layer temporal networks. In the proposed approach, a set of supra-adjacency matrices that captures the intra- and inter-temporal relations is constructed as follows.

5.2.1 Construction of multi-layer temporal networks

5.2.1.1 Construction of a temporal network from signals

Given a set of multivariate signals, {x_1(t), x_2(t), ..., x_n(t)} with t ∈ {t_1, t_2, ..., t_M}, a set of adjacency matrices is constructed at each time point describing the temporal network, with the signals corresponding to the nodes of the network. A sliding window of length W is used to construct both the intra-adjacency matrix, A^{(t_m)}, with t_m ∈ {t_W, t_{W+1}, ..., t_M}, and the inter-adjacency matrix, A^{(t_k, t_l)}, with t_k, t_l ∈ {t_W, t_{W+1}, ..., t_M}. A^{(t_m)} captures the correlations between the different signals within a time window W, whereas A^{(t_k, t_l)} captures the correlations between signals across different time windows. A^{(t_k, t_l)} is defined as:

A^{(t_k, t_l)} = [ ρ^{W(t_k,t_l)}_{x_1 x_1}   ρ^{W(t_k,t_l)}_{x_1 x_2}   ...   ρ^{W(t_k,t_l)}_{x_1 x_n}
                  ρ^{W(t_k,t_l)}_{x_2 x_1}   ρ^{W(t_k,t_l)}_{x_2 x_2}   ...   ρ^{W(t_k,t_l)}_{x_2 x_n}
                  ...                         ...                        ...   ...
                  ρ^{W(t_k,t_l)}_{x_n x_1}   ρ^{W(t_k,t_l)}_{x_n x_2}   ...   ρ^{W(t_k,t_l)}_{x_n x_n} ],

where ρ^{W(t_k,t_l)}_{x_i x_j} = ρ(x_i(t_k−W+1 : t_k), x_j(t_l−W+1 : t_l)) is the absolute correlation between x_i in time window W(t_k) and x_j in time window W(t_l). The constructed A^{(t_k, t_l)} measures the similarities between each node in window W(t_k) and all other nodes in window W(t_l), unlike previous work that focused mostly on node-to-node edges [35]. A^{(t_m)} is considered as a special case of A^{(t_k, t_l)}, where A^{(t_m)} = A^{(t_m, t_m)}. Note that A^{(t_l, t_k)} = A^{(t_k, t_l)†}.
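The following is a minimal sketch of the windowed correlation construction above, assuming the signals are stored as an (n × T) array; the window indexing and helper names are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def window_corr(X, k, l, W):
    """Inter-adjacency A^(t_k, t_l): absolute correlations between every signal in window
    W(t_k) and every signal in window W(t_l). X: (n, T) array; k, l: 0-based window end indices."""
    Xi = X[:, k - W + 1:k + 1]                               # (n, W) segments ending at t_k
    Xj = X[:, l - W + 1:l + 1]                               # (n, W) segments ending at t_l
    Xi = (Xi - Xi.mean(1, keepdims=True)) / (Xi.std(1, keepdims=True) + 1e-12)
    Xj = (Xj - Xj.mean(1, keepdims=True)) / (Xj.std(1, keepdims=True) + 1e-12)
    return np.abs(Xi @ Xj.T) / W                             # |corr(x_i in W(t_k), x_j in W(t_l))|

def intra_adjacency(X, m, W):
    """Intra-adjacency A^(t_m) as the special case A^(t_m, t_m)."""
    return window_corr(X, m, m, W)
```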
5.2.1.2 Construction of multi-layer temporal network

In this chapter, the variations in the community structure of temporal networks are studied. In order to achieve this, a set of block supra-adjacency matrices is constructed to reflect the relationships between the different networks across time. In the proposed approach, a sliding window technique is used over the time-varying networks A^{(t_m)} to construct a set of intra- and inter-block adjacency matrices, A_intra and A_inter, respectively. Each window considers T consecutive networks to construct:

A^{(t_m')}_{intra} = [ A^{(t_m'−T+1)}   0    ...   0
                       0               ...   ...   ...
                       ...             ...   ...   0
                       0               ...   0    A^{(t_m')} ],   and

A^{(t_m')}_{inter} = [ 0                          A^{(t_m'−T+1, t_m'−T+2)}   ...   A^{(t_m'−T+1, t_m')}
                       A^{(t_m'−T+2, t_m'−T+1)}   0                          ...   ...
                       ...                        ...                         ...   ...
                       A^{(t_m', t_m'−T+1)}       ...                         ...   0 ],    (5.1)

where t_m' ∈ {t_{W+1}, ..., t_M}, T < M − W + 1, and A^{(t_m')}_{intra}, A^{(t_m')}_{inter} ∈ R^{nT×nT}. Note that T is different from the W used in Section 5.2.1.1: T is the number of networks included in constructing A^{(t_m')}_{intra} and A^{(t_m')}_{inter}, while W is the signal length used to construct the adjacency matrices.

For each set of T consecutive networks, a supra-adjacency matrix with the intra-adjacency matrices on the diagonal and the inter-adjacency matrices on the off-diagonal is constructed. As this matrix is symmetric with positive entries, the corresponding Laplacian is positive semi-definite. A^{(t_m')}_{Supra} ∈ R^{nT×nT} is

A^{(t_m')}_{Supra} = A^{(t_m')}_{intra} + A^{(t_m')}_{inter}.    (5.2)

A supra-normalized Laplacian matrix can then be constructed following the definition in [151, 155],

L^{(t_m')}_{Supra} = D^{(t_m')−1/2}_{Supra} ( L^{(t_m')}_{intra} + L^{(t_m')}_{inter} ) D^{(t_m')−1/2}_{Supra},    (5.3)

where L^{(t_m')}_{intra} = D^{(t_m')}_{intra} − A^{(t_m')}_{intra} and L^{(t_m')}_{inter} = D^{(t_m')}_{inter} − A^{(t_m')}_{inter}, with D^{(t_m')}_{intra}, D^{(t_m')}_{inter} and D^{(t_m')}_{Supra} defined as the degree matrices of A^{(t_m')}_{intra}, A^{(t_m')}_{inter} and A^{(t_m')}_{Supra}, respectively.

5.2.2 Temporal block spectral clustering approach (TBSCA)

Block spectral clustering [151, 155] simplifies the clustering problem of multi-layer networks to the standard spectral clustering of a single L_Supra that corresponds to all graphs in the network. The advantage of this approach is that it can provide a globally optimal solution for the clustering problem using the eigenvectors of L_Supra. In order to obtain temporally smooth communities, it is important to determine a community structure that simultaneously fits the network structure at time t and does not deviate from the past community structures. To accomplish this objective, a temporal block spectral clustering approach (TBSCA) is introduced. In this approach, a set of supra-adjacency matrices is constructed as described in (5.1) and (5.2) across time to capture the dynamics of the network.
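As a rough sketch of the block construction in (5.1)–(5.3), the following assumes the T intra-adjacency matrices for the current window are available in a list and that a helper (for example, the window_corr sketch above) supplies the inter-adjacency blocks; names are illustrative.

```python
import numpy as np

def supra_matrices(A_intra_list, inter_fn):
    """Build A_Supra and the supra-normalized Laplacian for T consecutive networks.
    A_intra_list: list of T (n, n) intra-adjacency matrices;
    inter_fn(a, b): returns the (n, n) inter-adjacency block between windows a and b."""
    T = len(A_intra_list)
    n = A_intra_list[0].shape[0]
    A_intra = np.zeros((n * T, n * T))
    A_inter = np.zeros((n * T, n * T))
    for a in range(T):
        A_intra[a*n:(a+1)*n, a*n:(a+1)*n] = A_intra_list[a]      # intra blocks on the diagonal
        for b in range(T):
            if a != b:
                A_inter[a*n:(a+1)*n, b*n:(b+1)*n] = inter_fn(a, b)  # inter blocks off the diagonal
    A_supra = A_intra + A_inter                                  # (5.2)
    L_intra = np.diag(A_intra.sum(1)) - A_intra
    L_inter = np.diag(A_inter.sum(1)) - A_inter
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_supra.sum(1) + 1e-12))
    L_supra = D_inv_sqrt @ (L_intra + L_inter) @ D_inv_sqrt      # (5.3)
    return A_supra, L_supra
```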
From each A^{(t_m')}_{Supra}, the corresponding supra-normalized Laplacian, L^{(t_m')}_{Supra}, is constructed for T consecutive networks using (5.3). The clustering problem is then formulated as

min_{Û^{(t_m')} ∈ R^{nT×r_m'}}  trace( Û^{(t_m')†} L^{(t_m')}_{Supra} Û^{(t_m')} )   s.t.  Û^{(t_m')†} Û^{(t_m')} = I.    (5.4)

Û^{(t_m')} can be obtained from the eigendecomposition of L^{(t_m')}_{Supra}. The number of clusters is determined using the eigen-gap criterion [7]. By applying k-means on Û^{(t_m')}, a cluster label vector C ∈ R^{1×nT} is obtained for the nT vertices. Finally, the cluster labels for time point t_m' are obtained as C(1, nT − n + 1 : nT).
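A minimal sketch of this clustering step is given below, assuming scikit-learn is available; the particular eigen-gap rule and the row normalization before k-means are common simplifications and not necessarily the exact choices made in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def tbsca_labels(L_supra, n, r_max=10):
    """Spectral clustering of one supra-Laplacian: pick r with an eigen-gap heuristic,
    run k-means on the leading eigenvectors and return the labels of the most recent
    time window (the last n rows), as in (5.4)."""
    vals, vecs = np.linalg.eigh(L_supra)             # eigenvalues in ascending order
    gaps = np.diff(vals[:r_max + 1])
    r = int(np.argmax(gaps)) + 1                     # one common form of the eigen-gap rule
    U = vecs[:, :r]                                  # smallest-r eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    C = KMeans(n_clusters=r, n_init=10).fit_predict(U)
    return C[-n:]                                    # labels for time point t_m' (last block)
```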
5.3 Results

5.3.1 Simulated networks

A temporal network with 150 nodes is generated for 60 time points. This temporal network is created with 3 clusters within the time intervals 1−20 and 40−60 and 2 clusters within the time interval 21−40. At each time point, A_intra and A_inter are constructed. The intra- and inter-adjacency matrices are generated with the within- and between-cluster edges selected randomly from uniform distributions in the ranges [0, 1] and [0, 0.5], respectively. 30% of the within-cluster and 5% of the between-cluster edges are randomly set to 1. The TBSCA is implemented to detect and track the community structure of the network over time with T = 4 time points. This simulation is repeated 100 times. The number of clusters at each time point is detected using the eigen-gap criterion. The proposed algorithm is able to detect the correct community structure and the change points across time, as shown in Fig. 5.1. The performance of the proposed approach (TBSCA), BNLSC [151] and the generalized Louvain for multi-layer modularity maximization (GenLov) [35] is compared using the variation of information (VI) metric [4]. A VI value of 0 means the partitions are exactly the same, and as the VI value increases, the degree of agreement between the partitions decreases. The results in Fig. 5.1 indicate that the proposed algorithm is very comparable to the GenLov method, which is considered as the state-of-the-art algorithm for detecting the community structure in temporal networks, and is better than BNLSC.

Figure 5.1: Comparison between the detected community structure by TBSCA, BNLSC and GenLov and the ground truth for the synthetic data set in terms of variation of information. The number of clusters changes from 3 to 2 at time point 21, and from 2 to 3 at time point 41.

5.3.2 Community Detection in dynamic functional connectivity networks (dFCNs)

Resting state fMRI (rs-fMRI) is commonly used to investigate spontaneous brain activity. Previous work [55, 59] indicates that there is a set of resting state networks (RSNs) that appears consistently in healthy subjects. These RSNs are the default mode network (DMN), visual network (VN), somatomotor network (SMN), cognitive control network (CCN), bilateral network (BiN) and auditory network (AN). Even though early studies in rs-fMRI assumed stationary functional connectivity, recent studies have revealed evidence for temporal evolution of the brain functional connectivity during resting state [53, 60].

5.3.2.1 Data and preprocessing

The Bangor rs-fMRI dataset (http://www.nitrc.org/projects/fcon1000), which includes 20 healthy subjects (males, aged between 19−38), is used in this chapter. The rs-fMRI data was collected using a 3T magnet scanner from all subjects while their eyes were open. The TR is equal to 2 s, yielding 34 slices and 265 time points. The pre-processing was done using the SPM12 (http://www.fil.ion.ucl.ac.uk/spm/) and CONN (http://www.nitrc.org/projects/conn) toolboxes. The structural images were registered to the mean functional image and spatially normalized to MNI space for each subject. Slice timing and motion correction were performed on every functional image. The functional images were regionally parcellated by the automated anatomical labeling (AAL) atlas [118] into 90 regions of interest (ROIs) and smoothed by spatial convolution with a 4-mm FWHM Gaussian kernel. Confounds such as motion parameters obtained from realignment and BOLD signals obtained from white matter and CSF masks were regressed out, and band-pass (0.008−0.09 Hz) temporal filtering was applied to the functional images of each subject.

5.3.2.2 Dynamic Community structure

The proposed approach is applied to dFCNs constructed from rs-fMRI for each subject following the steps described in Section 5.2.1 with W = 60 (2 minutes), similar to previous studies [56]. The constructed dFCN consists of 206 time points. The TBSCA with T = 2 networks is applied to detect and track the community structure for each subject separately. The detected number of clusters varied between 2 and 7 for the different subjects across time.

The application of the TBSCA to the multi-layer temporal networks for each subject resulted in a community structure for each subject at each time point. For a typical subject, Fig. 5.2 shows the similarity between the detected community structures across consecutive time points measured by VI. From Fig. 5.2, it can be seen that the detected community structure changes smoothly across time and there are certain time points where the structure shows significant changes, indicating the dynamics of resting state connectivity. Fig. 5.3 displays the detected community structure at some of these change points.

Figure 5.2: Agreement between the detected community structures across time for subject 1 measured by variation of information (VI). The dashed lines refer to the detected change points.

Figure 5.3: Detected community structure from selected change points: (a) time point 14, (b) time point 40, (c) time point 104, (d) time point 200. The ROIs in different networks reconfigured their community membership across time. Visualized by the circularGraph toolbox [1].

In an attempt to quantify the similarity of the community structure across time and subjects, a 4100×4100 similarity matrix is constructed to reflect the degree of agreement between the detected community structures across time and subjects in terms of VI, as shown in Fig. 5.4. From Fig. 5.4, it can be seen that the dFCNs exhibit a smooth variation in the community structure across time for a given subject, indicating the dynamic behavior of the FCNs during rest. Furthermore, the community structure of the dFCNs for the different subjects follows different patterns across time, as during resting state the subjects are not necessarily in the same network state at the same time.

Figure 5.4: Agreement between the detected community structures across time for all subjects measured by variation of information (VI).

5.3.2.3 Consistency of the RSNs across time

In order to quantify the dynamics of the RSNs across time and subjects, the consistency of each RSN is calculated. To calculate the consistency of a RSN, the recruitment coefficient defined in [129] is calculated for each ROI within the RSN. The ROI's recruitment coefficient is defined as the probability that the ROI is in the same cluster as the other ROIs of its own subnetwork at each time point. The RSN's consistency is then calculated as the average of the recruitment coefficients of all ROIs that correspond to that RSN.
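One way to compute this quantity, following the verbal definition above (the exact formulation in [129] may differ in detail), is sketched below; it assumes the detected cluster labels are stored as a (time × ROI) array and the helper name is illustrative.

```python
import numpy as np

def rsn_consistency(labels, rsn_rois):
    """Consistency of one RSN from a (M, n) matrix of cluster labels over time.
    The recruitment coefficient of an ROI is the fraction of (time point, other RSN ROI)
    pairs sharing its cluster label; the RSN consistency is the average over its ROIs."""
    sub = labels[:, rsn_rois]                         # (M, |RSN|)
    M, p = sub.shape
    recruit = np.zeros(p)
    for a in range(p):
        same = (sub == sub[:, [a]])                   # True where an ROI shares ROI a's label
        same[:, a] = False                            # exclude the ROI itself
        recruit[a] = same.sum() / (M * (p - 1))
    return recruit.mean()
```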
The consistency is calculated for the different RSNs across time and subjects after applying the TBSCA to the dFCN. The variation in consistency across subjects is shown in Fig. 5.5. From Fig. 5.5, the VN, AN and SMN have a relatively higher consistency than the other RSNs. This reflects a relatively stronger connection between the ROIs within these networks over time. On the other hand, the ROIs of the DMN, SCN, CCN and BiN exhibit more variation in their community membership over time, which indicates the dynamic behavior of these RSNs during rest.

Figure 5.5: Consistency of different RSNs across subjects calculated from the detected community structure by TBSCA.

5.4 Conclusions

In this chapter, a temporal block spectral clustering framework is introduced to detect and track the community structure of multi-layer temporal networks. In the proposed approach, a sliding window correlation approach is used to model the pair-wise temporal relationships between all nodes within and across time windows. A set of intra- and inter-adjacency matrices is constructed and combined to create a set of temporal supra-adjacency matrices. The dynamic community structure is then detected by applying spectral clustering to these supra-adjacency matrices. The simulation results show that the proposed algorithm is able to detect and track the correct community structure over time. The proposed method is also evaluated on dFCNs constructed from rs-fMRI across time and subjects, revealing dynamic connectivity patterns between the RSNs. Moreover, the detected community structures for the different subjects are found to follow different patterns across time, as expected for resting-state data. Finally, a consistency measure is used to quantify the dynamics of the RSNs across time and subjects. Based on this measure, certain RSNs, such as the DMN, SCN, CCN and BiN, have been found to exhibit more dynamic behavior during rest.

Chapter 6

Conclusions and Future Work

6.1 Conclusions

In this thesis, we proposed various methods to detect and track the community structure in dynamic and multi-layer networks. In order to better understand the community structure in these complex networks, we started our work by analyzing dynamic single-layer networks and then generalized it to analyze dynamic multi-layer networks.
A particular focus of this thesis is detecting the community structure in social networks and functional connectivity networks of the brain.

In Chapter 2, we approached the problem of community detection in dynamic networks by introducing a low-rank + sparse estimation based evolutionary spectral clustering approach. The proposed method decomposes the network into low-rank and sparse parts and obtains smooth cluster assignments by minimizing the subspace chordal distance between consecutive time points. Moreover, the proposed method explicitly considers the changes in the number of clusters between consecutive time points and reformulates the objective function depending on these changes. The proposed framework is robust to noise and outliers and can detect the community structure in both binary and weighted temporal networks efficiently. Furthermore, a cost function to track changes in the community structure across time is introduced through the subspace distance measure. Therefore, unlike existing evolutionary clustering methods, the proposed algorithm does not require any prior knowledge or make assumptions about the community structure, and it does not assume any particular model for the network. The proposed approach is evaluated on both simulated temporal networks and social temporal networks. The results show that the proposed algorithm is able to detect and track the correct community structure over time even in noisy networks.

As the method proposed in Chapter 2 is limited to temporal single-layer networks and can take only limited history into account, a computationally efficient tensor based method is proposed in Chapter 3 to detect the community structure in dynamic single-layer and multi-layer networks and to track changes across a longer history. The proposed method relies on a tensor based representation of the dynamic network. A 3-mode tensor is used to represent the dynamic network, while a 4-mode tensor is used to represent the dynamic multi-layer network. In the case of dynamic single-layer networks, the framework is implemented through two approaches: the 3D-windowed tensor approach (3D-WTA) and the running time tensor approach (RTTA). The proposed framework uses a sliding window approach in order to take the past history of the networks into account and optimizes the contribution of each time point within the window. In 3D-WTA, a fixed length sliding window is used to construct the tensors, while a variable length sliding window is used in RTTA. For each window, a common subspace across all time points within the window is identified using Tucker decomposition. Furthermore, a normalized cost function is introduced to track the changes in the community structure over time. The performance of the proposed framework is evaluated using multiple synthetic and real networks, including social networks and FCNs of the brain. The change points detected by the introduced normalized cost function reflect the true changes in these networks. Moreover, the application of the proposed framework to the dFCNs of the brain constructed from EEG data during ERN and from rs-fMRI revealed changes in the community structure across time that are well-aligned with prior work.

As one of the major applications of our work is exploring the dFCNs from rs-fMRI, we applied 4D-WTA to identify and track the brain network community structure across time and subjects.
In particular, this approach is adopted to study the temporal evolution of communities in the resting state networks. In 4D-WTA, the contribution of each subject and time point within the framework is optimized. Moreover, the dFCNs are summarized into a set of FC states, defined through their community structures, that repeat over time and subjects. The detected community structures are evaluated through a consistency measure, and the dynamic behavior of the FCNs is compared to the results from prior work.

In Chapter 4, we introduce an information-theoretic approach to reduce the dynamic network and identify the time points that carry similar topological structure in order to aggregate them into a tensor. A tensor based spectral clustering approach, similar to the one in Chapter 3, is then applied to identify the community structure of the constructed tensors. A particular focus of the proposed approach is detecting the dynamics of the community structure in the FCNs constructed from rs-fMRI data. The objective of the proposed algorithm is to reduce the dynamic FCNs by grouping the time points that are topologically similar into a set of functional connectivity states. A tensor based approach is then performed to detect the community structure of these states. Finally, the detected community structure is summarized as a set of FC states that reoccur across time. The extracted FC states revealed the dynamic behavior of the resting state networks. The detected community structure of the FC states is found to be in line with results reported in previous studies of rs-fMRI.

In Chapter 5, a temporal block spectral clustering framework is proposed to detect and track the community structure of multi-layer temporal networks. The main contribution of the proposed approach is the construction of data-driven intra- and inter-layer adjacency matrices. In particular, the proposed approach considers all possible edges between nodes within and across layers to construct the intra- and inter-layer adjacency matrices, unlike existing methods that either ignore inter-layer edges or consider inter-layer edges only between a node and itself, i.e., multiplex networks. This is performed by using a sliding window correlation approach to model the pair-wise temporal relationships between all nodes within and across time windows. A set of intra- and inter-adjacency matrices is constructed and combined to create a set of temporal supra-adjacency matrices. The community structure is then determined by applying spectral clustering to these supra-adjacency matrices. The proposed algorithm is evaluated on simulated networks and dFCNs from rs-fMRI across time and subjects and compared to results reported in previous studies.

6.2 Future Work

In Chapter 2, the proposed approach considers the minimization of the subspace distance between consecutive time points to obtain smooth cluster assignments. Future work can extend the current approach by considering a longer history and minimizing the distance between the current subspace and the subspaces of multiple previous time points. One main challenge of the proposed approach is the computational complexity, where the dominant cost for each time step corresponds to the computation of the nuclear proximal operator and the eigenvalue decomposition (EVD). Future work will consider reducing this cost by using randomized algorithms for SVD [156] or partial SVD and partial EVD as suggested in [157, 158].
In Chapter 3, the proposed approaches rely on a sliding window technique that is applied at each time point to identify the current community structure. Future work will consider the optimal selection of the window length to determine how many time points from past history should be included. Moreover, we will reduce the computational cost by reducing the sizes of the constructed tensors. This can be accomplished by adding extra constraints to the objective function. The added constraints can guarantee that only the time points or layers that are sparser than others are included in the framework, thus excluding noisy adjacency matrices.

In Chapter 4, future work will focus on different extensions of the proposed approach for temporal multi-layer networks. In particular, the aggregation of the multi-layer network can be performed by considering both intra- and inter-layer edges, since the current work focuses on intra-layer edges only. The aggregation process can be achieved by a two-step approach: first, a sliding window approach can be used to construct supra-adjacency matrices that consider both intra- and inter-layer connections between nodes within the time window; second, the time windows that possess similar topological structure can be aggregated together. This may result in a better representation of the temporal multi-layer network and reflect a more accurate community structure. After the temporal multi-layer network is reduced, a tensor based community detection approach can be applied to detect the underlying community structure.

In Chapter 5, the proposed approach detects the community structure in temporal multi-layer networks through a temporal block spectral clustering framework. Future research will investigate the extension of different definitions of graph cut to multi-layer networks and derive the corresponding spectral clustering algorithms. Moreover, a multi-layer network may also have multiple aspects. For example, each layer may correspond to a different frequency band and vary across time for neuronal networks. Extensions of spectral clustering for these types of multiple aspect multi-layer networks will also be considered.

APPENDIX

Update L(t)^{k+1}

Update L(t) by keeping only the terms with L(t):

L(t)^{k+1} = argmin_{L(t) ∈ R^{n×n}}  ||L(t)||_* + ⟨X_1^k, A(t) − L(t) − S(t)^k⟩ + (γ_1/2) ||A(t) − L(t) − S(t)^k||_F^2 + ⟨X_2^k, W(t)^k − L(t)⟩ + (γ_2/2) ||W(t)^k − L(t)||_F^2,    (1)

which can be simplified to:

L(t)^{k+1} = argmin_{L(t) ∈ R^{n×n}}  ||L(t)||_* + (γ_1/2) ||L(t) − (A(t) − S(t)^k + X_1^k/γ_1)||_F^2 + (γ_2/2) ||L(t) − (W(t)^k + X_2^k/γ_2)||_F^2
           = argmin_{L(t) ∈ R^{n×n}}  ||L(t)||_* + ((γ_1 + γ_2)/2) ||L(t) − (γ_1 Y_1^k + γ_2 Y_2^k)/(γ_1 + γ_2)||_F^2
           = prox_{(1/(γ_1+γ_2)) ||L(t)||_*} ( (γ_1 Y_1^k + γ_2 Y_2^k)/(γ_1 + γ_2) ),    (2)

where Y_1^k = A(t) − S(t)^k + X_1^k/γ_1, Y_2^k = W(t)^k + X_2^k/γ_2 and prox_f(Y) = argmin_{L ∈ R^{n×n}} f(L) + (1/2)||L − Y||_F^2 is the proximity operator of the convex function f [88]. Let Y = (γ_1 Y_1^k + γ_2 Y_2^k)/(γ_1 + γ_2), γ = γ_1 + γ_2 and Y = Q_Y Σ_Y V_Y^† be the SVD of the matrix Y; singular value soft thresholding is then used to update L(t)^{k+1} as:

L(t)^{k+1} = Q_Y Ω_{1/γ}(Σ_Y) V_Y^†,    (3)

where Ω_τ is the element-wise thresholding operator defined as Ω_τ(a) = sgn(a) max(|a| − τ, 0).
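A minimal numerical sketch of the singular value soft-thresholding step in (3) is given below, assuming NumPy; variable names are illustrative.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise thresholding operator: sgn(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def update_L(A, S, W, X1, X2, g1, g2):
    """Low-rank update L^{k+1} via singular value soft thresholding, as in (1)-(3)."""
    Y1 = A - S + X1 / g1
    Y2 = W + X2 / g2
    Y = (g1 * Y1 + g2 * Y2) / (g1 + g2)              # weighted combination of the two targets
    Q, sig, Vt = np.linalg.svd(Y, full_matrices=False)
    return Q @ np.diag(soft_threshold(sig, 1.0 / (g1 + g2))) @ Vt
```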
Update S(t)^{k+1}

Update S(t) by keeping only the terms with S(t):

S(t)^{k+1} = argmin_{S(t) ∈ R^{n×n}}  μ ||S(t)||_1 + ⟨X_1^k, A(t) − S(t) − L(t)^{k+1}⟩ + (γ_1/2) ||A(t) − S(t) − L(t)^{k+1}||_F^2
           = argmin_{S(t) ∈ R^{n×n}}  μ ||S(t)||_1 + (γ_1/2) ||S(t) − (A(t) − L(t)^{k+1} + X_1^k/γ_1)||_F^2.    (4)

Similar to updating L(t)^{k+1}, S(t)^{k+1} can be computed as:

S(t)^{k+1} = prox_{(μ/γ_1) ||S(t)||_1} (A(t) − L(t)^{k+1} + X_1^k/γ_1) = Ω_{μ/γ_1} (A(t) − L(t)^{k+1} + X_1^k/γ_1).    (5)

Update W(t)^{k+1}

Update W(t) by keeping only the terms with W(t):

W(t)^{k+1} = argmin_{W(t) ∈ R^{n×n}} f(W(t)) = argmin_{W(t) ∈ R^{n×n}}  λ_1 Tr( U(t)^{k†} (I − D(t)^{k,−1/2} W(t) D(t)^{k,−1/2}) U(t)^k ) + ⟨X_2^k, W(t) − L(t)^{k+1}⟩ + (γ_2/2) ||W(t) − L(t)^{k+1}||_F^2,
s.t.  W(t) = W(t)^†,  W_{ij}(t) ≥ 0,  W_{ii}(t) = 0,    (6)

where D(t)^k is the degree matrix calculated from the previous iteration. Eq. (6) can then be simplified to:

argmin_{W(t) ∈ R^{n×n}}  −λ_1 Tr( U(t)^{k†} D(t)^{k,−1/2} W(t) D(t)^{k,−1/2} U(t)^k ) + (γ_2/2) ||W(t) − (L(t)^{k+1} − X_2^k/γ_2)||_F^2,
s.t.  W(t) = W(t)^†,  W_{ij}(t) ≥ 0,  W_{ii}(t) = 0.    (7)

The gradient of the function f(W(t)) can be written as:

∇_{W(t)} f(W(t)) = γ_2 (W(t) − L(t)^{k+1} + X_2^k/γ_2) − λ_1 D(t)^{k,−1/2} U(t)^k U(t)^{k†} D(t)^{k,−1/2}.    (8)

A closed form solution for W(t) can be calculated as:

W(t)^{k+1} = L(t)^{k+1} − X_2^k/γ_2 + (λ_1/γ_2) D(t)^{k,−1/2} U(t)^k U(t)^{k†} D(t)^{k,−1/2}.    (9)

In order to guarantee that W(t) satisfies all the constraints, the spectral projected gradient (SPG) method [89] is used to update W(t) by following the steps in Algorithm 1, with the projection operator, P(W(t)), defined as:

P(W(t))_{ij} = W_{ij}(t) if i ≠ j and W_{ij}(t) ≥ 0;   P(W(t))_{ij} = 0 if i = j or W_{ij}(t) < 0.

Algorithm 1 Spectral projected gradient method (SPG)
Input: W(t), L(t), X_2(t), U(t), λ_1, γ_2, ε.
Output: W(t).
1: l ← 0
2: Initialize W(t)^0 using Eq. (9).
3: σ ← 1.
4: while ||W(t)^{l+1} − W(t)^l||_F^2 > ε do
5:    R ← P(W(t)^l − σ ∇f(W(t)^l)) − W(t)^l.
6:    Compute the step length ρ using line search [89].
7:    W(t)^{l+1} ← W(t)^l + ρ R.
8:    s ← vec(W(t)^{l+1} − W(t)^l).
9:    y ← vec(∇f(W(t)^{l+1}) − ∇f(W(t)^l)).
10:   σ ← y^†y / s^†y.
11:   l ← l + 1
12: end while

Update U(t)^{k+1}

Update U(t) by keeping all the terms with U(t) only:

U(t)^{k+1} = argmin_{U(t) ∈ R^{n×r}} f(U(t)) = argmin_{U(t) ∈ R^{n×r}}  λ_1 Tr(U(t)^† Φ(t)^{k+1} U(t)) + λ_2 d_{ch}^2(U(t), U(t−1)),   s.t.  U(t)^† U(t) = I.    (10)

Considering the three different cases for d_{Ch}^2(U(t), U(t−1)) as discussed in Section 2.3.2, the function f(U(t)) in Eq. (10) can be rewritten for the first case as:

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1} U(t)) + λ_2 {r − Trace(U(t) U(t)^† U(t−1) U(t−1)^†)},   s.t.  U(t)^† U(t) = I.    (11)

Using the cyclic property of the trace operator and ignoring the constant term λ_2 r, Eq. (11) becomes

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1} U(t)) − λ_2 Trace(U(t)^† U(t−1) U(t−1)^† U(t)),   s.t.  U(t)^† U(t) = I.    (12)

By rearranging the problem using properties of the trace function we get

min_{U(t) ∈ R^{n×r}}  λ_1 Trace( U(t)^† ( Φ(t)^{k+1} − (λ_2/λ_1) U(t−1) U(t−1)^† ) U(t) ),   s.t.  U(t)^† U(t) = I,    (13)

which can be rewritten as

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1}_{mod} U(t)),   s.t.  U(t)^† U(t) = I,    (14)

where Φ(t)^{k+1}_{mod} is defined as the modified Laplacian matrix at time point t and is calculated as Φ(t)^{k+1}_{mod} = Φ(t)^{k+1} − (λ_2/λ_1) U(t−1) U(t−1)^†.

For the second case, the function f(U(t)) in Eq. (10) can be rewritten as:

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1} U(t)) + λ_2 {r_1 − Trace(U(t) U(t)^† Û(t−1) Û(t−1)^†)},   s.t.  U(t)^† U(t) = I.    (15)

Using the cyclic property of the trace operator and ignoring the constant term, Eq. (15) becomes

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1} U(t)) − λ_2 Trace(U(t)^† Û(t−1) Û(t−1)^† U(t)),   s.t.  U(t)^† U(t) = I,    (16)

which can be rewritten as

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1}_{mod} U(t)),   s.t.  U(t)^† U(t) = I,    (17)

where Φ(t)^{k+1}_{mod} = Φ(t)^{k+1} − (λ_2/λ_1) Û(t−1) Û(t−1)^†. The solutions to the problems in Eq. (14) and Eq. (17) are found by choosing U(t) as the matrix containing the r eigenvectors that correspond to the smallest r eigenvalues of Φ(t)^{k+1}_{mod}, where Φ(t)^{k+1}_{mod} ∈ S_n^+ and S_n^+ is the set of positive semidefinite matrices.

For the third case, the problem in Eq. (10) can be rewritten as:

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1} U(t)) + λ_2 {r − Trace(Û(t) Û(t)^† U(t−1) U(t−1)^†)},   s.t.  U(t)^† U(t) = I.    (18)

By ignoring the constant term, Eq. (18) becomes

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(U(t)^† Φ(t)^{k+1} U(t)) − λ_2 Trace(Û(t) Û(t)^† U(t−1) U(t−1)^†),   s.t.  U(t)^† U(t) = I.    (19)

Considering the first term in Eq. (19) and defining U(t) = [Û(t)  Ū(t)] with Û(t) ∈ R^{n×r_1} and Ū(t) ∈ R^{n×(r−r_1)}, then

Trace(U(t)^† Φ(t)^{k+1} U(t)) = Trace( [ Û(t)^† Φ(t)^{k+1} Û(t)   Û(t)^† Φ(t)^{k+1} Ū(t) ;  Ū(t)^† Φ(t)^{k+1} Û(t)   Ū(t)^† Φ(t)^{k+1} Ū(t) ] ) = Trace(Û(t)^† Φ(t)^{k+1} Û(t)) + Trace(Ū(t)^† Φ(t)^{k+1} Ū(t)).    (20)

The optimization problem in Eq. (19) can then be rewritten as:

min_{Û(t) ∈ R^{n×r_1}, Ū(t) ∈ R^{n×(r−r_1)}}  λ_1 Trace(Û(t)^† Φ(t)^{k+1} Û(t)) + λ_1 Trace(Ū(t)^† Φ(t)^{k+1} Ū(t)) − λ_2 Trace(Û(t) Û(t)^† U(t−1) U(t−1)^†),   s.t.  U(t)^† U(t) = I,    (21)

which is equivalent to

min_{Û(t) ∈ R^{n×r_1}, Ū(t) ∈ R^{n×(r−r_1)}}  λ_1 Trace( Û(t)^† ( Φ(t)^{k+1} − (λ_2/λ_1) U(t−1) U(t−1)^† ) Û(t) ) + λ_1 Trace(Ū(t)^† Φ(t)^{k+1} Ū(t)),   s.t.  U(t)^† U(t) = I.    (22)

A modified Laplacian using the current Laplacian and the eigenvectors found for the previous time point can be defined as Φ(t)^{k+1}_{mod} = Φ(t)^{k+1} − (λ_2/λ_1) U(t−1) U(t−1)^†. The optimization problem in Eq. (22) can then be written as:

min_{U(t) ∈ R^{n×r}}  λ_1 Trace(Û(t)^† Φ(t)^{k+1}_{mod} Û(t)) + λ_1 Trace(Ū(t)^† Φ(t)^{k+1} Ū(t)),   s.t.  U(t)^† U(t) = I.    (23)

The optimization problem in Eq. (23) can be written as two separate optimization problems, which are solved by splitting the single constraint in Eq. (23) into multiple orthogonality constraints as follows:

min_{Û(t) ∈ R^{n×r_1}}  λ_1 Tr(Û(t)^† Φ(t)^{k+1}_{mod} Û(t)),   s.t.  Û(t)^† Û(t) = I,
min_{Ū(t) ∈ R^{n×(r−r_1)}}  λ_1 Tr(Ū(t)^† Φ(t)^{k+1} Ū(t)),   s.t.  Ū(t)^† Ū(t) = I,  Û(t)^† Ū(t) = 0,    (24)

where Φ(t)^{k+1}_{mod} = Φ(t)^{k+1} − (λ_2/λ_1) U(t−1) U(t−1)^† is defined as the modified Laplacian matrix at time point t and Φ(t)^{k+1}_{mod} ∈ S_n^+, where S_n^+ is the set of positive semidefinite matrices.

Similar to the previous cases, the solution of the first part of this problem is found as the r_1 eigenvectors corresponding to the smallest r_1 eigenvalues of the modified Laplacian. After Û(t) is found, it is used to solve the second part subject to the two constraints in Eq. (24). In particular, a complementary set of eigenvectors that corresponds to the graph at time point t is calculated. These vectors should be orthogonal among themselves and simultaneously orthogonal to the eigenvectors from the first part, i.e., the inner product between Û(t) and Ū(t) is zero. The solution can be found using the trace penalty minimization model introduced in [159]. The trace penalty minimization model [159] that represents the second part of Eq. (24) is

min_{Ū(t) ∈ R^{n×(r−r_1)}}  f(Ū(t)) = (1/2) Trace(Ū(t)^† Φ(t)^{k+1} Ū(t)) + (ζ/4) ||Ū(t)^† Ū(t) − I||_F^2,   s.t.  Û(t)^† Ū(t) = 0,    (25)

where Û(t)^† Û(t) = I and ζ is the penalty parameter. The problem in Eq. (25) can be solved using the modified projected gradient method proposed in [159]. Initializing Ū_0(t) such that Û(t)^† Ū_0(t) = 0 yields

Ū_{j+1}(t) = Ū_j(t) − δ_j (I − Û(t) Û(t)^†) ∇f(Ū_j(t)),    (26)

where j refers to the jth iteration and δ_j is the step size. The initialization of Ū_0(t) is done by solving the following optimization problem:

min_{Ū(t) ∈ R^{n×(r−r_1)}}  Trace(Ū(t)^† Φ(t)^{k+1} Ū(t)),   s.t.  Ū(t)^† Ū(t) = I.    (27)

A closed form expression can be used to calculate the gradient of f(Ū_j(t)) at each iteration as follows:

∇f(Ū_j(t)) = Φ(t)^{k+1} Ū_j(t) + ζ Ū_j(t) (Ū_j(t)^† Ū_j(t) − I).    (28)

The step size is computed as

δ_j = Trace(V_j^† H_j) / ||H_j||_F^2,    (29)

where V_j = Ū_{j+1}(t) − Ū_j(t) and H_j = ∇f(Ū_{j+1}(t)) − ∇f(Ū_j(t)). The penalty parameter ζ is chosen such that ζ ∈ [max(0, λ_{r−r_1}), λ_n], where λ_{r−r_1} is the (r − r_1)th smallest generalized eigenvalue of the graph Laplacian Φ(t)^{k+1}. The termination criterion for the iterations is ||∇f(Ū_j(t))||_F ≤ ε_1. At the end of the iterations, Û(t) and Ū(t) are combined to form U(t).
ROIs in the AAL Template

Index     ROI
(1,2)     Precentral gyrus (PreCG)
(3,4)     Superior frontal gyrus, dorsolateral (SFGdor)
(5,6)     Superior frontal gyrus, orbital part (ORBsup)
(7,8)     Middle frontal gyrus (MFG)
(9,10)    Middle frontal gyrus, orbital part (ORBmid)
(11,12)   Inferior frontal gyrus, opercular part (IFGoperc)
(13,14)   Inferior frontal gyrus, triangular part (IFGtriang)
(15,16)   Inferior frontal gyrus, orbital part (ORBinf)
(17,18)   Rolandic operculum (ROL)
(19,20)   Supplementary motor area (SMA)
(21,22)   Olfactory cortex (OLF)
(23,24)   Superior frontal gyrus, medial (SFGmed)
(25,26)   Superior frontal gyrus, medial orbital (ORBsupmed)
(27,28)   Gyrus rectus (REC)
(29,30)   Insula (INS)
(31,32)   Anterior cingulate and paracingulate gyri (ACG)
(33,34)   Median cingulate and paracingulate gyri (DCG)
(35,36)   Posterior cingulate gyrus (PCG)
(37,38)   Hippocampus (HIP)
(39,40)   Parahippocampal gyrus (PHG)
(41,42)   Amygdala (AMYG)
(43,44)   Calcarine fissure and surrounding cortex (CAL)
(45,46)   Cuneus (CUN)
(47,48)   Lingual gyrus (LING)
(49,50)   Superior occipital gyrus (SOG)
(51,52)   Middle occipital gyrus (MOG)
(53,54)   Inferior occipital gyrus (IOG)
(55,56)   Fusiform gyrus (FFG)
(57,58)   Postcentral gyrus (PoCG)
(59,60)   Superior parietal gyrus (SPG)
(61,62)   Inferior parietal, but supramarginal and angular gyri (IPL)
(63,64)   Supramarginal gyrus (SMG)
(65,66)   Angular gyrus (ANG)
(67,68)   Precuneus (PCUN)
(69,70)   Paracentral lobule (PCL)
(71,72)   Caudate (CAU)
(73,74)   Lenticular nucleus, putamen (PUT)
(75,76)   Lenticular nucleus, pallidum (PAL)
(77,78)   Thalamus (THA)
(79,80)   Heschl gyrus (HES)
(81,82)   Superior temporal gyrus (STG)
(83,84)   Temporal pole: superior temporal gyrus (TPOsup)
(85,86)   Middle temporal gyrus (MTG)
(87,88)   Temporal pole: middle temporal gyrus (TPOmid)
(89,90)   Inferior temporal gyrus (ITG)

Table 1: List of the ROIs in the AAL 90 template. The odd and even indices refer to the left and right brain hemisphere ROIs, respectively.

BIBLIOGRAPHY

[1] “Circulargraph toolbox,” https://www.mathworks.com/matlabcentral/fileexchange/48576-circulargraph.

[2] S. Borgatti, A. Mehra, D. Brass, and G. Labianca, “Network analysis in the social sciences,” science, vol. 323, no. 5916, pp. 892–895, 2009.

[3] M. Rubinov and O. Sporns, “Complex network measures of brain connectivity: uses and interpretations,” Neuroimage, vol. 52, no. 3, pp. 1059–1069, 2010.

[4] B. Malin, “Data and collocation surveillance through location access patterns,” in Proc. NAACSOS Conf, 2004.

[5] P. Holme and J. Saramäki, “Temporal networks,” Physics reports, vol. 519, no. 3, pp. 97–125, 2012.

[6] M. Kivelä, A. Arenas, M. Barthelemy, J. Gleeson, Y. Moreno, and M. Porter, “Multilayer networks,” Journal of complex networks, vol. 2, no. 3, pp. 203–271, 2014.

[7] U. Luxburg, “A tutorial on spectral clustering,” Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.

[8] N. Andrew, J. Michael, and Y. Weiss, “On spectral clustering: analysis and an algorithm,” Advances in neural information processing systems, vol. 2, pp. 849–856, 2002.

[9] A. Jain, “Data clustering: 50 years beyond k-means,” Pattern recognition letters, vol. 31, no. 8, pp. 651–666, 2010.

[10] M. Newman, “Modularity and community structure in networks,” Proceedings of the national academy of sciences, vol. 103, no. 23, pp. 8577–8582, 2006.

[11] M. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.

[12] A. Jain, M. Murty, and P.
[12] A. Jain, M. Murty, and P. Flynn, “Data clustering: a review,” ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
[13] P. Tan et al., Introduction to Data Mining. Pearson Education India, 2006.
[14] M. Stoer and F. Wagner, “A simple min-cut algorithm,” Journal of the ACM (JACM), vol. 44, no. 4, pp. 585–591, 1997.
[15] L. Hagen and A. Kahng, “New spectral methods for ratio cut partitioning and clustering,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 11, no. 9, pp. 1074–1085, 1992.
[16] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
[17] D. Wagner and F. Wagner, “Between min cut and graph bisection,” in International Symposium on Mathematical Foundations of Computer Science. Springer, 1993, pp. 744–750.
[18] S. White and P. Smyth, “A spectral clustering approach to finding communities in graphs,” in Proceedings of the 2005 SIAM International Conference on Data Mining. SIAM, 2005, pp. 274–285.
[19] J. Forman, P. Clemons, S. Schreiber, and S. Haggarty, “SpectralNET–an application for spectral graph analysis and visualization,” BMC Bioinformatics, vol. 6, no. 1, p. 260, 2005.
[20] S. Fortunato and D. Hric, “Community detection in networks: A user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[21] K. Steinhaeuser and N. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[22] E. Acar and B. Yener, “Unsupervised multiway data analysis: A literature survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 1, pp. 6–20, 2009.
[23] E. Acar, S. Camtepe, M. Krishnamoorthy, and B. Yener, “Modeling and multiway analysis of chatroom tensors,” ISI, vol. 2005, pp. 256–268, 2005.
[24] F. Estienne, N. Matthijs, D. Massart, P. Ricoux, and D. Leibovici, “Multi-way modelling of high-dimensionality electroencephalographic data,” Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 1, pp. 59–72, 2001.
[25] T. Kolda and B. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[26] L. D. Lathauwer, B. D. Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM Journal on Matrix Analysis and Applications, vol. 21, pp. 1253–1278, January 2000.
[27] L. D. Lathauwer, B. D. Moor, and J. Vandewalle, “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors,” SIAM Journal on Matrix Analysis and Applications, vol. 21, pp. 1324–1342, January 2000.
[28] T. Kolda, Multilinear Operators for Higher-Order Decompositions. United States Department of Energy, 2006.
[29] G. Palla, A. Barabási, and T. Vicsek, “Quantifying social group evolution,” Nature, vol. 446, no. 7136, pp. 664–667, 2007.
[30] T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin, “Detecting communities and their evolutions in dynamic social networks: a Bayesian approach,” Machine Learning, vol. 82, no. 2, pp. 157–189, 2011.
[31] D. Chakrabarti, R. Kumar, and A. Tomkins, “Evolutionary clustering,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. II. ACM, 2006, pp. 554–560.
[32] Y. Chi, X. Song, D. Zhou, K. Hino, and B. Tseng, “Evolutionary spectral clustering by incorporating temporal smoothness,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. II. ACM, 2007, pp. 153–162.
[33] K. Xu, M. Kliger, and A. Hero, “Adaptive evolutionary clustering,” Data Mining and Knowledge Discovery, vol. 28, no. 2, pp. 304–336, 2014.
[34] L. Gauvin, A. Panisson, and C. Cattuto, “Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach,” PLoS ONE, vol. 9, no. 1, 2014.
[35] P. Mucha, T. Richardson, K. Macon, M. Porter, and J. Onnela, “Community structure in time-dependent, multiscale, and multiplex networks,” Science, vol. 328, no. 5980, pp. 876–878, 2010.
[36] S. Chevallier, Q. Barthélemy, and J. Atif, “Metrics for multivariate dictionaries,” arXiv preprint arXiv:1302.4242, 2013.
[37] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge University Press, 1994, vol. 8.
[38] D. Krackhardt, “Cognitive social structures,” Social Networks, vol. 9, no. 2, pp. 109–134, 1987.
[39] J. Padgett and C. Ansell, “Robust action and the rise of the Medici, 1400-1434,” American Journal of Sociology, vol. 98, no. 6, pp. 1259–1319, 1993.
[40] A. Cardillo, J. Gómez-Gardenes, M. Zanin, M. Romance, D. Papo, F. D. Pozo, and S. Boccaletti, “Emergence of network features from multiplexity,” Scientific Reports, vol. 3, 2013.
[41] A. Amelio and C. Pizzuti, “Evolutionary clustering for mining and tracking dynamic multi-layer networks,” Computational Intelligence, vol. 33, no. 2, pp. 181–209, 2017.
[42] F. Babiloni, F. Cincotti, C. Babiloni, F. Carducci, D. Mattia, L. Astolfi, A. Basilisco, P. Rossini, L. Ding, Y. Ni et al., “Estimation of the cortical functional connectivity with the multimodal integration of high-resolution EEG and fMRI data by directed transfer function,” NeuroImage, vol. 24, no. 1, pp. 118–131, 2005.
[43] A. D. Martino, A. Scheres, D. Margulies, A. Kelly, L. Uddin, Z. Shehzad, B. Biswal, J. Walters, F. Castellanos, and M. Milham, “Functional connectivity of human striatum: a resting state fMRI study,” Cerebral Cortex, vol. 18, no. 12, pp. 2735–2747, 2008.
[44] M. V. D. Heuvel and H. Pol, “Exploring the brain network: a review on resting-state fMRI functional connectivity,” European Neuropsychopharmacology, vol. 20, no. 8, pp. 519–534, 2010.
[45] M. Steriade, P. Gloor, R. Llinas, F. D. Silva, and M. Mesulam, “Basic mechanisms of cerebral rhythmic activities,” Electroencephalography and Clinical Neurophysiology, vol. 76, no. 6, pp. 481–508, 1990.
[46] S. Bressler, “Large-scale cortical networks and cognition,” Brain Research Reviews, vol. 20, no. 3, pp. 288–304, 1995.
[47] F. Varela, J. Lachaux, E. Rodriguez, and J. Martinerie, “The brainweb: phase synchronization and large-scale integration,” Nature Reviews Neuroscience, vol. 2, no. 4, p. 229, 2001.
[48] E. Bullmore and O. Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, p. 186, 2009.
[49] M. van den Heuvel, C. Stam, M. Boersma, and H. Pol, “Small-world and scale-free organization of voxel-based resting-state functional connectivity in the human brain,” NeuroImage, vol. 43, no. 3, pp. 528–539, 2008.
[50] C. Stam and J. Reijneveld, “Graph theoretical analysis of complex networks in the brain,” Nonlinear Biomedical Physics, vol. 1, no. 1, p. 3, 2007.
[51] R. Prabhakaran, S. Blumstein, E. Myers, E. Hutchison, and B. Britton, “An event-related fMRI investigation of phonological–lexical competition,” Neuropsychologia, vol. 44, no. 12, pp. 2209–2221, 2006.
[52] P. Boveroux, A. Vanhaudenhuyse, M. Bruno, Q. Noirhomme, S. Lauwick, A. Luxen, C. Degueldre, A. Plenevaux, C. Schnakers, C. Phillips et al., “Breakdown of within- and between-network resting state functional magnetic resonance imaging connectivity during propofol-induced loss of consciousness,” Anesthesiology: The Journal of the American Society of Anesthesiologists, vol. 113, no. 5, pp. 1038–1053, 2010.
[53] C. Chang and G. Glover, “Time–frequency dynamics of resting-state brain connectivity measured with fMRI,” NeuroImage, vol. 50, no. 1, pp. 81–98, 2010.
[54] B. Rogers, V. Morgan, A. Newton, and J. Gore, “Assessing functional connectivity in the human brain by fMRI,” Magnetic Resonance Imaging, vol. 25, no. 10, pp. 1347–1357, 2007.
[55] E. Allen, E. Damaraju, S. Plis, E. Erhardt, T. Eichele, and V. Calhoun, “Tracking whole-brain connectivity dynamics in the resting state,” Cerebral Cortex, p. bhs352, 2012.
[56] N. Leonardi, J. Richiardi, M. Gschwind, S. Simioni, J. Annoni, M. Schluep, P. Vuilleumier, and D. V. D. Ville, “Principal components of functional connectivity: a new approach to study dynamic brain connectivity during rest,” NeuroImage, vol. 83, pp. 937–950, 2013.
[57] B. Biswal, F. Yetkin, V. Haughton, and J. Hyde, “Functional connectivity in the motor cortex of resting human brain using echo-planar MRI,” Magnetic Resonance in Medicine, vol. 34, no. 4, pp. 537–541, 1995.
[58] C. Rosazza and L. Minati, “Resting-state brain networks: literature review and clinical applications,” Neurological Sciences, vol. 32, no. 5, pp. 773–785, 2011.
[59] J. Damoiseaux, S. Rombouts, F. Barkhof, P. Scheltens, C. Stam, S. Smith, and C. Beckmann, “Consistent resting-state networks across healthy subjects,” Proceedings of the National Academy of Sciences, vol. 103, no. 37, pp. 13848–13853, 2006.
[60] R. Hutchison, T. Womelsdorf, E. Allen, P. Bandettini, V. Calhoun, M. Corbetta, S. D. Penna, J. Duyn, G. Glover, J. Gonzalez-Castillo et al., “Dynamic functional connectivity: promise, issues, and interpretations,” NeuroImage, vol. 80, pp. 360–378, 2013.
[61] R. Goebel, F. Esposito, and E. Formisano, “Analysis of functional image analysis contest (FIAC) data with BrainVoyager QX: From single-subject to cortically aligned group general linear model analysis and self-organizing group independent component analysis,” Human Brain Mapping, vol. 27, no. 5, pp. 392–401, 2006.
[62] J. Schrouff, V. Perlbarg, M. Boly, G. Marrelec, P. Boveroux, A. Vanhaudenhuyse, M. Bruno, S. Laureys, C. Phillips, M. Pélégrini-Issac et al., “Brain functional integration decreases during propofol-induced loss of consciousness,” NeuroImage, vol. 57, no. 1, pp. 198–205, 2011.
[63] W. Jamal, S. Das, and K. Maharatna, “Existence of millisecond-order stable states in time-varying phase synchronization measure in EEG signals,” in Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE. IEEE, 2013, pp. 2539–2542.
[64] W. Jamal, S. Das, K. Maharatna, D. Kuyucu, F. Sicca, L. Billeci, F. Apicella, and F. Muratori, “Using brain connectivity measure of EEG synchrostates for discriminating typical and autism spectrum disorder,” in Neural Engineering (NER), 2013 6th International IEEE/EMBS Conference on. IEEE, 2013, pp. 1402–1405.
[65] W. Jamal, S. Das, I. Oprescu, K. Maharatna, F. Apicella, and F. Sicca, “Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates,” Journal of Neural Engineering, vol. 11, no. 4, p. 046019, 2014.
[66] W. Jamal, S. Das, K. Maharatna, F. Apicella, G. Chronaki, F. Sicca, D. Cohen, and F. Muratori, “On the existence of synchrostates in multichannel EEG signals during face-perception tasks,” Biomedical Physics & Engineering Express, vol. 1, no. 1, p. 015002, 2015.
[67] S. Dimitriadis, N. Laskaris, and A. Tzelepi, “On the quantization of time-varying phase synchrony patterns into distinct functional connectivity microstates (FCµstates) in a multi-trial visual ERP paradigm,” Brain Topography, vol. 26, no. 3, pp. 397–409, 2013.
[68] J. Ou, L. Xie, C. Jin, X. Li, D. Zhu, R. Jiang, Y. Chen, J. Zhang, L. Li, and T. Liu, “Characterizing and differentiating brain state dynamics via hidden Markov models,” Brain Topography, vol. 28, no. 5, pp. 666–679, 2015.
[69] S. Ma, V. Calhoun, R. Phlypo, and T. Adalı, “Dynamic changes of spatial functional network connectivity in healthy individuals and schizophrenia patients using independent vector analysis,” NeuroImage, vol. 90, pp. 196–206, 2014.
[70] S. Mehrkanoon, M. Breakspear, and T. Boonstra, “Low-dimensional dynamics of resting-state cortical activity,” Brain Topography, vol. 27, no. 3, pp. 338–352, 2014.
[71] N. Leonardi, W. Shirer, M. Greicius, and D. V. D. Ville, “Disentangling dynamic networks: Separated and joint expressions of functional connectivity patterns in time,” Human Brain Mapping, vol. 35, no. 12, pp. 5984–5995, 2014.
[72] A. Mutlu, E. Bernat, and S. Aviyente, “A signal-processing-based approach to time-varying graph analysis for dynamic brain network identification,” Computational and Mathematical Methods in Medicine, vol. 2012, 2012.
[73] A. Mahyari, D. Zoltowski, E. Bernat, and S. Aviyente, “A tensor decomposition-based approach for detecting dynamic network states from EEG,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 1, pp. 225–237, 2017.
[74] I. Cribben, R. Haraldsdottir, L. Atlas, T. Wager, and M. Lindquist, “Dynamic connectivity regression: determining state-related changes in brain connectivity,” NeuroImage, vol. 61, no. 4, pp. 907–920, 2012.
[75] J. Zhang, X. Li, C. Li, Z. Lian, X. Huang, G. Zhong, D. Zhu, K. Li, C. Jin, X. Hu et al., “Inferring functional interaction and transition patterns via dynamic Bayesian variable partition models,” Human Brain Mapping, vol. 35, no. 7, pp. 3314–3331, 2014.
[76] A. Ozdemir, E. Bernat, and S. Aviyente, “Recursive tensor subspace tracking for dynamic brain network analysis,” IEEE Transactions on Signal and Information Processing over Networks, 2017.
[77] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[78] K. Tu, B. Ribeiro, A. Swami, and D. Towsley, “Temporal clustering in dynamic networks with tensor decomposition,” arXiv preprint arXiv:1605.08074, 2016.
[79] F. Folino and C. Pizzuti, “Multiobjective evolutionary community detection for dynamic networks,” in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. ACM, 2010, pp. 535–536.
[80] C. Eckart and G. Young, “The approximation of one matrix by another of lower rank,” Psychometrika, vol. 1, no. 3, pp. 211–218, 1936.
[81] H. Hotelling, “Analysis of a complex of statistical variables into principal components,” Journal of Educational Psychology, vol. 24, no. 6, p. 417, 1933.
[82] N. Kishore Kumar and J. Schneider, “Literature survey on low rank approximation of matrices,” Linear and Multilinear Algebra, vol. 65, no. 11, pp. 2212–2244, 2017.
[83] E. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011.
[84] K. Ye and L. Lim, “Distance between subspaces of different dimensions,” arXiv e-prints, 2014.
[85] M. Meilă, “Comparing clusterings: an information based distance,” Journal of Multivariate Analysis, vol. 98, no. 5, pp. 873–895, 2007.
[86] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[87] N. Parikh, S. Boyd et al., “Proximal algorithms,” Foundations and Trends® in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
[88] P. Combettes and J. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer, 2011, pp. 185–212.
[89] E. Birgin, J. Martínez, M. Raydan et al., “Spectral projected gradient methods: review and perspectives,” J. Stat. Softw., vol. 60, no. 3, pp. 1–21, 2014.
[90] V. Traag, R. Aldecoa, and J. Delvenne, “Detecting communities using asymptotical surprise,” Physical Review E, vol. 92, no. 2, p. 022816, 2015.
[91] M. Girvan and M. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002.
[92] Y. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. Tseng, “Facetnet: a framework for analyzing communities and their evolutions in dynamic networks,” in Proceedings of the 17th International Conference on World Wide Web. ACM, 2008, pp. 685–694.
[93] N. Eagle and A. Pentland, “Reality mining: sensing complex social systems,” Personal and Ubiquitous Computing, vol. 10, no. 4, pp. 255–268, 2006.
[94] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J. Pinton, M. Quaggiotto, W. V. den Broeck, C. Régis, B. Lina et al., “High-resolution measurements of face-to-face contact patterns in a primary school,” PLoS ONE, vol. 6, no. 8, p. e23176, 2011.
[95] V. Gemmetto, A. Barrat, and C. Cattuto, “Mitigation of infectious disease at school: targeted class closure vs school closure,” BMC Infectious Diseases, vol. 14, no. 1, p. 695, 2014.
[96] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[97] M. Yaesoubi et al., “Mutually temporally independent connectivity patterns: A new framework to study the dynamics of brain connectivity at rest with application to explain group difference based on gender,” NeuroImage, vol. 107, pp. 85–94, 2015.
[98] “Metrics for community analysis: A survey.”
[99] G. Rossetti and R. Cazabet, “Community discovery in dynamic networks: A survey,” ACM Computing Surveys (CSUR), vol. 51, no. 2, p. 35, 2018.
[100] “Global spectral clustering in dynamic networks.”
[101] S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proceedings of the National Academy of Sciences, vol. 104, no. 1, pp. 36–41, 2007.
[102] J. Reichardt and S. Bornholdt, “Statistical mechanics of community detection,” Physical Review E, vol. 74, p. 016110, 2006.
[103] M. Newman, “Community detection in networks: Modularity optimization and maximum likelihood are equivalent,” arXiv preprint arXiv:1606.02319, 2016.
[104] S. Kullback and R. Leibler, “On information and sufficiency,” Annals of Mathematical Statistics, vol. 22, pp. 79–86, 1951.
[105] C. Nicolini et al., “Community detection in weighted brain connectivity networks beyond the resolution limit,” NeuroImage, vol. 146, pp. 28–39, 2017.
[106] S. Papadimitriou, J. Sun, and C. Faloutsos, “Streaming pattern discovery in multiple time-series,” in Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, pp. 697–708.
[107] J. Sun, D. Tao, S. Papadimitriou, P. Yu, and C. Faloutsos, “Incremental tensor analysis: Theory and applications,” ACM Transactions on Knowledge Discovery from Data, vol. 2, no. 3, p. 11, 2008.
[108] M. Lavielle, “Using penalized contrasts for the change-point problem,” Signal Processing, vol. 85, no. 8, pp. 1501–1510, 2005.
[109] “findchangepts,” www.mathworks.com/help/signal/ref/findchangepts.html.
[110] “A generalized Louvain method for community detection implemented in MATLAB.”
[111] C. Reynolds, “Flocks, herds and schools: A distributed behavioral model,” ACM SIGGRAPH Computer Graphics, vol. 21, no. 4, pp. 25–34, 1987.
[112] J. Cavanagh, M. Cohen, and J. Allen, “Prelude to and resolution of an error: EEG phase synchrony reveals cognitive control dynamics during action monitoring,” The Journal of Neuroscience, vol. 29, pp. 98–105, 2009.
[113] J. Kayser and C. Tenke, “Principal components analysis of Laplacian waveforms as a generic method for identifying ERP generator patterns: I. Evaluation with auditory oddball tasks,” Clinical Neurophysiology, vol. 117, no. 2, pp. 348–368, 2006.
[114] S. Aviyente, E. Bernat, W. Evans, and S. Sponheim, “A phase synchrony measure for quantifying dynamic functional integration in the brain,” Human Brain Mapping, vol. 32, no. 1, pp. 80–93, 2011.
[115] “1000 Functional Connectomes Project,” http://www.nitrc.org/projects/fcon_1000
[116] “Statistical Parametric Mapping,” http://www.fil.ion.ucl.ac.uk/spm/
[117] “Functional Connectivity Toolbox,” http://www.nitrc.org/projects/conn
[118] N. Tzourio-Mazoyer, B. Landeau, D. Papathanassiou, F. Crivello, O. Etard, N. Delcroix, B. Mazoyer, and M. Joliot, “Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain,” NeuroImage, vol. 15, no. 1, pp. 273–289, 2002.
[119] N. Leonardi and D. V. D. Ville, “Identifying network correlates of brain states using tensor decompositions of whole-brain dynamic functional connectivity,” in Pattern Recognition in Neuroimaging (PRNI), 2013 International Workshop on. IEEE, 2013, pp. 74–77.
[120] X. Li et al., “Dynamic functional connectomics signatures for characterization and differentiation of PTSD patients,” Human Brain Mapping, vol. 35, no. 4, pp. 1761–1778, 2014.
[121] H. Eavani, T. Satterthwaite, R. Gur, R. Gur, and C. Davatzikos, “Unsupervised learning of functional network dynamics in resting state fMRI,” in International Conference on Information Processing in Medical Imaging. Springer, 2013, pp. 426–437.
[122] M. Raichle, A. MacLeod, A. Snyder, W. Powers, D. Gusnard, and G. Shulman, “A default mode of brain function,” Proceedings of the National Academy of Sciences, vol. 98, no. 2, pp. 676–682, 2001.
[123] M. Greicius, K. Supekar, V. Menon, and R. Dougherty, “Resting-state functional connectivity reflects structural connectivity in the default mode network,” Cerebral Cortex, vol. 19, no. 1, pp. 72–78, 2009.
[124] Y. Zhou, N. Shu, Y. Liu, M. Song, Y. Hao, H. Liu, C. Yu, Z. Liu, and T. Jiang, “Altered resting-state functional connectivity and anatomical connectivity of hippocampus in schizophrenia,” Schizophrenia Research, vol. 100, no. 1, pp. 120–132, 2008.
[125] A. David and L. Pierre, “Hippocampal neuroanatomy,” in The Hippocampus Book. Oxford University Press, 2009.
[126] J. Vincent, A. Snyder, M. Fox, B. Shannon, J. Andrews, M. Raichle, and R. Buckner, “Coherent spontaneous activity identifies a hippocampal-parietal memory network,” Journal of Neurophysiology, vol. 96, no. 6, pp. 3517–3531, 2006.
[127] Z. Yang, R. Craddock, D. Margulies, C. Yan, and M. Milham, “Common intrinsic connectivity states among posteromedial cortex subdivisions: Insights from analysis of temporal dynamics,” NeuroImage, vol. 93, pp. 124–137, 2014.
[128] F. Karahanoğlu and D. V. D. Ville, “Transient brain activity disentangles fMRI resting-state dynamics in terms of spatially and temporally overlapping networks,” Nature Communications, vol. 6, 2015.
[129] “Learning-induced autonomy of sensorimotor systems.”
[130] M. Lowe, B. Mock, and J. Sorenson, “Functional connectivity in single and multislice echo-planar imaging using resting-state fluctuations,” NeuroImage, vol. 7, no. 2, pp. 119–132, 1998.
[131] M. Greicius, B. Krasnow, A. Reiss, and V. Menon, “Functional connectivity in the resting brain: a network analysis of the default mode hypothesis,” Proceedings of the National Academy of Sciences, vol. 100, no. 1, pp. 253–258, 2003.
[132] D. Meunier, R. Lambiotte, and E. Bullmore, “Modular and hierarchically modular organization of brain networks,” Frontiers in Neuroscience, vol. 4, p. 200, 2010.
[133] E. Bullmore and D. Bassett, “Brain graphs: graphical models of the human brain connectome,” Annual Review of Clinical Psychology, vol. 7, pp. 113–140, 2011.
[134] E. Al-sharoa, M. Al-khassaweneh, and S. Aviyente, “A tensor based framework for community detection in dynamic networks,” in Proceedings of ICASSP 2017, to be published.
[135] D. Meunier, S. Achard, A. Morcom, and E. Bullmore, “Age-related changes in modular organization of human brain functional networks,” NeuroImage, vol. 44, no. 3, pp. 715–723, 2009.
[136] M. Rubinov and O. Sporns, “Weight-conserving characterization of complex functional brain networks,” NeuroImage, vol. 56, no. 4, pp. 2068–2079, 2011.
[137] J. Power, A. Cohen, S. Nelson, G. Wig, K. Barnes, J. Church, A. Vogel, T. Laumann, F. Miezin, B. Schlaggar et al., “Functional network organization of the human brain,” Neuron, vol. 72, no. 4, pp. 665–678, 2011.
[138] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. Farmer, “Testing for nonlinearity in time series: the method of surrogate data,” Physica D: Nonlinear Phenomena, vol. 58, no. 1-4, pp. 77–94, 1992.
[139] T. Schreiber and A. Schmitz, “Surrogate time series,” Physica D: Nonlinear Phenomena, vol. 142, no. 3, pp. 346–382, 2000.
[140] M. D. Domenico, V. Nicosia, A. Arenas, and V. Latora, “Structural reducibility of multilayer networks,” Nature Communications, vol. 6, p. 6864, 2015.
[141] X. Di and B. Biswal, “Dynamic brain functional connectivity modulated by resting-state networks,” Brain Structure & Function, vol. 220, no. 1, p. 37, 2015.
[142] A. Schaefer, D. Margulies, G. Lohmann, K. Gorgolewski, J. Smallwood, S. Kiebel, and A. Villringer, “Dynamic network participation of functional connectivity hubs assessed by resting-state fMRI,” Frontiers in Human Neuroscience, vol. 8, 2014.
[143] P. Fransson and G. Marrelec, “The precuneus/posterior cingulate cortex plays a pivotal role in the default mode network: Evidence from a partial correlation network analysis,” NeuroImage, vol. 42, no. 3, pp. 1178–1184, 2008.
[144] L. Uddin, K. Clare, B. Biswal, C. Xavier, and M. Milham, “Functional connectivity of default mode network components: correlation, anticorrelation, and causality,” Human Brain Mapping, vol. 30, no. 2, pp. 625–637, 2009.
[145] M. Newman, Networks: An Introduction. Oxford University Press, 2010.
[146] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov, “Clustering with multi-layer graphs: A spectral perspective,” IEEE Transactions on Signal Processing, vol. 60, no. 11, pp. 5820–5831, 2012.
[147] N. Stanley, S. Shai, D. Taylor, and P. Mucha, “Clustering network layers with the strata multilayer stochastic block model,” IEEE Transactions on Network Science and Engineering, vol. 3, no. 2, pp. 95–105, 2016.
[148] T. Valles-Catala, F. Massucci, R. Guimera, and M. Sales-Pardo, “Multilayer stochastic block models reveal the multilayer structure of complex networks,” Physical Review X, vol. 6, no. 1, p. 011036, 2016.
[149] E. Al-Sharoa, M. Al-khassaweneh, and S. Aviyente, “A tensor based framework for community detection in dynamic networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 2312–2316.
[150] E. Al-sharoa, M. Al-khassaweneh, and S. Aviyente, “Detecting brain dynamics during resting state: a tensor based evolutionary clustering approach,” in Wavelets and Sparsity XVII, vol. 10394. International Society for Optics and Photonics, 2017, p. 103940G.
[151] C. Chen, M. Ng, and S. Zhang, “Block spectral clustering for multiple graphs with inter-relation,” Network Modeling Analysis in Health Informatics and Bioinformatics, vol. 6, no. 1, p. 8, 2017.
[152] R. Kanawati, “Multiplex network mining: a brief survey,” IEEE Intelligent Informatics Bulletin, vol. 16, pp. 24–28, 2015.
[153] M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale, and D. Pedreschi, “Evolving networks: Eras and turning points,” Intelligent Data Analysis, vol. 17, no. 1, pp. 27–48, 2013.
[154] M. D. Domenico, A. Solé, S. Gómez, and A. Arenas, “Random walks on multiplex networks,” arXiv preprint arXiv:1306.0519, 2013.
[155] C. Chen, M. Ng, and S. Zhang, “Block spectral clustering methods for multiple graphs,” Numerical Linear Algebra with Applications, vol. 24, no. 1, 2017.
[156] R. Witten and E. Candès, “Randomized algorithms for low-rank matrix factorizations: sharp performance bounds,” Algorithmica, vol. 72, no. 1, pp. 264–281, 2015.
[157] D. Goldfarb and S. Ma, “Convergence of fixed-point continuation algorithms for matrix rank minimization,” Foundations of Computational Mathematics, vol. 11, no. 2, pp. 183–210, 2011.
[158] Y. Nakatsukasa and N. Higham, “Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD,” SIAM Journal on Scientific Computing, vol. 35, no. 3, pp. A1325–A1349, 2013.
[159] Z. Wen, C. Yang, X. Liu, and Y. Zhang, “Trace-penalty minimization for large-scale eigenspace computation,” Journal of Scientific Computing, vol. 66, no. 3, pp. 1175–1203, 2016.