SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN

By

Marcos Efren Bolaños

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Electrical Engineering

2012

ABSTRACT

SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN

By

Marcos Efren Bolaños

Functional brain networks underlying cognitive control processes have been of central interest in neuroscience. A great deal of empirical and theoretical work now suggests that frontal networks, in particular the medial prefrontal cortex (mPFC) and the lateral prefrontal cortex (lPFC), are involved in cognitive control. The most common way to study functional brain networks has been through measures of connectivity such as coherence, synchrony, and mutual information. However, functional connectivity measures are limited to quantifying pairwise relationships between brain regions and do not describe the overall organization of the brain network. Recently, researchers have adapted tools from graph theory to address this issue. Graph theory models a network as a set of vertices and edges, upon which complex network analysis may be applied. For the functional brain network, the vertices represent individual neural assemblies and the edges are weighted by their pairwise phase synchrony. Most graph theoretic measures, however, are limited to sparsely connected, unweighted graphs, and therefore cannot be applied directly to fully connected weighted graphs. In this thesis, existing graph measures and graph theoretic approaches are modified specifically for the analysis of the functional brain network. First, new weighted clustering coefficient and path length measures are introduced for quantifying the local weighted ‘small-world’ index of the brain.
These measures model the edge weights as probabilities representing the reliability of information flow across the corresponding edges. They differ from conventional measures by considering all possible connections with varying strengths of connectivity, and they do not require arbitrary thresholding of the weighted connectivity matrix, i.e. they can be applied directly to a fully connected weighted graph. Next, concepts from signal processing are adapted to graphs to identify central vertices and anomalies within a network, including new energy and entropy measures for graphs. The proposed graph energy measure outperforms existing definitions of graph energy for local anomaly detection because it is computed from the most relevant spectral content extracted from the graph’s Laplacian matrix. A new definition of entropy rate, based on modeling the adjacency matrix of a graph as a Markov process, is introduced to quantify the local complexity of a weighted graph. Finally, we introduce a hierarchical consensus clustering algorithm that uses the well-known Fiedler vector to reveal the hierarchical structure of the brain network across various modular resolutions. The proposed methods are applied to error-related negativity (ERN) data; the ERN is a response-locked negative deflection of the brain event-related potential observed following errors in performance tasks. Previous research shows that the primary neural generator of the ERN is the anterior cingulate cortex (ACC) and that there is a significant difference in connectivity patterns between the mPFC and lPFC for error and correct responses. The proposed graph theoretic approaches give a succinct representation of the functional networks involved during action-monitoring and cognitive control and provide insight into the reorganization of the neural networks during error processing.
The ‘small-world’ measures reveal increased local functional segregation and integration among electrodes in the mPFC and lPFC during error responses compared to correct responses. In addition, the mPFC region of the brain network demonstrated increased energy and complexity, indicating the presence of an anomalous perturbation located around the FCz electrode. Finally, the hierarchical consensus clustering algorithm revealed an increase in modularity across the mPFC during error responses, indicating a reorganization of the underlying functional network.

ACKNOWLEDGMENT

I want to thank my family back in El Paso, Texas for their moral support while I pursued my Ph.D. so far from home. I missed them so much during my time here and I hope to continue making them proud. My brothers, Andres and Efren, and my parents, Ofelia and Arturo, have kept me grounded and helped me remember what the important things in life really are. I love you all so very much and I owe all my success to you. I especially want to thank Whitney Rantz for being my rock as I progressed through this challenging journey. You are a remarkable woman with such a great heart and I thank God for bringing you into my life. Thank you for all your encouragement and love. Thank you to Mark, Sue, and Kayla Rantz for making me feel like a part of their family these past few years and helping me feel at home. Also, thank you so much for purchasing my Ph.D. regalia. It is one of the most generous gifts anybody has ever given me and I will always be very grateful. Many thanks to Milton Martinez and Juan Nava for being awesome friends. It is really a blessing that the three of us El Paso guys ended up in the same state. You guys will always be brothers to me. I would also like to express my deep-felt gratitude to my advisor, Dr. Selin Aviyente, for her advice and encouragement. She was never ceasing in her belief in me and constantly pushed me to bring out the best in my work.
Also, many thanks to my lab partners Ali Mutlu, Ying Liu, and Suhaily Cardona. I’m glad we were able to share the frustrations and celebrations of research work together. I also wish to thank the other members of my committee, Dr. Rama, Dr. Chakrabartty, and Dr. Esfahanian. Their suggestions, comments, and additional guidance were invaluable to the completion of this work. As a special note, Dr. Percy Pierre was a driving force in my success through his help in obtaining a graduate fellowship to pay for my studies, representing me in front of the qualifier exam committee, and offering a word of advice when I needed it. I also want to extend my thanks to Dr. Barbara O’Kelly for personally calling me and inviting me to visit Michigan State as well as for making my transition into the Ph.D. program a smooth experience.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1
  1.1 Introduction
  1.2 Background
    1.2.1 Physiology of the Human Brain
    1.2.2 The Electroencephalogram (EEG)
    1.2.3 Measures of Functional Connectivity
    1.2.4 Graph Theory
    1.2.5 Graph Topologies
      1.2.5.1 Lattice Graphs
      1.2.5.2 Random Graphs
      1.2.5.3 Star Graphs
      1.2.5.4 Scale-Free Networks
    1.2.6 Benchmark Data
      1.2.6.1 Simulated Data
      1.2.6.2 Real-World Networks
      1.2.6.3 Biological Data
  1.3 Thesis Outline

Chapter 2  Small-World Type Measures for Weighted Graphs
  2.1 Introduction
  2.2 Clustering Coefficient
    2.2.1 Conventional Clustering Coefficient
    2.2.2 Existing Definitions for Weighted Clustering Coefficient
    2.2.3 Newly Proposed Weighted Clustering Coefficient
    2.2.4 A Random Walk Based Clustering Coefficient Measure for Binary Networks
  2.3 Characteristic Path Length
    2.3.1 Conventional Characteristic Path Length
    2.3.2 Existing Definition for Weighted Characteristic Path Length
    2.3.3 Newly Proposed Weighted Characteristic Path Length
    2.3.4 A Random Walk Based Characteristic Path Length Measure for Binary Networks
  2.4 Small-World Measure of a Network
    2.4.1 Conventional Small-World Measure
    2.4.2 Newly Proposed Weighted Small-World
  2.5 Mapping of Weighted Graphs to Binary Graphs
    2.5.1 Existing Approaches for Transforming Weighted Graphs to Binary Graphs
    2.5.2 Weighted to Binary Graph Transformation: Alternative to Matrix Thresholding
  2.6 Results
    2.6.1 Random Walk Based Clustering Coefficient/Path Length Measures for Binary Benchmark Networks
    2.6.2 Weighted Graph Measures on Benchmark Data
    2.6.3 Weighted Graph Measures on ERN EEG Data
      2.6.3.1 Bivariate Phase-Synchrony Measures
      2.6.3.2 Graph Measures
      2.6.3.3 Relationship between Graph and Bivariate Measures
    2.6.4 Optimal Transformation From Weighted to Binary Matrix: ERN/EEG Data
  2.7 Conclusions

Chapter 3  Local Feature Extraction from Weighted Graphs
  3.1 Introduction
  3.2 Existing Feature Measures for Graphs
    3.2.1 Hub Centrality
    3.2.2 Provincial and Connector Hubs
  3.3 Graph Energy
    3.3.1 Existing Measures of Graph Energy
    3.3.2 Laplacian Hückel Graph Energy
    3.3.3 Localized Laplacian Hückel Graph Energy
  3.4 Graph Entropy
    3.4.1 Existing Measures of Graph Entropy
    3.4.2 Proposed Graph Entropy Rate
      3.4.2.1 Matrix Bandwidth Minimization
      3.4.2.2 Scanning Function of the Permuted Matrix
      3.4.2.3 Quantization of the Weighted Connectivity Matrix
    3.4.3 Localized Graph Entropy Rate
  3.5 Results
    3.5.1 Comparison Between Energy Measures
    3.5.2 Laplacian-Hückel Energy of a Biological Network: EEG ERN Data
    3.5.3 Comparison Between Local Graph Entropy Measures
    3.5.4 Entropy Rate Across Binary Graph Topologies
    3.5.5 Entropy with Respect to Permutation and Scanning Function
    3.5.6 Graph Entropy with Respect to Order of the Markov Process
    3.5.7 Entropy Rate of a Biological Network: EEG ERN Data
  3.6 Conclusions

Chapter 4  Hierarchical Multi-Subject Community Detection for Weighted Neuronal Networks
  4.1 Introduction
  4.2 Background on Community Detection on Graphs
    4.2.1 Community Detection on Graphs
      4.2.1.1 Girvan-Newman
      4.2.1.2 Spectral Clustering
    4.2.2 Consensus Clustering
    4.2.3 Quality Measures for Community Structures
  4.3 Hierarchical Modular Detection Algorithms for Weighted Neuronal Networks
    4.3.1 Overlapping Community Structures in Weighted Networks
    4.3.2 Hierarchical Spectral Partitioning Algorithm for Identifying a Common Community Structure among Multiple Subjects
  4.4 New Weighted Graph Clustering Quality Measures
    4.4.1 Comparing Graph Partitions to Multiple Random Networks
    4.4.2 Homogeneity and Completeness
  4.5 Results
    4.5.1 Overlapping Clusters in the Brain Network During the Error-Related Negativity
    4.5.2 Hierarchical Spectral Consensus Clustering Analysis
      4.5.2.1 Comparisons with Averaging and Conventional Voting Consensus Measures
      4.5.2.2 Hierarchical Modular Structure of the Brain During the Error-Related Negativity
      4.5.2.3 Inter and Intra Edge Phase Synchrony
      4.5.2.4 Local Hubs of the Brain Network
  4.6 Conclusions

Chapter 5  Summary and Future Work
  5.1 Summary
  5.2 Future Work

Bibliography

LIST OF TABLES

Table 1.1  Basic graph measures for undirected binary and weighted graphs.
Table 2.1  Comparisons between the proposed and existing weighted clustering coefficients for different variations of a triplet.

Table 2.2  Correlations between multivariate small-world and bivariate (FCz-F5/FCz-F6) measures for both error and correct responses individually and the error-correct differences (all correlations significant; p < 0.001).

Table 3.1  Highest energy contributing nodes in the karate network compared with their corresponding clustering coefficient and betweenness centrality.

Table 3.2  Entropy computed for various graph models using the proposed measure. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.

Table 3.3  Graph entropy computed for various graph models using Burda’s definition. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.

Table 3.4  Entropy rates for three real scale-free networks using the proposed measure and Burda’s measures. All measures are normalized by the entropy rate of random networks with equivalent order.

Table 3.5  Bits/symbol achieved with the Lempel-Ziv algorithm.

Table 3.6  Bits/symbol achieved using the Lempel-Ziv algorithm for four real scale-free networks.

Table 4.1  Top 3 electrodes with the largest differences between Correct and Error ‘participation scores’.

Table 4.2  Number of modules identified through optimization of Q and λ in simulated data.
LIST OF FIGURES

Figure 1.1  Lobes of the cerebral cortex (www.morphonix.com). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Figure 1.2  10/20 EEG electrode arrangement (www.frontalcortex.com).

Figure 1.3  Example of an undirected 6-node graph.

Figure 1.4  Zachary Karate Club network.

Figure 2.1  Nodes A and B have similar path length; however, connectivity in (b) is less robust than in (a). In (b), connectivity between nodes A and C is more robust than connectivity between nodes A and B, despite A and C having a longer path length than A and B.

Figure 2.2  Markov small-world measures averaged over 100 benchmark networks of order N = 50. a) The proposed binary clustering coefficients of the simulated small-world networks are much greater than those of random networks, i.e. γ ≫ 1; b) The proposed binary path lengths of the simulated small-world networks are similar to those of random networks, i.e. λ ≈ 1.

Figure 2.3  Markov and conventional clustering coefficients applied to the binary karate network.

Figure 2.4  Markov and conventional path lengths applied to the binary karate network.

Figure 2.5  Nodal weighted clustering coefficients computed using Onnela’s clustering coefficient and the proposed clustering coefficient for Zachary’s Karate Club network.

Figure 2.6  Nodal weighted path lengths computed using Stam’s path length and the proposed path length for Zachary’s Karate Club network.

Figure 2.7  Connectivity plot based on the RID-TFPS measure for electrode pairs whose phase synchrony is significantly larger for error responses compared to correct responses (Wilcoxon rank sum, 0.0001 ≤ p ≤ 0.001).

Figure 2.8  Topomap of the differences in the nodal weighted clustering coefficient, path length, and small-world parameter for error and correct responses. These differences are computed for both the previously published and proposed measures. Topographic significance plots are provided for each measure.

Figure 2.9  Optimal binary graph generated from weighted connectivity matrices for error responses. Hubs of the network are indicated in color. Hubs are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.10  Optimal binary graphs generated from weighted connectivity matrices for correct responses. Hubs of the network are indicated in color. Hubs are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.11  Comparisons between small-world calculations using the conventional thresholding method and the proposed optimal binary transformation approach.

Figure 3.1  Local energy indices for the n = 34 karate club network: 1-hop energy analysis.

Figure 3.2  Local energy indices for the n = 34 karate club network: 2-hop energy analysis.

Figure 3.3  Local energy indices for the n = 34 karate club network: 3-hop energy analysis.

Figure 3.4  1-hop neighborhood energy differences (Error − Correct).

Figure 3.5  2-hop neighborhood energy differences (Error − Correct).

Figure 3.6  Comparisons between nodal graph entropy measures for identifying anomalous regions in a simulated network with n = 60 nodes. The network is composed of four complete graphs and a random graph acting as a perturbation in the network.

Figure 3.7  Comparisons between nodal graph entropy rate measures for identifying important regions in the karate club network with n = 34 nodes.

Figure 3.8  Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies, each with 128 nodes. a) Random networks; b) Modular networks.

Figure 3.9  Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies, each with 128 nodes. a) Scale-free networks; b) Star networks.

Figure 3.10  Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies, each with 128 nodes (Lattice networks).

Figure 3.11  Effects of order value on graph entropy measures for various graph topologies.

Figure 3.12  Rate/distortion curves, average and standard deviation across all subjects, for each response type: a) error; b) correct.

Figure 3.13  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 2 levels.

Figure 3.14  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 3 levels.

Figure 3.15  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 4 levels.

Figure 3.16  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 5 levels.

Figure 3.17  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 6 levels.

Figure 3.18  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 7 levels.

Figure 3.19  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 8 levels.

Figure 3.20  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 9 levels.

Figure 3.21  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 10 levels.

Figure 4.1  Weighted matrix to adjacency matrix for a given threshold.

Figure 4.2  Participation scores of various clusterings derived from various thresholds of the network.

Figure 4.3  Average participation scores in terms of network degree.
Figure 4.4  Variations in homogeneity and completeness where the true number of modules is 3: a) high homogeneity and low completeness; b) high completeness and low homogeneity; c) high homogeneity and completeness.

Figure 4.5  Modularity with respect to degree and number of clusters for Correct responses.

Figure 4.6  Modularity with respect to degree and number of clusters for Error responses.

Figure 4.7  Six overlapping clusters derived for Correct responses.

Figure 4.8  Six overlapping clusters derived for Error responses.

Figure 4.9  Hierarchical decomposition of the brain network during an Error response.

Figure 4.10  Hierarchical decomposition of the brain network during a Correct response.

Figure 4.11  Significance cluster plots for Error responses.

Figure 4.12  Significance cluster plots for Correct responses.

Figure 4.13  Cross-module average synchrony for Error responses across four clusters.

Figure 4.14  Cross-module average synchrony for Correct responses across two clusters.

Figure 4.15  Normalized differences in hub identification measures between Error and Correct: a) within-module Z-score; b) participation coefficient.

Chapter 1

1.1 Introduction

To this day, the human brain remains one of the most intriguing objects of interest in the scientific community. It is a complex network composed of billions of neurons and is capable of performing parallel information processing millions of times faster and more efficiently than any supercomputer in current existence. In contrast to man-made systems, its performance tends to degrade gracefully under partial damage, and it can learn to reorganize itself to bypass damaged areas.
Furthermore, the brain supports intelligence and self-awareness. The dynamics of communication among the distributed regions responsible for these remarkable characteristics are not yet fully understood by the neuroscience community. Experimental approaches to human cognition have improved with the development of advanced functional neuroimaging techniques; however, interpretation of activation patterns is limited due to a lack of information about the nature of structure-function mappings in the brain. In response to this problem, the human ‘connectome’ project was initiated in July 2009 and is sponsored by sixteen components of the National Institutes of Health (NIH) [1]. The goals of this five-year project are to increase the understanding of how functional brain states emerge from their underlying structural substrate, to provide new mechanistic insights into how brain function is affected if this structural substrate is disrupted, and to make ‘connectome’ data and research easily accessible to the entire scientific community [2]. Ultimately, the project will facilitate advances in imaging and analyzing brain connections, resulting in improved sensitivity, resolution, and utility, thereby accelerating progress in the emerging field of human connectomics [3], the application of neural imaging techniques to increase the resolution of maps of the brain’s neural connections. This dissertation focuses on the ‘connectome’ goal of understanding how function is associated with an underlying structure using advanced neuroimaging techniques. Neuroimaging techniques such as fMRI, electroencephalography, and magnetoencephalography make it possible to study brain activity with high spatial and temporal resolution [4] and to better understand the processes underlying human cognition. However, interpretation of these data is limited by the lack of understanding of the organizational structure of the brain.
The brain is theoretically believed to function according to two primary organizational principles: functional segregation and functional integration [5]. Functional segregation describes the tendency of neural assemblies to merge into specialized groups segregated across the cortical regions of the brain. Functional integration describes the coordinated activation of these segregated groups, which enables cognitive processes and motor responses. The interplay of these principles is thought to contribute to the complexity of the brain network. Currently, three types of networks are recognized as forming the overall brain network: the anatomical connectivity network (physical links between neural assemblies), the functional connectivity network (statistical dependencies between neural assemblies), and the effective connectivity network (causal interactions between neural assemblies). Functional connectivity can be quantified using neuroimaging techniques, which can record data from the macroscale anatomical regions of the brain, combined with statistical measures of dependency that quantify the relationships between distributed regions of the brain. Functional connectivity is defined as the temporal correlations between spatially remote neurophysiological events [6] and is key to understanding how the coordinated and integrated activity of the human brain takes place [7]. Depending on the measure used to quantify it, functional connectivity may reflect linear or nonlinear interactions [8] or interactions observed at various time scales [9]. The literature has demonstrated that local and long-range synchronization of oscillatory neural activity facilitates the exchange of information throughout the brain. The strength of these interactions can quantify pairwise connectivity in the brain, i.e. functional connectivity.
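As a concrete illustration of how a bivariate phase synchrony measure yields a fully connected weighted connectivity matrix, consider the sketch below. It uses the classical phase-locking value (PLV) rather than the time-varying synchrony measure employed in this dissertation; the function names and the toy signals are purely illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (same construction as the Hilbert transform)."""
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0          # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0               # keep the Nyquist bin
    return np.fft.ifft(X * h)

def phase_locking_value(x, y):
    """PLV = |mean(exp(j*(phi_x - phi_y)))|, in [0, 1]:
    0 means no phase synchrony, 1 means a constant phase difference."""
    phi_x = np.angle(analytic_signal(x))
    phi_y = np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

def connectivity_graph(signals):
    """Weighted graph of a multichannel recording: W[i, j] = PLV of channels i, j."""
    n = signals.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = phase_locking_value(signals[i], signals[j])
    return W

# Two phase-locked 10 Hz sinusoids and one noise channel: the sinusoid
# pair receives a weight near 1, the noise pairings receive lower weights.
t = np.linspace(0, 1, 500, endpoint=False)
rng = np.random.default_rng(0)
signals = np.vstack([np.sin(2 * np.pi * 10 * t),
                     np.sin(2 * np.pi * 10 * t + 0.5),
                     rng.standard_normal(t.size)])
W = connectivity_graph(signals)
```

Note that every channel pair receives a nonzero weight, which is exactly why the resulting graph is fully connected and weighted rather than sparse and binary.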
Unfortunately, functional connectivity does not describe a link between function and organization of the brain; it only describes the pairwise associations of neural populations. The challenge is to define methods that reveal the underlying complex network associated with the observed pairwise connectivity patterns. Recently, graph theoretic methods have been applied to address this problem. Graph theory describes the important properties of complex systems by quantifying the topologies of their respective network representations [8]. It has been used to analyze social networks, the internet, power grids, protein-protein networks, and more recently the brain [10, 11, 12, 7]. Using graph theory, He et al. [13] showed that Alzheimer’s disease patients exhibit abnormal topological organization in whole-brain structural networks and a decrease in hubness in the temporal and parietal regions. These changes suggest a shift toward more local processing and a disrupted structural integrity of the larger-scale brain systems [14]. Bassett et al. [15] reported that schizophrenia is associated with abnormal topology in the multimodal brain network, characterized by a reduced organizational hierarchy, an increased distance between network nodes, and a general loss of hubness in the frontal lobe. In an EEG study of epilepsy, Dellen et al. [16] reported that local graph measures describing temporal lobe functional network connectivity are lower, and correspond to a more random network, in patients with longer histories of epilepsy. Finally, Leistedt et al. [17] showed that decreases in small-world connectivity, associated with problems in information processing across the brain’s functional network, may be indicative of depressive disease. Although these studies have demonstrated the usefulness of graph theory for revealing functional networks from neuroimaging data, the implementation of graph theory is limited.
The reason is that these measures were originally employed as modeling tools for random graphs and for graphs whose properties are well understood, e.g. bipartite, tree, planar, and cyclic graphs [18]. However, the properties observed in these well-understood graphs are rarely encountered in real-world networks. This naturally poses the challenge of developing biologically meaningful measures for the topologies observed in biological networks. In the application of graph theoretic measures to the study of functional brain networks, several problems are encountered. First, most graph theoretical tools are only applicable to binary networks in which edges are unweighted, unlike the case of functional connectivity where a weight is assigned to every node pair. Second, graph theory measures are typically applied to sparse networks, whereas functional connectivity matrices are fully connected. Typically, the weighted information is thresholded to obtain a binary network, thus discarding important information contained in the edge weights. There is also a lack of a sufficient set of local measures to describe local connectivity: most measures characterize the network as a whole rather than by its subregions. Finally, to integrate the abstractness of graph theory with the analysis of a real biological network, methods borrowed from signal processing, information theory, and statistical physics need to be incorporated into the design of new graph measures. In this dissertation, the newly proposed graph measures and methods are specifically designed to evaluate functional connectivity matrices obtained from multichannel neurophysiological data. New graph measures, the weighted clustering coefficient and the weighted path length, are introduced to define a weighted small-world measure for the brain network without resorting to thresholding of the weighted data. 
Also, methods from signal processing and information theory are used to develop new local measures of graph energy and graph entropy for weighted graphs, to quantify each node’s contribution to network connectivity and to identify anomalous subregions. Finally, a representation of the hierarchical modular organization common across multiple brain networks is extracted using a bi-partitioning algorithm based on spectral decomposition of these networks. The proposed measures are applied to functional connectivity graphs formed by computing pairwise phase synchrony between neural oscillations. A recently introduced time-varying phase synchrony measure [19] was used to quantify the statistical dependencies between neural populations in data containing the error-related negativity (ERN), gathered during an EEG study [20]. The ERN is a brain potential response that occurs following performance errors in a speeded reaction time task. The study tested the hypothesis that the medial prefrontal cortex (mPFC) interacts with the lateral prefrontal cortex (lPFC) in a dynamic loop during goal-directed performance. This theoretical network uses a monitoring system, in particular the anterior cingulate cortex (ACC), to signal the need for enhanced cognitive control during conditions of conflict or after an error. After an error, the action-monitoring system (mPFC) is proposed to function as an alarm, recruiting the control network (lPFC) to enhance attention. Recently, Cavanagh et al. [21] presented evidence that synchronized neural oscillations may be one mechanism that underlies this functional communication between action-monitoring and cognitive control networks across mPFC and lPFC regions. The proposed graph measures will give a succinct representation of the functional networks involved during action-monitoring and cognitive control and provide insight into the reorganization of the neural networks during error processing. 
1.2 Background In this section, a short review of human brain physiology, neuroimaging techniques, metrics quantifying functional connectivity, and graph theory concepts is presented. This study is particularly interested in evaluating ‘scale-free’ networks. Benchmark scale-free networks, which are used to test the proposed graph measures, are also introduced. Finally, a description of the biological dataset evaluated within this dissertation, an EEG study performed at the University of Minnesota, is presented. 1.2.1 Physiology of the Human Brain Neurons extend throughout the human body and vary in structure depending on their specific purpose. A typical neuron possesses three distinct components: the soma, axon, and dendrites. The soma is the cell body of the neuron, containing the nucleus and the fluid separating it from other cell bodies, while the axon allows the neuron to transmit electrical information to other neurons. The axon extends from the soma into far regions of brain tissue, where it ultimately terminates and splits into smaller branches. Only one axon can ever extend from a soma; however, numerous dendrites may exist. Signals deriving from other neurons are received either by the soma directly or by its dendrites. Dendrites can extend and split into branches many times but become thinner the further out they reach; this is unlike axons, which maintain a fairly constant width [22]. Communication among neurons is due to the action potential, an abrupt depolarization that is the basis for neural signaling in electrically excitable domains [22]. Neurons transmit electrical information via ion channels which are opened and closed according to concentrations of sodium, potassium, and chloride ions. At rest, these three chemicals remain at stable concentrations and cause the neuron to have a resting state potential. Typically, this resting potential is −70 mV, indicating that the cell membrane is polarized such that the inside is more negative than the outside. 
As the chemical concentrations change, depolarization, an increase in positive polarity, or hyperpolarization, an increase in negative polarity, occurs within the cell. When a motor or cognitive response occurs, the neurons follow a three-phase action potential. The first phase begins with a depolarization of the cell, in which the potential increases to 56 mV. This is immediately followed by a hyperpolarization phase, which decreases the potential to −93 mV, past the resting potential. Finally, the cell depolarizes again and the potential returns to −70 mV. The action potential is dependent upon the magnitude of the stimulus received by the neuron. The required magnitude is referred to as the threshold and typically ranges between 0 and 10 mV. When the stimulus causes the resting potential to reach a value within this range, the action potential activates; otherwise, no action potential takes place. There is no intermediate action potential, hence the "all or none" concept of a neuron. After an action potential occurs, there is a refractory period during which the neuron will not produce another action potential even if it receives a stimulus surpassing the threshold. The threshold and refractory period vary with each type of neuron, but the "all or none" concept remains true for all neurons. Neuron function and structure vary throughout the brain, but for the purpose of evaluating functional connectivity, the neural regions of the cerebral cortex are of primary interest. The cerebral cortex is the top layer of the human brain and can be categorized into four primary sections called lobes: frontal, occipital, parietal, and temporal (Figure 1.1). The cortex is also split into two functional halves, or hemispheres, along the midline of the brain, which are connected by the corpus callosum. The functions associated with the frontal lobe are motor functions, reasoning, judgement, impulse control, planning, and memory. 
Figure 1.1: Lobes of the cerebral cortex (www.morphonix.com). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The functions associated with the occipital lobe include vision and color recognition. The parietal lobe processes information, registers sensations of pain, controls speech, supports visual perception, and maintains spatial orientation. Finally, the temporal lobe is associated with emotion, hearing, memory, and speech [22]. The neurons in these regions are densely packed and may be classified as pyramidal or non-pyramidal. The pyramidal cells’ action potentials along the cerebral cortex can be monitored through the electroencephalogram. 1.2.2 The Electroencephalogram (EEG) Numerous neuroimaging methods exist, such as magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and the electroencephalogram (EEG). To examine functional integration in the temporal frame, there is a need to characterize the temporal dynamics of neural networks with millisecond accuracy. Neuroimaging techniques with high temporal resolution, such as the EEG, are the most appropriate tools for examining the dynamic interactions of neural networks. The electroencephalogram is composed of a collection of oscillating electrical signals observed at each channel, i.e. electrode, on the surface of the scalp. The electrodes used for recording an EEG are typically assembled in a 10/20 arrangement (Figure 1.2) such that the physical distance between adjacent electrodes is 10% or 20% of the total front-to-back or left-to-right distance of the skull. Each electrode is assigned a letter representing a lobe and a number for the hemisphere. The letters F, T, C, P, and O represent the frontal, temporal, central, parietal, and occipital regions respectively. Although there is no central lobe, the label exists to help indicate location. 
Electrodes located along the midline of the skull are represented by these letters accompanied by the letter ‘z’. Letters accompanied by even numbers denote electrodes belonging to the right hemisphere, and those with odd numbers the left hemisphere [23]. Electroencephalogram oscillations can be separated into different frequency bands and classified as five types of waves: delta (1 - 4 Hz), theta (4 - 7 Hz), alpha (8 - 12 Hz), beta (12 - 30 Hz), and gamma (25 - 100 Hz). Delta waves are very slow, high-amplitude oscillations observed during sleep. Theta waves are mostly observed in children; in older adults they can be observed only during drowsy or meditative states. They are also associated with moments of enhanced creativity and daydreaming. Alpha waves are associated strongly with the occipital lobe. These waves are highly observable during wakeful relaxation when the eyes are closed, and they disappear when the eyes are open or when the individual is asleep. Beta waves are associated with active and anxious thinking. Current research suggests gamma waves are associated with conscious perception [24]. Figure 1.2: 10/20 EEG electrode arrangement (www.frontalcortex.com). 1.2.3 Measures of Functional Connectivity After selecting an appropriate neuroimaging technique, a metric to quantify functional connectivity needs to be defined. The indices used for quantifying functional connectivity include linear measures, such as correlation and coherence, and nonlinear measures, such as phase synchrony and generalized synchronization measures [25]. Phase synchronization between neuronal oscillations has attracted a great deal of attention in the last decade for understanding functional integration during cognition and attention tasks. It has also increased understanding of the underlying networks involved in psychopathologies. 
Synchrony is a commonly used measure of functional connectivity between neuronal oscillations and quantifies the relationship between the temporal structures of the signals regardless of signal amplitude. The amount of synchrony between two signals is usually quantified by estimating the instantaneous phase of the individual signals around the frequency of interest. The two main current approaches to isolating the instantaneous phase of a signal are the Hilbert transform and the complex wavelet transform. In both methods, the goal is to obtain an expression for the signal in terms of its instantaneous amplitude, a(t), and phase, φ(t), at the frequency of interest as x̃(t, ω) = a(t) exp(j(ωt + φ(t))). This formulation can be repeated for different frequencies, and the relationship between the temporal organization of two signals, x and y, can be observed through their instantaneous phase difference Φxy(t) = |nφx(t) − mφy(t)|, where n and m are integers that indicate the ratios of possible frequency locking. Most studies focus on the case n = m = 1. In neuroscience, the focus is on the case when the phase difference is approximately constant over a limited time window; this is defined as a period of phase locking between two events. In this study, a new time-varying measure of phase synchrony [19], based on the complex time-frequency distribution known as the Rihaczek distribution [26], is used to quantify the nonlinear dependencies between the neural assemblies of the brain. For a signal x(t), the Rihaczek distribution is expressed as

C(t, ω) = (1/√(2π)) x(t) X*(ω) e^(−jωt)   (1.1)

and measures the complex energy of the signal at time t and frequency ω. One of the disadvantages of the Rihaczek distribution is the existence of cross-terms for multicomponent signals. 
In order to get rid of these cross-terms, a reduced interference version of the Rihaczek distribution was introduced by applying a kernel function, such as the Choi-Williams (CW) kernel φ(θ, τ) = exp(−(θτ)²/σ), to filter the cross-terms and obtain

C(t, ω) = ∫∫ exp(−(θτ)²/σ) exp(jθτ/2) A(θ, τ) e^(−j(θt+τω)) dτ dθ   (1.2)

where A(θ, τ) = ∫ x(u + τ/2) x*(u − τ/2) e^(jθu) du is the ambiguity function of the signal and exp(jθτ/2) is the kernel corresponding to the Rihaczek distribution [26]. The phase difference between two signals based on this complex distribution is computed as

Φ12(t, ω) = arg[ C1(t, ω) C2*(t, ω) / (|C1(t, ω)| |C2(t, ω)|) ]   (1.3)

where C1(t, ω) and C2(t, ω) refer to the complex energy distributions of the two signals x1(t) and x2(t), respectively. A synchrony measure quantifying the intertrial variability of the phase differences, the phase locking value (PLV), is defined as

PLV(t, ω) = (1/N) |∑_{k=1}^{N} exp(jΦk(t, ω))|   (1.4)

where N is the number of trials and Φk(t, ω) is the time-varying phase estimate between two electrodes for the kth trial. If the phase difference varies little across the trials, the PLV is close to 1. 1.2.4 Graph Theory Graphs can be used to model relationships in a variety of complex networks, including social and biological networks. In recent years, developments in the quantitative analysis of complex networks, based largely on graph theory, have been applied to studies of functional brain networks [27]. An undirected graph, G = (V, E), is defined by a set of vertices, V = {v1, v2, ..., vn}, of order n and a set of edges, E = {eij | vi, vj ∈ V}, of size m (Figure 1.3). The graph can be represented by an n × n binary adjacency matrix, A, such that

A(i, j) = 1 if an edge exists between vi and vj, and 0 otherwise,   (1.5)

or by a weighted matrix W such that W(i, j) = wij, where wij can be any real number quantifying the relationship between vertices vi and vj. 
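To make the preceding definitions concrete, the following is a minimal pure-Python sketch of the phase-locking value of Eq. (1.4) and of assembling a weighted connectivity matrix W from per-trial phase estimates. This is an illustration only, not the dissertation's implementation: the phases are synthetic stand-ins (not derived from the Rihaczek distribution) and the function names are hypothetical.

```python
# Illustrative sketch (NOT the dissertation's code): the phase-locking
# value of Eq. (1.4) and a weighted connectivity matrix W built from
# per-trial, per-electrode phase estimates. The phases are assumed to
# already be averaged over the time-frequency window of interest.
import cmath

def plv(phase_diffs):
    """PLV = |(1/N) * sum_k exp(j * Phi_k)| over N trials."""
    n = len(phase_diffs)
    return abs(sum(cmath.exp(1j * d) for d in phase_diffs) / n)

def connectivity_matrix(trial_phases):
    """trial_phases[k][i]: phase of electrode i on trial k.

    Returns the symmetric weighted matrix W with W[i][j] equal to the
    PLV of the phase differences between electrodes i and j.
    """
    n_elec = len(trial_phases[0])
    W = [[0.0] * n_elec for _ in range(n_elec)]
    for i in range(n_elec):
        for j in range(i + 1, n_elec):
            diffs = [trial[i] - trial[j] for trial in trial_phases]
            W[i][j] = W[j][i] = plv(diffs)
    return W

# A constant phase difference across trials (perfect locking) yields a
# PLV of 1; phase differences scattered over the circle yield a value
# near 0.
print(plv([0.7] * 100))   # 1.0 (up to floating-point rounding)
```

Each entry of W lies in [0, 1], matching the weighted symmetric graphs considered above.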
Another class of graphs is directed graphs, or di-graphs, which assign a direction to the edges, i.e. wij is not necessarily equal to wji. In this dissertation, the focus is on weighted symmetric graphs where wij ∈ [0, 1]. The most fundamental measures that define the properties of a graph or di-graph are the degrees of the nodes, the cost of the network, and the distance matrix (Table 1.1). Figure 1.3: Example of an undirected 6 node graph. This phase synchrony measure quantifies the dependent relationships between neural regions, which are represented by the EEG electrodes, and can be represented as an undirected weighted graph. The strength of the relationship between an electrode pair, i.e. the edge weight, quantified by the phase locking value, is represented by a single number for a particular time and frequency window, obtained by averaging PLV(t, ω) in that time-frequency window. This is repeated for all electrode pairs, and the resulting phase locking values generate a weighted connectivity matrix W, where element wij ∈ W is the phase locking value between electrodes vi and vj for a particular time-frequency window.

Graph Measure          | Symbol | Equation          | Description
Adjacency matrix       | A      |                   | 1 indicates an edge (0 otherwise).
Connectivity matrix    | W      |                   | Set of all pair-wise strengths.
Edge weight            | Wij    |                   | Edge value.
Binary node degree     | ki     | ki = ∑j Aij       | Number of node neighbors.
Weighted node degree   | ki^w   | ki^w = ∑j Wij     | Total weight of all neighbors.
Degree matrix          | K      | Kii = ki          | Node degrees along diagonal.
Laplacian matrix       | L      | L = K − W         | Degree matrix minus connectivity matrix.
Path                   | Pij    |                   | Set of edges between a node pair.
Shortest path          | SPij   | SPij = min Pij    | Minimum path between a node pair.
Distance matrix        | D      |                   | All pair-wise shortest paths.

Table 1.1: Basic graph measures for undirected binary and weighted graphs. 1.2.5 Graph Topologies In this section, a range of graph topologies often evaluated in graph theory research is presented. This includes lattice, random, star, and scale-free graphs. 1.2.5.1 Lattice Graphs 
Coxeter [28] defines a lattice graph as the graph obtained by taking the n² ordered pairs of the first n positive integers as nodes and drawing an edge between all pairs having exactly one number in common. This type of graph is highly regular and organized, and represents the end of the spectrum of graphs opposite to random graphs. 1.2.5.2 Random Graphs There are various methods for generating a random graph. The most popular random graph model is described by Erdos and Renyi [29]. Given a finite set S and a function P : S → [0, 1] where ∑s∈S P(s) = 1, we call (S, P) a probability space with sample space S and probability function P. Recall that Gn denotes the set of all labeled simple graphs on n nodes. Given a probability space (Gn, P), selecting an element of Gn according to the probability function P is called a random graph. Let p be a real number between 0 and 1, and consider the function P : Gn → [0, 1] such that for each graph G in Gn,

P(G) = p^m (1 − p)^(N−m)   (1.6)

where m is the size of the graph G and N = C(n, 2) = n(n−1)/2 is the number of possible edges. Using the binomial theorem,

(a + b)^r = ∑_{k=0}^{r} C(r, k) a^k b^(r−k)   (1.7)

it can be verified that (Gn, P) is indeed a probability space. The above (Gn, P) model is known as the Erdos-Renyi model of random graphs. In the Erdos-Renyi model, when p = 1/2, the probability of selecting a graph G ∈ Gn at random is 1/2^N. This is interpreted as selecting a member of Gn where all members have the same probability of being chosen [30]. Watts and Strogatz [31] presented another method of generating a random network by rewiring the edges of a lattice graph; each edge is rewired with probability p, where 0 ≤ p ≤ 1. 1.2.5.3 Star Graphs Star graphs are connected graphs composed of a set of nodes such that the degree of node v1 is d(1) = n − 1 and the degree of every other node is d(i) = 1 for i ≠ 1. 1.2.5.4 Scale-Free Networks A scale-free network is a type of non-random network whose degree distribution fits a power law model, p(k) ∝ k^(−α), with typically 2 ≤ α ≤ 3. 
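As an illustration of the G(n, p) model of Eq. (1.6), the sketch below samples a random graph by including each of the N = n(n−1)/2 possible edges independently with probability p. This is a hedged sketch, not code from the dissertation; the function name is hypothetical.

```python
# Illustrative sampler for the Erdos-Renyi G(n, p) model of Eq. (1.6):
# each of the N = n(n-1)/2 possible edges appears independently with
# probability p, so a particular labeled graph with m edges is drawn
# with probability p^m (1 - p)^(N - m).
import random

def erdos_renyi(n, p, seed=None):
    """Return the n x n adjacency matrix of a graph drawn from G(n, p)."""
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

A = erdos_renyi(100, 0.5, seed=1)
m = sum(A[i][j] for i in range(100) for j in range(i + 1, 100))
# For p = 1/2 every labeled graph on n nodes is equally likely (with
# probability 1/2^N), and the number of edges m concentrates near N/2.
```

For n = 100 and p = 1/2, N = 4950, so the edge count m concentrates near 2475.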
Many real-world networks, such as social networks, power grids, and neural networks, follow a scale-free topology. Watts and Strogatz [31] investigated networks generated by rewiring the edges of a regular network, each with probability p. A regular network of N nodes and degree k is a network in which all nodes are connected to the same number of edges; a 5-regular network, therefore, has five edges connected to each of its node members. Watts and Strogatz [31] explored the unknown region between a regular network (p = 0) and a completely disorganized random network (p = 1). A ‘small world’ network was generated by using regular networks whose edges were rewired with 0 < p < 1. ‘Small world’ networks demonstrated features that distinguished them from other networks in the sense that they possessed meaningful structural topologies. ‘Small world’ networks possess degree distributions similar to a power law model and are therefore considered scale-free. This dissertation will use the terms ‘small world’ and scale-free interchangeably. 1.2.6 Benchmark Data In this section, the various networks used to test the proposed measures in this dissertation are presented. This includes simulated networks and real-world networks. 1.2.6.1 Simulated Data Scale-free graphs were generated using the LFR benchmark software [32], which generates networks of a known underlying structure, such that the membership of segregated clusters is known. The LFR benchmark is capable of generating graphs of any size with increasing complexity. Complexity is defined here as the total number of edges between clusters, kout, with respect to the total number of edges, m. 1.2.6.2 Real-World Networks Four different real-world networks were used to evaluate some of the measures presented in this dissertation: • A karate club social network [33]: Zachary’s karate club, Figure 1.4, is a social network of friendships among 34 members of a university karate club in 1970. 
The network represents the split within the group following disputes among the members. The relationships between members are presented as an unweighted matrix, such that an edge exists only if two members were friends in the class, and as a weighted matrix, where entries represent the number of situations in and outside the club in which interactions occurred. The senior members of the karate group are numbered 1, 33, and 34. This is a useful network since the functional role of each of the nodes is well known, and it is revealed to have a scale-free structure such that the degree distribution follows a power law with α ≈ 2. Figure 1.4: Zachary Karate Club Network. • Dolphin network [34]: This network is derived from a social network analysis of statistically significant frequent associations between 62 bottlenose dolphins in New Zealand. • Les Miserables [35]: This fictional social network describes the co-appearance of 77 characters in the novel Les Miserables. • The North American airport network [36]: This network is a summary of the frequent flight/connection patterns associated with 456 U.S. airports. 1.2.6.3 Biological Data The proposed phase synchrony measure, in conjunction with graph measures, is applied to a set of EEG data containing the error-related negativity (ERN). The ERN is an event-related potential that occurs following performance errors in a speeded reaction time task. Previously reported EEG data [20] from 62 channels (10/20 system) were utilized. This included 90 undergraduate students (34 male) from the University of Minnesota (two of the original 92 participants were dropped due to artifacts rendering computation of the TFPS values problematic). Full methodological details of the recording are available in the previous report. The task was a common speeded-response letter (H/S) flanker task, and error and correct response-locked trials from each subject were utilized. 
A random subset of correct trials was selected to equate the number of error and correct trials for each participant. Before computing the phase synchrony measures, all EEG epochs were converted to current source density (CSD) using published methods [37, 38]. This was done to accentuate local activity (e.g. to better index small-world properties) and to attenuate distal activity (e.g. volume conduction). Aviyente [19] reported increased phase synchrony associated with the ERN for the theta frequency band (4-7 Hz) and the ERN time window (25-75 ms) for error responses compared to correct responses. It is expected that the proposed graph measures in this dissertation will corroborate Aviyente’s findings. Functional networks underlying control processes have been an important recent focus of study and provide a reasonable test of the current measure of functional connectivity. In particular, a great deal of empirical and theoretical work suggests that frontal networks involving the medial prefrontal cortex (mPFC) and lateral prefrontal cortex (lPFC) are centrally involved in cognitive control processes [39, 40, 41, 42, 43, 21]. Time-frequency (TF) analysis techniques have been shown to be particularly well suited to isolating these control-related processes in the theta band (3-7 Hz), because they can disentangle relevant overlapping event-related activity occurring at different frequencies [44, 45, 20, 46, 47]. Recently, Cavanagh and colleagues [21] presented evidence that synchronized neural oscillations in the theta band may be one mechanism that underlies functional communication between networks involving mPFC and lPFC regions during the ERN. They quantified the dynamics of the ERN by computing bivariate theta-band wavelet-based phase synchrony between midline-frontal and lateral-frontal sites. 
Specifically, they found increased synchronization in the theta band between F5-FCz and F6-FCz for errors, and this difference significantly predicted post-error slowing. Together, these findings support the view that TFPS is sensitive to changes in functional connectivity in frontal networks underlying cognitive control, and suggest the hypothesis that this increased connectivity would correspond to increased small-world network behavior. 1.3 Thesis Outline This thesis is organized as follows. First, in Chapter 2, graph measures classifying a network as ‘small-world’ are reviewed for unweighted and weighted connection matrices. An important limitation of current ‘small-world’ measures is that they are mostly restricted to binary undirected graphs, which require the transformation of the pairwise connectivity indices, e.g. bivariate phase synchrony values, to 0 or 1 by means of a thresholding operation. This thresholding operation discards the information about connectivity strengths between pairs of electrodes and raises the issue of choosing an optimal threshold. In many studies using ‘small-world’ measures, a range of thresholds corresponding to graphs with different degrees is chosen to avoid the arbitrary choice of a single threshold. Although this approach does not suffer from the choice of the threshold, it increases the computational complexity of the analysis and results in uncertainty about which threshold best represents the underlying network. A weighted network, on the other hand, provides further relevant information by indicating the varying strengths of the pairwise connections and may be especially useful in filtering the influence of weak and potentially non-significant links. Recently, generalizations of small-world measures, including the weighted clustering coefficient and weighted path length, have been introduced. 
However, these measures do not incorporate multiple and longer paths between nodes, which may significantly contribute to integration in larger and sparser networks. Therefore, Chapter 2 presents new weighted clustering coefficient and weighted path length measures that take all possible connections between nodes into account to compute the small-worldness of the network. Also, for situations when thresholding is absolutely required, a process to construct an optimal binary mapping of the biological network is presented. In Chapter 3, concepts from signal processing and information theory are employed to introduce new local measures, or features, to describe graph anomalies and hubs. A new measure of localized graph energy quantifies a brain region’s local contribution to the functional global organizational structure during error processing. This method is based on the idea of h-hop neighborhoods of a node, which can describe overall graph connectivity, similar to time scaling and shifting of a signal. The proposed measure of energy, Laplacian Huckel Energy, is based on the spectral features of the graph, which provide greater insight into the structural characteristics of the network than other published measures of graph energy. Furthermore, Chapter 3 presents how the global and local complexities of unweighted and weighted graphs are quantified using a new measure of graph entropy rate. The adjacency and connectivity matrices are modeled as a Markov process, and entropy is computed according to the state transitions in these matrices. The entropy rate of various graph topologies is investigated to determine their complexity. The entropy rate measure is extended to evaluate the local properties of graphs to identify anomalies and hubs of the network. Ultimately, the goal of the proposed entropy rate measure is to design a generalized graph compression algorithm. 
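To illustrate the flavor of the entropy rate developed in Chapter 3 (the precise definition there may differ), the sketch below row-normalizes a weighted connectivity matrix into Markov transition probabilities and computes the entropy rate of the resulting random walk. The stationary distribution used is the standard one for a random walk on an undirected weighted graph; the function name is illustrative.

```python
# Hedged illustration of an entropy rate for a weighted graph: the
# connectivity matrix W is modeled as a Markov process by
# row-normalizing it into transition probabilities
# P(i, j) = W(i, j) / k_i^w. For a random walk on an undirected
# weighted graph, the stationary distribution is
# pi_i = k_i^w / sum_l k_l^w, giving the entropy rate
# H = -sum_i pi_i sum_j P(i, j) log2 P(i, j).
# This is a sketch of the idea, not the dissertation's definition.
import math

def entropy_rate(W):
    n = len(W)
    strengths = [sum(row) for row in W]          # weighted degrees k_i^w
    total = sum(strengths)
    H = 0.0
    for i in range(n):
        pi_i = strengths[i] / total              # stationary probability
        for j in range(n):
            if W[i][j] > 0:
                p_ij = W[i][j] / strengths[i]    # transition probability
                H -= pi_i * p_ij * math.log2(p_ij)
    return H

# A complete unweighted graph on 4 nodes: every walk step chooses
# uniformly among 3 neighbors, so H = log2(3) ≈ 1.585 bits.
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(entropy_rate(K4))   # ≈ 1.585
```

Highly regular topologies (complete graphs, lattices) yield high, uniform transition uncertainty, while hubs and anomalies perturb the local contributions to H, which is what motivates a localized version of the measure.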
In Chapter 4, multivariate analysis is performed on connectivity matrices by implementing new hierarchical spectral clustering techniques over a group of networks. Clustering algorithms are typically applied to sparse and unweighted networks, whereas the proposed methods are designed specifically for fully connected and weighted networks. CONGA, a hierarchical clustering method, is implemented in a new clustering process to identify an optimal community structure in which nodes are allowed to participate in multiple clusters. Finally, most clustering algorithms compute an average connectivity matrix when evaluating multiple networks, thus disregarding variance across subjects. A spectral clustering algorithm, implementing the Fiedler vector, identifies a community structure common to a set of multiple connectivity matrices without first averaging the matrices. These techniques are applied to EEG data to identify which brain regions form highly synchronized segregated groups during error processing. In Chapter 5, the contributions of this dissertation are summarized, along with the final evaluation of the ERN dataset. Finally, future work pertaining to clustering, graph compression, and dynamic brain networks is discussed. Chapter 2 Small-World Type Measures for Weighted Graphs 2.1 Introduction Many complex networks, such as social and biological networks, demonstrate a phenomenon referred to as ‘small world’. The concept of ‘small world’ originally stems from sociology and refers to a network composed of a sparsely connected set of nodes which form dense clusters (producing a high clustering coefficient) linked by short path lengths [31]. A ‘small world’ is thought to lie between two extreme configurations, the regular lattice and the random graph, which are defined in Chapter 1. The advantage of characterizing a complex network as ‘small world’ is that it offers a quantitative means to evaluate varying levels of non-randomness in the network. 
The frequent occurrence of dense clusters introduces redundancy in the network, thus increasing the stability of the system and its robustness in case of node breakdown [48]. Short path lengths permit rapid communication and robustness to random network attacks [49]. In the brain network, a high clustering coefficient promotes functional overlap of densely connected neuronal elements. These elements act as the building blocks of the cortical architecture. Short path length promotes effective interactions between neuronal elements across cortical regions, which contributes to functional integration. Categorizing a network as a ‘small world’ presents the opportunity to correlate a network’s structure to its dynamic properties. In a study on scientific collaborations, Newman et al. reported that underlying ‘small world’ structures have an impact on the dynamics of complex networks [50] by demonstrating that a ‘small world’ structure influenced the speed of information and idea dissemination in the academic community. Kogut et al. [51] demonstrated that German corporate firms with higher clustering coefficients and lower path lengths within their ownership structure are more likely to be involved in takeovers and restructuring. Uzzi et al. [52] evaluated the ‘small world’ of Broadway musicals from 1945 to 1989 and found that varying ‘small world’ properties affected the creativity of Broadway musicals. The literature thus shows a clear correlation between ‘small world’ structure and network behavior. Recently, this functional dependence on network structure has been exploited for understanding the topology of structural and functional brain networks. The literature suggests that functional connectivity of the human brain network can be explained and modeled according to the fundamental organizational principles of a ‘small world’. Sporns et al. [27] discovered robust ‘small world’ properties and non-random patterns in functional connection matrices of invertebrate and mammalian cerebral cortices, including the macaque cortex and cat cortex. Further studies demonstrate that large-scale cortical networks exhibit a specific composition of structural and functional modules [53], small-world attributes, i.e. high clustering and short path lengths [54, 55], short wiring length [56], and high complexity [57]. Studies of patterns of functional connectivity in the brain, e.g. coherence or correlation, among cortical regions have demonstrated that functional brain networks also exhibit ‘small-world’ [58, 59] and scale-free properties [60], possibly reflecting an underlying structural organization of neural connections. In general, it has been hypothesized that the large-scale organization of neural connections in the brain may impose important constraints on the functional dynamics underlying cognition [61, 62]. ‘Small world’ analysis has also been effective in distinguishing between healthy and pathological functional brain networks. In a magnetoencephalographic (MEG) study [7], it was shown that patients with Alzheimer’s disease demonstrated significant decreases in ‘small world’ properties compared to healthy brains, indicating a disruption in effective interactions between and across cortical regions. A similar conclusion was reached by Supekar et al. [63], who evaluated patients through task-free functional magnetic resonance imaging (fMRI). Similar disruptions in functional connectivity have been revealed in an EEG study on schizophrenia [64] and in an MRI study on heroin-dependent individuals [65]. These studies indicate that an analysis of ‘small world’ properties describes the effective integration of multiple segregated sources of information in the brain, and how decreases in the ‘small world’ structure are indicative of cortical dysfunction due to disease. 
Quantifying the ‘small-world’ measure of brain networks is limited because the ‘small-world’ measures, clustering coefficient and path length, are originally defined for unweighted networks. However, most functional brain networks are defined through weighted networks where the edge weights correspond to the temporal dependence between different brain regions. In this chapter, we introduce new methods for representing the ‘small-world’ behavior of weighted brain networks. There has been limited investigation into developing equivalent clustering coefficient and path length measures for weighted networks [66], and even fewer studies with applications to real-world weighted networks such as the brain network. Therefore, we introduce new measures of weighted clustering coefficient and weighted path length and demonstrate their applicability to quantifying a weighted small-world measure for the human brain network and other real-world networks. The literature typically overcomes problems encountered with weighted networks through thresholding. This methodology is not optimal because thresholding can eliminate important connectivity information. Also, the resulting network is biased because of the arbitrary selection of a threshold. There does not exist a systematic way of choosing an ‘optimal’ threshold, and therefore studies have usually evaluated the unweighted ‘small-world’ measure of a network across multiple thresholds. This leads to high computational times, particularly where data from multiple subjects are available. In large part, these problems with thresholding would be avoided using the weighted graph measures we propose in this chapter, but we also offer an alternative to thresholding through the introduction of a systematic method for transforming the weighted network to a binary network whose ‘small world’ characteristics are maximized.
Both the weighted graph measures and the weighted-to-binary network transformation algorithm are applied to well-known networks and to a biological network obtained from ERN EEG data with the intent of demonstrating that cognitive control is associated with increased functional connectivity in the brain and therefore promotes an increase in ‘small world’ structure.

2.2 Clustering Coefficient

The clustering coefficient of an unweighted network, a metric necessary for computing the ‘small-world’ measure of a network, is the fraction of triangles around a node and is equivalent to the fraction of the node’s neighbors that are neighbors of each other. In this section, the conventional measure of clustering coefficient is described along with a discussion of its fundamental weaknesses. This background is followed by proposed measures for addressing these weaknesses.

2.2.1 Conventional Clustering Coefficient

The clustering coefficient of a node v_i quantifies its ability to form a complete graph among its neighbors, where all nodes are connected to one another [31]. For a binary graph, the clustering coefficient is equivalent to the fraction of the node’s neighbors that are also neighbors of each other and is defined as

C(i) = \frac{t_i}{k_i (k_i - 1)/2}    (2.1)

where t_i is the number of closed triplets centered at v_i and k_i is the degree of v_i. A closed triplet centered at v_i is defined as a set of three nodes, including v_i, which are linked by three edges, thus forming a triangle. An open triplet, on the other hand, is composed of three nodes but with only two edges between the nodes. The conventional clustering coefficient is in the range [0, 1], where C(i) = 0 corresponds to a node v_i with degree k_i = 0, k_i = 1, or one where none of the neighbors share a common edge. C(i) = 1 if all of v_i’s neighbors are connected to each other. Clustering coefficient is important for determining the extent to which nodes group into communities.
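As an illustration, Eq. (2.1) can be evaluated for every node at once from powers of the adjacency matrix, since the diagonal of A^3 counts twice the number of triangles through each node. The following is a minimal numpy-based sketch (the function name and implementation are ours, not part of the dissertation):

```python
import numpy as np

def binary_clustering(A):
    """Conventional clustering coefficient, Eq. (2.1): closed triplets
    centered at each node over the number of possible neighbor pairs."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                 # node degrees k_i
    t = np.diag(A @ A @ A) / 2.0      # triangles through each node
    denom = k * (k - 1) / 2.0         # possible neighbor pairs
    return np.divide(t, denom, out=np.zeros_like(t), where=denom > 0)
```

For a triangle graph every node returns 1, while for a three-node chain every node returns 0, matching the definition above.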
This clustering information is summarized for a particular network simply by computing the average clustering coefficient as C = \frac{1}{N} \sum_{i=1}^{N} C(i), where N is the number of nodes in the network. A fundamental weakness of the conventional clustering coefficient is that it is originally defined for binary networks, i.e., it is independent of edge weights. It is well known that network characterization is not limited to topology but extends to the dynamic flow of information described by the edge weights. For example, regions with homogeneous activity within the network could be important for understanding the functionality of the network. The amount of information flow characterizing the connections is fundamental for a complete description and characterization of networks [67]. In this chapter, a new method is proposed for computing the clustering coefficient in weighted networks by treating edge weights as probabilities. Clustering coefficient is defined as the probability of a node participating in a closed triplet with respect to all open and closed triplets for which it is a member. Furthermore, we argue that the conventional measure of clustering coefficient does not completely characterize the importance of a node in binary networks. This is because it does not fully incorporate the degree of a node, which is an indicator of network centrality. We end this section on clustering coefficient by also proposing a similar clustering coefficient for binary networks that improves the identification of central nodes in the network.

2.2.2 Existing Definitions for Weighted Clustering Coefficient

The literature presents various attempts at developing or implementing a clustering coefficient measure designed for weighted networks [68, 69, 66, 70]. Weighted networks contain information pertaining to the dynamic flow of information between node pairs, and therefore a clustering coefficient should reflect the information flow through closed triplets. Barrat et al.
[71] were the first to propose such a weighted measure as

C_1^w(i) = \frac{1}{k_i^w (k_i - 1)} \sum_{j,h \neq i} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{jh} a_{ih}    (2.2)

where a_{ij} = 1 if there is an edge between nodes v_i and v_j and 0 otherwise, and k_i^w = \sum_{j=1}^{N} w_{ij} is the strength of node v_i. This measure quantifies the local cohesiveness within a closed triplet centered at v_i. However, this measure considers only the average weight of the edges connected to v_i within a triplet and neglects to consider all edges that form the triplet, i.e., it does not take into account the weight w_{jh}. Ignoring the weight w_{jh} will result in clustering coefficients that do not capture the total connectivity within the closed triplet. Another closely related and widely used weighted clustering measure, defined by Onnela et al. [72] and previously implemented in applications to functional brain network analysis [7, 8, 73], addresses this issue and takes into account all of the edges that form the triplet as

C_2^w(i) = \frac{1}{k_i (k_i - 1)} \sum_{j,h \neq i} (w_{ij} w_{ih} w_{jh})^{1/3}    (2.3)

This measure normalizes the geometric mean of the edges that form the triplet by the total possible number of connections for node v_i. However, Onnela’s measure requires the availability of information pertaining to the underlying binary network, i.e., k_i. Therefore, if the network is fully connected and weighted, thresholding will be required to obtain a binary adjacency matrix. If the network is not thresholded, every node is assigned an equal degree value. This results in a constant integer in the denominator and would cause all nodes, regardless of participation in the network, to be normalized equally, thus obscuring the uniqueness of each node’s connection strength. In the event there exists a sparse binary representation accompanying the weights, the degree will have a strong influence on the clustering coefficient, thus obscuring the information provided by the weights.
The binary degree, k_i, clearly introduces biases to the clustering coefficient computation that should be strongly avoided. Zhang et al. proposed a weighted clustering coefficient definition to address this issue [74] by computing the clustering coefficient only through the use of the weighted edges, inspired by the definition in [75]:

C_3^w(i) = \frac{\sum_{j \neq i} \sum_{h \neq i, h \neq j} w_{ij} w_{jh} w_{hi}}{\sum_{j \neq h} w_{ij} w_{ih}}

The numerator computes the total triplet intensity for a particular reference node and the denominator normalizes the clustering coefficient to values between 0 and 1 by using the underlying open triplets centered at node v_i. Although Zhang’s method presents an alternative to Onnela’s requirement to use the binary node degree, the method is prone to false identification of triplets [66]. A false positive identification occurs when at least one of the edges within the closed triplet has significantly lower weight than the other edges and the clustering coefficient is still large. This false identification indicates that a “strongly” linked closed triplet is identified as equivalently “strong” as an open triplet.

2.2.3 Newly Proposed Weighted Clustering Coefficient

There is a need both to decrease the number of false positives and to obtain a measure that is independent of the node degree. This motivates the extension of Onnela’s and Zhang’s clustering coefficients by incorporating the total triplet (closed and open) intensities within the denominator to provide a complete characterization of a node’s connectivity strength through

C_4^w(i) = \frac{\sum_{j,h \neq i} w_{ij} w_{jh} w_{hi}}{\sum_{j,h \neq i} w_{ih} w_{hj} + \sum_{j,h \neq i} w_{ij} w_{jh} w_{hi}}    (2.4)

which treats the pairwise weights as probabilities. In this definition, w_{ij} is treated as the probability of information flowing through a direct path between nodes v_i and v_j without the use of intermediate nodes.
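A sketch of the measure as reconstructed in Eq. (2.4), assuming a symmetric weight matrix in [0, 1] with zero diagonal (the vectorized numpy form and the function name are ours; the closed-triplet term is the diagonal of W^3, and the open-triplet term follows from the node strengths):

```python
import numpy as np

def weighted_clustering(W):
    """Proposed weighted clustering coefficient, Eq. (2.4): ratio of the
    total closed-triplet probability to the total open plus closed
    triplet probability at each node."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    s = W.sum(axis=1)                        # node strengths
    closed = np.diag(W @ W @ W)              # sum_{j,h != i} w_ij w_jh w_hi
    open_ = W @ s - np.diag(W @ W)           # open triplets starting at v_i
    total = open_ + closed
    return np.divide(closed, total, out=np.zeros_like(total), where=total > 0)
```

On a fully connected triangle with unit weights the measure evaluates to 0.5 at every node, since the ordered closed- and open-triplet sums are then equal, while any triangle-free weighting yields 0.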
The clustering coefficient is computed as the ratio of the total probability of forming closed triplets to the total probability of all possible triplets with respect to node v_i. This approach guarantees a normalized clustering coefficient between 0 and 1 using all of the edge weights. Our interpretation of an open triplet differs from that of Zhang, where an open triplet of v_i is defined as one in which v_i is the center node. Our measure, however, defines an open triplet of v_i as one for which v_i is the starting point (or end point) in a connected set of three nodes. Unlike Zhang’s definition, our definition of an open triplet incorporates the weighted edge that is not incident upon v_i. This provides a more complete representation of the node’s ability to integrate itself with the rest of the network, since it takes into account all possible paths originating from v_i to all other nodes in the triplet. In this way, the likelihood of a false positive is reduced, i.e., the proposed measure quantifies connectivity intensities primarily unique to “strong” closed triplets. Table 2.1 compares the clustering coefficient computed using Barrat’s, Onnela’s, Zhang’s, and the proposed measure for different variations of a triplet. The dotted edges represent “weak” links and the solid edges represent “strong” links. The solid node represents the reference node. One hundred triangles were simulated for each of the four configurations, where weights for the “weak” links were selected from a uniform distribution between 0 and 0.2. “Strong” links were assigned weights chosen from a uniform distribution between 0.8 and 1.
From the table, our proposed measure is shown to reduce the number of false positives compared to Zhang’s measure, is a more complete representation of clustering than Barrat’s measure since it incorporates the third edge in the calculation, and provides a significantly larger clustering coefficient value for the ‘strongest’ closed triplet than Onnela’s and Barrat’s since the binary nodal degree is not a factor.

Table 2.1: Comparisons between the proposed and existing weighted clustering coefficients for different variations of a triplet (rows correspond to the four simulated triplet configurations, ordered from weakest to strongest).

Configuration    C_B            C_O            C_Z            C_proposed
1                0.05 ± 0.02    0.05 ± 0.06    0.10 ± 0.05    0.10 ± 0.06
2                0.05 ± 0.03    0.10 ± 0.04    0.89 ± 0.06    0.15 ± 0.08
3                0.25 ± 0.02    0.20 ± 0.06    0.91 ± 0.06    0.23 ± 0.13
4                0.45 ± 0.03    0.45 ± 0.03    0.90 ± 0.06    0.93 ± 0.05

2.2.4 A Random Walk Based Clustering Coefficient Measure for Binary Networks

As demonstrated by the proposed weighted clustering coefficient, connectivity of a network is best represented when the edges are weighted. Unfortunately, all edges in a binary graph are equally weighted, and therefore all triplets in the network are considered equal. This property makes the conventional clustering coefficient less discriminative than the weighted clustering coefficient. This shortcoming of the conventional clustering coefficient can be demonstrated with a simple example: if nodes v_i and v_j are linked to 20 and 5 neighboring nodes, respectively, and all their neighbors share edges with each other, then both C(i) = 1 and C(j) = 1. However, it would be accurate to describe v_i and its local connections as more modular than those of v_j. If these nodes were part of a real-world network, node v_i can be characterized as an anomaly, since it is less probable for there to naturally exist \binom{20}{2} edges shared among the neighbors of v_i compared to \binom{5}{2} edges among the neighbors of v_j. Therefore, the clustering coefficient can be made more discriminative for the purpose of identifying and ranking submodules of a binary network.
In this section, a more discriminative clustering coefficient, based on the proposed weighted clustering coefficient, is described. A challenge with applying the proposed weighted clustering coefficient to a binary network is that there are no distinct weights associated with the edges. To overcome this, a weight can be assigned to an edge equal to its probability of being selected by a random walk. The probability of selecting edge e_{ij} is 1/k_i, i.e., the reciprocal of node v_i’s degree [76]. Since the direction of a random walk is dependent only on the previously visited node, the binary graph can be modeled as a first-order Markov chain [77]. Transitions between all node pairs, i.e., states of the Markov chain, can be represented by a transition matrix, P, where

P(i,j) = \begin{cases} 1/k_i & \text{if } e_{ij} = 1 \\ 0 & \text{otherwise} \end{cases}    (2.5)

and k_i is the degree of node v_i. The nth power of the transition matrix, P^n, determines the probability of a random walk between nodes, i.e., P^n(i,j) is the probability of moving from v_i to v_j in exactly n state changes. Therefore, the total probabilities of v_i participating in an open triplet and a closed triplet are computed as

p(v_i^{open}) = \sum_j [P^2]_{ij}    (2.6)

p(v_i^{closed}) = [P^3]_{ii}    (2.7)

and a modified clustering coefficient can be described as

C(i) = \frac{p(v_i^{closed})}{p(v_i^{open}) + p(v_i^{closed})}    (2.8)

This clustering coefficient for binary graphs is more discriminative than the conventional measure since weights are assigned to the edges. Also, this proposed clustering coefficient remains between 0 and 1.

2.3 Characteristic Path Length

Path length, the second metric necessary for computing the ‘small-world’ measure of a network, is computed as the average number of edges contained among the shortest paths between node pairs [31]. In this section, the conventional measure of path length is described along with a discussion of its fundamental weaknesses. This background is followed by proposed measures for addressing these weaknesses.
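Returning to the random-walk clustering coefficient of Section 2.2.4, Eqs. (2.5)–(2.8) translate directly into a few lines of matrix arithmetic. A minimal sketch following the reconstructed equations (the function name and numpy implementation are ours):

```python
import numpy as np

def random_walk_clustering(A):
    """Random-walk clustering coefficient, Eqs. (2.5)-(2.8): edges are
    weighted by the probability 1/k_i of a walk leaving node v_i."""
    A = np.array(A, dtype=float)
    k = A.sum(axis=1)
    P = A / k[:, None]                     # Eq. (2.5): P(i,j) = 1/k_i if e_ij = 1
    p_open = (P @ P).sum(axis=1)           # Eq. (2.6): two-step probabilities
    p_closed = np.diag(P @ P @ P)          # Eq. (2.7): return to v_i in three steps
    return p_closed / (p_open + p_closed)  # Eq. (2.8)
```

For the complete graph on three nodes this returns 0.2 at every node, while any triangle-free graph returns 0, since the diagonal of P^3 is then zero.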
2.3.1 Conventional Characteristic Path Length

While the clustering coefficient describes how nodes group together, the characteristic path length quantifies the extent to which individual nodes are segregated from each other. The characteristic path length, L_i, of a node v_i is computed using the distance matrix, D, of the graph such that

L_i = \frac{1}{N} \sum_{j=1}^{N} D_{ij}    (2.9)

where D contains the shortest pair-wise distances. The average network path length is defined as L = \frac{1}{N} \sum_{i=1}^{N} L_i. Unlike the clustering coefficient, the literature is much more limited in offering alternative methods for quantifying path length. The primary interest within the literature for improving path length is developing search algorithms that discover shortest paths in networks with negative and positive weights with minimal computational time [78], or computing shortest paths for networks containing negative cycles [79]. However, with particular interest in ‘small-world’ analysis, there are limited options for computing the path length of weighted and binary networks. Conventional path length is computed from a shortest path cost function. Small-world analyses reported in the literature [7, 80] typically focus on one type of shortest path cost function, the minimum weighted sum (or minimum number of edges in a binary graph) between a node pair. This definition may not be robust for a variety of reasons: 1) there may exist multiple shortest paths; 2) the shortest path may not always be used in information sharing across real-world networks; 3) a path with the minimum weighted sum may not always be the path with maximum probability. In graph theory, there are many other possible cost functions for computing the minimum distance between a node pair [81]. Therefore, a new path length measure can be proposed using a new and more robust cost function for weighted graphs and binary graphs.
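For a connected binary graph, the distance matrix D in Eq. (2.9) can be filled in by breadth-first search from every node; the sketch below (our illustrative code, not from the dissertation) then averages each row as in Eq. (2.9):

```python
from collections import deque
import numpy as np

def distance_matrix(A):
    """Shortest-path distances of a connected binary graph via BFS."""
    A = np.asarray(A)
    n = A.shape[0]
    D = np.full((n, n), np.inf)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if np.isinf(D[s, v]):          # first visit = shortest distance
                    D[s, v] = D[s, u] + 1
                    queue.append(v)
    return D

def characteristic_path_length(A):
    """Per-node path length L_i of Eq. (2.9): row average of D."""
    D = distance_matrix(A)
    return D.sum(axis=1) / D.shape[0]
```

For a three-node chain the distances from the end nodes are {0, 1, 2}, giving L_i = 1 at the ends and 2/3 at the center.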
2.3.2 Existing Definition for Weighted Characteristic Path Length

To retain the functional information introduced by edge weights, the path length measure should be extended for weighted graphs. One such method is proposed by Stam et al. [7], where the weighted path length is defined as

L(i) = \frac{1}{N-1} \sum_{j \neq i} \frac{1}{w_{ij}}    (2.10)

This measure suggests that the stronger the relationship between a node pair, the shorter the weighted path length should be. This definition takes the strength of the edge connecting two nodes as the interaction intensity between those two nodes, i.e., it only considers the strength of a single-edge path. However, two nodes in a complex network, particularly the brain, can interact through multi-edge paths whose connection strengths may be larger than the weight of the single-edge path. Therefore, the above method fails to capture the true flow of information throughout the network. For example, if nodes v_i and v_j do not share an edge, they are assigned a pair-wise weight of w_{ij} = 0. This implies that a direct path between the two nodes does not exist. However, this does not suggest that v_i and v_j cannot exchange information, even if not directly. The weighted path length measure above does not consider the possibility of any information flow between these two nodes.

2.3.3 Newly Proposed Weighted Characteristic Path Length

Based on the disadvantages of the existing weighted path length measure, it would be appropriate to define a new measure that considers the path length between two nodes to be equivalent to the maximum possible connection strength between them. Among all possible connections between nodes v_i and v_j, the shortest path will now be defined as the one with the highest probability of connection, which corresponds to the maximum product of edge weights forming a path linking a node pair. Such a method would be in line with the fundamental basis of the conventional path length measure.
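The maximum-product path search described above admits a compact Floyd-Warshall-style sketch: at step k, any pair may improve its best path probability by routing through node v_k. The code below is our illustration of this idea, with weights assumed in [0, 1]:

```python
import numpy as np

def max_probability_paths(W):
    """F[i, j] = maximum product of edge weights over paths from i to j,
    computed with a Floyd-Warshall-style relaxation."""
    F = np.array(W, dtype=float)
    for k in range(F.shape[0]):
        # routing through node k may yield a more probable path
        F = np.maximum(F, np.outer(F[:, k], F[k, :]))
    return F
```

For example, with w_{01} = 0.5 and w_{02} = w_{12} = 0.9, the direct edge between nodes 0 and 1 is beaten by the two-edge path through node 2, whose probability is 0.9 × 0.9 = 0.81.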
First, the connectivity strength, or the amount of information flow, between two nodes is defined as the product of the edge weights along the path, P_{ij}, linking a node pair, where P_{ij} = \{w_{i,q_1}, w_{q_1,q_2}, \ldots, w_{q_{m-1},q_m}\} is the set of weighted edges in the path and Q = \{q_1, q_2, \ldots, q_m\} is the sequence of nodes along the path P_{ij}. An efficient search algorithm, based on a modification of the Floyd-Warshall algorithm [82], is applied to the graph. The algorithm identifies a path that maximizes the cost function F_{ij} = w_{i,q_1} \left( \prod_{m'=2}^{m} w_{q_{m'-1},q_{m'}} \right) w_{q_m,j}. The proposed weighted path length is then defined as

L(i) = \frac{1}{N-1} \sum_{j \neq i} \frac{1}{F_{ij}}    (2.11)

The weighted path length of node v_i is based on the average probability of forming a path to all other nodes in the network. The advantages of this proposed path length measure are that it is based on the connection weights and considers all paths between nodes. The only stipulation for the evaluated connectivity matrix is that its entries must be in the range [0, 1] to ensure that as the number of edges in a path increases, the probability of linking a node pair within the path decreases. In some applications, edges of a graph may be associated with a probability of reliability [83]. For situations where this information is unavailable, normalizing the weights of the connectivity matrices by the maximum connectivity weight would satisfy this requirement.

2.3.4 A Random Walk Based Characteristic Path Length Measure for Binary Networks

The conventional measure of shortest distance is defined as the minimum total sum of weights along a path. Practical applications, such as the optimal processing of parallel algorithms in a multiprocessor system [84], adopt this cost function for computing the shortest path in order to identify the ‘best’ choice for facilitating information flow in a network. Yu et al.
[85] argue that a single cost function, such as the one implemented for small-world analysis, is not guaranteed to reveal the most robust connections in a network. Consider the examples in Figure 2.1, which show variations in connectivity among three nodes. In both Figures 2.1a and 2.1b, nodes A and B have a shortest path of 1; however, the robustness of their connections differs in each case. The arrangement in 2.1a shows a more robust connection between A and B, whereas the single edge linking them in 2.1b may be easily disrupted in a random attack. Therefore, the connectivity in 2.1a is more robust than the one in 2.1b, although in both cases the value of the shortest path is the same. Further evaluation of Figure 2.1b by conventional path length would have indicated that the connection between A and B, with a shortest path of 1, is more ‘optimal’ than the connection between A and C, with a shortest path length of 2. However, it is clear that the connectivity between A and C is more robust than the one between A and B. This simple example illustrates the importance of robustness for real-world networks where perturbations are likely to occur [86, 87, 88]. Therefore, a path length computed from a different cost function than that used in conventional small-world analysis should convey the robustness of a network.

Figure 2.1: Nodes A and B have similar path length; however, connectivity in (b) is less robust than in (a). In (b), connectivity between nodes A and C is more robust than connectivity between nodes A and B, despite A and C having a longer path length than A and B.

The new cost function is based on the N × N Markov transition matrix introduced in Section 2.2.4. The matrix is characterized as absorbing if it contains at least one node which behaves as a sink [77], i.e., the probability of a directed random walk along a path from the sink to all other nodes is zero.
All other nodes are considered transient, since a random walk may traverse between them in any sequence, i.e., the probabilities of traversal along these bi-directed paths are non-zero. P, representing the transition matrix of an absorbing Markov chain, is defined as

P = \begin{bmatrix} Q & R \\ 0 & 1 \end{bmatrix}    (2.12)

where Q is the (N−1) × (N−1) matrix containing the transition probabilities among the transient nodes and R is the (N−1) × 1 vector containing the transition probabilities from the transient nodes to the absorbing node. The expected number of states needed to reach the absorbing state from any transient state is obtained from M = I + Q + Q^2 + \cdots, where M is the fundamental matrix of an absorbing Markov chain and is shown to be M = (I − Q)^{-1} [89]. The following theorem is then used to determine the expected number of state transitions necessary for reaching the absorbing node.

Theorem 1. Let t_{ij} be the expected number of steps before the Markov chain is absorbed at node v_i, given that the chain starts at node v_j. Also, let T be the (N−1) × 1 vector whose jth entry is t_{ij}. Then

T = Mc    (2.13)

where c is the (N−1) × 1 ones vector and M is defined above [90].

By transforming a binary graph into an absorbing Markov chain model, path length can be computed as the expected value of the distance between any node pair. Since the graphs of interest in this study are undirected and connected, they can be characterized as ergodic. Therefore, to compute path length, these ergodic networks are converted into absorbing networks by first selecting any node to be the absorbing state. If v_i is the selected node, then P is modified such that the entire ith row is set to zero with the exception of P_{ii}, which is set to 1. This designates v_i as the absorbing node. If the initial state begins at node v_j, the average distance between nodes v_j and v_i is the average number of state changes expected before being absorbed by v_i.
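The absorption-time computation of Theorem 1 can be sketched as follows: each node is made absorbing in turn, the fundamental matrix M = (I − Q)^{-1} is formed, and T = Mc is averaged over the transient starting nodes. The numpy code is our illustration of this procedure:

```python
import numpy as np

def markov_path_length(A):
    """Expected-absorption-time path length for a connected binary graph."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)      # transition matrix P = K^{-1} A
    L = np.zeros(n)
    for i in range(n):                        # make v_i the absorbing node
        keep = [j for j in range(n) if j != i]
        Q = P[np.ix_(keep, keep)]             # transitions among transient nodes
        M = np.linalg.inv(np.eye(n - 1) - Q)  # fundamental matrix (I - Q)^{-1}
        T = M @ np.ones(n - 1)                # Theorem 1: T = Mc
        L[i] = T.mean()                       # average over starting nodes
    return L
```

On the complete graph of three nodes the expected absorption time from either transient node is 2 steps, so every node’s path length is 2; on a three-node chain the end nodes are harder to reach, giving a path length of 3.5.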
T is then computed for each node by iteratively designating each node as the new and only absorbing state. Averaging T for each absorbing node gives the path length of that node. Algorithm 1 describes the process of computing path length. In the algorithm, K is the diagonal degree matrix of a graph G, where K_{ii} = k_i and K_{ij} = 0 for i ≠ j, and I is the identity matrix.

Algorithm 1 Markov Chain Path Length for Binary Graphs
Input: N × N binary graph, G = (V, E), and adjacency matrix A.
Output: Path length vector, L.
1: P = K^{-1} A {Transition matrix of graph G.}
2: z = N − 1
3: for i = 1 to |V| do
4:   Q = P − {v_i}
5:   M = (I_{z×z} − Q)^{-1}
6:   c = 1_{z×1}
7:   T = Mc
8:   L(i) = \frac{\sum_j T_j}{z}
9: end for
10: return L

This process identifies the expected length of paths traversed frequently between pairs of nodes. The expected length of a path is not necessarily the shortest, but rather the most probable and robust. Therefore, the proposed path length measure, L, is an improved extension of the conventional measure implemented in small-world analysis. Since the new path length is based on the evaluation of multiple alternative paths, it is also a more complete representation of network connectivity.

2.4 Small-World Measure of a Network

In this section, the conventional measure of small-world implemented throughout the literature [27, 7, 91] is discussed. Since applications of small-world have primarily included binary graphs, an extension to weighted graphs is presented.

2.4.1 Conventional Small-World Measure

Watts et al. [31] computed the small-world value of a network using the clustering coefficient and path length. First, the clustering coefficient and path length are normalized with respect to the clustering coefficient and path length of a random network. The normalized values for clustering coefficient and path length are denoted as γ and λ, respectively. It is expected that a network with the ‘small-world’ property will demonstrate γ >> 1 and λ ≈ 1.
The small-world value, σ, is computed as σ = γ/λ, where σ >> 1 is indicative of a ‘small world’ topology. Although the small-world measure is primarily defined for binary networks, a weighted small-world measure has been discussed by Latora et al. [80, 92]; yet they do not describe a formal definition which utilizes a weighted clustering coefficient and weighted path length. Here, a weighted extension to the small-world measure is presented using the proposed weighted clustering coefficient and weighted path length defined in Sections 2.2 and 2.3.

2.4.2 Newly Proposed Weighted Small-World

The proposed weighted clustering coefficient and path length are used to define a weighted small-world measure. Normalization of the two parameters is performed in a manner similar to that described by Stam et al. [7]. The normalized weighted clustering coefficient and normalized weighted path length are computed by dividing the network’s clustering coefficient and path length by the average corresponding measures, C^w_{rand}(i) and L^w_{rand}(i), of an ensemble of surrogate graphs. The surrogate graphs are generated by randomly permuting the weighted edges of the original connectivity matrices. With the normalized weighted clustering coefficient and path length defined as γ^w(i) = \frac{C^w(i)}{C^w_{rand}(i)} and λ^w(i) = \frac{L^w(i)}{L^w_{rand}(i)}, we extend the conventional small-world definition to evaluate the local weighted small-world measure at each node in the weighted network as σ^w(i) = \frac{γ^w(i)}{λ^w(i)}. A weighted network is characterized as a small-world if σ^w(i) >> 1.

2.5 Mapping of Weighted Graphs to Binary Graphs

In this section, fundamental issues associated with weighted-to-binary mappings of networks are discussed. Since edges of real-world networks are typically associated with non-binary weights, there exist common approaches for obtaining a binary transformation.
The weighted measures presented in Sections 2.2 and 2.3 circumvent this problem; however, for cases when a binary representation is absolutely necessary, an alternative transformation algorithm is offered.

2.5.1 Existing Approaches for Transforming Weighted Graphs to Binary Graphs

Most of the current graph theoretic measures are defined primarily for binary adjacency matrices [27, 93, 94]. There is little research on weighted adjacency matrices, and this poses a problem for applying the conventional measures to weighted adjacency matrices and for performing comparisons between weighted graphs. A review of thresholding techniques currently implemented in weighted graph studies is presented by Wijk et al. [95]. There are two primary methods for transforming a fully connected weighted graph to a sparse binary network: 1) choosing an arbitrary threshold, τ, for which all entries of the weighted matrix equal to or greater than τ are mapped to 1 and those below are mapped to 0; 2) applying a range of thresholds, [τ1, τ2], to the weighted matrices and obtaining a collection of binary matrices corresponding to each threshold. These two approaches are either purely arbitrary or computationally exhaustive. Furthermore, comparisons across different networks are unreliable since τ can cause variations in sparsity across networks. Choosing a threshold which also maintains connectivity, without sacrificing sparsity, continues to be a challenge. The proposed method in this section offers a third option, an optimal weighted-to-binary transformation. An optimal transformation would reduce the dependence of graph measures on τ by instead utilizing the strength of connectivity among node pairs.

2.5.2 Weighted to Binary Graph Transformation: Alternative to Matrix Thresholding

To address the issue of connectivity presented by thresholding, it is proposed that a ‘backbone’ of the network is first identified. In computer networking, a backbone is a larger transmission
In computer networking, a backbone is a larger transmission line that carries data gathered from the smaller lines that interconnect with it [96]. We extend this terminology to describe the underlying primary structure of a graph. This ‘backbone’ would ensure that all nodes are reachable from any starting node. The question is how to choose the best ‘backbone’ of the network. In graph theory, a spanning tree is a subgraph which contains all nodes in the network, has exactly n − 1 edges, and in which each node is reachable from every other node. There are no cycles in a spanning tree, i.e. removing any edge from the tree would disconnect it. In a complete graph with n nodes, there are n^(n−2) possible spanning trees [18]. There exists a spanning tree whose total cost, defined as the sum of all weights in the spanning tree, is minimum with respect to all possible spanning trees of the weighted graph. This minimum cost spanning tree is called a minimum spanning tree (MST). In the proposed work, we use the MST as the ‘backbone’ of the new binary transformation of the weighted graph. In order to extract the MST corresponding to a given graph, we use Prim’s algorithm [18], which is capable of identifying the MST for any graph with negative and positive weights. If W_G is the weighted connectivity matrix of a graph G, then W_H is the transformation from G to H such that W_H = −1 × W_G. In this manner, the highest weights are transformed into the lowest weights. Applying Prim’s algorithm to H produces a minimum spanning tree containing the edges of node pairs with the highest weights in G. After identifying the ‘backbone’, i.e. the tree, of the new graph, the tree can be expanded into a cyclic graph by adding edges to the tree based on a user-defined cost function. One such cost function in the realm of real-world network analysis, for example, identifies a degree distribution which follows a power-law model.
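A minimal sketch of this backbone extraction, assuming a fully connected symmetric weight matrix: the weights are negated (W_H = −1 × W_G) and Prim's algorithm is run on the result, so the returned spanning tree keeps the highest-weight edges of G. The dense-matrix Prim implementation below is a simple illustrative version, not an optimized one.

```python
import numpy as np

def prim_mst(W):
    """Prim's algorithm on a dense weighted matrix (negative weights are
    fine). Returns the n-1 edges (i, j) of a minimum spanning tree."""
    n = W.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:                 # cheapest edge leaving the tree
            for j in range(n):
                if j not in in_tree:
                    if best is None or W[i, j] < W[best[0], best[1]]:
                        best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

def backbone(WG):
    """Negate the weights (W_H = -1 * W_G) so the MST of H retains the
    highest-weight edges of G, as in the proposed transformation."""
    return prim_mst(-WG)
```

On a fully connected phase-synchrony matrix this yields a tree that keeps the strongest pairwise connections while guaranteeing every node is reachable.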
Another cost function is the ‘small-world’ parameter, defined by the clustering coefficient and path length. We propose a cost function which maximizes the clustering coefficient and minimizes the path length by forming cycles in the minimum spanning tree using nodes separated by at most λ edges in the tree, where 1 ≤ λ ≤ n. λ can be incremented as long as the bounds set by the cost functions are obeyed. Since the focus of this thesis is analyzing the connectivity of real-world networks, a cost function which bounds the degree distribution according to a power-law model and a cost function which facilitates small-world behavior will be implemented. Algorithm 2 describes the process for generating a binary graph from a fully connected weighted graph. The function T = MST(G) identifies the binary adjacency matrix representing the minimum spanning tree for graph G. D_T = dist(T) is the pair-wise distance matrix of the minimum spanning tree T. The variables [φ1, φ2] are the lower and upper bounds of the degree exponent, α, estimated for the degree distribution. As presented in Chapter 1, for a power-law model −3 ≤ α ≤ −2. Similarly, the clustering coefficient is bounded by the limits [γ1, γ2]. From previous studies [94, 50], the clustering coefficients of real-world networks have been observed to lie in the range 0.2 ≤ γ ≤ 0.9.

Algorithm 2 Optimal Transformation from a Weighted to a Binary Adjacency Matrix
Input: A weighted graph G = (V, E) of order n with connectivity matrix W_G, where each edge e_ij ∈ E has weight w_ij ∈ W_G.
Output: Optimal binary matrix, A.
1: W_H ← −1 × W_G.
2: T = MST(H).
3: A ← T.
4: D_T = dist(T).
5: λ ← 2.
6: while λ < n do
7:   A ← A ∪ {e_ij | D_T(i, j) ≤ λ}.
8:   if φ1 ≤ α ≤ φ2 then
9:     if γ1 ≤ C ≤ γ2 then
10:      λ ← λ + 1.
11:    else
12:      return
13:    end if
14:  end if
15: end while

2.6 Results

In this section, the performance of the proposed graph theoretic measures is demonstrated on simulated data derived from benchmark software, the popular Zachary karate social network, and neural connectivity matrices derived from the ERN EEG study.

2.6.1 Random Walk Based Clustering Coefficient/Path Length Measures for Binary Benchmark Networks

The proposed Markov clustering coefficient and path length measures are first tested to determine whether they can distinguish between different network topologies, including unweighted random, scale-free (‘small-world’), and lattice graphs. One hundred networks of each topology with order N = 50 were generated using the LFR benchmark generator [32], and the proposed graph measures were applied. The average clustering coefficient and average path length were computed across all nodes for all 100 networks. Figure 2.2 demonstrates that C_random < C_small-world << C_lattice and L_random ≈ L_small-world << L_lattice using the Markov measures. These figures verify that the proposed measures are suitable for defining a modified ‘small-world’ parameter. Similar results were observed for N = 100 and N = 250 and are not reported here. Next, the conventional measures are compared to the Markov measures as applied to the unweighted karate network. Figures 2.3 and 2.4 demonstrate the differences between the measures. The proposed measure provides very different yet accurate information about the network’s clustering properties compared to the conventional measure. For example, nodes 1, 33, and 34 are assigned the largest clustering coefficients, as expected, since they represent the three dojo leaders within the karate club. They were assigned significantly lower values using the conventional measure, whereas nodes 15 and 16, students not participating in leadership roles, were assigned larger values using the conventional measure.
This is due to nodes 15 and 16 forming all possible triplets among their neighbors. However, this obscures the importance of nodes 1, 33, and 34, since they have far more direct connections than nodes 15 and 16 (a problem previously discussed in section 2.2). Therefore, the proposed clustering coefficient metric is better suited for evaluating which nodes promote centrality within the community structure of the network, i.e. act as important central hubs. In this case, the dojo leaders of the club were accurately identified, since it is their responsibility to bring students together. Similarly with path length, the proposed measure assigns the nodes corresponding to dojo leaders the lowest path lengths. When these nodes were chosen as absorption states in the Markov chain, very few state transitions were required to link them with all other nodes in the network. Therefore, nodes 1, 33, and 34 have very robust connections and would be less likely to be disconnected from the network in case of failure among node pairs. Node 12 possesses the largest path length, indicating that it is at higher risk of being disconnected from the network. This student does not have social ties with anyone in the class except the dojo leader. Therefore, if dojo leader 1 fails to adequately provide attention to student 12, the student may very well drop out of the club. The path length information attained from the conventional measure is nearly uniform across nodes and does not accurately portray the differences in nodal robustness; it is therefore less discriminative.

2.6.2 Weighted Graph Measures on Benchmark Data

The performance of the proposed weighted graph measures was verified by applying them to the weighted version of Zachary’s Karate club and comparing the results to those obtained through existing weighted measures, in particular Onnela’s weighted clustering coefficient and Stam’s characteristic path length.
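Onnela's weighted clustering coefficient, used here as the comparison baseline, can be computed directly from the connectivity matrix. The sketch below follows the published definition (geometric mean of triangle weights, scaled by the network's maximum weight); the variable names are ours.

```python
import numpy as np

def onnela_clustering(W):
    """Onnela et al. weighted clustering coefficient:
    C_i = (1 / (k_i (k_i - 1))) * sum_{j,h} (w'_ij * w'_ih * w'_jh)^(1/3),
    where w' are the weights divided by the largest weight in the network."""
    Wh = W / W.max()                       # scale weights to [0, 1]
    cube = np.cbrt(Wh)
    num = np.diagonal(cube @ cube @ cube)  # sum of geometric-mean triangles at each node
    k = (W > 0).sum(axis=1)                # node degree (nonzero weights)
    denom = k * (k - 1)
    return np.where(denom > 0, num / np.maximum(denom, 1), 0.0)
```

For an unweighted graph (all weights equal) this reduces to the conventional binary clustering coefficient, which is a useful sanity check.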
The first analysis was performed using Onnela’s and the proposed weighted clustering coefficients (Figure 2.5). Compared with the results of Onnela’s measure, the proposed weighted clustering coefficient indicates an increased clustering coefficient for the central nodes 1, 33, and 34 and a decreased clustering coefficient for the outer boundary nodes 15 and 16, who are only regular members of the club. The relationships maintained by the senior members are important for forming the two large communities in the karate club. This indication of network topology and the identification of the central nodes are not available through Onnela’s measure. The difference between Onnela’s and the proposed weighted clustering coefficients is reflected by a low correlation coefficient of 0.3. The weighted path length described by Stam et al. and the proposed measure are next applied to the karate network. Based on Figure 2.6, the two measures are closely related for most nodes, indicated by a correlation coefficient of 0.7. However, there are some differences between the two measures. For example, based on the proposed measure node 17 has the largest path length, whereas for Stam’s measure nodes 10, 12, 18 and 19 have the largest path lengths. The proposed measure yields the highest path length for node 17 since that node is not directly connected to any of the central nodes, i.e. nodes 1, 33 and 34, whereas nodes 10, 12, 18 and 19 are all directly connected to a central node. Since the proposed measure computes the path length by considering all possible connections between a particular node and the rest of the network, it is a better indicator of nodes that are more isolated from the rest of the network. In general, most nodes in the karate network were assigned increased path lengths since all shortest paths, which may be composed of more than one edge, were considered.
Therefore, the proposed path length measure presents a much more complete representation of network connectivity. Also, the minimum of the proposed weighted path length measure is 1, similar to the conventional measure for binary networks. Unlike Stam’s measure, this permits a weighted small-world measure to be computed according to the same principles that define the conventional small-world measure for binary networks.

2.6.3 Weighted Graph Measures on ERN EEG Data

2.6.3.1 Bivariate Phase-Synchrony Measures

The RID-TFPS measure was applied to trials from each electrode pair in the 62-channel montage, separately for each subject and condition (i.e. error and correct, where correct trials were randomly selected to match the number of error trials). The average phase synchrony in the theta frequency band (4-7 Hz) and the ERN time window (25-75 ms) was recorded into separate 62 x 62 connectivity matrices, one for each subject (90) and condition (2), producing 180 total connectivity matrices. Figure 2.7 illustrates the significant phase synchrony differences between error and correct responses based on a Wilcoxon rank sum test (p<.001). From this figure, it is clear that error trials produce stronger connections among a number of sites than correct trials, with a dense representation around FCz (consistent with mPFC). Importantly, the bivariate TFPS effects (error > correct) reported by Cavanagh et al. [21] are replicated in the current data (F5-FCz, t=2.26, p<.026; F6-FCz, t=3.84, p<.001). However, this plot makes it clear that there are many more significant electrode pairs, and that these two a priori pairs do not fully represent the complexity of the functional connectivity.
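The assembly of the per-subject connectivity matrices just described can be sketched as below. The array layout of the time-frequency phase-synchrony data and all variable names are assumptions made for illustration; only the theta band (4-7 Hz) and ERN window (25-75 ms) follow the text.

```python
import numpy as np

def connectivity_matrix(tfps, freqs, times, band=(4, 7), window=(0.025, 0.075)):
    """Average a time-frequency phase-synchrony array, assumed to have
    shape (n_elec, n_elec, n_freqs, n_times), over the theta band and the
    ERN window to obtain one n_elec x n_elec connectivity matrix."""
    fmask = (freqs >= band[0]) & (freqs <= band[1])
    tmask = (times >= window[0]) & (times <= window[1])
    M = tfps[:, :, fmask][:, :, :, tmask].mean(axis=(2, 3))
    return (M + M.T) / 2  # enforce symmetry of the pairwise measure
```

Repeating this per subject and condition would yield the 180 matrices (90 subjects x 2 conditions) analyzed in the text.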
Thus, although the distribution of pairwise error-correct phase-synchrony differences is useful for determining which electrodes are most phase-synchronous during the ERN, it does not serve to characterize the network behavior corresponding to cognitive control, or how such networks reorganize during error trials.

2.6.3.2 Graph Measures

To assess control-related network characteristics using the graph theoretic framework, the proposed weighted clustering coefficient and weighted path length measures were applied to all of the connectivity matrices. The weighted graph measures were evaluated for each electrode across participants for the error and correct conditions. This resulted in two 62 x 90 matrices (one per condition) for each measure: the clustering coefficient, the path length, and the overall small-world measure. Significant error-correct differences (Wilcoxon p<.01) for the proposed measures are presented in Figure 2.8. The clustering coefficient was significantly greater during error responses than correct responses in the frontal region, centered on site FCz, as hypothesized, suggesting increased community formation or functional segregation for error responses relative to correct. Results also indicate that path length was shorter during error responses relative to correct responses, supporting the idea that connectivity was increased among the observed network’s sub-communities. Both measures are indicative of increased functional organization among the neuronal regions in the frontocentral region of the brain for error responses, consistent with involvement of the mPFC. Like its constituent clustering and path length parameters, the proposed weighted small-world measure evidenced significant error-correct differences in the same frontocentral regions. Together, these results indicate a significant increase in network efficiency and functional connectivity during error trials relative to correct.
We also compared the performance of the proposed measures with previously published weighted graph measures on the same ERN data set. Figure 2.8 shows the topographic distributions of the error-correct differences and the associated significance of these differences for both sets of measures. Both the existing and proposed measures show significant differences in the small-world index in the mPFC and lPFC areas, with the significance being higher for the proposed small-world index. Similarly, the path length and clustering coefficients show significant differences in similar areas for both the proposed and existing measures, with the proposed measures yielding increased significance. These results indicate that the proposed weighted graph measures capture information about the network topology similar to that of the existing measures while being more sensitive to the differences in network topology involved in error relative to correct responses.

2.6.3.3 Relationship between Graph and Bivariate Measures

Next, we assessed whether the significant graph theoretic measures accounted for the bivariate F5-FCz and F6-FCz relationships. This is important for verifying that these new results are directly relevant to the previously published effects of Cavanagh and colleagues [21]. First, correlations between the three graph theoretic and two bivariate measures were computed (Small World, Clustering Coefficient, and Path Length correlated with the FCz-F5 and FCz-F6 bivariate TFPS measures; presented in Table 2.2). For both the error and correct conditions independently, the correlations were high. This verified that a majority of the variance in the bivariate measures was indeed represented in the graph theoretic measures. The correlations for the error-correct difference were also significant, but were qualitatively smaller than those for the conditions independently.
Thus, we next evaluated whether the significant bivariate error-correct differences for FCz-F5 and FCz-F6 could be explained by the graph theoretic measures. This was accomplished by taking the variance unique to the bivariate measures (i.e. the residual variance in each bivariate measure after removing variance shared with the overall small-world measure) and testing for remaining error-correct differences. This residual variance was not significantly different from zero (both one-sample ts < 1), verifying that the small-world measure accounted for the previously reported bivariate error-correct differences. Next, additional analyses were conducted to assess whether one of the constituent small-world measures (i.e. the clustering coefficient or path length measure) was specifically or uniquely responsible for the relationship between the small-world and bivariate measures. To accomplish this, regressions were conducted using each of the significant bivariate error-correct differences in turn as the dependent variable, with the associated clustering coefficient and path length as independent variables. Results indicated that the clustering coefficient carried some unique variance beyond what was shared with path length, for both FCz-F5 phase synchrony (R=.60, F(1,89)=23.94, p<.001; Clustering Coefficient, t(89)=3.72, p<.001; Path Length, t<1, ns) and FCz-F6 phase synchrony (R=.52, F(1,89)=16.19, p<.001; Clustering Coefficient, t=4.49, p<.001; Path Length, t(89)=1.92, p<.058). Table 2.2: Correlations between multivariate small world and bivariate (FCz-F5/FCz-F6) measures for both error and correct responses individually and the error-correct differences (all correlations significant; p<0.001).
               Small World       Clustering Coefficient   Path Length
               F5-FCz   F6-FCz   F5-FCz   F6-FCz          F5-FCz   F6-FCz
Error          0.91     0.92     0.88     0.93            -0.82    -0.82
Correct        0.92     0.91     0.89     0.92            -0.80    -0.79
Error-Correct  0.63     0.49     0.60     0.49            -0.50    -0.32

2.6.4 Optimal Transformation From Weighted to Binary Matrix: ERN/EEG Data

The proposed method for transforming a weighted matrix into a sparse binary matrix was applied to the EEG ERN data. The 90 phase synchrony matrices were averaged into a single 62 × 62 matrix for each of the correct and error conditions. These two matrices were then input to the transformation algorithm. The graphs in Figures 2.9 and 2.10 represent the optimal binary transformation of the original weighted connectivity matrices. The red node indicates the location of the FCz electrode. The graph representing error responses contains dense connections among the frontal and frontal-central nodes centered at FCz. As hypothesized, stronger interactions are expected between these regions during the ERN. Modules of parietal and central nodes are also formed further along the graph. The graph representing the correct responses similarly contains dense connections between FCz and frontal and central nodes, but FCz is not as centrally located as in the error graph. For comparison, the ‘small-world’ parameter for binary graphs was computed for binary matrices generated by thresholding. Binary graphs corresponding to a range of threshold values, 0 ≤ τ ≤ 1, with a step size of 0.001, were formed. Figure 2.11 reports the ‘small-world’ parameter with respect to network degree (which is dependent on the threshold) for different τ for both error and correct responses. Small-world values computed for disconnected binary graphs are not reported. The small-world measure was also computed for the optimal binary graphs generated for error and correct using the proposed method.
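The thresholding comparison just described can be sketched as follows: binarize at each τ, discard disconnected graphs, and record the average degree. This is an illustrative sketch; the helper names and default step size are ours.

```python
import numpy as np

def is_connected(A):
    """Reachability check on a binary adjacency matrix via DFS from node 0."""
    n = A.shape[0]
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in np.nonzero(A[i])[0]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

def threshold_sweep(W, step=0.001):
    """Binarize W at each tau in [0, 1]; keep only connected graphs and
    record (tau, average degree), mirroring the comparison in Figure 2.11."""
    out = []
    for tau in np.arange(0.0, 1.0 + step, step):
        A = (W >= tau).astype(int)
        np.fill_diagonal(A, 0)
        if is_connected(A):
            out.append((tau, A.sum() / W.shape[0]))
    return out
```

The length of the returned list makes the computational cost of the sweep explicit, in contrast to the single MST-based transformation, which is computed once.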
From inspection of Figure 2.11, the small-world value computed via thresholding is larger for error than for correct for very sparse representations (graphs with average degree approximately between 13 and 20). The small-world value computed for the optimal binary graph transformations is larger for error as well. Furthermore, the average degrees of the optimal binary graphs are low (15 for error and 14 for correct). These results demonstrate that the optimal binary transformation is capable of identifying a binary graph structure with characteristics similar to the graph derived from the ‘best’ threshold value. This approach saves computational time since numerous thresholds do not need to be inspected, eliminates the dependence on τ altogether, and maintains connectivity.

2.7 Conclusions

In this chapter, we introduced new graph theoretic measures, the clustering coefficient and path length, for binary and weighted graphs. The proposed measures offer improvements over existing small-world measures. By using graph measures that are directly applicable to weighted networks, the need for thresholding the elements of connectivity matrices has been circumvented and the complete information about the strength of the connections between nodes is taken into account. The proposed measures also improve upon existing weighted graph measures by considering all of the alternative ways information can flow in a network rather than only through a node’s immediate neighbors. This improvement is accomplished by adopting a probabilistic framework in which the edge weights are treated as probabilities of connection. These weighted measures demonstrated the integration and segregation of members of the karate network. The proposed weighted measures were also used to assess the small-world properties of the ERN EEG data. Topographic results indicated that frontal areas contained increased clustering and decreased path length (i.e.
small-worldness) in response to errors, relative to correct responses. This supports the hypothesis that the brain, driven by the anterior cingulate cortex, increases organization across the frontal brain regions during error processing. The usefulness of these weighted measures inspired the need to improve upon the conventional measures for binary graphs. The challenge was to associate distinct weights with edges across a binary network. This was addressed by modeling the adjacency matrix as a Markov chain. Edge weights were assigned based on the probability of a random walk moving between nodes, where the probability was computed as the reciprocal of the node’s degree. Both the clustering coefficient and path length defined for binary graphs in this manner were shown to be more indicative of a node’s hubness and an edge’s robustness, as demonstrated on simulated binary networks. Finally, in the case of weighted to binary graph transformation, the need for arbitrary thresholding was circumvented through the introduction of an optimal weighted to binary transformation algorithm. The algorithm transformed a fully connected weighted graph into a sparse binary graph while simultaneously preserving connectivity, small-worldness, and a degree sequence whose distribution was modeled as a power law. This transformation algorithm was applied to the average phase synchrony matrices pertaining to the ERN EEG study. Two different structures representing the brain network were generated, revealing an increased presence of hub nodes in both the frontal and frontal-central regions during error responses.

Figure 2.2: Markov small-world measures averaged over 100 benchmark networks of order N = 50. (a) The proposed binary clustering coefficients of the simulated small-world networks are much greater than those of random networks, i.e. γ >> 1; (b) the proposed binary path lengths of the simulated small-world networks are similar to those of random networks, i.e. λ ≈ 1.

Figure 2.3: Markov and conventional clustering coefficients applied to the binary karate network.

Figure 2.4: Markov and conventional path lengths applied to the binary karate network.

Figure 2.5: Nodal weighted clustering coefficients computed using Onnela’s clustering coefficient and the proposed clustering coefficient for Zachary’s Karate club network.

Figure 2.6: Nodal weighted path lengths computed using Stam’s path length and the proposed path length for Zachary’s Karate club network.

Figure 2.7: Connectivity plot based on the RID-TFPS measure for electrode pairs whose phase synchrony is significantly larger for error responses compared to correct responses (Wilcoxon rank sum, 0.0001 ≤ p ≤ 0.001).
Figure 2.8: Topomap of the differences in the nodal weighted clustering coefficient, path length and small world parameter for error and correct responses. These differences are computed for both the previously published and proposed measures. Topographic significance plots are provided for each measure.

Figure 2.9: Optimal binary graph generated from the weighted connectivity matrices for error responses. Hubs of the network are indicated in color and are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.10: Optimal binary graph generated from the weighted connectivity matrices for correct responses. Hubs of the network are indicated in color and are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.11: Comparison between small-world calculations using the conventional thresholding method and the proposed optimal binary transformation approach.
Chapter 3
Local Feature Extraction from Weighted Graphs

3.1 Introduction

Chapter 2 demonstrated how complex networks may be modeled as graphs composed of vertices and edges and may be characterized as ‘small world’ based on graph theoretic measures such as the clustering coefficient and path length. The ‘small world’ measure, however, is as much a mathematical abstraction as the graph it describes [97]. The small-world measure and other existing graph measures, e.g. girth, diameter, and degree, fall short in providing the kind of functional interpretation of the network that is familiar in research fields such as signal processing and information theory. This makes it difficult to interpret the physical significance of conventional graph measures describing non-abstract systems such as biological networks. The goal of this chapter is to complement current graph measures with non-abstract measures possessing physical meanings supported by well-established theory from other fields of research. Furthermore, these new measures can serve to identify local discriminative features of the graph. Feature extraction is widely explored and is an important tool in signal processing, machine learning and pattern recognition [98, 99, 100]. Feature extraction usually maps the high-dimensional input data onto a lower-dimensional feature space which reflects the inherent structure of the original data [101]. In unsupervised learning, these discovered features are the most representative of the original data and can be used to identify similarities and differences between data sets or to detect anomalies within the data.
Adapting existing feature extraction methods from signal processing to graphs is difficult because graphs, particularly those representing real-world networks or systems, are semi-structured and non-Euclidean [102]. Regardless, there have been various proposed measures and algorithms for extracting features from graphs using well-known methods from different fields of research. For example, in a study of elastic graph matching for facial expression recognition, Zafeiriou et al. [103] implemented a kernel-based technique for discriminant feature extraction in graphs. Fei et al. [104] adapted data mining techniques to graphs by demonstrating that the frequency of subgraphs can be used as a feature extraction method for graph classification. Miller et al. [105] demonstrated anomaly detection in subgraphs through spectral decomposition of the modularity matrix and, using a test statistic, determined the presence of noise in a graph. Ortega et al. [106] proposed a lifting wavelet transform on sensor networks for de-noising applications. It is evident from these published measures that local graph feature extraction methods based on signal processing concepts would provide a more detailed and multiscale view of network organization. A close examination could reveal the hubness of individual nodes, the roles of subgraphs in maintaining connectivity, the compressibility of sparsely connected graphs, or the existence of noise or anomalies. The most common local feature extraction method applied to graphs is centrality, which can be used to describe the roles individual edges and vertices serve in maintaining connectivity [107]. However, the current centrality measures such as betweenness or closeness centrality do not have any physical meaning in signal processing. In this chapter, established measures from signal processing and information theory, energy and entropy, are extended to binary and weighted graphs.
These measures are designed to detect anomalous subgraphs in the network as well as classify the hubness of nodes. The proposed definitions of graph energy and graph entropy are evaluated on simulated and real-world networks, including ERN data, and compared to the performance of recently published measures.

3.2 Existing Feature Measures for Graphs

In this section, a review of some of the most common feature measures for graphs is presented. These measures have been used in the literature to compare graphs with each other, understand network structure, and evaluate the flow of information across the network.

3.2.1 Hub Centrality

A hub is defined as a node within a network which takes on the role of maintaining connectivity, sometimes also known as an influential node. ‘Hubness’ of a node can be quantified by betweenness, closeness, eigenvector, and degree centrality [108]. Betweenness centrality [109], the most commonly utilized centrality measure, quantifies the fraction of shortest paths within the network that travel through node v_i:

C_B(i) = Σ_{m ≠ i ≠ n ∈ V} σ_mn(i) / σ_mn    (3.1)

where σ_mn is the number of shortest paths between nodes v_m and v_n and σ_mn(i) is the number of those shortest paths which travel through node v_i. Closeness centrality [110] is the reciprocal of the total geodesic distance between node v_i and all other nodes:

C_C(i) = 1 / Σ_{j ∈ V\{i}} D(i, j)    (3.2)

where D is the distance matrix whose pair-wise entries are the minimum number of edges required to form a path between v_i and v_j. Degree centrality [110] quantifies a node’s importance based on the number of direct connections:

C_D(i) = k_i / (N − 1).    (3.3)

Finally, eigenvector centrality, introduced by Bonacich [111], quantifies a node’s importance based on the principal eigenvector of the adjacency matrix:

C_E = (1/λ) A C_E    (3.4)

where λ is the dominant eigenvalue of adjacency matrix A and C_E is the eigenvector associated with λ, such that C_E(i) is the centrality score for v_i.
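Three of the four centrality measures above translate directly into code; a hedged sketch for a connected binary adjacency matrix is given below (betweenness is omitted, since eq. (3.1) requires shortest-path counting, e.g. via Brandes' algorithm). The power iteration runs on A + I, which shares A's eigenvectors but converges even on bipartite graphs.

```python
import numpy as np
from collections import deque

def degree_centrality(A):
    """Eq. (3.3): C_D(i) = k_i / (N - 1)."""
    n = A.shape[0]
    return A.sum(axis=1) / (n - 1)

def closeness_centrality(A):
    """Eq. (3.2): C_C(i) = 1 / sum_j D(i, j), with D obtained by
    breadth-first search (a connected graph is assumed)."""
    n = A.shape[0]
    D = np.zeros((n, n))
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in np.nonzero(A[u])[0]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            D[s, v] = d
    return 1.0 / D.sum(axis=1)

def eigenvector_centrality(A, iters=200):
    """Eq. (3.4): power iteration for the principal eigenvector,
    shifted by the identity (A + I) to guarantee convergence."""
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x + x
        x = x / np.linalg.norm(x)
    return x
```

On a star graph, for instance, all three measures agree that the center is the hub, which matches the intuition behind eq. (3.3)-(3.4).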
Among these four centrality measures, eigenvector centrality is the most reliable indicator of true graph connectivity [111] and is similar to a commonly used signal processing method, principal component analysis (PCA) [112]. Bonacich argued that a centrality measure based on degree, e.g. closeness, betweenness, and degree centrality, weights every edge connection in the network equally, while the eigenvector weights edge connections according to their true centralities. Furthermore, eigenvector centrality is a weighted sum of not only direct connections, as is the case for degree centrality, but indirect connections of every length, such that it accounts for the entire underlying connection pattern across the network.

3.2.2 Provincial and Connector Hubs

Another method for evaluating the centrality of a node is to classify it as either a provincial hub, connector hub, or non-hub [113] according to its role in the community structure. A dense group of highly connected nodes in a network is referred to as a cluster. A node is considered provincial if it is primarily responsible for linking nodes together within a cluster. A connector hub is a node which links together multiple clusters and is located along the inter-edges of two or more clusters. A non-hub does not demonstrate any important leadership role in maintaining network connectivity and may be an outer node of a cluster. Node categorization is performed by computing a within-cluster score, z_i, and a participation coefficient, P_i. Nodes with similar roles are expected to have similar relative within-cluster connectivity. A cluster set is defined as C = {c_1, c_2, ..., c_s}, where c_j ⊂ V and s is the number of clusters in the community structure. If Φ(i) = {c_j | v_i ∈ c_j}, then

z_i = (κ_i − κ̄_Φ(i)) / σ_κ,Φ(i)    (3.5)

where κ_i is the number of edges shared between v_i and nodes within its cluster, i.e.
intra-cluster edges for node v_i, \bar{\kappa}_{\Phi(i)} is the average of \kappa_i over all the nodes in \Phi(i), and \sigma_{\kappa_{\Phi(i)}} is the standard deviation of \kappa_i over those nodes. The within-cluster degree z-score measures how well-connected node v_i is to the other nodes in its cluster. Roles are also determined by connections between clusters. For example, two nodes with the same z-score will play different roles if one of them is connected to several nodes in other clusters while the other is not. The participation coefficient P_i of node v_i is

P_i = 1 - \sum_{j=1}^{s} \left( \frac{\kappa_{i c_j}}{k_i} \right)^2    (3.6)

where \kappa_{i c_j} is the number of links of v_i to nodes in cluster c_j and k_i is the total degree of node v_i. The participation coefficient of a node is therefore close to 1 if its links are uniformly distributed among all the clusters and 0 if all its links are within its own cluster. Guimerà et al. [113] describe a method to classify a node as a provincial hub, connector hub, or non-hub based on the value of the node's within-cluster score and participation coefficient. For example, a node with a within-cluster score greater than 2 and a participation coefficient less than 0.3 is categorized as a provincial hub. However, the problem with the classification technique of [113] is that a priori knowledge about cluster membership is required. Therefore, a reliable clustering algorithm is first necessary to identify clusters and their nodal memberships.

3.3 Graph Energy

In this section, current definitions of graph energy are presented. This is followed by the introduction of a new graph energy metric based on the Laplacian matrix and an algorithm for detecting local and global structural anomalies in a graph.

3.3.1 Existing Measures of Graph Energy

Energy is a well-defined metric in signal processing for quantifying the norm of a finite signal. To measure the stability and reliability of connectivity in a graph, a global energy measure was first defined by Gutman et
al. [114] as

E(G) = \sum_{i=1}^{n} |\lambda_i|    (3.7)

where \lambda_1, \lambda_2, ..., \lambda_n are the eigenvalues of the adjacency matrix. If a graph is not connected, its energy is the sum of the energies of its connected components; therefore, isolated vertices do not contribute to or affect the total energy. Energy is affected, however, by the number of edges in the graph. Day et al. [115] report that graph energy changes if edges are removed from or added to the graph. Therefore, if a node which is linked to the network by at least one edge is removed, then the energy will change. Another measure of energy, Hückel energy, was introduced by Coulson [116] to determine the energies of molecular orbitals in hydrocarbon systems. The vertices of the graph associated with a given molecule are in one-to-one correspondence with the carbon atoms of the hydrocarbon system. Hückel energy, defined as

HE(G) = \begin{cases} 2 \sum_{i=1}^{r} \lambda_i & \text{if } n = 2r \\ 2 \sum_{i=1}^{r} \lambda_i + \lambda_{r+1} & \text{if } n = 2r + 1, \end{cases}    (3.8)

ensures that the total π-electron energy of a conjugated hydrocarbon is simply the energy of the corresponding molecular graph. Balakrishnan [117] argues that Hückel energy provides a better approximation of the total π-electron energy than Gutman's definition. This is attributed to using only a subset of the leading eigenvalues of the adjacency matrix. This approach can be thought of as revealing the internal structure of the data such that the variance in the data is best explained [118]. Coulson's approach is also supported by Allefeld et al. [119], who demonstrated that the leading eigenvalues correspond more closely to the strength of local network connectivity than the trailing eigenvalues. The information revealed by Hückel energy, however, is limited, since the spectral information of the adjacency matrix has been shown to be less indicative of structural content than that of the Laplacian matrix [120]. Furthermore, these measures of graph energy do not evaluate the local properties of graphs.
Individual nodes or subgraphs may be responsible for the energy content of a graph due to their local connections. In the next section, an extension of Hückel energy, Laplacian Hückel Energy, is presented along with an algorithm for computing local energy indices.

3.3.2 Laplacian Hückel Graph Energy

It has been shown that the Laplacian matrix retains greater structural information of a graph than the adjacency matrix [120, 121], where the Laplacian matrix of a graph is defined as

L_{ij} = \begin{cases} k_i & \text{if } i = j \\ -1 & \text{if } A_{ij} = 1 \\ 0 & \text{otherwise.} \end{cases}    (3.9)

Therefore, an energy measure computed from the eigenspectrum, (\mu_1 \geq \mu_2 \geq ... \geq \mu_n), of the Laplacian matrix is proposed. The Laplacian matrix is positive semi-definite and its eigenvalues are all nonnegative. While the adjacency matrix eigenvalues have a zero-mean distribution, i.e. \frac{1}{n} \sum_{i=1}^{n} \lambda_i = 0, the Laplacian matrix does not share this property. Computing the trace of the Laplacian matrix as

tr(L) = \sum_{i} L_{ii}    (3.10)
      = \sum_{i}^{n} \sum_{j}^{n} A_{ij}    (3.11)
      = \sum_{i}^{n} k_i    (3.12)

and noting that \sum_{i=1}^{n} \mu_i = tr(L), it can be shown that \frac{1}{n} \sum_{i=1}^{n} \mu_i = \frac{1}{n} \sum_{i=1}^{n} k_i, i.e. the average degree of the network is equivalent to the average of the Laplacian eigenvalues. The average is subtracted from each of the Laplacian eigenvalues to render a zero-mean distribution such that \bar{\mu}_i = \mu_i - \frac{1}{n} \sum_{i=1}^{n} k_i. Now, \sum_{i=1}^{n} \bar{\mu}_i = \sum_{i=1}^{n} \lambda_i = 0, and the adjacency matrix eigenvalues in the original definition of Hückel energy are substituted with the new eigenvalues of the Laplacian matrix to obtain the Laplacian Hückel Energy measure defined as

LHE(G) = \begin{cases} 2 \sum_{i=1}^{r} |\bar{\mu}_i| & \text{if } n = 2r \\ 2 \sum_{i=1}^{r} |\bar{\mu}_i| + |\bar{\mu}_{r+1}| & \text{if } n = 2r + 1. \end{cases}    (3.13)

This new definition of graph energy incorporates the advantage that most structural information is contained in the first few largest eigenvalues of the Laplacian matrix.
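The computation of Eq. (3.13) can be sketched in a few lines of Python; the function name and the 3-node path example are illustrative, not part of the original formulation.

```python
import numpy as np

def laplacian_hueckel_energy(A):
    """Laplacian Hueckel Energy, Eq. (3.13): the Hueckel half-spectrum
    sum applied to the mean-centred eigenvalues of the graph Laplacian."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    mu = np.sort(np.linalg.eigvalsh(L))[::-1]  # mu_1 >= ... >= mu_n
    mu = mu - mu.mean()                        # zero-mean spectrum (mu-bar)
    r = n // 2
    e = 2 * np.abs(mu[:r]).sum()
    if n % 2 == 1:                             # odd order: add the middle term
        e += np.abs(mu[r])
    return e

# Example: the 3-node path graph, whose Laplacian spectrum is {0, 1, 3}.
A = np.zeros((3, 3))
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1
lhe = laplacian_hueckel_energy(A)
```

For the path graph the centred spectrum is {5/3, -1/3, -4/3}, giving LHE = 2·(5/3) + 1/3 = 11/3.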
It is also expected to be more discriminative of energy content in the graph than Gutman's and the conventional Hückel energy, since the spectrum of the Laplacian matrix is superior at representing structural connectivity to that of the adjacency matrix.

3.3.3 Localized Laplacian Hückel Graph Energy

In this section, the lack of signal processing inspired local graph indices is addressed by extending the proposed Laplacian Hückel Energy to evaluate the energy content of nodes and their neighborhoods across the network. This index can be used to quantify the importance of a node based on the magnitude of its contribution to total graph energy. The h-hop neighborhood of a node v_i is defined as H(i) = {v_j \in V | D(i, j) \leq h}, where h \in {1, 2, 3, ..., d}. Therefore, V_h(i) = v_i \cup H(i) and E_h(i) = {e_{k\ell} \in E | v_k, v_\ell \in V_h(i)}, where G_h(i) = (V_h(i), E_h(i)) is the subgraph of G containing node v_i and its h-hop neighbors. To define the nodal energy contribution of a node, the subgraph G_h(i) formed by the node of interest v_i and all of its h-hop neighbors is considered. This limits the energy dependencies of G to the internal structure of G_h(i) only. Removing a node or subgraph and computing the effect on the original network's energy is not an accurate representation of energy contribution, since the original network's total energy is not the sum of its subgraphs' individual energies. Therefore, instead of removing the nodes and edges of G_h(i), randomizations of G_h(i)'s internal structure are considered, i.e. changing the position of edges, to determine how a perturbation in the local neighborhood of node v_i affects the energy of the whole graph G. The number of nodes, n = |V_h(i)|, and edges, m = |E_h(i)|, are preserved to maintain a constant Cost, where Cost = \frac{2|E|}{n(n-1)} and 0 \leq Cost \leq 1, and constant graph order across randomizations. As Cost approaches 1, fewer permutations of the subgraph are possible.
When Cost = 1, G_h(i) is complete and therefore cannot be permuted. In this case the node is deemed redundant and unnecessary for maintaining the structural integrity of the network, i.e. it contributes zero energy to G since removing it will not change connectivity in the graph. A permuted G_h(i) will be identified as PG_h(i). If the total possible number of edges in the subgraph is \alpha = \frac{n(n-1)}{2}, then the number of possible PG_h(i)'s is

|PG_h(i)| = \binom{\alpha}{m} - 1.    (3.14)

Therefore, a new subgraph, PG_h^r(i), is formed from each permutation, where the integer r \in {1, 2, 3, ..., |PG_h(i)|}. The Laplacian Hückel Energy of the new graph G^r, where G^r = G \setminus G_h(i) \cup PG_h^r(i), is evaluated for each randomization of G_h(i), and an average change in energy is computed. The node associated with the greatest change in energy is deemed the most important node for maintaining structural integrity, as long as the following constraints are adhered to during the randomization of the subgraphs:

• Every G^r generated must be a connected graph, i.e. no isolated vertices or multiple components.
• G_h(i) is connected, but the permuted version, PG_h(i), does not need to be connected.

The importance of node v_i, associated with the subgraph G_h(i), in preserving the structural connectivity of G is proportional to |LHE(G) − LHE_rand|, i.e. the larger the difference between the two energies within a particular h-hop neighborhood, the more important the node is to the network.

Algorithm 3 Nodal contribution to graph energy
1: Input: Graph G = (V, E).
2: Output: d × n matrix β containing nodal energy contributions.
3: n = |V|.
4: Compute the n × n distance matrix, D, of G.
5: for i = 1 to n do
6:   for h = 1 to d do
7:     Find G_h(i).
8:     m = |E_h(i)|, n' = |V_h(i)|.
9:     α = n'(n' − 1)/2.
10:    for r = 1 to |PG_h(i)| do
11:      G^r = G \setminus G_h(i) \cup PG_h^r(i).
12:      energy(r) = LHE(G^r).
13:    end for
14:    LHE_rand = mean(energy).
15:    β_h(i) = |LHE(G) − LHE_rand|.
16:  end for
17: end for
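The neighborhood-extraction step of Algorithm 3 (finding G_h(i)) amounts to a breadth-first search out to h hops. A minimal sketch, assuming an unweighted symmetric adjacency matrix (the helper name is ours):

```python
import numpy as np

def h_hop_subgraph(A, i, h):
    """Return the index set V_h(i) = {v_j : D(i, j) <= h} and the
    induced subgraph G_h(i), via breadth-first search from node i."""
    n = A.shape[0]
    dist = np.full(n, np.inf)
    dist[i] = 0
    frontier = [i]
    for d in range(1, h + 1):
        nxt = []
        for u in frontier:
            for v in np.flatnonzero(A[u]):
                if dist[v] == np.inf:   # first visit fixes the hop distance
                    dist[v] = d
                    nxt.append(v)
        frontier = nxt
    nodes = np.flatnonzero(dist <= h)
    return nodes, A[np.ix_(nodes, nodes)]

# 5-node path 0-1-2-3-4: the 1-hop neighborhood of node 2 is {1, 2, 3}.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
nodes, sub = h_hop_subgraph(A, 2, 1)
```

The induced subgraph on {1, 2, 3} keeps only the two path edges 1-2 and 2-3, as required by the definition of E_h(i).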
The h-hop neighborhood determines whether the energy change is due to the center node's local or more extensive/global relationships.

3.4 Graph Entropy

Entropy is another classic measure commonly used in signal processing and information theory. In this section, current definitions of graph entropy and graph entropy rate are presented. This is followed by the introduction of a new graph entropy rate measure based on modeling the adjacency matrix as a Markov process. This new definition of graph entropy is utilized in an algorithm designed to quantify the complexity of graph subregions as well as to detect local anomalies.

3.4.1 Existing Measures of Graph Entropy

In information theory [122], entropy is a measure of the uncertainty associated with a random variable. The original definition of graph entropy was introduced by Körner [123] and quantifies a lower bound on the complexity of graphs. It is defined as H(G) = min_{X,Y} I(X; Y), where the minimum is taken over all pairs of random variables X, Y such that X is a uniformly selected vertex in G and Y is an independent set of vertices containing X. Similar to the basic properties of entropy in information theory, if G contains no edges then H(G) = 0, and if G is complete, such that every node is connected to all of the other nodes, then H(G) = log_2(n), where n is the cardinality of V, the number of nodes. Implementation of Körner's definition of entropy, however, is NP-hard, which makes its evaluation for real-world networks impractical. A more recent definition for computing entropy in real-world networks was proposed by Dehmer et al. [124]. Dehmer proposed to quantify the complexity of a graph using Shannon's definition of entropy such that the probability distribution is computed from a metrical property or graph invariant, e.g. node degree or closeness centrality.
Entropy is defined as

I_f(G) = -\sum_{i=1}^{|V|} \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \log \left( \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \right)    (3.15)

where f is an arbitrary information functional. The set S_j(v_i, G) = {v \in V | d(v_i, v) = j, j \geq 1} is called the j-sphere of v_i regarding G, similar to the previously defined h-hop neighborhood, and d(v_i, v) denotes the shortest distance between nodes v_i and v. For example, the information functional computed from node degree is

f^V(v_i) = \alpha^{c_1 |S_1(v_i, G)| + c_2 |S_2(v_i, G)| + ... + c_\rho |S_\rho(v_i, G)|}    (3.16)

where c_k \geq 0 is an arbitrary real positive coefficient, 1 \leq k \leq \rho, \rho is the maximum j-sphere level evaluated, and \alpha > 0. Dehmer's method is a good approach to incorporating local connectivity information to compute overall graph entropy. However, there are three noticeable disadvantages of Dehmer's method. First, the many variables in the information functional and the sensitivity to the type of graph metric incorporated have a biased impact on entropy: entropy will be significantly different across varying \alpha and graph metrics. Second, Dehmer's solution evaluates random realizations of c_k which have no actual significance or relationship to the structure of the network. This also makes it difficult to compare across multiple graphs if the same realizations of c_k are not used. Finally, as discussed in Section 3.2, node degree is a weak metric for understanding the structural connectivity of the network. The entropy measure is dependent on this graph metric, and therefore further research is necessary to determine which is the 'best' graph metric for representing the graph. Another local entropy measure was proposed by Shetty et al. [125] for identifying the local hubs in the social network of the Enron corporation. Shetty computed a probability distribution based on the frequency of certain e-mail correspondences within the company.
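As a minimal sketch of these Shannon-type graph entropies, Eq. (3.15) can be evaluated with node degree standing in for the information functional (the simplest, if weakest, choice discussed above; the function name and K4 example are ours):

```python
import numpy as np

def graph_entropy(A, f):
    """Shannon-type graph entropy, Eq. (3.15): normalise an information
    functional f over the vertices into a probability distribution."""
    p = f / f.sum()
    return -np.sum(p * np.log2(p))

# Illustrative functional: node degree. For the complete graph K4 the
# degrees are uniform, so the entropy attains its maximum log2(4) = 2 bits.
A = np.ones((4, 4)) - np.eye(4)
H = graph_entropy(A, A.sum(axis=1))
```

This reproduces the limiting behaviour noted earlier: a uniform functional yields the maximal entropy log_2(n).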
The novelty is not in the definition of entropy, since Shannon's definition is implemented directly, but rather in the procedure for evaluating the contribution of individual nodes to the entropy of the entire network. Nodes which have the greatest effect on graph entropy when they are removed are thought to be important to network connectivity. Next, measures for computing the entropy rate of a graph are discussed. The first was presented in an application by Demetrius et al. [126], who implemented the Kolmogorov-Sinai entropy rate by evaluating random walks along the graph. This measure of graph entropy rate, formalized by Burda et al. [127], computes the entropy rate of a graph as log \lambda_1, where \lambda_1 is the largest eigenvalue of the adjacency matrix. Sinatra et al. [128] recently applied Burda's definition to quantify the maximum level of information diffusion across a network. Burda's measure, however, is plagued by a problem similar to that observed in Dehmer's measure: a dependence on node degree, which is the weakest measure of network connectivity [129]. Finally, Szpankowski et al. [130] proved that the structural entropy rate of an Erdös-Rényi random graph is

\binom{n}{2} h(p) - n \log n + O(n)

where h(p) = −p log p − (1 − p) log(1 − p) is the entropy of a conventional memoryless binary source. Szpankowski is the first to present a graph coding scheme based on a measure of entropy rate. However, entropy is computed under the assumption that all graphs of order n and size m are equiprobable and that edges are formed with uniform probability. This is not the case in real-world networks, which typically follow a power-law degree distribution. Burda's entropy rate is generalizable to any graph but is not suitable for evaluating graph compressibility, since a graph cannot be uniquely reconstructed solely from knowledge of its degree sequence. Szpankowski offers a novel compression method for graphs; however, it is developed from examining random graphs only.
This motivates the need for a new entropy measure which considers the unique structure across an array of graph topologies, not only random graphs, and which can also lead to an appropriate and practical graph coding algorithm.

3.4.2 Proposed Graph Entropy Rate

In this section, a new measure of graph entropy rate is proposed for unweighted (binary) and weighted graphs by modeling their adjacency and connectivity matrices as a Markov process. The adjacency matrix is directly related to the structure of a graph, and therefore, intuitively, the spatial coherence of elements in the matrix will reflect the pairwise relationships among nodes in the network. For example, nodes in a random graph do not have an organized relationship, and it is therefore expected that this random relationship will be reflected in the lack of spatial coherence among elements in the adjacency matrix. To examine the variation in adjacency matrix element dependencies, similar to pixel dependencies in an image [131, 132, 133], it is proposed that the adjacency matrix be modeled as a Markov process. The entropy rate of a Markov process X_1, X_2, ..., X_r is formally defined as

H(\chi) = -\sum_{ij} \pi_i P_{ij} \log_2 P_{ij}    (3.17)

where P = [P_{ij}] is the transition probability matrix such that i, j = 1, 2, ..., n, P_{ij} = Pr(X_{r+1} = j | X_r = i), and \pi is the stationary distribution. In this section, we first introduce a new definition of the entropy rate of a binary graph, H(G), by transforming the binary adjacency matrix into a stochastic process X_1, X_2, ..., X_r, for r = 1, 2, ..., through permutation and scanning operations. If X is an mth-order Markov process, we define the general form of the entropy rate for a graph as

H(G; m) = \min_{Z, \psi} H_X(Z A Z^T; \psi)    (3.18)

where Z is a permutation matrix applied to the adjacency matrix, A, and \psi is the particular scanning function of the upper triangular matrix elements. These two variables define our transition matrix, P.
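Equation (3.17) itself is straightforward to compute once the transition matrix is in hand. A minimal sketch (the two-state example is illustrative and not drawn from the graph construction above):

```python
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate of a stationary Markov chain, Eq. (3.17):
    H = -sum_ij pi_i P_ij log2 P_ij, where pi is the stationary
    distribution (left eigenvector of P for eigenvalue 1)."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi = pi / pi.sum()
    # 0 log 0 is taken as 0 for absent transitions.
    logP = np.log2(P, out=np.zeros_like(P), where=P > 0)
    return -np.sum(pi[:, None] * P * logP)

# Binary chain that flips state with probability 0.1: its entropy rate
# equals the binary entropy h(0.1) of the flip probability.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
H = markov_entropy_rate(P)
```

For the fully random chain (all transition probabilities 1/2) this returns 1 bit per symbol, the maximum for a binary source, which matches the intuition that a fully incoherent adjacency scan is maximally complex.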
Equation 3.18 can be extended to evaluate the entropy characteristics of a weighted graph such that

H^w(G; m) = \min_{Z, \psi, C_Q} H_X(Z W Z^T; \psi; C_Q)    (3.19)

where C_Q is a codebook generated by a quantizer.

3.4.2.1 Matrix Bandwidth Minimization

For a graph, G, composed of n nodes, there exist n! permutations, i.e. isomorphisms, of G. The structural information of the graph is retained across all isomorphisms of G. Therefore, to maximize the spatial coherence of the adjacency matrix elements, similar elements in the adjacency matrix are grouped together by implementing an optimization technique, matrix bandwidth minimization [134]. A labeling f of G assigns the integers {1, 2, ..., n} to the vertices of G [135]. Let f(v) be the label of vertex v, where each node has a different label. The bandwidth minimization function generates a permutation matrix, Z, an n-order binary orthogonal square matrix in which each row and column contains precisely a single 1 and zeros everywhere else. The permuted versions of the binary adjacency and weighted connectivity matrices are obtained, respectively, by Z A Z^T and Z W Z^T. For binary graphs, the bandwidth of a node v_i, B_f(v_i), is the maximum of the differences between f(v_i) and the labels of its adjacent nodes. That is,

B_f(v_i) = \max_{v_j \in N(v_i)} |f(v_i) - f(v_j)|    (3.20)

where N(v_i) = {v_j : A_{ij} = 1} for j = {1, 2, ..., n}. The bandwidth of a binary graph G with respect to a labeling f is then B_f(G) = \max_{v_i \in V} B_f(v_i). For weighted graphs, the weighted graph bandwidth minimization (WGBM) [134] cost function identifies a permutation of the connectivity matrix which places similar weights close together. A labeling f of a weighted graph G assigns positive integers to the vertices of G. Let f(v) \in {1, 2, ..., n} be the label of vertex v, where each node has a different label. The objective of the WGBM is to minimize the maximum weighted label difference of an edge, i.e.
the edge weight times the label difference. That is,

B_f(v_i) = \max_{v_j \in N(v_i)} \{ w_{ij} \times |f(v_i) - f(v_j)| \}.    (3.21)

For a fully connected graph, e.g. pairwise phase synchrony matrices, N(v_i) = V \setminus v_i, where V is the set of all vertices in the graph. The bandwidth of a weighted graph G with respect to a labeling f is then B_f(G) = \max_{v_i \in V} B_f(v_i). Z is generated by reordering the rows and columns of an identity matrix according to the optimal labeling of the unweighted or weighted bandwidth minimization cost function, which can be based on node degree [136], clustering algorithms [137], and centrality measures [110]. For example, DBSCAN is a density-based clustering algorithm proposed by Ester et al. [138] which applies a Euclidean distance measure and identifies a number of clusters, for binary and weighted graphs, starting from the estimated density distribution of the corresponding nodes. The advantage of this method is that a priori knowledge of the number of expected clusters is not required. Van Dongen [139] introduced a fast and scalable unsupervised clustering algorithm, the Markov Cluster Algorithm (MCL), for binary and weighted graphs based on the simulation of random flow in graphs. Ding et al. [140] introduced a spectral bandwidth minimization method (Spectral BWM) which is a cross between clustering and node centrality measures. The permutation matrix Z is formed by assigning labels to nodes based on their co-occurrence in a cluster and can maximize spatial coherence in the adjacency matrix. However, bandwidth minimization cost functions based on these three clustering algorithms are primarily suitable for highly modular networks only. An alternative and much more generalizable cost function is the Cuthill-McKee algorithm [141]. It is a variant of the breadth-first search of graphs and assigns labels based on the degree relationship of adjacent nodes.
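As a sketch of the effect of such a reordering (assuming SciPy is available; its csgraph module ships a reverse Cuthill-McKee implementation), a deliberately badly labelled ring graph has its bandwidth reduced:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Matrix bandwidth: max |i - j| over nonzero entries, i.e. Eq. (3.20)
    evaluated for the labelling implied by the current node order."""
    i, j = np.nonzero(A)
    return int(np.abs(i - j).max())

# Ring graph: the edge between nodes 0 and n-1 forces a bandwidth of n-1
# under the natural labelling.
n = 8
A = np.zeros((n, n))
for k in range(n):
    A[k, (k + 1) % n] = A[(k + 1) % n, k] = 1

perm = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
A_perm = A[np.ix_(perm, perm)]  # equivalent to Z A Z^T for this labelling
```

The BFS-based relabelling places adjacent ring nodes near each other, so the permuted matrix concentrates its nonzeros close to the diagonal.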
An extension of the Cuthill-McKee algorithm to weighted graphs is proposed by Lin et al. [142]. The weighted algorithm has a complexity of O(n^3), where n is the number of vertices in the graph, and is generalizable to all weighted graphs. Walker et al. [136] introduced Approximate Minimum Degree (AMD) for the purpose of reordering a symmetric sparse matrix prior to numerical factorization. The degree may be the binary or the weighted definition. Again, Z is generated such that it satisfies the reordering conditions set by Cuthill-McKee or AMD reordering. In general, Cuthill-McKee is far superior to AMD in minimizing the matrix bandwidth [143].

3.4.2.2 Scanning Function of the Permuted Matrix

The upper triangular elements of the permuted binary adjacency matrix are mapped to a 1-dimensional vector of dimension 1 × k, where k = \frac{n(n-1)}{2} and n = |V|. In a binary sequence for a first-order Markov process, there exist 4 possible state transitions: 0 → 0, 0 → 1, 1 → 0, and 1 → 1. The states 0 → 0 and 1 → 1 are non-transitional sub-sequences, and 0 → 1 and 1 → 0 are transitional sub-sequences. The permutation matrix, Z, plays a major role in the entropy rate measure, H(G), since the entropy rate bounds are dictated by the frequency of occurrence of these sub-sequences, i.e. the magnitude of spatial coherence among adjacent elements. Entropy will be minimized if similar states are positioned next to each other in the permuted connectivity matrix, since the homogeneity of these sub-sequences is increased. A scanning function is implemented to extract this sequence from the permuted adjacency matrix. The scanning function, \psi, preserves the spatial coherence of the permuted 2-dimensional adjacency matrix when it is mapped to the 1-dimensional vector. This approach facilitates convergence to the true lower bound on the entropy rate. In image processing, there are various methods for scanning the pixel values into a vector.
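Two such scanning functions, an alternating ('boustrophedon') raster scan and a Morton Z-order scan, can be sketched as follows; this is a minimal illustration for small power-of-two matrices, and the helper names are ours.

```python
import numpy as np

def raster_scan(M):
    """Alternating-horizontal raster scan: even rows left-to-right,
    odd rows right-to-left."""
    rows = [M[r] if r % 2 == 0 else M[r][::-1] for r in range(M.shape[0])]
    return np.concatenate(rows)

def morton_scan(M):
    """Morton Z-order scan: visit cells sorted by the value obtained by
    interleaving the bits of their (row, col) coordinates."""
    n = M.shape[0]
    def z(r, c):
        v = 0
        for b in range(n.bit_length()):
            v |= ((r >> b) & 1) << (2 * b + 1)  # row bits at odd positions
            v |= ((c >> b) & 1) << (2 * b)      # col bits at even positions
        return v
    cells = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: z(*rc))
    return np.array([M[r, c] for r, c in cells])

M = np.arange(16).reshape(4, 4)
r = raster_scan(M)
m = morton_scan(M)
```

On the 4 × 4 example, the Morton scan visits the matrix one 2 × 2 block at a time, which is why it preserves block locality better than a plain raster scan.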
Four of the most common approaches are: raster scanning [144], Cantor diagonal scanning [145], Morton Z-order scanning [146], and Hilbert curve scanning [147]. Raster scanning extracts elements of the matrix in either an alternating horizontal pattern or an alternating vertical pattern. Cantor diagonal scanning extracts elements along the diagonals of the matrix. Both raster and Cantor diagonal scanning preserve the locality of element pairs in the matrix but fail to maximally preserve locality among blocks of elements. Morton Z-order scanning is a function which maps multidimensional data to a one-dimensional vector while maximally preserving the spatial locality of the data points. The z-value of a point in multiple dimensions is calculated by interleaving the binary representations of its coordinate values. The scanning pattern resembles a group of 'Z's linked to each other. Morton Z-order scanning is better suited than raster scanning for grouping regions of similar pixels in large blocks of an image [148]. The Hilbert curve is a continuous fractal space-filling curve whose discrete approximation is useful because it provides a mapping between 1D and 2D space that preserves locality. If (x, y) are the coordinates of a point within the unit square and d is the distance along the curve when it reaches that point, then points that have nearby d values will also have nearby (x, y) values. Kamata et al. [145] report that the neighborhood preservation of the Hilbert curve scan is better than that of the raster, Cantor diagonal, and Morton Z-order scanning functions.

3.4.2.3 Quantization of the Weighted Connectivity Matrix

Prior to matrix permutation and scanning of the weighted connectivity matrix, quantization is applied across the connectivity matrix in order to establish a finite alphabet of states, Q. Matrix elements are assigned the value pertaining to their level of quantization.
A non-uniform quantizer is best suited for determining these levels, since the distribution of values in the connectivity matrix of a real-world network is typically non-uniform, e.g. most real-world networks have a power-law distribution of their weighted node degrees. Therefore, non-uniform quantization is more generalizable to connectivity matrices representing a variety of graph topologies than uniform quantization. For a desired number of quantization levels, the Lloyd-Max algorithm [149] offers an optimal solution for minimizing distortion, typically quantified by the mean-square quantization error.

3.4.3 Localized Graph Entropy Rate

In this section, the proposed graph entropy rate measure is extended to evaluate the local complexity of a network. The connectivity topology centered around individual nodes is expected to have an impact on overall network complexity. An algorithm is presented here which quantifies the magnitude of importance of individual nodes based on their contribution to network entropy. The proposed graph entropy measure is first applied to the original adjacency or connectivity matrix of graph G. Next, a node v_i is selected from the network and its corresponding row and column are removed from the matrix, i.e. all edges incident upon v_i in G are deleted. Once again, the proposed graph entropy measure is applied to the transformed graph. The process is formally described in Algorithm 4. This is repeated for each node to obtain n entropy difference measures. Nodes that contribute significantly to network complexity are expected to be associated with a large difference in entropy. The algorithm is applicable to both binary and weighted graphs under the condition that the appropriate graph entropy measure, H(G) or H^w(G), is implemented.

Algorithm 4 Local Graph Entropy
Input: An n-order graph G = (V, E).
Output: h_i ∀ v_i ∈ V.
1: h ← H(G).
2: for i = 1 to n do
3:   G' = G \setminus v_i.
4:   h' ← H(G').
5:   h_i = h − h'.
6: end for
7: return h_i ∀ v_i ∈ V

3.5 Results

In this section, the proposed Laplacian Hückel Energy and Markov Graph Entropy Rate measures are applied to real-world networks. The performance of these measures, as well as of the local indices based on them, is compared to published measures. The real-world networks evaluated are the Zachary karate network and the brain networks represented by the phase synchrony matrices obtained from the ERN EEG study.

3.5.1 Comparison Between Energy Measures

In this section, the local energy index using the proposed Laplacian Hückel Energy is compared to the local indices of Gutman's and the conventional Hückel energies. These three measures are applied to the Zachary karate network (Figures 3.1, 3.2, and 3.3). The Laplacian Hückel Energy was more discriminative than the other two energy measures. In the 1-hop neighborhood, only the Laplacian Hückel Energy associated very high energy contributions with nodes 1 and 33, the dojo leaders of the two karate groups in the network. In the 2-hop neighborhood, the Laplacian Hückel Energy was much more discriminative than the other two measures and identified nodes 3, 9, 14, 20, 30, and 31 as the nodes which contributed most to network energy. These particular nodes are responsible for long-range connectivity since they represent the karate members associated with both groups in the karate network. The set of nodes identified as contributing most to energy by the Gutman and Hückel energy measures includes those identified by the Laplacian Hückel Energy but also nodes whose long-range connections are not necessary for maintaining connectivity across the network. These represent the karate members who are primarily associated with one karate group but not both. In the 3-hop neighborhood, the Gutman, Hückel, and Laplacian Hückel energies associate high energy contributions with most of the nodes.
Only the Laplacian Hückel Energy indicates node 17 as less important for maintaining connectivity, since it is the most distant node and its removal would not deteriorate overall connectivity. Therefore, the Laplacian Hückel Energy is a much more discriminative local energy measure than the Gutman and Hückel energies in the karate network. Next, the nodal Laplacian Hückel energies are compared to the betweenness centrality (B_i) and conventional clustering coefficient (C_i) of nodes in the karate network (Table 3.1). The conventional clustering coefficient fails to recognize the importance of nodes 1, 33, and 34, assigning them very low clustering coefficient values. This is due to the clustering coefficient only considering the interdependent relationships among nodes within the 1-hop neighborhood rather than the relationships a node has with the entire graph and its effect on the global energy of the network. The ranking of betweenness centrality agrees with the proposed measure for the nine nodes shown in Table 3.1. However, it does not indicate whether a node is important at a local or a more global scale.

Table 3.1: Highest energy contributing nodes in the karate network compared with their corresponding clustering coefficient and betweenness centrality

Node | h-hop | Energy Contribution | C_i  | B_i
34   | 1     | 16.7                | 0.11 | 0.69
1    | 1     | 11.2                | 0.15 | 1.00
33   | 1     | 10.3                | 0.19 | 0.33
14   | 2     | 26.3                | 0.60 | 0.10
3    | 2     | 24.6                | 0.24 | 0.32
9    | 2     | 21.1                | 0.50 | 0.12
20   | 3     | 22.5                | 0.33 | 0.07
32   | 3     | 21.7                | 0.20 | 0.31
2    | 3     | 21.4                | 0.33 | 0.12

3.5.2 Laplacian-Hückel Energy of a Biological Network: EEG ERN Data

The local energy measure was applied to the ERN data set to determine which regions of the brain contributed most to network energy. An average matrix was first computed for both Error and Correct data across all subjects. Next, a threshold was applied to each average matrix to derive a sparsely connected weighted graph such that the average degree was k = 10.
This value of k was chosen such that varying h-hop levels could be evaluated. Evaluating energy for the average matrix without thresholding is equivalent to only one neighborhood containing all nodes, i.e. the 1-hop neighborhood, which did not provide useful localized energy information. A value of k = 10 allowed 5 different neighborhood levels to be observed, thus demonstrating how increases in energy contribution propagated across neighborhoods of increasing size or scale. Energy was reevaluated for graph mappings of the ERN data with average degrees of k = 40, 30, 20 to determine the extent to which k influenced energy in the biological network. Increasing k resulted in a larger set of h-hop neighborhoods, i.e. beyond h = 5, yet the variations in nodal energy were significant for only the h = 1, h = 2, and h = 3 neighborhoods. Therefore, the biological significance of the graph was evaluated with k = 10.

Figure 3.1: Local energy indices ((a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy) for the n = 34 karate club network: 1-hop energy analysis.

Figure 3.2: Local energy indices ((a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy) for the n = 34 karate club network: 2-hop energy analysis.

Figure 3.3: Local energy indices ((a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy) for the n = 34 karate club network: 3-hop energy analysis.

Figures 3.4 and 3.5 reveal that the brain regions most influential to network energy contribution are located around the frontal and frontal/central regions of the brain. Previous work indicates that the mPFC is expected to be more influential to communication across the brain during an error response than a correct response. This is shown to be true in the 1-hop neighborhood of Error − Correct energy contribution differences (Figure 3.4). FCz, FC2, and FC4 demonstrated significantly (p ≤ 0.01, Wilcoxon rank sum) greater energy contributions to the network during error responses compared to correct responses.
[Figure 3.4: 1-hop neighborhood Energy differences (Error − Correct).]

The 2-hop neighborhood demonstrated significantly greater energy during error responses (p ≤ 0.04, Wilcoxon Rank Sum) in the lateral regions of the brain, particularly for the right lateral nodes. This may be indicative of nodes in the lPFC communicating with distant neural regions due to activation of the ACC. Increased activation is also observed in the occipital region but is expected since the data was collected from a visual based test. Similar anomalies are observed in the 3-hop, 4-hop, and 5-hop neighborhoods (not shown) with decreasing significance.

3.5.3 Comparison Between Local Graph Entropy Measures

In this section, the local entropy index using the proposed Markov Graph Entropy Rate is compared to the local indices using Burda’s and Szpankowski’s entropy rate definitions. First, a scale-free network composed of n = 60 nodes was generated. A subregion of the network, consisting of 17 nodes, was perturbed by randomly re-wiring the connections according to the Erdös-Rényi model with probability p = 0.5. This perturbed region contains the subset of nodes indicated by dark black node borders in Figure 3.6. The three local indices were computed for both the simulated network (Figure 3.6) and Zachary’s karate network (Figure 3.7). All entropy measures are normalized by the maximum node entropy.

The graphs in Figure 3.6 demonstrate that the proposed Markov Graph Entropy Rate measure is capable of identifying the nodes in the network contributing most to network connectivity. The node members of the perturbed region are all assigned magnitudes greater than the rest of the nodes in the network. Szpankowski’s entropy generates results opposite that of the proposed entropy measure.
According to the results of Szpankowski’s entropy, the subregions of the network with a more organized structure are deemed more complex than the subregion with random connectivity. Burda’s entropy rate fails to identify the perturbed sub-network and assigns a high entropy rate to only one of the organized sub-networks. It identifies neither the perturbed sub-network, as the proposed measure does, nor all the organized sub-networks, as Szpankowski’s does. The Markov Graph Entropy Rate is therefore more discriminative than Burda’s entropy and, unlike Szpankowski’s, correctly identifies the perturbed sub-network.

[Figure 3.5: 2-hop neighborhood Energy differences (Error − Correct).]

The graphs in Figure 3.7 reveal very similar evaluation of node contribution to network complexity. Nodes 1 and 34 are identified by all three methods as being important, and in general the nodes centered in the network contributed to overall complexity. The similarities in nodal entropy contributions between all three methods may be due to the small order of the network. However, this result indicates there is consistency between the proposed entropy measure and published measures.

3.5.4 Entropy Rate Across Binary Graph Topologies

Graph entropy[1] was computed for five types of artificial graphs: a star graph, a scale-free Barabási-Albert graph, an Erdös-Rényi random graph, three modular graphs with 10, 15, and 25 clusters, and a lattice graph. Fifty instances of each graph were generated for increasing order, n ∈ {50, 250, 500, 1000}. Entropy was computed using both the proposed measure (Table 3.2) and that proposed by Burda in [127] (Table 3.3).

[Figure 3.6: Comparisons between nodal graph entropy measures for identifying anomalous regions in a simulated network with n = 60 nodes. The network is composed of four complete graphs and a random graph performing as a perturbation in the network. (a) Markov Graph Entropy Rate (b) Szpankowski (c) Burda]
Burda’s measure is computed as H(G) = log2 λ1, where λ1 is the largest eigenvalue of the adjacency matrix. Graph entropy was also computed for the following published undirected scale-free networks (Table 3.4):

1. Dolphin network: a social network of frequent associations between 62 dolphins in New Zealand [34].
2. Les Miserables: a co-appearance network of 77 characters in the novel Les Miserables [35].
3. The North American airport network: a flight connection network composed of 456 airports [36].

[Figure 3.7: Comparisons between nodal graph entropy rate measures for identifying important regions in the karate club network with n = 34 nodes. (a) Markov Graph Entropy Rate (b) Szpankowski (c) Burda]

[1] In this section, we employ our definition of the entropy rate to evaluate the graph entropy of different classes of popular graph topologies. Due to the finite nature of the graphs used in our simulations, effectively we are evaluating an idealistic measure of the graph entropy and not the true entropy rate, which theoretically requires infinitely large graphs. Our objective is still the same though, and that is to gain an insight about the entropy bound associated with different types of graph topologies.

Graph entropy was computed assuming a first-order Markov process, matrix bandwidth minimization was performed using the reverse Cuthill-McKee (RCM) algorithm [141], and the horizontal raster scanning function was implemented. Results indicated random graphs had an entropy of H ≈ 1 for all n using the proposed entropy rate. Burda’s entropy rate revealed random graphs had the largest entropy rate across all the graph topologies investigated here. This is expected since random graphs do not have an organized structure in which an underlying pattern can be identified. For the rest of these results, the entropy values for all other topologies are normalized with respect to the entropy of the random networks to allow a comparison between the proposed and Burda’s entropy rate measures.
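A simplified sketch of this scan-then-model pipeline, a horizontal raster scan of the upper triangle followed by a first-order Markov entropy-rate estimate (the bandwidth-minimization step is omitted and the function names are illustrative):

```python
import math
from collections import Counter

def raster_upper(A):
    """Horizontal raster scan of the adjacency-matrix upper triangle
    into a one-dimensional binary sequence."""
    n = len(A)
    return [A[i][j] for i in range(n) for j in range(i + 1, n)]

def markov_entropy_rate(seq):
    """Estimate H(X_t | X_{t-1}) in bits/symbol from the transition
    counts of the scanned sequence (first-order Markov model)."""
    pairs = Counter(zip(seq, seq[1:]))
    total = sum(pairs.values())
    H = 0.0
    for (a, b), c in pairs.items():
        row = sum(v for (x, _), v in pairs.items() if x == a)  # count of symbol a
        H -= (c / total) * math.log2(c / row)
    return H

# A deterministic, highly structured sequence compresses to 0 bits/symbol
print(markov_entropy_rate([0, 1] * 50))    # → 0.0
# A sequence with maximally uncertain transitions approaches 1 bit/symbol
print(markov_entropy_rate([0, 0, 1, 1] * 25))
```

The same estimator applied to `raster_upper` of a permuted adjacency matrix gives the kind of per-topology entropy reported in Table 3.2, up to the missing bandwidth-minimization and normalization steps.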
Table 3.2 reveals that both the lattice and star graphs possess trivial structures which result in a low entropy. However, the more complex structures underlying the modular, scale-free, and random graphs are expected to have a much larger entropy than star and lattice graphs. In contrast to the structure of random graphs, scale-free and modular graphs possess underlying organized structures. This permits the Cuthill-McKee algorithm to group together more 1s, i.e. generating more non-transitional sub-sequences composed of 1s. The scanned sequence results in a significantly lower entropy than a random network. Across all scale-free graphs generated, H ≈ 0.5 for low n and monotonically decreased with increased n. This is caused by the occurrence of longer non-transitional 0 sub-sequences generated by the growing network sparsity as n increases. Entropy across the modular networks is low because the individual clusters generate sub-regions of increased spatial coherence between neighboring elements in the permuted adjacency matrix. In general, the organized structures of scale-free and modular graphs are appropriately described by the lower measures of entropy. The application of our proposed measure on the real-world networks (Table 3.4) reveals they all possess entropy values similar to the simulated scale-free and modular networks. This is expected since these real-world networks are classified in the literature as scale-free, their degree distributions following power-law distributions. It is important to note that the expected complexity of each of the observed graphs is reflected by their respective entropy quantified by the proposed definition. The definition of graph entropy rate by Burda assigns large entropy values to both lattice and random graphs.
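Burda’s H(G) = log2 λ1 from Table 3.3 can be computed without a full eigensolver by a shifted power iteration; a sketch (the shift guards against oscillation on bipartite graphs such as the star, and the function name is illustrative):

```python
import math

def burda_entropy(A, iters=500):
    """Burda-style graph entropy H(G) = log2(lambda_1), with lambda_1 the
    largest adjacency eigenvalue, via power iteration on A + d*I."""
    n = len(A)
    d = max(sum(row) for row in A)  # shift by max degree keeps iterates positive
    x = [1.0] * n
    est = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) + d * x[i] for i in range(n)]
        est = max(y)                # max-norm eigenvalue estimate of A + d*I
        x = [v / est for v in y]
    return math.log2(est - d)       # undo the shift

# Star graph on n = 10 nodes: lambda_1 = sqrt(n - 1) = 3, so H = log2(3)
n = 10
A = [[0] * n for _ in range(n)]
for i in range(1, n):
    A[0][i] = A[i][0] = 1
print(burda_entropy(A))  # ≈ 1.585
```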
This measure is therefore misleading by indicating that a simple lattice is as complex as a random network. According to Sinatra et al. [128], information diffusion across a lattice and a random network will be similar since most nodes will have similar degrees. A network with a fairly homogeneous distribution of degrees will permit all paths in the network to have equal probability of being traversed by a random walk, i.e. Burda’s entropy rate will increase. This is a consequence of Burda’s measure being highly dependent upon node degree. Analysis of the real-world network entropy (Table 3.4) reveals the dolphin network can be characterized as scale-free according to Burda. However, Burda’s measure characterizes the Airport and Les Miserables networks more like that of a random network. The increased order and sparsity of these two networks may be the cause for this misclassification by Burda’s measure.

Table 3.2: Entropy computed for various graph models using the proposed measure. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.

                               n = 50        n = 250       n = 500       n = 1000
Random (Erdös-Rényi)           0.99 ± 10⁻³   1.00 ± 10⁻³   1.00 ± 10⁻³   1.00 ± 10⁻³
Modular (10 clusters)          0.46 ± 0.03   0.51 ± 0.01   0.52 ± 0.01   0.51 ± 0.01
Modular (15 clusters)          0.38 ± 0.01   0.41 ± 0.01   0.42 ± 0.01   0.43 ± 0.01
Modular (25 clusters)          0.27 ± 0.01   0.33 ± 0.01   0.34 ± 0.01   0.33 ± 0.01
Scale-Free (Barabási-Albert)   0.50 ± 10⁻³   0.46 ± 10⁻³   0.39 ± 10⁻³   0.22 ± 10⁻³
Lattice                        0.22 ± 0.01   0.06 ± 10⁻³   0.03 ± 10⁻³   0.01 ± 10⁻³
Star                           0.23          0.07          0.04          0.02

Table 3.3: Graph entropy computed for various graph models using Burda’s definition. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.
                               n = 50        n = 250       n = 500       n = 1000
Random (Erdös-Rényi)           1 ± 0.04      1 ± 10⁻³      1 ± 0.01      1 ± 10⁻³
Modular (10 clusters)          0.54 ± 0.06   0.70 ± 10⁻³   0.75 ± 10⁻³   0.78 ± 10⁻³
Modular (15 clusters)          0.45 ± 0.06   0.64 ± 0.01   0.69 ± 10⁻³   0.73 ± 10⁻³
Modular (25 clusters)          0.34 ± 0.07   0.57 ± 0.02   0.63 ± 10⁻³   0.67 ± 10⁻³
Scale-Free (Barabási-Albert)   0.53 ± 0.17   0.39 ± 0.13   0.38 ± 0.08   0.35 ± 0.08
Lattice                        0.79 ± 10⁻³   0.86 ± 10⁻³   0.88 ± 10⁻³   0.90 ± 10⁻³
Star                           0.61          0.57          0.56          0.56

Table 3.4: Entropy rates for three real scale-free networks using the proposed measure and Burda’s measure. All measures are normalized by the entropy rate of random networks with equivalent order.

                          Proposed Measure   Burda’s Measure
Dolphin (n = 62)          0.41               0.57
Les Miserables (n = 77)   0.38               0.68
Airport (n = 456)         0.65               0.99

After computing the entropy rate of each network, the upper triangular sequences were coded using the Lempel-Ziv compression algorithm [150]. Tables 3.5 and 3.6 report the achieved compression of each network. Using the universal Lempel-Ziv coding algorithm, the elements within the adjacency matrix upper triangle, with the exception of random graphs, could on average be represented with less than 1 bit. The lowest bits/symbol occurred for lattice, star, and scale-free graphs, particularly for higher graph orders. Across all n, the variations in compression are in agreement with the variations in entropy computed by our proposed measure. In general, the bits/symbol achieved by Lempel-Ziv, as compared with the computed entropy of both simulated and real-world networks, indicate that Lempel-Ziv may not be the best coding algorithm for this application. Our entropy measures indicate that the graphs can be compressed further.

Table 3.5: Bits/symbol achieved with the Lempel-Ziv algorithm.
                               n = 50   n = 250   n = 500   n = 1000
Star                           0.50     0.12      0.07      0.04
Scale-Free (Barabási-Albert)   0.52     0.14      0.08      0.04
Random (Erdös-Rényi)           1.51     1.33      1.30      1.29
Modular (10 clusters)          0.94     0.68      0.68      0.67
Modular (15 clusters)          0.69     0.58      0.58      0.56
Modular (25 clusters)          0.64     0.51      0.49      0.48
Lattice                        0.51     0.14      0.07      0.03

Table 3.6: Bits/symbol achieved using the Lempel-Ziv algorithm for three real scale-free networks.

Network                     Bits/Symbol
Dolphin (n = 62)            0.74
Les Miserables (n = 77)     0.63
Airport Network (n = 456)   0.73

3.5.5 Entropy with Respect to Permutation and Scanning Function

In this section, the proposed entropy rate was evaluated across graph topologies, each with 100 node members, by varying the combinations of permutation cost functions and scanning functions. For the analysis, 50 simulations of each graph topology were generated. The plots in Figures 3.8 through 3.10 reflect the average entropy rate for each possible permutation/scanning function pair. The permutation functions tested include MCL, Spectral BWM, DBscan, Cuthill-Mckee, and AMD. The scanning functions include the horizontal raster, Cantor, diagonal, vertical raster, Morton, and Hilbert scans.

According to the plots associated with random and star graphs, the entropy rate remained fairly similar across all possible combinations. Therefore, the choice of permutation/scanning functions does not make a large difference in computing the entropy rate for either topology. As for modular and scale-free graphs, the entropy rate remained midway between that of a random and a star graph. However, the entropy rate was minimized for modular graphs using MCL as the permutation function combined with a vertical raster scanning function. MCL performed best at forming denser regions of 1s in the adjacency matrix. The scale-free graph entropy rate was minimized using the Cuthill-Mckee algorithm as the permutation function. There was no significant difference in entropy from using any of the scanning functions.
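Of the scanning functions compared above, the Morton (Z-order) scan is particularly easy to sketch: interleaving the bits of the row and column indices yields a one-dimensional ordering that preserves two-dimensional locality (function names are illustrative):

```python
def morton_key(i, j, bits=16):
    """Z-order (Morton) key: interleave the bits of row and column indices."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b + 1) | ((j >> b) & 1) << (2 * b)
    return key

def morton_scan(A):
    """Scan matrix entries in Morton order: nearby 2-D entries stay
    nearby in the resulting 1-D sequence."""
    n = len(A)
    idx = sorted(((i, j) for i in range(n) for j in range(n)),
                 key=lambda p: morton_key(*p))
    return [A[i][j] for i, j in idx]

# 4x4 matrix with A[i][j] = 4*i + j: the scan visits quadrant by quadrant
A4 = [[4 * i + j for j in range(4)] for i in range(4)]
print(morton_scan(A4))  # → [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

Because dense blocks of 1s in a permuted adjacency matrix stay contiguous under such a scan, the resulting sequence has longer non-transitional runs and hence a lower estimated entropy rate.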
Interestingly, there was a large gap in entropy for lattice graphs with respect to certain permutation/scanning combinations. Spectral BWM, DBscan, and MCL performed poorly; their results were no better than applying no permutation to the adjacency matrix at all. Lattice graphs are not modular, which may explain why these clustering algorithms did not perform well. The lowest entropy rate measures were computed using either Cuthill-Mckee or AMD as the permutation function. The choice of scanning function for the lattice graph made no significant difference to the entropy rate.

3.5.6 Graph Entropy with Respect to Order of the Markov Process

In this section, we evaluate the effect of the choice of Markov process order on the computed entropy rate. Figure 3.11 reports the relationship between Markov order and graph entropy rate across graph topologies. Entropy was computed using the Cuthill-Mckee algorithm for permutation with a horizontal raster scanning function. The results indicate the entropy rate of random graphs does not significantly change as the assumed order increases. This is expected since random graphs do not have much structure, so increasing the Markov model order has no effect. Modular graph entropy decreased significantly (p ≤ 0.05, Wilcoxon Rank Sum) when transitioning from order 1 to order 2. This change in entropy supports the fact that there will exist isolated subregions, in the permuted adjacency matrix, containing a dense collection of 1s. Since these regions are finite in size, entropy change is restricted to a low Markov order. Scale-free graphs behave similarly to modular graphs, with the exception that entropy decreases significantly (p ≤ 0.05, Wilcoxon Rank Sum) at higher orders. Entropy of lattice graphs does not change significantly across order number. This may be indicative of the lower entropy rate bound being achieved.
[Figure 3.8: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Random networks; b) Modular networks.]

[Figure 3.9: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Scale-Free networks; b) Star networks.]

[Figure 3.10: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes (Lattice networks).]

Finally, the entropy rate for star graphs decreases significantly up to the maximum order evaluated, 8. Star graphs are among the simplest topologies and therefore they are less of a challenge to compress.
The only information required to reconstruct a star graph is the number of nodes in the network and the label of the center node.

[Figure 3.11: Effects of order value on graph entropy measures for various graph topologies.]

3.5.7 Entropy Rate of a Biological Network: EEG ERN data

We are primarily interested in understanding the change in complexity across neural networks involved during cognitive control. The proposed local weighted graph entropy measure was applied to each matrix containing the pairwise phase locking values for error and correct responses across subjects. Each matrix was quantized to Q = 2 to 10 levels. The quantization errors, measured as the mean square error (MSE), were evaluated across all subjects and are reported in Figure 3.12. Rate/Distortion plots are provided to demonstrate the variations in quantization errors when applying the Lloyd-Max algorithm across all phase synchrony matrices. Figure 3.12 reveals low quantization errors across all quantization levels Q evaluated, as well as for levels beyond Q = 10. Variance across subjects, indicated by the error bars, is also small across the number of Markov states. This indicates the phase synchrony connectivity information was not significantly compromised by quantization. Bandwidth minimization on the weighted matrices was performed using the weighted version of the Cuthill-Mckee algorithm, and scanning was performed using the horizontal raster scan. The entropy differences for all 62 electrodes, representing different neural regions, were also computed across subjects. The topographical plots, obtained for quantization levels 2 ≤ Q ≤ 10 and representing the significant difference (Wilcoxon Rank Sum) in local graph entropy between response types, are reported in Figures 3.13 through 3.21.
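The Lloyd-Max step can be illustrated with a simple one-dimensional sketch that alternates nearest-level assignment and centroid updates; this is a generic Lloyd iteration under illustrative names, not the exact implementation applied to the phase synchrony matrices:

```python
def lloyd_max(samples, Q, iters=50):
    """1-D Lloyd-Max quantizer: alternate nearest-level assignment and
    reconstruction-level (cell mean) updates."""
    lo, hi = min(samples), max(samples)
    # initialize Q reconstruction levels uniformly over the sample range
    levels = [lo + (hi - lo) * (q + 0.5) / Q for q in range(Q)]
    for _ in range(iters):
        cells = [[] for _ in range(Q)]
        for s in samples:
            q = min(range(Q), key=lambda k: abs(s - levels[k]))
            cells[q].append(s)
        levels = [sum(c) / len(c) if c else levels[q]
                  for q, c in enumerate(cells)]
    return levels

def quantize(samples, levels):
    """Map each sample to its nearest reconstruction level."""
    return [min(levels, key=lambda l: abs(s - l)) for s in samples]

# Phase-synchrony-like values clustered near 0.2 and 0.8, Q = 2 levels
samples = [0.18, 0.2, 0.22, 0.78, 0.8, 0.82]
levels = lloyd_max(samples, 2)
mse = sum((s - q) ** 2 for s, q in zip(samples, quantize(samples, levels))) / len(samples)
print([round(l, 2) for l in levels], round(mse, 5))  # → [0.2, 0.8] 0.00027
```

The reported rate/distortion behavior corresponds to sweeping Q and recording this MSE per subject.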
From these results, it can be seen that nodes located around the frontal-central region of the brain contributed significantly more to the complexity of the underlying brain network during error responses vs correct responses. As expected, the entropy contributions of nodes located around FCz, in the vicinity of the ACC, were significantly greater (p ≤ 0.02, Wilcoxon Rank Sum) in error responses. Across all realizations of the topographic plots, the difference between Error and Correct responses remained more significant around the frontal central region nodes than other brain regions. Regardless of the number of quantization levels, these significant differences still occurred. Beyond quantization levels of Q = 10, distortion significantly decreased; however, there was no significant difference between the response types in terms of the local entropy measures.

[Figure 3.12: Rate/Distortion curves average and standard deviation across all subjects for each response type: a) error; b) correct.]

[Figure 3.13: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 2 levels.]

3.6 Conclusions

In this chapter, new local feature extraction techniques were introduced for graphs. Well-known signal processing and information theoretic measures were adapted for explaining the underlying structure of a graph as well as for identifying possible anomalies within the network. With applications to the karate network and the ERN data, the localized graph energy measure was shown to identify nodes which contribute most to the structural dynamics of the graph.
The proposed Laplacian Hückel energy was shown to be more discriminative than Gutman’s or the conventional Hückel energy definitions. Furthermore, the proposed energy measure identified the local neural populations of the brain which contributed most to the energy of the brain network. Energy during error responses was significantly greater than that for correct responses, particularly in the mPFC and lPFC regions. The greatest 1-hop neighborhood energy was observed in the mPFC and the greatest 2-hop neighborhood energy was observed in the lPFC. Next, the proposed definition of graph entropy rate outperformed published entropy measures by correctly classifying various graph topologies in terms of their structural complexity. The proposed entropy rate, computed by modeling the adjacency matrix as a Markov process, revealed the node members in the perturbed region of a simulated network. Compared to the entropy rate definitions proposed by Burda and Szpankowski, the proposed measure was more discriminative for determining the magnitude of importance of each node within the network. The entropy measure also revealed local brain regions of increased complexity during the ERN. These areas were primarily situated in the frontal central region of the brain, as expected according to the hypothesis stated in this thesis. Challenges still remain for adapting signal processing and information theoretic measures to graphs. These challenges include computing exact energy and entropy bounds based on the number of nodes, edges, and topology of the network. Szpankowski presented a method for computing the lower bound entropy rate of random graphs, but the next step is to compute the lower bound across all topologies, especially for scale-free networks. Also, a coding method which achieves the entropy rates presented in this thesis would be beneficial for compressing all types of graphs.
The graph compression and reconstruction algorithms should be applicable to both binary and weighted networks.

[Figure 3.14: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 3 levels.]
[Figure 3.15: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 4 levels.]
[Figure 3.16: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 5 levels.]
[Figure 3.17: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 6 levels.]
[Figure 3.18: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 7 levels.]
[Figure 3.19: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 8 levels.]
[Figure 3.20: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 9 levels.]
[Figure 3.21: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 10 levels.]

Chapter 4

Hierarchical Multi-Subject Community Detection for Weighted Neuronal Networks

4.1 Introduction

One important characteristic of complex networks is modularity. A module is composed of a densely connected set of nodes with sparse connections between modules in the network [151, 152]. Node members of each module, or cluster, share a common trait, function, or other similar characteristic. Identifying this inhomogeneity of connections is important for understanding how network topology and function are related.
Some of the applications of community detection include understanding the functional pathways of metabolic networks [113], discovering topic related webpages on the internet [153], and evaluating the rapid spread of disease throughout populations [41]. It is hypothesized that the modular structure of complex neuronal networks is indicative of robustness [154] and contributes to functionality [113] by compartmentalizing specific functions within certain cortical regions without perturbing the rest of the network [155]. Intra-module associations are thought to describe the segregation of information processing while the inter-module associations testify to the integration of information processing across distant brain regions [4]. This modular structure is therefore a compelling model for emphasizing the information distributing nature of the brain’s structure and function mappings [27, 61]. Community detection is an important topic in the study of complex networks, yet there does not exist a precise formulation for uncovering the "correct" underlying structure; therefore, a vast array of different methodologies for community detection and cluster identification have been proposed throughout the literature [156, 119]. These methodologies are inspired by stochastic theory [157, 158], signal processing [159], and genetic algorithms [160]. The availability of highly complex networks has created the need for fast algorithms which find reasonably good partitions. The primary reason for the existence of numerous community detection methodologies is the lack of a priori knowledge of what a ‘natural’ partitioning of a network really is. How many communities should exist in the network? Are the sizes of the communities equal to each other, or do they differ significantly? For example, should a network composed of 100 vertices split into two groups of 50 or two groups of 25 and 75?
Is all the information provided in the network important, or may some of it be classified as noise or anomalies? Also, should vertices be restricted to membership in one cluster or be allowed to participate in multiple clusters? The list of questions goes on, and many of them may be answered with a priori knowledge of the community structure of the network. However, the availability of this knowledge would negate the need for a clustering algorithm. Therefore, community detection becomes a bootstrapping problem which depends solely upon the original data representing the graph, i.e. the weighted or unweighted connectivity matrix. There are a variety of clustering algorithms which address this problem, including spectral clustering [161], Girvan-Newman [151], and k-means [162]. These and other algorithms found in the literature have been designed for a variety of applications. This chapter primarily addresses community detection issues unique to analyzing the modular structure of the human brain. Identification of functional modules or communities in the brain has been originally addressed using methods like principal component analysis [163] and independent component analysis [164], which put non-physiological constraints on the obtained components such as orthogonality and independence. Recently, methods from spectral analysis of graphs have been used to extract modules [165, 140, 119, 166] by mapping the functional connections to a multi-dimensional subspace defined by a set of eigenvectors. This method, however, does not reveal a hierarchical decomposition of the network. Meunier et al. [167] argue that most information processing systems possess a multi-scale modularity characteristic, i.e. they are hierarchically decomposable into a finite number of modular levels. This property allows a complex system to be more rapidly and robustly assembled since it is embedded in a relatively low-dimensional space.
Therefore, a hierarchical decomposition of a complex network, particularly the human brain, is a natural representation of modular organization. A key challenge in identifying the modular organization of brain networks is determining a common structure across multiple subjects. Current work either focuses on obtaining the community structure for the average connectivity network or on analyzing each subject individually and obtaining a common modular structure using a voting technique. Averaging neglects the variance across subjects, whereas conventional voting methods either require a priori knowledge of cluster labels or require one to first acquire the modular structure of each subject [168]. Voting does not permit convergence to an optimal modular structure since it requires only one iteration of the process. Various studies [169, 170] have proposed alternatives for identifying similarities across multiple subjects by introducing random effect analysis and Bayesian classifiers. With an application specific to module identification across multiple subjects, a possible solution would be to implement a new consensus based approach [74, 171, 172] in which the best modular organization is identified by combining information shared across multiple modular structures with information provided by previous decompositions of the data set. Integrating modular information shared across multiple subjects decreases the uncertainty that node pairs are randomly clustered together, and incorporating information from previous modular structures maintains consistency throughout the hierarchical tree. In this chapter, the most common methods designed for community detection are reviewed and their fundamental differences are highlighted. Furthermore, this chapter addresses challenges pertaining to thresholding weighted graphs, community quality measures for weighted graphs, and cluster analysis across multiple subjects.
The proposed community detection methods are applied to weighted graphs obtained by computing pairwise phase synchrony values from EEG data collected during a study of cognitive control in the brain based on the error-related negativity. In particular, this study focuses on understanding the modular organization of the brain during error processing.

4.2 Background on Community Detection on Graphs

In this section, we provide a review of various community detection measures, the implementation of consensus clustering, and quality measures for quantifying the “goodness” of a discovered community.

4.2.1 Community Detection on Graphs

Numerous methods for identifying community structures within a graph exist throughout the literature [173]. These include spectral and hierarchical approaches. In this section, we briefly review some of the most popular published community detection algorithms.

4.2.1.1 Girvan-Newman

One of the pioneers of community detection in complex networks is M. Newman, who, along with M. Girvan, developed an iteratively divisive hierarchical algorithm referred to as the Girvan-Newman method [151]. The algorithm iteratively removes the edges in a graph that are deemed weakest compared to all existing edges, where the strength of an edge is defined through betweenness centrality. Betweenness centrality is defined as the fraction of shortest paths within the network that travel through each edge, e_ij, such that

    C_B(e_ij) = Σ_{m≠n∈V} σ_mn(e_ij) / σ_mn    (4.1)

where σ_mn(e_ij) is the number of shortest paths between nodes v_m and v_n which contain edge e_ij and σ_mn is the total number of shortest paths between v_m and v_n. The edge assigned the largest betweenness centrality is considered important to maintaining global network connectivity, i.e. inter-connecting edges between clusters will have high betweenness centrality and edges within clusters will have lower betweenness centrality.
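Eq. 4.1 can be evaluated directly on a small graph by enumerating all shortest paths; a brute-force sketch (function names are illustrative; practical implementations use Brandes' algorithm instead):

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(adj, s, t):
    """Enumerate every shortest path from s to t via BFS layering."""
    dist = {s: 0}
    preds = {s: []}
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                preds[w] = [u]
                Q.append(w)
            elif dist[w] == dist[u] + 1:
                preds[w].append(u)  # another shortest predecessor
    def build(v):
        if v == s:
            return [[s]]
        return [p + [v] for u in preds[v] for p in build(u)]
    return build(t) if t in dist else []

def edge_betweenness(adj):
    """C_B(e) per Eq. 4.1: fraction of all-pairs shortest paths using e."""
    CB = {}
    for m, n in combinations(adj, 2):
        paths = all_shortest_paths(adj, m, n)
        for p in paths:
            for e in zip(p, p[1:]):
                e = tuple(sorted(e))
                CB[e] = CB.get(e, 0.0) + 1.0 / len(paths)
    return CB

# Two triangles joined by a single bridge edge (2, 4): the bridge carries
# every inter-cluster shortest path, so it has the largest betweenness.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 4}, 4: {2, 5, 6}, 5: {4, 6}, 6: {4, 5}}
CB = edge_betweenness(adj)
print(max(CB, key=CB.get))  # → (2, 4)
```

Removing that maximal edge is exactly one step of the divisive procedure described next.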
The Girvan-Newman algorithm computes the edge betweenness for all edges and removes the edge with the largest C_B from the graph. The removal process is repeated until the graph is partitioned into a desired number of clusters. The Girvan-Newman algorithm performs well for identifying the boundaries of clusters and is easily implementable. However, it has the disadvantage of running in O(m²n) time on an arbitrary network with m edges and n vertices and is therefore computationally expensive for non-sparse networks. Although it can be applied to weighted graphs, it is designed for unweighted networks since betweenness centrality evaluates shortest paths. Given a fully connected network, the shortest path will always be equal to one and betweenness centrality would have no effect on the results. The algorithm also does not consider overlapping clusters, where vertices may belong to more than one cluster. Finally, connectivity is based on shortest paths rather than all possible paths, thus ignoring much of the available connectivity information, particularly in a weighted network. The problem with overlapping clusters is addressed by an extension of the Girvan-Newman algorithm, the Cluster-Overlap Newman Girvan Algorithm (CONGA), proposed by Gregory et al. [174]. CONGA extends the Girvan-Newman algorithm by incorporating a 'split betweenness' for splitting nodes into various clusters if a node is strongly connected to more than one cluster. CONGA divides the betweenness of every edge by its corresponding weight such that

C_B(e_{ij})_{new} = \frac{C_B(e_{ij})}{W_{ij}}    (4.2)

where W_{ij} is the weight associated with the edge between nodes v_i and v_j. This is followed by the calculation of each node's 'split betweenness'. Each node, v_i, is split into two nodes, v_{i1} and v_{i2}, and an imaginary edge is placed between them. Using Eq. (4.2), the edge betweenness for the imaginary edge is computed, referred to as the 'split betweenness'.
If there exists a node with 'split betweenness' greater than the edge betweenness of all the edges connected to it, then the node is split. Otherwise, the edge with maximum edge betweenness is removed as in the Girvan-Newman algorithm. The process repeats itself until all nodes are isolated or until the desired number of clusters is found. CONGA presents its results as a set of community structures C = {c_1, c_2, c_3, ..., c_k} such that community c_1 contains one cluster, community c_2 contains two clusters, and community c_k contains k clusters. CONGA addresses the problem of overlapping clusters but still depends on shortest paths to define clusters and is not suitable for fully connected weighted graphs.

4.2.1.2 Spectral Clustering

Another commonly used method for graph clustering is spectral clustering, which is based on the eigenvectors and eigenvalues of a matrix describing the graph. The spectral decomposition of a graph can be applied to its adjacency matrix, A, Laplacian matrix, L, normalized Laplacian matrix, 𝓛, and modularity matrix, B. The spectral properties can be used for identifying the number of clusters in the network and identifying memberships within a cluster. Since the matrices that are decomposed are symmetric, they will have an orthonormal basis of eigenvectors, {u_1, u_2, ..., u_n}, with real eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_n such that M u_i = λ_i u_i, ∀i, where M is any symmetric matrix of size n × n. Although spectral clustering can be performed on the adjacency matrix, it is typically performed on the Laplacian of a graph since it provides a better representation of network structure [161]. The Laplacian of a graph is defined as L = K − A, where K is the diagonal matrix containing the degree of each vertex, such that

L_{ij} = \begin{cases} k_i & \text{if } i = j \\ -1 & \text{if } A_{ij} = 1 \\ 0 & \text{otherwise} \end{cases}    (4.3)

The Laplacian matrix is positive semi-definite and therefore its eigenvalues are all non-negative.
The multiplicity of the eigenvalue λ = 0 indicates the number of components in the graph. Provided we are only interested in connected graphs, the multiplicity of λ = 0 is 1. It has been shown that the normalized Laplacian, 𝓛, provides significantly better clustering results than the unnormalized counterpart [161]. 𝓛 is defined as 𝓛 = K^{−1/2} L K^{−1/2} such that

𝓛_{ij} = \begin{cases} 1 & \text{if } i = j \\ -\frac{1}{\sqrt{k_i k_j}} & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}    (4.4)

The normalized Laplacian matrix has eigenvalue properties similar to the un-normalized Laplacian matrix but has the added advantage of having bounded eigenvalues such that 0 ≤ λ ≤ 2, where λ = 2 is indicative of a bipartite component within the graph. It has been shown that the number of clusters within a network is revealed by the largest eigengap [175, 176, 177, 178]. To find the largest eigengap, the eigenvalues are first arranged in descending order, i.e. λ_1 ≥ λ_2 ≥ ... ≥ λ_N, where N is the number of nodes in the network. Each successive pair of eigenvalues forms an eigengap such that the ith eigengap is computed as λ_i − λ_{i+1}. The number of communities is then defined by the location of the largest eigengap, e.g. there are i clusters if the ith gap is the largest eigengap. Another popular approach to community discovery is the use of eigenvectors to project the network into a lower dimensional representation [165, 140, 119, 166]. A common method of utilizing the eigenvectors is to take a subset, {u_1, u_2, ..., u_c} where c < n, of the computed eigenvectors and form an n × c matrix whose columns are the c eigenvectors. The maximum number of clusters, k, which can be detected by any clustering algorithm is then limited to k ≤ c. The elements along the ith row of the matrix represent the projected node vector corresponding to node v_i.
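The eigengap heuristic can be sketched as follows (a numpy toy example; here the descending-order rule is applied, as an illustrative assumption, to the adjacency matrix of a hypothetical network of two 4-node cliques joined by one bridge edge):

```python
import numpy as np

# Hypothetical network: two 4-node cliques joined by a bridge edge.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

# Arrange the eigenvalues in descending order; the ith eigengap is
# lambda_i - lambda_{i+1}.
eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
gaps = eigvals[:-1] - eigvals[1:]

# The largest gap follows the 2nd eigenvalue, indicating 2 clusters.
n_clusters = int(np.argmax(gaps)) + 1
```

For this two-block toy network the largest eigengap sits after the second eigenvalue, so the heuristic recovers the two cliques as clusters.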
One of the most popular methods for evaluating a set of eigenvectors to identify cluster membership is the k-means algorithm [162], which minimizes the objective function

J = \sum_{j=1}^{k} \sum_{i=1}^{n} \| x_i^j - c_j \|^2    (4.5)

where ||x_i^j − c_j||² is a distance measure between a data point x_i^j, i.e. the jth element of the ith eigenvector, and the cluster center c_j. c_j is chosen randomly and designated the seed node. However, one significant problem with the k-means method is the uncertainty about the number and selection of eigenvectors to utilize in the process. Another problem pertains to the selection of c_j, which can affect the resulting clusters. This remains a challenge in many other eigenvector based clustering techniques such as fuzzy k-means, generalized Synchronization Cluster Analysis [119], the Ng-Jordan-Weiss algorithm (NJW) [179], and power iteration clustering (PIC) [180]. An alternative to choosing a subset of eigenvectors of a network is to evaluate only one eigenvector for the purpose of bi-partitioning, i.e. identifying a minimal cut of the graph. This eliminates the problem of searching for the optimal set of eigenvectors. According to Holzrichter et al. [181], the optimal minimal cut of a graph matrix is defined by the eigenvector, u_2, associated with the second smallest eigenvalue, λ_2 (the smallest non-zero eigenvalue for a connected graph), of the Laplacian or normalized Laplacian matrix. This eigenvector is referred to as the Fiedler vector, u_F, and defines a set of two clusters {C_1, C_2} where v_i ∈ C_1 if u_F(i) ≥ 0 and v_i ∈ C_2 if u_F(i) < 0. The Fiedler partition can be applied iteratively to each successive partition in order to achieve a clustering with k > 2. The partitioning of a graph will be referred to by the syntax FiedlerPartition(G), which splits graph G into two clusters, G_1 and G_2, such that G_1 ∪ G_2 = V.
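A minimal numpy sketch of the FiedlerPartition operation (the weighted two-group matrix is a hypothetical example; note that it is fully connected and requires no thresholding):

```python
import numpy as np

def fiedler_partition(W):
    """Bi-partition a weighted graph by the sign of the Fiedler vector,
    i.e. the Laplacian eigenvector of the second smallest eigenvalue."""
    L = np.diag(W.sum(axis=1)) - W   # L = K - W
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    u_f = eigvecs[:, 1]              # Fiedler vector
    return np.where(u_f >= 0)[0], np.where(u_f < 0)[0]

# Fully connected weighted toy graph: strong weights inside two groups
# of three nodes, weak weights between the groups.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

c1, c2 = fiedler_partition(W)
```

The sign pattern of the Fiedler vector separates the two strongly connected groups regardless of the weak all-to-all background connectivity.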
The advantage of using the Fiedler vector over other eigenvectors is that the partitions performed on the network are guaranteed to be optimal, compared to identifying multiple splits in one iteration, since the Fiedler vector is the optimal solution to a minimum cut problem. The Fiedler vector also allows one to obtain the hierarchical structure of the network. Finally, the Fiedler partitioning method can be applied to a fully connected weighted matrix without the need for thresholding, as opposed to the Girvan-Newman algorithm.

4.2.2 Consensus Clustering

In many clustering problems, it is common to apply different algorithms to the same data and then use a consensus method to combine the results [182]. Two popular consensus methods are consensus averaging and the hypergraph partitioning algorithm (HGPA). The first approach averages m connectivity matrices to obtain W = [w_{ij}] where

w_{ij} = \frac{1}{m} \sum_{r=1}^{m} w_{ij}^{(r)}    (4.6)

A clustering algorithm, e.g. FiedlerPartition(G), can then be applied to W to obtain a hierarchical community structure. Inter-subject variability is not considered in this approach. The computational complexity of O(kN³) is attractive for identifying a community structure for numerous graphs; however, confidence in the resulting clustering is low since the variance between the multiple similarity matrices is neglected. The variance between matrices should be considered in order to decrease the uncertainty that a node pair is randomly assigned to the same cluster. A second option for obtaining a common community structure across multiple graphs is to identify the community structure of each individual graph. A process is then implemented which uses information common across the multiple community structures to identify a global community structure. With the availability of multiple graphs, the nodes within each similarity matrix are first partitioned into a set of clusters C^{(r)} = {c_1^{(r)}, c_2^{(r)}, ..., c_k^{(r)}}, i.e.
base clusterings, using a clustering algorithm, where k is restricted to be the same across all graphs. These base clusterings can be used to identify a common community structure using the HyperGraph-Partitioning Algorithm (HGPA) [183]. HGPA treats each cluster across all base clusterings as a hyperedge within a single global graph. The algorithm is a multilevel graph partitioning system which partitions this graph in three steps: 1) compress the graph by collapsing hyperedges, 2) partition the compressed graph using a minimum cut objective function, and 3) decompress the partitions and repeat the process. HGPA has a computational complexity of O(kNh), where h is the number of hyperedges; however, its overall complexity is dependent upon the total complexity of the clustering algorithms used to obtain the base clusterings. HGPA has the disadvantage of generating clusters of approximately equal sizes, even though in real networks equally sized clusters are unlikely.

4.2.3 Quality Measures for Community Structures

When a community structure is discovered, it is difficult to determine if it is a suitable representation of the network of interest without a priori knowledge. Therefore, it is necessary to use a metric which quantifies the quality of the discovered community structure. The literature overwhelmingly supports the use of a measure known as modularity. Modularity was developed by Newman et al. [156] to determine if the number of edges between clusters is lower than expected in a random graph, i.e. a less dense arrangement of inter-connecting edges, and if the number of edges within clusters is higher than expected in a random graph, i.e. a highly dense arrangement of intra-connecting edges. Modularity compares a detected community structure with a network of equivalent size where edges are randomly linked among node pairs with probability Z_{ij} = k_i k_j, where k_i is the degree of node v_i.
This value of Z_{ij} maintains a fair comparison between the actual network and the random network, in that the expected degree of node v_i in the random network should be equivalent to the actual degree of node v_i in G, i.e. ∑_j Z_{ij} = k_i. Since the random connection between v_i and v_j is dependent on their degrees in graph G, Z_{ij} is modified as

Z_{ij} = \frac{k_i k_j}{2m}

where the factor of 2 accounts for counting each edge twice since the network is symmetric. If A is the adjacency matrix of G, then modularity is proportional to A_{ij} − Z_{ij} summed over all node pairs belonging to the same cluster, such that

Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - Z_{ij} \right) \sigma(c_i, c_j)    (4.7)

where σ(c_i, c_j) = 1 if nodes v_i and v_j are in the same cluster and 0 otherwise, and m is the number of edges in the network. This can also be rewritten as

Q = \frac{1}{2m} \sum_{ij} B_{ij} \, \sigma(c_i, c_j)    (4.8)

where B_{ij} = A_{ij} − \frac{k_i k_j}{2m} is the modularity matrix. Modularity may be negative or positive, where positive values indicate the possible presence of a community structure. Therefore, one can quantify the quality of a community structure by evaluating the value of Q. The modularity matrix B is a symmetric matrix which may consist of both positive and negative values. However, if all the entries of B are negative, then the corresponding network is considered indivisible, i.e. any clustering algorithm will not reveal a good community structure and Q will be assigned low, possibly negative, values. The modularity matrix, B, can also be used to perform spectral clustering in the same way as the Laplacian matrix. Modularity for weighted graphs can be defined by modifying Equation (4.7) as

Q = \frac{1}{2m} \sum_{ij} \left( W_{ij} - \frac{s_i s_j}{2m} \right) \sigma(i, j)    (4.9)

such that m is the number of edges, s_i (the strength of node v_i) is the sum of the ith row in W, and σ(i, j) = 1 if v_i and v_j are in the same cluster and 0 otherwise. In ideal cases, clusters will not overlap. However, in real world networks, this may not always be the case.
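The unweighted modularity of Eqs. (4.7)-(4.8) can be sketched as follows (a toy numpy implementation; the two-triangle graph and the candidate labelings are illustrative assumptions):

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q (Eq. 4.8) of a hard partition of an
    undirected, unweighted graph with adjacency matrix A."""
    k = A.sum(axis=1)                # degree of each node
    two_m = A.sum()                  # 2m: each edge counted twice
    B = A - np.outer(k, k) / two_m   # modularity matrix B_ij
    same = labels[:, None] == labels[None, :]   # sigma(c_i, c_j)
    return (B * same).sum() / two_m

# Two triangles joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0

q_true = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # true communities
q_bad = modularity(A, np.array([0, 1, 0, 1, 0, 1]))   # arbitrary split
```

The true two-triangle partition yields a positive Q, while the arbitrary split yields a negative Q, consistent with the interpretation of positive modularity as evidence of community structure.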
Therefore, it is imperative to use a modified version of modularity which considers nodes belonging to more than one community. Nicosia et al. [184] proposed an improved modularity measure where a 'belonging factor', α_{i,c}, is defined for each node v_i to quantify the strength of node v_i in cluster c: α_{i,c} = 1/s for i ∈ c, where s is the number of clusters the node belongs to. Next, a coefficient defining the 'belonging strength' of an edge within a particular cluster, c, is calculated as

\beta(e_{ij}, c) = \frac{1}{\left(1 + e^{-f(\alpha_{i,c})}\right)\left(1 + e^{-f(\alpha_{j,c})}\right)}

where f(α_{i,c}) is a linear scaling function, defined as f(x) = 60x − 30 in the literature. The modified modularity is defined as

Q = \frac{1}{2m} \sum_{ij} \left( \beta(e_{ij}, c) W(e_{ij}) - \frac{\beta^2(e_{ij}, c) k_i k_j}{2m} \right) \sigma(c_i, c_j)    (4.10)

If no clusters overlap, then β(e_{ij}, c) = 1 and the extended version of modularity will produce the same results as the original modularity presented in Eq. (4.8).

4.3 Hierarchical Modular Detection Algorithms for Weighted Neuronal Networks

In this section, two methods for evaluating the modular structure of a fully connected weighted network are presented. The first combines CONGA and Nicosia's modularity in an iterative manner to identify the 'best' overlapping modular structure of neuronal data. The second algorithm is a hierarchical consensus clustering method which utilizes the properties of the Fiedler vector to identify a community structure common across multiple neuronal data sets.

4.3.1 Overlapping Community Structures in Weighted Networks

Modularity has been established as a reliable method to determine if a community structure is optimal. In the case of weighted and symmetric graphs representing the functional connectivity between neuronal populations, several problems exist in determining the optimal cluster arrangement. First, clustering algorithms typically require fully connected weighted graphs to be mapped to binary graphs by thresholding.
Thresholding allows the important information pertaining to possible clusters to be retained while filtering out the unimportant information. As discussed in previous chapters, choosing a threshold which optimally represents the connectivity structure of a network remains a challenge. Sparsely weighted graphs may be evaluated by Girvan-Newman and CONGA by scaling edge betweenness by the weight of the edge; however, this idea fails for non-sparse or fully connected weighted matrices, as is the case for neuronal data where each pair of regions is assigned a weighted relationship. These methods fail since all edges would be considered equally important according to the definition of edge betweenness centrality. Therefore, thresholding is an important issue, and discovering an optimal threshold which preserves the cluster information hidden within the data is necessary. Second, most clustering algorithms also assume a node does not participate in multiple clusters. In real-world networks, the expectation of a community structure in which clusters are not allowed to overlap is an unreasonable assumption. Overlapping clusters are increasingly expected as the network becomes more complex or larger in size [185]. Clusters within real-world networks, such as neural assemblies in the brain, must interact with each other in order for the network to function properly. These interactions may be due to nodal members common to multiple clusters. CONGA is one of the few clustering algorithms which considers overlapping clusters and therefore it is the basis of this optimal cluster identification process. Nicosia's modularity is applied to evaluate the resulting overlapping community structures. Considering these challenges, the goal is to identify a single sparse binary mapping of the weighted data which results in the optimal community structure based on threshold (T), number of clusters (k), and modularity (Q).
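The binary mapping used throughout this search can be sketched as follows (the 4-node synchrony matrix is a hypothetical example):

```python
import numpy as np

def threshold_graph(W, T):
    """Map a fully connected weighted graph to a binary adjacency
    matrix: an edge is kept only if its weight exceeds T."""
    A = (W > T).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Hypothetical 4-node phase synchrony matrix with weights in [0, 1].
W = np.array([[0.0, 0.8, 0.3, 0.1],
              [0.8, 0.0, 0.7, 0.2],
              [0.3, 0.7, 0.0, 0.9],
              [0.1, 0.2, 0.9, 0.0]])

# Sweeping T from 0 toward 1 produces progressively sparser mappings.
dense = threshold_graph(W, 0.05)
sparse = threshold_graph(W, 0.75)
```

Each choice of T yields a different adjacency matrix, which is why the proposed procedure sweeps T exhaustively rather than committing to a single arbitrary value.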
We present an exhaustive search algorithm in terms of the threshold and number of clusters and determine the best combination of threshold and number of clusters based on Nicosia's modularity, since this permits us to evaluate a hierarchical soft clustering of the data. Using CONGA, the nodes of the weighted matrix are categorized into 2 ≤ K ≤ k clusters for each binary mapping generated by T, where 0 ≤ T ≤ 1. The contribution of each node to the community structures is defined by an N × N 'participation matrix',

PM_{T,c^k}(i, j) = \sigma(c_i^k, c_j^k)

for each community structure generated by each T. Modularity, Q_T, is also computed for each thresholded graph. σ(c_i^k, c_j^k) = 1 if nodes v_i and v_j participate in the same clusters within the community structure c^k and 0 otherwise. A 'participation score' for each node, v_i, within c^k, is computed as

PS_{T,c^k}(i) = \frac{1}{N} \sum_{j=1}^{N} PM_{T,c^k}(i, j)

which indicates how central each node is within its respective cluster. This is applied to all PM matrices generated for the network. Next, all PS_{T,c^k} are categorized according to the degree range, R, within which the associated threshold T falls. This generates a set of degree ranges, {R_1, R_2, ..., R_j} ∈ R*, where j is the cardinality of the degree range categories, which are averaged together and stacked side by side for all c^k to form the N × k participation score matrix PS(R_j). The modularity values are averaged to obtain a 1 × k modularity vector such that Q^k(R_j) = \frac{1}{|T|} \sum_{\forall T \in R_j} Q(T). The following steps summarize the evaluation procedure.

• Step 1: Acquire the fully connected weighted matrix, G = (V, E), where {w_{ij} ∈ E | v_i, v_j ∈ V}.

• Step 2: Iteratively threshold G by T, 0 ≤ T ≤ 1, to obtain a binary adjacency matrix (Figure 4.1).

Figure 4.1: Weighted matrix to adjacency matrix for a given threshold

• Step 3: Apply CONGA and acquire K clusters such that 2 ≤ K ≤ k.
• Step 4: Compute Nicosia's modularity for each community structure (Equation (4.10)).

• Step 5: Obtain the participation matrices and participation scores across thresholds and different community structures, K (Figure 4.2).

Figure 4.2: Participation scores of various clusterings derived from various thresholds of the network

• Step 6: Average the participation scores based on the corresponding average degree, R (Figure 4.3). The final participation score for node v_i is defined as

PS(i) = \frac{1}{k(N-1)} \sum_j PS_i(R_j)    (4.11)

Figure 4.3: Average participation scores in terms of network degree

A large PS(i) is indicative of a node which participates in overlapping clusters with high frequency, i.e. a possible central hub of the network. Next, to evaluate average participation scores for the entire network with respect to degree range, R_j, and community structure, c^k, compute

P(R_j) = \sum_{i=1}^{N} PS_i(R_j)    (4.12)

and form the j × k participation matrix, PS_mat, where the jth row is P(R_j). Finally, to evaluate optimal modularity, the 1 × k modularity vectors of each degree range, R_j, containing the average modularity for each community structure, Q^k(R_j), are assembled to form a j × k average modularity matrix. The maximum modularity can then be determined with respect to degree range, R_j, and community structure c^k.

4.3.2 Hierarchical Spectral Partitioning Algorithm for Identifying a Common Community Structure among Multiple Subjects

Since the literature has consistently demonstrated that the Fiedler vector is highly effective in partitioning graphs into clusters [186, 187], in this study we use the Fiedler vector to perform consensus clustering across multiple weighted graphs. The original connectivity matrices are bi-partitioned into two clusters using the Fiedler partitioning method.
This results in a cluster matrix T^r such that

T^r(i, j) = \begin{cases} 1 & \text{if nodes } v_i, v_j \text{ are in the same cluster} \\ 0 & \text{otherwise} \end{cases}

where r = {1, 2, ..., m} and m is the number of connectivity matrices evaluated, e.g. the number of subjects in the brain study. In order to find the common community structure across multiple graphs, we introduce a co-occurrence matrix P, where P(i, j) = \frac{1}{m} \sum_{r=1}^{m} T^r(i, j) and P(i, j) ∈ [0, 1]. P(i, j) is the probability that a pair of nodes are members of the same cluster across multiple graphs. The similarity matrix reflects the strength of a direct relationship between a node pair, whereas P reflects the likelihood that a pair of nodes are in the same cluster across all subjects. The Laplacian matrix of P is computed and the Fiedler vector is found to form a bi-partition of P into a community structure composed of clusters c_1 and c_{−1}. Since P represents the probability that a node pair should be clustered together, the Fiedler partition of P represents the community structure common to all graphs. The initial partition set, C = {c_1, c_{−1}}, contains 2 clusters, but if k > 2 is desired, the process can be repeated by selecting a cluster in C to partition. In this case, c_1 or c_{−1} may be selected based on a user defined cost function. Next, sub-matrices are extracted from the original connectivity matrices such that they only contain the nodes of the chosen cluster.

Algorithm 5 Probabilistic Fiedler Clustering Algorithm
1: Input: m N × N dimensional graphs, G = {G_1, G_2, ..., G_m} with vertices V = {v_1, v_2, ..., v_N} and edges E^r = {w^r_{ij} : v_i, v_j ∈ V} such that G_r = (V, E^r) and r = {1, 2, ..., m}.
2: Input: Number of clusters, k.
3: Output: k clusters C = {c_1, c_2, ..., c_k} where c_j ⊂ V.
4: C = ∅
5: for t = 2 to k do
6:   P = 0_{|V|×|V|}
7:   for r = 1 to m do
8:     submatrix Ĝ_r ⊂ G_r | Ĝ_r = (V, Ê^r)
9:     (V_1, V_2) = SubRoutine(FiedlerPartition(Ĝ_r))
10:    P(i, j) = P(i, j) + T^r(i, j), where T^r(i, j) = 1 if nodes v_i, v_j ∈ V_1 or v_i, v_j ∈ V_2, and 0 otherwise
11:  end for
12:  P = P/m
13:  (V_1, V_2) = SubRoutine(FiedlerPartition(P))
14:  C = C ∪ {V_1, V_2}
15:  if t ≠ k then
16:    V = c_i | i = max_q {H_{c_q}} and c_q ∈ C
17:    E^r = {w^r_{ij} : v_i, v_j ∈ V}
18:    C = C \ {c_i}
19:  end if
20: end for

These sub-matrices are used to derive the new sub co-occurrence matrix, P_y, where y = 1 if cluster c_1 was selected and y = −1 if cluster c_{−1} was selected. The Fiedler partition will result in two new clusters, c^y_1 and c^y_{−1}. The final cluster set is C = {c_{−y}, c^y_1, c^y_{−1}}, which is a concatenation of the two new clusters with the original cluster that was not chosen for bi-partitioning. Algorithm 5 describes this process for obtaining community structures for a given number of clusters. The algorithm, in this case, implements the homogeneity cost function, H_{c_q}, for selecting subclusters for bi-partitioning. Homogeneity is a quality function which will be described further in the next section.

Algorithm 6 Fiedler Partition
1: Input: graph G = (V, E).
2: Output: Vertex sets V_1 and V_2.
3: Compute the Laplacian matrix, L, of G.
4: Compute the |V| eigenvectors, u, and eigenvalues, λ, of L.
5: Order the eigenvalues in ascending order: λ_1 ≤ λ_2 ≤ ... ≤ λ_{|V|}.
6: u_F = u_i where i = min_q {λ_q} | λ_q ≠ 0.
7: for j = 1 to |V| do
8:   v_j ∈ V_1 if u_F(j) ≥ 0; v_j ∈ V_2 if u_F(j) < 0
9: end for

4.4 New Weighted Graph Clustering Quality Measures

The primary problem with iteratively dividing a network is determining the optimal number of clusters. The most commonly used optimization measure, modularity [156], compares a community structure to the expected community structure of a single random graph, such that there exists a high number of edges within clusters and a low number of edges between clusters.
Unfortunately, modularity does not always result in the highest value for the true community structure [188] and can reveal a suboptimal structure, which may be due to the simplicity of the random model computed through s_i s_j / 2m in Equation (4.9). In this section we propose alternative measures for quantifying the quality of a cluster partition across weighted networks.

4.4.1 Comparing Graph Partitions to Multiple Random Networks

In this section, it is proposed that a comparison between the detected weighted community structure and numerous randomly weighted community structures be performed. It is expected that vertex pairs connected with large weights are more likely to be in the same cluster than pairs connected with a low weight. A community matrix is defined as

C^k(i, j) = \begin{cases} W_{ij} & \text{if } v_i \text{ and } v_j \text{ are in the same cluster} \\ 0 & \text{otherwise} \end{cases}    (4.13)

where k is the number of clusters. The random community matrix is obtained by randomly assigning each edge to one of k clusters and computing Equation (4.13) for the random assignments to obtain C^{(k,p)}_{rand}, where p = {1, 2, ..., r} indexes the permutations. The membership size of each cluster may differ between the random model and the detected community structure for each p, but the number of clusters must be the same. A large p is recommended so that numerous random clustering possibilities are evaluated. The average intra-cluster weight corresponding to each clustering matrix is computed as

\hat{C}^k = \frac{1}{N(N-1)} \sum_{ij} C^k(i, j)    (4.14)

and

\hat{C}^{(k,p)}_{rand} = \frac{1}{N(N-1)} \sum_{ij} C^{(k,p)}_{rand}(i, j)    (4.15)

Averaging over all p such that \hat{C}^k_{rand} = \frac{1}{|p|} \sum_p \hat{C}^{(k,p)}_{rand} provides the random intra-cluster value. Then, k is found such that Q = \hat{C}^k − \hat{C}^k_{rand} is maximum. A maximum value indicates that the vertices within a cluster share significantly larger weights than if they were clustered with any other set of vertices.
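A sketch of this comparison (Eqs. (4.13)-(4.15)), under the assumption of a small two-group weighted toy graph and a fixed random seed; for simplicity the random model here assigns nodes, rather than individual edges, to the k clusters:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_intra_weight(W, labels):
    """Average intra-cluster weight (Eqs. 4.13-4.14): weights between
    distinct node pairs sharing a cluster label, divided by N(N-1)."""
    N = len(W)
    same = (labels[:, None] == labels[None, :]) & ~np.eye(N, dtype=bool)
    return W[same].sum() / (N * (N - 1))

def random_baseline(W, k, p=200):
    """Eq. 4.15 averaged over p random assignments of the nodes into
    k clusters."""
    N = len(W)
    return float(np.mean([mean_intra_weight(W, rng.integers(0, k, N))
                          for _ in range(p)]))

# Hypothetical weighted graph with two clear 3-node groups.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

labels = np.array([0, 0, 0, 1, 1, 1])
Q = mean_intra_weight(W, labels) - random_baseline(W, k=2)
```

A clearly positive Q indicates that the detected clusters concentrate larger weights within clusters than random groupings with the same number of clusters.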
This method relies heavily on weights and demonstrates how the weight distribution is a significant contributor to community structure.

4.4.2 Homogeneity and Completeness

By the definition of a cluster, the pairwise connections within a cluster must be stronger than the inter-cluster connections. In this section, measures that evaluate the quality of a particular clustering structure based on the distributions of the inter- and intra-cluster edges across m graphs are proposed. These distributions are defined similarly to probability mass functions (pmfs). Prior to defining the pmfs of intra-cluster and inter-cluster edges, a function that maps the continuous edge values to a discrete alphabet is defined as f : W_i^{(r)} → S_i^{(r)}, where W_i^{(r)} = {w_{ij}^{(r)} ∈ [0, 1]} refers to the ith row of the rth similarity matrix, S_i^{(r)} = {s_{ij}^{(r)} ∈ {1, 2, ..., N}}, and r = {1, 2, ..., m}. The elements of each row of the connectivity matrix across subjects are mapped to discrete integer values between 1 and N to eliminate the variation of edge strengths across subjects and extract only relational information about the pairwise edge strengths. The rank function is proposed to perform this mapping such that the node pair with the largest similarity measure is assigned a 1 and the weakest node pair is assigned N. When a particular cluster set C = {c_1, c_2, ..., c_k} is identified, the probability mass function of intra-cluster ranks for a particular cluster, c_t, is defined as

P^{intra}_{c_t}(\beta) = \frac{F^{intra}_{c_t}(\beta)}{\sum_{\beta=1}^{N} F^{intra}_{c_t}(\beta)}    (4.16)

where

F^{intra}_{c_t}(\beta) = \sum_{r=1}^{m} \sum_{i,j}^{N} \delta(s^r_{ij}, \beta) \mid v_i, v_j \in c_t    (4.17)

t ∈ {1, 2, ..., k}, β ∈ {1, 2, ...
, N}, and

\delta(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases}

Similarly, the probability mass function of inter-cluster ranks for cluster c_t is defined as

P^{inter}_{c_t}(\beta) = \frac{F^{inter}_{c_t}(\beta)}{\sum_{\beta=1}^{N} F^{inter}_{c_t}(\beta)}    (4.18)

where

F^{inter}_{c_t}(\beta) = \sum_{r=1}^{m} \sum_{i,j}^{N} \delta(s^r_{ij}, \beta) \mid v_i \in c_t; v_j \notin c_t    (4.19)

The distribution of edge ranks within a cluster is expected to be concentrated around smaller values compared to the distribution of ranks between clusters, since this would be indicative of optimal segregation in a community structure. This is quantified using a one-sided Mann-Whitney U test such that the relationship between the medians, x̃, is expected to be x̃(P^{intra}_{c_t}) < x̃(P^{inter}_{c_t}) with a significance of p ≤ α. A score is assigned to cluster c_t as

Q_{c_t} = \begin{cases} 1 & \text{if } x̃(P^{intra}_{c_t}) < x̃(P^{inter}_{c_t}) \\ 0 & \text{otherwise} \end{cases}    (4.20)

This is repeated for all k clusters and an average quality score is computed as Q = \frac{1}{k} \sum_{t=1}^{k} Q_{c_t}. Q will always be within the range of 0 and 1, where values closer to one indicate that most clusters contain strong within-cluster relationships and weak between-cluster relationships across all graphs. The proposed quality measure ensures intra-cluster relationships are significantly 'stronger' than inter-cluster relationships. However, the next step is to quantify homogeneity and completeness, two principal characteristics which further determine the quality of clustering. A homogeneous cluster contains only data points which belong to the same class, while a complete cluster contains all possible data points within the sample space (Figure 4.4).

Figure 4.4: Variations in homogeneity and completeness where the true number of modules is 3; a) High homogeneity and low completeness; b) High completeness and low homogeneity; c) High homogeneity and completeness.

Homogeneity and completeness are approximately inversely proportional to each other such that a cluster with high (low) homogeneity will have low (high) completeness.
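The rank-based score of Eq. (4.20) can be sketched as follows (assuming numpy and scipy; the two-cluster weighted graph is a hypothetical example, and scipy's `rankdata` applied to the negated weights implements the row-wise mapping that assigns rank 1 to the strongest edge):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Hypothetical weighted graph: two 3-node clusters with strong
# intra-cluster edges (0.9) and weak inter-cluster edges (0.1).
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)
labels = np.array([0, 0, 0, 1, 1, 1])

# Row-wise rank mapping: the strongest edge in a row gets rank 1,
# the weakest gets rank N.
ranks = np.vstack([rankdata(-row) for row in W])

same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(W), dtype=bool)
intra = ranks[same & off_diag]   # intra-cluster edge ranks
inter = ranks[~same]             # inter-cluster edge ranks

# One-sided Mann-Whitney U test: intra-cluster ranks are expected to
# be smaller (stronger edges) than inter-cluster ranks.
stat, pval = mannwhitneyu(intra, inter, alternative="less")
Q_ct = 1 if pval <= 0.05 else 0
```

Because the toy graph segregates cleanly, the intra-cluster ranks are uniformly smaller than the inter-cluster ranks and the cluster receives a score of 1.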
Similar measures, such as the F-measure [189] and V-measure [190], have been used in the literature to quantify the quality of a cluster. Both measures, however, require a priori knowledge of class labels. In most cases, this 'ground truth' is unknown and therefore alternative measures of cluster accuracy are needed. For this reason, the characteristic that homogeneity is inversely proportional to the variance of edge ranks within a cluster is exploited. If a particular cluster is homogeneous, then we would expect the pairwise connection strengths among the members of that cluster to be close to each other, thus implying that the ranks of the weights have small variance. We propose to quantify this variation through a measure of normalized entropy such that the lowest homogeneity score is obtained for a cluster containing a uniform representation of ranks, i.e. large variation among the edge weights, and the maximum homogeneity score is obtained for a cluster containing only one rank. Therefore, a normalized entropy measure of the cluster's intra-cluster rank distribution is indicative of homogeneity. Homogeneity is defined as H_{c_t} = 1 − \frac{H(P^{intra}_{c_t})}{\log_2 N} and will be maximum when H(P^{intra}_{c_t}) << \log_2 N. Similarly, we define a metric to quantify completeness using the relative entropy between the inter-cluster rank and intra-cluster rank distributions. The rank distributions for inter-edges and intra-edges are expected to be different from each other to maximize completeness. The similarity between two distributions is commonly quantified using divergence measures. Here, we propose to use the Jensen-Shannon divergence measure [191] since it is symmetric and bounded. With respect to completeness, a divergence measure approaching 1 is synonymous with increased completeness.
Completeness of a cluster is therefore defined as $C_{c_t} = JS(P_{c_t}^{intra}, P_{c_t}^{inter})$ where

$$JS(p, q) = \frac{1}{2}\sum_i p(i)\log_2 \frac{p(i)}{0.5(p(i)+q(i))} + \frac{1}{2}\sum_i q(i)\log_2 \frac{q(i)}{0.5(p(i)+q(i))},$$

with $p = P_{c_t}^{intra}$ and $q = P_{c_t}^{inter}$. Average completeness, $\hat{C} = \frac{1}{k}\sum_{t=1}^{k} C_{c_t}$, and average homogeneity, $\hat{H} = \frac{1}{k}\sum_{t=1}^{k} H_{c_t}$, are computed across all clusters. An approach to optimize both homogeneity and completeness is to maximize the cost function

$$\lambda = \hat{C}\gamma + \hat{H}(1-\gamma) \qquad (4.21)$$

where $\gamma \in [0, 1]$, for the whole community structure while maintaining an acceptable user-defined value of $Q$.

4.5 Results

4.5.1 Overlapping Clusters in the Brain Network During the Error-Related Negativity

Since the data are in the form of fully connected weighted matrices, the proposed optimization clustering process using CONGA is well suited because it evaluates a range of thresholds to identify an optimal community structure. First, the phase synchrony data from all 90 subjects were averaged together to form a single Error and a single Correct response matrix. Then, the network was evaluated for 2 ≤ k ≤ 10 clusters to identify the degree ranges, R, and cluster sizes, k, which resulted in different modularity values, Q. Finally, the difference between the participation scores belonging to the two response types, Error and Correct, was found for each degree range as $\|C - E\| = \|PS_{mat}^{correct} - PS_{mat}^{error}\|_F$.

Table 4.1: Top 3 electrodes with the largest differences between Correct and Error 'participation scores'.

Degree Range, R   ||C − E||_F   Electrode 1   Electrode 2   Electrode 3
31-35             6.62          F8            AF3           CP2
26-30             6.73          FC4           AF8           AF3
21-25             7.18          CP2           CP4           FT7
16-20             5.58          P2            AF8           C5
11-15             4.75          F1            P2            F6
6-10              4.29          F6            F1            FC4

Figures 4.5 and 4.6 plot modularity versus degree range $R_j$ and cluster size $k$ for Correct and Error responses, respectively. Clusters with high modularity were observed for sparsely connected graphs, R ≤ 25, and for smaller numbers of clusters, k ≤ 8.
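The difference score between the two response types is a plain Frobenius norm of the element-wise difference of the participation-score matrices, e.g.:

```python
import math

def frobenius_diff(A, B):
    """||A - B||_F between two equally sized score matrices,
    given as nested lists."""
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(A, B)
                         for a, b in zip(ra, rb)))
```

Applied to the Correct and Error participation-score matrices, this yields the ||C − E||_F column of Table 4.1.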
This is an expected result, since sparsely connected binary mappings of the phase synchrony index computed from EEG exhibit small-world network characteristics and therefore form tight clusters [192]. Thus, a high modularity would be expected for these lower degree ranges, where an underlying organized structure is observed.

A two-sample t-test was applied for each electrode to determine the electrodes which showed significant differences between Error and Correct responses based on the 'participation score'. Table 4.1 lists the top three electrodes, with respect to the degree ranges, that demonstrate the most significant (p ≤ 0.05) difference between Correct and Error responses. The table shows that the largest differences occur for degrees 21 ≤ R ≤ 25 and 26 ≤ R ≤ 30. The electrodes that demonstrated the most significant difference between Correct and Error responses are FC4, AF8, AF3, CP2, CP4, and FT7, corresponding to the frontal and central regions of the brain, in accordance with prior work [19]. Finally, the optimal size, k, for a clustering community was determined by finding the k which maximizes $\sum_{i=1}^{N}\left(PS^{C}_{R,c_i^k} - PS^{E}_{R,c_i^k}\right)$ in 21 ≤ R ≤ 25 for 2 ≤ k ≤ 10, and is found to be equal to 6.

The clustering communities corresponding to the largest difference between Correct and Error responses are shown in Figures 4.7 and 4.8. These optimal clusters have an average modularity of 0.5 for Correct responses and 0.45 for Error responses. It is observed that many of the nodes participate in more than two clusters during both Correct and Error responses. The plots associated with the Correct response demonstrate global, non-localized participation of neural regions during the task response test, as demonstrated by the large overlapping clusters spread around the topographic line plots.
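The per-electrode testing step can be sketched as follows. A Welch two-sample t statistic is compared against a two-sided normal-approximation cutoff, a reasonable simplification at this sample size (the exact test would use the t distribution, e.g. via SciPy); the names and data layout are illustrative, not the dissertation's code.

```python
import math

def welch_t(a, b):
    """Two-sample Welch t statistic comparing one electrode's
    participation scores under two conditions."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def significant_electrodes(error, correct, z_crit=1.96):
    """Indices of electrodes whose |t| exceeds a two-sided
    normal-approximation cutoff; error[i] and correct[i] hold the
    per-subject scores for electrode i."""
    return [i for i in range(len(error))
            if abs(welch_t(error[i], correct[i])) > z_crit]
```

Running this over all electrodes and degree ranges yields the electrode rankings reported in Table 4.1.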
However, the plot of Figure 4.8b strongly indicates that the neural assemblies located around the anterior cingulate cortex formed a highly synchronized cluster in response to the ERN. As expected, this region carries the primary graph features distinguishing Correct data from Error data, since the ACC would only be activated during an error response. Therefore, the proposed optimization technique identified the distinguishing features of neuronal data represented by fully connected weighted matrices.

4.5.2 Hierarchical Spectral Consensus Clustering Analysis

In this section, the proposed Fiedler Consensus Algorithm is implemented for revealing the hierarchical modular decomposition across multiple graphs. The optimal modular organization is identified by maximizing the quality measure Q, the homogeneity $\hat{H}$, and the completeness score $\hat{C}$ proposed in Section 4.4.2. First, the results obtained for a set of simulated networks are compared to those obtained through averaging the similarity matrices and through HGPA. Next, the proposed measures are used to identify a modular structure which best describes the multivariate relationships across multiple subjects from connectivity graphs obtained from EEG data. The parameters of the quality measure, Q, and cost function, λ, were set to α = 0.01 and γ = 0.5, respectively.

4.5.2.1 Comparisons with Averaging and Conventional Voting Consensus Measures

Ninety simulated networks, each consisting of 64 nodes composed of 8 modules, were generated to evaluate the Fiedler consensus method against two other consensus clustering methods, averaging and HGPA. A mixing parameter η, 0 ≤ η ≤ 1, was selected such that the difficulty of identifying the true modular structure increased with decreasing η. The pairwise elements of each of the simulated connectivity matrices ranged between 0 and 1.
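Benchmark networks of this type can be generated in a few lines. The sketch below plants equal-sized modules and draws intra- and inter-modular weights from log-normal distributions whose means follow the mixing-parameter scheme described here; the moment-matched log-normal parameterization and the clipping to [0, 1] are assumptions of this sketch.

```python
import math, random

def modular_network(n_modules=8, module_size=8, eta=0.5,
                    mu_intra=0.8, sigma=0.1, seed=0):
    """Simulated weighted network with planted modules: intra-module
    weights are log-normal with mean mu_intra, inter-module weights
    have mean (1 - eta) * mu_intra, so smaller eta means modules are
    harder to recover. Weights are clipped to [0, 1]."""
    rng = random.Random(seed)

    def lognorm(mean, std):
        # underlying normal parameters matching the desired mean/std
        s2 = math.log(1.0 + (std / mean) ** 2)
        mu = math.log(mean) - 0.5 * s2
        return min(1.0, max(0.0, rng.lognormvariate(mu, math.sqrt(s2))))

    n = n_modules * module_size
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = (i // module_size) == (j // module_size)
            mean = mu_intra if same else (1.0 - eta) * mu_intra
            W[i][j] = W[j][i] = lognorm(mean, sigma) if mean > 0 else 0.0
    return W
```

With the defaults this reproduces the 64-node, 8-module setup; sweeping eta toward 0 pushes the inter-modular mean toward the intra-modular one.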
The weights of the intra-modular edges were selected from a log-normal distribution with a mean of $\mu_{intra} = 0.8$ and a standard deviation of $\sigma_{intra} = 0.1$. The weights of the inter-modular edges were selected from a log-normal distribution with a mean of $\mu_{inter} = (1 - \eta)\mu_{intra}$ and a standard deviation of $\sigma_{inter} = 0.1$. Across all networks, each module had 8 members.

Using the proposed Fiedler Consensus Algorithm, the Averaging method, and HGPA, the networks were evaluated for 2 ≤ k ≤ 20 modules. The best modular structure was selected where Q and λ were maximum. Table 4.2 reports the number of modules identified by each method with respect to varying degrees of inter-modular strength. The proposed approach performed best at identifying the correct number of modules for η = 1, 0.5, 0.1, while Averaging and HGPA were increasingly less accurate. Therefore, the Fiedler Consensus Algorithm, along with the maximization of Q and λ, is suitable for identifying the optimal number of modules in a network.

Table 4.2: Number of modules identified through optimization of Q and λ in simulated data (the true number of modules is 8).

Method / η       1    0.5   0.1   0.05   0
Fiedler Method   8    8     8     9      20
Averaging        7    7     7     3      17
HGPA             8    7     10    20     19

4.5.2.2 Hierarchical Modular Structure of the Brain During the Error-Related Negativity

The new time-varying measure of phase synchrony, in conjunction with graph measures, is applied to a set of EEG data containing the error-related negativity (ERN). First, the proposed Fiedler consensus approach was applied to the sets of Error and Correct data in order to identify an optimal modular structure. Hierarchical decomposition of the networks stopped when Q and λ were maximized. Optimization revealed that Error responses were best represented by a structure composed of 4 modules (Figure 4.9), with Q = 1 and λ = 0.17, and Correct responses by 2 modules (Figure 4.10), with Q = 1 and λ = 0.15.
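The spectral step at the heart of the Fiedler Consensus Algorithm, splitting a graph by the sign pattern of the Fiedler vector, can be sketched in pure Python with power iteration; a production implementation would use a sparse eigensolver, and the constant `c` and iteration count below are assumptions of this sketch.

```python
import math, random

def fiedler_partition(W, iters=500, seed=0):
    """Bipartition a weighted graph by the sign of the Fiedler vector
    (eigenvector of the second-smallest Laplacian eigenvalue).
    Power iteration on (c*I - L), with the constant eigenvector of L
    projected out each step so the iteration converges to the Fiedler
    direction."""
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg) + 1.0   # shift so all eigenvalues of c*I - L are positive
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        # project out the all-ones direction (the kernel of L)
        mean = sum(v) / n
        v = [vi - mean for vi in v]
        # w = (c*I - L) v, where (L v)_i = deg_i*v_i - sum_j W_ij v_j
        w = [c * v[i] - (deg[i] * v[i] - sum(W[i][j] * v[j] for j in range(n)))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [0 if vi < 0 else 1 for vi in v]
```

Applied recursively, sign splits of this kind yield the hierarchical decomposition; the consensus step across subjects then aggregates the per-subject partitions.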
Figures 4.11 and 4.12 show sparser versions of Figures 4.9 and 4.10 in which only node pairs found to possess significantly greater phase synchrony (Wilcoxon signed-rank, p ≤ 0.05) during Error responses compared to Correct responses are shown. The optimal modular organization describing Error responses consists of three distinct modules located around the mPFC and lPFC brain regions. The appearance of the central module is driven by increased mPFC activity, which may be indicative of the activation of a monitoring system, the anterior cingulate cortex (ACC), that signals the need for enhanced cognitive control during conditions of conflict or after an error. The optimal modular organization of the brain during Correct responses does not indicate any segregation among nodes in the frontal region. The only partition is between the parietal and frontal regions of the brain. Since an error was not committed, activation of the ACC did not occur, and the brain network therefore forwent the need for increased cognitive control.

4.5.2.3 Inter- and Intra-Edge Phase Synchrony

To further assess the statistical significance of the resulting modular structures of the brain, Q and λ were computed for random connectivity matrices. A random matrix was generated by randomly shuffling the values of each of the original connectivity matrices while preserving the average degree of the network. One hundred random matrices were generated for each subject across Correct and Error responses. The random connectivity matrices were partitioned into 2 ≤ k ≤ 10 clusters using the Fiedler Consensus Algorithm, and the resulting Q, λ, and weighted modularity values were compared against those of the Correct and Error response modular networks. The test indicated that the Q and λ resulting from the Correct and Error modular networks were greater than 100% of the corresponding random measures.
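One simple way to build such a null model is to permute the upper-triangular weights of the symmetric connectivity matrix: the multiset of edge weights, and hence the network's total and average strength, is preserved exactly while the topology is randomized. A minimal stand-in for the shuffle described above.

```python
import random

def shuffled_null(W, seed=0):
    """Null connectivity matrix obtained by permuting the
    upper-triangular weights of a symmetric matrix W; the resulting
    matrix is symmetric with the same multiset of edge weights."""
    n = len(W)
    rng = random.Random(seed)
    vals = [W[i][j] for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(vals)
    R = [[0.0] * n for _ in range(n)]
    it = iter(vals)
    for i in range(n):
        for j in range(i + 1, n):
            R[i][j] = R[j][i] = next(it)
    return R
```

Generating many such surrogates and re-running the clustering gives the reference distribution of Q and λ against which the observed modular structures are compared.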
Furthermore, no discernible modular structure was observed in any of the random graph partitions.

Next, average intra-modular and inter-modular edge phase synchronies were computed for the Correct and Error matrices and are presented in Figures 4.13 and 4.14. These plots describe the segregated and integrated processes occurring in the brain. Strong functional connectivity is expected within modules and, although weaker, connectivity is expected between modules in order to facilitate the integration of information processing. Evaluation of the Error response cross-module average synchrony matrix indicates that the mPFC module shares greater (p ≤ 0.05) phase synchronization with the left lPFC and right lPFC modules than with the parietal module. There is less cross-module integration between the Correct response modules. Furthermore, increased functional connectivity within the mPFC region during the ERN is illustrated by higher intra-modular synchrony compared to the other three modules (p ≤ 0.01).

4.5.2.4 Local Hubs of the Brain Network

Finally, the within-module degree and participation coefficient were computed for all nodes across subjects. Figure 4.15 shows the average differences between the measures for Error and Correct responses. The within-module degree was significantly greater (p ≤ 0.05) for Error along the far left and right lateral prefrontal cortex regions as well as in the medial prefrontal cortex. This increase in phase synchrony intensity may indicate greater enforcement of cognitive control by the mPFC. The participation coefficient was significantly greater (p ≤ 0.01) in Error across the entire prefrontal region of the brain. This may be indicative of increased integration of information processing between the monitoring and control regions of the brain.
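The two hub measures can be sketched for weighted graphs as follows; the weighted generalization (node strength in place of degree) is an assumption of this sketch, in the spirit of the standard within-module z-score and participation coefficient definitions.

```python
import math

def hub_measures(W, labels):
    """Within-module strength z-score and participation coefficient for
    a weighted graph; labels[i] is node i's module index."""
    n = len(W)
    strength = [sum(W[i]) for i in range(n)]
    mods = sorted(set(labels))
    # strength of node i restricted to each module s
    k_is = [{s: sum(W[i][j] for j in range(n) if labels[j] == s)
             for s in mods} for i in range(n)]
    # z-score of within-module strength, computed per module
    z = [0.0] * n
    for s in mods:
        members = [i for i in range(n) if labels[i] == s]
        vals = [k_is[i][s] for i in members]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
        for i in members:
            z[i] = (k_is[i][s] - mu) / sd if sd > 0 else 0.0
    # participation coefficient: 1 - sum_s (k_is / k_i)^2
    p = [1.0 - sum((k_is[i][s] / strength[i]) ** 2 for s in mods)
         if strength[i] > 0 else 0.0 for i in range(n)]
    return z, p
```

A node with all of its strength inside its own module has participation coefficient 0; a node whose strength is spread evenly over all modules approaches 1, marking it as a connector hub.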
4.6 Conclusions

In this chapter, two new approaches were presented for community detection in fully connected weighted graphs which represent the multivariate relationships among neural populations. First, using CONGA in an optimization process, a method was proposed to identify an optimal community structure composed of overlapping clusters, based on thresholding and a recent definition of modularity. The proposed implementation of CONGA permitted modules to overlap and identified a hierarchical decomposition of the brain network. Furthermore, with the proposed algorithm it is possible to identify the optimal modular structure representative of human brain connectivity without a priori information on the number of clusters or the cluster size. A hierarchical spectral partitioning algorithm using the Fiedler vector for discovering a common community structure across multiple subjects was also presented. By considering the cluster tendencies of nodes across multiple subjects, a community structure representative of all evaluated data could be generated. This approach improves on conventional methods such as averaging, which neglects the variance information contained across data sets. Finally, alternative measures to the modularity function, an indicator of cluster quality, were presented and shown to be effective in evaluating community structures identified in weighted networks.

The proposed measures were applied to the set of EEG data representing the error-related negativity and revealed that neural assemblies located around the anterior cingulate cortex tend to participate in a common cluster. Both the CONGA optimization process and the Fiedler Consensus Algorithm show that node FCz lies at the center of this cluster only during Error responses. This is therefore indicative of a more inhomogeneous structure in the frontal region of the brain, compared to Correct responses, during the task-related test.
This observation can be further explored by focusing the analysis on the frontal region of the brain. Data associated with the parietal region of the brain may contribute noise to the clustering algorithms, and ignoring these regions will therefore most likely reveal further hierarchical levels of the community structure.

Figure 4.5: Modularity with respect to degree and number of clusters for Correct responses.

Figure 4.6: Modularity with respect to degree and number of clusters for Error responses.

Figure 4.7: Six overlapping clusters derived for Correct responses.

Figure 4.8: Six overlapping clusters derived for Error responses.

Figure 4.9: Hierarchical decomposition of the brain network during an Error response.

Figure 4.10: Hierarchical decomposition of the brain network during a Correct response.

Figure 4.11: Significance cluster plots for Error responses.

Figure 4.12: Significance cluster plots for Correct responses.

Figure 4.13: Cross-module average synchrony for Error responses across four clusters.

Figure 4.14: Cross-module average synchrony for Correct responses across two clusters.

Figure 4.15: Normalized differences in hub identification measures between Error and Correct responses; (a) within-module z-score; (b) participation coefficient.

Chapter 5

Summary and Future Work

This dissertation contributes to the analysis of functional brain networks from a complex network theory perspective, in particular using tools from graph theory.
The limitations of current graph theoretic analysis of biological networks, such as the requirement of binary graphs and the reliance on global topological measures, are addressed by developing new measures that can be applied directly to weighted networks and can reveal both local and global structural information about the network. Signal processing, information theoretic, and stochastic analysis tools have proven invaluable for characterizing complex signals and systems, and in this dissertation they are integrated with graph theory to produce new measures more natural to our problem. These methods were applied to a set of graphs representing the pairwise synchrony among neural assemblies, quantified by a recently introduced time-varying phase synchrony measure applied to the electroencephalogram (EEG).

5.1 Summary

In the first part of this dissertation, the 'small world' structure of a complex network was discussed. Based on evidence from previous studies, it was hypothesized that an analysis of 'small world' structure would describe the effective integration of multiple segregated sources of information in the brain, and that changes in the 'small world' structure are indicative of an adaptive response to error processing. We explored how graph metrics indicative of 'small world' structure can be made suitable for the analysis of fully connected, weighted, undirected networks. The two defining measures of 'small world' structure, clustering coefficient and path length, were originally defined for binary adjacency matrices. A clustering coefficient and a path length that are directly applicable to weighted graphs were introduced through a probabilistic approach. The clustering coefficient was computed based on the probability of connectivity among triplets in the network, while the path length was computed as a cost function which minimized the product of edge weights along a path, of any length, between distinct nodes.
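A common concrete instance of this probabilistic view of path length is the most-reliable-path computation: treating edge weights as probabilities, the path maximizing their product can be found by running Dijkstra's algorithm on −log-transformed weights. The sketch below illustrates the general reduction, not the dissertation's exact cost function.

```python
import heapq, math

def most_reliable_path(W, src, dst):
    """Path between src and dst maximizing the product of edge
    probabilities, via Dijkstra on -log(w). W is a symmetric weight
    matrix with entries in (0, 1]; returns (probability, path)."""
    n = len(W)
    dist = [math.inf] * n
    prev = [None] * n
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue            # stale heap entry
        if u == dst:
            break
        for v in range(n):
            w = W[u][v]
            if w > 0:
                nd = d - math.log(w)   # product of probs -> sum of -logs
                if nd < dist[v]:
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(pq, (nd, v))
    path, u = [], dst
    while u is not None:
        path.append(u)
        u = prev[u]
    return math.exp(-dist[dst]), path[::-1]
```

For example, with a weak direct edge (0.2) and a two-hop route through strong edges (0.9 each), the two-hop route wins with reliability 0.81, illustrating why high-synchrony multi-hop paths can dominate weak direct connections.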
These measures were applied to both synthetic networks and weighted functional networks derived from pairwise phase synchrony values, and revealed an increased 'small world' phenomenon in the frontal region of the brain.

The second part of this dissertation introduces local metrics for identifying influential and anomalous nodes, or nodal regions, in a network. Since feature extraction is widely explored and is an important tool in signal processing, machine learning, and pattern recognition, we extended the well-known notions of energy and entropy to graphs in order to provide a more detailed, multi-scale view of the organization of the network. Laplacian-Huckel graph energy is introduced as a localized measure which quantifies the stability and reliability of local network connectivity and information processing centered on a node. This measure was shown to be a better representation of node centrality than existing measures such as betweenness and closeness centrality. Another new measure, graph entropy rate, was introduced to quantify the local structural complexity of a weighted network. Based on a Markov process modeling of the connectivity matrix, graph entropy rate was demonstrated to be closely related to the topology of a graph. The measures of graph energy and entropy rate revealed greater influences on overall connectivity and complexity by nodes in the frontal region of the brain, primarily located around the FCz electrode, during cognitive control.

The last part of this dissertation introduces a multi-subject clustering algorithm to evaluate multivariate relationships across the brain network. Recent studies have shown that the modular structure, i.e. the clusters, of complex biological networks is indicative of robustness and contributes to functionality by compartmentalizing specific functions within certain cortical regions without perturbing the rest of the network.
Intra-module associations are thought to describe the segregation of information processing, while inter-module associations testify to the integration of information processing across distant brain regions. A key challenge in identifying the modular organization of brain networks is determining a common structure across multiple subjects. Current work either focuses on obtaining the community structure of the average connectivity network or on analyzing each subject individually and obtaining a common modular structure using a voting technique. Averaging neglects the variance across subjects, whereas conventional voting methods either require a priori knowledge of cluster labels or require one to first acquire the modular structure of each subject. We developed a community detection algorithm based on the well-known Fiedler vector. This algorithm was best suited for identifying a common hierarchical modular structure shared across multiple subjects. To further evaluate the quality of the discovered modular structure, we introduced two new measures, homogeneity and completeness, which applied concepts from information theory to quantify the quality of the modules. The algorithm was applied to the phase synchrony data to reveal that the brain was significantly more modular during error processing, particularly in the frontal region of the brain.

5.2 Future Work

There are still remaining challenges in the application of graph theory to the analysis of real-world networks such as the human brain. Some of these challenges and possible solutions include:

• The application of signal processing transforms to graphs: Ortega et al. [106] proposed a lifting wavelet transform on sensor networks for de-noising applications. Lifting wavelet transforms are invertible and would therefore be ideal for decomposing a graph into its high and low frequency components.
If a network is modeled by its frequency spectrum, described by the eigenvalues of the Laplacian matrix, high frequency transitions will occur between clusters and low frequency transitions within clusters. Therefore, the lifting wavelet can be used to filter out the edges contributing to high frequency changes, thus revealing the clusters of the network. However, Ortega's application of lifting wavelets is on sensor networks, for which data is represented by values assigned to the nodes. For the graphs of interest in this dissertation, information is stored on the edges rather than the vertices. Therefore, the lifting wavelet transform should be applied to the line graph of the network. For any given graph $G = (V, E)$, its corresponding line graph $L(G) = (V', E')$ is defined such that $|V'| = |E|$ and there is an edge in $L(G)$ for each pair of edges in $G$ that share a common endpoint, i.e. $\{(i, j) \in E'\} \leftrightarrow \{\exists k \,|\, (k, i) \in E,\ (k, j) \in E\}$. By decomposing the line graph, removing the detail coefficients, and reconstructing the graph from its remaining coefficients, the weights along edges linking clusters should be minimized while weights within clusters are emphasized. An invertible wavelet transform applied to a weighted connectivity matrix would be useful for community detection. The accuracy of spectral clustering using k-means, for example, would increase since the information contained in the eigenvectors would pertain primarily to intra-cluster relationships.

• A coding method for non-tree weighted graphs: The graph entropy rate measure introduced here demonstrated the compressibility of various graph topologies, random networks being the most difficult to compress. However, there does not yet exist a method to compress and reconstruct the original graph based solely on its adjacency or connectivity matrix. Lempel-Ziv coding was applied to the adjacency matrix, but the resulting bits/symbol did not achieve the entropy computed using the proposed graph entropy rate measure.
The Prufer sequence [193], developed for coding labeled tree graphs, may be extended to non-tree graphs. It may be possible to deconstruct the graph into a set of trees, each of which could be individually coded using the Prufer sequence.

• Analysis of dynamic graphs: The modular structure of the brain has up to this point been evaluated for static connectivity matrices. It may be beneficial to evaluate dynamic modular structures of the brain in order to study the evolution of network organization across the brain before and after the occurrence of the error-related negativity. Bai et al. [194] published a study on dynamic brain networks to evaluate the functional connectivity among multiple brain regions during the post-stimulus phase facilitated by acupuncture. Mucha et al. also developed a network modularity quality function that reveals clusters in time-dependent, multi-scale, and multiplex networks [195]. Therefore, there is existing evidence to suggest that dynamic graphs representing the brain network can be used to further understand the organization of the brain during error processing.

• Effective connectivity of weighted graphs: This dissertation has focused on the functional connectivity of brain networks using phase synchrony and graph measures, such that relationships across neural assemblies were modeled as bi-directional and symmetric. However, the directed relationships, i.e. effective connectivity, across neural assemblies should be evaluated to further elucidate the information processing patterns of the human brain. With effective connectivity, symmetry of the connectivity matrix can no longer be assumed, and new modified measures of clustering coefficient, path length, energy, and entropy must be developed to incorporate directional information.

BIBLIOGRAPHY

[1] http://www.neuroscienceblueprint.nih.gov/connectome/, 2012.

[2] O. Sporns, G. Tononi, and R.
Kotter, “The human connectome: A structural description of the human brain,” PLoS Comput Biol, vol. 1, no. 4, p. e42, Sep. 2005.

[3] M. Axer, K. Amunts, D. Grassel, C. Palm, J. Dammers, H. Axer, U. Pietrzyk, and K. Zilles, “A novel approach to the human connectome: Ultra-high resolution mapping of fiber tracts in the brain,” NeuroImage, vol. 54, no. 2, pp. 1091–1101, 2011.

[4] R. S. J. Frackowiak, K. J. Friston, C. Frith, R. Dolan, and C. J. Price, Human Brain Function, 2nd ed. Academic Press, 2003.

[5] G. Tononi and O. Sporns, “Measuring information integration,” BMC Neuroscience, vol. 4, no. 1, pp. 31+, Dec. 2003.

[6] L. Lee, L. M. Harrison, and A. Mechelli, “A report of the functional connectivity workshop,” NeuroImage, 2003.

[7] C. J. Stam, W. de Haan, A. Daffertshofer, B. Jones, I. Manshanden, A. M. van Cappellen van Walsum, T. Montez, J. Verbunt, J. de Munck, B. van Dijk, H. Berendse, and P. Scheltens, “Graph theoretical analysis of magnetoencephalographic functional connectivity in Alzheimer's disease,” Brain, 2009.

[8] M. Rubinov and O. Sporns, “Complex network measures of brain connectivity: Uses and interpretations,” NeuroImage, vol. 52, no. 3, pp. 1059–1069, 2010.

[9] D. Zhou, W. K. Thompson, and G. Siegle, “Matlab toolbox for functional connectivity,” NeuroImage, vol. 47, no. 4, pp. 1590–1607, 2009.

[10] M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, “Functional connectivity in the resting brain: A network analysis of the default mode hypothesis,” PNAS, 2002.

[11] P. Tu, R. L. Buckner, L. Zollei, K. A. Dyckman, D. C. Goff, and D. S. Manoach, “Reduced functional connectivity in a right-hemisphere network for volitional ocular motor control in schizophrenia,” Brain, 2010.

[12] P. L. S. Jacques, F. Dolcos, and R. Cabeza, “Effects of aging on functional connectivity of the amygdala for subsequent memory of negative pictures,” Psychological Science, 2009.

[13] Y. He, Z. Chen, and A.
Evans, “Structural insights into aberrant topological patterns of large-scale cortical networks in Alzheimer's disease,” J Neurosci, pp. 4756–4766, 2008.

[14] Y. He and A. Evans, “Graph theoretical modeling of brain connectivity,” Curr Opin Neurol, 2010.

[15] D. S. Bassett, E. Bullmore, and B. A. Verchinski, “Hierarchical organization of human cortical networks in health and schizophrenia,” J Neurosci, pp. 9239–9248, 2008.

[16] E. van Dellen, L. Douw, and J. C. Baayen, “Long-term effects of temporal lobe epilepsy on local neural networks: a graph theoretical analysis of corticography recordings,” PLoS One, no. 4, p. e8081, 2009.

[17] S. J. Leistedt, N. Coumans, and M. Dumont, “Altered sleep brain functional connectivity in acutely depressed patients,” Human Brain Mapping, no. 30, pp. 2207–2219, 2009.

[18] A. Bondy and U. Murty, Graph Theory. Springer, 2008.

[19] S. Aviyente, E. M. Bernat, W. S. Evans, and S. R. Sponheim, “A phase synchrony measure for quantifying dynamic functional integration in the brain,” Human Brain Mapping, vol. 32, no. 1, pp. 80–93, 2010.

[20] J. R. Hall, E. M. Bernat, and C. J. Patrick, “Externalizing psychopathology and the error-related negativity,” Psychological Science, vol. 18, no. 4, pp. 326–333, 2007.

[21] J. F. Cavanagh, M. X. Cohen, and J. J. B. Allen, “Prelude to and resolution of an error: EEG phase synchrony reveals cognitive control dynamics during action monitoring,” The Journal of Neuroscience, vol. 29, no. 1, pp. 98–105, 2009.

[22] T. K. Smock, Physiological Psychology. Prentice-Hall, 1999.

[23] E. Niedermeyer and F. Lopes da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams and Wilkins, 2004.

[24] J. G. Webster, Medical Instrumentation: Application and Design. John Wiley & Sons, 1998.

[25] E. Pereda, R. Rial, A. Gamundi, and J.
González, “Assessment of changing interdependencies between human electroencephalograms using nonlinear methods,” Physica D: Nonlinear Phenomena, vol. 148, no. 1-2, pp. 147–158, 2001.

[26] A. Rihaczek, “Signal energy distribution in time and frequency,” IEEE Transactions on Information Theory, vol. 14, pp. 369–374, 1968.

[27] O. Sporns, D. Chialvo, M. Kaiser, and C. Hilgetag, “Organization, development and function of complex brain networks,” TRENDS in Cognitive Sciences, vol. 8, no. 9, 2004.

[28] W. W. R. Ball and H. S. M. Coxeter, Mathematical Recreations and Essays, 13th ed. Dover, 1987.

[29] P. Erdős and A. Rényi, “On the evolution of random graphs,” Mathematical Institute of the Hungarian Academy of Sciences, vol. 5, pp. 17–61, 1960.

[30] A. H. Esfahanian, “Algorithmic graph theory,” 2001.

[31] D. J. Watts and S. H. Strogatz, “Collective dynamics of 'small world' networks,” Nature, vol. 393, June 1998.

[32] S. Fortunato and A. Lancichinetti, “Community detection algorithms: a comparative analysis: invited presentation, extended abstract,” in Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools, 2009, pp. 27:1–27:2.

[33] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal Anthropol Res, vol. 33, pp. 452–473, 1977.

[34] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of doubtful sound,” Behav Ecol Sociobiol, vol. 54, pp. 396–405, 2003.

[35] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, 1993.

[36] B. Frey, The North American Airport Network. www.psi.toronto.edu/frey/, 2012.

[37] J. Kayser and C. Tenke, “Principal components analysis of Laplacian waveforms as a generic method for identifying ERP generator patterns: I. Evaluation with auditory oddball tasks,” Clin Neurophysiol, vol. 117, no. 2, pp. 348–368, 2006.
[38] ——, “Principal components analysis of Laplacian waveforms as a generic method for identifying ERP generator patterns: II. Adequacy of low-density estimates,” Clin Neurophysiol, vol. 117, no. 2, pp. 369–380, 2006.

[39] A. MacDonald, J. Cohen, V. Stenger, and C. Carter, “Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control,” Science, vol. 288, no. 5472, p. 1835, 2000.

[40] M. M. Botvinick, T. Braver, D. Barch, C. Carter, and J. Cohen, “Conflict monitoring and cognitive control,” Psychol Rev, vol. 108, no. 3, p. 624, 2001.

[41] J. C. Miller, “Spread of infectious disease through clustered populations,” J. R. Soc. Interface, vol. 6, no. 41, pp. 1121–1134, 2009.

[42] M. M. Botvinick, J. Cohen, and C. Carter, “Conflict monitoring and anterior cingulate cortex: an update,” Trends Cogn Sci, vol. 8, no. 12, pp. 539–546, 2004.

[43] J. W. Brown and T. S. Braver, “A computational model of risk, conflict, and individual difference effects in the anterior cingulate cortex,” Brain Research, vol. 1202, pp. 99–108, 2008.

[44] W. J. Gehring and A. R. Willoughby, Errors, conflicts and the brain. Current opinions on performance monitoring, 2004, pp. 14–20.

[45] E. M. Bernat, W. J. Williams, and W. J. Gehring, “Decomposing ERP time-frequency energy using PCA,” Clin Neurophysiol, vol. 116, no. 6, pp. 1314–1334, 2005.

[46] L. T. Trujillo and J. J. B. Allen, “Theta EEG dynamics of the error-related negativity,” Clin Neurophysiol, vol. 118, no. 3, pp. 645–668, 2007.

[47] E. Bernat, L. Nelson, V. Steele, W. Gehring, and C. Patrick, “Externalizing psychopathology and gain-loss feedback in a simulated gambling task: Dissociable components of brain response revealed by time-frequency analysis,” J Abnorm Psychol, vol. 120, no. 2, pp. 352–364, 2011.

[48] K. Matsumoto, M. Uehara, and H. Mori, “Fault tolerance for small-world cellular neural networks,” Lecture Notes in Computer Science, vol. 5186, pp. 223–231, 2008.

[49] M.
Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, and J. Saramäki, “Small but slow world: How network topology and burstiness slow down spreading,” Phys. Rev. E, vol. 83, p. 025102(R), 2011. [50] M. E. J. Newman, “The structure of scientific collaboration networks,” Proc Natl Acad Sci U S A, vol. 98, no. 2, pp. 404–409, Jan. 2001. [51] B. Kogut and G. Walker, “The small world of Germany and the durability of national networks,” American Sociological Review, vol. 66, no. 3, pp. 317–335, 2001. [52] B. Uzzi and J. Spiro, “Collaboration and creativity: The small world problem,” American Journal of Sociology, vol. 111, no. 2, pp. 447–504, Sep. 2005. [53] O. Sporns and R. Kötter, “Motifs in brain networks,” PLoS Biol, vol. 2, pp. 1910–1918, 2004. [54] O. Sporns and J. D. Zwi, “The small world of the cerebral cortex,” Neuroinformatics, vol. 2, no. 2, pp. 145–162, Jun. 2004. [55] O. Sporns, G. Tononi, and G. Edelman, “Theoretical neuroanatomy: relating anatomical and functional connectivity in graphs and cortical connection matrices,” Cereb Cortex, vol. 10, pp. 127–141, 2000. [56] C. Cherniak, Z. Mokhtarzada, R. Rodriguez-Esteban, and K. Changizi, “Global optimization of cerebral cortex layout,” Proc. Natl. Acad. Sci., vol. 101, pp. 1081–1086, 2004. [57] O. Sporns and G. Tononi, “Classes of network connectivity and dynamics,” Complexity, vol. 7, pp. 28–38, 2002. [58] R. Salvador, J. Suckling, M. Coleman, J. Pickard, D. Menon, and E. Bullmore, “Neurophysiological architecture of functional magnetic resonance images of human brain,” Cereb. Cortex, vol. 15, pp. 1332–1342, 2005. [59] S. Achard, R. Salvador, B. Whitcher, J. Suckling, and E. Bullmore, “A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs,” J. Neurosci, vol. 26, pp. 63–72, 2006. [60] V. Eguiluz, D. Chialvo, G. A. Cecchi, M. Baliki, and A. Apkarian, “Scale-free brain functional networks,” Phys. Rev. Lett., vol. 94, p. 018102, 2005. [61] S.
L. Bressler, “Large-scale cortical networks and cognition,” Brain Research Reviews, vol. 20, no. 3, pp. 288–304, 1995. [62] K. Friston, “Beyond phrenology: what can neuroimaging tell us about distributed circuitry?” Annu. Rev. Neurosci., vol. 25, pp. 221–250, 2002. [63] K. Supekar, V. Menon, D. Rubin, M. Musen, and M. D. Greicius, “Network analysis of intrinsic functional brain connectivity in Alzheimer’s disease,” PLoS Comput Biol, vol. 4, no. 6, p. e1000100, Jun. 2008. [64] S. Micheloyannis, E. Pachou, C. J. Stam, M. Breakspear, P. Bitsios, M. Vourkas, S. Erimaki, and M. Zervakis, “Small-world networks and disturbed functional connectivity in schizophrenia,” Schizophrenia Research, vol. 87, no. 1-3, pp. 60–66, Oct. 2006. [65] K. Yuan, W. Qin, J. Liu, Q. Guo, M. Dong, J. Sun, Y. Zhang, P. Liu, W. Wang, Y. Wang, Q. Li, W. Yang, K. M. von Deneen, M. S. Gold, Y. Liu, and J. Tian, “Altered small-world brain functional networks and duration of heroin use in male abstinent heroin-dependent individuals,” Neuroscience Letters, vol. 477, no. 1, pp. 37–42, 2010. [66] J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski, and J. Kertész, “Generalizations of the clustering coefficient to weighted complex networks,” Phys. Rev. E, vol. 75, no. 2, p. 027105, Feb. 2007. [67] G. Caldarelli, A. Capocci, and D. Garlaschelli, “A self-organized model for network evolution,” The European Physical Journal B - Condensed Matter and Complex Systems, vol. 64, pp. 585–591, 2008. [68] G. Kalna, “A clustering coefficient for weighted networks, with application to gene expression data,” AI Commun., vol. 20, no. 4, pp. 263–271, 2007. [69] J. C. Reijneveld, S. C. Ponten, H. W. Berendse, and C. J. Stam, “The application of graph theoretical analysis to complex networks in the brain,” Clin Neurophysiol, vol. 118, no. 11, pp. 2317–2331, 2007. [70] G. Fagiolo, J. Reyes, and S.
Schiavo, “On the topological properties of the world trade web: A weighted network analysis,” Physica A, vol. 387, no. 15, pp. 3868–3873, 2008. [71] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, “The architecture of complex weighted networks,” Proc. Natl. Acad. Sci. USA, vol. 101, p. 3747, 2004. [72] J.-P. Onnela, J. Saramäki, J. Kertész, and K. Kaski, “Intensity and coherence of motifs in weighted complex networks,” Phys. Rev. E, vol. 71, no. 6, p. 065103, Jun. 2005. [73] C. Yan, G. Gong, J. Wang, D. Wang, D. Liu, C. Zhu, Z. Chen, A. Evans, Y. Zang, and Y. He, “Sex- and brain size-related small-world structural cortical networks in young adults: a DTI tractography study,” Cereb Cortex, vol. 21, no. 2, p. 449, 2011. [74] B. Zhang and S. Horvath, “A general framework for weighted gene co-expression network analysis,” Stat Appl Genet Mol Biol, vol. 4, no. 1, Article 17, 2005. [75] P. Grindrod, “Range-dependent random graphs and their application to modeling large small-world proteome datasets,” Physical Review E, vol. 66, p. 066702, 2002. [76] S. Boyd, P. Diaconis, and L. Xiao, “Fastest mixing Markov chain on a graph,” SIAM Review, vol. 46, no. 4, pp. 667–689, 2004. [77] J. Jarvis and D. Shier, “Graph-theoretic analysis of finite Markov chains,” in Applied Mathematical Modeling: A Multidisciplinary Approach. CRC Press, 1999, ch. 13. [78] B. V. Cherkassky, A. V. Goldberg, and T. Radzik, “Shortest paths algorithms: Theory and experimental evaluation,” Mathematical Programming, vol. 73, pp. 129–174, 1993. [79] B. V. Cherkassky and A. V. Goldberg, “Negative-cycle detection algorithms,” Mathematical Programming, vol. 85, pp. 277–311, 1999. [80] V. Latora and M. Marchiori, “Efficient behavior of small-world networks,” Phys. Rev. Lett., vol. 87, no. 19, p. 198701, Oct. 2001. [81] J. Gao, R. Jin, J. Zhou, J. Xu, X. Jiang, and T.
Wang, “Relational approach for shortest path discovery over large graphs,” in Proceedings of the 38th International Conference on Very Large Data Bases, vol. 5, no. 4, 2011. [82] R. W. Floyd, “Algorithm 97: Shortest path,” Communications of the ACM, vol. 5, no. 6, p. 345, 1962. [83] F. Boesch, “A survey and introduction to network reliability theory,” in IEEE International Conference on Communications: Digital Technology - Spanning the Universe, Conference Record, vol. 2, pp. 678–682, Jun. 1988. [84] A. Shatnawi, M. O. Ahmad, and M. N. S. Swamy, “Optimal scheduling of digital signal processing data-flow graphs using shortest-path algorithms,” The Computer Journal, vol. 45, pp. 88–100, 2002. [85] G. Yu and J. Yang, “On the robust shortest path problem,” Computers Ops Res, vol. 25, no. 6, pp. 457–468, 1998. [86] A. Nagurney and Q. Qiang, “Robustness of transportation networks subject to degradable links,” Europhysics Letters, vol. 80, p. 68001, 2007. [87] A. Bley, F. D’Andreagiovanni, and A. Hanemann, “Robustness in communication networks: Scenarios and mathematical approaches,” in Photonic Networks: Proceedings of the 12th ITG Symposium, 2011, pp. 1–8. [88] M. Manzano, E. Calle, and D. Harle, “Quantitative and qualitative network robustness analysis under different multiple failure scenarios,” in Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), 2011 3rd International Congress on, Oct. 2011, pp. 1–7. [89] E. F. Zipkin, C. S. Jennelle, and E. G. Cooch, “A primer on the application of Markov chains to the study of wildlife disease dynamics,” Methods in Ecology and Evolution, vol. 1, no. 2, pp. 192–198, 2010. [90] J. G. Kemeny and J. L. Snell, Finite Markov Chains. Springer Berlin / Heidelberg, 1976. [91] L. Fleming and M. Marx, “Managing innovation in small worlds,” Engineering Management Review, IEEE, vol. 37, no. 4, pp. 90–92, 2009. [92] Y. Yang and X.
Yu, “Weighted small world complex networks: smart sliding mode control,” in Proceedings of the 5th International Conference on Intelligent Computing: Emerging Intelligent Computing Technology and Applications, 2009, pp. 935–944. [93] V. Latora and M. Baranger, “Kolmogorov-Sinai entropy rate versus physical entropy,” Physical Review Letters, vol. 82, no. 3, 1999. [94] C. Stam, B. Jones, G. Nolte, M. Breakspear, and P. Scheltens, “Small-world networks and functional connectivity in Alzheimer’s disease,” Cerebral Cortex, 2006. [95] B. C. M. van Wijk, C. J. Stam, and A. Daffertshofer, “Comparing brain networks of different size and connectivity density using graph theory,” PLoS ONE, vol. 5, no. 10, p. e13701, Oct. 2010. [96] J. Kim, “Implementation and performance evaluation of mobile ad hoc network for emergency telemedicine system in disaster areas,” IEEE Engineering in Medicine and Biology Society Conference, pp. 1663–1666, 2009. [97] M. He and S. Petoukhov, Mathematics of Bioinformatics: Theory, Methods and Applications. Wiley, 2011. [98] R. S. Choras, “Image feature extraction techniques and their applications for CBIR and biometrics systems,” International Journal of Biology and Biomedical Engineering, vol. 1, no. 1, 2007. [99] H. Liu, G. Chen, S. Jiang, and G. Song, “A survey of feature extraction approaches in analog circuit fault diagnosis,” in Proceedings of the 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application - Volume 02, 2008, pp. 676–680. [100] P. Sanz, R. Marin, and J. Sanchez, “Pattern recognition for autonomous manipulation in robotic systems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, pp. 1–3, 2005. [101] H. Zhao, S. Sun, Z. Jing, and J. Yang, “Local structure based supervised feature extraction,” Pattern Recognition, vol. 39, no. 8, pp. 1546–1550, 2006. [102] D. Calvanese, G. D. Giacomo, and M. Lenzerini, “Extending semi-structured data,” Proc. of the 6th Italian Conf.
on Database Systems (SEBD’98), 1998. [103] S. Zafeiriou, “Exploiting discriminant information in elastic graph matching,” Image Processing, IEEE International Conference on, vol. 3, pp. 768–771, 2006. [104] H. Fei and J. Huan, “Structure feature selection for graph classification,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 991–1000. [105] B. A. Miller, N. T. Bliss, and P. J. Wolfe, “Toward signal processing theory for graphs and non-Euclidean data,” Acoustics Speech and Signal Processing (ICASSP), IEEE International Conference on, pp. 5414–5417, 2010. [106] G. Shen and A. Ortega, “Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting,” ICASSP, pp. 2513–2516, 2008. [107] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral, “The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles,” Proc Natl Acad Sci U S A, vol. 102, no. 22, pp. 7794–7799, 2005. [108] M. E. J. Newman, Networks: An Introduction. Oxford University Press, 2010. [109] L. C. Freeman, S. P. Borgatti, and D. R. White, “Centrality in valued graphs: A measure of betweenness based on network flow,” Social Networks, vol. 13, no. 2, pp. 141–154, 1991. [110] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted networks: Generalizing degree and shortest paths,” Social Networks, vol. 32, no. 3, p. 245, 2010. [111] P. Bonacich, “Some unique properties of eigenvector centrality,” Social Networks, vol. 29, no. 4, pp. 555–564, 2007. [112] K. Pearson, “On lines and planes of closest fit to systems of points in space,” Philosophical Magazine, vol. 2, no. 6, pp. 559–572, 1901. [113] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, pp. 895–900, 2005. [114] I. Gutman and B. Zhou, “Laplacian energy of a graph,” Linear Algebra and its Applications, vol. 414, no. 1, pp.
29–37, 2006. [115] J. Day and W. So, “Graph energy change due to edge deletion,” Linear Algebra and its Applications, vol. 428, no. 8-9, pp. 2070–2078, 2008. [116] G. Blyholder and C. Coulson, “Basis of extended Hückel formalism,” Theoretical Chemistry Accounts: Theory, Computation, and Modeling, vol. 10, pp. 316–324, 1968. [117] R. Balakrishnan, “The energy of a graph,” Linear Algebra and its Applications, vol. 387, pp. 287–295, 2004. [118] I. T. Jolliffe, Principal Component Analysis. Springer, 2002. [119] C. Allefeld, “Eigenvalue decomposition as a generalized synchronization cluster analysis,” International Journal of Bifurcation and Chaos, vol. 17, no. 10, pp. 3493–3497, 2008. [120] B. Mohar, “The Laplacian spectrum of graphs,” in Graph Theory, Combinatorics, and Applications, vol. 2. Wiley, 1991, pp. 871–898. [121] J. van den Heuvel and S. Pejić, “Using Laplacian eigenvalues and eigenvectors in the analysis of frequency assignment problems,” Annals of Operations Research, vol. 107, pp. 349–368, 2001. [122] T. Cover and J. Thomas, Elements of Information Theory. Wiley, 2006. [123] J. Körner, “Coding of an information source having ambiguous alphabet and the entropy of graphs,” Transactions of the 6th Prague Conference on Information Theory, 1973, pp. 411–425. [124] M. Dehmer, “Information processing in complex networks: Graph entropy and information functionals,” Appl Math Comput, vol. 201, no. 1-2, pp. 82–94, 2008. [125] J. Shetty, “Discovering important nodes through graph entropy: The case of Enron email database,” in Proceedings of the 3rd International Workshop on Link Discovery (KDD). ACM Press, 2005, pp. 74–81. [126] L. Demetrius and T. Manke, “Robustness and network evolution: an entropic principle,” Physica A, vol. 346, pp. 682–696, 2005. [127] Z. Burda, J. Duda, J. M. Luck, and B. Waclaw, “Localization of the maximal entropy random walk,” Physical Review Letters, vol. 102, 2009. [128] R. Sinatra, J. Gómez-Gardeñes, R. Lambiotte, V. Nicosia, and V.
Latora, “Maximal-entropy random walks in complex networks with limited information,” Phys. Rev. E, vol. 83, p. 030103, Mar. 2011. [129] T. Milenkovic, V. Memisevic, A. Bonato, and N. Przulj, “Dominating biological networks,” PLoS ONE, vol. 6, no. 8, p. e23016, 2011. [130] Y. Choi and W. Szpankowski, “Compression of graphical structures,” Information Theory, IEEE Transactions on, vol. 58, no. 2, pp. 620–638, 2012. [131] T. Pevny, P. Bas, and J. Fridrich, “Steganalysis by subtractive pixel adjacency matrix,” Information Forensics and Security, IEEE Transactions on, vol. 5, no. 2, pp. 215–224, 2010. [132] K. Sayood, Introduction to Data Compression, 3rd ed. Morgan Kaufmann, 2005. [133] T. S. Huang, “Face detection with information-based maximum discrimination,” Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, p. 782, 1997. [134] R. Rosen, “Matrix bandwidth minimization,” Proceedings of the 1968 23rd ACM National Conference, pp. 585–595, 1968. [135] R. Marti, V. Campos, and E. Pinana, “A branch and bound algorithm for the matrix bandwidth minimization,” European Journal of Operational Research, vol. 186, no. 2, pp. 513–528, 2008. [136] W. F. Tinney and J. W. Walker, “Direct solution of sparse network equations by optimally ordered triangular factorization,” Proc. IEEE, vol. 55, no. 11, pp. 1801–1809, 1967. [137] G. Fung, “A comprehensive overview of basic clustering algorithms,” A Review, 2001. [138] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, 1996. [139] S. van Dongen, “Graph clustering by flow simulation,” Ph.D. dissertation, University of Utrecht, 2000. [140] C. Ding and X. He, “Linearized cluster assignment via spectral ordering,” Proceedings of the 21st International Conference on Machine Learning, 2004. [141] E. Cuthill and J.
McKee, “Reducing the bandwidth of sparse symmetric matrices,” Proceedings of the 1969 24th ACM National Conference, pp. 157–172, 1969. [142] M. Lin, Z. Lin, and J. Xu, “Graph bandwidth of weighted caterpillars,” Theoretical Computer Science, vol. 363, no. 3, pp. 266–277, 2006. [143] M. Benzi, D. B. Szyld, and A. van Duin, “A study of different orderings for incomplete factorization preconditioning of nonsymmetric linear systems,” Computational Mechanics: New Trends and Applications, 1998. [144] W. S. Yeo and J. Berger, “Application of raster scanning method to image sonification, sound visualization, sound analysis and synthesis,” Proc. of the 9th Int. Conference on Digital Audio Effects, 2006. [145] S. Kamata, M. Niimi, and E. Kawaguchi, “A method of interactive analysis for multidimensional images using a Hilbert curve,” Trans. IEICE, vol. J77, pp. 1255–1264, 1994. [146] G. M. Morton, “A computer oriented geodetic data base; and a new technique in file sequencing,” Technical Report for IBM Ltd., 1966. [147] D. Hilbert, “Über die stetige Abbildung einer Linie auf ein Flächenstück,” Mathematische Annalen, vol. 38, pp. 459–460, 1891. [148] E. Artyomov, Y. Rivenson, G. Levi, and O. Yadid-Pecht, “Morton (Z) scan based real-time variable resolution CMOS image sensor,” Electronics, Circuits and Systems (ICECS), Proceedings of the 2004 11th IEEE International Conference on, pp. 145–148, 2004. [149] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Trans Inform Theory, vol. 28, no. 2, pp. 129–137, 1982. [150] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Trans Inform Theory, vol. 24, no. 5, pp. 530–536, 1978. [151] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings Natl Acad Sci USA, vol. 99, no. 12, pp. 7821–7826, Jun. 2002. [152] M. E. J. Newman, “Detecting community structure in networks,” The European Physical Journal B - Condensed Matter and Complex Systems, vol. 38, no. 2, pp. 321–330, Mar.
2004. [153] G. W. Flake, S. Lawrence, C. L. Giles, and F. M. Coetzee, “Self-organization and identification of web communities,” IEEE Computer, vol. 35, pp. 66–71, 2002. [154] M. Chavez, M. Valencia, V. Navarro, V. Latora, and J. Martinerie, “Functional modularity of background activities in normal and epileptic brain networks,” Phys. Rev. Lett., vol. 104, no. 11, p. 118701, Mar. 2010. [155] M. Kirschner and J. Gerhart, “Evolvability,” Proceedings of the National Academy of Sciences, vol. 95, no. 15, pp. 8420–8427, 1998. [156] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences, vol. 103, no. 23, p. 8577, 2006. [157] V. Satuluri and S. Parthasarathy, “Scalable graph clustering using stochastic flows: applications to community discovery,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’09. New York, NY, USA: ACM, 2009, pp. 737–746. [158] T. Young and I. Chiang, “A novel clustering method of high dimensional data,” in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. Springer Berlin / Heidelberg, 2007, vol. 4482, pp. 256–262. [159] X. Otazu and O. Pujol, “Wavelet based approach to cluster analysis: application on low dimensional data sets,” Pattern Recogn. Lett., vol. 27, pp. 1590–1605, Oct. 2006. [160] V. Carchiolo, A. Longheu, M. Malgeri, and G. Mangioni, “Search for overlapped communities by parallel genetic algorithms,” CoRR, vol. abs/0912.0913, 2009. [161] S. White and P. Smyth, “A spectral clustering approach to finding communities in graphs,” SIAM International Conference on Data Mining, pp. 76–84, 2005. [162] J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam and J. Neyman, Eds., vol. 1. University of California Press, 1967, pp. 281–297. [163] R. Baumgartner, L. Ryner, W. Richter, R.
Summers, M. Jarmasz, and R. Somorjai, “Comparison of two exploratory data analysis methods for fMRI: fuzzy clustering vs. principal component analysis,” Magnetic Resonance Imaging, vol. 18, no. 1, pp. 89–94, 2000. [164] A. Meyer-Baese, A. Wismueller, and O. Lange, “Comparison of two exploratory data analysis methods for fMRI: unsupervised clustering versus independent component analysis,” Information Technology in Biomedicine, IEEE Transactions on, vol. 8, no. 3, pp. 387–398, 2004. [165] U. von Luxburg, “A tutorial on spectral clustering,” CoRR, vol. abs/0711.0189, 2007. [166] J. Forman, P. Clemons, S. Schreiber, and S. Haggarty, “SpectralNET - an application for spectral graph analysis and visualization,” BMC Bioinformatics, vol. 6, no. 1, p. 260, 2005. [167] D. Meunier, R. Lambiotte, A. Fornito, K. D. Ersche, and E. T. Bullmore, “Hierarchical modularity in human brain functional networks,” Frontiers in Neuroinformatics, vol. 3, no. 37, 2009. [168] B. Fischer and J. Buhmann, “Bagging for path-based clustering,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 11, pp. 1411–1415, 2003. [169] I. Rustandi, “Hierarchical Gaussian naive Bayes classifier for multiple-subject fMRI data,” in NIPS Workshop: New Directions on Decoding Mental States from fMRI Data, 2006. [170] W. D. Penny and A. J. Holmes, “Random effects analysis,” Imaging, vol. 2, no. 1, pp. 1–12, 2004. [171] W. Tang, Z. Lu, and I. S. Dhillon, “Clustering with multiple graphs,” ICDM ’09, Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, pp. 1016–1021, 2009. [172] H. Wang, H. Shan, and A. Banerjee, “Bayesian cluster ensembles,” Statistical Analysis and Data Mining, vol. 4, pp. 54–70, 2011. [173] S. E. Schaeffer, “Graph clustering,” Computer Science Review, vol. 1, no. 1, pp. 27–64, 2007. [174] S. Gregory, “An algorithm to find overlapping community structure in networks,” Lecture Notes in Computer Science, vol. 4702, p. 91, 2007. [175] S. Chauhan, M.
Girvan, and E. Ott, “Spectral properties of networks with community structure,” Phys. Rev. E, vol. 80, no. 5, p. 056114, Nov. 2009. [176] A. Arenas, A. Diaz-Guilera, and C. J. Perez-Vicente, “Synchronization reveals topological scales in complex networks,” Physical Review Letters, vol. 96, p. 114102, 2006. [177] H. Shen and X. Cheng, “Spectral methods for the detection of network community structure: a comparative analysis,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2010, no. 10, p. P10020, 2010. [178] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Phys. Rev. E, vol. 74, no. 3, p. 036104, 2006. [179] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems. MIT Press, 2001, pp. 849–856. [180] Y. Lin, K. Desouza, and S. Roy, “Measuring agility of networked organizational structures via network entropy and mutual information,” Applied Mathematics and Computation, vol. 216, pp. 2824–2836, 2010. [181] M. Holzrichter and S. Oliveira, “A graph based method for generating the Fiedler vector of irregular problems,” in Lecture Notes in Computer Science, 1999, pp. 978–985. [182] N. Nguyen and R. Caruana, “Consensus clusterings,” Data Mining, Seventh IEEE International Conference on, pp. 607–612, 2007. [183] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hypergraph partitioning: Application in VLSI domain,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 7, no. 1, pp. 69–79, 1999. [184] V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri, “Extending modularity definition for directed graphs with overlapping communities,” arXiv e-print, vol. 801. [185] M. K. Goldberg, M. Hayvanovych, and M. Magdon-Ismail, “Measuring similarity between sets of overlapping clusters,” IEEE Second International Conference on Social Computing, pp. 303–308, 2010. [186] M. C. V. Nascimento and A. C. P. L. F.
de Carvalho, “Spectral methods for graph clustering - a survey,” European Journal of Operational Research, vol. 211, no. 2, pp. 221–231, 2011. [187] S. Yu, X. Liu, L. C. Tranchevent, W. Glänzel, J. A. K. Suykens, B. De Moor, and Y. Moreau, “Optimized data fusion for k-means Laplacian clustering,” Bioinformatics, vol. 27, pp. 118–126, Jan. 2011. [188] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010. [189] C. J. van Rijsbergen, “Information retrieval,” Journal of the American Society for Information Science, vol. 30, no. 6, pp. 374–375, 1979. [190] A. Rosenberg and J. Hirschberg, “V-measure: A conditional entropy-based external cluster evaluation measure,” Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 410–420, 2007. [191] J. Lin, “Divergence measures based on the Shannon entropy,” IEEE Transactions on Information Theory, vol. 37, no. 1, 1991. [192] M. Bolaños, S. Aviyente, and E. Bernat, “Graph analysis of neuronal interactions for the error-related negativity,” IEEE Engineering in Medicine and Biology Conference, 2010. [193] H. Prüfer, “Neuer Beweis eines Satzes über Permutationen,” Arch. Math. Phys., vol. 27, pp. 742–744, 1918. [194] L. Bai, W. Qin, J. Tian, J. Dai, and W. Yang, “Detection of dynamic brain networks modulated by acupuncture using a graph theory model,” Progress in Natural Science, vol. 19, no. 7, pp. 827–835, 2009. [195] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, “Community structure in time-dependent, multiscale, and multiplex networks,” Science, vol. 328, no. 5980, pp. 876–878, 2010.