SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN

By

Marcos Efren Bolaños

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Electrical Engineering

2012

ABSTRACT

SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN

By

Marcos Efren Bolaños

Functional brain networks underlying cognitive control processes have been of central interest in neuroscience. A great deal of empirical and theoretical work now suggests that frontal networks, in particular the medial prefrontal cortex (mPFC) and the lateral prefrontal cortex (lPFC), are involved in cognitive control. The most common way to study functional brain networks has been through measures of connectivity such as coherence, synchrony, and mutual information. However, functional connectivity measures are limited to quantifying pairwise relationships between brain regions and do not describe the overall organization of the brain network. Recently, researchers have adapted tools from graph theory to address this issue. Graph theory models a network as a set of vertices and edges, upon which complex network analysis may be applied. For the functional brain network, the vertices represent individual neural assemblies and the edges are weighted by their pairwise phase synchrony. Most graph theoretic measures, however, are limited to sparsely connected, unweighted graphs, and therefore cannot be applied directly to fully connected weighted graphs. In this thesis, existing graph measures and graph theoretic approaches are modified specifically for the analysis of the functional brain network. First, new weighted clustering coefficient and path length measures are introduced for quantifying the local weighted ‘small-world’ index of the brain.
These measures model the edge weights as probabilities representing the reliability of information flow across the corresponding edges. They differ from conventional measures by considering all possible connections with varying strengths of connectivity, and they do not require arbitrary thresholding of the weighted connectivity matrix, i.e. they can be applied directly to a fully connected weighted graph. Next, concepts from signal processing are adapted to graphs to identify central vertices and anomalies within a network, including new energy and entropy measures for graphs. The proposed graph energy measure outperforms existing definitions of graph energy for local anomaly detection because it is computed from the most relevant spectral content extracted from the graph’s Laplacian matrix. A new definition of entropy rate, based on modeling the adjacency matrix of a graph as a Markov process, is introduced to quantify the local complexity of a weighted graph. Finally, we introduce a hierarchical consensus clustering algorithm that uses the well-known Fiedler vector to reveal the hierarchical structure of the brain network across various modular resolutions. The proposed methods are applied to error-related negativity (ERN) data; the ERN is a response-locked negative deflection of the brain event-related potential observed following errors in performance tasks. Previous research shows that the primary neural generator of the ERN is the anterior cingulate cortex (ACC) and that there is a significant difference in connectivity patterns between the mPFC and lPFC for error and correct responses. The proposed graph theoretic approaches give a succinct representation of the functional networks involved during action-monitoring and cognitive control and provide insight into the reorganization of the neural networks during error processing.
The ‘small-world’ measures reveal increased local functional segregation and integration among electrodes in the mPFC and lPFC during error responses compared to correct responses. In addition, the mPFC region of the brain network demonstrated increased energy and complexity, indicating the presence of an anomalous perturbation located around the FCz electrode. Finally, the hierarchical consensus clustering algorithm revealed an increase in modularity across the mPFC during error responses, indicating a reorganization of the underlying functional network.

ACKNOWLEDGMENT

I want to thank my family back in El Paso, Texas for their moral support while I pursued my Ph.D. so far from home. I missed them so much during my time here and I hope to continue making them proud. My brothers, Andres and Efren, and my parents, Ofelia and Arturo, have kept me grounded and helped me remember what the important things in life really are. I love you all so very much and I owe all my success to you. I especially want to thank Whitney Rantz for being my rock as I progressed through this challenging journey. You are a remarkable woman with such a great heart and I thank God for bringing you into my life. Thank you for all your encouragement and love. Thank you to Mark, Sue, and Kayla Rantz for making me feel like a part of their family these past few years and helping me feel at home. Also, thank you so much for purchasing my Ph.D. regalia. It is one of the most generous gifts anybody has ever given me and I will always be very grateful. Many thanks to Milton Martinez and Juan Nava for being awesome friends. It is really a blessing that the three of us El Paso guys ended up in the same state. You guys will always be brothers to me. I would also like to express my deep-felt gratitude to my advisor, Dr. Selin Aviyente, for her advice and encouragement. She was never ceasing in her belief in me and constantly pushed me to bring out the best in my work.
Also, many thanks to my lab partners Ali Mutlu, Ying Liu, and Suhaily Cardona. I’m glad we were able to share the frustrations and celebrations of research work together. I also wish to thank the other members of my committee, Dr. Rama, Dr. Chakrabartty, and Dr. Esfahanian. Their suggestions, comments, and additional guidance were invaluable to the completion of this work. As a special note, Dr. Percy Pierre was a driving force in my success through his help in obtaining a graduate fellowship to pay for my studies, representing me in front of the qualifier exam committee, and offering a word of advice when I needed it. I also want to extend my thanks to Dr. Barbara O’Kelly for personally calling me and inviting me to visit Michigan State as well as for making my transition into the Ph.D. program a smooth experience.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1
  1.1 Introduction
  1.2 Background
    1.2.1 Physiology of the Human Brain
    1.2.2 The Electroencephalogram (EEG)
    1.2.3 Measures of Functional Connectivity
    1.2.4 Graph Theory
    1.2.5 Graph Topologies
      1.2.5.1 Lattice Graphs
      1.2.5.2 Random Graphs
      1.2.5.3 Star Graphs
      1.2.5.4 Scale-Free Networks
    1.2.6 Benchmark Data
      1.2.6.1 Simulated Data
      1.2.6.2 Real-World Networks
      1.2.6.3 Biological Data
  1.3 Thesis Outline

Chapter 2  Small-World Type Measures for Weighted Graphs
  2.1 Introduction
  2.2 Clustering Coefficient
    2.2.1 Conventional Clustering Coefficient
    2.2.2 Existing Definitions for Weighted Clustering Coefficient
    2.2.3 Newly Proposed Weighted Clustering Coefficient
    2.2.4 A Random Walk Based Clustering Coefficient Measure for Binary Networks
  2.3 Characteristic Path Length
    2.3.1 Conventional Characteristic Path Length
    2.3.2 Existing Definition for Weighted Characteristic Path Length
    2.3.3 Newly Proposed Weighted Characteristic Path Length
    2.3.4 A Random Walk Based Characteristic Path Length Measure for Binary Networks
  2.4 Small-World Measure of a Network
    2.4.1 Conventional Small-World Measure
    2.4.2 Newly Proposed Weighted Small-World
  2.5 Mapping of Weighted Graphs to Binary Graphs
    2.5.1 Existing Approaches for Transforming Weighted Graphs to Binary Graphs
    2.5.2 Weighted to Binary Graph Transformation: Alternative to Matrix Thresholding
  2.6 Results
    2.6.1 Random Walk Based Clustering Coefficient/Path Length Measures for Binary Benchmark Networks
    2.6.2 Weighted Graph Measures on Benchmark Data
    2.6.3 Weighted Graph Measures on ERN EEG Data
      2.6.3.1 Bivariate Phase-Synchrony Measures
      2.6.3.2 Graph Measures
      2.6.3.3 Relationship between Graph and Bivariate Measures
    2.6.4 Optimal Transformation From Weighted to Binary Matrix: ERN/EEG Data
  2.7 Conclusions

Chapter 3  Local Feature Extraction from Weighted Graphs
  3.1 Introduction
  3.2 Existing Feature Measures for Graphs
    3.2.1 Hub Centrality
    3.2.2 Provincial and Connector Hubs
  3.3 Graph Energy
    3.3.1 Existing Measures of Graph Energy
    3.3.2 Laplacian Hückel Graph Energy
    3.3.3 Localized Laplacian Hückel Graph Energy
  3.4 Graph Entropy
    3.4.1 Existing Measures of Graph Entropy
    3.4.2 Proposed Graph Entropy Rate
      3.4.2.1 Matrix Bandwidth Minimization
      3.4.2.2 Scanning Function of the Permuted Matrix
      3.4.2.3 Quantization of the Weighted Connectivity Matrix
    3.4.3 Localized Graph Entropy Rate
  3.5 Results
    3.5.1 Comparison Between Energy Measures
    3.5.2 Laplacian-Hückel Energy of a Biological Network: EEG ERN Data
    3.5.3 Comparison Between Local Graph Entropy Measures
    3.5.4 Entropy Rate Across Binary Graph Topologies
    3.5.5 Entropy with Respect to Permutation and Scanning Function
    3.5.6 Graph Entropy with Respect to Order of the Markov Process
    3.5.7 Entropy Rate of a Biological Network: EEG ERN Data
  3.6 Conclusions

Chapter 4  Hierarchical Multi-Subject Community Detection for Weighted Neuronal Networks
  4.1 Introduction
  4.2 Background on Community Detection on Graphs
    4.2.1 Community Detection on Graphs
      4.2.1.1 Girvan-Newman
      4.2.1.2 Spectral Clustering
    4.2.2 Consensus Clustering
    4.2.3 Quality Measures for Community Structures
  4.3 Hierarchical Modular Detection Algorithms for Weighted Neuronal Networks
    4.3.1 Overlapping Community Structures in Weighted Networks
    4.3.2 Hierarchical Spectral Partitioning Algorithm for Identifying a Common Community Structure among Multiple Subjects
  4.4 New Weighted Graph Clustering Quality Measures
    4.4.1 Comparing Graph Partitions to Multiple Random Networks
    4.4.2 Homogeneity and Completeness
  4.5 Results
    4.5.1 Overlapping Clusters in the Brain Network During the Error-Related Negativity
    4.5.2 Hierarchical Spectral Consensus Clustering Analysis
      4.5.2.1 Comparisons with Averaging and Conventional Voting Consensus Measures
      4.5.2.2 Hierarchical Modular Structure of the Brain During the Error-Related Negativity
      4.5.2.3 Inter and Intra Edge Phase Synchrony
      4.5.2.4 Local Hubs of the Brain Network
  4.6 Conclusions

Chapter 5  Summary and Future Work
  5.1 Summary
  5.2 Future Work

Bibliography

LIST OF TABLES

Table 1.1  Basic graph measures for undirected binary and weighted graphs.
Table 2.1  Comparisons between the proposed and existing weighted clustering coefficients for different variations of a triplet.

Table 2.2  Correlations between multivariate small-world and bivariate (FCz-F5/FCz-F6) measures for both error and correct responses individually and the error-correct differences (all correlations significant; p < 0.001).

Table 3.1  Highest energy contributing nodes in the karate network compared with their corresponding clustering coefficient and betweenness centrality.

Table 3.2  Entropy computed for various graph models using the proposed measure. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.

Table 3.3  Graph entropy computed for various graph models using Burda’s definition. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.

Table 3.4  Entropy rates for three real scale-free networks using the proposed measure and Burda’s measures. All measures are normalized by the entropy rate of random networks with equivalent order.

Table 3.5  Bits/symbol achieved with the Lempel-Ziv algorithm.

Table 3.6  Bits/symbol achieved using the Lempel-Ziv algorithm for four real scale-free networks.

Table 4.1  Top 3 electrodes with the largest differences between Correct and Error ‘participation scores’.

Table 4.2  Number of modules identified through optimization of Q and λ in simulated data.
LIST OF FIGURES

Figure 1.1  Lobes of the cerebral cortex (www.morphonix.com). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Figure 1.2  10/20 EEG electrode arrangement (www.frontalcortex.com).

Figure 1.3  Example of an undirected 6-node graph.

Figure 1.4  Zachary Karate Club network.

Figure 2.1  Nodes A and B have similar path length; however, connectivity in (b) is less robust than in (a). In (b), connectivity between nodes A and C is more robust than connectivity between nodes A and B, despite A and C having a longer path length than A and B.

Figure 2.2  Markov small-world measures averaged over 100 benchmark networks of order N = 50. a) The proposed binary clustering coefficients of the simulated small-world networks are much greater than those of random networks, i.e. γ ≫ 1; b) The proposed binary path lengths of the simulated small-world networks are similar to those of random networks, i.e. λ ≈ 1.

Figure 2.3  Markov and conventional clustering coefficients applied to the binary karate network.

Figure 2.4  Markov and conventional path lengths applied to the binary karate network.

Figure 2.5  Nodal weighted clustering coefficients computed using Onnela’s clustering coefficient and the proposed clustering coefficient for Zachary’s Karate Club network.

Figure 2.6  Nodal weighted path lengths computed using Stam’s path length and the proposed path length for Zachary’s Karate Club network.

Figure 2.7  Connectivity plot based on the RID-TFPS measure for electrode pairs whose phase synchrony is significantly larger for error responses compared to correct responses (Wilcoxon rank sum, 0.0001 ≤ p ≤ 0.001).

Figure 2.8  Topomap of the differences in the nodal weighted clustering coefficient, path length, and small-world parameter for error and correct responses. These differences are computed for both the previously published and proposed measures. Topographic significance plots are provided for each measure.

Figure 2.9  Optimal binary graph generated from weighted connectivity matrices for error responses. Hubs of the network are indicated in color. Hubs are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.10  Optimal binary graphs generated from weighted connectivity matrices for correct responses. Hubs of the network are indicated in color. Hubs are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.11  Comparisons between small-world calculations using the conventional thresholding method and the proposed optimal binary transformation approach.

Figure 3.1  Local energy indices for the n = 34 karate club network: 1-hop energy analysis.

Figure 3.2  Local energy indices for the n = 34 karate club network: 2-hop energy analysis.

Figure 3.3  Local energy indices for the n = 34 karate club network: 3-hop energy analysis.

Figure 3.4  1-hop neighborhood energy differences (Error − Correct).

Figure 3.5  2-hop neighborhood energy differences (Error − Correct).

Figure 3.6  Comparisons between nodal graph entropy measures for identifying anomalous regions in a simulated network with n = 60 nodes. The network is composed of four complete graphs and a random graph acting as a perturbation in the network.

Figure 3.7  Comparisons between nodal graph entropy rate measures for identifying important regions in the karate club network with n = 34 nodes.

Figure 3.8  Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies, each with 128 nodes. a) Random networks; b) Modular networks.

Figure 3.9  Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies, each with 128 nodes. a) Scale-free networks; b) Star networks.

Figure 3.10  Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies, each with 128 nodes (Lattice networks).

Figure 3.11  Effects of order value on graph entropy measures for various graph topologies.

Figure 3.12  Rate/distortion curves, average and standard deviation across all subjects, for each response type: a) error; b) correct.

Figure 3.13  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 2 levels.

Figure 3.14  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 3 levels.

Figure 3.15  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 4 levels.

Figure 3.16  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 5 levels.

Figure 3.17  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 6 levels.

Figure 3.18  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 7 levels.

Figure 3.19  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 8 levels.

Figure 3.20  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 9 levels.

Figure 3.21  Significance (Wilcoxon rank sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 10 levels.

Figure 4.1  Weighted matrix to adjacency matrix for a given threshold.

Figure 4.2  Participation scores of various clusterings derived from various thresholds of the network.

Figure 4.3  Average participation scores in terms of network degree.
Figure 4.4  Variations in homogeneity and completeness where the true number of modules is 3: a) high homogeneity and low completeness; b) high completeness and low homogeneity; c) high homogeneity and completeness.

Figure 4.5  Modularity with respect to degree and number of clusters for Correct responses.

Figure 4.6  Modularity with respect to degree and number of clusters for Error responses.

Figure 4.7  Six overlapping clusters derived for Correct responses.

Figure 4.8  Six overlapping clusters derived for Error responses.

Figure 4.9  Hierarchical decomposition of the brain network during an Error response.

Figure 4.10  Hierarchical decomposition of the brain network during a Correct response.

Figure 4.11  Significance cluster plots for Error responses.

Figure 4.12  Significance cluster plots for Correct responses.

Figure 4.13  Cross-module average synchrony for Error responses across four clusters.

Figure 4.14  Cross-module average synchrony for Correct responses across two clusters.

Figure 4.15  Normalized differences in hub identification measures between Error and Correct: a) within-module Z-score; b) participation coefficient.

Chapter 1

1.1 Introduction

To this day, the human brain remains one of the most intriguing objects of interest in the scientific community. It is a complex network composed of billions of neurons and is capable of performing parallel information processing millions of times faster and more efficiently than any supercomputer in current existence. In contrast to man-made systems, its performance tends to degrade gracefully under partial damage, and it can learn to reorganize itself to bypass damaged areas.
Furthermore, the brain supports intelligence and self-awareness. The dynamics of communication among the distributed regions responsible for these remarkable characteristics are not yet fully understood by the neuroscience community. Experimental approaches to human cognition have improved with the development of advanced functional neuroimaging techniques; however, interpretation of activation patterns is limited due to a lack of information about the nature of structure-function mappings in the brain. In response to this problem, the human ‘connectome’ project was initiated in July 2009 and is sponsored by sixteen components of the National Institutes of Health (NIH) [1]. The goals of this five-year project are to increase the understanding of how functional brain states emerge from their underlying structural substrate, to provide new mechanistic insights into how brain function is affected if this structural substrate is disrupted, and to make ‘connectome’ data and research easily accessible to the entire scientific community [2]. Ultimately, the project will facilitate advances in imaging and analyzing brain connections, resulting in improved sensitivity, resolution, and utility, thereby accelerating progress in the emerging field of human connectomics [3], the application of neural imaging techniques to increase the resolution of maps of the brain’s neural connections. This dissertation focuses on the ‘connectome’ goal of understanding how function is associated with an underlying structure using advanced neuroimaging techniques. Neuroimaging techniques such as fMRI, electroencephalography, and magnetoencephalography make it possible to study brain activity with high spatial and temporal resolution [4] and to better understand the processes underlying human cognition. However, interpretation of these data is limited by the lack of understanding of the organizational structure of the brain.
The brain is theoretically believed to function according to two primary organizational principles: functional segregation and functional integration [5]. Functional segregation describes the tendency of neural assemblies to merge into specialized groups segregated across the cortical regions of the brain. Functional integration describes the coordinated activation of these segregated groups, which enables cognitive processes and motor responses. The interplay of these principles is thought to contribute to the complexity of the brain network. Currently, three types of networks are recognized as forming the overall brain network: the anatomical connectivity network (physical links between neural assemblies), the functional connectivity network (statistical dependencies between neural assemblies), and the effective connectivity network (causal interactions between neural assemblies). Functional connectivity can be quantified using neuroimaging techniques, which can record data from the macroscale anatomical regions of the brain, combined with statistical measures of dependency that quantify the relationships between distributed regions of the brain. Functional connectivity is defined as the temporal correlations between spatially remote neurophysiological events [6] and is key to understanding how the coordinated and integrated activity of the human brain takes place [7]. Depending on the measure used to quantify it, functional connectivity may reflect linear or nonlinear interactions [8] or interactions observed at various time scales [9]. The literature has demonstrated that local and long-range synchronization of oscillatory neural activity facilitates the exchange of information throughout the brain. The strength of these interactions can quantify pairwise connectivity in the brain, i.e. functional connectivity.
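As a concrete illustration of how a bivariate phase synchrony measure yields a fully connected weighted connectivity matrix, consider the sketch below. It uses the classical phase-locking value (PLV) rather than the time-varying synchrony measure employed in this dissertation; the function names and the toy signals are purely illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (same construction as the Hilbert transform)."""
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0          # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0               # keep the Nyquist bin
    return np.fft.ifft(X * h)

def phase_locking_value(x, y):
    """PLV = |mean(exp(j*(phi_x - phi_y)))|, in [0, 1]:
    0 means no phase synchrony, 1 means a constant phase difference."""
    phi_x = np.angle(analytic_signal(x))
    phi_y = np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

def connectivity_graph(signals):
    """Weighted graph of a multichannel recording: W[i, j] = PLV of channels i, j."""
    n = signals.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = phase_locking_value(signals[i], signals[j])
    return W

# Two phase-locked 10 Hz sinusoids and one noise channel: the sinusoid
# pair receives a weight near 1, the noise pairings receive lower weights.
t = np.linspace(0, 1, 500, endpoint=False)
rng = np.random.default_rng(0)
signals = np.vstack([np.sin(2 * np.pi * 10 * t),
                     np.sin(2 * np.pi * 10 * t + 0.5),
                     rng.standard_normal(t.size)])
W = connectivity_graph(signals)
```

Note that every channel pair receives a nonzero weight, which is exactly why the resulting graph is fully connected and weighted rather than sparse and binary.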
Unfortunately, functional connectivity does not describe a link between function and organization of the brain; it only describes the pairwise associations of neural populations. The challenge is to define methods that reveal the underlying complex network associated with the observed pairwise connectivity patterns. Recently, graph theoretic methods have been applied to address this problem. Graph theory describes the important properties of complex systems by quantifying the topologies of their respective network representations [8]. It has been used to analyze social networks, the internet, power grids, protein-protein networks, and more recently the brain [10, 11, 12, 7]. Using graph theory, He et al. [13] showed that Alzheimer’s disease patients exhibit abnormal topological organization in whole-brain structural networks and a decrease in hubness in the temporal and parietal regions. These changes suggest a shift toward more local processing and a disrupted structural integrity of the larger-scale brain systems [14]. Bassett et al. [15] reported that schizophrenia is associated with abnormal topology in the multimodal brain network, characterized by a reduced organizational hierarchy, an increased distance between network nodes, and a general loss of hubness in the frontal lobe. In an EEG study of epilepsy, Dellen et al. [16] reported that local graph measures describing temporal lobe functional network connectivity are lower, and correspond to a more random network, in patients with longer histories of epilepsy. Finally, Leistedt et al. [17] showed that decreases in small-world connectivity, associated with problems in information processing across the brain’s functional network, may be indicative of depressive disease. Although these studies have demonstrated the usefulness of graph theory for revealing functional networks from neuroimaging data, the implementation of graph theory is limited.
The reason is that these measures were originally employed as modeling tools for random graphs and for graphs whose properties are well understood, e.g. bipartite, tree, planar, and cyclic graphs [18]. However, the properties observed in these well-understood graphs are rarely encountered in real-world networks. This naturally poses the challenge of developing biologically meaningful measures for the topologies observed in biological networks. In the application of graph theoretic measures to the study of functional brain networks, several problems are encountered. First, most graph theoretical tools are only applicable to binary networks in which edges are unweighted, unlike the case of functional connectivity where a weight is assigned to every node pair. Second, graph theory measures are typically applied to sparse networks, whereas functional connectivity matrices are fully connected. Typically, the weighted information is thresholded to obtain a binary network, thus discarding important information contained in the edge weights. There is also a lack of a sufficient set of local measures to describe local connectivity: most measures characterize the network as a whole rather than by its subregions. Finally, to integrate the abstractness of graph theory with the analysis of a real biological network, methods borrowed from signal processing, information theory, and statistical physics need to be incorporated into the design of new graph measures. In this dissertation, the newly proposed graph measures and methods are specifically designed to evaluate functional connectivity matrices obtained from multichannel neurophysiological data. New graph measures, the weighted clustering coefficient and the weighted path length, are introduced to define a weighted small-world measure for the brain network without resorting to thresholding of the weighted data. 
Also, methods from signal processing and information theory are used to develop new local measures of graph energy and graph entropy for weighted graphs, to quantify each node’s contribution to network connectivity and to identify anomalous subregions. Finally, a representation of the hierarchical modular organization common across multiple brain networks is extracted using a bi-partitioning algorithm based on spectral decomposition of these networks. The proposed measures are applied to functional connectivity graphs formed by computing pairwise phase synchrony between neural oscillations. A recently introduced time-varying phase synchrony measure [19] was used to quantify the statistical dependencies between neural populations in data containing the error-related negativity (ERN), gathered during an EEG study [20]. The ERN is a brain potential response that occurs following performance errors in a speeded reaction time task. The study tested the hypothesis that the medial prefrontal cortex (mPFC) interacts with the lateral prefrontal cortex (lPFC) in a dynamic loop during goal-directed performance. This theoretical network uses a monitoring system, in particular the anterior cingulate cortex (ACC), to signal the need for enhanced cognitive control during conditions of conflict or after an error. After an error, the action-monitoring system (mPFC) is proposed to function as an alarm, recruiting the control network (lPFC) to enhance attention. Recently, Cavanagh et al. [21] presented evidence that synchronized neural oscillations may be one mechanism that underlies this functional communication between action-monitoring and cognitive control networks across mPFC and lPFC regions. The proposed graph measures will give a succinct representation of the functional networks involved during action-monitoring and cognitive control and provide insight into the reorganization of the neural networks during error processing. 
1.2 Background In this section, a short review of human brain physiology, neuroimaging techniques, metrics quantifying functional connectivity, and graph theory concepts is presented. This study is particularly interested in evaluating ‘scale-free’ networks. Benchmark scale-free networks, which are used to test the proposed graph measures, are also introduced. Finally, a description of the biological dataset evaluated within this dissertation, an EEG study performed at the University of Minnesota, is presented. 1.2.1 Physiology of the Human Brain Neurons extend throughout the human body and vary in structure depending on their specific purpose. A typical neuron possesses three distinct components: the soma, axon, and dendrites. The soma is the cell body of the neuron, containing the nucleus and the fluid separating it from other cell bodies, while the axon allows the neuron to transmit electrical information to other neurons. The axon extends from the soma into far regions of brain tissue, where it ultimately terminates and splits into smaller branches. Only one axon can ever extend from a soma; however, numerous dendrites may exist. Signals deriving from other neurons are received either by the soma directly or by its dendrites. Dendrites can extend and split into branches many times but become thinner the further out they reach; this is unlike axons, which maintain a fairly constant width [22]. Communication among neurons is due to the action potential, an abrupt depolarization that is the basis for neural signaling in electrically excitable domains [22]. Neurons transmit electrical information via ion channels which are opened and closed according to concentrations of sodium, potassium, and chloride ions. At rest, these three chemicals remain at stable concentrations and cause the neuron to have a resting state potential. Typically, this resting potential is −70 mV, indicating that the cell membrane is polarized such that the inside is more negative than the outside. 
As the chemical concentrations change, depolarization, an increase in positive polarity, or hyperpolarization, an increase in negative polarity, occurs within the cell. When a motor or cognitive response occurs, the neurons follow a three-phase action potential. The first phase begins with a depolarization of the cell, in which the potential increases to 56 mV. This is immediately followed by a hyperpolarization phase, which decreases the potential to −93 mV, past the resting potential. Finally, the cell depolarizes again and the potential returns to −70 mV. The action potential is dependent upon the magnitude of the stimulus received by the neuron. The required magnitude is referred to as the threshold and typically ranges between 0 and 10 mV. When the stimulus causes the resting potential to reach a value within this range, the action potential activates; otherwise, no action potential takes place. There is no intermediate action potential, hence the "all or none" concept of a neuron. After an action potential occurs, there is a refractory period during which the neuron will not produce another action potential even if it receives a stimulus surpassing the threshold. The threshold and refractory period vary with each type of neuron, but the "all or none" concept remains true for all neurons. Neuron function and structure vary throughout the brain, but for the purpose of evaluating functional connectivity, the neural regions of the cerebral cortex are of primary interest. The cerebral cortex is the top layer of the human brain and can be categorized into four primary sections called lobes: frontal, occipital, parietal, and temporal (Figure 1.1). The cortex is also split into two functional halves, or hemispheres, along the midline of the brain, which are connected by the corpus callosum. The functions associated with the frontal lobe are motor functions, reasoning, judgement, impulse control, planning, and memory. 
Figure 1.1: Lobes of the cerebral cortex (www.morphonix.com). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The functions associated with the occipital lobe include vision and color recognition. The parietal lobe processes information, registers sensations of pain, controls speech, supports visual perception, and maintains spatial orientation. Finally, the temporal lobe is associated with emotion, hearing, memory, and speech [22]. The neurons in these regions are densely packed and may be classified as pyramidal or non-pyramidal. The pyramidal cells’ action potentials along the cerebral cortex can be monitored through the electroencephalogram. 1.2.2 The Electroencephalogram (EEG) Numerous neuroimaging methods exist, such as magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and the electroencephalogram (EEG). To examine functional integration in the temporal frame, there is a need to characterize the temporal dynamics of neural networks with millisecond accuracy. Neuroimaging techniques with high temporal resolution, such as the EEG, are the most appropriate tools for examining the dynamic interactions of neural networks. The electroencephalogram is composed of a collection of oscillating electrical signals observed at each channel, i.e. electrode, on the surface of the scalp. The electrodes used for recording an EEG are typically assembled in a 10/20 arrangement (Figure 1.2) such that the physical distance between adjacent electrodes is 10% or 20% of the total front-to-back or left-to-right distance of the skull. Each electrode is assigned a letter representing a lobe and a number for the hemisphere. The letters F, T, C, P, and O represent the frontal, temporal, central, parietal, and occipital regions respectively. Although there is no central lobe, the label exists to help indicate location. 
Electrodes located along the midline of the skull are represented by these letters accompanied by the letter ‘z’. Letters accompanied by even numbers denote electrodes belonging to the right hemisphere, and those with odd numbers the left hemisphere [23]. Electroencephalogram oscillations can be separated into different frequency bands and classified as five types of waves: delta (1 - 4 Hz), theta (4 - 7 Hz), alpha (8 - 12 Hz), beta (12 - 30 Hz), and gamma (25 - 100 Hz). Delta waves are very slow, high-amplitude oscillations observed during sleep. Theta waves are mostly observed in children; in older adults they can be observed only during drowsy or meditative states. They are also associated with moments of enhanced creativity and daydreaming. Alpha waves are associated strongly with the occipital lobe. These waves are highly observable during wakeful relaxation when the eyes are closed, and they disappear when the eyes are open or when the individual is asleep. Beta waves are associated with active and anxious thinking. Current research suggests gamma waves are associated with conscious perception [24]. Figure 1.2: 10/20 EEG electrode arrangement (www.frontalcortex.com). 1.2.3 Measures of Functional Connectivity After selecting an appropriate neuroimaging technique, a metric to quantify functional connectivity needs to be defined. The indices used for quantifying functional connectivity include linear measures, such as correlation and coherence, and nonlinear measures, such as phase synchrony and generalized synchronization measures [25]. Phase synchronization between neuronal oscillations has attracted a great deal of attention in the last decade for understanding functional integration during cognition and attention tasks. It has also increased understanding of the underlying networks involved in psychopathologies. 
Synchrony is a commonly used measure of functional connectivity between neuronal oscillations and quantifies the relationship between the temporal structures of the signals regardless of signal amplitude. The amount of synchrony between two signals is usually quantified by estimating the instantaneous phase of the individual signals around the frequency of interest. The two main current approaches to isolating the instantaneous phase of a signal are the Hilbert transform and the complex wavelet transform. In both methods, the goal is to obtain an expression for the signal in terms of its instantaneous amplitude, a(t), and phase, φ(t), at the frequency of interest as x̃(t, ω) = a(t) exp(j(ωt + φ(t))). This formulation can be repeated for different frequencies, and the relationship between the temporal organization of two signals, x and y, can be observed through their instantaneous phase difference Φxy(t) = |nφx(t) − mφy(t)|, where n and m are integers that indicate the ratios of possible frequency locking. Most studies focus on the case n = m = 1. In neuroscience, the focus is on the case when the phase difference is approximately constant over a limited time window; this is defined as a period of phase locking between two events. In this study, a new time-varying measure of phase synchrony [19], based on the complex time-frequency distribution known as the Rihaczek distribution [26], is used to quantify the nonlinear dependencies between the neural assemblies of the brain. For a signal x(t), the Rihaczek distribution is expressed as

C(t, ω) = (1/√(2π)) x(t) X*(ω) e^(−jωt)   (1.1)

and measures the complex energy of the signal at time t and frequency ω. One of the disadvantages of the Rihaczek distribution is the existence of cross-terms for multicomponent signals. 
In order to get rid of these cross-terms, a reduced interference version of the Rihaczek distribution was introduced by applying a kernel function, such as the Choi-Williams (CW) kernel φ(θ, τ) = exp(−(θτ)²/σ), to filter the cross-terms and obtain

C(t, ω) = ∫∫ exp(−(θτ)²/σ) exp(jθτ/2) A(θ, τ) e^(−j(θt+τω)) dτ dθ   (1.2)

where A(θ, τ) = ∫ x(u + τ/2) x*(u − τ/2) e^(jθu) du is the ambiguity function of the signal and exp(jθτ/2) is the kernel corresponding to the Rihaczek distribution [26]. The phase difference between two signals based on this complex distribution is computed as

Φ12(t, ω) = arg[ C1(t, ω) C2*(t, ω) / (|C1(t, ω)| |C2(t, ω)|) ]   (1.3)

where C1(t, ω) and C2(t, ω) refer to the complex energy distributions of the two signals x1(t) and x2(t), respectively. A synchrony measure quantifying the intertrial variability of the phase differences, the phase locking value (PLV), is defined as

PLV(t, ω) = (1/N) |∑_{k=1}^{N} exp(jΦk(t, ω))|   (1.4)

where N is the number of trials and Φk(t, ω) is the time-varying phase estimate between two electrodes for the kth trial. If the phase difference varies little across the trials, the PLV is close to 1. 1.2.4 Graph Theory Graphs can be used to model relationships in a variety of complex networks, including social and biological networks. In recent years, developments in the quantitative analysis of complex networks, based largely on graph theory, have been applied to studies of functional brain networks [27]. An undirected graph, G = (V, E), is defined by a set of vertices, V = {v1, v2, ..., vn}, of order n and a set of edges, E = {eij | vi, vj ∈ V}, of size m (Figure 1.3). The graph can be represented by an n × n binary adjacency matrix, A, such that

A(i, j) = 1 if an edge exists between vi and vj, and 0 otherwise,   (1.5)

or by a weighted matrix W such that W(i, j) = wij, where wij can be any real number quantifying the relationship between vertices vi and vj. 
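To make the preceding definitions concrete, the following is a minimal pure-Python sketch of the phase-locking value of Eq. (1.4) and of assembling a weighted connectivity matrix W from per-trial phase estimates. This is an illustration only, not the dissertation's implementation: the phases are synthetic stand-ins (not derived from the Rihaczek distribution) and the function names are hypothetical.

```python
# Illustrative sketch (NOT the dissertation's code): the phase-locking
# value of Eq. (1.4) and a weighted connectivity matrix W built from
# per-trial, per-electrode phase estimates. The phases are assumed to
# already be averaged over the time-frequency window of interest.
import cmath

def plv(phase_diffs):
    """PLV = |(1/N) * sum_k exp(j * Phi_k)| over N trials."""
    n = len(phase_diffs)
    return abs(sum(cmath.exp(1j * d) for d in phase_diffs) / n)

def connectivity_matrix(trial_phases):
    """trial_phases[k][i]: phase of electrode i on trial k.

    Returns the symmetric weighted matrix W with W[i][j] equal to the
    PLV of the phase differences between electrodes i and j.
    """
    n_elec = len(trial_phases[0])
    W = [[0.0] * n_elec for _ in range(n_elec)]
    for i in range(n_elec):
        for j in range(i + 1, n_elec):
            diffs = [trial[i] - trial[j] for trial in trial_phases]
            W[i][j] = W[j][i] = plv(diffs)
    return W

# A constant phase difference across trials (perfect locking) yields a
# PLV of 1; phase differences scattered over the circle yield a value
# near 0.
print(plv([0.7] * 100))   # 1.0 (up to floating-point rounding)
```

Each entry of W lies in [0, 1], matching the weighted symmetric graphs considered above.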
Another class of graphs is directed graphs, or di-graphs, which assign a direction to the edges, i.e. wij is not necessarily equal to wji. In this dissertation, the focus is on weighted symmetric graphs where wij ∈ [0, 1]. The most fundamental measures that define the properties of a graph or di-graph are the degrees of the nodes, the cost of the network, and the distance matrix (Table 1.1). Figure 1.3: Example of an undirected 6 node graph. This phase synchrony measure quantifies the dependent relationships between neural regions, which are represented by the EEG electrodes, and can be represented as an undirected weighted graph. The strength of the relationship between an electrode pair, i.e. the edge weight, quantified by the phase locking value, is represented by a single number for a particular time and frequency window, obtained by averaging PLV(t, ω) in that time-frequency window. This is repeated for all electrode pairs, and the resulting phase locking values generate a weighted connectivity matrix W, where element wij ∈ W is the phase locking value between electrodes vi and vj for a particular time-frequency window.

Graph Measure          | Symbol | Equation          | Description
Adjacency matrix       | A      |                   | 1 indicates an edge (0 otherwise).
Connectivity matrix    | W      |                   | Set of all pair-wise strengths.
Edge weight            | Wij    |                   | Edge value.
Binary node degree     | ki     | ki = ∑j Aij       | Number of node neighbors.
Weighted node degree   | ki^w   | ki^w = ∑j Wij     | Total weight of all neighbors.
Degree matrix          | K      | Kii = ki          | Node degrees along diagonal.
Laplacian matrix       | L      | L = K − W         | Degree matrix minus connectivity matrix.
Path                   | Pij    |                   | Set of edges between a node pair.
Shortest path          | SPij   | SPij = min Pij    | Minimum path between a node pair.
Distance matrix        | D      |                   | All pair-wise shortest paths.

Table 1.1: Basic graph measures for undirected binary and weighted graphs. 1.2.5 Graph Topologies In this section, a range of graph topologies often evaluated in graph theory research is presented. This includes lattice, random, star, and scale-free graphs. 1.2.5.1 Lattice Graphs 
Coxeter [28] defines a lattice graph as the graph obtained by taking the n² ordered pairs of the first n positive integers as nodes and drawing an edge between all pairs having exactly one number in common. This type of graph is highly regular and organized, and represents the end of the spectrum of graphs opposite to random graphs. 1.2.5.2 Random Graphs There are various methods for generating a random graph. The most popular random graph model is described by Erdos and Renyi [29]. Given a finite set S and a function P : S → [0, 1] where ∑s∈S P(s) = 1, we call (S, P) a probability space with sample space S and probability function P. Recall that Gn denotes the set of all labeled simple graphs on n nodes. Given a probability space (Gn, P), selecting an element of Gn according to the probability function P is called a random graph. Let p be a real number between 0 and 1, and consider the function P : Gn → [0, 1] such that for each graph G in Gn,

P(G) = p^m (1 − p)^(N−m)   (1.6)

where m is the size of the graph G and N = C(n, 2) = n(n−1)/2 is the number of possible edges. Using the binomial theorem,

(a + b)^r = ∑_{k=0}^{r} C(r, k) a^k b^(r−k)   (1.7)

it can be verified that (Gn, P) is indeed a probability space. The above (Gn, P) model is known as the Erdos-Renyi model of random graphs. In the Erdos-Renyi model, when p = 1/2, the probability of selecting a graph G ∈ Gn at random is 1/2^N. This is interpreted as selecting a member of Gn where all members have the same probability of being chosen [30]. Watts and Strogatz [31] presented another method of generating a random network by rewiring the edges of a lattice graph; each edge is rewired with probability p, where 0 ≤ p ≤ 1. 1.2.5.3 Star Graphs Star graphs are connected graphs composed of a set of nodes such that the degree of node v1 is d(1) = n − 1 and the degree of every other node is d(i) = 1 for i ≠ 1. 1.2.5.4 Scale-Free Networks A scale-free network is a type of non-random network whose degree distribution fits a power law model, p(k) ∝ k^(−α), with typically 2 ≤ α ≤ 3. 
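As an illustration of the G(n, p) model of Eq. (1.6), the sketch below samples a random graph by including each of the N = n(n−1)/2 possible edges independently with probability p. This is a hedged sketch, not code from the dissertation; the function name is hypothetical.

```python
# Illustrative sampler for the Erdos-Renyi G(n, p) model of Eq. (1.6):
# each of the N = n(n-1)/2 possible edges appears independently with
# probability p, so a particular labeled graph with m edges is drawn
# with probability p^m (1 - p)^(N - m).
import random

def erdos_renyi(n, p, seed=None):
    """Return the n x n adjacency matrix of a graph drawn from G(n, p)."""
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

A = erdos_renyi(100, 0.5, seed=1)
m = sum(A[i][j] for i in range(100) for j in range(i + 1, 100))
# For p = 1/2 every labeled graph on n nodes is equally likely (with
# probability 1/2^N), and the number of edges m concentrates near N/2.
```

For n = 100 and p = 1/2, N = 4950, so the edge count m concentrates near 2475.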
Many real-world networks, such as social networks, power grids, and neural networks, follow a scale-free topology. Watts and Strogatz [31] investigated networks generated by rewiring the edges of a regular network, each with probability p. A regular network of N nodes and degree k is a network in which all nodes are connected to the same number of edges; a 5-regular network, therefore, has five edges connected to each of its node members. Watts and Strogatz [31] explored the unknown region between a regular network (p = 0) and a completely disorganized random network (p = 1). A ‘small world’ network was generated by using regular networks whose edges were rewired with 0 < p < 1. ‘Small world’ networks demonstrated features that distinguished them from other networks in the sense that they possessed meaningful structural topologies. ‘Small world’ networks possess degree distributions similar to a power law model and are therefore considered scale-free. This dissertation will use the terms ‘small world’ and scale-free interchangeably. 1.2.6 Benchmark Data In this section, the various networks used to test the proposed measures in this dissertation are presented. This includes simulated networks and real-world networks. 1.2.6.1 Simulated Data Scale-free graphs were generated using the LFR benchmark software [32], which generates networks of a known underlying structure, such that the membership of segregated clusters is known. The LFR benchmark is capable of generating graphs of any size with increasing complexity. Complexity is defined here as the total number of edges between clusters, kout, with respect to the total number of edges, m. 1.2.6.2 Real-World Networks Four different real-world networks were used to evaluate some of the measures presented in this dissertation: • A karate club social network [33]: Zachary’s karate club, Figure 1.4, is a social network of friendships among 34 members of a university karate club in 1970. 
The network represents the split within the group following disputes among the members. The relationships between members are presented as an unweighted matrix, such that an edge exists only if two members were friends in the class, and as a weighted matrix, where entries represent the number of situations in and outside the club in which interactions occurred. The senior members of the karate group are numbered 1, 33, and 34. This is a useful network since the functional role of each of the nodes is well known, and it is revealed to have a scale-free structure such that the degree distribution follows a power law with α ≈ 2. Figure 1.4: Zachary Karate Club Network. • Dolphin network [34]: This network is derived from a social network analysis of statistically significant frequent associations between 62 bottlenose dolphins in New Zealand. • Les Miserables [35]: This fictional social network describes the co-appearance of 77 characters in the novel Les Miserables. • The North American airport network [36]: This network is a summary of the frequent flight/connection patterns associated with 456 U.S. airports. 1.2.6.3 Biological Data The proposed phase synchrony measure, in conjunction with graph measures, is applied to a set of EEG data containing the error-related negativity (ERN). The ERN is an event-related potential that occurs following performance errors in a speeded reaction time task. Previously reported EEG data [20] from 62 channels (10/20 system) were utilized. This included 90 undergraduate students (34 male) from the University of Minnesota (two of the original 92 participants were dropped due to artifacts rendering computation of the TFPS values problematic). Full methodological details of the recording are available in the previous report. The task was a common speeded-response letter (H/S) flanker task, and error and correct response-locked trials from each subject were utilized. 
A random subset of correct trials was selected to equate the number of error and correct trials for each participant. Before computing the phase synchrony measures, all EEG epochs were converted to current source density (CSD) using published methods [37, 38]. This was done to accentuate local activity (e.g. to better index small-world properties) and to attenuate distal activity (e.g. volume conduction). Aviyente [19] reported increased phase synchrony associated with the ERN for the theta frequency band (4-7 Hz) and the ERN time window (25-75 ms) for error responses compared to correct responses. It is expected that the proposed graph measures in this dissertation will corroborate Aviyente’s findings. Functional networks underlying control processes have been an important recent focus of study and provide a reasonable test of the current measure of functional connectivity. In particular, a great deal of empirical and theoretical work suggests that frontal networks involving the medial prefrontal cortex (mPFC) and lateral prefrontal cortex (lPFC) are centrally involved in cognitive control processes [39, 40, 41, 42, 43, 21]. Time-frequency (TF) analysis techniques have been shown to be particularly well suited to isolating these control-related processes in the theta band (3-7 Hz), because they can disentangle relevant overlapping event-related activity occurring at different frequencies [44, 45, 20, 46, 47]. Recently, Cavanagh and colleagues [21] presented evidence that synchronized neural oscillations in the theta band may be one mechanism that underlies functional communication between networks involving mPFC and lPFC regions during the ERN. They quantified the dynamics of the ERN by computing bivariate theta-band wavelet-based phase synchrony between midline-frontal and lateral-frontal sites. 
Specifically, they found increased synchronization in the theta band between F5-FCz and F6-FCz for errors, and this difference significantly predicted post-error slowing. Together, these findings support the view that TFPS is sensitive to changes in functional connectivity in frontal networks underlying cognitive control, and suggest the hypothesis that this increased connectivity would correspond to increased small-world network behavior. 1.3 Thesis Outline This thesis is organized as follows. First, in Chapter 2, graph measures classifying a network as ‘small-world’ are reviewed for unweighted and weighted connection matrices. An important limitation of current ‘small-world’ measures is that they are mostly restricted to binary undirected graphs, which require the transformation of the pairwise connectivity indices, e.g. bivariate phase synchrony values, to 0 or 1 by means of a thresholding operation. This thresholding operation discards the information about connectivity strengths between pairs of electrodes and raises the issue of choosing an optimal threshold. In many studies using ‘small-world’ measures, a range of thresholds corresponding to graphs with different degrees is chosen to avoid the arbitrary choice of a single threshold. Although this approach does not suffer from the choice of the threshold, it increases the computational complexity of the analysis and results in uncertainty about which threshold best represents the underlying network. A weighted network, on the other hand, provides further relevant information by indicating the varying strengths of the pairwise connections and may be especially useful in filtering the influence of weak and potentially non-significant links. Recently, generalizations of small-world measures, including the weighted clustering coefficient and weighted path length, have been introduced. 
However, these measures do not incorporate multiple and longer paths between nodes, which may significantly contribute to integration in larger and sparser networks. Therefore, Chapter 2 presents new weighted clustering coefficient and weighted path length measures that take all possible connections between nodes into account to compute the small-worldness of the network. Also, for situations when thresholding is absolutely required, a process to construct an optimal binary mapping of the biological network is presented. In Chapter 3, concepts from signal processing and information theory are employed to introduce new local measures, or features, to describe graph anomalies and hubs. A new measure of localized graph energy quantifies a brain region’s local contribution to the functional global organizational structure during error processing. This method is based on the idea of h-hop neighborhoods of a node, which can describe overall graph connectivity, similar to time scaling and shifting of a signal. The proposed measure of energy, Laplacian Huckel Energy, is based on the spectral features of the graph, which provide greater insight into the structural characteristics of the network than other published measures of graph energy. Furthermore, Chapter 3 presents how the global and local complexities of unweighted and weighted graphs are quantified using a new measure of graph entropy rate. The adjacency and connectivity matrices are modeled as a Markov process, and entropy is computed according to the state transitions in these matrices. The entropy rate of various graph topologies is investigated to determine their complexity. The entropy rate measure is extended to evaluate the local properties of graphs to identify anomalies and hubs of the network. Ultimately, the goal of the proposed entropy rate measure is to design a generalized graph compression algorithm. 
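To illustrate the flavor of the entropy rate developed in Chapter 3 (the precise definition there may differ), the sketch below row-normalizes a weighted connectivity matrix into Markov transition probabilities and computes the entropy rate of the resulting random walk. The stationary distribution used is the standard one for a random walk on an undirected weighted graph; the function name is illustrative.

```python
# Hedged illustration of an entropy rate for a weighted graph: the
# connectivity matrix W is modeled as a Markov process by
# row-normalizing it into transition probabilities
# P(i, j) = W(i, j) / k_i^w. For a random walk on an undirected
# weighted graph, the stationary distribution is
# pi_i = k_i^w / sum_l k_l^w, giving the entropy rate
# H = -sum_i pi_i sum_j P(i, j) log2 P(i, j).
# This is a sketch of the idea, not the dissertation's definition.
import math

def entropy_rate(W):
    n = len(W)
    strengths = [sum(row) for row in W]          # weighted degrees k_i^w
    total = sum(strengths)
    H = 0.0
    for i in range(n):
        pi_i = strengths[i] / total              # stationary probability
        for j in range(n):
            if W[i][j] > 0:
                p_ij = W[i][j] / strengths[i]    # transition probability
                H -= pi_i * p_ij * math.log2(p_ij)
    return H

# A complete unweighted graph on 4 nodes: every walk step chooses
# uniformly among 3 neighbors, so H = log2(3) ≈ 1.585 bits.
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(entropy_rate(K4))   # ≈ 1.585
```

Highly regular topologies (complete graphs, lattices) yield high, uniform transition uncertainty, while hubs and anomalies perturb the local contributions to H, which is what motivates a localized version of the measure.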
In Chapter 4, multivariate analysis is performed on connectivity matrices by implementing new hierarchical spectral clustering techniques over a group of networks. Clustering algorithms are typically applied to sparse and unweighted networks, whereas the proposed methods are designed specifically for fully connected and weighted networks. CONGA, a hierarchical clustering method, is implemented in a new clustering process to identify an optimal community structure in which nodes are allowed to participate in multiple clusters. Finally, most clustering algorithms compute an average connectivity matrix when evaluating multiple networks, thus disregarding variance across subjects. A spectral clustering algorithm, implementing the Fiedler vector, identifies a community structure common to a set of multiple connectivity matrices without first averaging the matrices. These techniques are applied to EEG data to identify which brain regions form highly synchronized segregated groups during error processing. In Chapter 5, the contributions of this dissertation are summarized, along with the final evaluation of the ERN dataset. Finally, future work pertaining to clustering, graph compression, and dynamic brain networks is discussed. Chapter 2 Small-World Type Measures for Weighted Graphs 2.1 Introduction Many complex networks, such as social and biological networks, demonstrate a phenomenon referred to as ‘small world’. The concept of ‘small world’ originally stems from sociology and refers to a network composed of a sparsely connected set of nodes which form dense clusters (producing a high clustering coefficient) linked by short path lengths [31]. A ‘small world’ is thought to lie between two extreme configurations, the regular lattice and the random graph, which are defined in Chapter 1. The advantage of characterizing a complex network as ‘small world’ is that it offers a quantitative means to evaluate varying levels of non-randomness in the network. 
The frequent occurrence of dense clusters introduces redundancy in the network, thus increasing the stability of the system and its robustness in case of node breakdown [48]. Short path lengths permit rapid communication and robustness to random network attacks [49]. In the brain network, a high clustering coefficient promotes functional overlap of densely connected neuronal elements. These elements act as the building blocks of the cortical architecture. Short path length promotes effective interactions between neuronal elements across cortical regions, which contributes to functional integration. Categorizing a network as a ‘small world’ presents the opportunity to correlate a network’s structure to its dynamic properties. In a study on scientific collaborations, Newman et al. reported that underlying ‘small world’ structures have an impact on the dynamics of complex networks [50] by demonstrating that a ‘small world’ structure influenced the speed of information and idea dissemination in the academic community. Kogut et al. [51] demonstrated that German corporate firms with higher clustering coefficients and lower path lengths within their ownership structure are more likely to be involved in takeovers and restructuring. Uzzi et al. [52] evaluated the ‘small world’ of Broadway musicals from 1945 to 1989 and found that varying ‘small world’ properties affected the creativity of Broadway musicals. The literature thus shows a clear correlation between ‘small world’ structure and network behavior. Recently, this functional dependence on network structure has been exploited for understanding the topology of structural and functional brain networks. The literature suggests that functional connectivity of the human brain network can be explained and modeled according to the fundamental organizational principles of a ‘small world’. Sporns et al. [27] discovered robust ‘small world’ properties and non-random patterns in functional connection matrices of invertebrate and mammalian cerebral cortices, including the macaque cortex and cat cortex. Further studies demonstrate that large-scale cortical networks exhibit a specific composition of structural and functional modules [53], small-world attributes, i.e. high clustering and short path lengths [54, 55], short wiring length [56], and high complexity [57]. Studies of patterns of functional connectivity in the brain, e.g. coherence or correlation, among cortical regions have demonstrated that functional brain networks also exhibit ‘small-world’ [58, 59] and scale-free properties [60], possibly reflecting an underlying structural organization of neural connections. In general, it has been hypothesized that the large-scale organization of neural connections in the brain may impose important constraints on the functional dynamics underlying cognition [61, 62]. ‘Small world’ analysis has also been effective in distinguishing between healthy and pathological functional brain networks. In a magnetoencephalographic (MEG) study [7], it was shown that patients with Alzheimer’s disease demonstrated significant decreases in ‘small world’ properties compared to healthy brains, indicating a disruption in effective interactions between and across cortical regions. A similar conclusion was reached by Supekar et al. [63], who evaluated patients through task-free functional magnetic resonance imaging (fMRI). Similar disruptions in functional connectivity have been revealed in an EEG study on schizophrenia [64] and in an MRI study on heroin-dependent individuals [65]. These studies indicate that an analysis of ‘small world’ properties describes the effective integration of multiple segregated sources of information in the brain, and how decreases in the ‘small world’ structure are indicative of cortical dysfunction due to disease. 
Quantifying the ‘small-world’ measure of brain networks is limited because the ‘small-world’ measures, clustering coefficient and path length, are originally defined for unweighted networks. However, most functional brain networks are defined through weighted networks where the edge weights correspond to the temporal dependence between different brain regions. In this chapter, we introduce new methods for representing the ‘small-world’ behavior of weighted brain networks. There has been limited investigation into developing equivalent clustering coefficient and path length measures for weighted networks [66], and even fewer studies with applications to real-world weighted networks such as the brain network. Therefore, we introduce new measures of weighted clustering coefficient and weighted path length and demonstrate their applicability to quantifying a weighted small-world measure for the human brain network and other real-world networks. The literature typically overcomes problems encountered with weighted networks through thresholding. This methodology is not optimal because thresholding can eliminate important connectivity information. Also, the resulting network is biased because of the arbitrary selection of a threshold. There does not exist a systematic way of choosing an ‘optimal’ threshold, and therefore studies have usually evaluated the unweighted ‘small-world’ measure of a network across multiple thresholds. This leads to high computational times, particularly where data from multiple subjects are available. In large part, these problems with thresholding would be avoided using the weighted graph measures we propose in this chapter, but we also offer an alternative to thresholding through the introduction of a systematic method for transforming the weighted network to a binary network whose ‘small world’ characteristics are maximized.
Both the weighted graph measures and the weighted-to-binary network transformation algorithm are applied to well-known networks and to a biological network obtained from ERN EEG data with the intent of demonstrating that cognitive control is associated with increased functional connectivity in the brain and therefore promotes an increase in ‘small world’ structure.

2.2 Clustering Coefficient

The clustering coefficient of an unweighted network, a metric necessary for computing the ‘small-world’ measure of a network, is the fraction of triangles around a node and is equivalent to the fraction of the node’s neighbors that are neighbors of each other. In this section, the conventional measure of clustering coefficient is described along with a discussion of its fundamental weaknesses. This background is followed by proposed measures for addressing these weaknesses.

2.2.1 Conventional Clustering Coefficient

The clustering coefficient of a node v_i quantifies its ability to form a complete graph among its neighbors, where all nodes are connected to one another [31]. For a binary graph, the clustering coefficient is equivalent to the fraction of the node’s neighbors that are also neighbors of each other and is defined as

C(i) = \frac{t_i}{k_i (k_i - 1)/2}    (2.1)

where t_i is the number of closed triplets centered at v_i and k_i is the degree of v_i. A closed triplet centered at v_i is defined as a set of three nodes, including v_i, which are linked by three edges, thus forming a triangle. An open triplet, on the other hand, is composed of three nodes but with only two edges between the nodes. The conventional clustering coefficient is in the range [0, 1], where C(i) = 0 corresponds to a node v_i with degree k_i = 0, k_i = 1, or one where none of the neighbors share a common edge. C(i) = 1 if all of v_i’s neighbors are connected to each other. Clustering coefficient is important for determining the extent to which nodes group into communities.
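As an illustration, Eq. (2.1) can be evaluated for every node at once from powers of the adjacency matrix, since the diagonal of A^3 counts twice the number of triangles through each node. The following is a minimal numpy-based sketch (the function name and implementation are ours, not part of the dissertation):

```python
import numpy as np

def binary_clustering(A):
    """Conventional clustering coefficient, Eq. (2.1): closed triplets
    centered at each node over the number of possible neighbor pairs."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                 # node degrees k_i
    t = np.diag(A @ A @ A) / 2.0      # triangles through each node
    denom = k * (k - 1) / 2.0         # possible neighbor pairs
    return np.divide(t, denom, out=np.zeros_like(t), where=denom > 0)
```

For a triangle graph every node returns 1, while for a three-node chain every node returns 0, matching the definition above.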
This clustering information is summarized for a particular network simply by computing the average clustering coefficient as C = \frac{1}{N} \sum_{i=1}^{N} C(i), where N is the number of nodes in the network. A fundamental weakness of the conventional clustering coefficient is that it is originally defined for binary networks, i.e., it is independent of edge weights. It is well known that network characterization is not limited to topology but extends to the dynamic flow of information described by the edge weights. For example, regions with homogeneous activity within the network could be important for understanding the functionality of the network. The amount of information flow characterizing the connections is fundamental for a complete description and characterization of networks [67]. In this chapter, a new method is proposed for computing the clustering coefficient in weighted networks by treating edge weights as probabilities. Clustering coefficient is defined as the probability of a node participating in a closed triplet with respect to all open and closed triplets for which it is a member. Furthermore, we argue that the conventional measure of clustering coefficient does not completely characterize the importance of a node in binary networks. This is because it does not fully incorporate the degree of a node, which is an indicator of network centrality. We end this section on clustering coefficient by also proposing a similar clustering coefficient for binary networks that improves the identification of central nodes in the network.

2.2.2 Existing Definitions for Weighted Clustering Coefficient

The literature presents various attempts at developing or implementing a clustering coefficient measure designed for weighted networks [68, 69, 66, 70]. Weighted networks contain information pertaining to the dynamic flow of information between node pairs, and therefore a clustering coefficient should reflect the information flow through closed triplets. Barrat et al.
[71] were the first to propose such a weighted measure as

C_1^w(i) = \frac{1}{k_i^w (k_i - 1)} \sum_{j,h \neq i} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{jh} a_{ih}    (2.2)

where a_{ij} = 1 if there is an edge between nodes v_i and v_j and 0 otherwise, and k_i^w = \sum_{j=1}^{N} w_{ij} is the strength of node v_i. This measure quantifies the local cohesiveness within a closed triplet centered at v_i. However, this measure considers only the average weight of the edges connected to v_i within a triplet and neglects to consider all edges that form the triplet, i.e., it does not take into account the weight w_{jh}. Ignoring the weight w_{jh} will result in clustering coefficients that do not capture the total connectivity within the closed triplet. Another closely related and widely used weighted clustering measure, defined by Onnela et al. [72] and previously implemented in applications to functional brain network analysis [7, 8, 73], addresses this issue and takes into account all of the edges that form the triplet as

C_2^w(i) = \frac{1}{k_i (k_i - 1)} \sum_{j,h \neq i} (w_{ij} w_{ih} w_{jh})^{1/3}    (2.3)

This measure normalizes the geometric mean of the edges that form the triplet by the total possible number of connections for node v_i. However, Onnela’s measure requires the availability of information pertaining to the underlying binary network, i.e., k_i. Therefore, if the network is fully connected and weighted, thresholding will be required to obtain a binary adjacency matrix. If the network is not thresholded, every node is assigned an equal degree value. This results in a constant integer in the denominator and would cause all nodes, regardless of participation in the network, to be normalized equally, thus obscuring the uniqueness of each node’s connection strength. In the event there exists a sparse binary representation accompanying the weights, the degree will have a strong influence on the clustering coefficient, thus obscuring the information provided by the weights.
The binary degree, k_i, clearly introduces biases to the clustering coefficient computation that should be strongly avoided. Zhang et al. proposed a weighted clustering coefficient definition to address this issue [74] by computing the clustering coefficient only through the use of the weighted edges, inspired by the definition in [75]:

C_3^w(i) = \frac{\sum_{j \neq i} \sum_{h \neq i, h \neq j} w_{ij} w_{jh} w_{hi}}{\sum_{j \neq h} w_{ij} w_{ih}}

The numerator computes the total triplet intensity for a particular reference node and the denominator normalizes the clustering coefficient to values between 0 and 1 by using the underlying open triplets centered at node v_i. Although Zhang’s method presents an alternative to Onnela’s requirement to use the binary node degree, the method is prone to false identification of triplets [66]. A false positive identification occurs when at least one of the edges within the closed triplet has significantly lower weight than the other edges and the clustering coefficient is still large. This false identification indicates that a “strongly” linked closed triplet is identified as equivalently “strong” as an open triplet.

2.2.3 Newly Proposed Weighted Clustering Coefficient

There is a need both to decrease the number of false positives and to obtain a measure that is independent of the node degree. This motivates the extension of Onnela’s and Zhang’s clustering coefficients by incorporating the total triplet (closed and open) intensities within the denominator to provide a complete characterization of a node’s connectivity strength through

C_4^w(i) = \frac{\sum_{j,h \neq i} w_{ij} w_{jh} w_{hi}}{\sum_{j,h \neq i} w_{ih} w_{hj} + \sum_{j,h \neq i} w_{ij} w_{jh} w_{hi}}    (2.4)

which treats the pairwise weights as probabilities. In this definition, w_{ij} is treated as the probability of information flowing through a direct path between nodes v_i and v_j without the use of intermediate nodes.
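A sketch of the measure as reconstructed in Eq. (2.4), assuming a symmetric weight matrix in [0, 1] with zero diagonal (the vectorized numpy form and the function name are ours; the closed-triplet term is the diagonal of W^3, and the open-triplet term follows from the node strengths):

```python
import numpy as np

def weighted_clustering(W):
    """Proposed weighted clustering coefficient, Eq. (2.4): ratio of the
    total closed-triplet probability to the total open plus closed
    triplet probability at each node."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    s = W.sum(axis=1)                        # node strengths
    closed = np.diag(W @ W @ W)              # sum_{j,h != i} w_ij w_jh w_hi
    open_ = W @ s - np.diag(W @ W)           # open triplets starting at v_i
    total = open_ + closed
    return np.divide(closed, total, out=np.zeros_like(total), where=total > 0)
```

On a fully connected triangle with unit weights the measure evaluates to 0.5 at every node, since the ordered closed- and open-triplet sums are then equal, while any triangle-free weighting yields 0.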
The clustering coefficient is computed as the ratio of the total probability of forming closed triplets to the total probability of all possible triplets with respect to node v_i. This approach guarantees a normalized clustering coefficient between 0 and 1 using all of the edge weights. Our interpretation of an open triplet differs from that of Zhang, where an open triplet of v_i is defined as one in which v_i is the center node. Our measure, however, defines an open triplet of v_i as one for which v_i is the starting point (or end point) in a connected set of three nodes. Unlike Zhang’s definition, our definition of an open triplet incorporates the weighted edge that is not incident upon v_i. This provides a more complete representation of the node’s ability to integrate itself with the rest of the network, since it takes into account all possible paths originating from v_i to all other nodes in the triplet. In this way, the likelihood of a false positive is reduced, i.e., the proposed measure quantifies connectivity intensities primarily unique to “strong” closed triplets. Table 2.1 compares the clustering coefficient computed using Barrat’s, Onnela’s, Zhang’s, and the proposed measure for different variations of a triplet. The dotted edges represent “weak” links and the solid edges represent “strong” links. The solid node represents the reference node. One hundred triangles were simulated for each of the four configurations, where weights for the “weak” links were selected from a uniform distribution between 0 and 0.2. “Strong” links were assigned weights chosen from a uniform distribution between 0.8 and 1.
From the table, our proposed measure is shown to reduce the number of false positives compared to Zhang’s measure, is a more complete representation of clustering than Barrat’s measure since it incorporates the third edge in the calculation, and provides a significantly larger clustering coefficient value for the ‘strongest’ closed triplet than Onnela’s and Barrat’s since the binary nodal degree is not a factor.

Table 2.1: Comparisons between the proposed and existing weighted clustering coefficients for different variations of a triplet (rows correspond to the four simulated triplet configurations, ordered from weakest to strongest).

Configuration    C_B            C_O            C_Z            C_proposed
1                0.05 ± 0.02    0.05 ± 0.06    0.10 ± 0.05    0.10 ± 0.06
2                0.05 ± 0.03    0.10 ± 0.04    0.89 ± 0.06    0.15 ± 0.08
3                0.25 ± 0.02    0.20 ± 0.06    0.91 ± 0.06    0.23 ± 0.13
4                0.45 ± 0.03    0.45 ± 0.03    0.90 ± 0.06    0.93 ± 0.05

2.2.4 A Random Walk Based Clustering Coefficient Measure for Binary Networks

As demonstrated by the proposed weighted clustering coefficient, connectivity of a network is best represented when the edges are weighted. Unfortunately, all edges in a binary graph are equally weighted, and therefore all triplets in the network are considered equal. This property makes the conventional clustering coefficient less discriminative than the weighted clustering coefficient. This shortcoming of the conventional clustering coefficient can be demonstrated with a simple example: if nodes v_i and v_j are linked to 20 and 5 neighboring nodes, respectively, and all their neighbors share edges with each other, then both C(i) = 1 and C(j) = 1. However, it would be accurate to describe v_i and its local connections as more modular than those of v_j. If these nodes were part of a real-world network, node v_i can be characterized as an anomaly, since it is less probable for there to naturally exist \binom{20}{2} edges shared among the neighbors of v_i compared to \binom{5}{2} edges among the neighbors of v_j. Therefore, the clustering coefficient can be made more discriminative for the purpose of identifying and ranking submodules of a binary network.
In this section, a more discriminative clustering coefficient, based on the proposed weighted clustering coefficient, is described. A challenge with applying the proposed weighted clustering coefficient to a binary network is that there are no distinct weights associated with the edges. To overcome this, a weight can be assigned to an edge equal to its probability of being selected by a random walk. The probability of selecting edge e_{ij} is 1/k_i, i.e., the reciprocal of node v_i’s degree [76]. Since the direction of a random walk is dependent only on the previously visited node, the binary graph can be modeled as a first-order Markov chain [77]. Transitions between all node pairs, i.e., states of the Markov chain, can be represented by a transition matrix, P, where

P(i,j) = \begin{cases} 1/k_i & \text{if } e_{ij} = 1 \\ 0 & \text{otherwise} \end{cases}    (2.5)

and k_i is the degree of node v_i. The nth power of the transition matrix, P^n, determines the probability of a random walk between nodes, i.e., P^n(i,j) is the probability of moving from v_i to v_j in exactly n state changes. Therefore, the total probabilities of v_i participating in an open triplet and a closed triplet are computed as

p(v_i^{open}) = \sum_j [P^2]_{ij}    (2.6)

p(v_i^{closed}) = [P^3]_{ii}    (2.7)

and a modified clustering coefficient can be described as

C(i) = \frac{p(v_i^{closed})}{p(v_i^{open}) + p(v_i^{closed})}    (2.8)

This clustering coefficient for binary graphs is more discriminative than the conventional measure since weights are assigned to the edges. Also, this proposed clustering coefficient remains between 0 and 1.

2.3 Characteristic Path Length

Path length, the second metric necessary for computing the ‘small-world’ measure of a network, is computed as the average number of edges contained among the shortest paths between node pairs [31]. In this section, the conventional measure of path length is described along with a discussion of its fundamental weaknesses. This background is followed by proposed measures for addressing these weaknesses.
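Returning to the random-walk clustering coefficient of Section 2.2.4, Eqs. (2.5)–(2.8) translate directly into a few lines of matrix arithmetic. A minimal sketch following the reconstructed equations (the function name and numpy implementation are ours):

```python
import numpy as np

def random_walk_clustering(A):
    """Random-walk clustering coefficient, Eqs. (2.5)-(2.8): edges are
    weighted by the probability 1/k_i of a walk leaving node v_i."""
    A = np.array(A, dtype=float)
    k = A.sum(axis=1)
    P = A / k[:, None]                     # Eq. (2.5): P(i,j) = 1/k_i if e_ij = 1
    p_open = (P @ P).sum(axis=1)           # Eq. (2.6): two-step probabilities
    p_closed = np.diag(P @ P @ P)          # Eq. (2.7): return to v_i in three steps
    return p_closed / (p_open + p_closed)  # Eq. (2.8)
```

For the complete graph on three nodes this returns 0.2 at every node, while any triangle-free graph returns 0, since the diagonal of P^3 is then zero.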
2.3.1 Conventional Characteristic Path Length

While the clustering coefficient describes how nodes group together, the characteristic path length quantifies the extent to which individual nodes are segregated from each other. The characteristic path length, L_i, of a node v_i is computed using the distance matrix, D, of the graph such that

L_i = \frac{1}{N} \sum_{j=1}^{N} D_{ij}    (2.9)

where D contains the shortest pair-wise distances. The average network path length is defined as L = \frac{1}{N} \sum_{i=1}^{N} L_i. Unlike the clustering coefficient, the literature is much more limited in offering alternative methods for quantifying path length. The primary interest within the literature for improving path length is developing search algorithms that discover shortest paths in networks with negative and positive weights with minimal computational time [78], or computing shortest paths for networks containing negative cycles [79]. However, with particular interest in ‘small-world’ analysis, there are limited options for computing the path length of weighted and binary networks. Conventional path length is computed from a shortest path cost function. Small-world analyses reported in the literature [7, 80] typically focus on one type of shortest path cost function, the minimum weighted sum (or minimum number of edges in a binary graph) between a node pair. This definition may not be robust for a variety of reasons: 1) there may exist multiple shortest paths; 2) the shortest path may not always be used in information sharing across real-world networks; 3) a path with the minimum weighted sum may not always be the path with maximum probability. In graph theory, there are many other possible cost functions for computing the minimum distance between a node pair [81]. Therefore, a new path length measure can be proposed using a new and more robust cost function for weighted graphs and binary graphs.
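For a connected binary graph, the distance matrix D in Eq. (2.9) can be filled in by breadth-first search from every node; the sketch below (our illustrative code, not from the dissertation) then averages each row as in Eq. (2.9):

```python
from collections import deque
import numpy as np

def distance_matrix(A):
    """Shortest-path distances of a connected binary graph via BFS."""
    A = np.asarray(A)
    n = A.shape[0]
    D = np.full((n, n), np.inf)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if np.isinf(D[s, v]):          # first visit = shortest distance
                    D[s, v] = D[s, u] + 1
                    queue.append(v)
    return D

def characteristic_path_length(A):
    """Per-node path length L_i of Eq. (2.9): row average of D."""
    D = distance_matrix(A)
    return D.sum(axis=1) / D.shape[0]
```

For a three-node chain the distances from the end nodes are {0, 1, 2}, giving L_i = 1 at the ends and 2/3 at the center.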
2.3.2 Existing Definition for Weighted Characteristic Path Length

To retain the functional information introduced by edge weights, the path length measure should be extended for weighted graphs. One such method is proposed by Stam et al. [7], where the weighted path length is defined as

L(i) = \frac{1}{N-1} \sum_{j \neq i} \frac{1}{w_{ij}}    (2.10)

This measure suggests that the stronger the relationship between a node pair, the shorter the weighted path length should be. This definition takes the strength of the edge connecting two nodes as the interaction intensity between those two nodes, i.e., it only considers the strength of a single-edge path. However, two nodes in a complex network, particularly the brain, can interact through multi-edge paths whose connection strengths may be larger than the weight of the single-edge path. Therefore, the above method fails to capture the true flow of information throughout the network. For example, if nodes v_i and v_j do not share an edge, they are assigned a pair-wise weight of w_{ij} = 0. This implies that a direct path between the two nodes does not exist. However, this does not suggest that v_i and v_j cannot exchange information, even if not directly. The weighted path length measure above does not consider the possibility of any information flow between these two nodes.

2.3.3 Newly Proposed Weighted Characteristic Path Length

Based on the disadvantages of the existing weighted path length measure, it would be appropriate to define a new measure that considers the path length between two nodes to be equivalent to the maximum possible connection strength between them. Among all possible connections between nodes v_i and v_j, the shortest path will now be defined as the one with the highest probability of connection, which corresponds to the maximum product of edge weights forming a path linking a node pair. Such a method would be in line with the fundamental basis of the conventional path length measure.
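The maximum-product path search described above admits a compact Floyd-Warshall-style sketch: at step k, any pair may improve its best path probability by routing through node v_k. The code below is our illustration of this idea, with weights assumed in [0, 1]:

```python
import numpy as np

def max_probability_paths(W):
    """F[i, j] = maximum product of edge weights over paths from i to j,
    computed with a Floyd-Warshall-style relaxation."""
    F = np.array(W, dtype=float)
    for k in range(F.shape[0]):
        # routing through node k may yield a more probable path
        F = np.maximum(F, np.outer(F[:, k], F[k, :]))
    return F
```

For example, with w_{01} = 0.5 and w_{02} = w_{12} = 0.9, the direct edge between nodes 0 and 1 is beaten by the two-edge path through node 2, whose probability is 0.9 × 0.9 = 0.81.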
First, the connectivity strength, or the amount of information flow, between two nodes is defined as the product of the edge weights along the path, P_{ij}, linking a node pair, where P_{ij} = \{w_{i,q_1}, w_{q_1,q_2}, \ldots, w_{q_{m-1},q_m}\} is the set of weighted edges in the path and Q = \{q_1, q_2, \ldots, q_m\} is the sequence of nodes along the path P_{ij}. An efficient search algorithm, based on a modification of the Floyd-Warshall algorithm [82], is applied to the graph. The algorithm identifies a path that maximizes the cost function F_{ij} = w_{i,q_1} \left( \prod_{m'=2}^{m} w_{q_{m'-1},q_{m'}} \right) w_{q_m,j}. The proposed weighted path length is then defined as

L(i) = \frac{1}{N-1} \sum_{j \neq i} \frac{1}{F_{ij}}    (2.11)

The weighted path length of node v_i is based on the average probability of forming a path to all other nodes in the network. The advantages of this proposed path length measure are that it is based on the connection weights and considers all paths between nodes. The only stipulation for the evaluated connectivity matrix is that its entries must be in the range [0, 1] to ensure that as the number of edges in a path increases, the probability of linking a node pair within the path decreases. In some applications, edges of a graph may be associated with a probability of reliability [83]. For situations where this information is unavailable, normalizing the weights of the connectivity matrices by the maximum connectivity weight would satisfy this requirement.

2.3.4 A Random Walk Based Characteristic Path Length Measure for Binary Networks

The conventional measure of shortest distance is defined as the minimum total sum of weights along a path. Practical applications, such as the optimal processing of parallel algorithms in a multiprocessor system [84], adopt this cost function for computing the shortest path in order to identify the ‘best’ choice for facilitating information flow in a network. Yu et al.
[85] argue that a single cost function, such as the one implemented for small-world analysis, is not guaranteed to reveal the most robust connections in a network. Consider the examples in Figure 2.1, which show variations in connectivity among three nodes. In both Figures 2.1a and 2.1b, nodes A and B have a shortest path of 1; however, the robustness of their connections differs in each case. The arrangement in 2.1a shows a more robust connection between A and B, whereas the single edge linking them in 2.1b may be easily disrupted in a random attack. Therefore, the connectivity in 2.1a is more robust than the one in 2.1b, although in both cases the value of the shortest path is the same. Further evaluation of Figure 2.1b by conventional path length would have indicated that the connection between A and B, with a shortest path of 1, is more ‘optimal’ than the connection between A and C, with a shortest path length of 2. However, it is clear that the connectivity between A and C is more robust than the one between A and B. This simple example illustrates the importance of robustness for real-world networks where perturbations are likely to occur [86, 87, 88]. Therefore, a path length computed from a different cost function than that used in conventional small-world analysis should convey the robustness of a network.

Figure 2.1: Nodes A and B have similar path length; however, connectivity in (b) is less robust than in (a). In (b), connectivity between nodes A and C is more robust than connectivity between nodes A and B, despite A and C having a longer path length than A and B.

The new cost function is based on the N × N Markov transition matrix introduced in Section 2.2.4. The matrix is characterized as absorbing if it contains at least one node which behaves as a sink [77], i.e., the probability of a directed random walk along a path from the sink to all other nodes is zero.
All other nodes are considered transient, since a random walk may traverse between them in any sequence, i.e., the probabilities of traversal along these bi-directed paths are non-zero. P, representing the transition matrix of an absorbing Markov chain, is defined as

P = \begin{bmatrix} Q & R \\ 0 & 1 \end{bmatrix}    (2.12)

where Q is the (N−1) × (N−1) matrix containing the transition probabilities among the transient nodes and R is the (N−1) × 1 vector containing the transition probabilities from the transient nodes to the absorbing node. The expected number of states needed to reach the absorbing state from any transient state is obtained from M = I + Q + Q^2 + \cdots, where M is the fundamental matrix of an absorbing Markov chain and is shown to be M = (I − Q)^{-1} [89]. The following theorem is then used to determine the expected number of state transitions necessary for reaching the absorbing node.

Theorem 1. Let t_{ij} be the expected number of steps before the Markov chain is absorbed at node v_i, given that the chain starts at node v_j. Also, let T be the (N−1) × 1 vector whose jth entry is t_{ij}. Then

T = Mc    (2.13)

where c is the (N−1) × 1 ones vector and M is defined above [90].

By transforming a binary graph into an absorbing Markov chain model, path length can be computed as the expected value of the distance between any node pair. Since the graphs of interest in this study are undirected and connected, they can be characterized as ergodic. Therefore, to compute path length, these ergodic networks are converted into absorbing networks by first selecting any node to be the absorbing state. If v_i is the selected node, then P is modified such that the entire ith row is set to zero with the exception of P_{ii}, which is set to 1. This designates v_i as the absorbing node. If the initial state begins at node v_j, the average distance between nodes v_j and v_i is the average number of state changes expected before being absorbed by v_i.
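The absorption-time computation of Theorem 1 can be sketched as follows: each node is made absorbing in turn, the fundamental matrix M = (I − Q)^{-1} is formed, and T = Mc is averaged over the transient starting nodes. The numpy code is our illustration of this procedure:

```python
import numpy as np

def markov_path_length(A):
    """Expected-absorption-time path length for a connected binary graph."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)      # transition matrix P = K^{-1} A
    L = np.zeros(n)
    for i in range(n):                        # make v_i the absorbing node
        keep = [j for j in range(n) if j != i]
        Q = P[np.ix_(keep, keep)]             # transitions among transient nodes
        M = np.linalg.inv(np.eye(n - 1) - Q)  # fundamental matrix (I - Q)^{-1}
        T = M @ np.ones(n - 1)                # Theorem 1: T = Mc
        L[i] = T.mean()                       # average over starting nodes
    return L
```

On the complete graph of three nodes the expected absorption time from either transient node is 2 steps, so every node’s path length is 2; on a three-node chain the end nodes are harder to reach, giving a path length of 3.5.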
T is then computed for each node by iteratively designating each node as the new and only absorbing state. Averaging T for each absorbing node gives the path length of that node. Algorithm 1 describes the process of computing path length. In the algorithm, K is the diagonal degree matrix of a graph G, where K_{ii} = k_i and K_{ij} = 0 for i ≠ j, and I is the identity matrix.

Algorithm 1 Markov Chain Path Length for Binary Graphs
Input: N × N binary graph, G = (V, E), and adjacency matrix A.
Output: Path length vector, L.
1: P = K^{-1} A {Transition matrix of graph G.}
2: z = N − 1
3: for i = 1 to |V| do
4:   Q = P − {v_i}
5:   M = (I_{z×z} − Q)^{-1}
6:   c = 1_{z×1}
7:   T = Mc
8:   L(i) = \frac{\sum_j T_j}{z}
9: end for
10: return L

This process identifies the expected length of paths traversed frequently between pairs of nodes. The expected length of a path is not necessarily the shortest, but rather the most probable and robust. Therefore, the proposed path length measure, L, is an improved extension of the conventional measure implemented in small-world analysis. Since the new path length is based on the evaluation of multiple alternative paths, it is also a more complete representation of network connectivity.

2.4 Small-World Measure of a Network

In this section, the conventional measure of small-world implemented throughout the literature [27, 7, 91] is discussed. Since applications of small-world have primarily included binary graphs, an extension to weighted graphs is presented.

2.4.1 Conventional Small-World Measure

Watts et al. [31] computed the small-world value of a network using the clustering coefficient and path length. First, the clustering coefficient and path length are normalized with respect to the clustering coefficient and path length of a random network. The normalized values for clustering coefficient and path length are denoted as γ and λ, respectively. It is expected that a network with the ‘small-world’ property will demonstrate γ >> 1 and λ ≈ 1.
The small-world value, σ, is computed as σ = γ/λ, where σ >> 1 is indicative of a ‘small world’ topology. Although the small-world measure is primarily defined for binary networks, a weighted small-world measure has been discussed by Latora et al. [80, 92]; yet they do not describe a formal definition which utilizes a weighted clustering coefficient and weighted path length. Here, a weighted extension to the small-world measure is presented using the proposed weighted clustering coefficient and weighted path length defined in Sections 2.2 and 2.3.

2.4.2 Newly Proposed Weighted Small-World

The proposed weighted clustering coefficient and path length are used to define a weighted small-world measure. Normalization of the two parameters is performed in a manner similar to that described by Stam et al. [7]. The normalized weighted clustering coefficient and normalized weighted path length are computed by dividing the network’s clustering coefficient and path length by the average corresponding measures, C^w_{rand}(i) and L^w_{rand}(i), of an ensemble of surrogate graphs. The surrogate graphs are generated by randomly permuting the weighted edges of the original connectivity matrices. With the normalized weighted clustering coefficient and path length defined as γ^w(i) = \frac{C^w(i)}{C^w_{rand}(i)} and λ^w(i) = \frac{L^w(i)}{L^w_{rand}(i)}, we extend the conventional small-world definition to evaluate the local weighted small-world measure at each node in the weighted network as σ^w(i) = \frac{γ^w(i)}{λ^w(i)}. A weighted network is characterized as a small-world if σ^w(i) >> 1.

2.5 Mapping of Weighted Graphs to Binary Graphs

In this section, fundamental issues associated with weighted-to-binary mappings of networks are discussed. Since edges of real-world networks are typically associated with non-binary weights, there exist common approaches for obtaining a binary transformation.
The weighted measures presented in Sections 2.2 and 2.3 circumvent this problem; however, for cases when a binary representation is absolutely necessary, an alternative transformation algorithm is offered.

2.5.1 Existing Approaches for Transforming Weighted Graphs to Binary Graphs

Most of the current graph theoretic measures are defined primarily for binary adjacency matrices [27, 93, 94]. There is little research on weighted adjacency matrices, and this poses a problem for applying the conventional measures to weighted adjacency matrices and for performing comparisons between weighted graphs. A review of thresholding techniques currently implemented in weighted graph studies is presented by Wijk et al. [95]. There are two primary methods for transforming a fully connected weighted graph to a sparse binary network: 1) choosing an arbitrary threshold, τ, for which all entries of the weighted matrix equal to or greater than τ are mapped to 1 and those below are mapped to 0; 2) applying a range of thresholds, [τ1, τ2], to the weighted matrices and obtaining a collection of binary matrices corresponding to each threshold. These two approaches are either purely arbitrary or computationally exhaustive. Furthermore, comparisons across different networks are unreliable since τ can cause variations in sparsity across networks. Choosing a threshold which also maintains connectivity, without sacrificing sparsity, continues to be a challenge. The proposed method in this section offers a third option, an optimal weighted-to-binary transformation. An optimal transformation would reduce the dependence of graph measures on τ by instead utilizing the strength of connectivity among node pairs.

2.5.2 Weighted to Binary Graph Transformation: Alternative to Matrix Thresholding

To address the issue of connectivity presented by thresholding, it is proposed that a ‘backbone’ of the network is first identified. In computer networking, a backbone is a larger transmission
In computer networking, a backbone is a larger transmission line that carries data gathered from the smaller lines that interconnect with it [96]. We extend this terminology to describe the underlying primary structure of a graph. This ‘backbone’ would ensure that all nodes are reachable from any starting node. The question is how to choose the best ‘backbone’ of the network. In graph theory, a spanning tree is a subgraph which contains all nodes in the network, has exactly n − 1 edges, and in which each node is reachable from every other node. There are no cycles in a spanning tree, i.e. removing any edge from the tree would disconnect it. In a complete graph with n nodes, there are n^(n−2) possible spanning trees [18]. There exists a spanning tree whose total cost, defined as the sum of all weights in the spanning tree, is minimum with respect to all possible spanning trees of the weighted graph. This minimum cost spanning tree is called a minimum spanning tree (MST). In the proposed work, we use the MST as the ‘backbone’ of the new binary transformation of the weighted graph. In order to extract the MST corresponding to a given graph, we use Prim’s algorithm [18], which is capable of identifying the MST for any graph with negative and positive weights. If W_G is the weighted connectivity matrix of a graph G, then W_H is the transformation from G to H such that W_H = −1 × W_G. In this manner, the highest weights are transformed into the lowest weights. Applying Prim’s algorithm to H produces a minimum spanning tree containing the edges of node pairs with the highest weights in G. After identifying the ‘backbone’, i.e. the tree, of the new graph, the tree can be expanded into a cyclic graph by adding edges to the tree based on a user-defined cost function. One such cost function in the realm of real-world network analysis, for example, identifies a degree distribution which follows a power-law model.
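A minimal sketch of this backbone extraction, assuming a fully connected symmetric weight matrix: the weights are negated (W_H = −1 × W_G) and Prim's algorithm is run on the result, so the returned spanning tree keeps the highest-weight edges of G. The dense-matrix Prim implementation below is a simple illustrative version, not an optimized one.

```python
import numpy as np

def prim_mst(W):
    """Prim's algorithm on a dense weighted matrix (negative weights are
    fine). Returns the n-1 edges (i, j) of a minimum spanning tree."""
    n = W.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:                 # cheapest edge leaving the tree
            for j in range(n):
                if j not in in_tree:
                    if best is None or W[i, j] < W[best[0], best[1]]:
                        best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

def backbone(WG):
    """Negate the weights (W_H = -1 * W_G) so the MST of H retains the
    highest-weight edges of G, as in the proposed transformation."""
    return prim_mst(-WG)
```

On a fully connected phase-synchrony matrix this yields a tree that keeps the strongest pairwise connections while guaranteeing every node is reachable.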
Another cost function is the ‘small-world’ parameter, defined by the clustering coefficient and path length. We propose a cost function which maximizes the clustering coefficient and minimizes the path length by forming cycles in the minimum spanning tree using nodes separated by at most λ edges in the tree, where 1 ≤ λ ≤ n. λ can be incremented as long as the bounds set by the cost functions are obeyed. Since the focus of this thesis is analyzing the connectivity of real-world networks, a cost function which bounds the degree distribution according to a power-law model and a cost function which facilitates small-world behavior will be implemented. Algorithm 2 describes the process for generating a binary graph from a fully connected weighted graph. The function T = MST(G) identifies the binary adjacency matrix representing the minimum spanning tree for graph G. D_T = dist(T) is the pair-wise distance matrix of the minimum spanning tree T. The variables [φ1, φ2] are the lower and upper bounds of the degree exponent, α, estimated for the degree distribution. As presented in Chapter 1, for a power-law model −3 ≤ α ≤ −2. Similarly, the clustering coefficient is bounded by the limits [γ1, γ2]. From previous studies [94, 50], the clustering coefficients of real-world networks have been observed to lie in the range 0.2 ≤ γ ≤ 0.9.

Algorithm 2 Optimal Transformation from a Weighted to a Binary Adjacency Matrix
Input: A weighted graph G = (V, E) of order n with connectivity matrix W_G, where each edge e_ij ∈ E has weight w_ij ∈ W_G.
Output: Optimal binary matrix, A.
1: W_H ← −1 × W_G.
2: T = MST(H).
3: A ← T.
4: D_T = dist(T).
5: λ ← 2.
6: while λ < n do
7:   A ← A ∪ {e_ij | D_T(i, j) ≤ λ}.
8:   if φ1 ≤ α ≤ φ2 then
9:     if γ1 ≤ C ≤ γ2 then
10:      λ ← λ + 1.
11:    else
12:      return
13:    end if
14:  end if
15: end while

2.6 Results

In this section, the performance of the proposed graph theoretic measures is demonstrated on simulated data derived from benchmark software, the popular Zachary karate social network, and neural connectivity matrices derived from the ERN EEG study.

2.6.1 Random Walk Based Clustering Coefficient/Path Length Measures for Binary Benchmark Networks

The proposed Markov clustering coefficient and path length measures are first tested to determine whether they can distinguish between different network topologies, including unweighted random, scale-free (‘small-world’), and lattice graphs. One hundred networks of each topology with order N = 50 were generated using the LFR benchmark generator [32], and the proposed graph measures were applied. The average clustering coefficient and average path length were computed across all nodes for all 100 networks. Figure 2.2 demonstrates that C_random < C_small-world << C_lattice and L_random ≈ L_small-world << L_lattice using the Markov measures. These figures verify that the proposed measures are suitable for defining a modified ‘small-world’ parameter. Similar results were observed for N = 100 and N = 250 and are not reported here. Next, the conventional measures are compared to the Markov measures as applied to the unweighted karate network. Figures 2.3 and 2.4 demonstrate the differences between the measures. The proposed measure provides very different yet accurate information about the network’s clustering properties compared to the conventional measure. For example, nodes 1, 33, and 34 are assigned the largest clustering coefficients, as expected, since they represent the three dojo leaders within the karate club. They were assigned significantly lower values using the conventional measure, whereas nodes 15 and 16, students not participating in leadership roles, were assigned larger values using the conventional measure.
This is due to nodes 15 and 16 forming all possible triplets among their neighbors. However, this obscures the importance of nodes 1, 33, and 34, since they have far more direct connections than nodes 15 and 16 (a problem previously discussed in section 2.2). Therefore, the proposed clustering coefficient metric is better suited for evaluating which nodes promote centrality within the community structure of the network, i.e. act as important central hubs. In this case, the dojo leaders of the club were accurately identified, since it is their responsibility to bring students together. Similarly with path length, the proposed measure assigns the nodes corresponding to dojo leaders the lowest path lengths. When these nodes were chosen as absorption states in the Markov chain, very few state transitions were required to link them with all other nodes in the network. Therefore, nodes 1, 33, and 34 have very robust connections and would be less likely to be disconnected from the network in case of failure among node pairs. Node 12 possesses the largest path length, indicating that it is at higher risk of being disconnected from the network. This student does not have social ties with anyone in the class except the dojo leader. Therefore, if dojo leader 1 fails to adequately provide attention to student 12, the student may very well drop out of the club. The path length information attained from the conventional measure is nearly uniform across nodes and does not accurately portray the differences in nodal robustness; it is therefore less discriminative.

2.6.2 Weighted Graph Measures on Benchmark Data

The performance of the proposed weighted graph measures was verified by applying them to the weighted version of Zachary’s Karate club and comparing the results to those obtained through existing weighted measures, in particular Onnela’s weighted clustering coefficient and Stam’s characteristic path length.
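Onnela's weighted clustering coefficient, used here as the comparison baseline, can be computed directly from the connectivity matrix. The sketch below follows the published definition (geometric mean of triangle weights, scaled by the network's maximum weight); the variable names are ours.

```python
import numpy as np

def onnela_clustering(W):
    """Onnela et al. weighted clustering coefficient:
    C_i = (1 / (k_i (k_i - 1))) * sum_{j,h} (w'_ij * w'_ih * w'_jh)^(1/3),
    where w' are the weights divided by the largest weight in the network."""
    Wh = W / W.max()                       # scale weights to [0, 1]
    cube = np.cbrt(Wh)
    num = np.diagonal(cube @ cube @ cube)  # sum of geometric-mean triangles at each node
    k = (W > 0).sum(axis=1)                # node degree (nonzero weights)
    denom = k * (k - 1)
    return np.where(denom > 0, num / np.maximum(denom, 1), 0.0)
```

For an unweighted graph (all weights equal) this reduces to the conventional binary clustering coefficient, which is a useful sanity check.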
The first analysis was performed using Onnela’s and the proposed weighted clustering coefficients (Figure 2.5). Compared with the results of Onnela’s measure, the proposed weighted clustering coefficient indicates an increased clustering coefficient for the central nodes 1, 33, and 34 and a decreased clustering coefficient for the outer boundary nodes 15 and 16, who are only regular members of the club. The relationships maintained by the senior members are important for forming the two large communities in the karate club. This indication of network topology and the identification of the central nodes are not available through Onnela’s measure. The difference between Onnela’s and the proposed weighted clustering coefficients is reflected by a low correlation coefficient of 0.3. The weighted path length described by Stam et al. and the proposed measure are next applied to the karate network. Based on Figure 2.6, the two measures are closely related for most nodes, indicated by a correlation coefficient of 0.7. However, there are some differences between the two measures. For example, based on the proposed measure node 17 has the largest path length, whereas for Stam’s measure nodes 10, 12, 18 and 19 have the largest path lengths. The proposed measure yields the highest path length for node 17 since that node is not directly connected to any of the central nodes, i.e. nodes 1, 33 and 34, whereas nodes 10, 12, 18 and 19 are all directly connected to a central node. Since the proposed measure computes the path length by considering all possible connections between a particular node and the rest of the network, it is a better indicator of nodes that are more isolated from the rest of the network. In general, most nodes in the karate network were assigned increased path lengths since all shortest paths, which may be composed of more than one edge, were considered.
Therefore, the proposed path length measure presents a much more complete representation of network connectivity. Also, the minimum of the proposed weighted path length measure is 1, similar to the conventional measure for binary networks. Unlike Stam’s measure, this permits a weighted small-world measure to be computed according to the same principles that define the conventional small-world measure for binary networks.

2.6.3 Weighted Graph Measures on ERN EEG Data

2.6.3.1 Bivariate Phase-Synchrony Measures

The RID-TFPS measure was applied to trials from each electrode pair in the 62-channel montage, separately for each subject and condition (i.e. error and correct, where correct trials were randomly selected to match the number of error trials). The average phase synchrony in the theta frequency band (4-7 Hz) and the ERN time window (25-75 ms) was recorded into separate 62 x 62 connectivity matrices, one for each subject (90) and condition (2), producing 180 total connectivity matrices. Figure 2.7 illustrates the significant phase synchrony differences between error and correct responses based on a Wilcoxon rank sum test (p<.001). From this figure, it is clear that error trials produce stronger connections among a number of sites than correct trials, with a dense representation around FCz (consistent with mPFC). Importantly, the bivariate TFPS effects (error > correct) reported by Cavanagh et al. [21] are replicated in the current data (F5-FCz, t=2.26, p<.026; F6-FCz, t=3.84, p<.001). However, this plot makes it clear that there are many more significant electrode pairs, and that these two a priori pairs do not fully represent the complexity of the functional connectivity.
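The assembly of the per-subject connectivity matrices just described can be sketched as below. The array layout of the time-frequency phase-synchrony data and all variable names are assumptions made for illustration; only the theta band (4-7 Hz) and ERN window (25-75 ms) follow the text.

```python
import numpy as np

def connectivity_matrix(tfps, freqs, times, band=(4, 7), window=(0.025, 0.075)):
    """Average a time-frequency phase-synchrony array, assumed to have
    shape (n_elec, n_elec, n_freqs, n_times), over the theta band and the
    ERN window to obtain one n_elec x n_elec connectivity matrix."""
    fmask = (freqs >= band[0]) & (freqs <= band[1])
    tmask = (times >= window[0]) & (times <= window[1])
    M = tfps[:, :, fmask][:, :, :, tmask].mean(axis=(2, 3))
    return (M + M.T) / 2  # enforce symmetry of the pairwise measure
```

Repeating this per subject and condition would yield the 180 matrices (90 subjects x 2 conditions) analyzed in the text.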
Thus, although the distribution of pairwise error-correct phase-synchrony differences is useful for determining which electrodes are most phase-synchronous during the ERN, it does not serve to characterize the network behavior corresponding to cognitive control, or how such networks reorganize during error trials.

2.6.3.2 Graph Measures

To assess control-related network characteristics using the graph theoretic framework, the proposed weighted clustering coefficient and weighted path length measures were applied to all of the connectivity matrices. The weighted graph measures were evaluated for each electrode across participants for the error and correct conditions. This resulted in two 62 x 90 matrices (one per condition) for each measure: the clustering coefficient, the path length, and the overall small-world measure. Significant error-correct differences (Wilcoxon p<.01) for the proposed measures are presented in Figure 2.8. The clustering coefficient was significantly greater during error responses than correct responses in the frontal region, centered on site FCz, as hypothesized, suggesting increased community formation or functional segregation for error responses relative to correct. Results also indicate that path length was shorter during error responses relative to correct responses, supporting the idea that connectivity was increased among the observed network’s sub-communities. Both measures are indicative of increased functional organization among the neuronal regions in the frontocentral region of the brain for error responses, consistent with involvement of the mPFC. Like its constituent clustering and path length parameters, the proposed weighted small-world measure evidenced significant error-correct differences in the same frontocentral regions. Together, these results indicate a significant increase in network efficiency and functional connectivity during error trials relative to correct.
We also compared the performance of the proposed measures with previously published weighted graph measures on the same ERN data set. Figure 2.8 shows the topographic distributions of the error-correct differences and the associated significance of these differences for both sets of measures. Both the existing and proposed measures show significant differences in the small-world index in the mPFC and lPFC areas, with the significance being higher for the proposed small-world index. Similarly, the path length and clustering coefficients show significant differences in similar areas for both the proposed and existing measures, with the proposed measures yielding increased significance. These results indicate that the proposed weighted graph measures capture information about the network topology similar to that of the existing measures while being more sensitive to the differences in network topology involved in error relative to correct responses.

2.6.3.3 Relationship between Graph and Bivariate Measures

Next, we assessed whether the significant graph theoretic measures accounted for the bivariate F5-FCz and F6-FCz relationships. This is important for verifying that these new results are directly relevant to the previously published effects of Cavanagh and colleagues [21]. First, correlations between the three graph theoretic and two bivariate measures were computed (Small World, Clustering Coefficient, and Path Length correlated with the FCz-F5 and FCz-F6 bivariate TFPS measures; presented in Table 2.2). For both the error and correct conditions independently, the correlations were high. This verified that a majority of the variance in the bivariate measures was indeed represented in the graph theoretic measures. The correlations for the error-correct difference were also significant, but were qualitatively smaller than those for the conditions independently.
Thus, we next evaluated whether the significant bivariate error-correct differences for FCz-F5 and FCz-F6 could be explained by the graph theoretic measures. This was accomplished by taking the variance unique to the bivariate measures (i.e. the residual variance in each bivariate measure after removing variance shared with the overall small-world measure) and testing for remaining error-correct differences. This residual variance was not significantly different from zero (both one-sample ts < 1), verifying that the small-world measure accounted for the previously reported bivariate error-correct differences. Next, additional analyses were conducted to assess whether one of the constituent small-world measures (i.e. the clustering coefficient or path length measure) was specifically or uniquely responsible for the relationship between the small-world and bivariate measures. To accomplish this, regressions were conducted using each of the significant bivariate error-correct differences in turn as the dependent variable, with the associated clustering coefficient and path length as independent variables. Results indicated that the clustering coefficient carried some unique variance beyond what was shared with path length, for both FCz-F5 phase synchrony (R=.60, F(1,89)=23.94, p<.001; Clustering Coefficient, t(89)=3.72, p<.001; Path Length, t<1, ns) and FCz-F6 phase synchrony (R=.52, F(1,89)=16.19, p<.001; Clustering Coefficient, t=4.49, p<.001; Path Length, t(89)=1.92, p<.058). Table 2.2: Correlations between multivariate small world and bivariate (FCz-F5/FCz-F6) measures for both error and correct responses individually and the error-correct differences (all correlations significant; p<0.001).
               Small World       Clustering Coefficient   Path Length
               F5-FCz   F6-FCz   F5-FCz   F6-FCz          F5-FCz   F6-FCz
Error          0.91     0.92     0.88     0.93            -0.82    -0.82
Correct        0.92     0.91     0.89     0.92            -0.80    -0.79
Error-Correct  0.63     0.49     0.60     0.49            -0.50    -0.32

2.6.4 Optimal Transformation From Weighted to Binary Matrix: ERN/EEG Data

The proposed method for transforming a weighted matrix into a sparse binary matrix was applied to the EEG ERN data. The 90 phase synchrony matrices were averaged into a single 62 × 62 matrix for each of the correct and error conditions. These two matrices were then input to the transformation algorithm. The graphs in Figures 2.9 and 2.10 represent the optimal binary transformation of the original weighted connectivity matrices. The red node indicates the location of the FCz electrode. The graph representing error responses contains dense connections among the frontal and frontal-central nodes centered at FCz. As hypothesized, stronger interactions are expected between these regions during the ERN. Modules of parietal and central nodes are also formed further along the graph. The graph representing the correct responses similarly contains dense connections between FCz and frontal and central nodes, but FCz is not as centrally located as in the error graph. For comparison, the ‘small-world’ parameter for binary graphs was computed for binary matrices generated by thresholding. Binary graphs corresponding to a range of threshold values, 0 ≤ τ ≤ 1, with a step size of 0.001, were formed. Figure 2.11 reports the ‘small-world’ parameter with respect to network degree (which is dependent on the threshold) for different τ for both error and correct responses. Small-world values computed for disconnected binary graphs are not reported. The small-world measure was also computed for the optimal binary graphs generated for error and correct using the proposed method.
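The thresholding comparison just described can be sketched as follows: binarize at each τ, discard disconnected graphs, and record the average degree. This is an illustrative sketch; the helper names and default step size are ours.

```python
import numpy as np

def is_connected(A):
    """Reachability check on a binary adjacency matrix via DFS from node 0."""
    n = A.shape[0]
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in np.nonzero(A[i])[0]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

def threshold_sweep(W, step=0.001):
    """Binarize W at each tau in [0, 1]; keep only connected graphs and
    record (tau, average degree), mirroring the comparison in Figure 2.11."""
    out = []
    for tau in np.arange(0.0, 1.0 + step, step):
        A = (W >= tau).astype(int)
        np.fill_diagonal(A, 0)
        if is_connected(A):
            out.append((tau, A.sum() / W.shape[0]))
    return out
```

The length of the returned list makes the computational cost of the sweep explicit, in contrast to the single MST-based transformation, which is computed once.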
From inspection of Figure 2.11, the small-world value computed via thresholding is larger for error than for correct for very sparse representations (graphs with average degree approximately between 13 and 20). The small-world value computed for the optimal binary graph transformations is larger for error as well. Furthermore, the average degrees of the optimal binary graphs are low (15 for error and 14 for correct). These results demonstrate that the optimal binary transformation is capable of identifying a binary graph structure with characteristics similar to the graph derived from the ‘best’ threshold value. This approach saves computational time since numerous thresholds do not need to be inspected, eliminates the dependence on τ altogether, and maintains connectivity.

2.7 Conclusions

In this chapter, we introduced new graph theoretic measures, the clustering coefficient and path length, for binary and weighted graphs. The proposed measures offer improvements over existing small-world measures. By using graph measures that are directly applicable to weighted networks, the need for thresholding the elements of connectivity matrices has been circumvented and the complete information about the strength of the connections between nodes is taken into account. The proposed measures also improve upon existing weighted graph measures by considering all of the alternative ways information can flow in a network rather than only through a node’s immediate neighbors. This improvement is accomplished by adopting a probabilistic framework in which the edge weights are treated as probabilities of connection. These weighted measures demonstrated the integration and segregation of members of the karate network. The proposed weighted measures were also used to assess the small-world properties of the ERN EEG data. Topographic results indicated that frontal areas contained increased clustering and decreased path length (i.e.
small-worldness) in response to errors, relative to correct responses. This supports the hypothesis that the brain, driven by the anterior cingulate cortex, increases organization across the frontal brain regions during error processing. The usefulness of these weighted measures inspired the need to improve upon the conventional measures for binary graphs. The challenge was to associate distinct weights with edges across a binary network. This was addressed by modeling the adjacency matrix as a Markov chain. Edge weights were assigned based on the probability of a random walk moving between nodes, where the probability was computed as the reciprocal of the node’s degree. Both the clustering coefficient and path length defined for binary graphs in this manner were shown to be more indicative of a node’s hubness and an edge’s robustness, as demonstrated on simulated binary networks. Finally, in the case of weighted to binary graph transformation, the need for arbitrary thresholding was circumvented through the introduction of an optimal weighted to binary transformation algorithm. The algorithm transformed a fully connected weighted graph into a sparse binary graph while simultaneously preserving connectivity, small-worldness, and a degree sequence whose distribution was modeled as a power law. This transformation algorithm was applied to the average phase synchrony matrices pertaining to the ERN EEG study. Two different structures representing the brain network were generated, revealing an increased presence of hub nodes in both the frontal and frontal-central regions during error responses.

Figure 2.2: Markov small-world measures averaged over 100 benchmark networks of order N = 50. (a) The proposed binary clustering coefficients of the simulated small-world networks are much greater than those of random networks, i.e. γ >> 1; (b) the proposed binary path lengths of the simulated small-world networks are similar to those of random networks, i.e. λ ≈ 1.

Figure 2.3: Markov and conventional clustering coefficients applied to the binary karate network.

Figure 2.4: Markov and conventional path lengths applied to the binary karate network.

Figure 2.5: Nodal weighted clustering coefficients computed using Onnela’s clustering coefficient and the proposed clustering coefficient for Zachary’s Karate club network.

Figure 2.6: Nodal weighted path lengths computed using Stam’s path length and the proposed path length for Zachary’s Karate club network.

Figure 2.7: Connectivity plot based on the RID-TFPS measure for electrode pairs whose phase synchrony is significantly larger for error responses compared to correct responses (Wilcoxon rank sum, 0.0001 ≤ p ≤ 0.001).
Figure 2.8: Topomap of the differences in the nodal weighted clustering coefficient, path length and small world parameter for error and correct responses. These differences are computed for both the previously published and proposed measures. Topographic significance plots are provided for each measure.

Figure 2.9: Optimal binary graph generated from the weighted connectivity matrices for error responses. Hubs of the network are indicated in color and are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.10: Optimal binary graph generated from the weighted connectivity matrices for correct responses. Hubs of the network are indicated in color and are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.11: Comparison between small-world calculations using the conventional thresholding method and the proposed optimal binary transformation approach.
Chapter 3
Local Feature Extraction from Weighted Graphs

3.1 Introduction

Chapter 2 demonstrated how complex networks may be modeled as graphs composed of vertices and edges and may be characterized as ‘small world’ based on graph theoretic measures such as the clustering coefficient and path length. The ‘small world’ measure, however, is as much a mathematical abstraction as the graph it describes [97]. The small-world measure and other existing graph measures, e.g. girth, diameter, and degree, fall short in providing the kind of functional interpretation of the network that is familiar in research fields such as signal processing and information theory. This makes it difficult to interpret the physical significance of conventional graph measures describing non-abstract systems such as biological networks. The goal of this chapter is to complement current graph measures with non-abstract measures possessing physical meanings supported by well-established theory from other fields of research. Furthermore, these new measures can serve to identify local discriminative features of the graph. Feature extraction is widely explored and is an important tool in signal processing, machine learning and pattern recognition [98, 99, 100]. Feature extraction usually maps the high-dimensional input data onto a lower-dimensional feature space which reflects the inherent structure of the original data [101]. In unsupervised learning, these discovered features are the most representative of the original data and can be used to identify similarities and differences between data sets or to detect anomalies within the data.
Adapting existing feature extraction methods from signal processing to graphs is difficult because graphs, particularly those representing real-world networks or systems, are semi-structured and non-Euclidean [102]. Regardless, there have been various proposed measures and algorithms for extracting features from graphs using well-known methods from different fields of research. For example, in a study of elastic graph matching for facial expression recognition, Zafeiriou et al. [103] implemented a kernel-based technique for discriminant feature extraction in graphs. Fei et al. [104] adapted data mining techniques to graphs by demonstrating that the frequency of subgraphs can be used as a feature extraction method for graph classification. Miller et al. [105] demonstrated anomaly detection in subgraphs through spectral decomposition of the modularity matrix and, using a test statistic, determined the presence of noise in a graph. Ortega et al. [106] proposed a lifting wavelet transform on sensor networks for de-noising applications. It is evident from these published measures that local graph feature extraction methods based on signal processing concepts would provide a more detailed and multiscale view of network organization. A close examination could reveal the hubness of individual nodes, the roles of subgraphs in maintaining connectivity, the compressibility of sparsely connected graphs, or the existence of noise or anomalies. The most common local feature extraction method applied to graphs is centrality, which can be used to describe the roles individual edges and vertices serve in maintaining connectivity [107]. However, the current centrality measures such as betweenness or closeness centrality do not have any physical meaning in signal processing. In this chapter, established measures from signal processing and information theory, energy and entropy, are extended to binary and weighted graphs.
These measures are designed to detect anomalous subgraphs in the network as well as classify the hubness of nodes. The proposed definitions of graph energy and graph entropy are evaluated on simulated and real-world networks, including ERN data, and compared to the performance of recently published measures.

3.2 Existing Feature Measures for Graphs

In this section, a review of some of the most common feature measures for graphs is presented. These measures have been used in the literature to compare graphs with each other, understand network structure, and evaluate the flow of information across the network.

3.2.1 Hub Centrality

A hub is defined as a node within a network which takes on the role of maintaining connectivity, sometimes also known as an influential node. ‘Hubness’ of a node can be quantified by betweenness, closeness, eigenvector, and degree centrality [108]. Betweenness centrality [109], the most commonly utilized centrality measure, quantifies the fraction of shortest paths within the network that travel through node v_i:

C_B(i) = Σ_{m ≠ i ≠ n ∈ V} σ_mn(i) / σ_mn    (3.1)

where σ_mn is the number of shortest paths between nodes v_m and v_n and σ_mn(i) is the number of those shortest paths which travel through node v_i. Closeness centrality [110] is the reciprocal of the total geodesic distance between node v_i and all other nodes:

C_C(i) = 1 / Σ_{j ∈ V\{i}} D(i, j)    (3.2)

where D is the distance matrix whose pair-wise entries are the minimum number of edges required to form a path between v_i and v_j. Degree centrality [110] quantifies a node’s importance based on the number of direct connections:

C_D(i) = k_i / (N − 1).    (3.3)

Finally, eigenvector centrality, introduced by Bonacich [111], quantifies a node’s importance based on the principal eigenvector of the adjacency matrix:

C_E = (1/λ) A C_E    (3.4)

where λ is the dominant eigenvalue of adjacency matrix A and C_E is the eigenvector associated with λ, such that C_E(i) is the centrality score for v_i.
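Three of the four centrality measures above translate directly into code; a hedged sketch for a connected binary adjacency matrix is given below (betweenness is omitted, since eq. (3.1) requires shortest-path counting, e.g. via Brandes' algorithm). The power iteration runs on A + I, which shares A's eigenvectors but converges even on bipartite graphs.

```python
import numpy as np
from collections import deque

def degree_centrality(A):
    """Eq. (3.3): C_D(i) = k_i / (N - 1)."""
    n = A.shape[0]
    return A.sum(axis=1) / (n - 1)

def closeness_centrality(A):
    """Eq. (3.2): C_C(i) = 1 / sum_j D(i, j), with D obtained by
    breadth-first search (a connected graph is assumed)."""
    n = A.shape[0]
    D = np.zeros((n, n))
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in np.nonzero(A[u])[0]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            D[s, v] = d
    return 1.0 / D.sum(axis=1)

def eigenvector_centrality(A, iters=200):
    """Eq. (3.4): power iteration for the principal eigenvector,
    shifted by the identity (A + I) to guarantee convergence."""
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x + x
        x = x / np.linalg.norm(x)
    return x
```

On a star graph, for instance, all three measures agree that the center is the hub, which matches the intuition behind eq. (3.3)-(3.4).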
Among these four centrality measures, eigenvector centrality is the most reliable indicator of true graph connectivity [111] and is similar to a commonly used signal processing method, principal component analysis (PCA) [112]. Bonacich argued that a centrality measure based on degree, e.g. closeness, betweenness, and degree centrality, weights every edge connection in the network equally, while the eigenvector weights edge connections according to their true centralities. Furthermore, eigenvector centrality is a weighted sum of not only direct connections, as is the case for degree centrality, but indirect connections of every length, such that it accounts for the entire underlying connection pattern across the network.

3.2.2 Provincial and Connector Hubs

Another method for evaluating the centrality of a node is to classify it as either a provincial hub, connector hub, or non-hub [113] according to its role in the community structure. A dense group of highly connected nodes in a network is referred to as a cluster. A node is considered provincial if it is primarily responsible for linking nodes together within a cluster. A connector hub is a node which links together multiple clusters and is located along the inter-edges of two or more clusters. A non-hub does not demonstrate any important leadership role in maintaining network connectivity and may be an outer node of a cluster. Node categorization is performed by computing a within-cluster score, z_i, and a participation coefficient, P_i. Nodes with similar roles are expected to have similar relative within-cluster connectivity. A cluster set is defined as C = {c_1, c_2, ..., c_s}, where c_j ⊂ V and s is the number of clusters in the community structure. If Φ(i) = {c_j | v_i ∈ c_j}, then

z_i = (κ_i − κ̄_Φ(i)) / σ_κ,Φ(i)    (3.5)

where κ_i is the number of edges shared between v_i and nodes within its cluster, i.e.
intra-cluster edges for node v_i, \bar{\kappa}_{\Phi(i)} is the average of \kappa_i over all the nodes in \Phi(i), and \sigma_{\kappa_{\Phi(i)}} is the standard deviation of \kappa_i over those nodes. The within-cluster degree z-score measures how well-connected node v_i is to the other nodes in its cluster. Roles are also determined by connections between clusters. For example, two nodes with the same z-score will play different roles if one of them is connected to several nodes in other clusters while the other is not. The participation coefficient P_i of node v_i is

P_i = 1 - \sum_{j=1}^{s} \left( \frac{\kappa_{i c_j}}{k_i} \right)^2    (3.6)

where \kappa_{i c_j} is the number of links of v_i to nodes in cluster c_j and k_i is the total degree of node v_i. The participation coefficient of a node is therefore close to 1 if its links are uniformly distributed among all the clusters and 0 if all its links are within its own cluster. Guimerà et al. [113] describe a method to classify a node as a provincial hub, connector hub, or non-hub based on the value of the node's within-cluster score and participation coefficient. For example, a node with a within-cluster score greater than 2 and a participation coefficient less than 0.3 is categorized as a provincial hub. However, the problem with the classification technique of [113] is that a priori knowledge about cluster membership is required. Therefore, a reliable clustering algorithm is first necessary to identify clusters and their nodal memberships.

3.3 Graph Energy

In this section, current definitions of graph energy are presented. This is followed by the introduction of a new graph energy metric based on the Laplacian matrix and an algorithm for detecting local and global structural anomalies in a graph.

3.3.1 Existing Measures of Graph Energy

Energy is a well-defined metric in signal processing for quantifying the norm of a finite signal. To measure the stability and reliability of connectivity in a graph, a global energy measure was first defined by Gutman et
al. [114] as

E(G) = \sum_{i=1}^{n} |\lambda_i|    (3.7)

where \lambda_1, \lambda_2, ..., \lambda_n are the eigenvalues of the adjacency matrix. If a graph is not connected, its energy is the sum of the energies of its connected components; therefore, isolated vertices do not contribute to or affect the total energy. Energy is affected, however, by the number of edges in the graph. Day et al. [115] report that graph energy changes if edges are removed from or added to the graph. Therefore, if a node which is linked to the network by at least one edge is removed, then the energy will change. Another measure of energy, Hückel energy, was introduced by Coulson [116] to determine the energies of molecular orbitals in hydrocarbon systems. The vertices of the graph associated with a given molecule are in one-to-one correspondence with the carbon atoms of the hydrocarbon system. Hückel energy, defined as

HE(G) = \begin{cases} 2 \sum_{i=1}^{r} \lambda_i & \text{if } n = 2r \\ 2 \sum_{i=1}^{r} \lambda_i + \lambda_{r+1} & \text{if } n = 2r + 1, \end{cases}    (3.8)

ensures that the total π-electron energy of a conjugated hydrocarbon is simply the energy of the corresponding molecular graph. Balakrishnan [117] argues that Hückel energy provides a better approximation of the total π-electron energy than Gutman's definition. This is attributed to using only a subset of the leading eigenvalues of the adjacency matrix. This approach can be thought of as revealing the internal structure of the data such that the variance in the data is best explained [118]. Coulson's approach is also supported by Allefeld et al. [119], who demonstrated that the leading eigenvalues correspond more closely to the strength of local network connectivity than the trailing eigenvalues. The information revealed by Hückel energy, however, is limited, since the spectral information of the adjacency matrix has been shown to be less indicative of structural content than that of the Laplacian matrix [120]. Furthermore, these measures of graph energy do not evaluate the local properties of graphs.
Individual nodes or subgraphs may be responsible for the energy content of a graph due to their local connections. In the next section, an extension of Hückel energy, Laplacian Hückel Energy, is presented along with an algorithm for computing local energy indices.

3.3.2 Laplacian Hückel Graph Energy

It has been shown that the Laplacian matrix retains greater structural information of a graph than the adjacency matrix [120, 121], where the Laplacian matrix of a graph is defined as

L_{ij} = \begin{cases} k_i & \text{if } i = j \\ -1 & \text{if } A_{ij} = 1 \\ 0 & \text{otherwise.} \end{cases}    (3.9)

Therefore, an energy measure computed from the eigenspectrum, (\mu_1 \geq \mu_2 \geq ... \geq \mu_n), of the Laplacian matrix is proposed. The Laplacian matrix is positive semi-definite and its eigenvalues are all nonnegative. While the adjacency matrix eigenvalues have a zero-mean distribution, i.e. \frac{1}{n} \sum_{i=1}^{n} \lambda_i = 0, the Laplacian matrix does not share this property. Computing the trace of the Laplacian matrix as

tr(L) = \sum_{i} L_{ii}    (3.10)
      = \sum_{i}^{n} \sum_{j}^{n} A_{ij}    (3.11)
      = \sum_{i}^{n} k_i    (3.12)

and noting that \sum_{i=1}^{n} \mu_i = tr(L), it can be shown that \frac{1}{n} \sum_{i=1}^{n} \mu_i = \frac{1}{n} \sum_{i=1}^{n} k_i, i.e. the average degree of the network is equivalent to the average of the Laplacian eigenvalues. The average is subtracted from each of the Laplacian eigenvalues to render a zero-mean distribution such that \bar{\mu}_i = \mu_i - \frac{1}{n} \sum_{i=1}^{n} k_i. Now, \sum_{i=1}^{n} \bar{\mu}_i = \sum_{i=1}^{n} \lambda_i = 0, and the adjacency matrix eigenvalues in the original definition of Hückel energy are substituted with the new eigenvalues of the Laplacian matrix to obtain the Laplacian Hückel Energy measure defined as

LHE(G) = \begin{cases} 2 \sum_{i=1}^{r} |\bar{\mu}_i| & \text{if } n = 2r \\ 2 \sum_{i=1}^{r} |\bar{\mu}_i| + |\bar{\mu}_{r+1}| & \text{if } n = 2r + 1. \end{cases}    (3.13)

This new definition of graph energy incorporates the advantage that most structural information is contained in the first few largest eigenvalues of the Laplacian matrix.
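The computation of Eq. (3.13) can be sketched in a few lines of Python; the function name and the 3-node path example are illustrative, not part of the original formulation.

```python
import numpy as np

def laplacian_hueckel_energy(A):
    """Laplacian Hueckel Energy, Eq. (3.13): the Hueckel half-spectrum
    sum applied to the mean-centred eigenvalues of the graph Laplacian."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    mu = np.sort(np.linalg.eigvalsh(L))[::-1]  # mu_1 >= ... >= mu_n
    mu = mu - mu.mean()                        # zero-mean spectrum (mu-bar)
    r = n // 2
    e = 2 * np.abs(mu[:r]).sum()
    if n % 2 == 1:                             # odd order: add the middle term
        e += np.abs(mu[r])
    return e

# Example: the 3-node path graph, whose Laplacian spectrum is {0, 1, 3}.
A = np.zeros((3, 3))
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1
lhe = laplacian_hueckel_energy(A)
```

For the path graph the centred spectrum is {5/3, -1/3, -4/3}, giving LHE = 2·(5/3) + 1/3 = 11/3.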
It is also expected to be more discriminative of energy content in the graph than Gutman's and the conventional Hückel energy, since the spectrum of the Laplacian matrix is superior at representing structural connectivity to that of the adjacency matrix.

3.3.3 Localized Laplacian Hückel Graph Energy

In this section, the lack of signal processing inspired local graph indices is addressed by extending the proposed Laplacian Hückel Energy to evaluate the energy content of nodes and their neighborhoods across the network. This index can be used to quantify the importance of a node based on the magnitude of its contribution to total graph energy. The h-hop neighborhood of a node v_i is defined as H(i) = {v_j \in V | D(i, j) \leq h}, where h \in {1, 2, 3, ..., d}. Therefore, V_h(i) = v_i \cup H(i) and E_h(i) = {e_{k\ell} \in E | v_k, v_\ell \in V_h(i)}, where G_h(i) = (V_h(i), E_h(i)) is the subgraph of G containing node v_i and its h-hop neighbors. To define the nodal energy contribution of a node, the subgraph G_h(i) formed by the node of interest v_i and all of its h-hop neighbors is considered. This limits the energy dependencies of G to the internal structure of G_h(i) only. Removing a node or subgraph and computing the effect on the original network's energy is not an accurate representation of energy contribution, since the original network's total energy is not the sum of its subgraphs' individual energies. Therefore, instead of removing the nodes and edges of G_h(i), randomizations of G_h(i)'s internal structure are considered, i.e. changing the position of edges, to determine how a perturbation in the local neighborhood of node v_i affects the energy of the whole graph G. The number of nodes, n = |V_h(i)|, and edges, m = |E_h(i)|, are preserved to maintain a constant Cost, where Cost = \frac{2|E|}{n(n-1)} and 0 \leq Cost \leq 1, and constant graph order across randomizations. As Cost approaches 1, fewer permutations of the subgraph are possible.
When Cost = 1, G_h(i) is complete and therefore cannot be permuted. In this case the node is deemed redundant and unnecessary for maintaining the structural integrity of the network, i.e. it contributes zero energy to G since removing it will not change connectivity in the graph. A permuted G_h(i) will be identified as PG_h(i). If the total possible number of edges in the subgraph is \alpha = \frac{n(n-1)}{2}, then the number of possible PG_h(i)'s is

|PG_h(i)| = \binom{\alpha}{m} - 1.    (3.14)

Therefore, a new subgraph, PG_h^r(i), is formed from each permutation, where the integer r \in {1, 2, 3, ..., |PG_h(i)|}. The Laplacian Hückel Energy of the new graph G^r, where G^r = G \setminus G_h(i) \cup PG_h^r(i), is evaluated for each randomization of G_h(i), and an average change in energy is computed. The node associated with the greatest change in energy is deemed the most important node for maintaining structural integrity, as long as the following constraints are adhered to during the randomization of the subgraphs:

• Every G^r generated must be a connected graph, i.e. no isolated vertices or multiple components.
• G_h(i) is connected, but the permuted version, PG_h(i), does not need to be connected.

The importance of node v_i, associated with the subgraph G_h(i), in preserving the structural connectivity of G is proportional to |LHE(G) − LHE_rand|, i.e. the larger the difference between the two energies within a particular h-hop neighborhood, the more important the node is to the network.

Algorithm 3 Nodal contribution to graph energy
1: Input: Graph G = (V, E).
2: Output: d × n matrix β containing nodal energy contributions.
3: n = |V|.
4: Compute the n × n distance matrix, D, of G.
5: for i = 1 to n do
6:   for h = 1 to d do
7:     Find G_h(i).
8:     m = |E_h(i)|, n' = |V_h(i)|.
9:     α = n'(n' − 1)/2.
10:    for r = 1 to |PG_h(i)| do
11:      G^r = G \setminus G_h(i) \cup PG_h^r(i).
12:      energy(r) = LHE(G^r).
13:    end for
14:    LHE_rand = mean(energy).
15:    β_h(i) = |LHE(G) − LHE_rand|.
16:  end for
17: end for
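The neighborhood-extraction step of Algorithm 3 (finding G_h(i)) amounts to a breadth-first search out to h hops. A minimal sketch, assuming an unweighted symmetric adjacency matrix (the helper name is ours):

```python
import numpy as np

def h_hop_subgraph(A, i, h):
    """Return the index set V_h(i) = {v_j : D(i, j) <= h} and the
    induced subgraph G_h(i), via breadth-first search from node i."""
    n = A.shape[0]
    dist = np.full(n, np.inf)
    dist[i] = 0
    frontier = [i]
    for d in range(1, h + 1):
        nxt = []
        for u in frontier:
            for v in np.flatnonzero(A[u]):
                if dist[v] == np.inf:   # first visit fixes the hop distance
                    dist[v] = d
                    nxt.append(v)
        frontier = nxt
    nodes = np.flatnonzero(dist <= h)
    return nodes, A[np.ix_(nodes, nodes)]

# 5-node path 0-1-2-3-4: the 1-hop neighborhood of node 2 is {1, 2, 3}.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
nodes, sub = h_hop_subgraph(A, 2, 1)
```

The induced subgraph on {1, 2, 3} keeps only the two path edges 1-2 and 2-3, as required by the definition of E_h(i).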
The h-hop neighborhood determines whether the energy change is due to the center node's local or more extensive/global relationships.

3.4 Graph Entropy

Entropy is another classic measure commonly used in signal processing and information theory. In this section, current definitions of graph entropy and graph entropy rate are presented. This is followed by the introduction of a new graph entropy rate measure based on modeling the adjacency matrix as a Markov process. This new definition of graph entropy is utilized in an algorithm designed to quantify the complexity of graph subregions as well as to detect local anomalies.

3.4.1 Existing Measures of Graph Entropy

In information theory [122], entropy is a measure of the uncertainty associated with a random variable. The original definition of graph entropy was introduced by Körner [123] and quantifies a lower bound on the complexity of graphs. It is defined as H(G) = min_{X,Y} I(X; Y), where the minimum is taken over all pairs of random variables X, Y such that X is a uniformly selected vertex in G and Y is an independent set of vertices containing X. Similar to the basic properties of entropy in information theory, if G contains no edges then H(G) = 0, and if G is complete, such that every node is connected to all of the other nodes, then H(G) = log_2(n), where n is the cardinality of V, the number of nodes. Implementation of Körner's definition of entropy, however, is NP-hard, which makes its evaluation for real-world networks impractical. A more recent definition for computing entropy in real-world networks was proposed by Dehmer et al. [124]. Dehmer proposed to quantify the complexity of a graph using Shannon's definition of entropy such that the probability distribution is computed from a metrical property or graph invariant, e.g. node degree or closeness centrality.
Entropy is defined as

I_f(G) = -\sum_{i=1}^{|V|} \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \log \left( \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \right)    (3.15)

where f is an arbitrary information functional. The set S_j(v_i, G) = {v \in V | d(v_i, v) = j, j \geq 1} is called the j-sphere of v_i regarding G, similar to the previously defined h-hop neighborhood, and d(v_i, v) denotes the shortest distance between nodes v_i and v. For example, the information functional computed from node degree is

f^V(v_i) = \alpha^{c_1 |S_1(v_i, G)| + c_2 |S_2(v_i, G)| + ... + c_\rho |S_\rho(v_i, G)|}    (3.16)

where c_k \geq 0 is an arbitrary real positive coefficient, 1 \leq k \leq \rho, \rho is the maximum j-sphere level evaluated, and \alpha > 0. Dehmer's method is a good approach to incorporating local connectivity information to compute overall graph entropy. However, there are three noticeable disadvantages of Dehmer's method. First, the many variables in the information functional and the sensitivity to the type of graph metric incorporated have a biased impact on entropy: entropy will be significantly different across varying \alpha and graph metrics. Second, Dehmer's solution evaluates random realizations of c_k which have no actual significance or relationship to the structure of the network. This also makes it difficult to compare across multiple graphs if the same realizations of c_k are not used. Finally, as discussed in Section 3.2, node degree is a weak metric for understanding the structural connectivity of the network. The entropy measure is dependent on this graph metric, and therefore further research is necessary to determine which is the 'best' graph metric for representing the graph. Another local entropy measure was proposed by Shetty et al. [125] for identifying the local hubs in the social network of the Enron corporation. Shetty computed a probability distribution based on the frequency of certain e-mail correspondences within the company.
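As a minimal sketch of these Shannon-type graph entropies, Eq. (3.15) can be evaluated with node degree standing in for the information functional (the simplest, if weakest, choice discussed above; the function name and K4 example are ours):

```python
import numpy as np

def graph_entropy(A, f):
    """Shannon-type graph entropy, Eq. (3.15): normalise an information
    functional f over the vertices into a probability distribution."""
    p = f / f.sum()
    return -np.sum(p * np.log2(p))

# Illustrative functional: node degree. For the complete graph K4 the
# degrees are uniform, so the entropy attains its maximum log2(4) = 2 bits.
A = np.ones((4, 4)) - np.eye(4)
H = graph_entropy(A, A.sum(axis=1))
```

This reproduces the limiting behaviour noted earlier: a uniform functional yields the maximal entropy log_2(n).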
The novelty is not in the definition of entropy, since Shannon's definition is implemented directly, but rather in the procedure for evaluating the contribution of individual nodes to the entropy of the entire network. Nodes which have the greatest effect on graph entropy when they are removed are thought to be important to network connectivity. Next, measures for computing the entropy rate of a graph are discussed. The first was presented in an application by Demetrius et al. [126], who implemented the Kolmogorov-Sinai entropy rate by evaluating random walks along the graph. This measure of graph entropy rate, formalized by Burda et al. [127], computes the entropy rate of a graph as log \lambda_1, where \lambda_1 is the largest eigenvalue of the adjacency matrix. Sinatra et al. [128] recently applied Burda's definition to quantify the maximum level of information diffusion across a network. Burda's measure, however, is plagued by a problem similar to that observed in Dehmer's measure: a dependence on node degree, which is the weakest measure of network connectivity [129]. Finally, Szpankowski et al. [130] proved that the structural entropy rate of an Erdös-Rényi random graph is

\binom{n}{2} h(p) - n \log n + O(n)

where h(p) = −p log p − (1 − p) log(1 − p) is the entropy of a conventional memoryless binary source. Szpankowski is the first to present a graph coding scheme based on a measure of entropy rate. However, entropy is computed under the assumption that all graphs of order n and size m are equiprobable and that edges are formed with uniform probability. This is not the case in real-world networks, which typically follow a power-law degree distribution. Burda's entropy rate is generalizable to any graph but is not suitable for evaluating graph compressibility, since a graph cannot be uniquely reconstructed solely from knowledge of its degree sequence. Szpankowski offers a novel compression method for graphs; however, it is developed from examining random graphs only.
This motivates the need for a new entropy measure which considers the unique structure across an array of graph topologies, not only random graphs, and which can also lead to an appropriate and practical graph coding algorithm.

3.4.2 Proposed Graph Entropy Rate

In this section, a new measure of graph entropy rate is proposed for unweighted (binary) and weighted graphs by modeling their adjacency and connectivity matrices as a Markov process. The adjacency matrix is directly related to the structure of a graph, and therefore, intuitively, the spatial coherence of elements in the matrix will reflect the pairwise relationships among nodes in the network. For example, nodes in a random graph do not have an organized relationship, and it is therefore expected that this random relationship will be reflected in the lack of spatial coherence among elements in the adjacency matrix. To examine the variation in adjacency matrix element dependencies, similar to pixel dependencies in an image [131, 132, 133], it is proposed that the adjacency matrix be modeled as a Markov process. The entropy rate of a Markov process X_1, X_2, ..., X_r is formally defined as

H(\chi) = -\sum_{ij} \pi_i P_{ij} \log_2 P_{ij}    (3.17)

where P = [P_{ij}] is the transition probability matrix such that i, j = 1, 2, ..., n, P_{ij} = Pr(X_{r+1} = j | X_r = i), and \pi is the stationary distribution. In this section, we first introduce a new definition of the entropy rate of a binary graph, H(G), by transforming the binary adjacency matrix into a stochastic process X_1, X_2, ..., X_r, for r = 1, 2, ..., through permutation and scanning operations. If X is an mth-order Markov process, we define the general form of the entropy rate for a graph as

H(G; m) = \min_{Z, \psi} H_X(Z A Z^T; \psi)    (3.18)

where Z is a permutation matrix applied to the adjacency matrix, A, and \psi is the particular scanning function of the upper triangular matrix elements. These two variables define our transition matrix, P.
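Equation (3.17) itself is straightforward to compute once the transition matrix is in hand. A minimal sketch (the two-state example is illustrative and not drawn from the graph construction above):

```python
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate of a stationary Markov chain, Eq. (3.17):
    H = -sum_ij pi_i P_ij log2 P_ij, where pi is the stationary
    distribution (left eigenvector of P for eigenvalue 1)."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi = pi / pi.sum()
    # 0 log 0 is taken as 0 for absent transitions.
    logP = np.log2(P, out=np.zeros_like(P), where=P > 0)
    return -np.sum(pi[:, None] * P * logP)

# Binary chain that flips state with probability 0.1: its entropy rate
# equals the binary entropy h(0.1) of the flip probability.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
H = markov_entropy_rate(P)
```

For the fully random chain (all transition probabilities 1/2) this returns 1 bit per symbol, the maximum for a binary source, which matches the intuition that a fully incoherent adjacency scan is maximally complex.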
Equation 3.18 can be extended to evaluate the entropy characteristics of a weighted graph such that

H^w(G; m) = \min_{Z, \psi, C_Q} H_X(Z W Z^T; \psi; C_Q)    (3.19)

where C_Q is a codebook generated by a quantizer.

3.4.2.1 Matrix Bandwidth Minimization

For a graph, G, composed of n nodes, there exist n! permutations, i.e. isomorphisms, of G. The structural information of the graph is retained across all isomorphisms of G. Therefore, to maximize the spatial coherence of the adjacency matrix elements, similar elements in the adjacency matrix are grouped together by implementing an optimization technique, matrix bandwidth minimization [134]. A labeling f of G assigns the integers {1, 2, ..., n} to the vertices of G [135]. Let f(v) be the label of vertex v, where each node has a different label. The bandwidth minimization function generates a permutation matrix, Z, an n-order binary orthogonal square matrix in which each row and column contains precisely a single 1 and zeros everywhere else. The permuted versions of the binary adjacency and weighted connectivity matrices are obtained, respectively, by Z A Z^T and Z W Z^T. For binary graphs, the bandwidth of a node v_i, B_f(v_i), is the maximum of the differences between f(v_i) and the labels of its adjacent nodes. That is,

B_f(v_i) = \max_{v_j \in N(v_i)} |f(v_i) - f(v_j)|    (3.20)

where N(v_i) = {v_j : A_{ij} = 1} for j = {1, 2, ..., n}. The bandwidth of a binary graph G with respect to a labeling f is then B_f(G) = \max_{v_i \in V} B_f(v_i). For weighted graphs, the weighted graph bandwidth minimization (WGBM) [134] cost function identifies a permutation of the connectivity matrix which places similar weights close together. A labeling f of a weighted graph G assigns positive integers to the vertices of G. Let f(v) \in {1, 2, ..., n} be the label of vertex v, where each node has a different label. The objective of the WGBM is to minimize the maximum weighted label difference of an edge, i.e.
the edge weight times the label difference. That is,

B_f(v_i) = \max_{v_j \in N(v_i)} \{ w_{ij} \times |f(v_i) - f(v_j)| \}.    (3.21)

For a fully connected graph, e.g. pairwise phase synchrony matrices, N(v_i) = V \setminus v_i, where V is the set of all vertices in the graph. The bandwidth of a weighted graph G with respect to a labeling f is then B_f(G) = \max_{v_i \in V} B_f(v_i). Z is generated by reordering the rows and columns of an identity matrix according to the optimal labeling of the unweighted or weighted bandwidth minimization cost function, which can be based on node degree [136], clustering algorithms [137], and centrality measures [110]. For example, DBSCAN is a density-based clustering algorithm proposed by Ester et al. [138] which applies a Euclidean distance measure and identifies a number of clusters, for binary and weighted graphs, starting from the estimated density distribution of the corresponding nodes. The advantage of this method is that a priori knowledge of the number of expected clusters is not required. Van Dongen [139] introduced a fast and scalable unsupervised clustering algorithm, the Markov Cluster Algorithm (MCL), for binary and weighted graphs based on the simulation of random flow in graphs. Ding et al. [140] introduced a spectral bandwidth minimization method (Spectral BWM) which is a cross between clustering and node centrality measures. The permutation matrix Z is formed by assigning labels to nodes based on their co-occurrence in a cluster and can maximize spatial coherence in the adjacency matrix. However, bandwidth minimization cost functions based on these three clustering algorithms are primarily suitable for highly modular networks only. An alternative and much more generalizable cost function is the Cuthill-McKee algorithm [141]. It is a variant of the breadth-first search of graphs and assigns labels based on the degree relationship of adjacent nodes.
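As a sketch of the effect of such a reordering (assuming SciPy is available; its csgraph module ships a reverse Cuthill-McKee implementation), a deliberately badly labelled ring graph has its bandwidth reduced:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Matrix bandwidth: max |i - j| over nonzero entries, i.e. Eq. (3.20)
    evaluated for the labelling implied by the current node order."""
    i, j = np.nonzero(A)
    return int(np.abs(i - j).max())

# Ring graph: the edge between nodes 0 and n-1 forces a bandwidth of n-1
# under the natural labelling.
n = 8
A = np.zeros((n, n))
for k in range(n):
    A[k, (k + 1) % n] = A[(k + 1) % n, k] = 1

perm = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
A_perm = A[np.ix_(perm, perm)]  # equivalent to Z A Z^T for this labelling
```

The BFS-based relabelling places adjacent ring nodes near each other, so the permuted matrix concentrates its nonzeros close to the diagonal.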
An extension of the Cuthill-McKee algorithm to weighted graphs is proposed by Lin et al. [142]. The weighted algorithm has a complexity of O(n^3), where n is the number of vertices in the graph, and is generalizable to all weighted graphs. Walker et al. [136] introduced Approximate Minimum Degree (AMD) for the purpose of reordering a symmetric sparse matrix prior to numerical factorization. The degree may be the binary or the weighted definition. Again, Z is generated such that it satisfies the reordering conditions set by Cuthill-McKee or AMD reordering. In general, Cuthill-McKee is far superior to AMD in minimizing the matrix bandwidth [143].

3.4.2.2 Scanning Function of the Permuted Matrix

The upper triangular elements of the permuted binary adjacency matrix are mapped to a 1-dimensional vector of dimension 1 × k, where k = \frac{n(n-1)}{2} and n = |V|. In a binary sequence for a first-order Markov process, there exist 4 possible state transitions: 0 → 0, 0 → 1, 1 → 0, and 1 → 1. The states 0 → 0 and 1 → 1 are non-transitional sub-sequences, and 0 → 1 and 1 → 0 are transitional sub-sequences. The permutation matrix, Z, plays a major role in the entropy rate measure, H(G), since the entropy rate bounds are dictated by the frequency of occurrence of these sub-sequences, i.e. the magnitude of spatial coherence among adjacent elements. Entropy will be minimized if similar states are positioned next to each other in the permuted connectivity matrix, since the homogeneity of these sub-sequences is increased. A scanning function is implemented to extract this sequence from the permuted adjacency matrix. The scanning function, \psi, preserves the spatial coherence of the permuted 2-dimensional adjacency matrix when it is mapped to the 1-dimensional vector. This approach facilitates convergence to the true lower bound on the entropy rate. In image processing, there are various methods for scanning the pixel values into a vector.
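Two such scanning functions, an alternating ('boustrophedon') raster scan and a Morton Z-order scan, can be sketched as follows; this is a minimal illustration for small power-of-two matrices, and the helper names are ours.

```python
import numpy as np

def raster_scan(M):
    """Alternating-horizontal raster scan: even rows left-to-right,
    odd rows right-to-left."""
    rows = [M[r] if r % 2 == 0 else M[r][::-1] for r in range(M.shape[0])]
    return np.concatenate(rows)

def morton_scan(M):
    """Morton Z-order scan: visit cells sorted by the value obtained by
    interleaving the bits of their (row, col) coordinates."""
    n = M.shape[0]
    def z(r, c):
        v = 0
        for b in range(n.bit_length()):
            v |= ((r >> b) & 1) << (2 * b + 1)  # row bits at odd positions
            v |= ((c >> b) & 1) << (2 * b)      # col bits at even positions
        return v
    cells = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: z(*rc))
    return np.array([M[r, c] for r, c in cells])

M = np.arange(16).reshape(4, 4)
r = raster_scan(M)
m = morton_scan(M)
```

On the 4 × 4 example, the Morton scan visits the matrix one 2 × 2 block at a time, which is why it preserves block locality better than a plain raster scan.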
Four of the most common approaches are: raster scanning [144], Cantor diagonal scanning [145], Morton Z-order scanning [146], and Hilbert curve scanning [147]. Raster scanning extracts elements of the matrix in either an alternating horizontal pattern or an alternating vertical pattern. Cantor diagonal scanning extracts elements along the diagonals of the matrix. Both raster and Cantor diagonal scanning preserve the locality of element pairs in the matrix but fail to maximally preserve locality among blocks of elements. Morton Z-order scanning is a function which maps multidimensional data to a one-dimensional vector while maximally preserving the spatial locality of the data points. The z-value of a point in multiple dimensions is calculated by interleaving the binary representations of its coordinate values. The scanning pattern resembles a group of 'Z's linked to each other. Morton Z-order scanning is better suited than raster scanning for grouping regions of similar pixels in large blocks of an image [148]. The Hilbert curve is a continuous fractal space-filling curve whose discrete approximation is useful because it provides a mapping between 1D and 2D space that preserves locality. If (x, y) are the coordinates of a point within the unit square and d is the distance along the curve when it reaches that point, then points that have nearby d values will also have nearby (x, y) values. Kamata et al. [145] report that the neighborhood preservation of the Hilbert curve scan is better than that of the raster, Cantor diagonal, and Morton Z-order scanning functions.

3.4.2.3 Quantization of the Weighted Connectivity Matrix

Prior to matrix permutation and scanning of the weighted connectivity matrix, quantization is applied across the connectivity matrix in order to establish a finite alphabet of states, Q. Matrix elements are assigned the value pertaining to their level of quantization.
A non-uniform quantizer is best suited for determining these levels, since the distribution of values in the connectivity matrix of a real-world network is typically non-uniform, e.g. most real-world networks have a power-law distribution of their weighted node degrees. Therefore, non-uniform quantization is more generalizable to connectivity matrices representing a variety of graph topologies than uniform quantization. For a desired number of quantization levels, the Lloyd-Max algorithm [149] offers an optimal solution for minimizing distortion, typically quantified by the mean-square quantization error.

3.4.3 Localized Graph Entropy Rate

In this section, the proposed graph entropy rate measure is extended to evaluate the local complexity of a network. The connectivity topology centered around individual nodes is expected to have an impact on overall network complexity. An algorithm is presented here which quantifies the magnitude of importance of individual nodes based on their contribution to network entropy. The proposed graph entropy measure is first applied to the original adjacency or connectivity matrix of graph G. Next, a node v_i is selected from the network and its corresponding row and column are removed from the matrix, i.e. all edges incident upon v_i in G are deleted. Once again, the proposed graph entropy measure is applied to the transformed graph. The process is formally described in Algorithm 4. This is repeated for each node to obtain n entropy difference measures. Nodes that contribute significantly to network complexity are expected to be associated with a large difference in entropy. The algorithm is applicable to both binary and weighted graphs under the condition that the appropriate graph entropy measure, H(G) or H^w(G), is implemented.

Algorithm 4 Local Graph Entropy
Input: An n-order graph G = (V, E).
Output: h_i ∀ v_i ∈ V.
1: h ← H(G).
2: for i = 1 to n do
3:   G' = G \setminus v_i.
4:   h' ← H(G').
5:   h_i = h − h'.
6: end for
7: return h_i ∀ v_i ∈ V

3.5 Results

In this section, the proposed Laplacian Hückel Energy and Markov Graph Entropy Rate measures are applied to real-world networks. The performance of these measures, as well as of the local indices based on them, is compared to published measures. The real-world networks evaluated are the Zachary karate network and the brain networks represented by the phase synchrony matrices obtained from the ERN EEG study.

3.5.1 Comparison Between Energy Measures

In this section, the local energy index using the proposed Laplacian Hückel Energy is compared to the local indices of Gutman's and the conventional Hückel energies. These three measures are applied to the Zachary karate network (Figures 3.1, 3.2, and 3.3). The Laplacian Hückel Energy was more discriminative than the other two energy measures. In the 1-hop neighborhood, only the Laplacian Hückel Energy associated very high energy contributions with nodes 1 and 33, the dojo leaders of the two karate groups in the network. In the 2-hop neighborhood, the Laplacian Hückel Energy was much more discriminative than the other two measures and identified nodes 3, 9, 14, 20, 30, and 31 as the nodes which contributed most to network energy. These particular nodes are responsible for long-range connectivity since they represent the karate members associated with both groups in the karate network. The set of nodes identified as contributing most to energy by the Gutman and Hückel energy measures includes those identified by the Laplacian Hückel Energy but also nodes whose long-range connections are not necessary for maintaining connectivity across the network. These represent the karate members who are primarily associated with one karate group but not both. In the 3-hop neighborhood, the Gutman, Hückel, and Laplacian Hückel energies associate high energy contributions with most of the nodes.
Only the Laplacian Hückel Energy indicates node 17 as less important for maintaining connectivity, since it is the most distant node and its removal would not deteriorate overall connectivity. Therefore, the Laplacian Hückel Energy is a much more discriminative local energy measure than the Gutman and Hückel energies in the karate network. Next, the nodal Laplacian Hückel energies are compared to the betweenness centrality (B_i) and conventional clustering coefficient (C_i) of nodes in the karate network (Table 3.1). The conventional clustering coefficient fails to recognize the importance of nodes 1, 33, and 34, assigning them very low clustering coefficient values. This is due to the clustering coefficient only considering the interdependent relationships among nodes within the 1-hop neighborhood rather than the relationships a node has with the entire graph and its effect on the global energy of the network. The ranking of betweenness centrality agrees with the proposed measure for the nine nodes shown in Table 3.1. However, it does not indicate whether a node is important at a local or a more global scale.

Table 3.1: Highest energy contributing nodes in the karate network compared with their corresponding clustering coefficient and betweenness centrality

Node | h-hop | Energy Contribution | C_i  | B_i
34   | 1     | 16.7                | 0.11 | 0.69
1    | 1     | 11.2                | 0.15 | 1.00
33   | 1     | 10.3                | 0.19 | 0.33
14   | 2     | 26.3                | 0.60 | 0.10
3    | 2     | 24.6                | 0.24 | 0.32
9    | 2     | 21.1                | 0.50 | 0.12
20   | 3     | 22.5                | 0.33 | 0.07
32   | 3     | 21.7                | 0.20 | 0.31
2    | 3     | 21.4                | 0.33 | 0.12

3.5.2 Laplacian-Hückel Energy of a Biological Network: EEG ERN Data

The local energy measure was applied to the ERN data set to determine which regions of the brain contributed most to network energy. An average matrix was first computed for both Error and Correct data across all subjects. Next, a threshold was applied to each average matrix to derive a sparsely connected weighted graph such that the average degree was k = 10.
This value of k was chosen such that varying h-hop levels could be evaluated. Evaluating energy for the average matrix without thresholding is equivalent to only one neighborhood containing all nodes, i.e. the 1-hop neighborhood, which did not provide useful localized energy information. A value of k = 10 allowed 5 different neighborhood levels to be observed, thus demonstrating how increases in energy contribution propagated across neighborhoods of increasing size or scale. Energy was reevaluated for graph mappings of the ERN data with average degrees of k = 40, 30, 20 to determine the extent to which k influenced energy in the biological network. Increasing k resulted in a larger set of h-hop neighborhoods, i.e. beyond h = 5, yet the variations in nodal energy were significant for only the h = 1, h = 2, and h = 3 neighborhoods. Therefore, the biological significance of the graph was evaluated with k = 10.

Figure 3.1: Local energy indices ((a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy) for the n = 34 karate club network: 1-hop energy analysis.

Figure 3.2: Local energy indices ((a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy) for the n = 34 karate club network: 2-hop energy analysis.

Figure 3.3: Local energy indices ((a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy) for the n = 34 karate club network: 3-hop energy analysis.

Figures 3.4 and 3.5 reveal that the brain regions most influential to network energy contribution are located around the frontal and frontal/central regions of the brain. Previous work indicates that the mPFC is expected to be more influential to communication across the brain during an error response than a correct response. This is shown to be true in the 1-hop neighborhood of Error − Correct energy contribution differences (Figure 3.4). FCz, FC2, and FC4 demonstrated significantly (p ≤ 0.01, Wilcoxon rank sum) greater energy contributions to the network during error responses compared to correct responses.
[Figure 3.4: 1-hop neighborhood Energy differences (Error − Correct).]

The 2-hop neighborhood demonstrated significantly greater energy during error responses (p ≤ 0.04, Wilcoxon Rank Sum) in the lateral regions of the brain, particularly for the right lateral nodes. This may be indicative of nodes in the lPFC communicating with distant neural regions due to activation of the ACC. Increased activation is also observed in the occipital region but is expected since the data was collected from a visual based test. Similar anomalies are observed in the 3-hop, 4-hop, and 5-hop neighborhoods (not shown) with decreasing significance.

3.5.3 Comparison Between Local Graph Entropy Measures

In this section, the local entropy index using the proposed Markov Graph Entropy Rate is compared to the local indices using Burda’s and Szpankowski’s entropy rate definitions. First, a scale-free network composed of n = 60 nodes was generated. A subregion of the network, consisting of 17 nodes, was perturbed by randomly re-wiring the connections according to the Erdös-Rényi model with probability p = 0.5. This perturbed region contains the subset of nodes indicated by dark black node borders in Figure 3.6. The three local indices were computed for both the simulated network (Figure 3.6) and Zachary’s karate network (Figure 3.7). All entropy measures are normalized by the maximum node entropy.

The graphs in Figure 3.6 demonstrate that the proposed Markov Graph Entropy Rate measure is capable of identifying the nodes in the network contributing most to network connectivity. The node members of the perturbed region are all assigned magnitudes greater than the rest of the nodes in the network. Szpankowski’s entropy generates results opposite that of the proposed entropy measure.
According to the results of Szpankowski’s entropy, the subregions of the network with a more organized structure are deemed more complex than the subregion with random connectivity. Burda’s entropy rate fails to identify the perturbed sub-network and assigns a high entropy rate to only one of the organized sub-networks. It identifies neither the perturbed sub-network, as the proposed measure does, nor all the organized sub-networks, as Szpankowski’s does. The Markov Graph Entropy Rate is therefore more discriminative than Burda’s entropy and, unlike Szpankowski’s, correctly identifies the perturbed sub-network.

[Figure 3.5: 2-hop neighborhood Energy differences (Error − Correct).]

The graphs in Figure 3.7 reveal very similar evaluation of node contribution to network complexity. Nodes 1 and 34 are identified by all three methods as being important, and in general the nodes centered in the network contributed to overall complexity. The similarities in nodal entropy contributions between all three methods may be due to the small order of the network. However, this result indicates there is consistency between the proposed entropy measure and published measures.

3.5.4 Entropy Rate Across Binary Graph Topologies

Graph entropy[1] was computed for five types of artificial graphs: a star graph, a scale-free Barabási-Albert graph, an Erdös-Rényi random graph, three modular graphs with 10, 15, and 25 clusters, and a lattice graph. Fifty instances of each graph were generated for increasing order, n ∈ {50, 250, 500, 1000}. Entropy was computed using both the proposed measure (Table 3.2) and that proposed by Burda in [127] (Table 3.3).

[Figure 3.6: Comparisons between nodal graph entropy measures for identifying anomalous regions in a simulated network with n = 60 nodes. The network is composed of four complete graphs and a random graph performing as a perturbation in the network. (a) Markov Graph Entropy Rate (b) Szpankowski (c) Burda]
Burda’s measure is computed as H(G) = log2 λ1, where λ1 is the largest eigenvalue of the adjacency matrix. Graph entropy was also computed for the following published undirected scale-free networks (Table 3.4):

1. Dolphin network: a social network of frequent associations between 62 dolphins in New Zealand [34].
2. Les Miserables: a co-appearance network of 77 characters in the novel Les Miserables [35].
3. The North American airport network: a flight connection network composed of 456 airports [36].

[Figure 3.7: Comparisons between nodal graph entropy rate measures for identifying important regions in the karate club network with n = 34 nodes. (a) Markov Graph Entropy Rate (b) Szpankowski (c) Burda]

[1] In this section, we employ our definition of the entropy rate to evaluate the graph entropy of different classes of popular graph topologies. Due to the finite nature of the graphs used in our simulations, effectively we are evaluating an idealistic measure of the graph entropy and not the true entropy rate, which theoretically requires infinitely large graphs. Our objective is still the same though, and that is to gain an insight about the entropy bound associated with different types of graph topologies.

Graph entropy was computed assuming a first-order Markov process, matrix bandwidth minimization was performed using the reverse Cuthill-McKee (RCM) algorithm [141], and the horizontal raster scanning function was implemented. Results indicated random graphs had an entropy of H ≈ 1 for all n using the proposed entropy rate. Burda’s entropy rate revealed random graphs had the largest entropy rate across all the graph topologies investigated here. This is expected since random graphs do not have an organized structure in which an underlying pattern can be identified. For the rest of these results, the entropy values for all other topologies are normalized with respect to the entropy of the random networks to allow a comparison between the proposed and Burda’s entropy rate measures.
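A simplified sketch of this scan-then-model pipeline, a horizontal raster scan of the upper triangle followed by a first-order Markov entropy-rate estimate (the bandwidth-minimization step is omitted and the function names are illustrative):

```python
import math
from collections import Counter

def raster_upper(A):
    """Horizontal raster scan of the adjacency-matrix upper triangle
    into a one-dimensional binary sequence."""
    n = len(A)
    return [A[i][j] for i in range(n) for j in range(i + 1, n)]

def markov_entropy_rate(seq):
    """Estimate H(X_t | X_{t-1}) in bits/symbol from the transition
    counts of the scanned sequence (first-order Markov model)."""
    pairs = Counter(zip(seq, seq[1:]))
    total = sum(pairs.values())
    H = 0.0
    for (a, b), c in pairs.items():
        row = sum(v for (x, _), v in pairs.items() if x == a)  # count of symbol a
        H -= (c / total) * math.log2(c / row)
    return H

# A deterministic, highly structured sequence compresses to 0 bits/symbol
print(markov_entropy_rate([0, 1] * 50))    # → 0.0
# A sequence with maximally uncertain transitions approaches 1 bit/symbol
print(markov_entropy_rate([0, 0, 1, 1] * 25))
```

The same estimator applied to `raster_upper` of a permuted adjacency matrix gives the kind of per-topology entropy reported in Table 3.2, up to the missing bandwidth-minimization and normalization steps.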
Table 3.2 reveals that both the lattice and star graphs possess trivial structures which result in a low entropy. However, the more complex structures underlying the modular, scale-free, and random graphs are expected to have a much larger entropy than star and lattice graphs. In contrast to the structure of random graphs, scale-free and modular graphs possess underlying organized structures. This permits the Cuthill-McKee algorithm to group together more 1s, i.e. generating more non-transitional sub-sequences composed of 1s. The scanned sequence results in a significantly lower entropy than a random network. Across all scale-free graphs generated, H ≈ 0.5 for low n and monotonically decreased with increased n. This is caused by the occurrence of longer non-transitional 0 sub-sequences generated by the growing network sparsity as n increases. Entropy across the modular networks is low because the individual clusters generate sub-regions of increased spatial coherence between neighboring elements in the permuted adjacency matrix. In general, the organized structures of scale-free and modular graphs are appropriately described by the lower measures of entropy. The application of our proposed measure on the real-world networks (Table 3.4) reveals they all possess entropy values similar to the simulated scale-free and modular networks. This is expected since these real-world networks are classified in the literature as scale-free, their degree distributions following power-law distributions. It is important to note that the expected complexity of each of the observed graphs is reflected by their respective entropy quantified by the proposed definition. The definition of graph entropy rate by Burda assigns large entropy values to both lattice and random graphs.
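Burda’s H(G) = log2 λ1 from Table 3.3 can be computed without a full eigensolver by a shifted power iteration; a sketch (the shift guards against oscillation on bipartite graphs such as the star, and the function name is illustrative):

```python
import math

def burda_entropy(A, iters=500):
    """Burda-style graph entropy H(G) = log2(lambda_1), with lambda_1 the
    largest adjacency eigenvalue, via power iteration on A + d*I."""
    n = len(A)
    d = max(sum(row) for row in A)  # shift by max degree keeps iterates positive
    x = [1.0] * n
    est = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) + d * x[i] for i in range(n)]
        est = max(y)                # max-norm eigenvalue estimate of A + d*I
        x = [v / est for v in y]
    return math.log2(est - d)       # undo the shift

# Star graph on n = 10 nodes: lambda_1 = sqrt(n - 1) = 3, so H = log2(3)
n = 10
A = [[0] * n for _ in range(n)]
for i in range(1, n):
    A[0][i] = A[i][0] = 1
print(burda_entropy(A))  # ≈ 1.585
```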
This measure is therefore misleading by indicating that a simple lattice is as complex as a random network. According to Sinatra et al. [128], information diffusion across a lattice and a random network will be similar since most nodes will have similar degrees. A network with a fairly homogeneous distribution of degrees will permit all paths in the network to have equal probability of being traversed by a random walk, i.e. Burda’s entropy rate will increase. This is a consequence of Burda’s measure being highly dependent upon node degree. Analysis of the real-world network entropy (Table 3.4) reveals the dolphin network can be characterized as scale-free according to Burda. However, Burda’s measure characterizes the Airport and Les Miserables networks more like that of a random network. The increased order and sparsity of these two networks may be the cause for this misclassification by Burda’s measure.

Table 3.2: Entropy computed for various graph models using the proposed measure. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.

                               n = 50        n = 250       n = 500       n = 1000
Random (Erdös-Rényi)           0.99 ± 10⁻³   1.00 ± 10⁻³   1.00 ± 10⁻³   1.00 ± 10⁻³
Modular (10 clusters)          0.46 ± 0.03   0.51 ± 0.01   0.52 ± 0.01   0.51 ± 0.01
Modular (15 clusters)          0.38 ± 0.01   0.41 ± 0.01   0.42 ± 0.01   0.43 ± 0.01
Modular (25 clusters)          0.27 ± 0.01   0.33 ± 0.01   0.34 ± 0.01   0.33 ± 0.01
Scale-Free (Barabási-Albert)   0.50 ± 10⁻³   0.46 ± 10⁻³   0.39 ± 10⁻³   0.22 ± 10⁻³
Lattice                        0.22 ± 0.01   0.06 ± 10⁻³   0.03 ± 10⁻³   0.01 ± 10⁻³
Star                           0.23          0.07          0.04          0.02

Table 3.3: Graph entropy computed for various graph models using Burda’s definition. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.
                               n = 50        n = 250       n = 500       n = 1000
Random (Erdös-Rényi)           1 ± 0.04      1 ± 10⁻³      1 ± 0.01      1 ± 10⁻³
Modular (10 clusters)          0.54 ± 0.06   0.70 ± 10⁻³   0.75 ± 10⁻³   0.78 ± 10⁻³
Modular (15 clusters)          0.45 ± 0.06   0.64 ± 0.01   0.69 ± 10⁻³   0.73 ± 10⁻³
Modular (25 clusters)          0.34 ± 0.07   0.57 ± 0.02   0.63 ± 10⁻³   0.67 ± 10⁻³
Scale-Free (Barabási-Albert)   0.53 ± 0.17   0.39 ± 0.13   0.38 ± 0.08   0.35 ± 0.08
Lattice                        0.79 ± 10⁻³   0.86 ± 10⁻³   0.88 ± 10⁻³   0.90 ± 10⁻³
Star                           0.61          0.57          0.56          0.56

Table 3.4: Entropy rates for three real scale-free networks using the proposed measure and Burda’s measure. All measures are normalized by the entropy rate of random networks with equivalent order.

                          Proposed Measure   Burda’s Measure
Dolphin (n = 62)          0.41               0.57
Les Miserables (n = 77)   0.38               0.68
Airport (n = 456)         0.65               0.99

After computing the entropy rate of each network, the upper triangular sequences were coded using the Lempel-Ziv compression algorithm [150]. Tables 3.5 and 3.6 report the achieved compression of each network. Using the universal Lempel-Ziv coding algorithm, the elements within the adjacency matrix upper triangle, with the exception of random graphs, could on average be represented with less than 1 bit. The lowest bits/symbol occurred for lattice, star, and scale-free graphs, particularly for higher graph orders. Across all n, the variations in compression are in agreement with the variations in entropy computed by our proposed measure. In general, the bits/symbol achieved by Lempel-Ziv, as compared with the computed entropy of both simulated and real-world networks, indicate that Lempel-Ziv may not be the best coding algorithm for this application. Our entropy measures indicate that the graphs can be compressed further.

Table 3.5: Bits/symbol achieved with the Lempel-Ziv algorithm.
                               n = 50   n = 250   n = 500   n = 1000
Star                           0.50     0.12      0.07      0.04
Scale-Free (Barabási-Albert)   0.52     0.14      0.08      0.04
Random (Erdös-Rényi)           1.51     1.33      1.30      1.29
Modular (10 clusters)          0.94     0.68      0.68      0.67
Modular (15 clusters)          0.69     0.58      0.58      0.56
Modular (25 clusters)          0.64     0.51      0.49      0.48
Lattice                        0.51     0.14      0.07      0.03

Table 3.6: Bits/symbol achieved using the Lempel-Ziv algorithm for three real scale-free networks.

Network                     Bits/Symbol
Dolphin (n = 62)            0.74
Les Miserables (n = 77)     0.63
Airport Network (n = 456)   0.73

3.5.5 Entropy with Respect to Permutation and Scanning Function

In this section, the proposed entropy rate was evaluated across graph topologies, each with 100 node members, by varying the combinations of permutation cost functions and scanning functions. For the analysis, 50 simulations of each graph topology were generated. The plots in Figures 3.8 through 3.10 reflect the average entropy rate for each possible permutation/scanning function pair. The permutation functions tested include MCL, Spectral BWM, DBscan, Cuthill-Mckee, and AMD. The scanning functions include the horizontal raster, Cantor, diagonal, vertical raster, Morton, and Hilbert scans.

According to the plots associated with random and star graphs, the entropy rate remained fairly similar across all possible combinations. Therefore, the choice of permutation/scanning functions does not make a large difference in computing the entropy rate for either topology. As for modular and scale-free graphs, the entropy rate remained midway between that of a random and a star graph. However, the entropy rate was minimized for modular graphs using MCL as the permutation function combined with a vertical raster scanning function. MCL performed best at forming denser regions of 1s in the adjacency matrix. The scale-free graph entropy rate was minimized using the Cuthill-Mckee algorithm as the permutation function. There was no significant difference in entropy from using any of the scanning functions.
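Of the scanning functions compared above, the Morton (Z-order) scan is particularly easy to sketch: interleaving the bits of the row and column indices yields a one-dimensional ordering that preserves two-dimensional locality (function names are illustrative):

```python
def morton_key(i, j, bits=16):
    """Z-order (Morton) key: interleave the bits of row and column indices."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b + 1) | ((j >> b) & 1) << (2 * b)
    return key

def morton_scan(A):
    """Scan matrix entries in Morton order: nearby 2-D entries stay
    nearby in the resulting 1-D sequence."""
    n = len(A)
    idx = sorted(((i, j) for i in range(n) for j in range(n)),
                 key=lambda p: morton_key(*p))
    return [A[i][j] for i, j in idx]

# 4x4 matrix with A[i][j] = 4*i + j: the scan visits quadrant by quadrant
A4 = [[4 * i + j for j in range(4)] for i in range(4)]
print(morton_scan(A4))  # → [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

Because dense blocks of 1s in a permuted adjacency matrix stay contiguous under such a scan, the resulting sequence has longer non-transitional runs and hence a lower estimated entropy rate.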
Interestingly, there was a large gap in entropy for lattice graphs with respect to certain permutation/scanning combinations. Spectral BWM, DBscan, and MCL performed poorly; their results were no better than applying no permutation to the adjacency matrix at all. Lattice graphs are not modular, which may explain why these clustering algorithms did not perform well. The lowest entropy rate measures were computed using either Cuthill-Mckee or AMD as the permutation function. The choice of scanning function for the lattice graph made no significant difference to the entropy rate.

3.5.6 Graph Entropy with Respect to Order of the Markov Process

In this section, we evaluate the effect of the choice of Markov process order on the computed entropy rate. Figure 3.11 reports the relationship between Markov order and graph entropy rate across graph topologies. Entropy was computed using the Cuthill-Mckee algorithm for permutation with a horizontal raster scanning function. The results indicate the entropy rate of random graphs does not significantly change as the assumed order increases. This is expected since random graphs do not have much structure, so increasing the Markov model order has no effect. Modular graph entropy decreased significantly (p ≤ 0.05, Wilcoxon Rank Sum) when transitioning from order 1 to order 2. This change in entropy supports the fact that there will exist isolated subregions, in the permuted adjacency matrix, containing a dense collection of 1s. Since these regions are finite in size, entropy change is restricted to a low Markov order. Scale-free graphs behave similarly to modular graphs, with the exception that entropy decreases significantly (p ≤ 0.05, Wilcoxon Rank Sum) at higher orders. Entropy of lattice graphs does not change significantly across order number. This may be indicative of the lower entropy rate bound being achieved.
[Figure 3.8: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Random networks; b) Modular networks.]

[Figure 3.9: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Scale-Free networks; b) Star networks.]

[Figure 3.10: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes (Lattice networks).]

Finally, the entropy rate for star graphs decreases significantly up to the maximum order evaluated, 8. Star graphs are among the simplest topologies and therefore they are less of a challenge to compress.
The only information required to reconstruct a star graph is the number of nodes in the network and the label of the center node.

[Figure 3.11: Effects of order value on graph entropy measures for various graph topologies.]

3.5.7 Entropy Rate of a Biological Network: EEG ERN data

We are primarily interested in understanding the change in complexity across neural networks involved during cognitive control. The proposed local weighted graph entropy measure was applied to each matrix containing the pairwise phase locking values for error and correct responses across subjects. Each matrix was quantized to Q = 2 to 10 levels. The quantization errors, measured as the mean square error (MSE), were evaluated across all subjects and are reported in Figure 3.12. Rate/Distortion plots are provided to demonstrate the variations in quantization errors when applying the Lloyd-Max algorithm across all phase synchrony matrices. Figure 3.12 reveals low quantization errors across all quantization levels Q evaluated, as well as for levels beyond Q = 10. Variance across subjects, indicated by the error bars, is also small across the number of Markov states. This indicates the phase synchrony connectivity information was not significantly compromised by quantization. Bandwidth minimization on the weighted matrices was performed using the weighted version of the Cuthill-Mckee algorithm, and scanning was performed using the horizontal raster scan. The entropy differences for all 62 electrodes, representing different neural regions, were also computed across subjects. The topographical plots, obtained for quantization levels 2 ≤ Q ≤ 10 and representing the significant difference (Wilcoxon Rank Sum) in local graph entropy between response types, are reported in Figures 3.13 through 3.21.
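The Lloyd-Max step can be illustrated with a simple one-dimensional sketch that alternates nearest-level assignment and centroid updates; this is a generic Lloyd iteration under illustrative names, not the exact implementation applied to the phase synchrony matrices:

```python
def lloyd_max(samples, Q, iters=50):
    """1-D Lloyd-Max quantizer: alternate nearest-level assignment and
    reconstruction-level (cell mean) updates."""
    lo, hi = min(samples), max(samples)
    # initialize Q reconstruction levels uniformly over the sample range
    levels = [lo + (hi - lo) * (q + 0.5) / Q for q in range(Q)]
    for _ in range(iters):
        cells = [[] for _ in range(Q)]
        for s in samples:
            q = min(range(Q), key=lambda k: abs(s - levels[k]))
            cells[q].append(s)
        levels = [sum(c) / len(c) if c else levels[q]
                  for q, c in enumerate(cells)]
    return levels

def quantize(samples, levels):
    """Map each sample to its nearest reconstruction level."""
    return [min(levels, key=lambda l: abs(s - l)) for s in samples]

# Phase-synchrony-like values clustered near 0.2 and 0.8, Q = 2 levels
samples = [0.18, 0.2, 0.22, 0.78, 0.8, 0.82]
levels = lloyd_max(samples, 2)
mse = sum((s - q) ** 2 for s, q in zip(samples, quantize(samples, levels))) / len(samples)
print([round(l, 2) for l in levels], round(mse, 5))  # → [0.2, 0.8] 0.00027
```

The reported rate/distortion behavior corresponds to sweeping Q and recording this MSE per subject.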
From these results, it can be seen that nodes located around the frontal-central region of the brain contributed significantly more to the complexity of the underlying brain network during error responses vs correct responses. As expected, the entropy contributions of nodes located around FCz, in the vicinity of the ACC, were significantly greater (p ≤ 0.02, Wilcoxon Rank Sum) in error responses. Across all realizations of the topographic plots, the difference between Error and Correct responses remained more significant around the frontal central region nodes than other brain regions. Regardless of the number of quantization levels, these significant differences still occurred. Beyond quantization levels of Q = 10, distortion significantly decreased; however, there was no significant difference between the response types in terms of the local entropy measures.

[Figure 3.12: Rate/Distortion curves average and standard deviation across all subjects for each response type: a) error; b) correct.]

[Figure 3.13: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 2 levels.]

3.6 Conclusions

In this chapter, new local feature extraction techniques were introduced for graphs. Well-known signal processing and information theoretic measures were adapted for explaining the underlying structure of a graph as well as for identifying possible anomalies within the network. With applications to the karate network and the ERN data, the localized graph energy measure was shown to identify nodes which contribute most to the structural dynamics of the graph.
The proposed Laplacian Hückel energy was shown to be more discriminative than Gutman’s or the conventional Hückel energy definitions. Furthermore, the proposed energy measure identified the local neural populations of the brain which contributed most to the energy of the brain network. Energy during error responses was significantly greater than that for correct responses, particularly in the mPFC and lPFC regions. The greatest 1-hop neighborhood energy was observed in the mPFC and the greatest 2-hop neighborhood energy was observed in the lPFC. Next, the proposed definition of graph entropy rate outperformed published entropy measures by correctly classifying various graph topologies in terms of their structural complexity. The proposed entropy rate, computed by modeling the adjacency matrix as a Markov process, revealed the node members in the perturbed region of a simulated network. Compared to the entropy rate definitions proposed by Burda and Szpankowski, the proposed measure was more discriminative for determining the magnitude of importance of each node within the network. The entropy measure also revealed local brain regions of increased complexity during the ERN. These areas were primarily situated in the frontal central region of the brain, as expected according to the hypothesis stated in this thesis. Challenges still remain for adapting signal processing and information theoretic measures to graphs. These challenges include computing exact energy and entropy bounds based on the number of nodes, edges, and topology of the network. Szpankowski presented a method for computing the lower bound entropy rate of random graphs, but the next step is to compute the lower bound across all topologies, especially for scale-free networks. Also, a coding method which achieves the entropy rates presented in this thesis would be beneficial for compressing all types of graphs.
The graph compression and reconstruction algorithms should be applicable to both binary and weighted networks.

[Figure 3.14: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 3 levels.]
[Figure 3.15: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 4 levels.]
[Figure 3.16: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 5 levels.]
[Figure 3.17: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 6 levels.]
[Figure 3.18: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 7 levels.]
[Figure 3.19: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 8 levels.]
[Figure 3.20: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 9 levels.]
[Figure 3.21: Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 10 levels.]

Chapter 4

Hierarchical Multi-Subject Community Detection for Weighted Neuronal Networks

4.1 Introduction

One important characteristic of complex networks is modularity. A module is composed of a densely connected set of nodes with sparse connections between modules in the network [151, 152]. Node members of each module, or cluster, share a common trait, function, or other similar characteristic. Identifying this inhomogeneity of connections is important for understanding how network topology and function are related.
Some of the applications of community detection include understanding the functional pathways of metabolic networks [113], discovering topic related webpages on the internet [153], and evaluating the rapid spread of disease throughout populations [41]. It is hypothesized that the modular structure of complex neuronal networks is indicative of robustness [154] and contributes to functionality [113] by compartmentalizing specific functions within certain cortical regions without perturbing the rest of the network [155]. Intra-module associations are thought to describe the segregation of information processing while the inter-module associations testify to the integration of information processing across distant brain regions [4]. This modular structure is therefore a compelling model for emphasizing the information distributing nature of the brain’s structure and function mappings [27, 61]. Community detection is an important topic in the study of complex networks, yet there does not exist a precise formulation for uncovering the "correct" underlying structure; therefore, a vast array of different methodologies for community detection and cluster identification have been proposed throughout the literature [156, 119]. These methodologies are inspired by stochastic theory [157, 158], signal processing [159], and genetic algorithms [160]. The availability of highly complex networks has created the need for fast algorithms which find reasonably good partitions. The primary reason for the existence of numerous community detection methodologies is the lack of a priori knowledge of what a ‘natural’ partitioning of a network really is. How many communities should exist in the network? Are the sizes of the communities equal to each other, or do they differ significantly? For example, should a network composed of 100 vertices split into two groups of 50 or two groups of 25 and 75?
Is all the information provided in the network important, or may some of it be classified as noise or anomalies? Also, should vertices be restricted to membership in one cluster or be allowed to participate in multiple clusters? The list of questions goes on, and many of them may be answered with a priori knowledge of the community structure of the network. However, the availability of this knowledge would negate the need for a clustering algorithm. Therefore, community detection becomes a bootstrapping problem which depends solely upon the original data representing the graph, i.e. the weighted or unweighted connectivity matrix. There are a variety of clustering algorithms which address this problem, including spectral clustering [161], Girvan-Newman [151], and k-means [162]. These and other algorithms found in the literature have been designed for a variety of applications. This chapter primarily addresses community detection issues unique to analyzing the modular structure of the human brain. Identification of functional modules or communities in the brain has been originally addressed using methods like principal component analysis [163] and independent component analysis [164], which put non-physiological constraints on the obtained components such as orthogonality and independence. Recently, methods from spectral analysis of graphs have been used to extract modules [165, 140, 119, 166] by mapping the functional connections to a multi-dimensional subspace defined by a set of eigenvectors. This method, however, does not reveal a hierarchical decomposition of the network. Meunier et al. [167] argue that most information processing systems possess a multi-scale modularity characteristic, i.e. they are hierarchically decomposable into a finite number of modular levels. This property allows a complex system to be more rapidly and robustly assembled since it is embedded in a relatively low-dimensional space.
Therefore, a hierarchical decomposition of a complex network, particularly the human brain, is a natural representation of modular organization. A key challenge in identifying the modular organization of brain networks is determining a common structure across multiple subjects. Current work either focuses on obtaining the community structure for the average connectivity network or on analyzing each subject individually and obtaining a common modular structure using a voting technique. Averaging neglects the variance across subjects, whereas conventional voting methods either require a priori knowledge of cluster labels or require one to first acquire the modular structure of each subject [168]. Voting does not permit convergence to an optimal modular structure since it requires only one iteration of the process. Various studies [169, 170] have proposed alternatives for identifying similarities across multiple subjects by introducing random effect analysis and Bayesian classifiers. With an application specific to module identification across multiple subjects, a possible solution would be to implement a new consensus based approach [74, 171, 172] in which the best modular organization is identified by combining information shared across multiple modular structures with information provided by previous decompositions of the data set. Integrating modular information shared across multiple subjects decreases the uncertainty that node pairs are randomly clustered together, and incorporating information from previous modular structures maintains consistency throughout the hierarchical tree. In this chapter, the most common methods designed for community detection are reviewed and their fundamental differences are highlighted. Furthermore, this chapter addresses challenges pertaining to thresholding weighted graphs, community quality measures for weighted graphs, and cluster analysis across multiple subjects.
The proposed community detection methods are applied to weighted graphs obtained by computing pairwise phase synchrony values from EEG data collected during a study of cognitive control in the brain based on the error-related negativity. In particular, this study focuses on understanding the modular organization of the brain during error processing.

4.2 Background on Community Detection on Graphs

In this section, we provide a review of various community detection measures, the implementation of consensus clustering, and quality measures for quantifying the “goodness” of a discovered community.

4.2.1 Community Detection on Graphs

Numerous methods for identifying community structures within a graph exist throughout the literature [173]. These include spectral and hierarchical approaches. In this section, we briefly review some of the most popular published community detection algorithms.

4.2.1.1 Girvan-Newman

One of the pioneers of community detection in complex networks is M. Newman, who, along with M. Girvan, developed an iteratively divisive hierarchical algorithm referred to as the Girvan-Newman method [151]. The algorithm iteratively removes the edges in a graph that are deemed weakest compared to all existing edges, where the strength of an edge is defined through betweenness centrality. Betweenness centrality is defined as the fraction of shortest paths within the network that travel through each edge, e_ij, such that

    C_B(e_ij) = Σ_{m≠n∈V} σ_mn(e_ij) / σ_mn    (4.1)

where σ_mn(e_ij) is the number of shortest paths between nodes v_m and v_n which contain edge e_ij and σ_mn is the total number of shortest paths between v_m and v_n. The edge assigned the largest betweenness centrality is considered important to maintaining global network connectivity, i.e. inter-connecting edges between clusters will have high betweenness centrality and edges within clusters will have lower betweenness centrality.
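Eq. 4.1 can be evaluated directly on a small graph by enumerating all shortest paths; a brute-force sketch (function names are illustrative; practical implementations use Brandes' algorithm instead):

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(adj, s, t):
    """Enumerate every shortest path from s to t via BFS layering."""
    dist = {s: 0}
    preds = {s: []}
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                preds[w] = [u]
                Q.append(w)
            elif dist[w] == dist[u] + 1:
                preds[w].append(u)  # another shortest predecessor
    def build(v):
        if v == s:
            return [[s]]
        return [p + [v] for u in preds[v] for p in build(u)]
    return build(t) if t in dist else []

def edge_betweenness(adj):
    """C_B(e) per Eq. 4.1: fraction of all-pairs shortest paths using e."""
    CB = {}
    for m, n in combinations(adj, 2):
        paths = all_shortest_paths(adj, m, n)
        for p in paths:
            for e in zip(p, p[1:]):
                e = tuple(sorted(e))
                CB[e] = CB.get(e, 0.0) + 1.0 / len(paths)
    return CB

# Two triangles joined by a single bridge edge (2, 4): the bridge carries
# every inter-cluster shortest path, so it has the largest betweenness.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 4}, 4: {2, 5, 6}, 5: {4, 6}, 6: {4, 5}}
CB = edge_betweenness(adj)
print(max(CB, key=CB.get))  # → (2, 4)
```

Removing that maximal edge is exactly one step of the divisive procedure described next.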
The Girvan-Newman algorithm computes the edge betweenness for all edges and removes the edge with the largest C_B from the graph. The removal process is repeated until the graph is partitioned into a desired number of clusters. The Girvan-Newman algorithm performs well for identifying the boundaries of clusters and is easily implementable. However, it has the disadvantage of running in O(m²n) time on an arbitrary network with m edges and n vertices and is therefore computationally expensive for non-sparse networks. Although it can be applied to weighted graphs, it is designed for unweighted networks since betweenness centrality evaluates shortest paths. Given a fully connected network, the shortest path will always be equal to one and betweenness centrality would have no effect on the results. The algorithm also does not consider overlapping clusters, where vertices may belong to more than one cluster. Finally, connectivity is based on shortest paths rather than all possible paths, thus ignoring much of the available connectivity information, particularly in a weighted network. The problem with overlapping clusters is addressed by an extension of the Girvan-Newman algorithm, the Cluster-Overlap Newman Girvan Algorithm (CONGA), proposed by Gregory et al. [174]. CONGA extends the Girvan-Newman algorithm by incorporating a 'split betweenness' for splitting nodes into various clusters if a node is strongly connected to more than one cluster. CONGA divides the betweenness of every edge by its corresponding weight such that

C_B(e_{ij})_{new} = \frac{C_B(e_{ij})}{W_{ij}}    (4.2)

where W_{ij} is the weight associated with the edge between nodes v_i and v_j. This is followed by the calculation of each node's 'split betweenness'. Each node, v_i, is split into two nodes, v_{i1} and v_{i2}, and an imaginary edge is placed between them. Using Eq. (4.2), the edge betweenness for the imaginary edge is computed, referred to as the 'split betweenness'.
If there exists a node with 'split betweenness' greater than the edge betweenness of all the edges connected to it, then the node is split. Otherwise, the edge with maximum edge betweenness is removed as in the Girvan-Newman algorithm. The process repeats itself until all nodes are isolated or until the desired number of clusters is found. CONGA presents its results as a set of community structures C = {c_1, c_2, c_3, ..., c_k} such that community c_1 contains one cluster, community c_2 contains two clusters, and community c_k contains k clusters. CONGA addresses the problem of overlapping clusters but still depends on shortest paths to define clusters and is not suitable for fully connected weighted graphs.

4.2.1.2 Spectral Clustering

Another commonly used method for graph clustering is spectral clustering, which is based on the eigenvectors and eigenvalues of a matrix describing the graph. The spectral decomposition of a graph can be applied to its adjacency matrix, A, Laplacian matrix, L, normalized Laplacian matrix, 𝓛, and modularity matrix, B. The spectral properties can be used for identifying the number of clusters in the network and identifying memberships within a cluster. Since the matrices that are decomposed are symmetric, they will have an orthonormal basis of eigenvectors, {u_1, u_2, ..., u_n}, with real eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_n such that M u_i = λ_i u_i, ∀i, where M is any symmetric matrix of size n × n. Although spectral clustering can be performed on the adjacency matrix, it is typically performed on the Laplacian of a graph since it provides a better representation of network structure [161]. The Laplacian of a graph is defined as L = K − A, where K is the diagonal matrix containing the degree of each vertex, such that

L_{ij} = \begin{cases} k_i & \text{if } i = j \\ -1 & \text{if } A_{ij} = 1 \\ 0 & \text{otherwise} \end{cases}    (4.3)

The Laplacian matrix is positive semi-definite and therefore its eigenvalues are all non-negative.
The multiplicity of the eigenvalue λ = 0 indicates the number of components in the graph. Provided we are only interested in connected graphs, the multiplicity of λ = 0 is 1. It has been shown that the normalized Laplacian, 𝓛, provides significantly better clustering results than the unnormalized counterpart [161]. 𝓛 is defined as 𝓛 = K^{−1/2} L K^{−1/2} such that

𝓛_{ij} = \begin{cases} 1 & \text{if } i = j \\ -\frac{1}{\sqrt{k_i k_j}} & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}    (4.4)

The normalized Laplacian matrix has eigenvalue properties similar to the un-normalized Laplacian matrix but has the added advantage of having bounded eigenvalues such that 0 ≤ λ ≤ 2, where λ = 2 is indicative of a bipartite component within the graph. It has been shown that the number of clusters within a network is revealed by the largest eigengap [175, 176, 177, 178]. To find the largest eigengap, the eigenvalues are first arranged in descending order, i.e. λ_1 ≥ λ_2 ≥ ... ≥ λ_N, where N is the number of nodes in the network. Each successive pair of eigenvalues forms an eigengap such that the ith eigengap is computed as λ_i − λ_{i+1}. The number of communities is then defined by the location of the largest eigengap, e.g. there are i clusters if the ith gap is the largest eigengap. Another popular approach to community discovery is the use of eigenvectors to project the network into a lower dimensional representation [165, 140, 119, 166]. A common method of utilizing the eigenvectors is to take a subset, {u_1, u_2, ..., u_c} where c < n, of the computed eigenvectors and form an n × c matrix whose columns are the c eigenvectors. The maximum number of clusters, k, which can be detected by any clustering algorithm is then limited to k ≤ c. The elements along the ith row of the matrix represent the projected node vector corresponding to node v_i.
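The eigengap heuristic can be sketched as follows (a numpy toy example; here the descending-order rule is applied, as an illustrative assumption, to the adjacency matrix of a hypothetical network of two 4-node cliques joined by one bridge edge):

```python
import numpy as np

# Hypothetical network: two 4-node cliques joined by a bridge edge.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

# Arrange the eigenvalues in descending order; the ith eigengap is
# lambda_i - lambda_{i+1}.
eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
gaps = eigvals[:-1] - eigvals[1:]

# The largest gap follows the 2nd eigenvalue, indicating 2 clusters.
n_clusters = int(np.argmax(gaps)) + 1
```

For this two-block toy network the largest eigengap sits after the second eigenvalue, so the heuristic recovers the two cliques as clusters.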
One of the most popular methods for evaluating a set of eigenvectors to identify cluster membership is the k-means algorithm [162], which minimizes the objective function

J = \sum_{j=1}^{k} \sum_{i=1}^{n} \| x_i^j - c_j \|^2    (4.5)

where ||x_i^j − c_j||² is a distance measure between a data point x_i^j, i.e. the jth element of the ith eigenvector, and the cluster center c_j. c_j is chosen randomly and designated the seed node. However, one significant problem with the k-means method is the uncertainty about the number and selection of eigenvectors to utilize in the process. Another problem pertains to the selection of c_j, which can affect the resulting clusters. This remains a challenge in many other eigenvector based clustering techniques such as fuzzy k-means, generalized Synchronization Cluster Analysis [119], the Ng-Jordan-Weiss algorithm (NJW) [179], and power iteration clustering (PIC) [180]. An alternative to choosing a subset of eigenvectors of a network is to evaluate only one eigenvector for the purpose of bi-partitioning, i.e. identifying a minimal cut of the graph. This eliminates the problem of searching for the optimal set of eigenvectors. According to Holzrichter et al. [181], the optimal minimal cut of a graph matrix is defined by the eigenvector, u_2, associated with the second smallest eigenvalue, λ_2 (the smallest non-zero eigenvalue for a connected graph), of the Laplacian or normalized Laplacian matrix. This eigenvector is referred to as the Fiedler vector, u_F, and defines a set of two clusters {C_1, C_2} where v_i ∈ C_1 if u_F(i) ≥ 0 and v_i ∈ C_2 if u_F(i) < 0. The Fiedler partition can be applied iteratively to each successive partition in order to achieve a clustering with k > 2. The partitioning of a graph will be referred to by the syntax FiedlerPartition(G), which splits graph G into two clusters, G_1 and G_2, such that G_1 ∪ G_2 = V.
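A minimal numpy sketch of the FiedlerPartition operation (the weighted two-group matrix is a hypothetical example; note that it is fully connected and requires no thresholding):

```python
import numpy as np

def fiedler_partition(W):
    """Bi-partition a weighted graph by the sign of the Fiedler vector,
    i.e. the Laplacian eigenvector of the second smallest eigenvalue."""
    L = np.diag(W.sum(axis=1)) - W   # L = K - W
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    u_f = eigvecs[:, 1]              # Fiedler vector
    return np.where(u_f >= 0)[0], np.where(u_f < 0)[0]

# Fully connected weighted toy graph: strong weights inside two groups
# of three nodes, weak weights between the groups.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

c1, c2 = fiedler_partition(W)
```

The sign pattern of the Fiedler vector separates the two strongly connected groups regardless of the weak all-to-all background connectivity.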
The advantage of using the Fiedler vector over other eigenvectors is that the partitions performed on the network are guaranteed to be optimal, compared to identifying multiple splits in one iteration, since the Fiedler vector is the optimal solution to a minimum cut problem. The Fiedler vector also allows one to obtain the hierarchical structure of the network. Finally, the Fiedler partitioning method can be applied to a fully connected weighted matrix without the need for thresholding, as opposed to the Girvan-Newman algorithm.

4.2.2 Consensus Clustering

In many clustering problems, it is common to apply different algorithms to the same data and then use a consensus method to combine the results [182]. Two popular consensus methods are consensus averaging and the hypergraph partitioning algorithm (HGPA). The first approach averages m connectivity matrices to obtain W = [w_{ij}] where

w_{ij} = \frac{1}{m} \sum_{r=1}^{m} w_{ij}^{(r)}    (4.6)

A clustering algorithm, e.g. FiedlerPartition(G), can then be applied to W to obtain a hierarchical community structure. Inter-subject variability is not considered in this approach. The computational complexity of O(kN³) is attractive for identifying a community structure for numerous graphs; however, confidence in the resulting clustering is low since the variance between the multiple similarity matrices is neglected. The variance between matrices should be considered in order to decrease the uncertainty that a node pair is randomly assigned to the same cluster. A second option for obtaining a common community structure across multiple graphs is to identify the community structure of each individual graph. A process is then implemented which uses information common across the multiple community structures to identify a global community structure. With the availability of multiple graphs, the nodes within each similarity matrix are first partitioned into a set of clusters C^{(r)} = {c_1^{(r)}, c_2^{(r)}, ..., c_k^{(r)}}, i.e.
base clusterings, using a clustering algorithm, where k is restricted to be the same across all graphs. These base clusterings can be used to identify a common community structure using the HyperGraph-Partitioning Algorithm (HGPA) [183]. HGPA treats each cluster across all base clusterings as a hyperedge within a single global graph. The algorithm is a multilevel graph partitioning system which partitions this graph in three steps: 1) compress the graph by collapsing hyperedges, 2) partition the compressed graph using a minimum cut objective function, and 3) decompress the partitions and repeat the process. HGPA has a computational complexity of O(kNh), where h is the number of hyperedges; however, its overall complexity is dependent upon the total complexity of the clustering algorithms used to obtain the base clusterings. HGPA has the disadvantage of generating clusters of approximately equal sizes, even though in real networks equally sized clusters are unlikely.

4.2.3 Quality Measures for Community Structures

When a community structure is discovered, it is difficult to determine if it is a suitable representation of the network of interest without a priori knowledge. Therefore, it is necessary to use a metric which quantifies the quality of the discovered community structure. The literature overwhelmingly supports the use of a measure known as modularity. Modularity was developed by Newman et al. [156] to determine if the number of edges between clusters is lower than expected in a random graph, i.e. a less dense arrangement of inter-connecting edges, and if the number of edges within clusters is higher than expected in a random graph, i.e. a highly dense arrangement of intra-connecting edges. Modularity compares a detected community structure with a network of equivalent size where edges are randomly linked among node pairs with probability Z_{ij} = k_i k_j, where k_i is the degree of node v_i.
This value of Z_{ij} maintains a fair comparison between the actual network and the random network, in that the expected degree of node v_i in the random network should be equivalent to the actual degree of node v_i in G, i.e. ∑_j Z_{ij} = k_i. Since the random connection between v_i and v_j is dependent on their degrees in graph G, Z_{ij} is modified as

Z_{ij} = \frac{k_i k_j}{2m}

where the factor of 2 accounts for counting each edge twice since the network is symmetric. If A is the adjacency matrix of G, then modularity is proportional to A_{ij} − Z_{ij} summed over all node pairs belonging to the same cluster, such that

Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - Z_{ij} \right) \sigma(c_i, c_j)    (4.7)

where σ(c_i, c_j) = 1 if nodes v_i and v_j are in the same cluster and 0 otherwise, and m is the number of edges in the network. This can also be rewritten as

Q = \frac{1}{2m} \sum_{ij} B_{ij} \, \sigma(c_i, c_j)    (4.8)

where B_{ij} = A_{ij} − \frac{k_i k_j}{2m} is the modularity matrix. Modularity may be negative or positive, where positive values indicate the possible presence of a community structure. Therefore, one can quantify the quality of a community structure by evaluating the value of Q. The modularity matrix B is a symmetric matrix which may consist of both positive and negative values. However, if all the entries of B are negative, then the corresponding network is considered indivisible, i.e. any clustering algorithm will not reveal a good community structure and Q will be assigned low, possibly negative, values. The modularity matrix, B, can also be used to perform spectral clustering in the same way as the Laplacian matrix. Modularity for weighted graphs can be defined by modifying Equation (4.7) as

Q = \frac{1}{2m} \sum_{ij} \left( W_{ij} - \frac{s_i s_j}{2m} \right) \sigma(i, j)    (4.9)

such that m is the number of edges, s_i (the strength of node v_i) is the sum of the ith row in W, and σ(i, j) = 1 if v_i and v_j are in the same cluster and 0 otherwise. In ideal cases, clusters will not overlap. However, in real world networks, this may not always be the case.
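The unweighted modularity of Eqs. (4.7)-(4.8) can be sketched as follows (a toy numpy implementation; the two-triangle graph and the candidate labelings are illustrative assumptions):

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q (Eq. 4.8) of a hard partition of an
    undirected, unweighted graph with adjacency matrix A."""
    k = A.sum(axis=1)                # degree of each node
    two_m = A.sum()                  # 2m: each edge counted twice
    B = A - np.outer(k, k) / two_m   # modularity matrix B_ij
    same = labels[:, None] == labels[None, :]   # sigma(c_i, c_j)
    return (B * same).sum() / two_m

# Two triangles joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0

q_true = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # true communities
q_bad = modularity(A, np.array([0, 1, 0, 1, 0, 1]))   # arbitrary split
```

The true two-triangle partition yields a positive Q, while the arbitrary split yields a negative Q, consistent with the interpretation of positive modularity as evidence of community structure.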
Therefore, it is imperative to use a modified version of modularity which considers nodes belonging to more than one community. Nicosia et al. [184] proposed an improved modularity measure where a 'belonging factor', α_{i,c}, is defined for each node v_i to quantify the strength of node v_i in cluster c: α_{i,c} = 1/s for i ∈ c, where s is the number of clusters the node belongs to. Next, a coefficient defining the 'belonging strength' of an edge within a particular cluster, c, is calculated as

\beta(e_{ij}, c) = \frac{1}{\left(1 + e^{-f(\alpha_{i,c})}\right)\left(1 + e^{-f(\alpha_{j,c})}\right)}

where f(α_{i,c}) is a linear scaling function, defined as f(x) = 60x − 30 in the literature. The modified modularity is defined as

Q = \frac{1}{2m} \sum_{ij} \left( \beta(e_{ij}, c) W(e_{ij}) - \frac{\beta^2(e_{ij}, c) k_i k_j}{2m} \right) \sigma(c_i, c_j)    (4.10)

If no clusters overlap, then β(e_{ij}, c) = 1 and the extended version of modularity will produce the same results as the original modularity presented in Eq. (4.8).

4.3 Hierarchical Modular Detection Algorithms for Weighted Neuronal Networks

In this section, two methods for evaluating the modular structure of a fully connected weighted network are presented. The first combines CONGA and Nicosia's modularity in an iterative manner to identify the 'best' overlapping modular structure of neuronal data. The second algorithm is a hierarchical consensus clustering method which utilizes the properties of the Fiedler vector to identify a community structure common across multiple neuronal data sets.

4.3.1 Overlapping Community Structures in Weighted Networks

Modularity has been established as a reliable method to determine if a community structure is optimal. In the case of weighted and symmetric graphs representing the functional connectivity between neuronal populations, several problems exist in determining the optimal cluster arrangement. First, clustering algorithms typically require fully connected weighted graphs to be mapped to binary graphs by thresholding.
Thresholding allows the important information pertaining to possible clusters to be retained while filtering out the unimportant information. As discussed in previous chapters, choosing a threshold which optimally represents the connectivity structure of a network remains a challenge. Sparsely weighted graphs may be evaluated by Girvan-Newman and CONGA by scaling edge betweenness by the weight of the edge; however, this idea fails for non-sparse or fully connected weighted matrices, as is the case for neuronal data where each pair of regions is assigned a weighted relationship. These methods fail since all edges would be considered equally important according to the definition of edge betweenness centrality. Therefore, thresholding is an important issue, and discovering an optimal threshold which preserves the cluster information hidden within the data is necessary. Second, most clustering algorithms also assume a node does not participate in multiple clusters. In real-world networks, the expectation of a community structure in which clusters are not allowed to overlap is an unreasonable assumption. Overlapping clusters are increasingly expected as the network becomes more complex or larger in size [185]. Clusters within real-world networks, such as neural assemblies in the brain, must interact with each other in order for the network to function properly. These interactions may be due to nodal members common to multiple clusters. CONGA is one of the few clustering algorithms which considers overlapping clusters and therefore it is the basis of this optimal cluster identification process. Nicosia's modularity is applied to evaluate the resulting overlapping community structures. Considering these challenges, the goal is to identify a single sparse binary mapping of the weighted data which results in the optimal community structure based on threshold (T), number of clusters (k), and modularity (Q).
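The binary mapping used throughout this search can be sketched as follows (the 4-node synchrony matrix is a hypothetical example):

```python
import numpy as np

def threshold_graph(W, T):
    """Map a fully connected weighted graph to a binary adjacency
    matrix: an edge is kept only if its weight exceeds T."""
    A = (W > T).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Hypothetical 4-node phase synchrony matrix with weights in [0, 1].
W = np.array([[0.0, 0.8, 0.3, 0.1],
              [0.8, 0.0, 0.7, 0.2],
              [0.3, 0.7, 0.0, 0.9],
              [0.1, 0.2, 0.9, 0.0]])

# Sweeping T from 0 toward 1 produces progressively sparser mappings.
dense = threshold_graph(W, 0.05)
sparse = threshold_graph(W, 0.75)
```

Each choice of T yields a different adjacency matrix, which is why the proposed procedure sweeps T exhaustively rather than committing to a single arbitrary value.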
We present an exhaustive search algorithm in terms of the threshold and number of clusters and determine the best combination of threshold and number of clusters based on Nicosia's modularity, since this permits us to evaluate a hierarchical soft clustering of the data. Using CONGA, the nodes of the weighted matrix are categorized into 2 ≤ K ≤ k clusters for each binary mapping generated by T, where 0 ≤ T ≤ 1. The contribution of each node to the community structures is defined by an N × N 'participation matrix',

PM_{T,c^k}(i, j) = \sigma(c_i^k, c_j^k)

for each community structure generated by each T. Modularity, Q_T, is also computed for each thresholded graph. σ(c_i^k, c_j^k) = 1 if nodes v_i and v_j participate in the same clusters within the community structure c^k and 0 otherwise. A 'participation score' for each node, v_i, within c^k, is computed as

PS_{T,c^k}(i) = \frac{1}{N} \sum_{j=1}^{N} PM_{T,c^k}(i, j)

which indicates how central each node is within its respective cluster. This is applied to all PM matrices generated for the network. Next, all PS_{T,c^k} are categorized according to the degree range, R, within which the associated threshold T falls. This generates a set of degree ranges, {R_1, R_2, ..., R_j} ∈ R*, where j is the cardinality of the degree range categories, which are averaged together and stacked side by side for all c^k to form the N × k participation score matrix PS(R_j). The modularity values are averaged to obtain a 1 × k modularity vector such that Q^k(R_j) = \frac{1}{|T|} \sum_{\forall T \in R_j} Q(T). The following steps summarize the evaluation procedure.

• Step 1: Acquire the fully connected weighted matrix, G = (V, E), where {w_{ij} ∈ E | v_i, v_j ∈ V}.

• Step 2: Iteratively threshold G by T, 0 ≤ T ≤ 1, to obtain a binary adjacency matrix (Figure 4.1).

Figure 4.1: Weighted matrix to adjacency matrix for a given threshold

• Step 3: Apply CONGA and acquire K clusters such that 2 ≤ K ≤ k.
• Step 4: Compute Nicosia's modularity for each community structure (Equation (4.10)).

• Step 5: Obtain the participation matrices and participation scores across thresholds and different community structures, K (Figure 4.2).

Figure 4.2: Participation scores of various clusterings derived from various thresholds of the network

• Step 6: Average the participation scores based on the corresponding average degree, R (Figure 4.3). The final participation score for node v_i is defined as

PS(i) = \frac{1}{k(N-1)} \sum_j PS_i(R_j)    (4.11)

Figure 4.3: Average participation scores in terms of network degree

A large PS(i) is indicative of a node which participates in overlapping clusters with high frequency, i.e. a possible central hub of the network. Next, to evaluate average participation scores for the entire network with respect to degree range, R_j, and community structure, c^k, compute

P(R_j) = \sum_{i=1}^{N} PS_i(R_j)    (4.12)

and form the j × k participation matrix, PS_mat, where the jth row is P(R_j). Finally, to evaluate optimal modularity, the 1 × k modularity vectors of each degree range, R_j, containing the average modularity for each community structure, Q^k(R_j), are assembled to form a j × k average modularity matrix. The maximum modularity can then be determined with respect to degree range, R_j, and community structure c^k.

4.3.2 Hierarchical Spectral Partitioning Algorithm for Identifying a Common Community Structure among Multiple Subjects

Since the literature has consistently demonstrated that the Fiedler vector is highly effective in partitioning graphs into clusters [186, 187], in this study we use the Fiedler vector to perform consensus clustering across multiple weighted graphs. The original connectivity matrices are bi-partitioned into two clusters using the Fiedler partitioning method.
This results in a cluster matrix T^r such that

T^r(i, j) = \begin{cases} 1 & \text{if nodes } v_i, v_j \text{ are in the same cluster} \\ 0 & \text{otherwise} \end{cases}

where r = {1, 2, ..., m} and m is the number of connectivity matrices evaluated, e.g. the number of subjects in the brain study. In order to find the common community structure across multiple graphs, we introduce a co-occurrence matrix P, where P(i, j) = \frac{1}{m} \sum_{r=1}^{m} T^r(i, j) and P(i, j) ∈ [0, 1]. P(i, j) is the probability that a pair of nodes are members of the same cluster across multiple graphs. The similarity matrix reflects the strength of a direct relationship between a node pair, whereas P reflects the likelihood that a pair of nodes are in the same cluster across all subjects. The Laplacian matrix of P is computed and the Fiedler vector is found to form a bi-partition of P into a community structure composed of clusters c_1 and c_{−1}. Since P represents the probability that a node pair should be clustered together, the Fiedler partition of P represents the community structure common to all graphs. The initial partition set, C = {c_1, c_{−1}}, contains 2 clusters, but if k > 2 is desired, the process can be repeated by selecting a cluster in C to partition. In this case, c_1 or c_{−1} may be selected based on a user defined cost function. Next, sub-matrices are extracted from the original connectivity matrices such that they only contain the nodes of the chosen cluster.

Algorithm 5 Probabilistic Fiedler Clustering Algorithm
1: Input: m N × N dimensional graphs, G = {G_1, G_2, ..., G_m} with vertices V = {v_1, v_2, ..., v_N} and edges E^r = {w^r_{ij} : v_i, v_j ∈ V} such that G_r = (V, E^r) and r = {1, 2, ..., m}.
2: Input: Number of clusters, k.
3: Output: k clusters C = {c_1, c_2, ..., c_k} where c_j ⊂ V.
4: C = ∅
5: for t = 2 to k do
6:   P = 0_{|V|×|V|}
7:   for r = 1 to m do
8:     submatrix Ĝ_r ⊂ G_r | Ĝ_r = (V, Ê^r)
9:     (V_1, V_2) = SubRoutine(FiedlerPartition(Ĝ_r))
10:    P(i, j) = P(i, j) + T^r(i, j), where T^r(i, j) = 1 if nodes v_i, v_j ∈ V_1 or v_i, v_j ∈ V_2, and 0 otherwise
11:  end for
12:  P = P/m
13:  (V_1, V_2) = SubRoutine(FiedlerPartition(P))
14:  C = C ∪ {V_1, V_2}
15:  if t ≠ k then
16:    V = c_i | i = max_q {H_{c_q}} and c_q ∈ C
17:    E^r = {w^r_{ij} : v_i, v_j ∈ V}
18:    C = C \ {c_i}
19:  end if
20: end for

These sub-matrices are used to derive the new sub co-occurrence matrix, P_y, where y = 1 if cluster c_1 was selected and y = −1 if cluster c_{−1} was selected. The Fiedler partition will result in two new clusters, c^y_1 and c^y_{−1}. The final cluster set is C = {c_{−y}, c^y_1, c^y_{−1}}, which is a concatenation of the two new clusters with the original cluster that was not chosen for bi-partitioning. Algorithm 5 describes this process for obtaining community structures for a given number of clusters. The algorithm, in this case, implements the homogeneity cost function, H_{c_q}, for selecting subclusters for bi-partitioning. Homogeneity is a quality function which will be described further in the next section.

Algorithm 6 Fiedler Partition
1: Input: graph G = (V, E).
2: Output: Vertex sets V_1 and V_2.
3: Compute the Laplacian matrix, L, of G.
4: Compute the |V| eigenvectors, u, and eigenvalues, λ, of L.
5: Order the eigenvalues in ascending order: λ_1 ≤ λ_2 ≤ ... ≤ λ_{|V|}.
6: u_F = u_i where i = min_q {λ_q} | λ_q ≠ 0.
7: for j = 1 to |V| do
8:   v_j ∈ V_1 if u_F(j) ≥ 0; v_j ∈ V_2 if u_F(j) < 0
9: end for

4.4 New Weighted Graph Clustering Quality Measures

The primary problem with iteratively dividing a network is determining the optimal number of clusters. The most commonly used optimization measure, modularity [156], compares a community structure to the expected community structure of a single random graph, such that there exists a high number of edges within clusters and a low number of edges between clusters.
Unfortunately, modularity does not always result in the highest value for the true community structure [188] and can reveal a suboptimal structure, which may be due to the simplicity of the random model computed through s_i s_j / 2m in Equation (4.9). In this section we propose alternative measures for quantifying the quality of a cluster partition across weighted networks.

4.4.1 Comparing Graph Partitions to Multiple Random Networks

In this section, it is proposed that a comparison between the detected weighted community structure and numerous randomly weighted community structures be performed. It is expected that vertex pairs connected with large weights are more likely to be in the same cluster than pairs connected with a low weight. A community matrix is defined as

C^k(i, j) = \begin{cases} W_{ij} & \text{if } v_i \text{ and } v_j \text{ are in the same cluster} \\ 0 & \text{otherwise} \end{cases}    (4.13)

where k is the number of clusters. The random community matrix is obtained by randomly assigning each edge to one of k clusters and computing Equation (4.13) for the random assignments to obtain C^{(k,p)}_{rand}, where p = {1, 2, ..., r} indexes the permutations. The membership size of each cluster may differ between the random model and the detected community structure for each p, but the number of clusters must be the same. A large p is recommended so that numerous random clustering possibilities are evaluated. The average intra-cluster weight corresponding to each clustering matrix is computed as

\hat{C}^k = \frac{1}{N(N-1)} \sum_{ij} C^k(i, j)    (4.14)

and

\hat{C}^{(k,p)}_{rand} = \frac{1}{N(N-1)} \sum_{ij} C^{(k,p)}_{rand}(i, j)    (4.15)

Averaging over all p such that \hat{C}^k_{rand} = \frac{1}{|p|} \sum_p \hat{C}^{(k,p)}_{rand} provides the random intra-cluster value. Then, k is found such that Q = \hat{C}^k − \hat{C}^k_{rand} is maximum. A maximum value indicates that the vertices within a cluster share significantly larger weights than if they were clustered with any other set of vertices.
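A sketch of this comparison (Eqs. (4.13)-(4.15)), under the assumption of a small two-group weighted toy graph and a fixed random seed; for simplicity the random model here assigns nodes, rather than individual edges, to the k clusters:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_intra_weight(W, labels):
    """Average intra-cluster weight (Eqs. 4.13-4.14): weights between
    distinct node pairs sharing a cluster label, divided by N(N-1)."""
    N = len(W)
    same = (labels[:, None] == labels[None, :]) & ~np.eye(N, dtype=bool)
    return W[same].sum() / (N * (N - 1))

def random_baseline(W, k, p=200):
    """Eq. 4.15 averaged over p random assignments of the nodes into
    k clusters."""
    N = len(W)
    return float(np.mean([mean_intra_weight(W, rng.integers(0, k, N))
                          for _ in range(p)]))

# Hypothetical weighted graph with two clear 3-node groups.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

labels = np.array([0, 0, 0, 1, 1, 1])
Q = mean_intra_weight(W, labels) - random_baseline(W, k=2)
```

A clearly positive Q indicates that the detected clusters concentrate larger weights within clusters than random groupings with the same number of clusters.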
This method relies heavily on weights and demonstrates how the weight distribution is a significant contributor to community structure.

4.4.2 Homogeneity and Completeness

By the definition of a cluster, the pairwise connections within a cluster must be stronger than the inter-cluster connections. In this section, measures that evaluate the quality of a particular clustering structure based on the distributions of the inter- and intra-cluster edges across m graphs are proposed. These distributions are defined similarly to probability mass functions (pmfs). Prior to defining the pmfs of intra-cluster and inter-cluster edges, a function that maps the continuous edge values to a discrete alphabet is defined as f : W_i^{(r)} → S_i^{(r)}, where W_i^{(r)} = {w_{ij}^{(r)} ∈ [0, 1]} refers to the ith row of the rth similarity matrix, S_i^{(r)} = {s_{ij}^{(r)} ∈ {1, 2, ..., N}}, and r = {1, 2, ..., m}. The elements of each row of the connectivity matrix across subjects are mapped to discrete integer values between 1 and N to eliminate the variation of edge strengths across subjects and extract only relational information about the pairwise edge strengths. The rank function is proposed to perform this mapping such that the node pair with the largest similarity measure is assigned a 1 and the weakest node pair is assigned N. When a particular cluster set C = {c_1, c_2, ..., c_k} is identified, the probability mass function of intra-cluster ranks for a particular cluster, c_t, is defined as

P^{intra}_{c_t}(\beta) = \frac{F^{intra}_{c_t}(\beta)}{\sum_{\beta=1}^{N} F^{intra}_{c_t}(\beta)}    (4.16)

where

F^{intra}_{c_t}(\beta) = \sum_{r=1}^{m} \sum_{i,j}^{N} \delta(s^r_{ij}, \beta) \mid v_i, v_j \in c_t    (4.17)

t ∈ {1, 2, ..., k}, β ∈ {1, 2, ...
, N}, and

\delta(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases}

Similarly, the probability mass function of inter-cluster ranks for cluster c_t is defined as

P^{inter}_{c_t}(\beta) = \frac{F^{inter}_{c_t}(\beta)}{\sum_{\beta=1}^{N} F^{inter}_{c_t}(\beta)}    (4.18)

where

F^{inter}_{c_t}(\beta) = \sum_{r=1}^{m} \sum_{i,j}^{N} \delta(s^r_{ij}, \beta) \mid v_i \in c_t; v_j \notin c_t    (4.19)

The distribution of edge ranks within a cluster is expected to be concentrated around smaller values compared to the distribution of ranks between clusters, since this would be indicative of optimal segregation in a community structure. This is quantified using a one-sided Mann-Whitney U test such that the relationship between the medians, x̃, is expected to be x̃(P^{intra}_{c_t}) < x̃(P^{inter}_{c_t}) with a significance of p ≤ α. A score is assigned to cluster c_t as

Q_{c_t} = \begin{cases} 1 & \text{if } x̃(P^{intra}_{c_t}) < x̃(P^{inter}_{c_t}) \\ 0 & \text{otherwise} \end{cases}    (4.20)

This is repeated for all k clusters and an average quality score is computed as Q = \frac{1}{k} \sum_{t=1}^{k} Q_{c_t}. Q will always be within the range of 0 and 1, where values closer to one indicate that most clusters contain strong within-cluster relationships and weak between-cluster relationships across all graphs. The proposed quality measure ensures intra-cluster relationships are significantly 'stronger' than inter-cluster relationships. However, the next step is to quantify homogeneity and completeness, two principal characteristics which further determine the quality of clustering. A homogeneous cluster contains only data points which belong to the same class, while a complete cluster contains all possible data points within the sample space (Figure 4.4).

Figure 4.4: Variations in homogeneity and completeness where the true number of modules is 3; a) High homogeneity and low completeness; b) High completeness and low homogeneity; c) High homogeneity and completeness.

Homogeneity and completeness are approximately inversely proportional to each other such that a cluster with high (low) homogeneity will have low (high) completeness.
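The rank-based score of Eq. (4.20) can be sketched as follows (assuming numpy and scipy; the two-cluster weighted graph is a hypothetical example, and scipy's `rankdata` applied to the negated weights implements the row-wise mapping that assigns rank 1 to the strongest edge):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Hypothetical weighted graph: two 3-node clusters with strong
# intra-cluster edges (0.9) and weak inter-cluster edges (0.1).
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)
labels = np.array([0, 0, 0, 1, 1, 1])

# Row-wise rank mapping: the strongest edge in a row gets rank 1,
# the weakest gets rank N.
ranks = np.vstack([rankdata(-row) for row in W])

same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(W), dtype=bool)
intra = ranks[same & off_diag]   # intra-cluster edge ranks
inter = ranks[~same]             # inter-cluster edge ranks

# One-sided Mann-Whitney U test: intra-cluster ranks are expected to
# be smaller (stronger edges) than inter-cluster ranks.
stat, pval = mannwhitneyu(intra, inter, alternative="less")
Q_ct = 1 if pval <= 0.05 else 0
```

Because the toy graph segregates cleanly, the intra-cluster ranks are uniformly smaller than the inter-cluster ranks and the cluster receives a score of 1.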
Similar measures, such as the F-measure [189] and V-measure [190], have been used in the literature to quantify the quality of a cluster. Both measures, however, require a priori knowledge of class labels. In most cases, this 'ground truth' is unknown and therefore alternative measures of cluster accuracy are needed. For this reason, the characteristic that homogeneity is inversely proportional to the variance of edge ranks within a cluster is exploited. If a particular cluster is homogeneous, then we would expect the pairwise connection strengths among the members of that cluster to be close to each other, thus implying that the ranks of the weights have small variance. We propose to quantify this variation through a measure of normalized entropy such that the lowest homogeneity score is obtained for a cluster containing a uniform representation of ranks, i.e. large variation among the edge weights, and the maximum homogeneity score is obtained for a cluster containing only one rank. Therefore, a normalized entropy measure of the cluster's intra-cluster rank distribution is indicative of homogeneity. Homogeneity is defined as H_{c_t} = 1 − \frac{H(P^{intra}_{c_t})}{\log_2 N} and will be maximum when H(P^{intra}_{c_t}) << \log_2 N. Similarly, we define a metric to quantify completeness using the relative entropy between the inter-cluster rank and intra-cluster rank distributions. The rank distributions for inter-edges and intra-edges are expected to be different from each other to maximize completeness. The similarity between two distributions is commonly quantified using divergence measures. Here, we propose to use the Jensen-Shannon divergence measure [191] since it is symmetric and bounded. With respect to completeness, a divergence measure approaching 1 is synonymous with increased completeness.
Completeness of a cluster is therefore defined as $C_{c_t} = JS(P_{c_t}^{intra}, P_{c_t}^{inter})$ where

$$JS(p, q) = \frac{1}{2}\sum_i p(i)\log_2 \frac{p(i)}{0.5(p(i)+q(i))} + \frac{1}{2}\sum_i q(i)\log_2 \frac{q(i)}{0.5(p(i)+q(i))},$$

with $p = P_{c_t}^{intra}$ and $q = P_{c_t}^{inter}$. Average completeness, $\hat{C} = \frac{1}{k}\sum_{t=1}^{k} C_{c_t}$, and average homogeneity, $\hat{H} = \frac{1}{k}\sum_{t=1}^{k} H_{c_t}$, are computed across all clusters. An approach to optimize both homogeneity and completeness is to maximize the cost function

$$\lambda = \hat{C}\gamma + \hat{H}(1-\gamma) \qquad (4.21)$$

where $\gamma \in [0, 1]$, for the whole community structure while maintaining an acceptable user-defined value of $Q$.

4.5 Results

4.5.1 Overlapping Clusters in the Brain Network During the Error-Related Negativity

Since the data are in the form of fully connected weighted matrices, the proposed optimization clustering process using CONGA is well suited because it evaluates a range of thresholds to identify an optimal community structure. First, the phase synchrony data from all 90 subjects were averaged together to form a single Error and a single Correct response matrix. Then, the network was evaluated for 2 ≤ k ≤ 10 clusters to identify the degree ranges, R, and cluster sizes, k, which resulted in different modularity values, Q. Finally, the difference between the participation scores belonging to the two response types, Error and Correct, was found for each degree range as $\|C - E\| = \|PS_{mat}^{correct} - PS_{mat}^{error}\|_F$.

Table 4.1: Top 3 electrodes with the largest differences between Correct and Error 'participation scores'.

Degree Range, R   ||C − E||_F   Electrode 1   Electrode 2   Electrode 3
31-35             6.62          F8            AF3           CP2
26-30             6.73          FC4           AF8           AF3
21-25             7.18          CP2           CP4           FT7
16-20             5.58          P2            AF8           C5
11-15             4.75          F1            P2            F6
6-10              4.29          F6            F1            FC4

Figures 4.5 and 4.6 plot modularity versus degree range $R_j$ and cluster size $k$ for Correct and Error responses, respectively. Clusters with high modularity were observed for sparsely connected graphs, R ≤ 25, and for smaller numbers of clusters, k ≤ 8.
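The difference score between the two response types is a plain Frobenius norm of the element-wise difference of the participation-score matrices, e.g.:

```python
import math

def frobenius_diff(A, B):
    """||A - B||_F between two equally sized score matrices,
    given as nested lists."""
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(A, B)
                         for a, b in zip(ra, rb)))
```

Applied to the Correct and Error participation-score matrices, this yields the ||C − E||_F column of Table 4.1.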
This is an expected result, since sparsely connected binary mappings of the phase synchrony index computed from EEG exhibit small-world network characteristics and therefore form tight clusters [192]. Thus, a high modularity would be expected for these lower degree ranges, where an underlying organized structure is observed.

A two-sample t-test was applied for each electrode to determine the electrodes which showed significant differences between Error and Correct responses based on the 'participation score'. Table 4.1 lists the top three electrodes, with respect to the degree ranges, that demonstrate the most significant (p ≤ 0.05) difference between Correct and Error responses. The table shows that the largest differences occur for degrees 21 ≤ R ≤ 25 and 26 ≤ R ≤ 30. The electrodes that demonstrated the most significant difference between Correct and Error responses are FC4, AF8, AF3, CP2, CP4, and FT7, corresponding to the frontal and central regions of the brain, in accordance with prior work [19]. Finally, the optimal size, k, for a clustering community was determined by finding the k which maximizes $\sum_{i=1}^{N}\left(PS^{C}_{R,c_i^k} - PS^{E}_{R,c_i^k}\right)$ in 21 ≤ R ≤ 25 for 2 ≤ k ≤ 10, and is found to be equal to 6.

The clustering communities corresponding to the largest difference between Correct and Error responses are shown in Figures 4.7 and 4.8. These optimal clusters have an average modularity of 0.5 for Correct responses and 0.45 for Error responses. It is observed that many of the nodes participate in more than two clusters during both Correct and Error responses. The plots associated with the Correct response demonstrate global, non-localized participation of neural regions during the task response test, as demonstrated by the large overlapping clusters spread around the topographic line plots.
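The per-electrode testing step can be sketched as follows. A Welch two-sample t statistic is compared against a two-sided normal-approximation cutoff, a reasonable simplification at this sample size (the exact test would use the t distribution, e.g. via SciPy); the names and data layout are illustrative, not the dissertation's code.

```python
import math

def welch_t(a, b):
    """Two-sample Welch t statistic comparing one electrode's
    participation scores under two conditions."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def significant_electrodes(error, correct, z_crit=1.96):
    """Indices of electrodes whose |t| exceeds a two-sided
    normal-approximation cutoff; error[i] and correct[i] hold the
    per-subject scores for electrode i."""
    return [i for i in range(len(error))
            if abs(welch_t(error[i], correct[i])) > z_crit]
```

Running this over all electrodes and degree ranges yields the electrode rankings reported in Table 4.1.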
However, the plot of Figure 4.8b strongly indicates that the neural assemblies located around the anterior cingulate cortex formed a highly synchronized cluster in response to the ERN. As expected, this region carries the primary graph features distinguishing Correct data from Error data, since the ACC would only be activated during an error response. Therefore, the proposed optimization technique identified the distinguishing features of neuronal data represented by fully connected weighted matrices.

4.5.2 Hierarchical Spectral Consensus Clustering Analysis

In this section, the proposed Fiedler Consensus Algorithm is implemented for revealing the hierarchical modular decomposition across multiple graphs. The optimal modular organization is identified by maximizing the quality measure Q, the homogeneity $\hat{H}$, and the completeness score $\hat{C}$ proposed in Section 4.4.2. First, the results obtained for a set of simulated networks are compared to those obtained through averaging the similarity matrices and through HGPA. Next, the proposed measures are used to identify a modular structure which best describes the multivariate relationships across multiple subjects from connectivity graphs obtained from EEG data. The parameters of the quality measure, Q, and cost function, λ, were set to α = 0.01 and γ = 0.5, respectively.

4.5.2.1 Comparisons with Averaging and Conventional Voting Consensus Measures

Ninety simulated networks, each consisting of 64 nodes composed of 8 modules, were generated to evaluate the Fiedler consensus method against two other consensus clustering methods, averaging and HGPA. A mixing parameter η, 0 ≤ η ≤ 1, was selected such that the difficulty of identifying the true modular structure increased with decreasing η. The pairwise elements of each of the simulated connectivity matrices ranged between 0 and 1.
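Benchmark networks of this type can be generated in a few lines. The sketch below plants equal-sized modules and draws intra- and inter-modular weights from log-normal distributions whose means follow the mixing-parameter scheme described here; the moment-matched log-normal parameterization and the clipping to [0, 1] are assumptions of this sketch.

```python
import math, random

def modular_network(n_modules=8, module_size=8, eta=0.5,
                    mu_intra=0.8, sigma=0.1, seed=0):
    """Simulated weighted network with planted modules: intra-module
    weights are log-normal with mean mu_intra, inter-module weights
    have mean (1 - eta) * mu_intra, so smaller eta means modules are
    harder to recover. Weights are clipped to [0, 1]."""
    rng = random.Random(seed)

    def lognorm(mean, std):
        # underlying normal parameters matching the desired mean/std
        s2 = math.log(1.0 + (std / mean) ** 2)
        mu = math.log(mean) - 0.5 * s2
        return min(1.0, max(0.0, rng.lognormvariate(mu, math.sqrt(s2))))

    n = n_modules * module_size
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = (i // module_size) == (j // module_size)
            mean = mu_intra if same else (1.0 - eta) * mu_intra
            W[i][j] = W[j][i] = lognorm(mean, sigma) if mean > 0 else 0.0
    return W
```

With the defaults this reproduces the 64-node, 8-module setup; sweeping eta toward 0 pushes the inter-modular mean toward the intra-modular one.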
The weights of the intra-modular edges were selected from a log-normal distribution with a mean of $\mu_{intra} = 0.8$ and a standard deviation of $\sigma_{intra} = 0.1$. The weights of the inter-modular edges were selected from a log-normal distribution with a mean of $\mu_{inter} = (1 - \eta)\mu_{intra}$ and a standard deviation of $\sigma_{inter} = 0.1$. Across all networks, each module had 8 members.

Using the proposed Fiedler Consensus Algorithm, the Averaging method, and HGPA, the networks were evaluated for 2 ≤ k ≤ 20 modules. The best modular structure was selected where Q and λ were maximum. Table 4.2 reports the number of modules identified by each method with respect to varying degrees of inter-modular strength. The proposed approach performed best at identifying the correct number of modules for η = 1, 0.5, 0.1, while Averaging and HGPA were increasingly less accurate. Therefore, the Fiedler Consensus Algorithm, along with the maximization of Q and λ, is suitable for identifying the optimal number of modules in a network.

Table 4.2: Number of modules identified through optimization of Q and λ in simulated data (the true number of modules is 8).

Method / η       1    0.5   0.1   0.05   0
Fiedler Method   8    8     8     9      20
Averaging        7    7     7     3      17
HGPA             8    7     10    20     19

4.5.2.2 Hierarchical Modular Structure of the Brain During the Error-Related Negativity

The new time-varying measure of phase synchrony, in conjunction with graph measures, is applied to a set of EEG data containing the error-related negativity (ERN). First, the proposed Fiedler consensus approach was applied to the sets of Error and Correct data in order to identify an optimal modular structure. Hierarchical decomposition of the networks stopped when Q and λ were maximized. Optimization revealed that Error responses were best represented by a structure composed of 4 modules (Figure 4.9), with Q = 1 and λ = 0.17, and Correct responses by 2 modules (Figure 4.10), with Q = 1 and λ = 0.15.
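The spectral step at the heart of the Fiedler Consensus Algorithm, splitting a graph by the sign pattern of the Fiedler vector, can be sketched in pure Python with power iteration; a production implementation would use a sparse eigensolver, and the constant `c` and iteration count below are assumptions of this sketch.

```python
import math, random

def fiedler_partition(W, iters=500, seed=0):
    """Bipartition a weighted graph by the sign of the Fiedler vector
    (eigenvector of the second-smallest Laplacian eigenvalue).
    Power iteration on (c*I - L), with the constant eigenvector of L
    projected out each step so the iteration converges to the Fiedler
    direction."""
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg) + 1.0   # shift so all eigenvalues of c*I - L are positive
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        # project out the all-ones direction (the kernel of L)
        mean = sum(v) / n
        v = [vi - mean for vi in v]
        # w = (c*I - L) v, where (L v)_i = deg_i*v_i - sum_j W_ij v_j
        w = [c * v[i] - (deg[i] * v[i] - sum(W[i][j] * v[j] for j in range(n)))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [0 if vi < 0 else 1 for vi in v]
```

Applied recursively, sign splits of this kind yield the hierarchical decomposition; the consensus step across subjects then aggregates the per-subject partitions.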
Figures 4.11 and 4.12 show sparser versions of Figures 4.9 and 4.10 in which only node pairs found to possess significantly greater phase synchrony (Wilcoxon signed-rank, p ≤ 0.05) during Error responses compared to Correct responses are shown. The optimal modular organization describing Error responses consists of three distinct modules located around the mPFC and lPFC brain regions. The appearance of the central module is driven by increased mPFC activity, which may be indicative of the activation of a monitoring system, the anterior cingulate cortex (ACC), that signals the need for enhanced cognitive control during conditions of conflict or after an error. The optimal modular organization of the brain during Correct responses does not indicate any segregation among nodes in the frontal region. The only partition is between the parietal and frontal regions of the brain. Since an error was not committed, activation of the ACC did not occur, and the brain network therefore forwent the need for increased cognitive control.

4.5.2.3 Inter- and Intra-Edge Phase Synchrony

To further assess the statistical significance of the resulting modular structures of the brain, Q and λ were computed for random connectivity matrices. A random matrix was generated by randomly shuffling the values of each of the original connectivity matrices while preserving the average degree of the network. One hundred random matrices were generated for each subject across Correct and Error responses. The random connectivity matrices were partitioned into 2 ≤ k ≤ 10 clusters using the Fiedler Consensus Algorithm, and the resulting Q, λ, and weighted modularity values were compared against those of the Correct and Error response modular networks. The test indicated that the Q and λ resulting from the Correct and Error modular networks were greater than 100% of the corresponding random measures.
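One simple way to build such a null model is to permute the upper-triangular weights of the symmetric connectivity matrix: the multiset of edge weights, and hence the network's total and average strength, is preserved exactly while the topology is randomized. A minimal stand-in for the shuffle described above.

```python
import random

def shuffled_null(W, seed=0):
    """Null connectivity matrix obtained by permuting the
    upper-triangular weights of a symmetric matrix W; the resulting
    matrix is symmetric with the same multiset of edge weights."""
    n = len(W)
    rng = random.Random(seed)
    vals = [W[i][j] for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(vals)
    R = [[0.0] * n for _ in range(n)]
    it = iter(vals)
    for i in range(n):
        for j in range(i + 1, n):
            R[i][j] = R[j][i] = next(it)
    return R
```

Generating many such surrogates and re-running the clustering gives the reference distribution of Q and λ against which the observed modular structures are compared.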
Furthermore, no discernible modular structure was observed in any of the random graph partitions.

Next, average intra-modular and inter-modular edge phase synchronies were computed for the Correct and Error matrices and are presented in Figures 4.13 and 4.14. These plots describe the segregated and integrated processes occurring in the brain. Strong functional connectivity is expected within modules and, although weaker, connectivity is expected between modules in order to facilitate the integration of information processing. Evaluation of the Error response cross-module average synchrony matrix indicates that the mPFC module shares greater (p ≤ 0.05) phase synchronization with the left lPFC and right lPFC modules than with the parietal module. There is less cross-module integration between the Correct response modules. Furthermore, increased functional connectivity within the mPFC region during the ERN is illustrated by higher intra-modular synchrony compared to the other three modules (p ≤ 0.01).

4.5.2.4 Local Hubs of the Brain Network

Finally, the within-module degree and participation coefficient were computed for all nodes across subjects. Figure 4.15 shows the average differences between the measures for Error and Correct responses. The within-module degree was significantly greater (p ≤ 0.05) for Error along the far left and right lateral prefrontal cortex regions as well as in the medial prefrontal cortex. This increase in phase synchrony intensity may indicate greater enforcement of cognitive control by the mPFC. The participation coefficient was significantly greater (p ≤ 0.01) in Error across the entire prefrontal region of the brain. This may be indicative of increased integration of information processing between the monitoring and control regions of the brain.
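The two hub measures can be sketched for weighted graphs as follows; the weighted generalization (node strength in place of degree) is an assumption of this sketch, in the spirit of the standard within-module z-score and participation coefficient definitions.

```python
import math

def hub_measures(W, labels):
    """Within-module strength z-score and participation coefficient for
    a weighted graph; labels[i] is node i's module index."""
    n = len(W)
    strength = [sum(W[i]) for i in range(n)]
    mods = sorted(set(labels))
    # strength of node i restricted to each module s
    k_is = [{s: sum(W[i][j] for j in range(n) if labels[j] == s)
             for s in mods} for i in range(n)]
    # z-score of within-module strength, computed per module
    z = [0.0] * n
    for s in mods:
        members = [i for i in range(n) if labels[i] == s]
        vals = [k_is[i][s] for i in members]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
        for i in members:
            z[i] = (k_is[i][s] - mu) / sd if sd > 0 else 0.0
    # participation coefficient: 1 - sum_s (k_is / k_i)^2
    p = [1.0 - sum((k_is[i][s] / strength[i]) ** 2 for s in mods)
         if strength[i] > 0 else 0.0 for i in range(n)]
    return z, p
```

A node with all of its strength inside its own module has participation coefficient 0; a node whose strength is spread evenly over all modules approaches 1, marking it as a connector hub.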
4.6 Conclusions

In this chapter, two new approaches were presented for community detection in fully connected weighted graphs which represent the multivariate relationships among neural populations. First, using CONGA in an optimization process, a method was proposed to identify an optimal community structure composed of overlapping clusters, based on thresholding and a recent definition of modularity. The proposed implementation of CONGA permitted modules to overlap and identified a hierarchical decomposition of the brain network. Furthermore, with the proposed algorithm it is possible to identify the optimal modular structure representative of human brain connectivity without a priori information on the number of clusters or the cluster size. A hierarchical spectral partitioning algorithm using the Fiedler vector for discovering a common community structure across multiple subjects was also presented. By considering the cluster tendencies of nodes across multiple subjects, a community structure representative of all evaluated data could be generated. This approach improves on conventional methods such as averaging, which neglects the variance information contained across data sets. Finally, alternative measures to the modularity function, an indicator of cluster quality, were presented and shown to be effective in evaluating community structures identified in weighted networks.

The proposed measures were applied to the set of EEG data representing the error-related negativity and revealed that neural assemblies located around the anterior cingulate cortex tend to participate in a common cluster. Both the CONGA optimization process and the Fiedler Consensus Algorithm show that node FCz lies at the center of this cluster only during Error responses. This is therefore indicative of a more inhomogeneous structure in the frontal region of the brain, compared to Correct responses, during the task-related test.
This observation can be further explored by focusing the analysis on the frontal region of the brain. Data associated with the parietal region of the brain may contribute noise to the clustering algorithms, and ignoring these regions will therefore most likely reveal further hierarchical levels of the community structure.

Figure 4.5: Modularity with respect to degree and number of clusters for Correct responses.

Figure 4.6: Modularity with respect to degree and number of clusters for Error responses.

Figure 4.7: Six overlapping clusters derived for Correct responses.

Figure 4.8: Six overlapping clusters derived for Error responses.

Figure 4.9: Hierarchical decomposition of the brain network during an Error response.

Figure 4.10: Hierarchical decomposition of the brain network during a Correct response.

Figure 4.11: Significance cluster plots for Error responses.

Figure 4.12: Significance cluster plots for Correct responses.

Figure 4.13: Cross-module average synchrony for Error responses across four clusters.

Figure 4.14: Cross-module average synchrony for Correct responses across two clusters.

Figure 4.15: Normalized differences in hub identification measures between Error and Correct responses; (a) within-module z-score; (b) participation coefficient.

Chapter 5

Summary and Future Work

This dissertation contributes to the analysis of functional brain networks from a complex network theory perspective, in particular using tools from graph theory.
The limitations of current graph theoretic analysis of biological networks, such as the requirement of binary graphs and the reliance on global topological measures, are addressed by developing new measures that can be applied directly to weighted networks and can reveal both local and global structural information about the network. Signal processing, information theoretic, and stochastic analysis tools have proven invaluable for characterizing complex signals and systems, and in this dissertation they are integrated with graph theory to produce new measures more natural to our problem. These methods were applied to a set of graphs representing the pairwise synchrony among neural assemblies, quantified by a recently introduced time-varying phase synchrony measure applied to the electroencephalogram (EEG).

5.1 Summary

In the first part of this dissertation, the 'small world' structure of a complex network was discussed. Based on evidence from previous studies, it was hypothesized that an analysis of 'small world' structure would describe the effective integration of multiple segregated sources of information in the brain, and that changes in the 'small world' structure are indicative of an adaptive response to error processing. We explored how graph metrics indicative of 'small world' structure can be made suitable for the analysis of fully connected, weighted, undirected networks. The two defining measures of 'small world' structure, clustering coefficient and path length, were originally defined for binary adjacency matrices. A clustering coefficient and a path length that are directly applicable to weighted graphs were introduced through a probabilistic approach. The clustering coefficient was computed based on the probability of connectivity among triplets in the network, while the path length was computed as a cost function which minimized the product of edge weights along a path, of any length, between distinct nodes.
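A common concrete instance of this probabilistic view of path length is the most-reliable-path computation: treating edge weights as probabilities, the path maximizing their product can be found by running Dijkstra's algorithm on −log-transformed weights. The sketch below illustrates the general reduction, not the dissertation's exact cost function.

```python
import heapq, math

def most_reliable_path(W, src, dst):
    """Path between src and dst maximizing the product of edge
    probabilities, via Dijkstra on -log(w). W is a symmetric weight
    matrix with entries in (0, 1]; returns (probability, path)."""
    n = len(W)
    dist = [math.inf] * n
    prev = [None] * n
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue            # stale heap entry
        if u == dst:
            break
        for v in range(n):
            w = W[u][v]
            if w > 0:
                nd = d - math.log(w)   # product of probs -> sum of -logs
                if nd < dist[v]:
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(pq, (nd, v))
    path, u = [], dst
    while u is not None:
        path.append(u)
        u = prev[u]
    return math.exp(-dist[dst]), path[::-1]
```

For example, with a weak direct edge (0.2) and a two-hop route through strong edges (0.9 each), the two-hop route wins with reliability 0.81, illustrating why high-synchrony multi-hop paths can dominate weak direct connections.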
These measures were applied to both synthetic networks and weighted functional networks derived from pairwise phase synchrony values, and revealed an increased 'small world' phenomenon in the frontal region of the brain.

The second part of this dissertation introduces local metrics for identifying influential and anomalous nodes, or nodal regions, in a network. Since feature extraction is widely explored and is an important tool in signal processing, machine learning, and pattern recognition, we extended the well-known notions of energy and entropy to graphs in order to provide a more detailed, multi-scale view of the organization of the network. Laplacian-Huckel graph energy is introduced as a localized measure which quantifies the stability and reliability of local network connectivity and information processing centered on a node. This measure was shown to be a better representation of node centrality than existing measures such as betweenness and closeness centrality. Another new measure, graph entropy rate, was introduced to quantify the local structural complexity of a weighted network. Based on a Markov process modeling of the connectivity matrix, graph entropy rate was demonstrated to be closely related to the topology of a graph. The measures of graph energy and entropy rate revealed greater influences on overall connectivity and complexity by nodes in the frontal region of the brain, primarily located around the FCz electrode, during cognitive control.

The last part of this dissertation introduces a multi-subject clustering algorithm to evaluate multivariate relationships across the brain network. Recent studies have shown that the modular structure, i.e. the clusters, of complex biological networks is indicative of robustness and contributes to functionality by compartmentalizing specific functions within certain cortical regions without perturbing the rest of the network.
Intra-module associations are thought to describe the segregation of information processing, while inter-module associations testify to the integration of information processing across distant brain regions. A key challenge in identifying the modular organization of brain networks is determining a common structure across multiple subjects. Current work either focuses on obtaining the community structure of the average connectivity network or on analyzing each subject individually and obtaining a common modular structure using a voting technique. Averaging neglects the variance across subjects, whereas conventional voting methods either require a priori knowledge of cluster labels or require one to first acquire the modular structure of each subject. We developed a community detection algorithm based on the well-known Fiedler vector. This algorithm was best suited for identifying a common hierarchical modular structure shared across multiple subjects. To further evaluate the quality of the discovered modular structure, we introduced two new measures, homogeneity and completeness, which applied concepts from information theory to quantify the quality of the modules. The algorithm was applied to the phase synchrony data to reveal that the brain was significantly more modular during error processing, particularly in the frontal region of the brain.

5.2 Future Work

There are still remaining challenges in the application of graph theory to the analysis of real-world networks such as the human brain. Some of these challenges and possible solutions include:

• The application of signal processing transforms to graphs: Ortega et al. [106] proposed a lifting wavelet transform on sensor networks for de-noising applications. Lifting wavelet transforms are invertible and would therefore be ideal for decomposing a graph into its high and low frequency components.
If a network is modeled by its frequency spectrum, described by the eigenvalues of the Laplacian matrix, high frequency transitions will occur between clusters and low frequency transitions within clusters. Therefore, the lifting wavelet can be used to filter out the edges contributing to high frequency changes, thus revealing the clusters of the network. However, Ortega's application of lifting wavelets is on sensor networks, for which data is represented by values assigned to the nodes. For the graphs of interest in this dissertation, information is stored on the edges rather than the vertices. Therefore, the lifting wavelet transform should be applied to the line graph of the network. For any given graph $G = (V, E)$, its corresponding line graph $L(G) = (V', E')$ is defined such that $|V'| = |E|$ and there is an edge in $L(G)$ for each pair of edges in $G$ that share a common endpoint, i.e. $\{(i, j) \in E'\} \leftrightarrow \{\exists k \,|\, (k, i) \in E,\ (k, j) \in E\}$. By decomposing the line graph, removing the detail coefficients, and reconstructing the graph from its remaining coefficients, the weights along edges linking clusters should be minimized while weights within clusters are emphasized. An invertible wavelet transform applied to a weighted connectivity matrix would be useful for community detection. The accuracy of spectral clustering using k-means, for example, would increase since the information contained in the eigenvectors would pertain primarily to intra-cluster relationships.

• A coding method for non-tree weighted graphs: The graph entropy rate measure introduced here demonstrated the compressibility of various graph topologies, random networks being the most difficult to compress. However, there does not yet exist a method to compress and reconstruct the original graph based solely on its adjacency or connectivity matrix. Lempel-Ziv coding was applied to the adjacency matrix, but the resulting bits/symbol did not achieve the entropy computed using the proposed graph entropy rate measure.
The Prufer sequence [193], developed for coding labeled tree graphs, may be extended to non-tree graphs. It may be possible to deconstruct the graph into a set of trees, each of which could be individually coded using the Prufer sequence.

• Analysis of dynamic graphs: The modular structure of the brain has up to this point been evaluated for static connectivity matrices. It may be beneficial to evaluate dynamic modular structures of the brain in order to study the evolution of network organization across the brain before and after the occurrence of the error-related negativity. Bai et al. [194] published a study on dynamic brain networks to evaluate the functional connectivity among multiple brain regions during the post-stimulus phase facilitated by acupuncture. Mucha et al. also developed a network modularity quality function that reveals clusters in time-dependent, multi-scale, and multiplex networks [195]. Therefore, there is existing evidence to suggest that dynamic graphs representing the brain network can be used to further understand the organization of the brain during error processing.

• Effective connectivity of weighted graphs: This dissertation has focused on the functional connectivity of brain networks using phase synchrony and graph measures, such that relationships across neural assemblies were modeled as bi-directional and symmetric. However, the directed relationships, i.e. effective connectivity, across neural assemblies should be evaluated to further elucidate the information processing patterns of the human brain. With effective connectivity, symmetry of the connectivity matrix can no longer be assumed, and new modified measures of clustering coefficient, path length, energy, and entropy must be developed to incorporate directional information.

BIBLIOGRAPHY

[1] http://www.neuroscienceblueprint.nih.gov/connectome/, 2012.

[2] O. Sporns, G. Tononi, and R.
Kotter, “The human connectome: A structural description of the human brain,” PLoS Comput Biol, vol. 1, no. 4, p. e42, Sep. 2005.

[3] M. Axer, K. Amunts, D. Grassel, C. Palm, J. Dammers, H. Axer, U. Pietrzyk, and K. Zilles, “A novel approach to the human connectome: Ultra-high resolution mapping of fiber tracts in the brain,” NeuroImage, vol. 54, no. 2, pp. 1091–1101, 2011.

[4] R. S. J. Frackowiak, K. J. Friston, C. Frith, R. Dolan, and C. J. Price, Human Brain Function, 2nd ed. Academic Press, 2003.

[5] G. Tononi and O. Sporns, “Measuring information integration,” BMC Neuroscience, vol. 4, no. 1, pp. 31+, Dec. 2003.

[6] L. Lee, L. M. Harrison, and A. Mechelli, “A report of the functional connectivity workshop,” NeuroImage, 2003.

[7] C. J. Stam, W. de Haan, A. Daffertshofer, B. Jones, I. Manshanden, A. M. van Cappellen van Walsum, T. Montez, J. Verbunt, J. de Munck, B. van Dijk, H. Berendse, and P. Scheltens, “Graph theoretical analysis of magnetoencephalographic functional connectivity in Alzheimer's disease,” Brain, 2009.

[8] M. Rubinov and O. Sporns, “Complex network measures of brain connectivity: Uses and interpretations,” NeuroImage, vol. 52, no. 3, pp. 1059–1069, 2010.

[9] D. Zhou, W. K. Thompson, and G. Siegle, “Matlab toolbox for functional connectivity,” NeuroImage, vol. 47, no. 4, pp. 1590–1607, 2009.

[10] M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, “Functional connectivity in the resting brain: A network analysis of the default mode hypothesis,” PNAS, 2002.

[11] P. Tu, R. L. Buckner, L. Zollei, K. A. Dyckman, D. C. Goff, and D. S. Manoach, “Reduced functional connectivity in a right-hemisphere network for volitional ocular motor control in schizophrenia,” Brain, 2010.

[12] P. L. S. Jacques, F. Dolcos, and R. Cabeza, “Effects of aging on functional connectivity of the amygdala for subsequent memory of negative pictures,” Psychological Science, 2009.

[13] Y. He, Z. Chen, and A.
Evans, “Structural insights into aberrant topological patterns of large-scale cortical networks in Alzheimer's disease,” J Neurosci, pp. 4756–4766, 2008.

[14] Y. He and A. Evans, “Graph theoretical modeling of brain connectivity,” Curr Opin Neurol, 2010.

[15] D. S. Bassett, E. Bullmore, and B. A. Verchinski, “Hierarchical organization of human cortical networks in health and schizophrenia,” J Neurosci, pp. 9239–9248, 2008.

[16] E. van Dellen, L. Douw, and J. C. Baayen, “Long-term effects of temporal lobe epilepsy on local neural networks: a graph theoretical analysis of corticography recordings,” PLoS One, no. 4, p. e8081, 2009.

[17] S. J. Leistedt, N. Coumans, and M. Dumont, “Altered sleep brain functional connectivity in acutely depressed patients,” Human Brain Mapping, no. 30, pp. 2207–2219, 2009.

[18] A. Bondy and U. Murty, Graph Theory. Springer, 2008.

[19] S. Aviyente, E. M. Bernat, W. S. Evans, and S. R. Sponheim, “A phase synchrony measure for quantifying dynamic functional integration in the brain,” Human Brain Mapping, vol. 32, no. 1, pp. 80–93, 2010.

[20] J. R. Hall, E. M. Bernat, and C. J. Patrick, “Externalizing psychopathology and the error-related negativity,” Psychological Science, vol. 18, no. 4, pp. 326–333, 2007.

[21] J. F. Cavanagh, M. X. Cohen, and J. J. B. Allen, “Prelude to and resolution of an error: EEG phase synchrony reveals cognitive control dynamics during action monitoring,” The Journal of Neuroscience, vol. 29, no. 1, pp. 98–105, 2009.

[22] T. K. Smock, Physiological Psychology. Prentice-Hall, 1999.

[23] E. Niedermeyer and F. Lopes da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams and Wilkins, 2004.

[24] J. G. Webster, Medical Instrumentation: Application and Design. John Wiley & Sons, 1998.

[25] E. Pereda, R. Rial, A. Gamundi, and J.
González, “Assessment of changing interdependencies between human electroencephalograms using nonlinear methods,” Physica D: Nonlinear Phenomena, vol. 148, no. 1-2, pp. 147–158, 2001.

[26] A. Rihaczek, “Signal energy distribution in time and frequency,” IEEE Transactions on Information Theory, vol. 14, pp. 369–374, 1968.

[27] O. Sporns, D. Chialvo, M. Kaiser, and C. Hilgetag, “Organization, development and function of complex brain networks,” TRENDS in Cognitive Sciences, vol. 8, no. 9, 2004.

[28] W. W. R. Ball and H. S. M. Coxeter, Mathematical Recreations and Essays, 13th ed. Dover, 1987.

[29] P. Erdős and A. Rényi, “On the evolution of random graphs,” Mathematical Institute of the Hungarian Academy of Sciences, vol. 5, pp. 17–61, 1960.

[30] A. H. Esfahanian, “Algorithmic graph theory,” 2001.

[31] D. J. Watts and S. H. Strogatz, “Collective dynamics of 'small world' networks,” Nature, vol. 393, June 1998.

[32] S. Fortunato and A. Lancichinetti, “Community detection algorithms: a comparative analysis: invited presentation, extended abstract,” in Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools, 2009, pp. 27:1–27:2.

[33] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal Anthropol Res, vol. 33, pp. 452–473, 1977.

[34] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of doubtful sound,” Behav Ecol Sociobiol, vol. 54, pp. 396–405, 2003.

[35] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, 1993.

[36] B. Frey, The North American Airport Network. www.psi.toronto.edu/frey/, 2012.

[37] J. Kayser and C. Tenke, “Principal components analysis of Laplacian waveforms as a generic method for identifying ERP generator patterns: I. Evaluation with auditory oddball tasks,” Clin Neurophysiol, vol. 117, no. 2, pp. 348–368, 2006.
[38] ——, “Principal components analysis of Laplacian waveforms as a generic method for identifying ERP generator patterns: II. Adequacy of low-density estimates,” Clin Neurophysiol, vol. 117, no. 2, pp. 369–380, 2006.

[39] A. MacDonald, J. Cohen, V. Stenger, and C. Carter, “Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control,” Science, vol. 288, no. 5472, p. 1835, 2000.

[40] M. M. Botvinick, T. Braver, D. Barch, C. Carter, and J. Cohen, “Conflict monitoring and cognitive control,” Psychol Rev, vol. 108, no. 3, p. 624, 2001.

[41] J. C. Miller, “Spread of infectious disease through clustered populations,” J. R. Soc. Interface, vol. 6, no. 41, pp. 1121–1134, 2009.

[42] M. M. Botvinick, J. Cohen, and C. Carter, “Conflict monitoring and anterior cingulate cortex: an update,” Trends Cogn Sci, vol. 8, no. 12, pp. 539–546, 2004.

[43] J. W. Brown and T. S. Braver, “A computational model of risk, conflict, and individual difference effects in the anterior cingulate cortex,” Brain Research, vol. 1202, pp. 99–108, 2008.

[44] W. J. Gehring and A. R. Willoughby, Errors, conflicts and the brain. Current opinions on performance monitoring, 2004, pp. 14–20.

[45] E. M. Bernat, W. J. Williams, and W. J. Gehring, “Decomposing ERP time-frequency energy using PCA,” Clin Neurophysiol, vol. 116, no. 6, pp. 1314–1334, 2005.

[46] L. T. Trujillo and J. J. B. Allen, “Theta EEG dynamics of the error-related negativity,” Clin Neurophysiol, vol. 118, no. 3, pp. 645–668, 2007.

[47] E. Bernat, L. Nelson, V. Steele, W. Gehring, and C. Patrick, “Externalizing psychopathology and gain-loss feedback in a simulated gambling task: Dissociable components of brain response revealed by time-frequency analysis,” J Abnorm Psychol, vol. 120, no. 2, pp. 352–364, 2011.

[48] K. Matsumoto, M. Uehara, and H. Mori, “Fault tolerance for small-world cellular neural networks,” Lecture Notes in Computer Science, vol. 5186, pp. 223–231, 2008.

[49] M.
Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, and J. Saramäki, “Small but slow world: How network topology and burstiness slow down spreading,” Phys. Rev. E, vol. 83, p. 025102(R), 2011. [50] M. E. J. Newman, “The structure of scientific collaboration networks,” Proc Natl Acad Sci U S A, vol. 98, no. 2, pp. 404–409, Jan. 2001. [51] B. Kogut and G. Walker, “The small world of Germany and the durability of national networks,” American Sociological Review, vol. 66, no. 3, pp. 317–335, 2001. [52] B. Uzzi and J. Spiro, “Collaboration and creativity: The small world problem,” American Journal of Sociology, vol. 111, no. 2, pp. 447–504, Sep. 2005. [53] O. Sporns and R. Kötter, “Motifs in brain networks,” PLoS Biol, vol. 2, pp. 1910–1918, 2004. [54] O. Sporns and J. D. Zwi, “The small world of the cerebral cortex,” Neuroinformatics, vol. 2, no. 2, pp. 145–162, Jun. 2004. [55] O. Sporns, G. Tononi, and G. Edelman, “Theoretical neuroanatomy: relating anatomical and functional connectivity in graphs and cortical connection matrices,” Cereb Cortex, vol. 10, pp. 127–141, 2000. [56] C. Cherniak, Z. Mokhtarzada, R. Rodriguez-Esteban, and K. Changizi, “Global optimization of cerebral cortex layout,” Proc. Natl. Acad. Sci., vol. 101, pp. 1081–1086, 2004. [57] O. Sporns and G. Tononi, “Classes of network connectivity and dynamics,” Complexity, vol. 7, pp. 28–38, 2002. [58] R. Salvador, J. Suckling, M. Coleman, J. Pickard, D. Menon, and E. Bullmore, “Neurophysiological architecture of functional magnetic resonance images of human brain,” Cereb. Cortex, vol. 15, pp. 1332–1342, 2005. [59] S. Achard, R. Salvador, B. Whitcher, J. Suckling, and E. Bullmore, “A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs,” J. Neurosci, vol. 26, pp. 63–72, 2006. [60] V. Eguiluz, D. Chialvo, G. A. Cecchi, M. Baliki, and A. Apkarian, “Scale-free brain functional networks,” Phys. Rev. Lett., vol. 94, p. 018102, 2005. [61] S.
L. Bressler, “Large-scale cortical networks and cognition,” Brain Research Reviews, vol. 20, no. 3, pp. 288–304, 1995. [62] K. Friston, “Beyond phrenology: what can neuroimaging tell us about distributed circuitry?” Annu. Rev. Neurosci., vol. 25, pp. 221–250, 2002. [63] K. Supekar, V. Menon, D. Rubin, M. Musen, and M. D. Greicius, “Network analysis of intrinsic functional brain connectivity in Alzheimer’s disease,” PLoS Comput Biol, vol. 4, no. 6, p. e1000100, Jun. 2008. [64] S. Micheloyannis, E. Pachou, C. J. Stam, M. Breakspear, P. Bitsios, M. Vourkas, S. Erimaki, and M. Zervakis, “Small-world networks and disturbed functional connectivity in schizophrenia,” Schizophrenia Research, vol. 87, no. 1-3, pp. 60–66, Oct. 2006. [65] K. Yuan, W. Qin, J. Liu, Q. Guo, M. Dong, J. Sun, Y. Zhang, P. Liu, W. Wang, Y. Wang, Q. Li, W. Yang, K. M. von Deneen, M. S. Gold, Y. Liu, and J. Tian, “Altered small-world brain functional networks and duration of heroin use in male abstinent heroin-dependent individuals,” Neuroscience Letters, vol. 477, no. 1, pp. 37–42, 2010. [66] J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski, and J. Kertész, “Generalizations of the clustering coefficient to weighted complex networks,” Phys. Rev. E, vol. 75, no. 2, p. 027105, Feb. 2007. [67] G. Caldarelli, A. Capocci, and D. Garlaschelli, “A self-organized model for network evolution,” The European Physical Journal B - Condensed Matter and Complex Systems, vol. 64, pp. 585–591, 2008. [68] G. Kalna, “A clustering coefficient for weighted networks, with application to gene expression data,” AI Commun., vol. 20, no. 4, pp. 263–271, 2007. [69] J. C. Reijneveld, S. C. Ponten, H. W. Berendse, and C. J. Stam, “The application of graph theoretical analysis to complex networks in the brain,” Clin Neurophysiol, vol. 118, no. 11, pp. 2317–2331, 2007. [70] G. Fagiolo, J. Reyes, and S.
Schiavo, “On the topological properties of the world trade web: A weighted network analysis,” Physica A, vol. 387, no. 15, pp. 3868–3873, 2008. [71] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, “The architecture of complex weighted networks,” Proc. Natl. Acad. Sci. USA, vol. 101, p. 3747, 2004. [72] J.-P. Onnela, J. Saramäki, J. Kertész, and K. Kaski, “Intensity and coherence of motifs in weighted complex networks,” Phys. Rev. E, vol. 71, no. 6, p. 065103, Jun. 2005. [73] C. Yan, G. Gong, J. Wang, D. Wang, D. Liu, C. Zhu, Z. Chen, A. Evans, Y. Zang, and Y. He, “Sex- and brain size-related small-world structural cortical networks in young adults: a DTI tractography study,” Cereb Cortex, vol. 21, no. 2, p. 449, 2011. [74] B. Zhang and S. Horvath, “A general framework for weighted gene co-expression network analysis,” Stat Appl Genet Mol Biol, vol. 4, no. 1, Article 17, 2005. [75] P. Grindrod, “Range-dependent random graphs and their application to modeling large small-world proteome datasets,” Physical Review E, vol. 66, p. 066702, 2002. [76] S. Boyd, P. Diaconis, and L. Xiao, “Fastest mixing Markov chain on a graph,” SIAM Review, vol. 46, no. 4, pp. 667–689, 2004. [77] J. Jarvis and D. Shier, “Graph-theoretic analysis of finite Markov chains,” in Applied Mathematical Modeling: A Multidisciplinary Approach. CRC Press, 1999, ch. 13. [78] B. V. Cherkassky, A. V. Goldberg, and T. Radzik, “Shortest paths algorithms: Theory and experimental evaluation,” Mathematical Programming, vol. 73, pp. 129–174, 1993. [79] B. V. Cherkassky and A. V. Goldberg, “Negative-cycle detection algorithms,” Mathematical Programming, vol. 85, pp. 277–311, 1999. [80] V. Latora and M. Marchiori, “Efficient behavior of small-world networks,” Phys. Rev. Lett., vol. 87, no. 19, p. 198701, Oct. 2001. [81] J. Gao, R. Jin, J. Zhou, J. Xu, X. Jiang, and T.
Wang, “Relational approach for shortest path discovery over large graphs,” in Proceedings of the 38th International Conference on Very Large Data Bases, vol. 5, no. 4, 2011. [82] R. W. Floyd, “Algorithm 97: Shortest path,” Communications of the ACM, vol. 5, no. 6, p. 345, 1962. [83] F. Boesch, “A survey and introduction to network reliability theory,” in IEEE International Conference on Communications: Digital Technology - Spanning the Universe, Conference Record, vol. 2, pp. 678–682, Jun. 1988. [84] A. Shatnawi, M. O. Ahmad, and M. N. S. Swamy, “Optimal scheduling of digital signal processing data-flow graphs using shortest-path algorithms,” The Computer Journal, vol. 45, pp. 88–100, 2002. [85] G. Yu and J. Yang, “On the robust shortest path problem,” Computers Ops Res, vol. 25, no. 6, pp. 457–468, 1998. [86] A. Nagurney and Q. Qiang, “Robustness of transportation networks subject to degradable links,” Europhysics Letters, vol. 80, p. 68001, 2007. [87] A. Bley, F. D’Andreagiovanni, and A. Hanemann, “Robustness in communication networks: Scenarios and mathematical approaches,” in Photonic Networks: Proceedings of the 12th ITG Symposium, 2011, pp. 1–8. [88] M. Manzano, E. Calle, and D. Harle, “Quantitative and qualitative network robustness analysis under different multiple failure scenarios,” in Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), 2011 3rd International Congress on, Oct. 2011, pp. 1–7. [89] E. F. Zipkin, C. S. Jennelle, and E. G. Cooch, “A primer on the application of Markov chains to the study of wildlife disease dynamics,” Methods in Ecology and Evolution, vol. 1, no. 2, pp. 192–198, 2010. [90] J. G. Kemeny and J. L. Snell, Finite Markov Chains. Springer Berlin / Heidelberg, 1976. [91] L. Fleming and M. Marx, “Managing innovation in small worlds,” Engineering Management Review, IEEE, vol. 37, no. 4, pp. 90–92, 2009. [92] Y. Yang and X.
Yu, “Weighted small world complex networks: smart sliding mode control,” in Proceedings of the 5th International Conference on Intelligent Computing: Emerging Intelligent Computing Technology and Applications, 2009, pp. 935–944. [93] V. Latora and M. Baranger, “Kolmogorov-Sinai entropy rate versus physical entropy,” Physical Review Letters, vol. 82, no. 3, 1999. [94] C. Stam, B. Jones, G. Nolte, M. Breakspear, and P. Scheltens, “Small-world networks and functional connectivity in Alzheimer’s disease,” Cerebral Cortex, 2006. [95] B. C. M. van Wijk, C. J. Stam, and A. Daffertshofer, “Comparing brain networks of different size and connectivity density using graph theory,” PLoS ONE, vol. 5, no. 10, p. e13701, Oct. 2010. [96] J. Kim, “Implementation and performance evaluation of mobile ad hoc network for emergency telemedicine system in disaster areas,” IEEE Engineering in Medicine and Biology Society Conference, pp. 1663–1666, 2009. [97] M. He and S. Petoukhov, Mathematics of Bioinformatics: Theory, Methods and Applications. Wiley, 2011. [98] R. S. Choras, “Image feature extraction techniques and their applications for CBIR and biometrics systems,” International Journal of Biology and Biomedical Engineering, vol. 1, no. 1, 2007. [99] H. Liu, G. Chen, S. Jiang, and G. Song, “A survey of feature extraction approaches in analog circuit fault diagnosis,” in Proceedings of the 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application - Volume 02, 2008, pp. 676–680. [100] P. Sanz, R. Marin, and J. Sanchez, “Pattern recognition for autonomous manipulation in robotic systems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, pp. 1–3, 2005. [101] H. Zhao, S. Sun, Z. Jing, and J. Yang, “Local structure based supervised feature extraction,” Pattern Recognition, vol. 39, no. 8, pp. 1546–1550, 2006. [102] D. Calvanese, G. D. Giacomo, and M. Lenzerini, “Extending semi-structured data,” Proc. of the 6th Italian Conf.
on Database Systems (SEBD’98), 1998. [103] S. Zafeiriou, “Exploiting discriminant information in elastic graph matching,” Image Processing, IEEE International Conference on, vol. 3, pp. 768–771, 2006. [104] H. Fei and J. Huan, “Structure feature selection for graph classification,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 991–1000. [105] B. A. Miller, N. T. Bliss, and P. J. Wolfe, “Toward signal processing theory for graphs and non-Euclidean data,” Acoustics Speech and Signal Processing (ICASSP), IEEE International Conference on, pp. 5414–5417, 2010. [106] G. Shen and A. Ortega, “Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting,” ICASSP, pp. 2513–2516, 2008. [107] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral, “The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles,” Proc Natl Acad Sci U S A, vol. 102, no. 22, pp. 7794–7799, 2005. [108] M. E. J. Newman, Networks: An Introduction. Oxford University Press, 2010. [109] L. C. Freeman, S. P. Borgatti, and D. R. White, “Centrality in valued graphs: A measure of betweenness based on network flow,” Social Networks, vol. 13, no. 2, pp. 141–154, 1991. [110] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted networks: Generalizing degree and shortest paths,” Social Networks, vol. 32, no. 3, p. 245, 2010. [111] P. Bonacich, “Some unique properties of eigenvector centrality,” Social Networks, vol. 29, no. 4, pp. 555–564, 2007. [112] K. Pearson, “On lines and planes of closest fit to systems of points in space,” Philosophical Magazine, vol. 2, no. 6, pp. 559–572, 1901. [113] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, pp. 895–900, 2005. [114] I. Gutman and B. Zhou, “Laplacian energy of a graph,” Linear Algebra and its Applications, vol. 414, no. 1, pp.
29–37, 2006. [115] J. Day and W. So, “Graph energy change due to edge deletion,” Linear Algebra and its Applications, vol. 428, no. 8-9, pp. 2070–2078, 2008. [116] G. Blyholder and C. Coulson, “Basis of extended Hückel formalism,” Theoretical Chemistry Accounts: Theory, Computation, and Modeling, vol. 10, pp. 316–324, 1968. [117] R. Balakrishnan, “The energy of a graph,” Linear Algebra and its Applications, vol. 387, pp. 287–295, 2004. [118] I. T. Jolliffe, Principal Component Analysis. Springer, 2002. [119] C. Allefeld, “Eigenvalue decomposition as a generalized synchronization cluster analysis,” International Journal of Bifurcation and Chaos, vol. 17, no. 10, pp. 3493–3497, 2008. [120] B. Mohar, “The Laplacian spectrum of graphs,” in Graph Theory, Combinatorics, and Applications, vol. 2. Wiley, 1991, pp. 871–898. [121] J. van den Heuvel and S. Pejić, “Using Laplacian eigenvalues and eigenvectors in the analysis of frequency assignment problems,” Annals of Operations Research, vol. 107, pp. 349–368, 2001. [122] T. Cover and J. Thomas, Elements of Information Theory. Wiley, 2006. [123] J. Körner, “Coding of an information source having ambiguous alphabet and the entropy of graphs,” Transactions of the 6th Prague Conference on Information Theory, 1973, pp. 411–425. [124] M. Dehmer, “Information processing in complex networks: Graph entropy and information functionals,” Appl Math Comput, vol. 201, no. 1-2, pp. 82–94, 2008. [125] J. Shetty, “Discovering important nodes through graph entropy: The case of Enron email database,” in Proceedings of the 3rd International Workshop on Link Discovery (KDD). ACM Press, 2005, pp. 74–81. [126] L. Demetrius and T. Manke, “Robustness and network evolution: an entropic principle,” Physica A, vol. 346, pp. 682–696, 2005. [127] Z. Burda, J. Duda, J. M. Luck, and B. Waclaw, “Localization of the maximal entropy random walk,” Physical Review Letters, vol. 102, 2009. [128] R. Sinatra, J. Gómez-Gardeñes, R. Lambiotte, V. Nicosia, and V.
Latora, “Maximal-entropy random walks in complex networks with limited information,” Phys. Rev. E, vol. 83, p. 030103, Mar. 2011. [129] T. Milenkovic, V. Memisevic, A. Bonato, and N. Przulj, “Dominating biological networks,” PLoS ONE, vol. 6, no. 8, p. e23016, 2011. [130] Y. Choi and W. Szpankowski, “Compression of graphical structures,” Information Theory, IEEE Transactions on, vol. 58, no. 2, pp. 620–638, 2012. [131] T. Pevny, P. Bas, and J. Fridrich, “Steganalysis by subtractive pixel adjacency matrix,” Information Forensics and Security, IEEE Transactions on, vol. 5, no. 2, pp. 215–224, 2010. [132] K. Sayood, Introduction to Data Compression, 3rd ed. Morgan Kaufmann, 2005. [133] T. S. Huang, “Face detection with information-based maximum discrimination,” Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, p. 782, 1997. [134] R. Rosen, “Matrix bandwidth minimization,” Proceedings of the 1968 23rd ACM National Conference, pp. 585–595, 1968. [135] R. Marti, V. Campos, and E. Pinana, “A branch and bound algorithm for the matrix bandwidth minimization,” European Journal of Operational Research, vol. 186, no. 2, pp. 513–528, 2008. [136] W. F. Tinney and J. W. Walker, “Direct solution of sparse network equations by optimally ordered triangular factorization,” Proc. IEEE, vol. 55, no. 11, pp. 1801–1809, 1967. [137] G. Fung, “A comprehensive overview of basic clustering algorithms,” A Review, 2001. [138] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, 1996. [139] S. van Dongen, “Graph clustering by flow simulation,” Ph.D. dissertation, University of Utrecht, 2000. [140] C. Ding and X. He, “Linearized cluster assignment via spectral ordering,” Proceedings of the 21st International Conference on Machine Learning, 2004. [141] E. Cuthill and J.
McKee, “Reducing the bandwidth of sparse symmetric matrices,” Proceedings of the 1969 24th ACM National Conference, pp. 157–172, 1969. [142] M. Lin, Z. Lin, and J. Xu, “Graph bandwidth of weighted caterpillars,” Theoretical Computer Science, vol. 363, no. 3, pp. 266–277, 2006. [143] M. Benzi, D. B. Szyld, and A. van Duin, “A study of different orderings for incomplete factorization preconditioning of nonsymmetric linear systems,” Computational Mechanics: New Trends and Applications, 1998. [144] W. S. Yeo and J. Berger, “Application of raster scanning method to image sonification, sound visualization, sound analysis and synthesis,” Proc. of the 9th Int. Conference on Digital Audio Effects, 2006. [145] S. Kamata, M. Niimi, and E. Kawaguchi, “A method of interactive analysis for multidimensional images using a Hilbert curve,” Trans. IEICE, vol. J77, pp. 1255–1264, 1994. [146] G. M. Morton, “A computer oriented geodetic data base; and a new technique in file sequencing,” Technical Report for IBM Ltd., 1966. [147] D. Hilbert, “Über die stetige Abbildung einer Linie auf ein Flächenstück,” Mathematische Annalen, vol. 38, pp. 459–460, 1891. [148] E. Artyomov, Y. Rivenson, G. Levi, and O. Yadid-Pecht, “Morton (Z) scan based real-time variable resolution CMOS image sensor,” Electronics, Circuits and Systems (ICECS), Proceedings of the 2004 11th IEEE International Conference on, pp. 145–148, 2004. [149] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Trans Inform Theory, vol. 28, no. 2, pp. 129–137, 1982. [150] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Trans Inform Theory, vol. 24, no. 5, pp. 530–536, 1978. [151] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings Natl Acad Sci USA, vol. 99, no. 12, pp. 7821–7826, Jun. 2002. [152] M. E. J. Newman, “Detecting community structure in networks,” The European Physical Journal B - Condensed Matter and Complex Systems, vol. 38, no. 2, pp. 321–330, Mar.
2004. [153] G. W. Flake, S. Lawrence, C. L. Giles, and F. M. Coetzee, “Self-organization and identification of web communities,” IEEE Computer, vol. 35, pp. 66–71, 2002. [154] M. Chavez, M. Valencia, V. Navarro, V. Latora, and J. Martinerie, “Functional modularity of background activities in normal and epileptic brain networks,” Phys. Rev. Lett., vol. 104, no. 11, p. 118701, Mar. 2010. [155] M. Kirschner and J. Gerhart, “Evolvability,” Proceedings of the National Academy of Sciences, vol. 95, no. 15, pp. 8420–8427, 1998. [156] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences, vol. 103, no. 23, p. 8577, 2006. [157] V. Satuluri and S. Parthasarathy, “Scalable graph clustering using stochastic flows: applications to community discovery,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’09. New York, NY, USA: ACM, 2009, pp. 737–746. [158] T. Young and I. Chiang, “A novel clustering method of high dimensional data,” in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. Springer Berlin / Heidelberg, 2007, vol. 4482, pp. 256–262. [159] X. Otazu and O. Pujol, “Wavelet based approach to cluster analysis: application on low dimensional data sets,” Pattern Recogn. Lett., vol. 27, pp. 1590–1605, Oct. 2006. [160] V. Carchiolo, A. Longheu, M. Malgeri, and G. Mangioni, “Search for overlapped communities by parallel genetic algorithms,” CoRR, vol. abs/0912.0913, 2009. [161] S. White and P. Smyth, “A spectral clustering approach to finding communities in graphs,” SIAM International Conference on Data Mining, pp. 76–84, 2005. [162] J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam and J. Neyman, Eds., vol. 1. University of California Press, 1967, pp. 281–297. [163] R. Baumgartner, L. Ryner, W. Richter, R.
Summers, M. Jarmasz, and R. Somorjai, “Comparison of two exploratory data analysis methods for fMRI: fuzzy clustering vs. principal component analysis,” Magnetic Resonance Imaging, vol. 18, no. 1, pp. 89–94, 2000. [164] A. Meyer-Baese, A. Wismueller, and O. Lange, “Comparison of two exploratory data analysis methods for fMRI: unsupervised clustering versus independent component analysis,” Information Technology in Biomedicine, IEEE Transactions on, vol. 8, no. 3, pp. 387–398, 2004. [165] U. von Luxburg, “A tutorial on spectral clustering,” CoRR, vol. abs/0711.0189, 2007. [166] J. Forman, P. Clemons, S. Schreiber, and S. Haggarty, “SpectralNET - an application for spectral graph analysis and visualization,” BMC Bioinformatics, vol. 6, no. 1, p. 260, 2005. [167] D. Meunier, R. Lambiotte, A. Fornito, K. D. Ersche, and E. T. Bullmore, “Hierarchical modularity in human brain functional networks,” Frontiers in Neuroinformatics, vol. 3, no. 37, 2009. [168] B. Fischer and J. Buhmann, “Bagging for path-based clustering,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 11, pp. 1411–1415, 2003. [169] I. Rustandi, “Hierarchical Gaussian naive Bayes classifier for multiple-subject fMRI data,” in NIPS Workshop: New Directions on Decoding Mental States from fMRI Data, 2006. [170] W. D. Penny and A. J. Holmes, “Random effects analysis,” Imaging, vol. 2, no. 1, pp. 1–12, 2004. [171] W. Tang, Z. Lu, and I. S. Dhillon, “Clustering with multiple graphs,” ICDM ’09, Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, pp. 1016–1021, 2009. [172] H. Wang, H. Shan, and A. Banerjee, “Bayesian cluster ensembles,” Statistical Analysis and Data Mining, vol. 4, pp. 54–70, 2011. [173] S. E. Schaeffer, “Graph clustering,” Computer Science Review, vol. 1, no. 1, pp. 27–64, 2007. [174] S. Gregory, “An algorithm to find overlapping community structure in networks,” Lecture Notes in Computer Science, vol. 4702, p. 91, 2007. [175] S. Chauhan, M.
Girvan, and E. Ott, “Spectral properties of networks with community structure,” Phys. Rev. E, vol. 80, no. 5, p. 056114, Nov. 2009. [176] A. Arenas, A. Diaz-Guilera, and C. J. Perez-Vicente, “Synchronization reveals topological scales in complex networks,” Physical Review Letters, vol. 96, p. 114102, 2006. [177] H. Shen and X. Cheng, “Spectral methods for the detection of network community structure: a comparative analysis,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2010, no. 10, p. P10020, 2010. [178] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Phys. Rev. E, vol. 74, no. 3, p. 036104, 2006. [179] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems. MIT Press, 2001, pp. 849–856. [180] Y. Lin, K. Desouza, and S. Roy, “Measuring agility of networked organizational structures via network entropy and mutual information,” Applied Mathematics and Computation, vol. 216, pp. 2824–2836, 2010. [181] M. Holzrichter and S. Oliveira, “A graph based method for generating the Fiedler vector of irregular problems,” in Lecture Notes in Computer Science, 1999, pp. 978–985. [182] N. Nguyen and R. Caruana, “Consensus clusterings,” Data Mining, Seventh IEEE International Conference on, pp. 607–612, 2007. [183] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hypergraph partitioning: Application in VLSI domain,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 7, no. 1, pp. 69–79, 1999. [184] V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri, “Extending modularity definition for directed graphs with overlapping communities,” arXiv e-print, vol. 801. [185] M. K. Goldberg, M. Hayvanovych, and M. Magdon-Ismail, “Measuring similarity between sets of overlapping clusters,” IEEE Second International Conference on Social Computing, pp. 303–308, 2010. [186] M. C. V. Nascimento and A. C. P. L. F.
de Carvalho, “Spectral methods for graph clustering - a survey,” European Journal of Operational Research, vol. 211, no. 2, pp. 221–231, 2011. [187] S. Yu, X. Liu, L. C. Tranchevent, W. Glänzel, J. A. K. Suykens, B. De Moor, and Y. Moreau, “Optimized data fusion for k-means Laplacian clustering,” Bioinformatics, vol. 27, pp. 118–126, Jan. 2011. [188] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010. [189] C. J. van Rijsbergen, “Information retrieval,” Journal of the American Society for Information Science, vol. 30, no. 6, pp. 374–375, 1979. [190] A. Rosenberg and J. Hirschberg, “V-measure: A conditional entropy-based external cluster evaluation measure,” Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 410–420, 2007. [191] J. Lin, “Divergence measures based on the Shannon entropy,” IEEE Transactions on Information Theory, vol. 37, no. 1, 1991. [192] M. Bolaños, S. Aviyente, and E. Bernat, “Graph analysis of neuronal interactions for the error-related negativity,” IEEE Engineering in Medicine and Biology Conference, 2010. [193] H. Prüfer, “Neuer Beweis eines Satzes über Permutationen,” Arch. Math. Phys., vol. 27, pp. 742–744, 1918. [194] L. Bai, W. Qin, J. Tian, J. Dai, and W. Yang, “Detection of dynamic brain networks modulated by acupuncture using a graph theory model,” Progress in Natural Science, vol. 19, no. 7, pp. 827–835, 2009. [195] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, “Community structure in time-dependent, multiscale, and multiplex networks,” Science, vol. 328, no. 5980, pp. 876–878, 2010.