NETWORK ANALYSIS WITH NEGATIVE LINKS

By

Tyler Scott Derr

A DISSERTATION

Submitted to

Michigan State University

in partial fulfillment of the requirements

for the degree of

Computer Science – Doctor of Philosophy

2020

ABSTRACT

NETWORK ANALYSIS WITH NEGATIVE LINKS

By

Tyler Scott Derr

As we rapidly continue into the information age, the rate at which data is produced has created
an unprecedented demand for novel methods to effectively extract insightful patterns. With such
methods, we can seek to understand the past, make predictions about the future, and ultimately take
actionable steps towards improving our society. Because much of today’s big data can be
represented as graphs, increasing emphasis is being placed on harnessing the natural structure of
data through network analysis. Traditionally, network analysis has focused on networks having only
positive links, or unsigned networks. However, in many real-world systems, relations between nodes
in a graph can be both positive and negative, resulting in signed networks. For example, in online
social media, users not only have positive links such as friends, followers, and those they trust,
but can also establish negative links to those they distrust, to their foes, or to users they
block and unfriend.

Thus, although signed networks are ubiquitous due to their ability to represent negative links
in addition to positive links, they have been significantly underexplored. In addition, the
rise in popularity of today’s social media and increased polarization online have led to both
increased attention to, and demand for, advanced methods that perform the typical network analysis
tasks while also taking negative links into consideration. More specifically, there is a need for
methods that can measure, model, mine, and apply signed networks by harnessing both positive
and negative relations. However, this raises novel challenges, as the properties and principles of
negative links are not necessarily the same as those of positive links, and furthermore the social
theories that have been used in unsigned networks might not apply with the inclusion of negative
links.

The chief objective of this dissertation is first to analyze the distinct properties negative links
have as compared to positive links, and then to improve network analysis with negative links by
researching the utility of, and how to harness, social theories that have been established in a
holistic view of networks containing both positive and negative links. We discover that simply
extending unsigned network analysis is typically not sufficient and that, although the existence of
negative links introduces numerous challenges, they also provide unprecedented opportunities for
advancing the frontier of the network analysis domain. In particular, we develop advanced methods in
signed networks for measuring node relevance and centrality (i.e., signed network measuring);
present the first generative signed network model and extend and analyze balance theory in signed
bipartite networks (i.e., signed network modeling); construct the first signed graph convolutional
network, which learns node representations that can achieve state-of-the-art prediction
performance, and introduce the novel idea of transformation-based network embedding (i.e., signed
network mining); and apply signed networks by creating a framework that can infer both link and
interaction polarity levels in online social media and by constructing an advanced, comprehensive
congressional vote prediction framework built around harnessing signed networks.

To my wife, parents, siblings, and entire family for their love and support.


ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Dr. Jiliang Tang, for his guidance,
encouragement, inspiration, and support throughout my Ph.D. I have learned so much from him during
these years, ranging from how to find good research problems, write papers, and design
presentations, to ways of efficiently managing a research lab and mentoring students. He has worked
tirelessly to provide me with countless opportunities and learning experiences, which have led to my
growth as an independent researcher. I could not imagine having a better mentor, feel honored to
have been his student, and will always appreciate his suggestions and advice. He is also a dear
friend and has guided me with his knowledge and experience in both my work and life. Dr. Tang,
I cannot thank you enough.

I would like to thank my committee members, Dr. Charu Aggarwal, Dr. Kenneth Frank, Dr.
Anil Jain, Dr. Pang-Ning Tan, and Dr. Kaitlin Torphy, for their insightful comments and helpful
suggestions. I first met Dr. Charu Aggarwal at the SIAM International Conference on Data Mining
in 2017, and since then he has provided invaluable insights and guidance for much of my research
in signed networks. I first met Dr. Kenneth Frank when Dr. Tang and I gave a guest lecture on
signed network analysis in Dr. Frank’s social network course. He has opened my eyes to social
network analysis from a computational social science perspective, and this has better equipped me
as an interdisciplinary researcher, especially for applying data science techniques to social
science problems. Dr. Anil Jain greatly helped me during my preparation for my faculty job market
interviews by providing numerous tips based on his many experiences while also helping to broaden
my perspectives on research. Although I have not had the pleasure of taking Dr. Pang-Ning Tan’s
data mining course, I have found myself continuously referencing his Introduction to Data Mining
textbook throughout my Ph.D. studies. Through discussions with Dr. Tan I was also able to
strengthen the motivation for some of the presented methods and now have numerous new ideas in
mind to continue my research beyond this dissertation. I met Dr. Kaitlin Torphy through the
Teachers in Social Media Project at MSU. I have learned a lot from Dr. Torphy in the domain of
education research and she has mentored me on some topics in educational data mining. I look
forward to a continued collaboration with them all after the completion of my dissertation.

I was fortunate to have been an intern at HRL Laboratories. It was here that I had the privilege
of having two amazing mentors: Dr. Kang-Yu Ni and Dr. Jiejun Xu. It was thanks to you both that
I gained insights into new problems and techniques, as well as valuable domain knowledge, that have
helped shape my research trajectory. Thank you both for everything. I would also like to thank Dr.
Tsai-Ching Lu for his insightful comments during my time at HRL.

I joined the Data Science and Engineering (DSE) Lab at the end of the Fall 2016 semester as
Dr. Jiliang Tang’s first Ph.D. student when he was establishing the lab. During my Ph.D. study, I
have had the pleasure and fortune of having supportive and encouraging friends and colleagues. I
am thankful to all my collaborators from outside the DSE Lab: Dr. Charu Aggarwal, Dr. Kevin
Chen-Chuan Chang, Dr. Yi Chang, Pouya Esmalian, Dr. Kenneth Frank, Amin Javari, Dr. Qing
Li, Dr. Hui Liu, Dr. Zitao Liu, Dr. Kaitlin Torphy, Dr. Jianping Wang, Dr. Lingfei Wu, Dr.
Jiejun Xu, and Dr. Dawei Yin. I am thankful to all of my colleagues from the DSE Lab: Ibrahim
Ahmed, Dr. Meznah Almutairy, Aaron Brookhouse, Jamell Dacon, Wenqi Fan, Bryan Hendryx,
Dr. Jiangtao Huang, Wei Jin, Cassidy Johnson, Hamid Karimi, Juanhui Li, Yaxin Li, Haochen Liu,
Hua Liu, Xiaorui Liu, Yao Ma, Mitansh Madan, Daniel K. Ofori-Dankwa, Namratha Shah, Hannah
Striebel, Pegah Varghaei, Chenxing Wang, Wentao Wang, Xiaoyang Wang, Xin Wang, Yiqi Wang,
Zhiwei Wang, Han Xu, Hansheng Zhao, and Xiangyu Zhao. In particular, thanks to Zhiwei Wang
for collaboration on my first co-author paper; thanks to Chenxing Wang and Dr. Suhang Wang for
collaboration on my first first-author paper; and thanks to my DSE collaborators Aaron Brookhouse,
Jamell Dacon, Wenqi Fan, Dr. Jiangtao Huang, Wei Jin, Cassidy Johnson, Hamid Karimi, Haochen
Liu, Xiaorui Liu, Yao Ma, Chenxing Wang, Wentao Wang, Yiqi Wang, Zhiwei Wang, and Xiangyu
Zhao. I will remember staying awake to meticulously revise our papers until the last moment before
the deadlines. During my time in the DSE Lab I have learned so much from you all. I look forward
to continued collaboration and seeing all your future great achievements; I know first-hand you are
all in good hands having Dr. Jiliang Tang as an advisor, mentor, and friend.

Finally, I would again like to thank my wife, parents, siblings, and entire family for their love
and support. This dissertation is dedicated to them.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

CHAPTER 1 INTRODUCTION
    1.1 Research Challenges
    1.2 Contributions
    1.3 Organization

CHAPTER 2 FOUNDATIONS AND PRELIMINARIES
    2.1 Basic Notations and Definitions
    2.2 Unsigned Network Properties and Theories
        2.2.1 Degree Distribution and Network Density
        2.2.2 Network Reciprocity
        2.2.3 Transitivity and Clustering Coefficient
        2.2.4 Network Homophily
    2.3 Signed Network Datasets, Properties, and Theories
        2.3.1 Signed Network Datasets
        2.3.2 Data Analysis on Signed Networks Properties
        2.3.3 Signed Network Theories
            2.3.3.1 Balance Theory in Signed Networks
            2.3.3.2 Status Theory in Signed Networks

CHAPTER 3 MEASURING NETWORKS WITH NEGATIVE LINKS
    3.1 Node Relevance Measurements in Signed Networks
        3.1.1 Local Methods
            3.1.1.1 Common Neighbors
            3.1.1.2 Jaccard Index
            3.1.1.3 Preferential Attachment
        3.1.2 Global Methods
            3.1.2.1 Katz
            3.1.2.2 Asymmetric Similarity Measure for Weighted Networks
            3.1.2.3 Random Walk with Restart
        3.1.3 Experiments
            3.1.3.1 Sign Prediction
            3.1.3.2 Tie Strength Prediction
    3.2 Node Centrality Measurement in Signed Networks
        3.2.1 An Overview of Deep Signed Centrality (DeSCent) Measurement
        3.2.2 Signed Centrality Measurement Objective Function
            3.2.2.1 Signed Centrality Based on Status Theory
            3.2.2.2 Harnessing Balance Theory and Higher-order Structures
            3.2.2.3 Additional DeSCent Measurement Constraints
        3.2.3 Overall DeSCent Deep Network Framework
        3.2.4 Experiments
            3.2.4.1 Performance Comparison
            3.2.4.2 Generalization Across Datasets
            3.2.4.3 Parameter Analysis

CHAPTER 4 MODELING NETWORKS WITH NEGATIVE LINKS
    4.1 Generative Modeling of Signed Networks
        4.1.1 Problem Statement
        4.1.2 An Overview of Balanced Signed Chung-Lu (BSCL) Model
        4.1.3 Network Generation for BSCL
        4.1.4 Parameter Learning for BSCL
            4.1.4.1 Learning
            4.1.4.2 Learning
            4.1.4.3 Learning
        4.1.5 Time Complexity of BSCL
        4.1.6 Experiments
            4.1.6.1 Network Generation Experiment
            4.1.6.2 Parameter Learning Experiment
    4.2 Balance in Signed Bipartite Networks
        4.2.1 Balance Theory in Signed Bipartite Networks
            4.2.1.1 Signed Bipartite Network Datasets
            4.2.1.2 Signed Butterflies in Signed Bipartite Networks
            4.2.1.3 Signed Butterfly Isomorphism Classes
            4.2.1.4 Signed Butterfly Analysis
            4.2.1.5 Signed Caterpillars in Bipartite Networks
        4.2.2 Sign Prediction for Signed Bipartite Networks
            4.2.2.1 Signed Caterpillars Based Classifier
            4.2.2.2 Low-Rank Sign Prediction
            4.2.2.3 Random Walk Based Sign Prediction
        4.2.3 Experiments
            4.2.3.1 Comparison Results
            4.2.3.2 Parameter Analysis

CHAPTER 5 MINING NETWORKS WITH NEGATIVE LINKS
    5.1 Signed Graph Convolutional Networks
        5.1.1 Problem Statement
        5.1.2 The Proposed Signed Graph Convolutional Network Framework
            5.1.2.1 Unsigned Graph Convolutional Networks
            5.1.2.2 Aggregation Paths with Positive and Negative Links
            5.1.2.3 Signed Graph Convolutional Network
        5.1.3 Experiments
            5.1.3.1 Performance Comparison
            5.1.3.2 Parameter Analysis
    5.2 Role-based Signed Network Embedding
        5.2.1 Problem Statement
            5.2.1.1 Unsigned Network Embedding
            5.2.1.2 Signed Network Embedding
        5.2.2 Role-based Signed Network Embedding
            5.2.2.1 Network Transformation
            5.2.2.2 Embedding the Original Network
            5.2.2.3 Model Justification
        5.2.3 Experiments
            5.2.3.1 Performance Comparison
            5.2.3.2 Interpretation of the Encodings of Role-nodes:

CHAPTER 6 SIGNED NETWORK APPLICATIONS
    6.1 Link and Interaction Polarity Prediction
        6.1.1 Problem Statement
        6.1.2 Signed Network User Opinion Data Analysis
            6.1.2.1 Extended Epinions Dataset
            6.1.2.2 Correlated User Opinions: A Global Perspective
            6.1.2.3 Correlated User Opinions: A Local Perspective
        6.1.3 The Joint Link and Interaction Polarity Prediction (LIP) Framework
            6.1.3.1 Basic Link and Interaction Polarity Prediction Models
            6.1.3.2 Modeling User Opinion Correlations
            6.1.3.3 The Proposed Joint Framework
            6.1.3.4 An Optimization Method for LIP
        6.1.4 Experiments
            6.1.4.1 Sparsity Experiments
            6.1.4.2 Cold-Start Experiments
            6.1.4.3 Experiment Discussions
            6.1.4.4 Parameter Analysis
    6.2 Congressional Vote Prediction
        6.2.1 Problem Statement
        6.2.2 Overview of Multi-factor Congressional Vote Prediction (MFCVP)
        6.2.3 Ideology Factors of MFCVP
        6.2.4 Social Factors of MFCVP
            6.2.4.1 Party Features
            6.2.4.2 Signed Bipartite Network Features
        6.2.5 Classification Details of MFCVP
        6.2.6 Experiments
            6.2.6.1 Dataset and Data Collection
            6.2.6.2 Experimental Settings
            6.2.6.3 Individual Representative Vote Predictions
            6.2.6.4 Overall Roll-call Vote Outcome Predictions
            6.2.6.5 Political Factor Analysis

CHAPTER 7 CONCLUSION AND FUTURE DIRECTIONS
    7.1 Summary
    7.2 Future Directions

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Statistics of four signed social networks.
Table 2.2: Probability of links being reciprocal in signed social networks.
Table 2.3: Tie strengths of positive and negative links in signed social networks.
Table 3.1: Notations regarding node relevance in signed networks.
Table 3.2: Performance comparison of link prediction under the undirected setting.
Table 3.3: Performance comparison of link prediction under the directed setting.
Table 3.4: Performance comparison of tie strength prediction under the undirected setting.
Table 3.5: Performance comparison of tie strength prediction under the directed setting.
Table 3.6: Signed link prediction results with AUC.
Table 4.1: Notations regarding signed network generative modeling.
Table 4.2: Statistics of three signed social networks for generative modeling.
Table 4.3: Positive/negative link sign distribution.
Table 4.4: Proportion of triangles balanced in generated signed networks.
Table 4.5: Distribution of signed triangle types in the Bitcoin-Alpha dataset.
Table 4.6: Distribution of signed triangle types in the Bitcoin-OTC dataset.
Table 4.7: Distribution of signed triangle types in the Epinions dataset.
Table 4.8: Absolute difference from the generated networks to the real signed networks averaged over the three datasets for each respective property.
Table 4.9: Notations regarding signed bipartite networks.
Table 4.10: Statistics on signed bipartite networks.
Table 4.11: Signed butterfly statistics on signed bipartite networks.
Table 4.12: Link sign prediction results in terms of (AUC,F1).
Table 5.1: Notations in regards to signed graph convolutional networks.
Table 5.2: Statistics of two signed network dataset variants for SGCN.
Table 5.3: Link sign prediction results with AUC.
Table 5.4: Link sign prediction results with F1.
Table 5.5: Statistics of three signed network dataset variants for ROSE.
Table 5.6: AUC of the proposed model (ROSE) and the baseline methods on the Wiki-Election, Slashdot and Epinions datasets.
Table 6.1: Extended Epinions dataset statistics.
Table 6.2: Interaction polarity prediction cold-start results.
Table 6.3: Link prediction cold-start results.
Table 6.4: Notations regarding congressional vote prediction.
Table 6.5: US Congress dataset statistics.

LIST OF FIGURES

Figure 1.1: A visualization of an unsigned and signed network.
Figure 1.2: An overview of the research contributions presented in this dissertation.
Figure 2.1: Degree distributions in signed social networks.
Figure 2.2: Visualizing balance theory in the form of signed triangles.
Figure 3.1: Triplets encountered during signed random walk.
Figure 3.2: An illustration of our deep neural network for learning signed centrality scores.
Figure 3.3: An illustration of how we calculate the matrices T+, T+, T−, and T−.
Figure 3.4: Signed link prediction performance comparison of within versus cross training.
Figure 3.5: Analyzing the signed centrality additional constraints on Bitcoin-Alpha.
Figure 4.1: Visualization of the degree distributions and local clustering coefficients.
Figure 4.2: BSCL parameter learning analysis.
Figure 4.3: Undirected signed butterfly isomorphism classes.
Figure 4.4: High-level overview of how we construct A from B, P and P.
Figure 4.5: Parameter sensitivity on  and  in MFwBT on the U.S. Senate dataset.
Figure 4.6: Parameter sensitivity on  and  in SBRW on the U.S. House dataset.
Figure 5.1: An illustration of the aggregation paths according to balanced and unbalanced paths.
Figure 5.2: An illustration of how SGCN aggregates neighbor information in a signed network.
Figure 5.3: Parameter sensitivity when varying the parameter  on the Bitcoin-Alpha dataset.
Figure 5.4: Transformation of a signed network with two nodes to an unsigned bipartite network of role-nodes.
Figure 5.5: Transformation process of the input signed network to the network of role-node.
Figure 5.6: The average pairwise distance of the encoding vectors of the role-nodes of a node pair (, ) for different interaction-types between them: positive link, negative link, and absence of a link.
Figure 6.1: Giving and receiving behaviors from the global perspective on opinion correlations.
Figure 6.2: Giving and receiving behaviors from the local perspective on opinion correlations.
Figure 6.3: Experimental results with varied sparsity settings.
Figure 6.4: Performance variations of LIP on the 90% data sparsity experiment w.r.t.  and .
Figure 6.5: The proposed Multi-Factor Congressional Vote Prediction (MFCVP) framework.
Figure 6.6: Illustrations of the signed bipartite network features.
Figure 6.7: Performance evaluation of MFCVP predicting individual representative votes.
Figure 6.8: Performance evaluation of MFCVP predicting the overall roll-call vote outcome.
Figure 6.9: Feature analysis using the feature importance values from MFCVP_RF.

LIST OF ALGORITHMS

Algorithm 3.1: Optimization procedure for DeSCent.
Algorithm 4.1: Balanced Signed Chung-Lu (BSCL) model.
Algorithm 4.2: BSCL_Network_Generation(, , , , , ).
Algorithm 5.1: Typical unsigned GCN framework.
Algorithm 5.2: Signed Graph Convolutional Network (SGCN) embedding generation.
Algorithm 6.1: The optimization method for the proposed LIP framework.

CHAPTER 1

INTRODUCTION

Most existing network analysis has focused solely on unsigned networks (i.e., networks with only
positive links), as shown in Figure 1.1(a). However, in many real-world systems, relations can be
both positive and negative. For instance, social media users not only have positive links such as
friends (e.g., Facebook and Slashdot), followers (e.g., Twitter), and trust (e.g., Epinions), but
can also establish negative links such as foes (e.g., Slashdot), distrust (e.g., Epinions), and
blocked and unfriended users (e.g., Facebook and Twitter). These relations can be represented as
networks with both positive and negative links (i.e., signed networks), as shown in Figure 1.1(b).
The introduction of negative links in signed networks not only increases the complexity of the
representation, but also poses tremendous challenges for traditional unsigned network analysis.
It is evident from recent research that negative links in signed networks have distinct properties
from positive links [1]. Meanwhile, the fundamental principles and theories of signed networks are
substantially different from those of unsigned networks [1, 2]. Hence, signed network analysis
cannot be carried out by simply extending unsigned network analysis. On the other hand, the
existence of negative links also brings about unprecedented opportunities for network analysis.
First, negative links have been shown to have significant added value over positive links in
various analytical tasks [3, 4, 5, 6]. Second, analogous to unsigned networks, we can have similar
analysis tasks for signed networks; however, negative links in signed networks make them applicable
to a broader range of applications and tasks [7].

Figure 1.1: A visualization of an unsigned and signed network. (a) Unsigned network. (b) Signed network.
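
As a concrete illustration of the distinction between the two kinds of network, the following minimal Python sketch contrasts the two representations. The user names and link signs are hypothetical toy data, not taken from any dataset in this dissertation, and the dictionary-based storage is only one of many possible choices.

```python
# Hypothetical toy data: an unsigned network stores only the existence of links.
unsigned_links = {("u1", "u2"), ("u2", "u3"), ("u3", "u4")}

# A signed network additionally stores a polarity (+1 or -1) for every link.
signed_links = {
    ("u1", "u2"): +1,  # e.g., trust or friend
    ("u2", "u3"): -1,  # e.g., distrust or foe
    ("u3", "u4"): +1,
    ("u1", "u4"): -1,
}


def split_by_sign(links):
    """Separate a signed link dictionary into its positive and negative link sets."""
    positive = {pair for pair, s in links.items() if s > 0}
    negative = {pair for pair, s in links.items() if s < 0}
    return positive, negative


positive_links, negative_links = split_by_sign(signed_links)
```

Dropping the signs from `signed_links` yields an ordinary unsigned network, which is exactly the information that unsigned analysis methods would discard.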


Therefore, in this dissertation, I investigate network analysis with negative links (i.e., signed
network analysis), focusing on four primary directions: measuring, modeling, mining, and applying
signed networks. In particular, I propose novel frameworks to tackle the challenges associated with
each of these directions.

1.1 Research Challenges

While analyzing and making predictions in signed networks, we are faced with several challenges
that must be overcome:

• The first major challenge is that the properties and theories of signed networks deviate from
those of traditional unsigned networks. In Chapter 2 we perform an initial investigation that
analyzes multiple network properties for both positive and negative links and empirically shows
their many differences on a representative set of signed networks. Furthermore, many unsigned
network methods are built on classical network theories, but recent work has shown that these
theories might not apply to signed networks. This calls for dedicated efforts to construct signed
network analysis methods in accordance with theories specific to signed networks.

• For measuring signed networks, the challenge is the introduction of polarity. More
specifically, when seeking to define how related two nodes are in a signed network (i.e.,
signed node relevance), we must consider not only the strength of their relationship, but also
whether it is positive or negative. This also requires weighing situations such as two users
having many common friends while also disagreeing on many other users (e.g., one user trusts
the third parties while the other distrusts them) against the situation of two users having only
a single friend in common but no differing opinions on others. Similarly, for defining signed
node centrality, the polarities of relations pose the specific challenge of needing to
differentiate the “infamous” users, who are important for negative reasons, from the “famous”
users, who are popular for positive reasons.


• For signed generative network modeling, the major challenge is that, to the best of our
knowledge, this domain has never been explored before. Just as unsigned generative modeling
requires elegant strategies to generate networks that maintain properties such as a correct
degree distribution and level of local clustering, signed networks require a similar level of
finesse. More specifically, when generating a synthetic signed network, maintaining the correct
ratio of positive to negative links conflicts with maintaining the correct distribution of
balanced to unbalanced triangles, since signed triangles must take a certain form to adhere to
the signed network social theory. Furthermore, harnessing the same social balance theory in
signed bipartite networks introduces new challenges. Prior works primarily use balance theory in
the form of triangles, but bipartite networks inherently contain no triangles, which requires
investigating other useful subgraph structures beyond signed triangles.

• Recently, network embedding methods and graph neural networks have been shown to provide
significant improvements in a wide range of network analysis tasks, such as predicting missing
links in the network or classifying nodes whose labels are missing. The first challenge
here is that many methods are based on random walks to build up a context of the local
neighborhood and utilize a contrastive loss in relation to randomly sampled nodes in the
network. In addition, many also aggregate features from surrounding nodes under the assumption
of homophily. However, this will not suffice for signed networks, since a traditional random
walker cannot differentiate the context from both a positive and negative perspective, while
homophily may not apply on negative links (as it does on positive links). Furthermore, the
situation becomes even more complex the further the aggregation distance is from the focal node.

The chief objective of this dissertation is first to analyze the distinct properties of negative
links as compared to positive links, and then to improve network analysis with negative links by
researching the utility of, and how to harness, social theories that have been established for a
holistic view of networks containing both positive and negative links. We discover that simply
extending unsigned network analysis is typically not sufficient and that, although the existence of
negative links introduces numerous challenges, they also provide unprecedented opportunities for
advancing the frontier of the network analysis domain. An overview of my dissertation research is
given in Figure 1.2.

Figure 1.2: An overview of the research contributions presented in this dissertation.

1.2 Contributions

The contributions of this dissertation can be summarized as follows:

• Developing advanced methods in signed networks for measuring node relevance and centrality
(i.e., signed network measuring);

• Presenting the first generative signed network model and extending and analyzing balance
theory in signed bipartite networks (i.e., signed network modeling);

• Constructing the first signed graph convolutional network and introducing the novel idea of
network transformation based signed network embedding, both of which learn node
representations that achieve state-of-the-art prediction performance on two representative
link-oriented tasks (i.e., signed network mining);

• Applying signed networks by creating a framework that can infer both link and interaction
polarity levels in online social media, and constructing an advanced comprehensive
congressional vote prediction framework built around harnessing signed networks.

1.3 Organization

The remainder of this dissertation is organized as follows. In Chapter 2, we introduce the
preliminaries, including basic definitions and social theories used in network analysis. In Chapter
3, we introduce multiple signed network node relevance measures and a signed centrality measure,
all of which are developed in accordance with signed social theories. Then, in Chapter 4, we
introduce the first proposed generative network model for signed networks and furthermore introduce
models for making predictions in signed bipartite graphs by harnessing structural balance theory
through our newly defined signed butterfly network substructures. Chapter 5 presents our work on
mining signed networks through the development of node-level representations that can be used to
solve traditional signed network mining tasks, such as predicting missing signed links and the
unknown polarity of existing links. This is performed by developing the first signed graph
convolutional network and introducing the novel idea of network transformation based embeddings.
Chapter 6 presents our work on applying signed network analysis techniques to important
interdisciplinary research in predicting future congressional votes, while also developing a method
to help alleviate the cold-start problem of predicting the polarity of direct and indirect links
between users in online platforms. Finally, Chapter 7 concludes the dissertation and presents
promising future research directions.

CHAPTER 2

FOUNDATIONS AND PRELIMINARIES

In this chapter, we briefly introduce basic network definitions and social theories for networks
consisting of only positive links (i.e., unsigned networks), only negative links, and networks
containing both positive and negative links. This provides the groundwork for introducing our
novel methodologies for measuring, modeling, mining, and applying signed networks in the later
chapters.

2.1 Basic Notations and Deﬁnitions

A signed network $G$ is composed of a set of $n$ nodes (i.e., users) $U = \{u_1, u_2, \ldots, u_n\}$,
a set of positive links $E^+$, and a set of negative links $E^-$. We represent directed signed links
between users in an adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $A_{ij} = 1$ if $u_i$ has a
positive link to $u_j$, $A_{ij} = -1$ if $u_i$ creates a negative link to $u_j$, and $A_{ij} = 0$ when
$u_i$ has no link to $u_j$. Furthermore, we can separate a signed network into two networks, one
containing only positive links and the other only negative links, which we can represent in the
adjacency matrices $A^+ \in \mathbb{R}^{n \times n}$ and $A^- \in \mathbb{R}^{n \times n}$, respectively.
We represent a positive link from $u_i$ to $u_j$ with $A^+_{ij} = 1$, and $A^+_{ij} = 0$ otherwise.
Similarly, we represent a negative link from $u_i$ to $u_j$ with $A^-_{ij} = 1$, and $A^-_{ij} = 0$
otherwise. Note that we can similarly define an undirected signed network, in which no directions
are associated with the signed links. Therefore, unlike directed signed links, when $u_i$ and $u_j$
share a positive (or negative) undirected link, $A_{ij} = A_{ji} = 1$ (or $A_{ij} = A_{ji} = -1$). In
other words, an undirected signed network has a symmetric adjacency matrix. We furthermore note
that $A = A^+ - A^-$.
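As a minimal sketch of this notation, the three adjacency matrices can be built from a list of directed signed links. The toy edge list, the 0-based user indices, and the helper name `build_signed_adjacency` are illustrative assumptions and are not taken from the datasets studied later.

```python
# Hypothetical toy network: (source, target, sign) with 0-based user indices.
edges = [(0, 1, +1), (1, 0, +1), (1, 2, -1), (2, 0, +1), (0, 3, -1)]
n = 4  # number of users


def build_signed_adjacency(n, edges):
    """Return (A, A_pos, A_neg) as n x n lists of lists."""
    A = [[0] * n for _ in range(n)]
    A_pos = [[0] * n for _ in range(n)]
    A_neg = [[0] * n for _ in range(n)]
    for i, j, sign in edges:
        A[i][j] = sign          # +1 or -1 in the signed matrix
        if sign > 0:
            A_pos[i][j] = 1     # positive-only network
        else:
            A_neg[i][j] = 1     # negative-only network (stored as 1, not -1)
    return A, A_pos, A_neg


A, A_pos, A_neg = build_signed_adjacency(n, edges)

# The decomposition A = A+ - A- holds entry by entry.
decomposition_holds = all(
    A[i][j] == A_pos[i][j] - A_neg[i][j] for i in range(n) for j in range(n)
)
```

Storing the negative-only matrix with entries of 1 rather than -1 is what makes the subtraction in the decomposition necessary.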

2.2 Unsigned Network Properties and Theories

In networks consisting of only positive links (i.e., unsigned networks, A+) there are a few
well-known properties that are quite universally studied, such as the degree distribution,
reciprocity, transitivity, and clustering coefficient, which we define here. Additionally, we
introduce the social network theory of homophily [8].

2.2.1 Degree Distribution and Network Density

In a network, observing the distribution of node degrees can provide great insight. One of the
first analyses of the degree distribution of a network was done in [9]. They studied a friendship
network among children in a school and noticed that while many children were selected as a friend
by only a few of the other students, a few children had been selected as a friend by very many
other students. This led to later works, such as [10], observing that many real-world (unsigned)
networks have degree distributions that follow a power law (more specifically, [10] studied
citation networks). Essentially, what many have observed is that both the in- and out-degrees of
positive links in unsigned networks follow power-law distributions: most nodes have a small degree
while a few nodes have a large degree. Networks with power-law degree distributions are commonly
called “scale-free” networks [11].

In comparison, the network density compares the connectivity of a given unsigned network (which is
defined by its links $E^+$) to the maximum number of possible edges in that network. An unsigned
network having the maximum number of edges is called a fully-connected network, and assuming no
self loops (i.e., edges connecting a node to itself) it has $\binom{n}{2} = \frac{1}{2} n(n-1)$ edges
in the undirected setting and $2\binom{n}{2} = n(n-1)$ edges in the directed setting. Thus, the
densities $\rho_u$ and $\rho_d$ in the undirected and directed unsigned network settings can be
defined as follows:

$$
\rho_u = \frac{|E^+|}{\binom{n}{2}} = \frac{2|E^+|}{n(n-1)} = \frac{\|A^+\|_+}{n(n-1)},
\qquad
\rho_d = \frac{|E^+|}{2\binom{n}{2}} = \frac{|E^+|}{n(n-1)} = \frac{\|A^+\|_+}{n(n-1)}
\tag{2.1}
$$

where $\|\cdot\|_+$ denotes the sum of all elements of a matrix, and we assume $E^+$ and $A^+$ are
appropriately constructed in the undirected and directed settings (i.e., $A^+$ is symmetric in the
undirected setting) for the respective definitions of unsigned network density.
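The two forms of the density definition can be compared on a small example. The sketch below is only illustrative: the function names and the two 4-node toy matrices are assumptions introduced here.

```python
def density_from_links(num_links, n, directed):
    """Links present divided by links possible, assuming no self loops."""
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    return num_links / possible


def density_from_matrix(A_pos):
    """Entry sum of the positive adjacency matrix divided by n(n-1).

    The same expression serves both settings: a symmetric (undirected)
    matrix stores every link twice, which supplies the factor of 2.
    """
    n = len(A_pos)
    return sum(sum(row) for row in A_pos) / (n * (n - 1))


# Hypothetical 4-node examples.
directed_A = [[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]]   # 3 directed links
undirected_A = [[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]]  # 3 undirected links, stored symmetrically

d_directed = density_from_links(3, 4, directed=True)
d_undirected = density_from_links(3, 4, directed=False)
```

In both settings the link-count form and the matrix form agree, which is why the matrix expression in the density definition has the same denominator in the undirected and directed cases.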

2.2.2 Network Reciprocity

Links in directed social networks can be generally categorized into reciprocal (two-way) and
parasocial (one-way) links [12]. Reciprocal links among nodes in unsigned networks are usually
treated as the basis to create stable social ties and play an important role in the formation and
evolution of networks [13]. Reciprocity is only defined on directed networks. More specifically,
given a pair of users $(u_i, u_j)$, we define a link from $u_i$ to $u_j$ as being reciprocated if there
also exists a link from $u_j$ back to $u_i$. Note that this is the shortest loop in a simple directed
network (assuming no self-loops). An example of a reciprocal link would be if you follow someone on
the social media site Twitter and they also follow you back.

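While works such as [14] have examined relationships between the in-degrees and out-degrees of
social networks, the first measurement of reciprocity is believed to have appeared in [15], as
noted in [16]. We can more formally define the reciprocity $r$ [17] of a directed unsigned network
as the fraction of positive links that are reciprocated:

$$
r = \frac{2}{|E^+|} \sum_{i=1}^{n} \sum_{j=i+1}^{n} A^+_{ij} A^+_{ji}
  = \frac{1}{|E^+|} \mathrm{Tr}\left( (A^+)^2 \right)
\tag{2.2}
$$

where $\mathrm{Tr}(\cdot)$ is the trace of a matrix. Note that $A^+_{ij} A^+_{ji} = 1$ if and only if
there is a reciprocal link between $u_i$ and $u_j$, and $A^+_{ij} A^+_{ji} = 0$ otherwise. The factor
of 2 in the double sum reflects that each reciprocal pair $i < j$ contributes two reciprocated links,
whereas the trace already visits both orderings of the pair.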
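The reciprocity measure can be sketched directly from the adjacency matrix of positive links. The function name and the 3-node toy matrix below are assumptions made for illustration; the sketch computes the fraction of links that are reciprocated both through the trace of the squared matrix and through a sum over unordered pairs.

```python
def reciprocity(A_pos):
    """Fraction of directed links that are reciprocated (no self loops assumed)."""
    n = len(A_pos)
    num_links = sum(sum(row) for row in A_pos)
    # Tr((A+)^2) = sum over i and j of A+[i][j] * A+[j][i]
    trace_sq = sum(A_pos[i][j] * A_pos[j][i] for i in range(n) for j in range(n))
    return trace_sq / num_links


def reciprocity_by_pairs(A_pos):
    """Same quantity, visiting each unordered pair once and counting it twice."""
    n = len(A_pos)
    num_links = sum(sum(row) for row in A_pos)
    pairs = sum(A_pos[i][j] * A_pos[j][i] for i in range(n) for j in range(i + 1, n))
    return 2 * pairs / num_links


# Hypothetical toy network with links 0->1, 1->0, 1->2, 2->0.
A_pos = [[0, 1, 0],
         [1, 0, 1],
         [1, 0, 0]]

r_trace = reciprocity(A_pos)
r_pairs = reciprocity_by_pairs(A_pos)
```

Here only the pair (0, 1) is reciprocal, so two of the four links are reciprocated and both computations agree.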

2.2.3 Transitivity and Clustering Coeﬃcient

In an unsigned network, transitivity can be explained as “a friend’s friend is a friend”. Thus,
the network only has perfect transitivity if each of the network’s components is fully connected.
However, in most real-world unsigned networks this is not going to be true for the entire network.
Thus, we can measure the level of partial transitivity as follows. We first observe the wedges
among the users $u_i$, $u_k$, and $u_j$, which are paths of length two, such as the wedge consisting
of the edges $u_i$ friends with $u_k$ and $u_k$ friends with $u_j$. We then consider this specific
wedge closed if we also have $u_i$ friends with $u_j$ (i.e., it forms a triangle). The clustering
coefficient (which was first measured in [18]) is then the fraction of wedges that are closed into
triangles. More formally, we define the clustering coefficient $C$ in an unsigned network as:

$$
C = \frac{\|A^+ \circ (A^+)^2\|_+}{\|(A^+)^2\|_+}
  = \frac{\text{(total number of closed wedges)}}{\text{(total number of wedges)}}
\tag{2.3}
$$

where $\circ$ denotes the Hadamard product (i.e., element-wise product) of two matrices and we use
$\|\cdot\|_+$ to denote the summation of all elements of a matrix. Note that many real-world unsigned
networks exhibit relatively higher clustering coefficients as compared to random graphs.
Furthermore, a lot of prior work (first starting in [19]) has shown evidence that transitivity and
having a higher clustering coefficient are both common and important in social networks.
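A minimal sketch of the clustering coefficient as a ratio of closed wedges to wedges is given below. It makes one assumption that should be stated loudly: when counting wedges it skips the diagonal of the squared matrix, since a diagonal entry counts length-two paths that return to their starting node rather than wedges between two distinct end nodes. The helper names and the toy graph are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def clustering_coefficient(A_pos):
    n = len(A_pos)
    A2 = matmul(A_pos, A_pos)  # A2[i][j] = number of length-two paths from i to j
    # Hadamard product with A+ keeps only the paths whose end nodes are linked.
    closed = sum(A_pos[i][j] * A2[i][j] for i in range(n) for j in range(n))
    # Assumption: skip the diagonal so paths returning to their start are not wedges.
    wedges = sum(A2[i][j] for i in range(n) for j in range(n) if i != j)
    return closed / wedges if wedges else 0.0


# Hypothetical undirected examples: a triangle with a pendant node, and a triangle.
triangle_with_tail = [[0, 1, 1, 1],
                      [1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [1, 0, 0, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

c_tail = clustering_coefficient(triangle_with_tail)
c_triangle = clustering_coefficient(triangle)
```

With the diagonal excluded, a fully connected component yields a coefficient of 1, in agreement with the description of perfect transitivity above.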

2.2.4 Network Homophily

In an unsigned social network homophily can be summarized with the proverb “birds of a feather
ﬂock together” [8]. Coming from roots in sociology [20], it has been applied to social networks and
used to explain the phenomenon that users who are similar are more likely to become friends with
each other. This theory has been used in numerous social network areas, such as explaining the
evolution/growth of networks [21] and in the recently developed graph neural networks [22], which
are a generalization of deep neural networks to graph structured data. More specifically, the latter
exploit homophily by aggregating node/user features from a node’s local structural neighborhood
(e.g., a user’s set of friends) to better represent that node. However, it is not necessarily the
case that the same will apply for negative links.

2.3 Signed Network Datasets, Properties, and Theories

Now, having defined the basic network properties and social theory governing unsigned networks,
here we investigate and discuss them further in the signed network setting. Furthermore,
we will introduce the two major social theories that have been developed for networks containing
both positive and negative links.

First we introduce a few representative signed network datasets. Then, we present an analysis
of the same set of unsigned network properties on negative links and introduce signed network
social theories that help explain the discovered differences, help motivate the need for dedicated
signed network analysis efforts, and provide guidance towards building our novel methodologies
presented in the later chapters.

Table 2.1: Statistics of four signed social networks.

Network         # Users ($n$)   # Positive Edges ($|E^+|$)   # Negative Edges ($|E^-|$)
Bitcoin-Alpha   3,784           22,651                       1,556
Bitcoin-OTC     5,901           32,271                       3,438
Slashdot        79,116          392,179                      123,218
Epinions        131,828         717,667                      123,705

2.3.1 Signed Network Datasets

For the majority of this dissertation we study four signed network datasets we have collected,
namely Bitcoin-Alpha¹, Bitcoin-OTC², Slashdot³ and Epinions⁴. Some basic statistics of these
four signed network datasets are shown in Table 2.1. Below we describe more details about
these datasets.

The Bitcoin-Alpha and Bitcoin-OTC networks are signed networks that we collected from
publicly available data from their respective websites⁵. We note that smaller and less comprehensive
versions of this data had previously been collected in [23]. The two Bitcoin sites are open market
websites that allow users to buy and sell things. Due to the anonymity behind users’ Bitcoin
accounts, users of these websites form trust networks to protect against scammers (e.g., fake users
who are just attempting to have another user send them bitcoins, but never deliver their end of
the deal, which is usually the delivery of some other monetary good). In addition to the signed
networks, users in both websites can specify scores in the range [1,10] (or [-10,-1]) to indicate the
positive (or negative) tie strength. Note that negative links in both websites are visible to the public.

The Slashdot dataset was obtained from [24]. Slashdot has focused on providing technology news
since 1997. More specifically, the technology stories are either written by editors or submitted by
users, and other users are then allowed to comment on these published stories. In addition to
being a website allowing social connections, it added a novel component allowing users to flag
others with negative sentiment in addition to the more traditional positive-only sentiment (e.g., a
friends feature). In more detail, this unique “Zoo” feature was established in 2002, and the website
has since allowed users to explicitly mark other users as their friends (i.e., positive links) or foes
(i.e., negative links). Note that negative links in Slashdot are only visible to users who are
currently logged into the system (i.e., negative relations are not publicly available, but require
signing up as a member of the site).

¹ http://www.btcalpha.com
² https://www.bitcoin-otc.com
³ http://www.slashdot.org
⁴ http://www.epinions.com
⁵ The Bitcoin-Alpha and Bitcoin-OTC data was exhaustively crawled on December 18th of 2016.

Finally, we have collected a dataset from the product review site Epinions, where users can
establish trust (i.e., positive) and distrust (i.e., negative) links. Users can also write reviews
for items from certain pre-defined categories, and other users can then rate the “helpfulness”
of those reviews on a scale of 1 to 6 (with a higher score denoting that the user found the review
more helpful). While there have been multiple versions of this dataset [25, 2, 26, 27],
we primarily use only the explicitly created trust and distrust links made between users. Later in
Section 6.1 we will utilize these helpfulness ratings (obtained from [27]), but otherwise we only
use the more commonly used basic signed network version from [2]. We note that negative links
in Epinions are totally invisible to the public, but were obtained from Epinions staff for research
purposes in this dataset. However, the helpfulness ratings are publicly available.

2.3.2 Data Analysis on Signed Network Properties

Here we investigate the basic network properties defined in Section 2.2 in the signed network
setting. More specifically, we perform an initial study of these properties on both the networks
consisting of only negative links (i.e., A−) and those consisting of only positive links (i.e., A+)
from four real-world signed networks (as shown in Table 2.1). In addition, as previous studies
suggested that balance theory is helpful to explain social phenomena in signed networks [4] and
status theory is another influential social theory in signed networks [2], we also define and
discuss some insights from both of these signed social network theories.

Figure 2.1: Degree distributions in signed social networks. Panels: (a) Bitcoin-Alpha, (b) Bitcoin-OTC, (c) Epinions, (d) Slashdot.

Degree Distributions: Here, we analyze the undirected degree distributions for the positive
(i.e., E+) and negative (i.e., E−) links. For each user, we calculate and combine the in- and
out-degrees for both positive and negative links. The distributions of the undirected degrees
for positive and negative links in our four signed networks are shown in Figure 2.1. From
the figure, it is clearly observed that the degree distributions of positive and negative links in
all four signed networks also follow power-law distributions. For instance, a few nodes give a
large number of negative links, while many nodes give only a few negative links. Note that this
conclusion can be approximated by observing a linear relationship in the distribution plots on a
log-log scale. Hence, we can see that positive and negative links empirically share at least
one property so far, namely a power-law degree distribution.
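The degree counts behind such a plot can be sketched as follows. The sketch assumes that “combining” in- and out-degrees means summing them per user (so a reciprocated link contributes twice); the function name and the toy edge list are hypothetical.

```python
from collections import Counter


def degree_distribution(edges, sign):
    """Map each combined (in + out) degree to the number of users having it.

    edges: iterable of (source, target, sign) with sign in {+1, -1}.
    Only links of the requested sign are counted; users with no such
    link are omitted, as degree 0 cannot appear on a log-log plot anyway.
    """
    degree = Counter()
    for i, j, s in edges:
        if s == sign:
            degree[i] += 1  # out-degree contribution
            degree[j] += 1  # in-degree contribution
    return Counter(degree.values())


# Hypothetical toy network.
edges = [(0, 1, +1), (1, 0, +1), (2, 0, +1),
         (0, 3, -1), (3, 0, -1), (1, 3, -1)]

positive_distribution = degree_distribution(edges, +1)
negative_distribution = degree_distribution(edges, -1)
```

Plotting the resulting degree-to-count pairs on logarithmic axes and checking for an approximately straight line is the visual test described above.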

Table 2.2: Probability of links being reciprocal in signed social networks.

Datasets        Positive Links   Negative Links
Bitcoin-Alpha   85.4%            18.0%
Bitcoin-OTC     83.8%            17.8%
Slashdot        30.7%            7.4%
Epinions        34.8%            3.8%

Network Density: From Table 2.1 we can easily observe that the negative links are sparser
than the positive links. In other words, the density of the positive links (i.e., as defined in
Eq. (2.1)) is significantly larger than that of the negative links (i.e., redefining Eq. (2.1) with
E− and A− instead of E+ and A+). This follows from the fact that the positive and negative link
networks have the same number of users (i.e., the same denominator in Eq. (2.1) and its negative
link variant), so the densities differ only through the number of links. Thus, we can already start
to notice that negative links exhibit different global behaviors from positive links.
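As a worked illustration, the directed densities of the positive and negative link networks can be computed from the counts reported in Table 2.1. The dictionary layout and the function name are assumptions of this sketch.

```python
# (users n, positive links, negative links) as reported in Table 2.1.
table_2_1 = {
    "Bitcoin-Alpha": (3784, 22651, 1556),
    "Bitcoin-OTC": (5901, 32271, 3438),
    "Slashdot": (79116, 392179, 123218),
    "Epinions": (131828, 717667, 123705),
}


def directed_density(num_links, n):
    """Directed density: links present over the n(n-1) possible links."""
    return num_links / (n * (n - 1))


densities = {
    name: (directed_density(pos, n), directed_density(neg, n))
    for name, (n, pos, neg) in table_2_1.items()
}

for name, (pos_density, neg_density) in densities.items():
    ratio = pos_density / neg_density
    print(f"{name}: positive {pos_density:.2e}, negative {neg_density:.2e}, "
          f"ratio {ratio:.1f}")
```

Because both densities share the same denominator, their ratio reduces to the ratio of positive to negative link counts in each network.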

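Reciprocal Links: For a pair of users $(u_i, u_j)$, there are four types of reciprocal links:
$(u_i \xrightarrow{+} u_j,\ u_j \xrightarrow{+} u_i)$, $(u_i \xrightarrow{+} u_j,\ u_j \xrightarrow{-} u_i)$,
$(u_i \xrightarrow{-} u_j,\ u_j \xrightarrow{-} u_i)$ and $(u_i \xrightarrow{-} u_j,\ u_j \xrightarrow{+} u_i)$,
where $u_i \xrightarrow{+} u_j$ (or $u_i \xrightarrow{-} u_j$) denotes that there is a positive link (or
a negative link) from $u_i$ to $u_j$. We checked our four signed networks and found that, among the
four types of reciprocal links, there are few of types
$(u_i \xrightarrow{+} u_j,\ u_j \xrightarrow{-} u_i)$ and $(u_i \xrightarrow{-} u_j,\ u_j \xrightarrow{+} u_i)$.
Therefore, our analysis of reciprocal links focuses on
$(u_i \xrightarrow{+} u_j,\ u_j \xrightarrow{+} u_i)$ and $(u_i \xrightarrow{-} u_j,\ u_j \xrightarrow{-} u_i)$.
We calculate, if $u_i$ has a positive link (or a negative link) to $u_j$, how likely $u_j$ is to also
have a positive link (or a negative link) to $u_i$. The results on four signed networks are shown in
Table 2.2.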
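The counting behind this analysis can be sketched as follows. The sketch counts directed links (not pairs) of each reciprocal type, and the function names and toy edge list are hypothetical.

```python
from collections import Counter


def reciprocal_type_counts(edges):
    """Count reciprocated links by (sign of link, sign of the link coming back).

    edges: iterable of (source, target, sign) with sign in {+1, -1}.
    """
    sign_of = {(i, j): s for i, j, s in edges}
    counts = Counter()
    for (i, j), s in sign_of.items():
        back = sign_of.get((j, i))
        if back is not None:
            counts[(s, back)] += 1
    return counts


def same_sign_reciprocity(edges, sign):
    """Fraction of links of the given sign that are reciprocated with the same sign."""
    total = sum(1 for _, _, s in edges if s == sign)
    return reciprocal_type_counts(edges)[(sign, sign)] / total


# Hypothetical toy network.
edges = [(0, 1, +1), (1, 0, +1),   # reciprocal positive pair
         (1, 2, -1), (2, 1, -1),   # reciprocal negative pair
         (2, 3, +1), (3, 2, -1),   # positive link answered by a negative link
         (0, 3, +1)]               # non-reciprocal positive link

counts = reciprocal_type_counts(edges)
positive_rate = same_sign_reciprocity(edges, +1)
negative_rate = same_sign_reciprocity(edges, -1)
```

Counting links rather than pairs means the two mixed-sign types always have equal counts, since every positive link answered by a negative link corresponds to exactly one negative link answered by a positive link.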

From the table, we first observe that the percentage of reciprocal positive links is much
higher than that of reciprocal negative links in all four signed social networks. Next, we notice
that although positive links are always visible to the public in all four websites, the percentage
of reciprocal positive links in Bitcoin-Alpha and Bitcoin-OTC is much higher than that in Slashdot
and Epinions. Users in Bitcoin-Alpha and Bitcoin-OTC exchange bitcoins with others, while users in
Slashdot and Epinions share free content (news or reviews) with others. Thus, Bitcoin-Alpha and
Bitcoin-OTC users need much stronger social ties for bitcoin trading in the online world than users
in Slashdot and Epinions need to consume free online content. Finally, we note that the percentage
of reciprocal negative links in Bitcoin-Alpha and Bitcoin-OTC is much higher than that in Slashdot,
which in turn is much higher than that in Epinions. The four websites have different access controls
for negative links: in Bitcoin-Alpha and Bitcoin-OTC, negative links are fully visible to the public;
only users who log in to Slashdot can see negative links; and negative links are completely private
in Epinions. Exposing negative links may provoke retaliation, which consequently could lead to more
reciprocal negative relations [28].

Table 2.3: Tie strengths of positive and negative links in signed social networks.

                                                                                       Bitcoin-Alpha             Bitcoin-OTC
Tie Strength Category   Link Type                                                      # Links   Avg. Strength   # Links   Avg. Strength
Overall Link            Positive                                                       22,651     1.998          32,271     1.968
Overall Link            Negative                                                        1,556    -6.319           3,438    -7.538
Non-reciprocal          Positive                                                        3,054     2.225           4,868     2.215
Non-reciprocal          Negative                                                        1,029    -6.120           2,467    -7.540
Positive Reciprocal     $(u_i \xrightarrow{+} u_j,\ u_j \xrightarrow{+} u_i)$          19,350     1.955          27,044     1.915
Positive Reciprocal     $(u_i \xrightarrow{+} u_j,\ u_j \xrightarrow{-} u_i)$             247     2.611             359     2.571
Negative Reciprocal     $(u_i \xrightarrow{-} u_j,\ u_j \xrightarrow{-} u_i)$             280    -7.079             612    -7.920
Negative Reciprocal     $(u_i \xrightarrow{-} u_j,\ u_j \xrightarrow{+} u_i)$             247    -6.291             359    -6.875

Furthermore, since Bitcoin-Alpha and Bitcoin-OTC are weighted signed networks, we also
investigate whether there are any differences or similarities between positive and negative links
when it comes to the strength of the ties. More specifically, we first calculated the average
positive and negative link strengths, which are reported in Table 2.3. The first observation is that
the magnitude of negative links is significantly higher than that of positive links in both
datasets. One interpretation of this could be that users are conservative in their judgement of
others they have done successful transactions with, but are very aggressive in their judgement of
other users when having a negative experience. We can also notice the ranking of tie strength
magnitude among: (i) non-reciprocal links, (ii) reciprocal links with the same sign, and
(iii) reciprocal links with opposing signs. For positive links in both datasets, the largest
magnitude is (iii), followed by (i) and (ii). In comparison, for negative links in both datasets
(ii) is the largest, but then for Bitcoin-Alpha we have (iii) followed by (i), and for Bitcoin-OTC
we have (i) followed by (iii). It is interesting that the strongest negative links are those where
the two users have both expressed a negative sentiment towards each other by creating a reciprocal
negative link. However, for positive links, the largest magnitude comes from positive links that
are reciprocated by a negative link, which is somewhat unintuitive. In summary, there are more
differences than commonalities between positive and negative links when it comes to the tie
strengths in signed networks.
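As a small arithmetic check on how the rows of Table 2.3 relate, the overall average strength of positive links should equal the link-weighted mean of the three positive subgroups. The sketch below uses the Bitcoin-Alpha values reported in Table 2.3; the grouping labels and variable names are assumptions of this illustration, and agreement is only expected up to the rounding of the reported averages.

```python
# (number of links, average strength) for positive links in Bitcoin-Alpha,
# as reported in Table 2.3.
positive_groups = {
    "non-reciprocal": (3054, 2.225),
    "reciprocated by a positive link": (19350, 1.955),
    "reciprocated by a negative link": (247, 2.611),
}

total_links = sum(count for count, _ in positive_groups.values())
weighted_mean = sum(count * strength
                    for count, strength in positive_groups.values()) / total_links

reported_total = 22651     # overall number of positive links in Table 2.3
reported_average = 1.998   # overall average positive strength in Table 2.3

print(f"links: {total_links}, weighted mean strength: {weighted_mean:.4f}")
```

The large share of same-sign reciprocal links is what pulls the overall average close to the average of that group.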

2.3.3 Signed Network Theories

In this section, we will deﬁne and introduce the two most inﬂuential signed network theories,
namely that of balance and status. This will then help motivate the need for dedicated signed
network analysis eﬀorts beyond the diﬀerences in network properties previously discussed.

2.3.3.1 Balance Theory in Signed Networks

Balance theory was first developed in [29] at the individual level with the general notion
that “the friend of my friend is my friend” and “the enemy of my enemy is my friend”. Later,
structural balance theory was introduced in [30] at the group level with a network perspective. A
signed network is defined as being balanced if and only if all the cycles in the network contain an
even number of negative links. It was proved in [31] that a signed network is balanced if and
only if the nodes can be partitioned into two mutually exclusive subsets such that all links within
the subsets are positive, while all links having an endpoint in each of the two subsets are
negative. However, it is rare for real-world signed networks to be completely balanced.
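The partition characterization suggests a simple test for complete balance. The sketch below uses a breadth-first two-coloring, which is a standard graph technique chosen here for illustration and not a method proposed in the works cited above; the function name and the toy triangles are hypothetical.

```python
from collections import deque


def is_balanced(n, edges):
    """Check whether an undirected signed network is balanced.

    edges: iterable of (i, j, sign) undirected links with sign in {+1, -1}.
    Nodes are assigned to one of two subsets so that positive links stay
    within a subset and negative links cross between subsets; the network
    is balanced if and only if such an assignment exists.
    """
    adj = [[] for _ in range(n)]
    for i, j, s in edges:
        adj[i].append((j, s))
        adj[j].append((i, s))
    side = [None] * n
    for start in range(n):          # handle every connected component
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                expected = side[u] if s > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False    # a cycle with an odd number of negative links
    return True


# The four signed triangle types.
all_positive = [(0, 1, +1), (1, 2, +1), (0, 2, +1)]
two_negative = [(0, 1, +1), (1, 2, -1), (0, 2, -1)]
one_negative = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]
all_negative = [(0, 1, -1), (1, 2, -1), (0, 2, -1)]

results = [is_balanced(3, t)
           for t in (all_positive, two_negative, one_negative, all_negative)]
```

A conflict during the coloring corresponds to a cycle containing an odd number of negative links, which ties the partition view back to the cycle-based definition.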
There have been multiple methods to measure the level of balance in a signed network. In [32]
and [33], the ratio of balanced to unbalanced cycles was used to calculate the level of balance of
a signed network. Later methods were then developed, such as in [34], where a clustering of the
signed network was performed and the number of positive links between clusters and negative links
within clusters was then analyzed. This led to other methods similarly defined using clusterings
[35, 36]. In addition, a signed spectral theory approach was taken in [37]. It should also be noted
that the notion of weak structural balance originated in [34]; being based on clustering, it drops
the assumption that “the enemy of my enemy is my friend” is a balanced relation. In other words,
weak structural balance is not concerned with whether the enemy of my enemy is my friend or my
enemy, and thus only requires that a friend of a friend be a friend rather than an enemy.

Figure 2.2: Visualizing balance theory in the form of signed triangles.

However, it is still common to use the definition based on the ratio of triangles (even though it
is less computationally efficient, taking $O(n^3)$ time when utilizing matrix operations on the
adjacency matrix). To analyze the level of balance in our signed networks, we first adopt $s_{ij}$ to
denote the link sign between two users $u_i$ and $u_j$, where $s_{ij} = 1$ (or $s_{ij} = -1$) if there is
a positive (or negative) link between $u_i$ and $u_j$. As in previous methods, we focus on triads
(or 3-cycles) [4]: a triad of three users $(u_i, u_j, u_k)$ is balanced if $s_{ij} = 1$ and $s_{jk} = 1$
imply $s_{ik} = 1$, or if $s_{ij} = 1$ and $s_{jk} = -1$ imply $s_{ik} = -1$; equivalently, the product
$s_{ij} s_{jk} s_{ik}$ is positive. Therefore, for a triad, there are four possible sign combinations:
(a) (+,+,+); (b) (+,−,−); (c) (+,+,−); and (d) (−,−,−), of which only (a) and (b) are
balanced. We visualize these four signed triangle combinations in Figure 2.2.
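Counting balanced and unbalanced triangles with matrix operations can be sketched as follows. The sketch relies on a standard identity for a symmetric signed adjacency matrix with zero diagonal: the trace of its cube equals six times the number of balanced triangles minus the number of unbalanced ones, while the trace of the cube of its entry-wise absolute value equals six times the total number of triangles. The helper names and the toy network are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_cube(M):
    M3 = matmul(matmul(M, M), M)
    return sum(M3[i][i] for i in range(len(M)))


def triangle_balance_counts(A):
    """Return (balanced, unbalanced) triangle counts for a symmetric signed matrix."""
    abs_A = [[abs(x) for x in row] for row in A]
    signed_trace = trace_of_cube(A)      # 6 * (balanced - unbalanced)
    total_trace = trace_of_cube(abs_A)   # 6 * (balanced + unbalanced)
    balanced = (total_trace + signed_trace) // 12
    unbalanced = (total_trace - signed_trace) // 12
    return balanced, unbalanced


# Hypothetical undirected signed network on four users (all pairs linked).
A = [[0, 1, 1, -1],
     [1, 0, 1, -1],
     [1, 1, 0, 1],
     [-1, -1, 1, 0]]

balanced, unbalanced = triangle_balance_counts(A)
balanced_fraction = balanced / (balanced + unbalanced)
```

In this toy network the triangles on users (0, 1, 2) and (0, 1, 3) are of types (a) and (b), while those on (0, 2, 3) and (1, 2, 3) are of type (c).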

Note that balance theory is only applicable to undirected signed networks, and thus we perform
some basic preprocessing to ignore the link directions when applying it to directed signed networks,
following the discussions in [4]. More specifically, after mapping each directed signed network to
an undirected one, we count each of the four sign combinations and find that 92.0%, 91.5%,
94.5% and 92.4% of triads in Bitcoin-Alpha, Bitcoin-OTC, Slashdot and Epinions are balanced,
respectively. This is in line with prior works measuring the level of balance in signed networks,
and it has also been shown that the ratio of balanced to unbalanced triangles increases over
time [28].

2.3.3.2 Status Theory in Signed Networks

Based on some of the observations found in [3], status theory was later developed in [2]. Unlike
balance theory, which is defined in an undirected signed network based on users liking or disliking
each other (i.e., positive or negative links), status theory takes a separate perspective in
directed signed networks. Rather than assuming that a positive link from $u_i$ to $u_j$ implies that
$u_j$ is a friend of $u_i$, it considers that perhaps $u_i$ instead thinks that $u_j$ has a higher
status. Similarly, a negative link from $u_i$ to $u_j$ might not imply that $u_i$ dislikes $u_j$, but
perhaps $u_i$ just believes that $u_j$ is of a lower status in the network. Thus, from a triangle
perspective, status theory is based on the concept of consistency in the logical deductions of the
directed relations. We note that directed signed triangles can take 12 different forms, where 4 are
cyclic and 8 are acyclic.

Determining whether a triangle adheres to status theory can be done in two steps: 1) flip negative
links to positive and reverse their direction; and 2) if the resulting triangle is acyclic, then it
adheres to status theory. For example, if we have the cyclic signed triangle consisting of the three
positive links $u_i \xrightarrow{+} u_k$, $u_k \xrightarrow{+} u_j$, and $u_j \xrightarrow{+} u_i$, then
based on status theory these three links respectively imply that $u_i$ believes that $u_k$ has a
higher status than them (i.e., $u_i < u_k$), $u_k$ believes that $u_j$ has a higher status than them
(i.e., $u_j > u_k$), and $u_j$ believes that $u_i$ has a higher status than them (i.e., $u_i > u_j$).
From the first two links we have that $u_i < u_k < u_j$, but then the third link, with $u_j$ thinking
that $u_i > u_j$, creates a contradiction. We note that prior works have shown that in many
real-world signed networks the majority of (and almost all) triangles adhere to status theory,
quite similar to balance theory [2]. It can be calculated that, when converting directed signed
triangles to undirected signed triangles, only 6 of the 12 forms agree between the theories of
balance and status while the other 6 are in disagreement. For example, our prior example of the
cyclic signed triangle of three positive links does not adhere to status theory, but aligns with
triangle (a) from Figure 2.2 and thus adheres to balance theory.
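The status check for a single directed signed triangle can be sketched in a few lines, together with the balance check on the same triangle once directions are ignored. The function names and node labels are hypothetical, and the cycle test relies on the fact that three arcs over three distinct pairs of nodes form a directed cycle exactly when every node has one outgoing arc.

```python
def adheres_to_status(triangle):
    """triangle: three (source, target, sign) links over three nodes."""
    # Step 1: rewrite each negative link as a positive link in the reverse direction,
    # so every arc points from the lower-status user to the higher-status user.
    arcs = [(u, v) if s > 0 else (v, u) for u, v, s in triangle]
    # Step 2: the triangle adheres to status theory if the arcs are acyclic.
    out_degree = {}
    for u, _ in arcs:
        out_degree[u] = out_degree.get(u, 0) + 1
    is_cyclic = len(out_degree) == 3 and all(d == 1 for d in out_degree.values())
    return not is_cyclic


def adheres_to_balance(triangle):
    """Ignore directions: balanced if the product of the three signs is positive."""
    product = 1
    for _, _, s in triangle:
        product *= s
    return product > 0


# The cyclic all-positive example from the text.
cyclic_positive = [("i", "k", +1), ("k", "j", +1), ("j", "i", +1)]
# A hypothetical variant in which the third link is negative instead.
cyclic_one_negative = [("i", "k", +1), ("k", "j", +1), ("j", "i", -1)]

example_status = adheres_to_status(cyclic_positive)
example_balance = adheres_to_balance(cyclic_positive)
variant_status = adheres_to_status(cyclic_one_negative)
variant_balance = adheres_to_balance(cyclic_one_negative)
```

The two triangles illustrate both directions of disagreement: the first satisfies balance but not status, while the second satisfies status but not balance.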

CHAPTER 3

MEASURING NETWORKS WITH NEGATIVE LINKS

In this chapter¹, we investigate network measurements for networks having negative links. A
network metric is a mathematical expression that allows the information contained in a network to
be expressed in numerical form. It is then via these metrics that we can define network
measurements through the use of algorithms and/or mathematical formulas that can be used to
compare and/or rank users, pairs of users, subgroups (i.e., communities), or entire networks.

Node relevance, which measures how relevant two nodes are in a social network, is one of the
keystones of social network analysis. This has been shown by its usage in diverse social network
analysis tasks and applications such as link prediction [38, 39], node classification [40],
community detection [41], and search and recommendation [42]. The vast majority of existing node
relevance measurements have been designed for unsigned networks (or social networks with only
positive links) [11, 43]. We note that these measurements can be divided into local and global
measurements according to the information used: local measurements only use local neighborhood
information such as common neighbors, while global measurements utilize the whole structural
information of the network, such as Random Walk with Restart [44]. Thus, in Section 3.1, we present
our proposed node relevance measures for signed networks and discuss their relationship to balance
theory.

Node centrality is a fundamental network measurement that has a diverse set of applications
[45, 46, 47, 48, 49] across many domains, such as economics [45], biology [46], urban
infrastructure [47, 48], and sociology, the last of which has a plethora of applications [49]. In
general, the task is to construct a ranking of nodes based on how “central” or “important” they are
in the network. Most of the previous work has been for unsigned networks. Due to the inclusion of
negative links, existing unsigned centrality measures are not directly applicable to signed
networks. This is partially due to the added complexities associated with the introduction of
negative links and also because we now need to differentiate between the “famous” and “infamous”
users in a signed network. Though recently there have been some measurements proposed that take
into account negative links by extending existing unsigned centrality measurements [24, 50, 51],
they have not explored exploiting higher-order network information in signed networks. Thus,
in Section 3.2, we develop a signed centrality measurement built upon both prominent triadic social
theories defined on networks having both positive and negative links, namely the status and balance
theories.

¹ Tyler Derr, Chenxing Wang, Suhang Wang, and Jiliang Tang. “Relevance Measurements in
Online Signed Social Networks.” KDD 14th International Workshop on Mining and Learning with
Graphs (MLG), 2018.

3.1 Node Relevance Measurements in Signed Networks

It is evident from recent research that negative links have signiﬁcant added value over positive
links in various analytical tasks. For example, even a small number of negative links can signiﬁcantly
boost the performance of positive link prediction [3, 4], and can similarly improve recommender
systems [5, 6]. Negative links could therefore help us develop novel relevance
measurements for signed networks. A few very recent works design node similarities
for link prediction [52, 53]. However, a general and systematic investigation of signed relevance
measurements and their effects on signed network analysis had not previously been carried out. Hence,
we perform an initial and comprehensive study of the problem of measuring node relevance in
signed social networks. Analogous to node relevance research in unsigned networks, we aim to
investigate the following: (a) how to make use of both positive and negative links in signed relevance
measurements; and (b) what the effects of these measurements are on two signed network mining
tasks, sign prediction and signed tie strength prediction.

Data Analysis Discussions: Social theories such as homophily [8] play an important role
in building node relevance measurements for unsigned social networks [54]. This stimulates the
investigation for one of the most fundamental social theories related to signed social networks, i.e.,
balance theory [30], that could be helpful in building node relevance measurements in signed social
networks.


Table 3.1: Notations regarding node relevance in signed networks.

Notations                                Descriptions
$\mathbf{A}$                             Adjacency matrix
$\mathbf{A}^{+}$ ($\mathbf{A}^{-}$)      Adjacency matrix of only positive (negative) links
$|\mathbf{A}|$                           Absolute adjacency matrix
$\mathbf{R}$                             Node relevance matrix
$d_i$                                    Degree of node $u_i$
$d_i^{in}$ ($d_i^{out}$)                 Indegree (Outdegree) of node $u_i$
$d_i^{in+}$ ($d_i^{out+}$)               Indegree (Outdegree) of positive links of node $u_i$
$d_i^{in-}$ ($d_i^{out-}$)               Indegree (Outdegree) of negative links of node $u_i$
$N_i$                                    Set of neighbors for node $u_i$
$N_i^{in}$ ($N_i^{out}$)                 Set of incoming (outgoing) neighbors for node $u_i$
$N_i^{+}$ ($N_i^{-}$)                    Set of positive (negative) neighbors for node $u_i$
$\mathbf{X}_{ij}$                        The $(i,j)$ entry of the matrix $\mathbf{X}$

Based on the analysis performed in Chapter 2.3, properties of negative links are different from
those of positive links, which makes signed social networks distinct from unsigned social networks.
Therefore, though node relevance measurements have been extensively studied, dedicated
efforts are still needed to systematically investigate signed relevance measurements. Furthermore, as most of
the triads in signed social networks satisfy balance theory, we can use this to guide building novel
signed relevance measurements.

Node relevance measurements have been extensively studied in unsigned networks. According
to our preliminary data analysis, the availability of negative links makes signed networks unique
in many aspects such as properties and balance theory. In this section, analogous to unsigned
networks, we develop node relevance measurements for signed networks.

Notations and Definitions: We use $\mathbf{R} \in \mathbb{R}^{n \times n}$ to denote the relevance score matrix, where
$\mathbf{R}_{ij}$ represents the node relevance from user $u_i$ to user $u_j$. Note that node relevance values are not
necessarily symmetrical. We summarize the above notations in Table 3.1, where $d_i$ and $N_i$ denote the
degree and the set of neighbors of $u_i$ in an unsigned network.

Many node relevance measurements have been proposed for unsigned networks. According
to the information used, we can roughly divide them into local and global measurements. Local
measurements only use local neighborhood information such as common neighbors, while global
measurements utilize the whole structural information, as in Random Walk with Restart. Meanwhile,
node relevance measurements can be undirected or directed, corresponding to undirected
and directed networks. Note that we could use any method that requires a directed network for an
undirected network, since undirected networks are simply directed networks where each edge has
both directions. In this work, we will group signed relevance measurements as local and global
methods.

Given node relevance measurements for unsigned networks, there are three strategies to design
signed ones. The first is to only use $\mathbf{A}^{+}$ in the calculation of node relevance scores. This strategy
completely ignores the negative links, which could result in over-estimation of the impact of positive
links [55]. The second strategy is to convert negative links in the signed network into
positive links, thus turning the signed network into an unsigned network. Such a network can be
represented by the matrix $\tilde{\mathbf{A}}$ where $\tilde{\mathbf{A}}_{ij} = |\mathbf{A}_{ij}|$. Ignoring signs of links not only overlooks the
differences between negative and positive links, but also makes balance theory for signed networks
not applicable. Our third strategy is to take advantage of negative links and balance theory to
develop signed relevance measurements based on unsigned ones. In the following subsections, we
will detail how to apply the third strategy to representative unsigned node relevance measurements.
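As a minimal sketch of the inputs the three strategies work from, the snippet below derives the corresponding matrices from one signed adjacency matrix. The 4-node network and the helper names (`remove_negative`, `ignore_signs`, `split_by_sign`) are hypothetical and serve only as illustration.

```python
# Hypothetical 4-node signed network: +1 positive link, -1 negative link, 0 no link.
A = [
    [0, 1, -1, 0],
    [1, 0, 1, -1],
    [-1, 1, 0, 0],
    [0, -1, 0, 0],
]


def remove_negative(adj):
    """Strategy 1: keep only positive links (the matrix A+)."""
    return [[1 if a > 0 else 0 for a in row] for row in adj]


def ignore_signs(adj):
    """Strategy 2: treat every link as positive (entries |A_ij|)."""
    return [[abs(a) for a in row] for row in adj]


def split_by_sign(adj):
    """Strategy 3 keeps both kinds of links; here A+ and A- are 0/1 matrices."""
    pos = [[1 if a > 0 else 0 for a in row] for row in adj]
    neg = [[1 if a < 0 else 0 for a in row] for row in adj]
    return pos, neg


A_plus = remove_negative(A)
A_abs = ignore_signs(A)
A_pos, A_neg = split_by_sign(A)
```

With this convention the signed matrix is recovered as the entrywise difference of `A_pos` and `A_neg`, which is the form the signed measurements rely on.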

3.1.1 Local Methods

In this subsection, we build local signed relevance measurements based on representative local
methods for unsigned networks including common neighbors, Jaccard Index, and Preferential
Attachment [56, 17]. For each unsigned measurement, we will ﬁrst brieﬂy introduce it, then detail
how to design the signed one and ﬁnally discuss its connection with signed network properties and
balance theory.


3.1.1.1 Common Neighbors

Unsigned Common neighbors (UCN): If two nodes share a lot of common friends, they are likely
to be relevant. Based on this intuition, UCN defines the relevance score between $u_i$ and $u_j$ as the
number of common neighbors, which is formally defined as:

\[
\mathbf{R}_{ij} = |N_i \cap N_j| \tag{3.1}
\]

where $|\cdot|$ denotes the size of a set.

Signed Common neighbors (SCN): UCN cannot be directly extended to include negative
links. Therefore, we define SCN as follows:

\[
\mathbf{R}_{ij} = \left(|N_i^{+} \cap N_j^{+}| + |N_i^{-} \cap N_j^{-}|\right) - \left(|N_i^{+} \cap N_j^{-}| + |N_i^{-} \cap N_j^{+}|\right)
\]

We can interpret SCN as the number of common neighbors of $u_i$ and $u_j$ where they agree on the
polarity of the sign ($|N_i^{+} \cap N_j^{+}| + |N_i^{-} \cap N_j^{-}|$), minus the number of common neighbors on which
they disagree on the sign ($|N_i^{+} \cap N_j^{-}| + |N_i^{-} \cap N_j^{+}|$).

Connection to Balance Theory: If $u_i$ and $u_j$ agree on the majority of the signs of their
neighbors, i.e., $(|N_i^{+} \cap N_j^{+}| + |N_i^{-} \cap N_j^{-}|) > (|N_i^{+} \cap N_j^{-}| + |N_i^{-} \cap N_j^{+}|)$, then $\mathbf{R}_{ij}$ is positive, which
will lead to more balanced triads. Otherwise, they have more disagreements on the signs, i.e.,
$(|N_i^{+} \cap N_j^{-}| + |N_i^{-} \cap N_j^{+}|) > (|N_i^{+} \cap N_j^{+}| + |N_i^{-} \cap N_j^{-}|)$, then $\mathbf{R}_{ij}$ is negative, which will also result
in more balanced triads. Therefore, SCN aims to force more triads with $u_i$ and $u_j$ to be balanced.

3.1.1.2 Jaccard Index

Unsigned Jaccard Index (UJI): UCN only considers the number of common neighbors of $u_i$ and
$u_j$, but it ignores the number of unique neighbors these two users have. Therefore, UCN is likely
to give users with large numbers of neighbors high relevance scores. To mitigate this effect, UJI
penalizes the UCN scores by the number of unique neighbors the two users have:

\[
\mathbf{R}_{ij} = \frac{|N_i \cap N_j|}{|N_i \cup N_j|} \tag{3.2}
\]

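Signed Jaccard Index (SJI): Analogous to the step from UCN to UJI, SJI is defined as SCN divided by
the total number of unique neighbors $u_i$ and $u_j$ have:

\[
\mathbf{R}_{ij} = \frac{\mathrm{SCN}_{ij}}{|N_i^{+} \cup N_i^{-} \cup N_j^{+} \cup N_j^{-}|} \tag{3.3}
\]

where $\mathrm{SCN}_{ij}$ denotes the SCN score of $u_i$ and $u_j$.

Connection to Balance Theory: Similar to SCN, SJI aims to force more triads to be balanced.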
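The two signed local scores above can be sketched directly with set operations. This is an illustrative sketch rather than the implementation used in the experiments: the toy neighbor sets are hypothetical, and returning 0 for a pair with no neighbors at all is an assumption.

```python
def scn(pos, neg, i, j):
    """Signed common neighbors: sign agreements minus sign disagreements."""
    agree = len(pos[i] & pos[j]) + len(neg[i] & neg[j])
    disagree = len(pos[i] & neg[j]) + len(neg[i] & pos[j])
    return agree - disagree


def sji(pos, neg, i, j):
    """Signed Jaccard index: SCN divided by the number of unique neighbors."""
    union = pos[i] | neg[i] | pos[j] | neg[j]
    if not union:
        return 0.0  # assumption: a pair without any neighbors gets relevance 0
    return scn(pos, neg, i, j) / len(union)


# Hypothetical toy data: node -> set of positive / negative neighbors.
pos = {"a": {"c", "d"}, "b": {"c", "e"}}
neg = {"a": {"e"}, "b": {"d", "f"}}

print(scn(pos, neg, "a", "b"))  # one agreement (c), two disagreements (d, e)
print(sji(pos, neg, "a", "b"))  # same numerator over the 4 unique neighbors
```

Here the two users disagree on more common neighbors than they agree on, so both scores are negative, which is the case where a negative link between them yields more balanced triads.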

3.1.1.3 Preferential Attachment

Unsigned Preferential Attachment (UPA): One commonly used interpretation behind this method,
taken from the finance realm, is that the rich get richer. In terms of social network analysis, users
that already have many friends are more likely to create new friends in the future. Therefore, the
node relevance score of UPA multiplies the degrees of the two users [11]:

\[
\mathbf{R}_{ij} = d_i \times d_j \tag{3.4}
\]

Signed Preferential Attachment (SPA): In Section 2.3.2, we demonstrate that both positive
and negative links follow power-law distributions. In other words, we observe “the rich getting
richer” for both positive and negative links, which paves the way to define SPA. We first split the
network $\mathbf{A}$ into a positive network $\mathbf{A}^{+}$ and a negative network $\mathbf{A}^{-}$. Then we can use UPA to
calculate relevance scores from the positive and negative networks separately, since degrees in both
networks follow power-law distributions. The relevance score for $u_i$ and $u_j$ from $\mathbf{A}^{+}$ is denoted as
$p_{ij}^{+}$ and similarly we denote the relevance from $\mathbf{A}^{-}$ as $p_{ij}^{-}$. $p_{ij}^{+}$ and $p_{ij}^{-}$ are computed
as:

\[
p_{ij}^{+} = d_i^{+} \times d_j^{+}, \qquad p_{ij}^{-} = d_i^{-} \times d_j^{-}
\]

Then we define SPA between $u_i$ and $u_j$ as:

\[
\mathbf{R}_{ij} = \mathrm{sign}(p_{ij}^{+} - p_{ij}^{-}) \, f(p_{ij}^{+}, p_{ij}^{-}) \tag{3.5}
\]

where $\mathrm{sign}(x) = 1$, $0$, or $-1$ if $x$ is larger than, equal to, or smaller than 0. Intuitively, if the positive
relevance score $p_{ij}^{+}$ is larger than the negative one $p_{ij}^{-}$, the overall $\mathbf{R}_{ij}$ should be positive;
otherwise, $\mathbf{R}_{ij}$ should be negative. Therefore the sign of $\mathbf{R}_{ij}$ is decided by $\mathrm{sign}(p_{ij}^{+} - p_{ij}^{-})$.
The relevance strength $|\mathbf{R}_{ij}|$ aggregates $p_{ij}^{+}$ and $p_{ij}^{-}$ via a function $f$. A straightforward
way is to set $f(p_{ij}^{+}, p_{ij}^{-}) = |p_{ij}^{+} - p_{ij}^{-}|$. This may not work well. For example, when $u_i$
and $u_j$ both have large positive and negative degrees, the positive and negative relevance scores will
cancel each other, which contradicts “the rich getting richer”. In fact, we empirically find
that $f(p_{ij}^{+}, p_{ij}^{-}) = \max(p_{ij}^{+}, p_{ij}^{-})$ works better than $f(p_{ij}^{+}, p_{ij}^{-}) = |p_{ij}^{+} - p_{ij}^{-}|$.

Connection to the signed network property: According to the power-law distributions of positive
and negative links, we design SPA, which will allow users with higher degrees to have higher
relevance scores with others.
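The cancellation issue described for SPA can be seen in a small sketch of Eq. (3.5). The degree values and the names `spa` and `abs_diff` are hypothetical, chosen only to contrast the two aggregation functions.

```python
def sign(x):
    """Return 1, 0, or -1 for positive, zero, or negative x."""
    return (x > 0) - (x < 0)


def abs_diff(p_pos, p_neg):
    """The straightforward aggregation |p+ - p-|."""
    return abs(p_pos - p_neg)


def spa(d_pos, d_neg, i, j, aggregate=max):
    """Signed preferential attachment, Eq. (3.5), with a pluggable aggregation f."""
    p_pos = d_pos[i] * d_pos[j]  # UPA on the positive network
    p_neg = d_neg[i] * d_neg[j]  # UPA on the negative network
    return sign(p_pos - p_neg) * aggregate(p_pos, p_neg)


# Hypothetical degrees: "a" and "b" are both high-degree, "c" is low-degree.
d_pos = {"a": 10, "b": 9, "c": 2}
d_neg = {"a": 8, "b": 10, "c": 0}

print(spa(d_pos, d_neg, "a", "b"), spa(d_pos, d_neg, "a", "c"))
print(spa(d_pos, d_neg, "a", "b", abs_diff), spa(d_pos, d_neg, "a", "c", abs_diff))
```

With the absolute difference, the high-degree pair ("a", "b") scores lower than the pair ("a", "c") because its positive and negative products nearly cancel; with the maximum, the high-degree pair keeps the larger score.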

3.1.2 Global Methods

The global methods make use of not only the local neighborhoods, but also allow relevance
information to propagate through the whole network. Most of the global methods for
unsigned networks assume that two users $u_i$ and $u_j$ should have high relevance if they have
neighbors with high relevance. In this subsection, we detail how to design global signed relevance
measurements based on representative unsigned ones and then connect them to balance theory.

3.1.2.1 Katz

Unsigned Katz (UK): This method sums over the collection of all paths from $u_i$ to $u_j$ and has an
exponential decay on the weight associated with the count of paths as the length increases [57]:

\[
\mathbf{R}_{ij} = \sum_{l=1}^{\infty} \beta^{l} \cdot |\mathrm{paths}_{i,j}^{\langle l \rangle}| = \sum_{l=1}^{\infty} \beta^{l} (\mathbf{A}^{l})_{ij} \tag{3.6}
\]

where $|\mathrm{paths}_{i,j}^{\langle l \rangle}|$ is the count of paths of length $l$ from $u_i$ to $u_j$. Note that we should have $\beta < 1$ so that
longer paths will be assigned less weight than shorter paths. This can be formulated recursively as
follows to handle the counting of the paths of varying length:

\[
\mathbf{R}_{ij} = \beta \sum_{k=1}^{n} \mathbf{A}_{ik} \mathbf{R}_{kj} + \delta_{ij} \tag{3.7}
\]

Note that $\delta_{ij}$ is used to ensure that every node in the network has a high relevance to itself
(i.e., “self-similarity”). It is a diagonal term and is defined as $\boldsymbol{\delta} = \mathbf{I}$. It normalizes the relevance
scores from each user $u_i$ based on the degree $d_i$.

Signed Katz (SK): Balance theory states that a k-cycle in a signed social network is balanced
if it contains an even number of negative edges and unbalanced if it contains an odd number of
negative edges. With relevance scores from SK, we expect more balanced k-cycles than unbalanced
ones involving users $u_i$ and $u_j$. To achieve this, we would therefore need to choose the sign of the node
relevance $\mathbf{R}_{ij}$ to be either positive or negative, such that it optimizes over all the cycles involving $u_i$
and $u_j$ (i.e., all the paths between $u_i$ and $u_j$). As done in UK, we can similarly allow the decay of
importance on the longer paths. Our formulation and its recurrence relation for the calculation of
paths of length $l$ having an even or odd number of negative edges is defined as follows:

\[
\mathbf{R} = \sum_{l=1}^{\infty} \beta^{l} f(\mathbf{B}_l, \mathbf{U}_l) \tag{3.8}
\]

with

\[
\begin{aligned}
\mathbf{B}_l &= \mathbf{B}_{l-1}\mathbf{A}^{+} + \mathbf{U}_{l-1}\mathbf{A}^{-} \\
\mathbf{U}_l &= \mathbf{B}_{l-1}\mathbf{A}^{-} + \mathbf{U}_{l-1}\mathbf{A}^{+} \\
\mathbf{B}_1 &= \mathbf{A}^{+}, \quad \mathbf{U}_1 = \mathbf{A}^{-}
\end{aligned}
\]

where $f(\mathbf{B}_l, \mathbf{U}_l)$ is a function to combine the counts of paths with even and odd numbers of negative
links. $\mathbf{B}_l$ and $\mathbf{U}_l$ are the matrices that hold the number of paths with an even and odd number of
negative links in paths of length $l$, respectively. Next we discuss the inner workings of SK.
When counting paths of length 1 (i.e., a direct edge connecting the two nodes), we set $\mathbf{B}_1$ as $\mathbf{A}^{+}$
since having a positive edge is trivially having an even number of negative links in a path of length
1, and by the same reasoning we initialize $\mathbf{U}_1$ as $\mathbf{A}^{-}$. We assume that $\mathbf{B}_{l-1}$ and $\mathbf{U}_{l-1}$ represent the paths
of length $l - 1$ having an even and odd number of negative edges, respectively, between all pairs
of nodes. Adding one positive link ($\mathbf{A}^{+}$) to a path in $\mathbf{B}_{l-1}$ or adding a negative link ($\mathbf{A}^{-}$) to a
path in $\mathbf{U}_{l-1}$ will result in a path of length $l$ with an even number of negative links. This intuition
leads to the update rule $\mathbf{B}_l = \mathbf{B}_{l-1}\mathbf{A}^{+} + \mathbf{U}_{l-1}\mathbf{A}^{-}$. Similarly, we can obtain the update rule
$\mathbf{U}_l = \mathbf{B}_{l-1}\mathbf{A}^{-} + \mathbf{U}_{l-1}\mathbf{A}^{+}$.
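Theorem 1. When we choose $f(\mathbf{B}_l, \mathbf{U}_l) = (\mathbf{B}_l - \mathbf{U}_l)$ and $\mathbf{A} \in \mathbb{R}^{n \times n}$, where $\mathbf{A}_{ij} = 1$ if $u_i$ has a
positive link to $u_j$, $-1$ if $u_i$ creates a negative link to $u_j$, and 0 when $u_i$ has no link to $u_j$, signed Katz
in Eq. (3.8) is equivalent to applying unsigned Katz in Eq. (3.6) on the signed network adjacency
matrix defined as $\mathbf{A}$.

Proof. To prove the theorem, we only need to show that $\mathbf{B}_l - \mathbf{U}_l = \mathbf{A}^{l}$. We use mathematical
induction:
Basis: Let $l = 1$. Based on our definition of $\mathbf{B}_1$ and $\mathbf{U}_1$, we have $(\mathbf{B}_1 - \mathbf{U}_1) = (\mathbf{A}^{+} - \mathbf{A}^{-}) = \mathbf{A} = \mathbf{A}^{1}$.
Inductive Hypothesis: Suppose the theorem holds for $l = k$. In other words, $(\mathbf{B}_k - \mathbf{U}_k) = \mathbf{A}^{k}$.
Inductive Step: Let $l = k + 1$. Then our left side is

\[
(\mathbf{B}_{k+1} - \mathbf{U}_{k+1}) = \left( (\mathbf{B}_k\mathbf{A}^{+} + \mathbf{U}_k\mathbf{A}^{-}) - (\mathbf{B}_k\mathbf{A}^{-} + \mathbf{U}_k\mathbf{A}^{+}) \right) = (\mathbf{B}_k - \mathbf{U}_k)(\mathbf{A}^{+} - \mathbf{A}^{-}) = \mathbf{A}^{k}\mathbf{A} = \mathbf{A}^{k+1},
\]

which completes the proof. □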

Connection to Balance Theory: SK is built based on balance theory. SCN and SJI force more
balanced triads (or 3-cycles), while SK pushes for any k-cycle to be balanced. If the majority
of paths between $u_i$ and $u_j$ have an even number of negative links, according to balance theory, we
should have a positive node relevance between them. Similarly, when the majority have an odd number of
negative edges, we want to have a negative relevance. Therefore, if we count the number of paths
between $u_i$ and $u_j$ with an even or odd number of negative edges, then we can subtract the number with
an odd number of negative links from the number of paths having an even number of negative links, since
this will give us the optimal choice of sign between $u_i$ and $u_j$ as mentioned above. More specifically,
if the resulting value is positive, the node relevance between $u_i$ and $u_j$ is positive, otherwise negative.
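Theorem 1 can be checked numerically on a small example. The sketch below truncates the infinite sums at a maximum path length, which is an assumption made only for illustration; the toy matrix, the value of beta, and the helper names are likewise hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def accumulate(R, M, weight):
    """Return R + weight * M."""
    return [[r + weight * m for r, m in zip(rr, rm)] for rr, rm in zip(R, M)]


def signed_katz(A, beta, max_len):
    """Truncated signed Katz, Eq. (3.8), with f(B_l, U_l) = B_l - U_l."""
    n = len(A)
    A_pos = [[1 if a > 0 else 0 for a in row] for row in A]
    A_neg = [[1 if a < 0 else 0 for a in row] for row in A]
    B, U = A_pos, A_neg  # B_1 = A+, U_1 = A-
    R = [[0.0] * n for _ in range(n)]
    for l in range(1, max_len + 1):
        if l > 1:
            B, U = (add(matmul(B, A_pos), matmul(U, A_neg)),
                    add(matmul(B, A_neg), matmul(U, A_pos)))
        R = accumulate(R, sub(B, U), beta ** l)
    return R


def katz_on_signed_adjacency(A, beta, max_len):
    """Truncated unsigned Katz, Eq. (3.6), applied to the signed matrix A."""
    n = len(A)
    power = A
    R = [[0.0] * n for _ in range(n)]
    for l in range(1, max_len + 1):
        if l > 1:
            power = matmul(power, A)
        R = accumulate(R, power, beta ** l)
    return R


# Hypothetical triangle with one negative edge (an unbalanced triad).
A = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]
sk = signed_katz(A, beta=0.2, max_len=6)
uk = katz_on_signed_adjacency(A, beta=0.2, max_len=6)
print(max(abs(a - b) for ra, rb in zip(sk, uk) for a, b in zip(ra, rb)))
```

The printed maximum difference between the two results should be zero up to floating-point error, as the induction argument predicts for every truncation length.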


3.1.2.2 Asymmetric Similarity Measure for Weighted Networks

Unsigned Asymmetric Similarity Measure for Weighted Networks (UASCOS++): This method
is an enrichment of ASCOS [58] to handle weighted networks. The formulation of ASCOS is
the following:

\[
\mathbf{R}_{ij} =
\begin{cases}
c \sum_{k \in N_i^{in}} \dfrac{1}{|N_i^{in}|} \mathbf{R}_{kj} & i \neq j \\
1 & i = j
\end{cases}
\]

Let $\mathbf{P}_{ij} = \dfrac{\mathbf{A}_{ij}}{|N_j^{in}|}$ and we can rewrite the formulation as:

\[
\mathbf{R} = c\,\mathbf{P}^{\top}\mathbf{R} + (1 - c)\mathbf{I}
\]

It defines the node relevance as the summation of normalized relevance from the incoming
neighbors of $u_i$ to $u_j$. The modifications for ASCOS++ were made to handle weights on the
edges. The formulation is shown below:

\[
\mathbf{R}_{ij} =
\begin{cases}
c \sum_{k \in N_i^{in}} \dfrac{\mathbf{A}_{ki}}{\sum_{k' \in N_i^{in}} \mathbf{A}_{k'i}} \left(1 - e^{-\mathbf{A}_{ki}}\right) \mathbf{R}_{kj} & i \neq j \\
1 & i = j
\end{cases}
\tag{3.9}
\]

The adjustment is that each of the edge weights coming into $u_i$ is now normalized by the summation
of all the incoming weights into $u_i$. The term $(1 - e^{-\mathbf{A}_{ki}})$ maps the weights to be close to 1 when
edge weights are large, and when the weights are small, it maps them close to 0.

Signed ASCOS++ (SASCOS++): ASCOS++ is difficult to adapt directly to signed networks.
Assume that a node $u_i$ has an even number of incoming edges, where half the edges are
positive, while the other half are negative. This would lead to an undefined value, as the
summation over all incoming edges to $u_i$, $\sum_{k \in N_i^{in}} \mathbf{A}_{ki}$, is zero.

Another issue is that if we directly apply ASCOS++, the resulting relevance score could contradict
balance theory. To ease our analysis in the following case, let $z = \sum_{k' \in N_i^{in}} \mathbf{A}_{k'i}$,
$y = \mathbf{A}_{ki} / \sum_{k' \in N_i^{in}} \mathbf{A}_{k'i}$, and
$x = (1 - e^{-\mathbf{A}_{ki}})$. If $\mathbf{A}_{ki} = 1$ and $z$ is negative, then $y$ is negative and $x$ is positive. Thus, if $\mathbf{R}_{kj}$ is
also positive, then the product of these three terms $y \, x \, \mathbf{R}_{kj}$ is negative and the resulting triad (+, +, −)
does not follow balance theory. Similarly, when $\mathbf{R}_{kj}$ is negative, the product is positive and the
resulting triad (+, −, +) is also not balanced.

The fact that using ASCOS++ with signed networks could inherently disagree with balance
theory motivates us to build SASCOS++. We note that when using ASCOS++ with signed
networks, $x$ is equal to approximately 0.63 and -1.72 when $\mathbf{A}_{ki}$ is positive or negative, respectively.
Thus, it provides a stronger push in the similarity (by about three times) when seeing a negative
link. Due to the imbalance of the numbers of positive and negative links in signed networks, we
leave this $x$ term as is, but make a change to the normalization (i.e., $y$). The formulation for
SASCOS++ is shown below:

\[
\mathbf{R}_{ij} =
\begin{cases}
c \sum_{k \in N_i^{in}} \dfrac{\mathbf{A}_{ki}}{\sum_{k' \in N_i^{in}} |\mathbf{A}_{k'i}|} \left(1 - e^{-\mathbf{A}_{ki}}\right) \mathbf{R}_{kj} & i \neq j \\
1 & i = j
\end{cases}
\tag{3.10}
\]

Connection to Balance Theory: It is easy to verify that SASCOS++ is able to have the relevance
measurements aligning with balance theory. In other words, it will push more balanced triads.
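As a short worked example of the quantities discussed for ASCOS++ on signed networks, assuming binary signed weights (each incoming weight is $+1$ or $-1$), the weighting term and the two normalizations evaluate as follows. The two-edge node in the last lines is a hypothetical illustration.

```latex
\begin{align*}
\text{positive link } (+1):&\quad 1 - e^{-1} \approx 0.632 \\
\text{negative link } (-1):&\quad 1 - e^{1} \approx -1.718 \\
\text{ratio of magnitudes}:&\quad \frac{e - 1}{1 - e^{-1}} = e \approx 2.72 \\[4pt]
\text{incoming weights } \{+1, -1\}:&\quad (+1) + (-1) = 0
  \quad \text{(ASCOS++ normalization undefined)} \\
&\quad |{+1}| + |{-1}| = 2
  \quad \text{(SASCOS++ normalized weights } \pm\tfrac{1}{2}\text{)}
\end{align*}
```

The ratio of roughly 2.72 is the "about three times" stronger push attributed to negative links, and the last two lines show why normalizing by absolute values removes the division by zero.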

3.1.2.3 Random Walk with Restart

Unsigned Random Walk with Restart (URWR): A random walker starting on node $u_i$ has a
probability of $(1 - c)$ to return to $u_i$, and with probability $c$ chooses a neighbor of the current node
to move to based on a transition matrix $\mathbf{W}$ (where $\mathbf{W}_{ij}$ is the probability that the walker starting
at $u_i$ will end at node $u_j$). We define this transition matrix as $\mathbf{W}_{ij} = \frac{1}{d_i}$ if $u_i$ and $u_j$ are connected and
$\mathbf{W}_{ij} = 0$ otherwise (i.e., no link between $u_i$ and $u_j$). With this intuition, URWR is formulated as [44]:

\[
\mathbf{R} = c\,\mathbf{W}^{\top}\mathbf{R} + (1 - c)\mathbf{I} = (1 - c)(\mathbf{I} - c\,\mathbf{W}^{\top})^{-1} \tag{3.11}
\]

Signed Random Walk with Restart (SRWR): The transition matrix $\mathbf{W}$ has to be non-negative,
thus we cannot directly apply URWR to signed networks. Therefore, we study signed random walk
with restart. Based on balance theory, the relevance score of $u_k$ w.r.t. $u_i$ can be useful to infer that
of $u_j$ to $u_i$ if there's a link from $u_k$ to $u_j$. For example, if $\mathbf{A}_{kj} > 0$ (or $u_k$ and $u_j$ are friends), and
$\mathbf{R}_{ik} > 0$ (or $u_i$ and $u_k$ are likely to be friends), it may suggest that $u_i$ and $u_j$ are friends (or $\mathbf{R}_{ij} > 0$)
because friends' friends are friends. On the contrary, if $\mathbf{A}_{kj} < 0$ (or $u_k$ and $u_j$ are enemies) but
$\mathbf{R}_{ik} > 0$ (or $u_i$ and $u_k$ are likely to be friends), it may indicate that $u_i$ and $u_j$ are enemies (or
$\mathbf{R}_{ij} < 0$) because friends' enemies are enemies, which is implied from “the enemy of my enemy
is my friend”. This indicates that (1) $u_j$'s relevance score to $u_i$ can be indicated by those of nodes
(e.g., $u_k$) that have links to $u_j$; and (2) the estimation also depends on the signs of links from $u_k$ to
$u_j$ and the relevance scores from $u_i$ to $u_k$. These intuitions suggested by balance theory pave the
way to build SRWR. Let $\bar{\mathbf{D}}$ be a diagonal matrix with its diagonal element $\bar{\mathbf{D}}_{kk} = \sum_{j} |\mathbf{A}_{kj}|$. In this
way, $\bar{\mathbf{D}}_{kk}$ is the out degree of $u_k$ considering both positive and negative links. Thus, the normalized
weight of the link from $u_k$ to $u_j$ is given as

\[
\bar{\mathbf{W}}_{kj} = \frac{|\mathbf{A}_{kj}|}{\bar{\mathbf{D}}_{kk}}
\]

According to the aforementioned intuitions, $\mathbf{R}_{ik}$ can be used to estimate $\mathbf{R}_{ij}$ with $\mathbf{A}_{kj} \neq 0$.
Intuitively, the portion of relevance score that $u_k$ contributes to $\mathbf{R}_{ij}$ should be weighted by $\bar{\mathbf{W}}_{kj}$. This
is to account for the number of neighbors of $u_k$. If $\bar{\mathbf{D}}_{kk}$ is large, then $\bar{\mathbf{W}}_{kj}$ is small and the effect of
$u_k$ on each of its neighbors is small. Thus, $\mathbf{R}_{ij}$ can be estimated as:

\[
\mathbf{R}_{ij} \propto \sum_{k} \mathrm{sign}(\mathbf{A}_{kj}) \bar{\mathbf{W}}_{kj} \mathbf{R}_{ik} \tag{3.12}
\]

where $\mathrm{sign}(\mathbf{A}_{kj})$ is used to encode the impact of the sign of the links. With the sign introduced
in the estimation of $\mathbf{R}_{ij}$, the relevance score can be both positive and negative. Two users with
negative links can affect each other with negative relevance scores, which captures the semantic
meanings of signed links.

With the analysis above, we are ready to discuss the details of SRWR. We focus on the relevance
score of $u_j$, $j = 1, \ldots, n$, $j \neq i$ w.r.t. $u_i$, since the relevance scores w.r.t. other nodes can be derived
similarly. Firstly, $\mathbf{R}_{ij}$, $j = 1, \ldots, n$, $j \neq i$, are initialized to 0, which means that the relevance
scores of $u_j$ to $u_i$ are unknown, while $\mathbf{R}_{ii}$ is initialized to 1 because $u_i$ should be positively relevant
to itself. Now consider a random walker starting from $u_i$. It can iteratively move to its
neighborhood through positive and negative outgoing links. Each time the walker arrives at a node
$u_j$, it will update $\mathbf{R}_{ij}$ by the relevance scores of nodes that have links to $u_j$. If the random walker
arrives at $u_i$, then $\mathbf{R}_{ii}$ is updated as

\[
\mathbf{R}_{ii} \leftarrow c \sum_{k} \mathrm{sign}(\mathbf{A}_{ki}) \bar{\mathbf{W}}_{ki} \mathbf{R}_{ik} + (1 - c) \cdot 1 \tag{3.13}
\]

where the first term of the right-hand side of Eq. (3.13) is the relevance score estimated from the
neighborhood, and the second term is to make sure that $\mathbf{R}_{ii} > 0$, i.e., $u_i$ is relevant to itself. $c$ is a
scalar between 0 and 1, which is used to control the contribution of the two parts. If the random
walker arrives at $u_j$, $j \neq i$, $\mathbf{R}_{ij}$ is updated as

\[
\mathbf{R}_{ij} \leftarrow c \sum_{k} \mathrm{sign}(\mathbf{A}_{kj}) \bar{\mathbf{W}}_{kj} \mathbf{R}_{ik} \tag{3.14}
\]

Combining Eq. (3.13) and Eq. (3.14) together, $\mathbf{R}_{ij}$ is updated as

\[
\mathbf{R}_{ij} \leftarrow c \sum_{k} \mathrm{sign}(\mathbf{A}_{kj}) \bar{\mathbf{W}}_{kj} \mathbf{R}_{ik} + (1 - c)\,\mathbb{I}(i, j)
\]

where $\mathbb{I}(i, j)$ is a binary indicator function with $\mathbb{I}(i, j) = 1$ if $i = j$ and 0 otherwise. The random
walker keeps moving until $\mathbf{R}$ doesn't change, which gives

\[
\mathbf{R}_{ij} = c \sum_{k} \mathrm{sign}(\mathbf{A}_{kj}) \bar{\mathbf{W}}_{kj} \mathbf{R}_{ik} + (1 - c)\,\mathbb{I}(i, j) \tag{3.15}
\]

By noticing that $\mathrm{sign}(\mathbf{A}_{kj}) \bar{\mathbf{W}}_{kj} = \frac{\mathbf{A}_{kj}}{\bar{\mathbf{D}}_{kk}}$, we define $\mathbf{S} = \bar{\mathbf{D}}^{-1}\mathbf{A}$ and then Eq. (3.15) can be written
in matrix form as $\mathbf{R} = c\,\mathbf{R}\mathbf{S} + (1 - c)\mathbf{I}$, where $\mathbf{I}$ is the identity matrix. The solution to the above
equation is given as

\[
\mathbf{R} = (1 - c)(\mathbf{I} - c\,\mathbf{S})^{-1} \tag{3.16}
\]
Correctness: Here we show that SRWR is correct, i.e., $(\mathbf{I} - c\,\mathbf{S})^{-1}$ exists. The existence of
$(\mathbf{I} - c\,\mathbf{S})^{-1}$ can be proved using the following lemma, which is known as the Levy-Desplanques
theorem [59]. The Levy-Desplanques theorem is stated as follows:


Figure 3.1: Triplets encountered during signed random walk. Panels: (a) + + +; (b) + − +; (c) + + −; (d) + − −; (e) − − +; (f) − − −.

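Lemma 1. Let $\mathbf{P} \in \mathbb{R}^{n \times n}$ be a square matrix. If $|\mathbf{P}_{ii}| > \sum_{j \neq i} |\mathbf{P}_{ij}|$ for all $i = 1, \ldots, n$, then $\mathbf{P}$ is
nonsingular.

Based on the above lemma, we have

Theorem 2. $\mathbf{I} - c\,\mathbf{S}$, $0 < c < 1$, is non-singular.

Proof. Let $\mathbf{P} = \mathbf{I} - c\,\mathbf{S}$. Since $\mathbf{S}_{ii} = 0$, we have $\mathbf{P}_{ii} = 1$. Also, $\sum_{j \neq i} |\mathbf{S}_{ij}|$ is given as

\[
\sum_{j \neq i} |\mathbf{S}_{ij}| = \sum_{j} |\mathbf{S}_{ij}| = \frac{\sum_{j} |\mathbf{A}_{ij}|}{\bar{\mathbf{D}}_{ii}} = \frac{\bar{\mathbf{D}}_{ii}}{\bar{\mathbf{D}}_{ii}} = 1, \tag{3.17}
\]

which leads to $\sum_{j \neq i} |\mathbf{P}_{ij}| = c \sum_{j \neq i} |\mathbf{S}_{ij}| = c$. Then we have $|\mathbf{P}_{ii}| > \sum_{j \neq i} |\mathbf{P}_{ij}|$ for all $i = 1, \ldots, n$. □

Thus, $\mathbf{I} - c\,\mathbf{S}$ is non-singular and $(\mathbf{I} - c\,\mathbf{S})^{-1}$ exists.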

Connection to balance theory: Figure 3.1 gives representative triplets that will occur during
the update process. The solid line with +/- means positive/negative links. The dashed line with +/-
means $\mathbf{R}_{ij} > 0$ / $\mathbf{R}_{ij} < 0$. According to social balance theory [49], the resulting triads in Figures
3.1(a), 3.1(d) and 3.1(e) are balanced while the remaining three are unbalanced. Next we show
that SRWR is likely to keep the balanced structures while reducing unbalanced structures during
the updating process. For example, in Figure 3.1(a), $\mathbf{R}_{ik}\mathbf{S}_{kj} > 0$ will be added to $\mathbf{R}_{ij}$ according to
Eq. (3.14), which increases the positive relevance score $\mathbf{R}_{ij}$. However, in Figure 3.1(b), $\mathbf{R}_{ik}\mathbf{S}_{kj} < 0$
will be added to $\mathbf{R}_{ij}$, which reduces the positive relevance score $\mathbf{R}_{ij} > 0$. $\mathbf{R}_{ij}$ will be consistently
reduced until $\mathbf{R}_{ij}$ becomes negative (or the triad becomes balanced). Following a similar process,
we can give similar observations for the other triads. Thus, SRWR tends to learn relevance
scores that increase the structural balance of a given signed network.
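The SRWR fixed point can be sketched by repeatedly applying the matrix form of Eq. (3.15) until the scores stop changing, instead of forming the inverse in Eq. (3.16). This is an illustrative sketch: the default value of `c`, the tolerance, the handling of nodes without outgoing links, and the 3-node example are all assumptions.

```python
def srwr(A, c=0.8, tol=1e-12, max_iter=10000):
    """Signed random walk with restart: iterate R <- c * R S + (1 - c) * I."""
    n = len(A)
    # Out-degree counting both positive and negative links (diagonal of D-bar).
    out_deg = [sum(abs(a) for a in row) for row in A]
    # S = D-bar^{-1} A; rows of nodes without outgoing links stay zero (assumption).
    S = [[(A[k][j] / out_deg[k]) if out_deg[k] else 0.0 for j in range(n)]
         for k in range(n)]
    # Initialization from the text: R_ii = 1 and R_ij = 0 for j != i.
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        new_R = [[c * sum(R[i][k] * S[k][j] for k in range(n))
                  + ((1.0 - c) if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        change = max(abs(new_R[i][j] - R[i][j]) for i in range(n) for j in range(n))
        R = new_R
        if change < tol:
            break
    return R


# Hypothetical directed network: u0 -> u1 is positive, u1 -> u2 is negative,
# and u2 -> u0 is positive.
A = [[0, 1, 0], [0, 0, -1], [1, 0, 0]]
R = srwr(A, c=0.8)
print(R[0][1] > 0, R[0][2] < 0)
```

In this example the walker from u0 reaches u1 through a positive link and u2 through a friend's negative link, so the relevance to u1 comes out positive and the relevance to u2 negative, matching the "friends' enemies are enemies" intuition. The iteration converges because each row of the absolute values of `S` sums to at most 1 and `c` is below 1.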


3.1.3 Experiments

In this section, we investigate the impact of signed relevance measurements on two signed network
analysis tasks, i.e., sign prediction and tie strength prediction. We aim to answer the following
two questions. As mentioned in the last section, there are three strategies to adapt unsigned
measurements for signed networks: (1) removing negative links; (2) ignoring signs; and (3)
building advanced signed versions based on signed network properties and balance theory. Note
that in the following subsections, given an unsigned measurement “X”, we use “X-R” and “X-I” to
denote the corresponding measurements applicable to signed networks by removing negative links
and ignoring signs, respectively. For example, “UCN-R” and “UCN-I” denote the strategies of
adapting “UCN” to signed networks by removing negative links and ignoring signs, respectively. The
first question we want to answer is which strategy leads to better measurements. We have built
numerous local and global measurements; the second question is how they perform in different
tasks.

For each of the parameterized measurements, we performed cross validation for the parameter
tuning for each of the tasks. Among the measurements discussed in the last section, Common Neighbors
(CN), Jaccard Index (JI), and Preferential Attachment (PA) based measurements are designed for
undirected networks, while ASCOS and RWR are for directed networks. As mentioned before,
directed measurements can be naturally applied to undirected networks by considering one undirected
link as two directed links. Therefore, we conduct experiments with both undirected and directed
settings.

3.1.3.1 Sign Prediction

The problem of sign prediction in signed networks is to predict whether an unlabeled link is
positive or negative given knowledge of other link signs in the signed network. A previous study
in unsigned networks suggested that good node relevance measurements generally are good for
the prediction of links [60]. Therefore, the sign prediction performance can reflect the quality of
relevance measurements.


Table 3.2: Performance comparison (AUC) of link prediction under the undirected setting.

Metrics        Bitcoin-Alpha   Bitcoin-OTC   Slashdot   Epinions
UCN-R          0.500           0.523         0.520      0.520
UCN-I          0.501           0.497         0.508      0.508
SCN            0.671           0.716         0.629      0.549
UJI-R          0.499           0.524         0.513      0.522
UJI-I          0.497           0.489         0.512      0.503
SJI            0.669           0.725         0.630      0.550
UPA-R          0.497           0.587         0.634      0.571
UPA-I          0.481           0.475         0.484      0.498
SPA            0.559           0.628         0.634      0.641
UK-R           0.517           0.587         0.560      0.542
UK-I           0.488           0.482         0.498      0.538
SK             0.730           0.766         0.693      0.702
URWR-R         0.531           0.628         0.566      0.569
URWR-I         0.500           0.481         0.530      0.494
SRWR           0.751           0.775         0.677      0.703
UASCOS++-R     0.530           0.603         0.573      0.554
UASCOS++-I     0.496           0.484         0.537      0.497
SASCOS++       0.765           0.774         0.705      0.663

For each dataset, we randomly choose 80% as training, and the remaining 20% as testing. We
compute the relevance measurements on the training set to get the relevance scores for each pair of
users. The signed specific measurements obtain a relevance score in [-1, 1]; hence we
directly use the sign of the relevance score to indicate the sign of links. For “X-R” and “X-I”,
the relevance score is in [0, 1]. We search for an optimal threshold on
the training data, and then if the relevance score is less than the threshold, we predict a negative link
and positive otherwise. Since positive and negative links are usually imbalanced in real-world
signed networks, we use Area Under the Curve (AUC) as the metric to assess the performance of
link prediction. For all four datasets, network information is available, thus they all can be used
in the link prediction experiment. Under the undirected setting, we ignore the directions of links
following common practice in [4].
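The evaluation step described above can be sketched as follows, assuming the AUC is computed with the pairwise rank statistic (ties counted as one half) and that a score exactly at the threshold is predicted positive. The function names, scores, and labels are hypothetical.

```python
def predict_sign(score, threshold=0.0):
    """Predict -1 if the score is below the threshold and +1 otherwise."""
    return 1 if score >= threshold else -1


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and
    negative test links; ties contribute 1/2."""
    pos = [s for s, y in zip(scores, labels) if y > 0]
    neg = [s for s, y in zip(scores, labels) if y < 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative link")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical relevance scores for five test links and their true signs.
scores = [0.9, 0.4, -0.2, 0.1, -0.7]
labels = [1, 1, 1, -1, -1]

print([predict_sign(s) for s in scores])
print(auc(scores, labels))
```

The pairwise form is quadratic in the number of test links, which is fine for a sketch; a rank-based computation would be preferable at the scale of the Slashdot and Epinions datasets.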

Table 3.3: Performance comparison (AUC) of link prediction under the directed setting.

Metrics        Bitcoin-Alpha   Bitcoin-OTC   Slashdot   Epinions
UASCOS++-R     0.588           0.630         0.524      0.516
UASCOS++-I     0.562           0.639         0.493      0.519
SASCOS++       0.644           0.705         0.580      0.578
URWR-R         0.606           0.644         0.541      0.565
URWR-I         0.556           0.590         0.563      0.500
SRWR           0.791           0.809         0.627      0.687

Sign Prediction Performance: The sign prediction comparison results are shown in Table 3.2
and Table 3.3 for undirected and directed settings, respectively. We note that signed specific
relevance measurements perform much better than those that (1) remove negative links and (2)
ignore signs. These results suggest the importance of negative links in building node relevance
measurements for signed networks. Meanwhile, global signed measurements consistently obtain
better sign prediction performance than local signed measurements. We note that global methods
consider long cycles, while local methods only consider triads. This observation is consistent with
that in [61]: long cycles contain rich information that helps predict the signs of links. Under
the directed setting, SASCOS++ also outperforms the ASCOS++ variants that (1) remove negative
links and (2) ignore signs, while the signed RWR (i.e., SRWR) obtains the best performance.

3.1.3.2 Tie Strength Prediction

The relevance score for signed networks not only can indicate the signs of links but also can indicate
the connection strength. Therefore, another possible application of relevance measurements
is tie strength prediction, which aims to assign a weight to a link to indicate the connection
strength [62, 63, 64]. In other words, the input of a tie strength prediction algorithm is an
unweighted (or binary) network and the output is a weighted network.

We have only used the two Bitcoin datasets (Bitcoin-Alpha and Bitcoin-OTC) for this task as
they are the only two of the four datasets that have a ground truth strength associated with each
edge in the network. Note that we have normalized the two datasets to have their strength in the
range [-1,1] to ensure easy mappings from our presented node relevance measurements to the tie
strengths associated with the edges of these datasets.

We provide the entire binary network as input and then attempt to predict the tie strength
associated with each edge of the network. Note that we directly use the relevance scores of signed
specific measurements as the predicted tie strength. For “X-R” and “X-I”, we use a strategy
similar to that of sign prediction: we search for an optimal threshold on the
training data to map the relevance scores to [-1,1]. We use root-mean-square error
(RMSE) as the metric to evaluate the performance of tie strength prediction.
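A minimal sketch of this evaluation, assuming raw edge ratings on a symmetric scale that is divided by its maximum magnitude to land in [-1, 1]. The ratings, the predicted scores, and the helper names are hypothetical; the random baseline draws uniformly from [-1, 1] as described for the experiments.

```python
import math
import random


def normalize(ratings, max_abs):
    """Scale raw ratings from [-max_abs, max_abs] to [-1, 1]."""
    return [r / max_abs for r in ratings]


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true tie strengths."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical raw ratings on a [-10, 10] scale and hypothetical relevance scores.
raw_ratings = [1, 2, 1, -10, 3, 1, 10, -1]
truth = normalize(raw_ratings, max_abs=10)
scores = [0.2, 0.3, 0.1, -0.8, 0.4, 0.0, 0.6, 0.1]

# Random baseline: uniform values in [-1, 1], seeded for repeatability.
rng = random.Random(0)
random_guess = [rng.uniform(-1.0, 1.0) for _ in truth]

print(rmse(scores, truth), rmse(random_guess, truth))
```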

Tie Strength Prediction Performance: The tie strength prediction performance is shown
in Table 3.4 and Table 3.5 for undirected and directed settings, respectively.

The first observation from Table 3.4 for the undirected setting is that the random tie strength
prediction of picking values uniformly in the range [-1,1] results in the worst performance. Now, given
the context of the random baseline performance, we further discuss the results of the relevance
measurements. We note that most of the time, signed specific measurements outperform those that
(1) remove negative links or (2) ignore signs for tie strength prediction. The overall best
measurement in each dataset was a signed specific measurement. This further supports the importance of
negative links in signed relevance measurements. Meanwhile, local signed measurements obtain
comparable or even better performance than global signed measurements in tie strength
prediction. This observation is different from that of sign prediction. To achieve better sign prediction
performance, we only need to predict the sign accurately. However, for tie strength prediction, in
addition to signs of links, we also need to predict the strength of the relevance correctly. Thus,
local information could be good at predicting relevance strength. In fact, most existing tie strength
prediction algorithms for unsigned networks only use local information [63, 62]. For the directed
setting, we can see that again SRWR is the best performing measurement.


Table 3.4: Performance comparison (RMSE) of tie strength prediction under the undirected setting.

Metrics        Bitcoin-Alpha   Bitcoin-OTC
UCN-R          0.286           0.324
UCN-I          0.291           0.332
SCN            0.277           0.308
UJI-R          0.286           0.324
UJI-I          0.291           0.332
SJI            0.277           0.308
UPA-R          0.298           0.333
UPA-I          0.298           0.333
SPA            0.302           0.335
UK-R           0.290           0.326
UK-I           0.295           0.333
SK             0.284           0.320
URWR-R         0.294           0.329
URWR-I         0.296           0.331
SRWR           0.291           0.328
UASCOS++-R     0.292           0.328
UASCOS++-I     0.302           0.345
SASCOS++       0.299           0.334
Random         0.648           0.664

Table 3.5: Performance comparison (RMSE) of tie strength prediction under the directed setting.

Metrics        Bitcoin-Alpha   Bitcoin-OTC
UASCOS++-R     0.321           0.362
UASCOS++-I     0.319           0.364
SASCOS++       0.320           0.364
URWR-R         0.318           0.361
URWR-I         0.319           0.363
SRWR           0.301           0.338

3.2 Node Centrality Measurement in Signed Networks

Over the years, a large volume of research has focused on developing centrality measures for
networks. These measures define a real-valued function on the nodes of a network whose values
rank the nodes by how central (i.e., important) they are to the network. Social network analysis
has primarily driven these efforts, seeking to answer the question: “Who are the most important or
central users in a network?” However, most of the literature has focused only on unsigned networks,
whereas many of today's networks, especially in online social media, have both positive and
negative links (i.e., they are signed networks).

There have been recent attempts to define centrality in the presence of negative links, and they
can be roughly grouped into two categories: 1) separating the positive and negative links into two
independent networks, applying existing unsigned centrality measures to each network, and finally
combining the isolated results [65]; and 2) handling positive and negative links simultaneously by
treating negative links as either weak positive links or the negation of positive links [66].
Methods in the first group inherently lose vital information, since they fail to capture the
interactions between positive and negative links. Moreover, negative links have very different
properties from positive links and are not simply their negation [67], so methods in the second
group are also insufficient for signed networks. Hence, new signed centrality measures are still
desired.

Deep learning has proven powerful not only in learning and extracting complex patterns in data
[68, 69, 70, 71] but also in approximating functions [72, 73]. Given these advantages, deep learning
has been used to advance various analytical and mining tasks in complex networks, such as learning
representations of networks [74, 75, 76, 77], generative network modeling [78, 79, 80, 81], and node
classification [82]. In addition, deep learning has shown its utility in a plethora of other
applications [83, 84, 85], and efforts to understand deep models have seen continued improvements
[86]. Furthermore, a deep neural network would allow multiple perspectives to be incorporated when
defining a signed centrality measurement, including multiple social theories and higher-order
structural information, while also having the inductive property that centrality can be calculated
across networks (i.e., the deep model is trained on one signed network and then used to calculate
the centralities of the nodes in another network). Therefore, given the complexities already
inherent in unsigned networks, the additional complexities introduced by negative links, and the
benefits stated above, deep models have the potential to capture what is needed for an advanced
node centrality in signed networks.

We therefore investigate the problem of developing a centrality measurement dedicated to signed
networks. We propose a deep framework for learning a signed centrality score for each user, guided
by status and balance theories.

3.2.1 An Overview of Deep Signed Centrality (DeSCent) Measurement

Node centrality in unsigned networks measures the status of users in a network such that a more
“central” user has a higher value, while typical “normal” users have lower values. These measures
are based on the network structure and typically have intuitive physical interpretations of which
users are targeted to have a higher centrality under their respective definitions. In signed
networks, the introduction of negative links means that, in addition to differentiating between
“normal” and “important” users, we must recognize that a user can be important in either a positive
or a negative way (e.g., famous or infamous users).

Hence, we need a dedicated way to realize the signed centrality value for each user. However,
signed networks are inherently very complex because users can form relations to other users with
both positive and negative links. We therefore propose to use a deep learning framework to learn a
signed centrality mapping $f : u_i \to c_i$ that projects a user $u_i$ to their learned signed
centrality value $c_i$. The benefits of the deep neural network for signed centrality are
threefold: 1) the deep network parameters learned on one signed network can be utilized for signed
centrality calculation in other (perhaps much larger) networks without training a new model for the
other network; 2) the deep network can better capture the complex patterns in the signed network
found in the feature input, which come from the interactions of both positive and negative links;
and 3) it allows our definition of signed centrality to easily include multiple perspectives,
including higher-order structures and multiple social theories.

Because a user's status is related to their connections in the network, we choose to represent each
user $u_i$ with a feature vector $\mathbf{x}_i$ extracted from their connections. More specifically,
we extract a set of node and local neighborhood features. Other approaches could be taken to
construct $\mathbf{x}_i$, such as signed network embedding techniques [87, 88, 89], and we leave
this as future work. Thus, we can redefine the mapping we wish to discover as
$f : \mathbf{x}_i \to c_i$.

Figure 3.2: An illustration of our deep neural network for learning signed centrality scores.

Figure 3.2 illustrates the deep learning framework for learning the signed centrality score for
each user. We let our deep model, parameterized by $\theta$, be represented by $f_\theta$, such that
it defines the mapping $f_\theta(\mathbf{x}_i) \to c_i$. Note that this is not a supervised task,
since we do not have “ground truth” signed centrality scores, which makes learning the mapping
function challenging. We therefore seek the mapping function that optimizes an objective related to
centrality.

In the remainder of this section, we develop our Deep Signed Centrality (DeSCent) measurement
using the two social theories on signed networks, namely status [2] and balance [30, 29] theories.
We ﬁrst develop our objective function for DeSCent, then discuss some details of the deep neural
network and the optimization procedure used to train it.


3.2.2 Signed Centrality Measurement Objective Function

This subsection is organized as follows. First, we introduce the basic objective for signed
centrality based on status theory and eigenvector centrality to capture local information.
Thereafter, we propose to harness balance theory in our signed centrality to include global
information. Finally, two additional constraints are added to our measurement's objective. After
introducing the idea behind each component of DeSCent, we formalize them into an objective function
that our deep neural network can optimize to learn the signed centrality score for each user.

3.2.2.1 Signed Centrality Based on Status Theory

To define a user's signed centrality, we first discuss the usage of status theory [2]. The theory
states that a positive (or negative) link from $u_j$ to $u_i$ implies $u_j$'s opinion that $u_i$ is
of a higher (or lower) status (i.e., rank) than user $u_j$. Therefore, if we want to utilize the
collective opinions of other users in the network to describe the ranking of $u_i$, we can derive
the following:

\[
c_i = |I_i^{+}| - |I_i^{-}| = \sum_{j} \mathbf{A}^{+}_{ji} - \sum_{k} \mathbf{A}^{-}_{ki} \tag{3.18}
\]

where $I_i^{+}$ (or $I_i^{-}$) denotes the set of users with a positive (or negative) link to $u_i$.

Note that we do not use the ratio of positive and negative links since, in our setting, centrality
scores can be both positive and negative. However, Eq. (3.18) does not take into account the actual
status of the users giving their links to $u_i$; it simply counts the number of links of each type
that $u_i$ receives. Thus, based on the ideas of eigenvector centrality [90], we can modify
Eq. (3.18) to be the following:

\[
c_i = \sum_{u_j \in I_i^{+}} c_j - \sum_{u_k \in I_i^{-}} c_k
    = \sum_{j} c_j \mathbf{A}^{+}_{ji} - \sum_{k} c_k \mathbf{A}^{-}_{ki} \tag{3.19}
\]

We can see that now, rather than counting the number of positive and negative incoming neighbors,
we utilize the centrality of $u_i$'s neighbors to weight these links and construct a recursive
definition of signed centrality.
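As a rough illustration of Eqs. (3.18) and (3.19), the following Python sketch evaluates both on a toy network. The list-of-lists layout, the convention that `a_pos[j][i] = 1` encodes a positive link from user j to user i, and the function names are assumptions made for this example only.

```python
def status_counts(a_pos, a_neg):
    """Eq. (3.18): positive in-degree minus negative in-degree of each user."""
    n = len(a_pos)
    return [sum(a_pos[j][i] - a_neg[j][i] for j in range(n)) for i in range(n)]


def recursive_step(c, a_pos, a_neg):
    """One evaluation of the right-hand side of Eq. (3.19) for the scores c."""
    n = len(c)
    return [
        sum(c[j] * (a_pos[j][i] - a_neg[j][i]) for j in range(n))
        for i in range(n)
    ]


# Toy network (assumed): user 0 -> user 2 (+), user 1 -> user 2 (+), user 0 -> user 1 (-).
a_pos = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
a_neg = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]

c0 = status_counts(a_pos, a_neg)       # link counting only
c1 = recursive_step(c0, a_pos, a_neg)  # links weighted by the givers' scores
```

In this toy case the user without incoming links starts at zero, so the links that user gives carry no weight in the recursive step.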


3.2.2.2 Harnessing Balance Theory and Higher-order Structures

Until now our formulation only includes information from single directed signed links. However, it
has been shown that not all links are the same (i.e., of equal strength) in social networks and in fact
there is a spectrum of strength implicitly associated with every connection. In unsigned networks,
one heavily studied heuristic to determine the strength of links (both from a social theoretical
standpoint [91] and empirically [62]) is the use of local triangles. Similarly in signed networks we
use local triangles with structural balance theory [30, 29] to further diﬀerentiate the types of signed
triangles.

Balance theory tells us that balanced triangles are more likely to form in social networks as
compared to unbalanced ones. A balanced (or unbalanced) triangle is deﬁned as having an even
(or odd) number of negative links. The theory implies that any such unbalanced triangles are
unstable (due to higher frustration in those social triads) and this is the reason they are less likely
to exist. We can utilize these diﬀerences and parameterize both types of triangles such that we
can diﬀerentiate between those adhering to the social theory (i.e., balanced) and those that do not
(i.e., unbalanced) while adding this local clustering (i.e., higher-order) information to our signed
centrality measurement.

We therefore propose to utilize not only the relationships between users at the single link level
(as shown so far in Eq. (3.19)), but also triangles, in an attempt to more accurately define a
node's signed centrality. One intuition is that we are more likely to trust the opinion $u_j$ gives
to $u_i$ if they have a “stronger” connection (i.e., have more common neighbors). Note that whether
the link from $u_j$ to $u_i$ is positive or negative, we assume that when these two users are
involved in more triangles, they have a better sense for judging the status of one another, and
therefore their opinion (i.e., directed signed link) should have a higher weight. This provides a
principled way of estimating signed edge strength and can be further understood through the
following example: if a user $u_i$ is positively connected to three users $u_j$, $u_k$, and $u_l$,
but shares a balanced triad with $u_j$, an unbalanced triad with $u_k$, and no triangles with
$u_l$, we might want to infer that the strengths of the links given to $u_i$ by these three
neighbors are not equal. More specifically, we want to parameterize the triangles such that $u_i$
receives their ranking based more on $u_j$ and $u_k$ (though not necessarily with the same
importance for these two) than on $u_l$. Although this example is given in the context of $u_i$
having three positively linked neighbors, a similar logic applies for negatively linked neighbors.

Figure 3.3: An illustration of how we calculate the matrices $\mathbf{T}^{+}_{B}$,
$\mathbf{T}^{+}_{U}$, $\mathbf{T}^{-}_{B}$, and $\mathbf{T}^{-}_{U}$.

Here we define the matrix $\mathbf{T}^{+}$, which represents the relations between positively
linked pairs of users based on the number of triangles they have in common, as follows:

\[
\mathbf{T}^{+}_{ij} = |\{(u_i, u_j, u_k) \mid (u_i, u_j) \in \mathcal{E}^{+} \text{ and } u_i, u_j, \text{ and } u_k \text{ together form a triangle}\}|
\]

Similarly, we can define a matrix $\mathbf{T}^{-}$ for negatively linked pairs as follows:

\[
\mathbf{T}^{-}_{ij} = |\{(u_i, u_j, u_k) \mid (u_i, u_j) \in \mathcal{E}^{-} \text{ and } u_i, u_j, \text{ and } u_k \text{ together form a triangle}\}|
\]

Next, we further separate the triangle relation matrices based on whether the triangles are
balanced (i.e., having an even number of negative links) or unbalanced (i.e., having an odd number
of negative links). More specifically, we separate $\mathbf{T}^{+}$ into $\mathbf{T}^{+}_{B}$ and
$\mathbf{T}^{+}_{U}$, for the balanced and unbalanced triangles, respectively, and similarly
separate $\mathbf{T}^{-}$ into $\mathbf{T}^{-}_{B}$ and $\mathbf{T}^{-}_{U}$. Figure 3.3 shows how
we define and calculate the matrices $\mathbf{T}^{+}_{B}$, $\mathbf{T}^{+}_{U}$,
$\mathbf{T}^{-}_{B}$, and $\mathbf{T}^{-}_{U}$, where blue double (or red single) lines represent
the positive (or negative) links. We extend Eq. (3.19) to obtain the formulation below:

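\[
c_i = \sum_{j} \frac{c_j}{p_j}\left(\mathbf{A}^{+}_{ji} + \alpha^{+}\mathbf{T}^{+}_{B,ji} + \beta^{+}\mathbf{T}^{+}_{U,ji}\right)
    - \sum_{k} \frac{c_k}{p_k}\left(\mathbf{A}^{-}_{ki} + \alpha^{-}\mathbf{T}^{-}_{B,ki} + \beta^{-}\mathbf{T}^{-}_{U,ki}\right) \tag{3.20}
\]

where $\alpha^{+}$ and $\beta^{+}$ are now used to control the contribution of shared balanced and
unbalanced triads between $u_j$ and $u_i$, respectively, and similarly for $\alpha^{-}$ and
$\beta^{-}$ with $u_k$ and $u_i$. We define $p_j$ for a user $u_j$ as:

\[
p_j = \sum_{l}\left(\mathbf{A}^{+}_{jl} + \mathbf{A}^{-}_{jl} + \alpha^{+}\mathbf{T}^{+}_{B,jl} + \alpha^{-}\mathbf{T}^{-}_{B,jl} + \beta^{+}\mathbf{T}^{+}_{U,jl} + \beta^{-}\mathbf{T}^{-}_{U,jl}\right)
\]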

Note that the utilization of the normalizing vector p is based on the same idea as the normalization
in PageRank [92]. More speciﬁcally, this is done to prevent a user of very high absolute centrality
from distributing their inﬂuence too much to other users and also allows the measurement to be
less susceptible to malicious users attempting to boost (or shrink) rankings.
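The following Python sketch shows one way the triangle-weighted, normalized update of Eq. (3.20) could be evaluated on a toy network. It rests on several assumptions that are ours rather than the text's: triangles are counted with link direction ignored, a triangle is balanced when the product of its three signs is positive, the weights are merged so that `alpha` stands for both balanced weights and `beta` for both unbalanced weights, and all function names are hypothetical.

```python
def undirected_signs(a_pos, a_neg):
    """Sign of the link between each pair with direction ignored (0 if none).

    Assumption: if both signs occur between a pair, the positive one is used.
    """
    n = len(a_pos)
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if a_pos[i][j] or a_pos[j][i]:
                s[i][j] = 1
            elif a_neg[i][j] or a_neg[j][i]:
                s[i][j] = -1
    return s


def weighted_links(a_pos, a_neg, alpha, beta):
    """Return (w_pos, w_neg), each link weighted as A + alpha*T_B + beta*T_U."""
    n = len(a_pos)
    s = undirected_signs(a_pos, a_neg)
    w_pos = [[0.0] * n for _ in range(n)]
    w_neg = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if a_pos[i][j]:
                sign, w = 1, w_pos
            elif a_neg[i][j]:
                sign, w = -1, w_neg
            else:
                continue
            w[i][j] = 1.0  # the single-link term from A
            for k in range(n):
                if k == i or k == j or s[i][k] == 0 or s[j][k] == 0:
                    continue
                # Balanced triangle: product of the three signs is positive.
                balanced = sign * s[i][k] * s[j][k] > 0
                w[i][j] += alpha if balanced else beta
    return w_pos, w_neg


def propagate(c, w_pos, w_neg):
    """One evaluation of the right-hand side of Eq. (3.20)."""
    n = len(c)
    p = [sum(w_pos[j]) + sum(w_neg[j]) for j in range(n)]  # normalizer p_j
    out = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if p[j] > 0:  # users without outgoing links contribute nothing
                total += c[j] / p[j] * (w_pos[j][i] - w_neg[j][i])
        out.append(total)
    return out


# Toy network (assumed): 0 -> 2 (+), 1 -> 2 (+), 0 -> 1 (-); a_pos[i][j] means i -> j.
a_pos = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
a_neg = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
w_pos, w_neg = weighted_links(a_pos, a_neg, alpha=1.0, beta=0.5)
scores = propagate([1.0, 1.0, 1.0], w_pos, w_neg)
```

The three users form a single unbalanced triangle, so every link is strengthened by `beta`, and the normalizer splits the influence of user 0 evenly across its two outgoing links.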

We point out that we continue to incorporate the single directed edge information from
$\mathbf{A}^{+}$ and $\mathbf{A}^{-}$. This is because if no triangle involves a pair of nodes that
share an existing edge, then the corresponding values in the four triangle matrices would all be
zero. We seek to include the triangle information to strengthen certain edges in the network, but
not to remove existing “weaker” connections. Although we only focus on triangles to include 2-hop
information, balance theory can be applied to cycles of any length. Thus, our work can be extended
to consider longer cycles to capture more global information, which we leave as future work.

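In terms of our objective for DeSCent, if we let $f_\theta(\mathbf{x}_i)$ replace $c_i$ for a given
user $u_i$, then Eq. (3.20) can be converted into the objective below:

\[
\forall u_i \in \mathcal{U}: \quad f_\theta(\mathbf{x}_i) = \left[
  \sum_{j} \left( \frac{f_\theta(\mathbf{x}_j)}{p_j}\left(\mathbf{A}^{+}_{ji} + \alpha^{+}\mathbf{T}^{+}_{B,ji} + \beta^{+}\mathbf{T}^{+}_{U,ji}\right) \right)
  - \sum_{k} \left( \frac{f_\theta(\mathbf{x}_k)}{p_k}\left(\mathbf{A}^{-}_{ki} + \alpha^{-}\mathbf{T}^{-}_{B,ki} + \beta^{-}\mathbf{T}^{-}_{U,ki}\right) \right)
\right]
\]

The above ensures that our network correctly maps the user feature vectors to centrality scores
that match our recursive definition. This leads to the following minimization problem:

\[
\min_{\theta} \mathcal{L}(\theta) = \sum_{u_i \in \mathcal{U}} \left( f_\theta(\mathbf{x}_i) - \left(
  \sum_{j} \frac{f_\theta(\mathbf{x}_j)}{p_j}\left(\mathbf{A}^{+}_{ji} + \alpha^{+}\mathbf{T}^{+}_{B,ji} + \beta^{+}\mathbf{T}^{+}_{U,ji}\right)
  - \sum_{k} \frac{f_\theta(\mathbf{x}_k)}{p_k}\left(\mathbf{A}^{-}_{ki} + \alpha^{-}\mathbf{T}^{-}_{B,ki} + \beta^{-}\mathbf{T}^{-}_{U,ki}\right)
\right) \right)^{2} + R(\theta) \tag{3.21}
\]

where the last term $R(\theta)$ is a regularization term on the deep neural network parameters.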


3.2.2.3 Additional DeSCent Measurement Constraints

We note that there exists a critical problem in the proposed objective function. If a user $u_i$
has no incoming links, but only outgoing links, then their centrality would be zero (or undefined).
In fact, this problem diffuses throughout the network: for another user $u_j$, even if $u_j$ has
incoming links, if those incoming neighbors are all similar to $u_i$ (in that they themselves have
no incoming links), then $u_j$ will also have a centrality of zero (or undefined) [17]. One solution
to this issue (and the one we have included in DeSCent) is to define users having no incoming links
to have some small constant signed centrality value $\epsilon$. This is similar to how PageRank [92]
assigns the zapping probability for all nodes. Furthermore, we also want to prevent our network
from optimizing toward a trivial solution where the signed centrality value is zero for all users.
Therefore, we enforce a lower bound of $\epsilon|\mathcal{U}|$ on the sum of the absolute signed
centralities of all users, i.e., the L1-norm of $\mathbf{c}$ should be at least $\epsilon$ times the
number of users in the network. In the rest of this section, we refer to this constraint as the sum
constraint. We therefore also seek to minimize $\mathcal{L}(\theta)$ with respect to $\theta$
subject to the constraint below:

\[
\|\mathbf{c}\|_{1} = \sum_{u_i \in \mathcal{U}} |c_i| \geq \epsilon|\mathcal{U}|
\]

which can be represented as minimizing the following after substituting the network output
$f_\theta(\mathbf{x}_i)$ for $c_i$:

\[
\max\left(0, \; \epsilon|\mathcal{U}| - \sum_{u_i \in \mathcal{U}} |f_\theta(\mathbf{x}_i)|\right) \tag{3.22}
\]

The second additional constraint we place on our signed centrality measurement is based on the
sign of the centrality values. Although we want to utilize our more comprehensive formulation for
the signed centrality, we still wish the signed centrality scores to maintain the sign suggested by
status theory. This can be achieved by requiring the centrality of $u_i$ to have the same sign as
suggested by status theory, which can be expressed formally as
$\mathrm{sign}(c_i) = \mathrm{sign}(|I_i^{+}| - |I_i^{-}|)$. In terms of minimization (and similarly
substituting $f_\theta(\mathbf{x}_i)$ for $c_i$), this can be rewritten as the following:

\[
\sum_{u_i \in \mathcal{U}} \mathbb{I}[u_i]\left(|f_\theta(\mathbf{x}_i)| + \delta\right) \tag{3.23}
\]

where we add the margin $\delta = 1$ to force $f_\theta(\mathbf{x}_i)$ to take the correct sign by
pushing it through zero, and $\mathbb{I}[u_i]$ is an indicator function. The indicator function
equals 1 whenever the sign of the centrality does not match the sign suggested by status theory and
0 otherwise. Formally, we define $\mathbb{I}[u_i]$ as:

\[
\mathbb{I}[u_i] =
\begin{cases}
1 & \text{if } \left(f_\theta(\mathbf{x}_i) \times (|I_i^{+}| - |I_i^{-}|)\right) < 0 \\
0 & \text{otherwise}
\end{cases}
\]

where we can see that if $f_\theta(\mathbf{x}_i)$ and $(|I_i^{+}| - |I_i^{-}|)$ have differing
signs, then $\left(f_\theta(\mathbf{x}_i) \times (|I_i^{+}| - |I_i^{-}|)\right) < 0$ and therefore
$\mathbb{I}[u_i] = 1$ (and $\mathbb{I}[u_i] = 0$ otherwise). We refer to this term as the status
constraint.

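We can now construct DeSCent's full objective function using Eqs. (3.21), (3.22), and (3.23) as the
following:

\[
\begin{aligned}
\min_{\theta} \mathcal{L}(\theta) = {} & \sum_{u_i \in \mathcal{U}} \left( f_\theta(\mathbf{x}_i) - \left(
  \sum_{j} \frac{f_\theta(\mathbf{x}_j)}{p_j}\left(\mathbf{A}^{+}_{ji} + \alpha^{+}\mathbf{T}^{+}_{B,ji} + \beta^{+}\mathbf{T}^{+}_{U,ji}\right)
  - \sum_{k} \frac{f_\theta(\mathbf{x}_k)}{p_k}\left(\mathbf{A}^{-}_{ki} + \alpha^{-}\mathbf{T}^{-}_{B,ki} + \beta^{-}\mathbf{T}^{-}_{U,ki}\right)
\right) \right)^{2} \\
& + \lambda_1 \left( \max\left(0, \; \epsilon|\mathcal{U}| - \sum_{u_i \in \mathcal{U}} |f_\theta(\mathbf{x}_i)|\right) \right)
  + \lambda_2 \left( \sum_{u_i \in \mathcal{U}} \mathbb{I}[u_i]\left(|f_\theta(\mathbf{x}_i)| + \delta\right) \right)
  + R(\theta)
\end{aligned}
\tag{3.24}
\]

where $\lambda_1$ and $\lambda_2$ are introduced to control the contribution of the two terms from
our additional constraints.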
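To make the structure of Eq. (3.24) concrete, the sketch below evaluates its three data-dependent terms for given network outputs. It is a simplified illustration under our own assumptions: the propagated term inside the squared error is passed in precomputed, the regularization term on the network parameters is omitted, and the function and argument names are hypothetical.

```python
def descent_loss(f, propagated, status, lam1, lam2, eps=0.1, delta=1.0):
    """Fit term + lam1 * sum constraint + lam2 * status constraint.

    f          -- network outputs, one per user
    propagated -- precomputed value of the bracketed neighbor term per user
    status     -- positive in-degree minus negative in-degree per user
    """
    # Squared error between each output and its recursive definition.
    fit = sum((fi - ti) ** 2 for fi, ti in zip(f, propagated))
    # Sum constraint, Eq. (3.22): penalize an L1-norm below eps * |U|.
    sum_penalty = max(0.0, eps * len(f) - sum(abs(fi) for fi in f))
    # Status constraint, Eq. (3.23): active only where the signs disagree.
    status_penalty = sum(
        abs(fi) + delta for fi, si in zip(f, status) if fi * si < 0
    )
    return fit + lam1 * sum_penalty + lam2 * status_penalty


# Hypothetical values for three users.
loss_mixed = descent_loss(
    [0.5, -0.2, 0.0], [0.4, 0.1, 0.0], [2, 1, 0], lam1=1.0, lam2=10.0
)
loss_zero = descent_loss(
    [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [2, 1, 0], lam1=1.0, lam2=10.0
)
```

In the second call the all-zero output fits the recursive definition perfectly, and only the sum constraint keeps the loss above zero.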

3.2.3 Overall DeSCent Deep Network Framework

Now that the objective for our deep signed centrality measurement is defined, we next describe the
deep network structure and the optimization procedure used.

To optimize the objective given in Eq. (3.24), we use a 3-layer fully connected network consisting
of 500 hidden neurons per layer, with LeakyReLU [93] activation functions (negative slope 0.2) on
the hidden layers. We perform batch gradient descent using ADAM [94] with an initial learning rate
of 0.001.


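Algorithm 3.1: Optimization procedure for DeSCent.
Input: $\mathcal{G} = (\mathcal{U}, \mathcal{E}^{+}, \mathcal{E}^{-})$
Output: $\mathbf{c}$
1 Respectively create $\mathbf{A}^{+}$ and $\mathbf{A}^{-}$ from $\mathcal{E}^{+}$ and $\mathcal{E}^{-}$
2 Construct $\mathbf{T}^{+}_{B}$, $\mathbf{T}^{-}_{B}$, $\mathbf{T}^{+}_{U}$, $\mathbf{T}^{-}_{U}$ from $\mathbf{A}^{+}$ and $\mathbf{A}^{-}$
3 Use $\mathbf{A}^{+}$ and $\mathbf{A}^{-}$ to extract node features $\mathbf{X}$
4 Randomly initialize the neural network parameters $\theta$
5 while not convergent do
6     Create constant vector $\mathbf{k}$ where $k_i \leftarrow f_\theta(\mathbf{x}_i)$ for user $u_i$
7     Calculate gradient of $\mathcal{L}(\theta)$ using $f_\theta(\mathbf{x}_i)$ after replacing $f_\theta(\mathbf{x}_j)$ and $f_\theta(\mathbf{x}_k)$ with constants $k_j$ and $k_k$, respectively
8     Update parameters $\theta$ using batch gradient descent
9 Construct signed centrality vector $\mathbf{c}$ where $c_i \leftarrow f_\theta(\mathbf{x}_i)$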

Algorithm 3.1 details the steps for optimizing our model. On line 1 we construct the two adjacency
matrices $\mathbf{A}^{+}$ and $\mathbf{A}^{-}$, and thereafter on line 2 create the four signed
triangle motif matrices (separated based on whether the triangles are balanced or unbalanced).
Line 3 extracts node features based on the network structure. Then, on line 4, the parameters
$\theta$ of the deep neural network are randomly initialized. Lines 5 to 8 loop until convergence
and describe how to calculate the gradient of our objective. We construct a constant vector
$\mathbf{k}$ on line 6, which contains DeSCent's current proposed signed centrality values. Next, on
line 7, to perform the update to $\theta$ we replace $f_\theta(\mathbf{x}_j)$ and
$f_\theta(\mathbf{x}_k)$ with $k_j$ and $k_k$, respectively. By holding their values fixed, we are
effectively treating

\[
\left(
  \sum_{j} \frac{f_\theta(\mathbf{x}_j)}{p_j}\left(\mathbf{A}^{+}_{ji} + \alpha^{+}\mathbf{T}^{+}_{B,ji} + \beta^{+}\mathbf{T}^{+}_{U,ji}\right)
  - \sum_{k} \frac{f_\theta(\mathbf{x}_k)}{p_k}\left(\mathbf{A}^{-}_{ki} + \alpha^{-}\mathbf{T}^{-}_{B,ki} + \beta^{-}\mathbf{T}^{-}_{U,ki}\right)
\right)
\]

in our objective (i.e., Eq. (3.24)) as a constant when calculating the error, and only using the
derivative with respect to $f_\theta(\mathbf{x}_i)$. This update procedure is repeated until
convergence using batch gradient descent.
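The effect of holding the neighbor term constant (lines 6 to 8 of Algorithm 3.1) can be sketched in a few lines of Python. This is deliberately not the setup described above: a linear model stands in for the 3-layer network, plain gradient descent stands in for ADAM, and only the squared-error term of the objective is used. All names and the toy inputs are assumptions for illustration.

```python
def predict(w, x_rows):
    """Linear stand-in for the network: f(x) = w . x for each feature row."""
    return [sum(wd * xd for wd, xd in zip(w, row)) for row in x_rows]


def propagate(k, w_pos, w_neg, p):
    """Neighbor term of the objective, computed from the frozen scores k."""
    n = len(k)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if p[j] > 0:
                total += k[j] / p[j] * (w_pos[j][i] - w_neg[j][i])
        out.append(total)
    return out


def train_step(w, x_rows, w_pos, w_neg, p, lr):
    """One update in which the targets are constants (no gradient through them)."""
    k = predict(w, x_rows)                   # line 6: freeze the current scores
    target = propagate(k, w_pos, w_neg, p)   # treated as constants on line 7
    grad = [0.0] * len(w)
    for row, ki, ti in zip(x_rows, k, target):
        for d, xd in enumerate(row):
            # Derivative of (f(x_i) - t_i)^2 with respect to w_d, with t_i fixed.
            grad[d] += 2.0 * (ki - ti) * xd
    new_w = [wd - lr * gd for wd, gd in zip(w, grad)]  # line 8: parameter update
    return new_w, target


# Hypothetical inputs: two features per user and precomputed weighted links.
x_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_pos = [[0.0, 0.0, 1.5], [0.0, 0.0, 1.5], [0.0, 0.0, 0.0]]
w_neg = [[0.0, 1.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
p = [3.0, 1.5, 0.0]

w0 = [1.0, -1.0]
w1, target = train_step(w0, x_rows, w_pos, w_neg, p, lr=0.1)
```

Because the targets are frozen within a step, each update is an ordinary least-squares gradient step; the targets are then recomputed from the new scores at the start of the next iteration.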

3.2.4 Experiments

In this section, we conduct experiments to evaluate the effectiveness of the proposed deep signed
centrality measurement (DeSCent). We seek to answer the following three questions: (1) Can
DeSCent learn better signed centrality scores than other existing signed centrality measurements?
(2) Can the use of deep learning enable centrality calculation across networks? and (3) How do the
parameters of DeSCent affect its performance?


Addressing the first two questions is not straightforward because we do not have “ground-truth”
signed centrality values (i.e., a signed centrality ranking of the users of a signed network). It
has been observed that both positive and negative links follow power-law distributions [67]; thus
link formation in signed networks is related to node status [95]. Therefore, following the common
centrality evaluation in the literature [65], we take an indirect approach and evaluate the quality
of the signed centrality values by utilizing them for the signed link prediction problem. We
further discuss the signed link prediction problem and the results of these experiments later in
this section. Then, to answer the second question, we perform further experiments to evaluate how
general the learned deep networks are at mapping node features to their corresponding centrality
scores; this is evaluated by training on a single dataset and then utilizing the learned network to
calculate the centrality of nodes in the other datasets. To address the third question, we perform
a parameter analysis on DeSCent to observe the contribution of balance theory ($\alpha^{+}$,
$\alpha^{-}$, $\beta^{+}$, and $\beta^{-}$) and of the sum and status constraints (controlled by
$\lambda_1$ and $\lambda_2$, respectively).

Extracting Node Features: As noted before, the node features can be extracted manually or learned
automatically via embedding from the network structure. Here we use manual extraction and leave
automatic embedding as future work. We propose to extract three groups of features for each node:
the given node's signed degree distribution, their neighbors' signed degree distribution, and the
number of balanced/unbalanced triangles they are involved in. Below we define and discuss each
feature extracted.

First we discuss how to extract the node signed degree distribution features for a user $u_i$.
These 4 features are the in/out positive/negative degrees for the given user $u_i$ (i.e.,
$|I_i^{+}|$, $|I_i^{-}|$, $|O_i^{+}|$, and $|O_i^{-}|$).

For the group of features describing the signed degree distribution of $u_i$'s neighbors, we
extract the average in/out positive/negative degrees. However, we obtain four different sets of
these averages by averaging over the neighbors linked with $u_i$ through each of the four possible
directed signed link types. This provides an additional 16 features. For example, one of these
features is the average incoming negative degree over the set of neighbors to which $u_i$ has given
a positive link. If we denote this example feature as $x_i^{*}$, it can be defined more formally as
follows:

\[
x_i^{*} = \frac{1}{|O_i^{+}|} \sum_{u_j \in O_i^{+}} |I_j^{-}| \tag{3.25}
\]

Note that the other 15 neighbor-based signed degree distribution features for user $u_i$ can be
defined similarly to $x_i^{*}$.
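A short Python sketch of the degree features and of the example neighbor feature in Eq. (3.25) follows. The adjacency layout (`a_pos[i][j] = 1` for a positive link from user i to user j), the function names, and the choice to return 0.0 for an empty neighbor set are assumptions of this example; the text does not specify how empty sets are handled.

```python
def degree_features(i, a_pos, a_neg):
    """Return (in+, in-, out+, out-) degrees of user i."""
    n = len(a_pos)
    return (
        sum(a_pos[j][i] for j in range(n)),  # positive in-degree
        sum(a_neg[j][i] for j in range(n)),  # negative in-degree
        sum(a_pos[i]),                       # positive out-degree
        sum(a_neg[i]),                       # negative out-degree
    )


def avg_in_neg_of_out_pos(i, a_pos, a_neg):
    """Eq. (3.25): mean negative in-degree of users that i links to positively."""
    n = len(a_pos)
    out_pos = [j for j in range(n) if a_pos[i][j]]
    if not out_pos:
        return 0.0  # assumed convention for an empty neighbor set
    return sum(degree_features(j, a_pos, a_neg)[1] for j in out_pos) / len(out_pos)


# Toy network (assumed): 0 -> 1 (+), 0 -> 2 (+), 3 -> 1 (-), 3 -> 2 (-), 1 -> 2 (-).
a_pos = [[0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
a_neg = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 1, 1, 0]]

deg_user2 = degree_features(2, a_pos, a_neg)
x_star_user0 = avg_in_neg_of_out_pos(0, a_pos, a_neg)
```

The remaining 15 neighbor features would follow the same pattern, swapping which neighbor set is averaged over and which degree is averaged.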

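Finally, the last two features are the number of undirected balanced and unbalanced triangles
$u_i$ is involved in. These two features can be easily calculated with vector and matrix operations
on $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$, which are the undirected symmetric versions of
$\mathbf{A}^{+}$ and $\mathbf{A}^{-}$. Here we show how to calculate the balanced (i.e.,
$\Delta_i^{B}$) and unbalanced (i.e., $\Delta_i^{U}$) triangles for user $u_i$, where
$\mathbf{S}^{+}_{i}$ and $\mathbf{S}^{-}_{i}$ denote the $i$-th rows of $\mathbf{S}^{+}$ and
$\mathbf{S}^{-}$:

\[
\Delta_i^{B} = \left(\mathbf{S}^{+}_{i}\mathbf{S}^{-}(\mathbf{S}^{-}_{i})^{\top}\right)
             + \left(\mathbf{S}^{-}_{i}\mathbf{S}^{+}(\mathbf{S}^{-}_{i})^{\top}\right)/2
             + \left(\mathbf{S}^{+}_{i}\mathbf{S}^{+}(\mathbf{S}^{+}_{i})^{\top}\right)/2
\]

\[
\Delta_i^{U} = \left(\mathbf{S}^{-}_{i}\mathbf{S}^{+}(\mathbf{S}^{+}_{i})^{\top}\right)
             + \left(\mathbf{S}^{+}_{i}\mathbf{S}^{-}(\mathbf{S}^{+}_{i})^{\top}\right)/2
             + \left(\mathbf{S}^{-}_{i}\mathbf{S}^{-}(\mathbf{S}^{-}_{i})^{\top}\right)/2
\]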
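The per-user balanced and unbalanced triangle counts can be checked with a small Python sketch that evaluates the row-matrix-row products directly. The symmetric 0/1 list-of-lists inputs and the helper names are assumptions of this example.

```python
def quad(u, m, v):
    """Compute u * M * v^T for a row u, a square matrix M, and a row v."""
    return sum(
        u[a] * m[a][b] * v[b] for a in range(len(u)) for b in range(len(v))
    )


def triangle_counts(i, s_pos, s_neg):
    """Return (balanced, unbalanced) undirected triangle counts for user i."""
    p, q = s_pos[i], s_neg[i]  # i-th rows of S+ and S-
    balanced = (
        quad(p, s_neg, q)        # signs (+, -, -) with one positive link at i
        + quad(q, s_pos, q) / 2  # signs (-, +, -) with both links at i negative
        + quad(p, s_pos, p) / 2  # signs (+, +, +)
    )
    unbalanced = (
        quad(q, s_pos, p)        # signs (-, +, +) with one negative link at i
        + quad(p, s_neg, p) / 2  # signs (+, -, +) with both links at i positive
        + quad(q, s_neg, q) / 2  # signs (-, -, -)
    )
    return balanced, unbalanced


# Toy undirected triangle (assumed): 0-1 positive, 0-2 positive, 1-2 negative.
s_pos = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
s_neg = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]

counts_user0 = triangle_counts(0, s_pos, s_neg)
counts_user1 = triangle_counts(1, s_pos, s_neg)
```

The halved terms correct for double counting when the two links at user i have the same sign, since each such triangle is then visited in both neighbor orders.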

Signed Link Prediction: Since link formation in signed networks is related to node status,
following the tradition [65], we compare the signed centrality measurements by using them to
perform the signed link prediction task. The problem of link prediction in signed networks is to
predict new positive and negative links given an existing signed network [4]. For every user $u_i$
in the signed network, we construct a feature vector $\mathbf{f}_i$ consisting of 5 features based
on the computed signed centrality. Before constructing the feature vectors, we normalize all signed
centrality measurements for a fair comparison across them. The first feature, $\mathbf{f}_{i1}$, is
the centrality value $c_i$ for the user $u_i$ themselves. The other four are the average centrality
scores associated with the neighbor sets of $u_i$ when categorized based on incoming/outgoing
positive/negative connections. The formulations are as follows:

\[
\mathbf{f}_{i2} = \frac{1}{|I_i^{+}|} \sum_{u_j \in I_i^{+}} c_j, \quad
\mathbf{f}_{i3} = \frac{1}{|I_i^{-}|} \sum_{u_k \in I_i^{-}} c_k, \quad
\mathbf{f}_{i4} = \frac{1}{|O_i^{+}|} \sum_{u_j \in O_i^{+}} c_j, \quad
\mathbf{f}_{i5} = \frac{1}{|O_i^{-}|} \sum_{u_k \in O_i^{-}} c_k
\]

We approach the signed link prediction problem as a classification problem, similar to [4]. For
every edge $e_{ij}$, we construct a feature vector that is the concatenation of the feature vectors
$\mathbf{f}_i$ and $\mathbf{f}_j$, and the label is based on whether the edge $e_{ij}$ is positive
or negative. We train a logistic regression model on the training set of edges and then predict the
signs of unseen links (i.e., those in the testing set).
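The construction of the per-user and per-edge feature vectors can be sketched as follows. The adjacency layout (`a_pos[i][j] = 1` for a positive link from user i to user j), the function names, the toy centrality values, and the use of 0.0 when a neighbor set is empty are assumptions of this example; the classifier itself is left out.

```python
def mean_centrality(neighbors, c):
    """Average centrality over a neighbor set (0.0 if the set is empty)."""
    return sum(c[j] for j in neighbors) / len(neighbors) if neighbors else 0.0


def node_features(i, c, a_pos, a_neg):
    """Five features of user i: own score, then in+/in-/out+/out- averages."""
    n = len(c)
    in_pos = [j for j in range(n) if a_pos[j][i]]
    in_neg = [j for j in range(n) if a_neg[j][i]]
    out_pos = [j for j in range(n) if a_pos[i][j]]
    out_neg = [j for j in range(n) if a_neg[i][j]]
    groups = (in_pos, in_neg, out_pos, out_neg)
    return [c[i]] + [mean_centrality(g, c) for g in groups]


def edge_features(i, j, c, a_pos, a_neg):
    """Concatenate the feature vectors of the two endpoints of edge (i, j)."""
    return node_features(i, c, a_pos, a_neg) + node_features(j, c, a_pos, a_neg)


# Toy network (assumed): 0 -> 2 (+), 1 -> 2 (+), 0 -> 1 (-), with made-up scores.
a_pos = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
a_neg = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
c = [0.0, -0.5, 1.5]

features_02 = edge_features(0, 2, c, a_pos, a_neg)
```

Each edge therefore yields a 10-dimensional input, which together with the edge sign as the label forms one training example for the classifier.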



For the evaluation of the signed link prediction binary classiﬁcation problem we use Area Under
the receiver operating characteristics Curve (AUC), since in real-world datasets the positive and
negative links are typically imbalanced (i.e., signiﬁcantly more positive links than negative links).
Note that a higher AUC means a higher probability we rank a randomly selected positive edge higher
than a randomly selected negative one and therefore the higher the better the performance. For
each dataset, we randomly use 90% as training, and the remaining 10% as testing. Hyperparameter
tuning used cross-validation on the training set.
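The probabilistic reading of AUC given above can be computed directly by comparing every positive edge with every negative edge. The sketch below does this in plain Python; the function name, the 1/0 label encoding, and the toy scores are assumptions, ties are counted as one half, and both classes are assumed to be present.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive ranks higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of a positive sign for five test edges.
scores = [0.9, 0.4, 0.7, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
value = auc(scores, labels)
```

The pairwise form costs time proportional to the number of positive-negative pairs, so a rank-based implementation would be preferable on large test sets.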

3.2.4.1 Performance Comparison

Here we present existing signed network centrality measurements against which we study the
effectiveness of our proposed measurement. We have selected baseline methods that were designed for
determining the centrality or importance of nodes in signed networks. For succinctness, we have
selected representative measurements that include recent measurements and those that have been
shown to perform well. Similarly, for the sake of space, we do not include comparisons against
unsigned node ranking measurements; in our experiments, ignoring negative links or treating them as
equivalent to positive links performed significantly worse on the signed link prediction task.
Below we categorize the baselines into two groups: 1) Single Network Baselines; and 2) Separate
Network Baselines.

Single Network Baselines: those that utilize the positive and negative links together while

calculating the signed centrality values.

• Signed Spectral Ranking (SR) [24]: This method computes the dominant left eigenvector of

the signed adjacency matrix A.

• Exponential Ranking (ER) [51]: This method simultaneously uses both positive and negative
links and is based upon PageRank [92]. It utilizes a heuristic approach on an exponential
variation and has a fixed-point solution if its exponential parameter is selected appropriately.


• Signed Random Walk with Restart (sRWR) [96]: This model is state-of-the-art for personalized
ranking in signed networks; it is based on the unsigned random walk with restart method and
incorporates balance theory. We note that this method is not the same as the signed random
walk with restart method we proposed in Section 3.1. The centrality $c_i$ of a user $u_i$ can thus
be taken as the summation of the personalized rankings of all other users $u_j$ towards $u_i$.

Separate Network Baselines: those that split the positive and negative links into two inde-
pendent networks and then at the end combine two separate centrality values that were calculated
independently.

• Modiﬁed PageRank (MPR) [65]: Here PageRank [92] is performed on the positive only and
negative only networks (i.e., A+ and A−, respectively) and then the negative centrality scores
are subtracted from the positive centrality scores.

• Modified HITS (MHITS) [65]: This method recursively calculates the hub and authority scores
separately for the positive only and negative only networks. The final signed centrality is the
authority score on the positive network minus the authority score on the negative network.
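To illustrate the separate-network idea, the sketch below follows the description of MPR given above: PageRank is run independently on the positive-only and negative-only networks and the two score vectors are subtracted. It is not the implementation of [65]; the power-iteration routine, the uniform redistribution of mass from users without outgoing links, the fixed iteration count, and all names are assumptions of this example.

```python
def pagerank(adj, zap=0.15, iters=100):
    """Power-iteration PageRank; adj[j][i] = 1 means a link from j to i."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    r = [1.0 / n] * n
    for _ in range(iters):
        # Mass held by users with no outgoing links is spread uniformly (assumed).
        dangling = sum(r[j] for j in range(n) if out_deg[j] == 0)
        nxt = []
        for i in range(n):
            flow = sum(
                r[j] * adj[j][i] / out_deg[j] for j in range(n) if out_deg[j] > 0
            )
            nxt.append(zap / n + (1.0 - zap) * (flow + dangling / n))
        r = nxt
    return r


def modified_pagerank(a_pos, a_neg, zap=0.15):
    """Positive-network PageRank minus negative-network PageRank."""
    pos = pagerank(a_pos, zap)
    neg = pagerank(a_neg, zap)
    return [p - q for p, q in zip(pos, neg)]


# Toy network (assumed): 0 -> 2 (+), 1 -> 2 (+), 0 -> 1 (-).
a_pos = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
a_neg = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
mpr = modified_pagerank(a_pos, a_neg)
```

Because the two PageRank runs never see each other's links, a positive link and a negative link between the same neighbors cannot interact, which is the limitation of this group of baselines noted earlier.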

The parameter settings for the baselines were as follows: 1) for sRWR we set its two parameters to
0.6 and 0.9 for Slashdot and to 0.5 and 0.9 for Epinions, since these were the values selected by a
grid search in [96], and the two Bitcoin datasets use the same values as Epinions, since their
balance and positive/negative link ratio are more similar to Epinions than to Slashdot; 2) for MPR
and SR we set the zapping probability to 0.15, as commonly used in practice [24, 65]; 3) for ER we
set the exponential parameter to 2, as this value satisfies the convergence requirement discussed
in [51] for signed edge weights of either -1 or 1. For the parameters of our DeSCent measurement,
we performed a grid search over a set of parameter values. More specifically, we varied
$\lambda_1$, $\lambda_2$, $\alpha$, and $\beta$ while fixing $\epsilon = 0.1$ and $\delta = 1$.
Note that in our experiments we fixed $\alpha^{+} = \alpha^{-}$ and $\beta^{+} = \beta^{-}$, and we
use $\alpha$ and $\beta$ to denote these merged parameter values, respectively.


Table 3.6: Signed link prediction results with AUC.

Centrality Measurement   Bitcoin-Alpha   Bitcoin-OTC   Slashdot   Epinions
SR                       0.567           0.565         0.570      0.669
ER                       0.598           0.619         0.666      0.602
sRWR                     0.576           0.570         0.666      0.570
MPR                      0.618           0.692         0.593      0.658
MHITS                    0.602           0.650         0.667      0.680
DeSCent                  0.622           0.702         0.692      0.713

Comparison Results: The results across our four signed networks can be found in Table 3.6.
We first observe that, most of the time, single network baselines obtain worse performance than
separate network baselines. This observation supports that negative links should be considered
neither weak positive links nor the negation of positive links, and that distinguishing positive
and negative links is necessary. We also note that our Deep Signed Centrality (DeSCent) measurement
has the best AUC across all four datasets. Because the AUC metric is more sensitive to incorrectly
handling negative relations, we believe the better performance stems from DeSCent's ability to
utilize a deep neural network (which can effectively extract more complex patterns between the
positive and negative links), along with its use of both status and balance theories and
higher-order relations in our objective function.

3.2.4.2 Generalization Across Datasets

In this subsection we seek to further quantify the advantages of using deep learning in defining
our deep signed centrality measurement. Thus we perform experiments to test how well our DeSCent
deep network framework is able to generalize across signed network datasets. More specifically, we
perform inductive experiments where we train a deep network on a single dataset and then utilize
that learned model to extrapolate the centrality scores in other networks from their respective
node features. This tests whether the deep framework is learning specific properties nested inside
each signed network dataset, or whether it is learning more general patterns that are inherently
found in all signed network datasets.


Figure 3.4: Signed link prediction performance comparison of within versus cross training.
Panels: (a) Bitcoin-Alpha, (b) Bitcoin-OTC, (c) Slashdot, (d) Epinions. Each panel plots AUC
(from 0 to 1) on the other three datasets for the cross and within settings.

Here we present the AUC for signed link prediction, comparing the performance when the centralities
were calculated from a deep network trained on the same network (which we denote as “within”) with
the performance when utilizing a deep network trained on another dataset, thus performing
centrality calculations across signed networks (which we denote as “cross”). This can be seen in
Figure 3.4, where we show the generalization across datasets when training on each of our four
datasets and applying to the other three. The main observation from Figure 3.4 is that the
parameters learned for the deep networks are indeed very robust and general, since the performance
when trained across networks is very similar (and in fact sometimes slightly better or identical)
to that of the centralities calculated within the same network. There are many advantages to being
able to calculate centralities across networks efficiently. One such example is the ability to
learn the parameters of DeSCent's deep network quickly and efficiently from a small network (such
as either of the two Bitcoin datasets) and then calculate the signed centrality scores in a larger
network (such as Epinions or Slashdot) by simply feeding its features through the deep network
already optimized on the smaller dataset. Thus, these inductive experiments provide even more
evidence of the usefulness of harnessing deep learning in our DeSCent framework to discover signed
centrality scores.

3.2.4.3 Parameter Analysis

Here we evaluate the contribution of the sum constraint (i.e., Eq. (3.22)) and the status constraint
(i.e., Eq. (3.23)). We only show results on Bitcoin-Alpha as a representative dataset, since we have
similar observations on other datasets.

The parameter 1 in this method is used to control the contribution of the sum constraint, which
ensures our algorithm can avoid convergence towards the trivial zero solution. Here we investigate
the change in performance as we vary the value of 1 (including setting it to zero, which would
fully eliminate any contribution). Note that we keep all of DeSCent’s other parameters ﬁxed while
we vary the value of 1. We present the results for a representative dataset, Bitcoin-Alpha, from
the cross validation results in terms of AUC in Figure 3.5(a). We point out that while 1 is set to
zero the performance is decreased, thus showing the sum constraint is able to aid in ﬁnding a higher
performing solution.

Next we vary the parameter 2 while keeping the other parameters ﬁxed. In this method 2 had
been used for controlling the contribution of the status constraint, which was designed to ensure
that the solution we ﬁnd (in a global sense) still adheres to the sign suggested by status theory.
Similarly as done for the sum constraint we report the AUC found while varying the value of 2
on the Bitcoin-Alpha dataset. From the results shown in Figure 3.5(b), we can see similar ﬁndings
to those discovered for 1. More speciﬁcally, we notice the usage of the status constraint with
2 = 1000 is shown to be quite eﬀective.


(a) Sum constraint (i.e., 1).

(b) Status constraint (i.e., 2).

Figure 3.5: Analyzing the signed centrality additional constraints on Bitcoin-Alpha.


CHAPTER 4

MODELING NETWORKS WITH NEGATIVE LINKS

In this chapter (see footnotes 1, 2, and 3), we investigate the modeling of networks having negative
links. More specifically, we first focus on constructing a generative network model for signed
networks, including an automated parameter learning framework, which we empirically evaluate
against existing mechanistic signed network models and other baseline models. Thereafter, we study
how to extend and model balance theory in signed bipartite networks, followed by an empirical
analysis verifying its applicability to sign link prediction.

Generative network modeling aims to design a model that represents a complex network through
a relatively simple set of equations and/or procedures such that, when provided a network as
input, the model can learn a set of parameters to construct another network that is as similar to
the input as possible. Ideally this would result in many observable/measurable properties being
maintained from the input to the generated output network. In unsigned networks, the typically
modeled properties are the power law degree distribution [97, 98, 99, 11], assortativity [100, 101],
clustering coefficients [102, 103, 104, 105], and small diameter [98, 103]. Nowadays, more data
can be represented as large networks in many real-world applications such as the Web [28, 106],
biology [107, 108], and social media [109, 110], and increasing attention has been devoted to better
understanding and modeling networks. Traditionally, network modeling has focused on unsigned
networks. However, many networks can have both positive and negative links (i.e., signed networks
[30, 29]), especially in online social media, which raises the question of whether dedicated efforts
are needed to model signed networks in addition to the unsigned techniques. Thus, in Section 4.1
we present our proposed generative signed network model, which focuses on maintaining core network
properties, including the degree distribution and local clustering, in addition to those specific to
signed networks, namely the link sign ratio and the signed triangle distribution, to ensure the
generated networks maintain the correct level of balance.

1 Tyler Derr, Charu Aggarwal, and Jiliang Tang. “Signed Network Modeling Based on Structural
Balance Theory.” In Proceedings of the 27th ACM International Conference on Information and
Knowledge Management (CIKM). 2018.

2 Tyler Derr, Cassidy Johnson, Yi Chang, and Jiliang Tang. “Balance in Signed Bipartite
Networks.” In Proceedings of the 28th ACM International Conference on Information and Knowledge
Management (CIKM). 2019.

3 Tyler Derr and Jiliang Tang. “Congressional Vote Analysis Using Signed Networks.” In
Proceedings of the 18th International Conference on Data Mining Workshops (ICDMW). 2018.

Although we have primarily focused on signed networks in general, there are in fact several
variants of signed networks. Previous work and theories have primarily focused on unipartite
signed networks, which have a single node type and in which signed links can connect any two
nodes. However, a common form of signed network has largely been overlooked: signed bipartite
networks. These networks have two sets of nodes, and links can only be formed between nodes of
different types. Signed bipartite networks appear across multiple domains. For example, in
e-commerce, a signed bipartite network can be constructed between buyers and sellers in
multi-vendor marketplaces when users are asked to rate each other after each transaction;
similarly, helpfulness ratings from users to reviews can be naturally denoted as a signed bipartite
network. On the one hand, the complexity of having two node types, where signed links can only form
across the two sets, introduces challenges that prevent most existing literature on unipartite
signed networks and unsigned bipartite networks from being applied. On the other hand, balance
theory, a key signed social theory, has been defined for cycles of any length but is used in the
form of triangles for numerous unipartite signed network tasks. In bipartite networks, however,
there are no triangles, and furthermore there exist two types of nodes. Therefore, in Section 4.2,
we conduct the first comprehensive analysis and validation of balance theory using the smallest
cycle in signed bipartite networks: signed butterflies (i.e., cycles of length 4 containing the two
node types). Then, to investigate the applicability of balance theory in aiding signed bipartite
network tasks, we develop multiple sign prediction methods that utilize balance theory in the form
of signed butterflies.


4.1 Generative Modeling of Signed Networks

Signed networks differ from unsigned networks not only due to the increased complexity added by
having a sign associated with every edge, but also (and more importantly) because there are
specific principles (or social theories), such as balance theory, that play a key role in driving
the dynamics and construction of signed networks [2, 4]. For example, in unsigned networks we
have the property of transitivity and we see a large amount of local clustering (i.e., formation of
triangles). In comparison, in signed networks there are patterns not only in the local clustering,
but also in the distribution of triangles (based on their edge signs) found in the network. As
suggested by balance theory [29], some triangles (i.e., balanced ones) are more likely to be formed
than others (i.e., unbalanced ones) in signed networks. Hence, modeling signed networks requires
preserving not only unique properties of signed networks such as the sign distribution, but also
other properties suggested by their principles, such as the distribution of formed triangles.
However, these mechanisms are not incorporated into unsigned network models, which are therefore
unequipped for signed networks. Thus, there is a need to design network models for signed networks.

Network models have many direct applications and a diverse set of beneﬁts beyond and including
the better understanding of the network structure and dynamics. Currently there is a signiﬁcant push
for better anonymization in social media. However, for researchers wanting to further advance their
ﬁeld, it is necessary to utilize the network data for knowledge discovery, mining, and furthermore
for testing and benchmarking their methods and algorithms. A generative network model could be
utilized for constructing synthetic networks having similar properties as their corresponding real
network, but without compromising the user’s privacy and allowing further advancements through
the use of the synthetic network datasets. Similarly such a model can be used as a null-model for
network property signiﬁcance testing or for constructing synthetic networks of varying network
properties to further understand the relationship between the network model and real world networks
in terms of their dynamics and construction process. Thus, we propose a novel signed network
model, which aims to preserve three key properties of signed networks: (1) the degree distribution;
(2) the sign distribution; and (3) the balanced/unbalanced triangle distribution suggested by
balance theory.

4.1.1 Problem Statement
A signed network $\mathcal{G}$ is composed of a set $\mathcal{V} = \{v_1, v_2, \dots, v_N\}$ of $N$
vertices, a set of $M^+$ positive links $\mathcal{E}^+$, and a set of $M^-$ negative links
$\mathcal{E}^-$. Let $\mathcal{E} = \mathcal{E}^+ \cup \mathcal{E}^-$ represent the set of
$M = M^+ + M^-$ edges in the signed network when not considering the sign. Here we focus on
undirected and unweighted signed networks and leave modeling directed and weighted signed networks
as future work. We note that, unlike in Section 2.1, here we use $\mathcal{V}$ to denote the set of
vertices in the signed graph, rather than $\mathcal{U}$ representing the users in the signed
network, to follow the more traditional notation of unsigned generative graph models.

We can formally define the generative signed network modeling problem as follows:

Given a signed network $\mathcal{G}_I = (\mathcal{V}_I, \mathcal{E}^+_I, \mathcal{E}^-_I)$ as input,
we seek to learn a set of parameters $\Theta$ for a given model $\mathcal{M}$ that can retain the
network properties found in $\mathcal{G}_I$, such that we can construct synthetic output networks
$\mathcal{G}_O = (\mathcal{V}_O, \mathcal{E}^+_O, \mathcal{E}^-_O)$, using $\mathcal{M}$ based on
$\Theta$, that closely resemble the input network in terms of measured network properties.

Traditionally, network modeling has focused on unsigned networks and on preserving unsigned
network properties. Signed networks have distinct properties from unsigned networks [67]. For
example, negative links are available in signed networks, and ignoring them can result in
over-estimation of the impact of positive links [111]; moreover, most triangles in signed networks
satisfy balance theory [28]. These properties cannot be captured by unsigned network models.
Hence, dedicated efforts are needed to model signed networks. The notations we will use in defining
our proposed signed generative network model are summarized in Table 4.1.

4.1.2 An Overview of Balanced Signed Chung-Lu (BSCL) Model

Table 4.1: Notations regarding signed network generative modeling.

Notations                                              Descriptions
$e_{ij}$                                               undirected edge between vertices $v_i$ and $v_j$
$\mathcal{N}_i$                                        set of neighbors for node $v_i$
$\mathbf{d}$                                           degree vector based on $\mathcal{E}_I$ where $d_i$ is the degree of $v_i$
$\eta$                                                 fraction of links being positive in $\mathcal{G}_I$ (i.e., $M^+/M$)
$\Delta_B$                                             fraction of triangles balanced in $\mathcal{G}_I$
$\pi$                                                  sampling vector from degree distribution in $\mathcal{E}_I$
$\rho$                                                 probability new edge closes a wedge to be a triangle in $\mathcal{G}_O$
                                                       via two-hop walk instead of randomly inserting an edge
$\alpha$                                               probability a randomly inserted edge into $\mathcal{G}_O$ is positive
$\beta$                                                probability of closing a wedge to have more balanced triangles in $\mathcal{G}_O$
$\Delta^{rand}_{ij}$ ($\Delta^{rand}_{ijB}$)           approximation to the E[# of (balanced) triangles]
                                                       that will get created due to randomly inserting $e_{ij}$
$\Delta^{rand}_{\mathcal{G}}$ ($\Delta^{rand}_{\mathcal{G}B}$)     average of $\Delta^{rand}_{ij}$ ($\Delta^{rand}_{ijB}$) across all possible edges
$\Delta^{wedge}_{ij}$ ($\Delta^{wedge}_{ijB}$)         approximation to the E[# of (balanced) triangles]
                                                       created when inserting edge $e_{ij}$ via the wedge closing procedure
$\Delta^{wedge}_{\mathcal{G}}$ ($\Delta^{wedge}_{\mathcal{G}B}$)   average of $\Delta^{wedge}_{ij}$ ($\Delta^{wedge}_{ijB}$) across all possible edges

A previous study demonstrated that the node degrees of signed networks also follow power law
distributions [67], similar to those of unsigned networks. Hence, we propose to build the signed
network model based on the unsigned Chung-Lu model, which can preserve the degree distribution
of the input network. The Chung-Lu (CL) model takes an unsigned network
$\mathcal{G}_I = (\mathcal{V}_I, \mathcal{E}_I)$ as input and independently decides whether each of
the $N^2$ edges is placed in the generated network, with each edge $e_{ij}$ having probability
$\frac{d_i d_j}{2M}$, where $d_i$ is the degree of node $v_i$ and $M$ is the number of edges in the
network. It can be shown that the expected degree distribution of the output network
$\mathcal{G}_O$ is equivalent to that of $\mathcal{G}_I$. A fast variant of the Chung-Lu model,
FCL [112], creates a vector $\pi$ which consists of $2M$ values, where for each edge both incident
vertices are added to the vector. Rather than deciding whether each of the $N^2$ edges gets added
to the network (as done in CL), FCL can just randomly sample two vertices from $\pi$ uniformly,
since this simulates the degree distribution. Note that FCL ignores self-loops and multi-edges when
sampling $M$ edges. However, since most real-world unsigned networks have higher clustering
coefficients than those generated by CL and FCL, another CL variant, Transitive Chung-Lu (TCL), was
introduced in [102] to maintain the transitivity. Rather than always picking two vertices from
$\pi$, TCL occasionally picks a single vertex from $\pi$ and then, with a parameter $\rho$, performs
a two-hop walk to select the second vertex. When including this edge, the process is explicitly
constructing at least one triangle by closing off the wedge (i.e., wedge closing procedure) created
by the two-hop walk.
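The FCL sampling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [112]: the function names `build_sampling_vector` and `fast_chung_lu` and the toy edge list are hypothetical, and collisions are handled by simple rejection.

```python
import random
from collections import Counter

def build_sampling_vector(edges):
    """Return pi: every edge contributes both endpoints, so len(pi) == 2M."""
    pi = []
    for u, v in edges:
        pi.extend((u, v))
    return pi

def fast_chung_lu(edges, rng):
    """Sample M distinct, loop-free edges by drawing endpoint pairs uniformly from pi.

    Assumption: self-loops and repeated edges are simply rejected and redrawn.
    """
    pi = build_sampling_vector(edges)
    target = len(edges)
    sampled = set()
    while len(sampled) < target:
        u, v = rng.choice(pi), rng.choice(pi)
        if u == v:
            continue  # ignore self-loops
        sampled.add((min(u, v), max(u, v)))  # the set silently drops multi-edges
    return sampled

# Hypothetical toy input network with M = 6 edges on 5 vertices.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
pi = build_sampling_vector(toy_edges)
degree = Counter(pi)  # a vertex appears in pi once per incident edge
generated = fast_chung_lu(toy_edges, random.Random(0))
```

Because a vertex appears in `pi` once per incident edge, drawing uniformly from `pi` selects a vertex with probability proportional to its degree, which is what lets FCL mimic the degree distribution without examining all vertex pairs.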

The proposed Balanced Signed Chung-Lu model (BSCL) is based on the TCL model, which
automatically provides the mechanism for maintaining the degree distribution and also the local
clustering coefficient during the construction process. However, as previously mentioned, the
distribution of formed triangles is a key property in signed networks and most of these triangles
adhere to balance theory. Note that, when performing the wedge closure procedure, we are not only
closing the single wedge we explicitly constructed (through our two-hop walk), but there could be
other common neighbors between these two vertices. Thus, we introduce a parameter $\beta$, which
denotes the probability of assigning the edge sign to ensure the majority of the triangles being
created by this new edge are balanced. With the introduction of this parameter, our model is able
to capture a range of balance in signed networks. This is necessary since not all signed networks
are completely balanced, and in fact real-world networks can have a varied percentage of triangles
being balanced [4].

Meanwhile, we also want to maintain the sign distribution. However, the above process of
determining the edge sign for wedge closure is based on balance theory (i.e., the local sign
perspective) and not on the global sign perspective (i.e., $\eta$). This implies that if, when
randomly inserting an edge into the network, we simply choose the sign based on $\eta$, then our
generated networks could deviate from the true sign distribution of the input network. Therefore,
we introduce $\alpha$, which is a corrected probability (used instead of $\eta$) for a randomly
inserted link being positive. It corrects the bias in positive or negative edges that is created
through the use of $\beta$, which is from the local sign perspective.

Algorithm 4.1: Balanced Signed Chung-Lu (BSCL) Model
Input: Signed Network $\mathcal{G}_I = (\mathcal{V}_I, \mathcal{E}^+_I, \mathcal{E}^-_I)$
Output: Synthetic Signed Network $\mathcal{G}_O = (\mathcal{V}_O, \mathcal{E}^+_O, \mathcal{E}^-_O)$
1 $\mathcal{E}_I = \mathcal{E}^+_I \cup \mathcal{E}^-_I$
2 $\pi$ ← Sampling vector based on the degree distribution in $\mathcal{E}_I$
3 $\eta$ ← $\frac{M^+}{M^+ + M^-}$
4 $\mathbf{d}$ ← Calculate_Degree_Vector($\mathcal{V}_I$, $\mathcal{E}_I$)
5 $\Delta_B$ ← Percentage balanced triangles in $\mathcal{G}_I$
6 $\rho, \alpha, \beta$ ← Parameter_Learning($\mathcal{E}_I$, $M$, $\Delta_B$, $\mathbf{d}$, $\eta$)
7 $\mathcal{E}^+_O, \mathcal{E}^-_O$ ← Network_Generation($M$, $\eta$, $\pi$, $\rho$, $\alpha$, $\beta$)

With the introduction of three parameters (i.e., $\rho$, $\alpha$ and $\beta$), the proposed
Balanced Signed Chung-Lu model (BSCL) is shown in Algorithm 4.1. Here we step through the high level
processes of BSCL before later discussing both the network generation process and the parameter
learning algorithms. On line 1 of Algorithm 4.1, we first construct $\mathcal{E}_I$. Then, using
the degree distribution of $\mathcal{E}_I$, we can construct the vertex sampling vector $\pi$ as
shown on line 2. Next we calculate the properties of the input network we aim to preserve. These
include the percentage of positive links $\eta$, the vector of degrees $\mathbf{d}$, and the
percentage of balanced triads $\Delta_B$, from lines 3 to 5. With these values, we estimate the
major parameters of BSCL, namely $\rho$, $\alpha$ and $\beta$, as mentioned on line 6, using our
learning algorithms that will be discussed in subsection 4.1.4. Finally, we generate the network
based on the learnt parameters on line 7 and then output the constructed synthetic signed network
$\mathcal{G}_O$. In the next subsection, we will discuss the details of the network generation process performed
by BSCL and then discuss how these parameters can be automatically and efficiently learned.
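The input statistics gathered on lines 3 to 5 of Algorithm 4.1 can be illustrated with a short Python sketch. The function name `input_statistics` and the toy network are hypothetical, and the brute-force triangle enumeration is only an assumption suited to tiny examples, not necessarily how these quantities are computed at scale.

```python
from itertools import combinations

def input_statistics(pos_edges, neg_edges):
    """Return (eta, degree, delta_b) for an undirected signed network."""
    sign = {}
    for u, v in pos_edges:
        sign[frozenset((u, v))] = 1
    for u, v in neg_edges:
        sign[frozenset((u, v))] = -1

    # eta = M+ / (M+ + M-): fraction of positive links
    eta = len(pos_edges) / (len(pos_edges) + len(neg_edges))

    neighbors = {}
    for edge in sign:
        u, v = tuple(edge)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    degree = {v: len(nbrs) for v, nbrs in neighbors.items()}

    # A triangle is balanced when it has an even number of negative links,
    # i.e., when the product of its three edge signs is positive.
    balanced = total = 0
    for a, b, c in combinations(sorted(neighbors), 3):
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]:
            total += 1
            product = (sign[frozenset((a, b))]
                       * sign[frozenset((a, c))]
                       * sign[frozenset((b, c))])
            balanced += product > 0
    delta_b = balanced / total if total else 0.0
    return eta, degree, delta_b

# Hypothetical toy network: triangle (0,1,2) is all positive, triangle (1,2,3) has one negative link.
pos = [(0, 1), (1, 2), (0, 2), (2, 3)]
neg = [(1, 3), (3, 4)]
eta, degree, delta_b = input_statistics(pos, neg)
```

In this toy network one of the two triangles is balanced, so `delta_b` is 0.5, while four of the six links are positive.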

4.1.3 Network Generation for BSCL

Given the parameter values for $\rho$, $\alpha$ and $\beta$, we show in Algorithm 4.2 how BSCL can
generate a synthetic signed network maintaining the key signed network properties. First, on line
1, we use the FCL method for the construction of a set $\mathcal{E}_O$ of $M$ edges, which adheres
to the original degree distribution. Then, on line 2, we split the unsigned edges into two sets,
$\mathcal{E}^+_O$ and $\mathcal{E}^-_O$, by randomly assigning edge signs, based on $\eta$, such
that the percentage of positive links matches that of the input network. Next, from lines 3 to 17,
we add $M$ new edges to the network, one at a time, while removing the oldest edge in the network
for each new edge inserted. The reason for starting with this initial set of edges from FCL on line
1 is that, when performing the wedge closing procedure (from lines 5 to 10), if the starting network
is too sparse, there will not be many opportunities for two-hop walks to create triangles. We note
that after each iteration from lines 3 to 17, $\mathcal{G}_O$ maintains the correct total number of
edges, $M$.

Algorithm 4.2: BSCL_Network_Generation($M$, $\eta$, $\pi$, $\rho$, $\alpha$, $\beta$).
1 $\mathcal{E}_O$ = FCL($M$, $\pi$)
2 $\mathcal{E}^+_O, \mathcal{E}^-_O$ ← Randomly partition $\mathcal{E}_O$ based on $\eta$
3 for 1 to $M$ do
4     $v_i$ = sample from $\pi$
5     if wedge_closing_procedure($\rho$) then
6         $v_j$ = Perform two-hop walk from $v_i$ through neighbor $v_k$
7         if close_for_balance($\beta$) then
8             Add $e_{ij}$ to $\mathcal{E}^+_O$ or $\mathcal{E}^-_O$ based on the sign that closes the wedge and other common
              neighbors to have more balanced triangles
9         else
10            Add $e_{ij}$ to $\mathcal{E}^+_O$ or $\mathcal{E}^-_O$ to have more unbalanced triangles
11    else insert a random edge
12        $v_j$ = sample from $\pi$
13        if create_positive_edge($\alpha$) then
14            $\mathcal{E}^+_O$ ← $\mathcal{E}^+_O \cup \{e_{ij}\}$
15        else
16            $\mathcal{E}^-_O$ ← $\mathcal{E}^-_O \cup \{e_{ij}\}$
17    Remove oldest edge from $\{\mathcal{E}^+_O \cup \mathcal{E}^-_O\}$ respectively
18 return $\mathcal{E}^+_O, \mathcal{E}^-_O$

We will either insert an edge by closing a wedge into a triangle and using $\beta$ to help maintain
the balance, or insert a random edge and select its sign based on $\alpha$ to correctly maintain the
sign distribution. On line 5, we use our parameter $\rho$ to determine which edge insertion method
we will use. Next we further discuss these two edge insertion procedures.

The wedge closing procedure is selected with probability $\rho$ on line 5, but starts on line 4 with
the selection of $v_i$ uniformly at random from $\pi$. Then, on line 6, we perform a two-hop walk
from $v_i$ through a neighbor $v_k$ to land on $v_j$. We have thus selected the wedge consisting of
edges $e_{ik}$ and $e_{kj}$ to close into a triangle. We note that although we are explicitly
constructing the triangle composed of vertices $v_i$, $v_k$ and $v_j$, edge $e_{ij}$ would also
implicitly close wedges to form triangles with any other common neighbors that $v_i$ and $v_j$ might
have. Hence, we use our learned parameter $\beta$ for determining whether we should introduce more
balanced or unbalanced triangles into the network based on the total balance in the input signed
network (i.e., $\Delta_B$). Therefore, on line 7, with probability $\beta$, we choose to select the
edge sign of $e_{ij}$ such that the majority of the triangles being created (both those explicitly
through the two-hop walk and implicitly through other common neighbors) will adhere to balance
theory. As mentioned on line 8, depending on whether balance theory would suggest $e_{ij}$ to be
positive or negative, we will add the edge to the set $\mathcal{E}^+_O$ or $\mathcal{E}^-_O$,
respectively. Similarly, with probability $(1 - \beta)$, the sign of $e_{ij}$ will be selected to
introduce more unbalanced triangles into the generated network.

If not performing the wedge closing procedure, BSCL will instead insert a random edge, which
happens with probability $(1 - \rho)$. This process starts in the same way on line 4 by selecting
the first vertex $v_i$. Then, on line 12, a second vertex $v_j$ is sampled from $\pi$ such that we
can insert edge $e_{ij}$ into the network. However, since we desire our generated network to
maintain the correct sign distribution, the sign for the edge $e_{ij}$ needs to be carefully
determined. As previously discussed, the wedge closing procedure will disrupt the global sign
distribution, and therefore, rather than using $\eta$ for the sign selection, we use our learned
parameter $\alpha$. We again note that $\alpha$ will be learnt such that it incorporates the bias
induced from the local sign selections made during the wedge closing procedure controlled by
$\beta$. Therefore, with probability $\alpha$, on line 13, we choose to go to line 14 and insert
$e_{ij}$ as a positive link and add it to the set $\mathcal{E}^+_O$. On the other hand, with
probability $(1 - \alpha)$, we go to line 16 and select $e_{ij}$ to be negative and therefore add it
to the set of negative edges $\mathcal{E}^-_O$.

After edge insertion, the next step is to remove the oldest edge in the generated network
$\mathcal{G}_O$ such that it maintains $M$ edges. Line 17 shows that we select the oldest edge from
the union of the positive and negative edge sets (i.e., $\mathcal{E}^+_O \cup \mathcal{E}^-_O$) and
then remove it from the edge set it was selected from. After performing this loop from lines 3 to
17 $M$ times, all the initial edges from FCL will have been removed and the network generator can
return the resulting positive and negative edge sets $\mathcal{E}^+_O$ and $\mathcal{E}^-_O$,
respectively.

One step we omitted from Algorithm 4.2 for ease of description is that we also make use of
a queue for handling collisions (i.e., selecting to insert an edge that already exists in the
network or a self-loop). Every time we have such a collision, the vertices are added to the queue.
Then, before each time a vertex is sampled from $\pi$ (i.e., on lines 4 and 12), the queue is
checked. If the queue is empty, then we proceed to sample from $\pi$. However, if the queue is
non-empty, then we instead take from the front of the queue. Similarly, we utilize the queue if
unable to perform a two-hop walk from vertex $v_i$. Next we will discuss how we can learn the
parameters $\rho$, $\alpha$, and $\beta$.
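The generation loop of Algorithm 4.2 can be sketched compactly in Python. This is a simplified illustration under several assumptions that are not part of the original algorithm: collisions are handled by rejection sampling instead of the queue, a failed two-hop walk falls back to a random insertion that resamples both endpoints, ties in the balance vote favor a positive sign, and $M$ is assumed to be smaller than the number of possible vertex pairs. The name `bscl_generate` and the toy inputs are hypothetical.

```python
import random

def bscl_generate(pi, m, eta, rho, alpha, beta, rng):
    """Toy rendition of Algorithm 4.2; returns {frozenset({u, v}): +1 or -1}."""
    sign = {}  # dicts preserve insertion order, so the first key is the oldest edge
    adj = {}

    def add(u, v, s):
        sign[frozenset((u, v))] = s
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def remove_oldest():
        edge = next(iter(sign))
        u, v = tuple(edge)
        del sign[edge]
        adj[u].discard(v)
        adj[v].discard(u)

    def random_pair():
        # Rejection sampling stands in for the collision queue (assumption).
        while True:
            u, v = rng.choice(pi), rng.choice(pi)
            if u != v and frozenset((u, v)) not in sign:
                return u, v

    # Lines 1-2: FCL initialisation, signs assigned according to eta.
    while len(sign) < m:
        u, v = random_pair()
        add(u, v, 1 if rng.random() < eta else -1)

    for _ in range(m):                                    # line 3
        u = rng.choice(pi)                                # line 4
        closed = False
        if rng.random() < rho and adj.get(u):             # line 5
            k = rng.choice(sorted(adj[u]))                # line 6: first hop
            targets = sorted(adj[k] - adj[u] - {u})       # second hop, no collisions
            if targets:
                v = rng.choice(targets)
                # Each common neighbor w votes for the sign that balances its wedge.
                vote = sum(sign[frozenset((u, w))] * sign[frozenset((v, w))]
                           for w in adj[u] & adj[v])
                balanced_sign = 1 if vote >= 0 else -1
                s = balanced_sign if rng.random() < beta else -balanced_sign  # lines 7-10
                add(u, v, s)
                closed = True
        if not closed:                                    # lines 11-16
            u, v = random_pair()
            add(u, v, 1 if rng.random() < alpha else -1)
        remove_oldest()                                   # line 17
    return sign

toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
pi = [v for edge in toy_edges for v in edge]
generated = bscl_generate(pi, m=len(toy_edges), eta=0.8, rho=0.5,
                          alpha=0.8, beta=0.9, rng=random.Random(7))
```

Keeping the edges in an insertion-ordered dictionary makes "remove the oldest edge" a constant-time operation, which is one simple way to realize line 17.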

4.1.4 Parameter Learning for BSCL

In the previous subsections, we introduced the BSCL model and its network generation process based
on the parameters $\rho$, $\alpha$, and $\beta$. Here we discuss how to learn these parameters from
the input signed network. We notice that these parameters are related to each other. For example,
when constructing triangles to be balanced or unbalanced (based on $\beta$), this will disrupt the
global sign distribution, since these decisions are only based on the local sign perspective.
Similarly, when inserting a random edge with a sign based on $\alpha$, this has the potential to
disrupt the distribution of triangles and the percentage of triangles that are balanced in the
network. This is because the decision for the sign of a random edge insertion is based solely on the
global sign perspective and ignores the local perspective of whether the triangles created via this
inserted edge are balanced or unbalanced. Hence, next we discuss the proposed algorithm for learning
these parameters alternately and iteratively.

4.1.4.1 Learning $\rho$

For the parameter $\rho$, we make use of the Expectation-Maximization (EM) learning method,
following a similar process to that of the TCL model [102]. The general idea is that $\rho$ can be
learned after defining a hidden variable associated with each edge, which determines whether the
edge was added to the network randomly or through a wedge being closed into a triangle. More
specifically, let $z_{ij} \in \mathcal{Z}$ be the latent variable assigned to each edge $e_{ij}$.
These latent variables can be equal to 1 or 0, where $z_{ij} = 0$ indicates that the edge was
created via random sampling from $\pi$ and $z_{ij} = 1$ suggests that the edge $e_{ij}$ was created
through the two-hop walk wedge closing procedure.

Let $\pi_i$ represent the probability of selecting $v_i$ from the sampling vector $\pi$, let
$I[v_j \in \mathcal{N}_k]$ be an indicator function that is 1 if $v_j$ is in the neighbor set of
$v_k$ and 0 otherwise, and let $\rho^t$ denote the value of $\rho$ at iteration $t$ of the EM
process. Next we analyze the two procedures of wedge closing or random insertion given a starting
node (i.e., first selected node) $v_i$. We can calculate the probabilities based on the following:
(1) the random insertion is chosen with probability $(1 - \rho^t)$ and selects $v_j$ as the second
node with probability $\pi_j$; (2) the wedge closing is chosen with probability $\rho^t$, and the
probability that we were able to perform a two-hop walk to $v_j$ is based on first having $v_k$
that is a mutual neighbor of $v_i$ and $v_j$ (i.e., $v_k \in \mathcal{N}_i$ and
$v_k \in \mathcal{N}_j$) and then the walk continuing to $v_j$ (i.e., selecting $v_j$ from the
$d_k$ neighbors of $v_k$) once arriving at the mutual neighbor $v_k$. Therefore, the conditional
probabilities for placing the edge $e_{ij}$ given $\rho^t$, the starting node $v_i$, and the method
used (random insertion or wedge closing, as represented by $z_{ij}$) are as follows, respectively:

$$P(e_{ij} \mid z_{ij} = 0, v_i, \rho^t) = (1 - \rho^t)(\pi_j)$$

$$P(e_{ij} \mid z_{ij} = 1, v_i, \rho^t) = \rho^t \sum_{v_k \in \mathcal{N}_i} \left(\frac{I[v_j \in \mathcal{N}_k]}{d_i}\right)\left(\frac{1}{d_k}\right)$$


For the calculation of the expectation of $z_{ij}$ given $\rho^t$, $e_{ij}$, and the starting node
$v_i$, the conditional probability of $z_{ij}$ can be defined using Bayes' rule as follows:

$$P(z_{ij} = 1 \mid e_{ij}, v_i, \rho^t) = \frac{P(e_{ij} \mid z_{ij} = 1, v_i, \rho^t)}{P(e_{ij} \mid z_{ij} = 1, v_i, \rho^t) + P(e_{ij} \mid z_{ij} = 0, v_i, \rho^t)} \tag{4.1}$$

which calculates the probability of $z_{ij}$ being 1 based on the probability of the edge being
created by wedge closure over the total probability that the edge $e_{ij}$ gets created. This leads
to the expectation of $z_{ij}$ being $E[z_{ij} \mid \rho^t] = P(z_{ij} = 1 \mid e_{ij}, v_i, \rho^t)$.
Furthermore, the maximization step can be calculated by sampling a set of edges $\mathcal{S}$
uniformly from $\mathcal{E}_I$. Then, because the $z_{ij}$ are conditionally independent, we can
individually calculate the expectation of $z_{ij}$ for each edge in $\mathcal{S}$ and take the
average across the set of sampled edges:

$$\rho^{t+1} = \frac{1}{|\mathcal{S}|} \sum_{e_{ij} \in \mathcal{S}} E[z_{ij} \mid \rho^t] \tag{4.2}$$
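One EM iteration, as given by Eqs. (4.1) and (4.2), can be sketched as follows. The function name `em_update_rho`, the dictionary-based inputs, and the toy network are hypothetical; the sketch also assumes $0 < \rho^t < 1$ so that the denominator of Eq. (4.1) is never zero.

```python
def em_update_rho(rho_t, sampled_edges, neighbors, pi_prob):
    """Return rho^{t+1}, the average posterior that a sampled edge came from wedge closing.

    sampled_edges: list of (i, j) pairs, with i treated as the starting node.
    neighbors:     dict mapping each vertex to its neighbor set.
    pi_prob:       dict mapping each vertex to d_v / (2M).
    """
    total = 0.0
    for i, j in sampled_edges:
        # Random insertion: (1 - rho^t) * pi_j
        p_rand = (1.0 - rho_t) * pi_prob[j]
        # Wedge closing: rho^t * sum over mutual neighbors k of 1 / (d_i * d_k)
        d_i = len(neighbors[i])
        p_wedge = rho_t * sum(1.0 / (d_i * len(neighbors[k]))
                              for k in neighbors[i] if j in neighbors[k])
        total += p_wedge / (p_wedge + p_rand)   # Eq. (4.1)
    return total / len(sampled_edges)           # Eq. (4.2)

# Hypothetical toy network with M = 6 edges.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
neighbors = {}
for u, v in toy_edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)
pi_prob = {v: len(nbrs) / (2 * len(toy_edges)) for v, nbrs in neighbors.items()}

rho = 0.5
for _ in range(20):  # iterate the update; here every edge is used as the sample S
    rho = em_update_rho(rho, toy_edges, neighbors, pi_prob)
```

For the single edge between vertices 0 and 1, the only mutual neighbor is vertex 2, so with $\rho^t = 0.5$ the wedge term is $0.5 \cdot \frac{1}{3 \cdot 3}$, the random term is $0.5 \cdot \frac{2}{12}$, and the posterior evaluates to 0.4.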

4.1.4.2 Learning $\beta$

Note that we have calculated $\Delta_B$ from the input network, which denotes the percentage of
triangles adhering to balance theory. We seek to approximate the expected number of triangles
BSCL will construct through the wedge closure and the random edge insertion methods on average
for each edge added to the network. Let us denote the values we calculate for these two methods
as $\Delta^{wedge}_{\mathcal{G}}$ and $\Delta^{rand}_{\mathcal{G}}$, respectively, which will be
calculated with respect to the model parameters. Furthermore, we will calculate what percent of
these we expect to be balanced as $\Delta^{wedge}_{\mathcal{G}B}$ and
$\Delta^{rand}_{\mathcal{G}B}$. Details of estimating $\Delta^{rand}_{\mathcal{G}}$,
$\Delta^{rand}_{\mathcal{G}B}$, and $\Delta^{wedge}_{\mathcal{G}}$ will be discussed later. To
correctly maintain the percentage of triangles being balanced in the synthetic network, we desire
the following:

$$\Delta_B = \frac{\Delta^{wedge}_{\mathcal{G}B} + \Delta^{rand}_{\mathcal{G}B}}{\Delta^{wedge}_{\mathcal{G}} + \Delta^{rand}_{\mathcal{G}}}$$

which simply states that the combined balanced percentage from the two methods should equal the
balanced percentage of the input network. Then, we can calculate the above mentioned values, and if
we let $\Delta^{wedge}_{\mathcal{G}B} = \beta \Delta^{wedge}_{\mathcal{G}}$, which denotes that
$\beta$ percent of the triangles we close via the wedge closing procedure are balanced, then we can
solve for $\beta$ and obtain the below:

$$\beta = \frac{\Delta_B \left(\Delta^{wedge}_{\mathcal{G}} + \Delta^{rand}_{\mathcal{G}}\right) - \Delta^{rand}_{\mathcal{G}B}}{\Delta^{wedge}_{\mathcal{G}}} \tag{4.3}$$
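As a quick numerical illustration of Eq. (4.3), the following Python sketch solves for the balance parameter from four triangle estimates. The function name `solve_beta` and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def solve_beta(delta_b, tri_wedge, tri_rand, tri_rand_balanced):
    """Eq. (4.3): the share of wedge-closed triangles that must be balanced.

    delta_b:           balanced fraction of triangles in the input network
    tri_wedge:         expected triangles per edge from wedge closing
    tri_rand:          expected triangles per edge from random insertion
    tri_rand_balanced: expected balanced triangles per edge from random insertion
    """
    return (delta_b * (tri_wedge + tri_rand) - tri_rand_balanced) / tri_wedge

# Hypothetical estimates (not taken from any dataset).
beta = solve_beta(delta_b=0.9, tri_wedge=1.2, tri_rand=0.3, tri_rand_balanced=0.2)

# Plugging beta back into the desired ratio recovers the input balance.
recovered = (beta * 1.2 + 0.2) / (1.2 + 0.3)
```

Since the result is used as a probability, a value outside $[0, 1]$ could not be used directly and would need separate handling, which this sketch does not attempt.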

Next, we discuss how to estimate $\Delta^{rand}_{\mathcal{G}}$, $\Delta^{rand}_{\mathcal{G}B}$ and
$\Delta^{wedge}_{\mathcal{G}}$.

Estimating $\Delta^{rand}_{\mathcal{G}}$: We note that the starting set of edges is constructed with
the FCL method and that edge signs are randomly assigned to them. Furthermore, each edge will have
been added into the network with probability $p_{ij} = \frac{d_i d_j}{2M}$. We note that the
expected number of common neighbors between two vertices $v_i$ and $v_j$ is equivalent to the number
of triangles that get created if the edge $e_{ij}$ is inserted into the network; we denote this
number of triangles by $\Delta^{rand}_{ij}$.

To obtain the number of common neighbors of $v_i$ and $v_j$, we calculate the probability that
$v_k \in \mathcal{V} \setminus \{v_i, v_j\}$ is a common neighbor based on the probability that
there exists an edge from $v_k$ to both $v_i$ and $v_j$. Note that after having the probability of
the existence of the first edge $e_{ik}$, we must subtract 1 from $d_k$, since we have already
conditioned on the existence of the first edge $e_{ik}$, thus causing $v_k$ to have one less
opportunity to connect to $v_j$. We formulate this idea as the following:

$$\Delta^{rand}_{ij} = \sum_{v_k \in \mathcal{V} \setminus \{v_i, v_j\}} \left(\frac{d_i d_k}{2M}\right)\left(\frac{d_j (d_k - 1)}{2M}\right) = \left(\frac{d_i d_j}{2M}\right) \sum_{v_k \in \mathcal{V} \setminus \{v_i, v_j\}} \left(\frac{d_k (d_k - 1)}{2M}\right)$$

Next we present the average value of $\Delta^{rand}_{ij}$ across all possible unordered pairs of
vertices as follows:

$$\Delta^{rand}_{\mathcal{G}} = \frac{1}{\frac{1}{2} N (N - 1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \Delta^{rand}_{ij}$$

where $\Delta^{rand}_{\mathcal{G}}$ is used to denote the average number of triangles constructed by
a randomly inserted edge in the model. We note that the above would require $O(N^3)$ time to
compute, but using the fact that $2M = N * \bar{d}$, we can use the following approximation if we
treat the summation such that it includes $v_i$ and $v_j$ instead of excluding them. We use
$\bar{d}$ to denote the average degree and $\overline{d^2}$ to represent the average value of the
squared degrees. Then we have:

$$\sum_{v_k \in \mathcal{V} \setminus \{v_i, v_j\}} \left(\frac{d_k (d_k - 1)}{2M}\right) \approx \sum_{v_k \in \mathcal{V}} \left(\frac{d_k (d_k - 1)}{2M}\right) = \frac{(d_1^2 + d_2^2 + \cdots + d_N^2) - (d_1 + \cdots + d_N)}{2M} = \frac{N \overline{d^2} - N \bar{d}}{N * \bar{d}} = \left(\frac{\overline{d^2} - \bar{d}}{\bar{d}}\right) \tag{4.4}$$

We therefore can rewrite $\Delta^{rand}_{\mathcal{G}}$ as follows:

$$\Delta^{rand}_{\mathcal{G}} \approx \frac{\overline{d^2} - \bar{d}}{\bar{d} \, M N (N - 1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_i d_j$$

First we note that we only need to compute $\bar{d}$ and $\overline{d^2}$ once (which can be
performed in $O(N)$ time). Second, for $\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_i d_j$, rather than
iterating over the nested sum of $j = i + 1$ to $N$, we can instead use dynamic programming to
construct a vector $\mathbf{s}$ where $s_i$ represents $\sum_{j=i+1}^{N} d_j$. We can construct
this vector $\mathbf{s}$ starting with $s_i = d_N$ when $i = N - 1$ and then recursively filling in
the vector using $s_{i-1} = s_i + d_i$, which can be performed in linear time in relation to the
number of vertices $N$. Therefore the below approximation for $\Delta^{rand}_{\mathcal{G}}$ can be
computed in $O(N)$ time instead of $O(N^3)$.

$$\Delta^{rand}_{\mathcal{G}} \approx \frac{\overline{d^2} - \bar{d}}{\bar{d} \, M N (N - 1)} \sum_{i=1}^{N-1} d_i s_i \tag{4.5}$$
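The linear-time evaluation of Eq. (4.5) can be checked against a direct evaluation of the pairwise average with a short Python sketch. The function names and the toy degree sequence are hypothetical; the brute-force version applies the same approximation as Eq. (4.4) (the inner sum runs over all vertices), so the two results should agree up to floating-point error.

```python
def delta_rand_fast(degrees):
    """Eq. (4.5): O(N) estimate using a running suffix sum s_i = d_{i+1} + ... + d_N."""
    n = len(degrees)
    m = sum(degrees) / 2                      # 2M equals the sum of degrees
    mean_d = sum(degrees) / n
    mean_d2 = sum(d * d for d in degrees) / n
    pair_sum, suffix = 0.0, 0.0
    for i in range(n - 1, -1, -1):            # fill s from the last vertex backwards
        pair_sum += degrees[i] * suffix       # d_i * s_i
        suffix += degrees[i]                  # s_{i-1} = s_i + d_i
    return (mean_d2 - mean_d) / (mean_d * m * n * (n - 1)) * pair_sum

def delta_rand_bruteforce(degrees):
    """Average over all unordered pairs, with the inner sum taken over every vertex."""
    n = len(degrees)
    m = sum(degrees) / 2
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(d * (d - 1) / (2 * m) for d in degrees)
            total += degrees[i] * degrees[j] / (2 * m) * inner
    return total / (0.5 * n * (n - 1))

toy_degrees = [3, 2, 3, 3, 1]                 # hypothetical degree sequence, M = 6
fast = delta_rand_fast(toy_degrees)
slow = delta_rand_bruteforce(toy_degrees)
```

The backwards loop is the dynamic program from the text: each step reuses the previous suffix sum instead of re-adding the degrees of all later vertices.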

Estimating $\Delta^{rand}_{\mathcal{G}B}$: We further extend our calculation of
$\Delta^{rand}_{\mathcal{G}}$ by examining the wedges (i.e., common neighbors), each of which has
one of the following formations: ({+,+}, {+,−}, {−,+}, {−,−}), where {+,−} is used to represent the
wedge formed by edges $e_{ik}$ and $e_{jk}$ whose signs are positive and negative, respectively.
Note that we first initialize our model with edges from FCL and select edge signs to match the sign
distribution of the original network, and then we attempt to correctly maintain this distribution
with the parameter $\alpha$; hence we assume that all wedges were created by the original sign
distribution (where $\eta$ is the probability of a link being positive). Below, we use
$\Delta^{+-}_{ij}$ to represent the number of wedges of type {+,−} that would be closed into
triangles when adding the edge $e_{ij}$. The definitions for all the wedge types are:

$$\Delta^{++}_{ij} = \eta \, \eta \, \Delta^{rand}_{ij}$$
$$\Delta^{+-}_{ij} = \Delta^{-+}_{ij} = \eta (1 - \eta) \Delta^{rand}_{ij}$$
$$\Delta^{--}_{ij} = (1 - \eta)(1 - \eta) \Delta^{rand}_{ij}$$

The expected number of balanced triangles that would be created if the edge $e_{ij}$ is inserted
randomly, $\Delta^{rand}_{ijB}$, can be obtained via the expected number of wedges of the different
types and the corrected positive link probability $\alpha$ as:

$$\Delta^{rand}_{ijB} = \alpha \Delta^{++}_{ij} + (1 - \alpha) \Delta^{+-}_{ij} + (1 - \alpha) \Delta^{-+}_{ij} + \alpha \Delta^{--}_{ij} \tag{4.6}$$

where, for a wedge with two existing edges to close into a balanced triangle, the added third edge
needs to have a sign such that there is an even number of negative links in the resulting
triangle, according to balance theory. This can also be extended to the calculation of
$\Delta^{rand}_{\mathcal{G}B}$ by averaging across all edges.
Estimating Ɗ

: Similarly, we will calculate the expected total number of triangles and
the balanced percentage when using the wedge closing procedure. The main idea for this wedge
closure is that we are guaranteed to select vertices such that we have at least one triangle being
created each time. We then need to also add the expected number of triangles that would be created
randomly by other common neighbors of  and   (similar to the random edge insertion case
above). Note that we must however discount the degree of  and   by 1, since in this method,
we have already explicitly used one of the links coming from both to discover this one common
neighbor for the wedge closure edge insertion. Let us denote the selected common neighbor as ,
which forms a wedge with the edges  and  .

Ɗ
 

2

∈\{,  ,}

= 1 + 
= 1 +(cid:16)   −  −   + 1
≈ 1 +(cid:16)   −  −   + 1
−1

G
G
≈ 1 + (2) − ()
()  ( − 1)

similar to Ɗ

2

=1

Ɗ
 

Ɗ
G

(cid:16)( − 1)
(cid:17) 

(cid:17)
(cid:17)(cid:16)(  − 1)( − 1)
(cid:16) ( − 1)

2

2

(cid:17)

∈\{,  ,}

2

(cid:17)(cid:16) (2) − ()

(cid:17)

()

, which results in the following:

(cid:0)( − 1)( −  + )(cid:1)

Similarly, as the approximation in Eq. (4.4), we can simplify the formulation of Ɗ

as:

 
Then we can also calculate Ɗ

4.1.4.3 Learning 

Here we want to estimate the percentage of positive edges. Therefore, we will examine this
percentage for both the wedge closing and the random edge insertion, which are denoted as
 and , respectively. Then, if we have these two values, we can correctly maintain


the percentage of positive links in the synthetic network as the following:

 =  + (1 − )

(4.7)

The above is due to the fact that the wedge closing procedure will insert  percent of the links into
the generated network while the random insertion method will construct (1 − ) percent.

For the wedge closure, we examine the probability of the four types of wedges: {+,+}, {+,−}, {−,+}, and {−,−}. We define their probabilities of existing in the network to be P({+,+}), P({+,−}), P({−,+}), and P({−,−}). Next we note that the wedges {+,+} and {−,−} would result in a positive edge being created with probability $\beta$ (the probability that a wedge is closed into a balanced triangle), while the wedges {+,−} and {−,+} would only provide a positive edge with probability $(1 - \beta)$. The reason is that balance theory controls the sign of the third edge: the first two wedge types require a positive link to adhere to balance theory, whereas the latter two wedge types result in a positive link insertion only when we construct an unbalanced triangle. Therefore, we denote the probability of inserting a positive edge with the wedge closure, $\eta_{WC}$, to be the following:

$$\eta_{WC} = \rho\Big(\beta\big(P(\{+,+\}) + P(\{-,-\})\big) + (1-\beta)\big(P(\{+,-\}) + P(\{-,+\})\big)\Big) \tag{4.8}$$

We note that $\rho$ is the probability of performing the wedge closing procedure. If we assume that $\alpha$ and $\beta$ are correctly solved, then the expected probability for the wedges is based on $\eta$ (the fraction of positive links in the input network) and $(1 - \eta)$. We can therefore rewrite Eq. 4.8 as:

$$\eta_{WC} = \rho\Big(\beta\big(\eta\eta + (1-\eta)(1-\eta)\big) + (1-\beta)\big(\eta(1-\eta) + (1-\eta)\eta\big)\Big) \tag{4.9}$$

Next, we show how to calculate $\eta_{RE}$, the corresponding probability for the random edge insertion. We notice that $\alpha$ (the probability that a randomly inserted edge is positive) will be the percentage of links we construct to be positive and this process will happen $(1 - \rho)$ percent of the time. Thus, $\eta_{RE} = (1 - \rho)\alpha$. We can now substitute $\eta_{WC}$ and $\eta_{RE}$ into Eq. 4.7 and solve for $\alpha$ as follows:

$$\alpha = \frac{1}{(1-\rho)}\Big(\eta - \rho\big(\beta(\eta\eta + (1-\eta)(1-\eta)) + (1-\beta)(\eta(1-\eta) + (1-\eta)\eta)\big)\Big) \tag{4.10}$$
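The closed-form update of Eq. (4.10) can be checked numerically with a short sketch. This is an illustration under stated assumptions, not the author's code: the symbol names are assumed (eta for the input network's fraction of positive links, rho for the wedge-closing probability, beta for the probability of closing a wedge into a balanced triangle, alpha for the probability that a randomly inserted edge is positive), Eq. (4.10) is read as alpha = (eta − rho·[beta·(eta² + (1−eta)²) + (1−beta)·2·eta·(1−eta)]) / (1−rho), and the function names are hypothetical.

```python
def wedge_closing_positive_rate(eta, beta):
    """Probability that a closed wedge receives a positive edge.

    Wedges {+,+} and {-,-} need a positive edge to be balanced (probability beta);
    wedges {+,-} and {-,+} get a positive edge only when closed unbalanced.
    """
    same_sign = eta * eta + (1 - eta) * (1 - eta)
    mixed_sign = 2 * eta * (1 - eta)
    return beta * same_sign + (1 - beta) * mixed_sign


def solve_alpha(eta, rho, beta):
    """Solve the assumed form of Eq. (4.10) for alpha (no clipping to [0, 1])."""
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1) so that random insertion occurs")
    return (eta - rho * wedge_closing_positive_rate(eta, beta)) / (1 - rho)
```

By construction, mixing the two insertion procedures with the returned value recovers the target positive fraction: rho times the wedge-closing rate plus (1 − rho) times alpha equals eta. The sketch does not clip the result, so parameter combinations that push alpha outside [0, 1] would need separate handling.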

4.1.5 Time Complexity of BSCL

We first discuss the running time of the learning algorithm and then the time needed for the network generation process. The preprocessing needed for the learning algorithm is to determine the probability of edges being positive, $\eta$, and the probability of triangles in the network being balanced and adhering to balance theory, $Ɗ_B$, of the original network. We can determine $\eta$ trivially in $O(M)$. However, computing $Ɗ_B$ can be reduced to the complexity of triangle listing algorithms, which can be easily performed using classical methods in $O(\min(M^{3/2}, M d_{max}))$, where $d_{max}$ is the maximum degree in the network [113]. The learning process for the parameter $\rho$ is $O(I s)$, where $I$ is the number of iterations in the EM method and $s = |S|$ is the number of edges sampled for each iteration. The running time of $\alpha$ and $\beta$ can be determined as follows. We initially calculate the expected number of triangles added by each of the processes of BSCL, which takes $O(N)$ instead of $O(N^3)$ due to the approximation used and the dynamic programming approach. Then the update equations (i.e., Eqs. (4.3) and (4.10)) can both be performed in $O(1)$ time. Thus, when allowing for $I'$ maximum iterations of the alternating update process between $\alpha$ and $\beta$ (which empirically only takes a small constant number of iterations to converge), we have the overall learning time complexity for BSCL as $O(\min(M^{3/2}, M d_{max}) + 2N + I s + I')$. The generation process of BSCL is built upon the fact that the running time for TCL is shown to be $O(N + M)$ in [102]. The triangle closing process of determining the best sign selection based on the set of triangles being closed is reduced to the complexity of finding the common neighbors between two vertices, which is known to be $O(d_{max}^2)$. Thus the generation process of BSCL is $O(N + M d_{max}^2)$.


Table 4.2: Statistics of three signed social networks for generative modeling.

Network          N          (E, η)               Ɗ_B
Bitcoin-Alpha    3,784      (14,145, 0.915)      0.862
Bitcoin-OTC      5,901      (21,522, 0.867)      0.869
Epinions         131,580    (711,210, 0.830)     0.892

4.1.6 Experiments

In this section, we conduct experiments to evaluate the eﬀectiveness of the proposed signed network
model. In particular, we try to answer two questions via our experiments - (1) can the proposed
model, BSCL, eﬀectively maintain signed network properties? and (2) is the parameter learning
algorithm able to learn the appropriate parameter values from the input signed network?

For our study of signed network modeling, we utilize three signed network datasets, i.e., Bitcoin-
Alpha, Bitcoin-OTC, and Epinions. We provide more details of the datasets in Table 4.2. Note
that, since we focus on undirected and unweighted signed networks, we ignore the directions of
signed links in these datasets.

4.1.6.1 Network Generation Experiment

The first set of experiments compares the network properties of the generated networks from our model and the baselines. These properties are used as a metric to determine how well the models capture the underlying dynamics of signed networks. More specifically, we focus on three key signed network properties: (1) degree distribution; (2) positive/negative link ratio; and (3) proportion of balanced/unbalanced triangles as suggested by balance theory. Note that we also present the local clustering coefficient distribution and the triangle distribution (in relation to the edge signs in the triangles). Our results are averaged over 10 generated networks for each of the methods on each dataset.
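Two of the properties above, the positive link ratio and the proportion of balanced triangles, can be computed from a signed edge list with a short sketch. This is a hedged illustration rather than the evaluation code used in the experiments: the function name and the `(u, v, sign)` input format are assumptions, node identifiers are assumed to be mutually comparable, and the edge list is assumed to be non-empty and undirected.

```python
from itertools import combinations


def sign_statistics(signed_edges):
    """Return (positive link ratio, balanced triangle ratio) for an undirected signed graph.

    signed_edges: iterable of (u, v, sign) with sign in {+1, -1}.
    The balanced ratio is None when the graph has no triangles.
    """
    sign = {}
    neighbors = {}
    for u, v, s in signed_edges:
        sign[frozenset((u, v))] = s
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    positive_ratio = sum(1 for s in sign.values() if s > 0) / len(sign)

    balanced = total = 0
    for u, nbrs in neighbors.items():
        for v, w in combinations(sorted(nbrs), 2):
            # Count each triangle exactly once, from its smallest node u (u < v < w).
            if u < v and frozenset((v, w)) in sign:
                total += 1
                product = (sign[frozenset((u, v))]
                           * sign[frozenset((u, w))]
                           * sign[frozenset((v, w))])
                # A positive product means an even number of negative links.
                balanced += product > 0
    balanced_ratio = balanced / total if total else None
    return positive_ratio, balanced_ratio
```

The triangle enumeration here is the simple neighbor-pair approach, chosen for readability; it is not one of the faster triangle listing algorithms.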

Figure 4.1: Visualization of the degree distributions and local clustering coefficients. Panels: (a) Bitcoin-Alpha Local Clustering; (b) Bitcoin-OTC Local Clustering; (c) Epinions Local Clustering; (d) Bitcoin-Alpha Degree Dist.; (e) Bitcoin-OTC Degree Dist.; (f) Epinions Degree Dist.

The first group of two baselines are existing signed network models:

• Ants14: This method is an interaction-based model for signed networks based on using ants to lay pheromone on edges [114].

• Evo07: This method is an evolutionary model for signed networks that has a “friendliness” index that controls the probability of positive or negative links and also a parameter that controls the maximum amount of unbalance [115].

Note that for Ants14, we perform a grid search on the parameter space for its 6 parameters
according to the values reported in [114]. Similarly, for Evo07, a grid search was performed for
the two parameters.

The next two baselines are built upon two popular unsigned generative models. We first convert the network to unsigned by ignoring the link signs, run the baseline model, and then randomly assign signs to the edges such that the global sign distribution is maintained using $\eta$ (the fraction of positive links in the input network).

• STCL: This method is from the unsigned TCL model [102].

• SKron: This method is from the unsigned Kronecker Product model [99].
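The sign assignment step shared by the two modified unsigned baselines can be sketched as follows. This is an assumption-laden illustration, not the baselines' actual code: the function name is hypothetical, and the choice to fix the exact number of positive edges (rather than sampling each sign independently) is one way to keep the global sign distribution.

```python
import random


def assign_random_signs(edges, positive_fraction, seed=None):
    """Attach random signs to unsigned edges while fixing the number of positive edges.

    edges: iterable of (u, v) pairs produced by an unsigned generative model.
    positive_fraction: target fraction of positive links (eta of the input network).
    """
    rng = random.Random(seed)
    edges = list(edges)
    n_positive = round(positive_fraction * len(edges))
    signs = [1] * n_positive + [-1] * (len(edges) - n_positive)
    rng.shuffle(signs)  # random placement of the fixed number of positive signs
    return [(u, v, s) for (u, v), s in zip(edges, signs)]
```

Because the signs are placed independently of the structure, triangles in the resulting network are balanced only as often as chance allows.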


Table 4.3: Positive/negative link sign distribution.

Links Positive         Real     Ants14   Evo07    BSCL
Bitcoin-Alpha          0.915    0.741    0.917    0.912
Bitcoin-OTC            0.867    0.740    0.869    0.860
Epinions               0.830    0.930    0.830    0.808
Absolute Difference    -        0.401    0.004    0.031

Table 4.4: Proportion of triangles balanced in generated signed networks.

Percent Balanced       Real     STCL/SKron   Ants14   Evo07    BSCL
Bitcoin-Alpha          0.862    0.786        0.787    0.750    0.802
Bitcoin-OTC            0.869    0.698        0.817    0.677    0.752
Epinions               0.892    0.644        0.939    0.639    0.748
Absolute Difference    -        0.484        0.174    0.557    0.321

The results for the properties that are in common with unsigned networks (i.e., the degree distribution and the local clustering coefficient) can be seen in Figure 4.1. We see that BSCL and STCL perform nearly identically on the degree distribution, as they are both based on the Transitive Chung-Lu model and therefore can very closely maintain the degree distribution. However, the two signed network baselines, Ants14 and Evo07, perform very poorly and do not even appear to follow a power-law distribution. We mention that SKron is not able to exactly model the degree distribution, but does not perform as poorly as the two existing signed network baselines. Similarly poor results are found for the two existing signed network models on the local clustering coefficient. Our proposed model BSCL and STCL perform the best. The SKron model has some clustering, but not near that of the original input network.

In Table 4.3, we show the positive/negative link ratio, while in Table 4.4, we present the proportion of balanced/unbalanced triangles. We first compare against the two existing signed network methods, and then compare BSCL to STCL and SKron, which are the two modified unsigned network models.

In the Bitcoin-Alpha dataset, our model BSCL achieves the closest proportion of balanced triangles. In the Bitcoin-OTC dataset, Ants14 performs the best in terms of the proportion of balance in the network. We further show a fine-grained comparison by separating the four types of triads on the Bitcoin-Alpha, Bitcoin-OTC, and Epinions datasets in Table 4.5, Table 4.6, and Table 4.7, respectively. We notice that Ants14 achieves this by drastically changing the distribution among the four triangle types. However, our model performs the best overall in terms of the triangle distribution. Similarly, we notice a drastically low overall clustering in the Ants14 output networks, as seen in the local clustering coefficient plot in Figure 4.1(b). Furthermore, although the Ants14 method more closely resembles the percentage of triangles being balanced across the three signed networks (having a smaller absolute difference than BSCL), we can observe that this comes at the tradeoff of having a very inconsistent percentage of links being positive in the network as compared to the input signed network, which can be seen in the absolute difference row of Table 4.3. Note that in both of the Bitcoin datasets, our model BSCL achieves better performance than the baselines in terms of the triangle distributions, while only sacrificing < 1% in terms of matching the correct positive link percentage, $\eta$, of the input networks. The results are similar in the Epinions dataset.

Table 4.5: Distribution of signed triangle types in the Bitcoin-Alpha dataset.

Triad Type             Real     STCL/SKron   Ants14   Evo07    BSCL
{+,+,+}                0.793    0.766        0.446    0.750    0.804
{+,+,-}                0.134    0.213        0.200    0.250    0.174
{+,-,-}                0.069    0.020        0.341    0.000    0.021
{-,-,-}                0.004    0.001        0.013    0.000    0.001
Absolute Difference    -        0.158        0.694    0.232    0.102

Table 4.6: Distribution of signed triangle types in the Bitcoin-OTC dataset.

Triad Type             Real     STCL/SKron   Ants14   Evo07    BSCL
{+,+,+}                0.724    0.652        0.445    0.613    0.707
{+,+,-}                0.122    0.300        0.170    0.323    0.246
{+,-,-}                0.145    0.046        0.372    0.064    0.045
{-,-,-}                0.009    0.002        0.013    0.000    0.002
Absolute Difference    -        0.356        0.558    0.401    0.248

Table 4.7: Distribution of signed triangle types in the Epinions dataset.

Triad Type             Real     STCL/SKron   Ants14   Evo07    BSCL
{+,+,+}                0.808    0.572        0.929    0.587    0.698
{+,+,-}                0.096    0.351        0.061    0.349    0.250
{+,-,-}                0.084    0.072        0.010    0.052    0.050
{-,-,-}                0.013    0.005        0.000    0.012    0.002
Absolute Difference    -        0.511        0.243    0.507    0.309

When comparing BSCL with STCL and SKron, we mention that by design these other two models will always have the exact percentage of positive links, and their expected triangle distribution can be calculated as $\eta^3$, $3\eta^2(1-\eta)$, $3\eta(1-\eta)^2$, and $(1-\eta)^3$ for the {+,+,+}, {+,+,-}, {+,-,-}, and {-,-,-} triangle types, respectively, where $\eta$ is the fraction of positive links. We can observe that in terms of the absolute difference between the percentages in each of these triangle types, the BSCL method performs much better, having an absolute difference of only 0.248 in the Bitcoin-OTC dataset, while the straightforward modification of the unsigned network models (which maintains the positive link percentage exactly) results in an absolute difference of 0.356. Similarly, when looking at the percentage of triangles adhering to balance theory across the three signed network datasets, BSCL has an absolute difference of only 0.321 while STCL and SKron share an absolute difference of 0.484.
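The expected triangle distribution under independently assigned signs follows from the binomial expansion and can be sketched as below. The function name and dictionary keys are hypothetical, and eta (the fraction of positive links) is an assumed name for the symbol used in the text.

```python
def expected_triangle_distribution(eta):
    """Expected share of each signed triangle type when every edge is
    independently positive with probability eta."""
    q = 1 - eta
    return {
        "+++": eta ** 3,
        "++-": 3 * eta ** 2 * q,  # three positions for the single negative edge
        "+--": 3 * eta * q ** 2,  # three positions for the single positive edge
        "---": q ** 3,
    }
```

With eta = 0.915 (the Bitcoin-Alpha positive link fraction) this gives roughly 0.766, 0.213, 0.020, and 0.001, so the balanced types {+,+,+} and {+,-,-} together account for about 0.786 of the triangles.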

Overall, we can see that the Ants14 model favors capturing the percentage of triangles adhering to balance theory, but at the sacrifice of the triangle distribution (in regards to the triangle edge signs) and of the positive/negative link ratio, both of which it maintains poorly. On the other hand, the Evo07 model is able to correctly maintain the positive/negative link ratio, but struggles to maintain a reasonable percentage of balanced triangles across the three datasets. Our model BSCL overall outperforms the two previous signed network models, as seen in the tables and figures. Furthermore, BSCL outperforms STCL and SKron in terms of maintaining the percentage of balanced triangles and the triangle distribution. We note that BSCL finds this improvement in other signed network properties while only losing about 1% on average across the three datasets in maintaining the positive/negative link ratio.
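The "Absolute Difference" rows used in the comparisons above appear to be sums of per-entry absolute differences between the real and generated values. That reading is an assumption (the text does not state the formula), although it reproduces the value reported for BSCL on Bitcoin-Alpha in Table 4.5. A minimal sketch with a hypothetical function name:

```python
def absolute_difference(real, generated):
    """Sum of absolute differences between matching entries of two distributions."""
    if len(real) != len(generated):
        raise ValueError("distributions must have the same number of entries")
    return sum(abs(r - g) for r, g in zip(real, generated))
```

For the Bitcoin-Alpha triangle types, the real distribution (0.793, 0.134, 0.069, 0.004) and the BSCL distribution (0.804, 0.174, 0.021, 0.001) differ by 0.011 + 0.040 + 0.048 + 0.003 = 0.102.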

4.1.6.2 Parameter Learning Experiment

The second set of experiments is designed to test the learning algorithm we have proposed for determining appropriate parameters for BSCL. Here we first utilize natural and intuitive heuristics for setting the parameter values of $\alpha$ and $\beta$ to further evaluate the effectiveness of our parameter learning algorithm. More specifically, the value of $\eta$ (i.e., the real network's percentage of positive links) is used as a natural choice for the value of $\alpha$ (i.e., the percentage of the time a randomly inserted edge is positive) when not utilizing the proposed learning algorithm described in Section 4.1.4 for BSCL. Similarly, the value $Ɗ_B$ (i.e., the real network's percentage of triangles that are balanced) can be chosen as the value for $\beta$ (i.e., the percentage of time we explicitly close a wedge into a balanced triangle) if we were to not use our proposed learning algorithm. Table 4.8 contains the performance comparison (in relation to the link sign distribution, proportion of triangles balanced, and distribution of triangle types) between the values selected by the proposed parameter learning algorithm and the above-mentioned heuristically picked values for $\alpha$ and $\beta$. Note that these absolute difference values are averaged across the three real-world signed networks. We can observe that in all three properties our parameter learning algorithm outperforms the most natural heuristically picked values (0.031 versus 0.040, 0.107 versus 0.164, and 0.220 versus 0.351), providing further evidence that our model can learn parameters to more accurately generate synthetic variants of the real input signed network.

Table 4.8: Absolute difference from the generated networks to the real signed networks averaged over the three datasets for each respective property.

Method                      Sign Distribution    Proportion Balanced Triangles    Distribution Triangle Types
BSCL (learning α & β)       0.031                0.107                            0.220
BSCL (α = η & β = Ɗ_B)      0.040                0.164                            0.351

For a more detailed analysis, we also perform a grid search across a reasonable area of the parameter space for $\alpha$ and $\beta$ to obtain optimal parameters. Then we compare the performance of the learned parameters and the searched optimal parameters to demonstrate the ability of the proposed parameter learning algorithm. We only present the results in Figure 4.2 in terms of the percentage of balanced triangles and the positive/negative link ratio for the Bitcoin-Alpha dataset, since we have similar observations for Bitcoin-OTC and Epinions with other settings. Note that the z-axis is the absolute difference from the true input network's value (where lower is better). The “stars” in the figures mark the coordinates along the x- and y-axes of the learned parameters. The figures indicate that our parameter learning algorithm is able to find appropriate parameters for the input network.

Figure 4.2: BSCL parameter learning analysis. Panels: (a) Bitcoin-Alpha % balanced triangles results; (b) Bitcoin-Alpha positive/negative ratio results.

4.2 Balance in Signed Bipartite Networks

Signed bipartite networks have begun to appear more commonly, especially online. For example, many online rating systems, such as Netflix and YouTube, adopt “thumbs-up” or “thumbs-down” ratings that can be formulated as signed bipartite networks. Another real-world example is from the political science domain: the United States Congress is inherently a signed bipartite network formed from the representatives and the bills they have voted on (where the “Yea” and “Nay” votes can be represented as positive and negative links, respectively) [116, 117], which we present in more detail as a specific application later in Section 6.2.

Although there have been works focused on unsigned bipartite networks, these methods lack the capability to handle the further complexities of negative links. Similarly, methods developed for unipartite signed networks might not be applicable when there are two node types or the possible connections in the network are limited. For example, a fundamental theory that explains the social phenomena of the link structure in signed network analysis is balance theory [29, 30]. It suggests that a cycle in signed networks with an even number of negative links is balanced, which is typically stated as “a friend of my friend is my friend” while an “enemy of my friend is my enemy”. In unipartite signed networks, balance theory has been extensively applied on signed triangles (i.e., the smallest undirected cycle) across various real-world networks to obtain better performance across modeling [115, 114, 118], measuring [51, 119, 120, 65], and mining applications [23, 61, 53, 121]. However, in signed bipartite networks it is fundamentally impossible to have any triangles, since links only connect nodes of the two different types. Therefore it is important to understand balance theory in signed bipartite networks and its potential to enhance applications, due to the prevalence of signed bipartite networks. Thus, dedicated efforts are desired for signed bipartite networks in addition to unipartite signed networks and unsigned bipartite networks. Here we present a comprehensive analysis and validation of balance theory using signed butterflies and then show their effectiveness in improving the performance of link sign prediction in signed bipartite networks. Furthermore, the empirical findings in sign prediction pave the way for improvements in other signed bipartite network analysis tasks.

Table 4.9: Notations regarding signed bipartite networks.

Notations    Descriptions
B            Undirected signed biadjacency matrix
P_B          Adjacency matrix for the U_B-projection network
P_S          Adjacency matrix for the U_S-projection network
A            Adjacency matrix constructed from B, P_B, and P_S
U            Low-dimensional representation of nodes in U_B
V            Low-dimensional representation of nodes in U_S

4.2.1 Balance Theory in Signed Bipartite Networks

In this section, we introduce the signed bipartite network datasets we have collected for this study. Thereafter we discuss balance theory from a general signed network perspective, validate its applicability in signed bipartite networks, and perform a preliminary analysis on our datasets; but first, we introduce the definitions and notations.

Consider an undirected signed bipartite network, $G = (U_B, U_S, E^+, E^-)$, where $U_B = \{b_1, b_2, \ldots, b_{n_B}\}$ and $U_S = \{s_1, s_2, \ldots, s_{n_S}\}$ represent two mutually exclusive sets of homogeneous nodes with $n_B$ and $n_S$ representing the number of nodes for each set, respectively. $E^+ \subset U_B \times U_S$ and $E^- \subset U_B \times U_S$ represent the sets of positive and negative edges, respectively, between the two sets of nodes $U_B$ and $U_S$. We let $E = E^+ \cup E^-$ be the set of all edges where $E^+ \cap E^- = \emptyset$; in other words, two nodes cannot have both a positive and negative edge between them. We use $B \in \mathbb{R}^{n_B \times n_S}$ to represent the undirected signed bipartite biadjacency matrix of $G$, where $B_{ij} = 1$, $-1$, or $0$ when there exists a positive, negative, or no link between $b_i$ and $s_j$, respectively. We further summarize the major notations used throughout this section in Table 4.9.

Table 4.10: Statistics on signed bipartite networks.

                       Bonanza          U.S. Senate    U.S. House
n_B = |U_B|            7,919            1,056          1,281
n_S = |U_S|            1,973            145            515
|E| = |E+| + |E−|      36,543           27,083         114,378
% Links Positive       97.98%           55.31%         53.96%
% Links Negative       2.02%            44.69%         46.04%
Density of B           2.339 × 10^−3    0.1769         0.1734
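The biadjacency matrix defined above can be built from the two edge sets with a few lines of code. This is a small sketch under stated assumptions: the function names are hypothetical, node indices are 0-based, the matrix is a plain list of lists, and density is taken as the number of links divided by the number of possible buyer-seller pairs (a reading that matches the densities reported for the three datasets, but is not stated explicitly in the text).

```python
def build_biadjacency(n_b, n_s, positive_edges, negative_edges):
    """Return the n_b x n_s signed biadjacency matrix with entries in {1, -1, 0}.

    positive_edges / negative_edges: iterables of (i, j) index pairs, where i
    indexes the first node set and j indexes the second node set.
    """
    B = [[0] * n_s for _ in range(n_b)]
    for i, j in positive_edges:
        B[i][j] = 1
    for i, j in negative_edges:
        if B[i][j] == 1:
            # The positive and negative edge sets must be disjoint.
            raise ValueError(f"pair ({i}, {j}) is both positive and negative")
        B[i][j] = -1
    return B


def density(B):
    """Fraction of possible pairs that carry a link of either sign."""
    n_pairs = len(B) * len(B[0])
    n_links = sum(1 for row in B for entry in row if entry != 0)
    return n_links / n_pairs
```

As a check on the density reading, 36,543 links over 7,919 × 1,973 possible pairs gives about 2.339 × 10^−3, the value listed for Bonanza.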

4.2.1.1 Signed Bipartite Network Datasets

We have collected three signed bipartite networks for this study. The first signed bipartite network is from the e-commerce website Bonanza⁴. Bonanza is similar to eBay⁵ and Amazon Marketplace⁶ in that users create an account with which they can buy or sell various goods. After a buyer purchases a product from a seller, both are able to provide a rating about the other along with a short comment. At the time of collection, Bonanza was using a rating scale of “Positive”, “Neutral”, and “Negative” to rate another user after a transaction. For representing the buyers and sellers, we use $U_B$ and $U_S$, respectively.

The next two datasets represent the roll call votes combined from the 1st to 10th United States Congress. More specifically, we collected two separate datasets⁷: one for the U.S. Senate and the other for the U.S. House of Representatives (which we will refer to as U.S. House). In each of these datasets we represent the bills that were voted on by the set $U_B$ and the senators or representatives by $U_S$. If a congressperson voted “Yea” or “Nay” for the bill, then we represent these as positive or negative links between them, respectively, and leave the connection missing otherwise.

⁴ http://www.bonanza.com
⁵ http://www.ebay.com
⁶ http://www.amazon.com
⁷ https://www.govtrack.us/data/

Note that for simplicity throughout the rest of this section we will refer to the nodes in $U_B$ as “buyers” and those in $U_S$ as “sellers”. In Table 4.10 we report some basic statistics of our three collected datasets. We note that in the Bonanza dataset there is a significant imbalance between the number of positive and negative links as compared to the two U.S. Congress datasets. Although these datasets represent vastly different real-world social structures, we next investigate extending balance theory [30, 29] to the signed bipartite network setting.

4.2.1.2 Signed Butterflies in Signed Bipartite Networks

In signed networks one of the most fundamentally studied social theories is balance theory [30, 29], which discusses the settings in signed networks that are socially “balanced” (i.e., stable), and those that are more likely to change (to become balanced) due to the social tensions involved in maintaining “unbalanced” and seemingly unnatural connections. In recent signed network analysis works, balance theory is usually investigated and then applied towards many tasks [74, 2, 55], but almost always in the form of triangles (or cycles of length 3) in a unipartite signed network. As seen in Figure 2.2, there are four possible configurations between the three nodes. We can further observe in Figure 2.2 that triangles (a) and (b) are balanced (due to having an even number of negative links), while (c) and (d) are unbalanced. Nevertheless, as previously mentioned, since there are no triangles in signed bipartite networks and they have two different node types, it is unknown whether balance theory is still applicable in a bipartite setting.

Next, we will therefore introduce how we plan to extend the usage of balance theory to the smallest signed cycles (i.e., butterflies) in undirected signed bipartite networks. Thereafter we investigate and present our initial analysis of these signed butterflies in three real-world signed bipartite networks.

Figure 4.3: Undirected signed butterfly isomorphism classes.

4.2.1.3 Signed Butterﬂy Isomorphism Classes

In unsigned bipartite networks, one commonly investigated structure is that of a “butterfly” [122, 123], which is a cycle of length 4. More formally, a butterfly is the simplest cohesive higher-order structure and is also a complete 2×2 biclique. Thus, this provides the most natural structure to investigate as a possible extension for balance theory in signed bipartite networks.

Just as there are different types of signed triangles, there are different types of signed butterflies. In Figure 4.3 we present the 7 non-isomorphic undirected signed butterflies. Note that there are five that adhere to balance theory while only two are categorized as unbalanced. We use the notation (∗,∗,∗,∗) to denote a signed butterfly isomorphism class, where the four entries are the signs of the four links between the buyers and sellers, listed in order around the cycle (with the last sign closing the cycle back to the starting node). The simplest of types are (+,+,+,+) and (−,−,−,−), which denote the classes having all positive or all negative links, respectively, and both are balanced due to having an even number of negative links (and can be seen in Figures 4.3(A) and 4.3(E), respectively). We can interpret the (+,+,+,+) class as the situations where two buyers have bought from the same two sellers and the sentiment amongst them across the four purchases was positive. Next, we have (+,+,+,−) and (+,−,−,−), which are the two unbalanced classes of signed butterflies (since they have an odd number of negative links). In Figure 4.3(F) we have the signed butterfly isomorphism class that encompasses all the signed butterflies with a single negative link. We can observe that no matter where this single negative link is placed, we always have one buyer with two positive links, one buyer with a positive and a negative link, and a similar structure for the two sellers. The isomorphism class (+,−,−,−) can be seen as the complement (if defined as swapping link signs in a signed network) of the class (+,+,+,−) and is defined in a similar way, but with the positive and negative links swapped in the definition. This leaves the signed butterflies having two positive and two negative links, of which we have three isomorphism classes. In Figure 4.3(D) the class (+,−,+,−) is used to represent signed butterflies where all buyers and sellers have one positive and one negative link in their cycle. When one of the buyers has two positive links, while the other buyer has two negative links, we observe in Figure 4.3(B) that both sellers have a single positive and single negative link, and define the isomorphism class (+,−,−,+). Finally, the last type of signed butterfly has both buyers connected positively to one seller, and negatively to the other, which we represent as the class (+,+,−,−) shown in Figure 4.3(C).
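The seven classes described above can be told apart by counting negative links and, for the two-negative case, checking whether the negatives sit on one buyer, one seller, or on opposite links. The following sketch is an illustration only: the function names and the 2×2 input layout (`signs[b][s]` for buyer b and seller s) are assumptions introduced for the example.

```python
from itertools import product


def classify_butterfly(signs):
    """Return (isomorphism class label, is_balanced) for a signed butterfly.

    signs: 2x2 nested sequence, signs[b][s] in {+1, -1} is the sign of the
    link between buyer b and seller s.
    """
    (a, b), (c, d) = signs
    flat = (a, b, c, d)
    if any(x not in (1, -1) for x in flat):
        raise ValueError("every link sign must be +1 or -1")
    negatives = flat.count(-1)
    balanced = negatives % 2 == 0  # even number of negative links
    if negatives == 0:
        label = "(+,+,+,+)"
    elif negatives == 4:
        label = "(-,-,-,-)"
    elif negatives == 1:
        label = "(+,+,+,-)"
    elif negatives == 3:
        label = "(+,-,-,-)"
    elif a == b:
        # One buyer has two positive links, the other two negative links.
        label = "(+,-,-,+)"
    elif a == c:
        # Both buyers are positive to one seller and negative to the other.
        label = "(+,+,-,-)"
    else:
        # Every buyer and seller has one positive and one negative link.
        label = "(+,-,+,-)"
    return label, balanced


def class_sizes():
    """Count how many of the 16 sign assignments fall into each class."""
    sizes = {}
    for a, b, c, d in product((1, -1), repeat=4):
        label, _ = classify_butterfly(((a, b), (c, d)))
        sizes[label] = sizes.get(label, 0) + 1
    return sizes
```

Enumerating all 16 sign assignments with `class_sizes()` shows how many labeled configurations each isomorphism class covers, which is the multiplicity needed when computing expected class frequencies under random signs.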

4.2.1.4 Signed Butterﬂy Analysis

In Table 4.11 we report our analysis after counting the number of signed butterflies for each isomorphism class as shown in Figure 4.3. We further calculated the percentage each isomorphism class takes up of the total signed butterfly count in each dataset (given in column “%”). Next, we analyzed the significance of these signed butterflies being found in signed bipartite networks and tested whether they are overrepresented or underrepresented. Recall that balance theory would suggest that the balanced isomorphism classes (A) through (E) should appear frequently, while (F) and (G) (being unbalanced) should appear less frequently. To quantify this, extending the approach taken in [2], we calculate “Expected %” as the expected percentage of total signed butterflies to fall into the given isomorphism class when randomly reassigning the positive and negative signs in the signed bipartite network. For example, “Expected %” for the isomorphism class (+,−,−,−) is calculated by

$$\binom{4}{1}\left(\frac{|E^+|}{|E|}\right)\left(\frac{|E^-|}{|E|}\right)^{3},$$

since there are 4 permutations of having a single positive link in a signed butterfly in class (+,−,−,−), and with randomly assigned link signs the probability of each such permutation is the product of the independent probabilities of having a single positive link (i.e., |E+|/|E|) and three negative links (i.e., |E−|/|E| each). Finally, the column “Std. devs.” denotes the number of standard deviations the actual count differs from our calculated expected number (based on “Expected %”) for each signed butterfly type and, just as in [2], a positive (or negative) value signifies appearing significantly more (or less) than expected.

Table 4.11: Signed butterfly statistics on signed bipartite networks.

Bonanza
Isomorphism Class    Count         %          Expected %    Std. devs.
(A) (+,+,+,+)        2554388       0.986      0.922         386
(B) (+,−,−,+)        3830          0.001      7.8e-04       40
(C) (+,+,−,−)        726           2.8e-04    7.8e-04       -29
(D) (+,−,+,−)        456           1.7e-04    7.8e-04       -35
(E) (−,−,−,−)        20            7.7e-06    1.7e-07       30
Balanced             2559420       0.988      0.924         -
(F) (+,+,+,−)        30685         0.012      0.076         -390
(G) (+,−,−,−)        100           3.9e-05    3.2e-05       2
Unbalanced           30785         0.012      0.076         -

U.S. Senate
Isomorphism Class    Count         %          Expected %    Std. devs.
(A) (+,+,+,+)        13404168      0.262      0.094         4142
(B) (+,−,−,+)        5595440       0.110      0.122         -277
(C) (+,+,−,−)        9404006       0.184      0.122         1349
(D) (+,−,+,−)        5537080       0.108      0.122         -302
(E) (−,−,−,−)        6815324       0.133      0.040         3414
Balanced             40756018      0.797      0.500         -
(F) (+,+,+,−)        6225745       0.122      0.302         -2811
(G) (+,−,−,−)        4118075       0.081      0.197         -2099
Unbalanced           10343820      0.203      0.500         -

U.S. House of Representatives
Isomorphism Class    Count         %          Expected %    Std. devs.
(A) (+,+,+,+)        227660420     0.244      0.085         17459
(B) (+,−,−,+)        103731010     0.111      0.123         -1137
(C) (+,+,−,−)        173875858     0.186      0.123         5843
(D) (+,−,+,−)        101409932     0.109      0.123         -1368
(E) (−,−,−,−)        137478104     0.147      0.045         15104
Balanced             744155324     0.797      0.500         -
(F) (+,+,+,−)        109763190     0.118      0.289         -11565
(G) (+,−,−,−)        79053742      0.085      0.210         -9430
Unbalanced           188816932     0.203      0.500         -
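The expected class percentages under randomly reassigned signs can be sketched for all seven classes. Only the (+,−,−,−) formula is given in the text; the remaining formulas below are derived under the same independence assumption (each of the three two-negative classes covers two of the six possible placements), so they should be read as an assumption rather than the author's stated definitions. The function name is hypothetical.

```python
def expected_butterfly_percentages(positive_fraction):
    """Expected share of each signed butterfly class when each link is
    independently positive with probability positive_fraction = |E+|/|E|."""
    p = positive_fraction
    q = 1 - p
    two_negative = 2 * p ** 2 * q ** 2  # 2 of the 6 placements per class
    return {
        "(+,+,+,+)": p ** 4,
        "(+,-,-,+)": two_negative,
        "(+,+,-,-)": two_negative,
        "(+,-,+,-)": two_negative,
        "(-,-,-,-)": q ** 4,
        "(+,+,+,-)": 4 * p ** 3 * q,  # 4 positions for the single negative link
        "(+,-,-,-)": 4 * p * q ** 3,  # 4 positions for the single positive link
    }
```

With the U.S. Senate positive link fraction of 0.5531, this yields about 0.094 for (+,+,+,+), 0.122 for each two-negative class, 0.040 for (−,−,−,−), 0.302 for (+,+,+,−), and 0.197 for (+,−,−,−), and the seven shares sum to one.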

We first observe that the large majority of signed butterflies in our three signed bipartite networks are indeed balanced. Furthermore, they are significantly more balanced than expected based on the link sign ratio in the given network (i.e., comparing the observed percentage with the expected percentage; for example, 79.7% observed versus 50.0% expected in both U.S. Congress datasets). The second observation is that all unbalanced signed butterflies across the three datasets are significantly underrepresented, except for the (+,−,−,−) butterflies in Bonanza, which show a minimal overrepresentation. Similarly, across all datasets the (+,+,+,+) and (−,−,−,−) signed butterflies are significantly overrepresented, further strengthening the applicability of balance theory in signed bipartite networks. However, the isomorphism classes involving two positive and two negative links are not always overrepresented. For example, the class where all buyers and sellers have one positive and one negative link, i.e., (+,−,+,−), is less commonly found than expected across all three datasets.

In summary, our findings suggest that: 1) we can use signed butterflies to extend balance theory for signed bipartite networks; and 2) signed bipartite networks adhere to balance theory when defined in terms of signed butterflies, thus making them applicable to advance numerous tasks in signed bipartite networks.

4.2.1.5 Signed Caterpillars in Bipartite Networks

We define a “signed caterpillar” as a path of length 3 that is missing just one link to become
a signed butterfly. Therefore, a signed caterpillar can take on one of eight different forms, since
it is composed of three links, each being either positive or negative. Note that all caterpillar types have
the potential to be transformed into a signed butterfly (i.e., closed into a cycle of length 4) that is
either balanced or unbalanced. If a signed caterpillar contains an even number of negative links,
we refer to it as a “balanced path”: a positive (or negative) closing link transforms it into a
balanced (or unbalanced) signed butterfly, so balance theory would suggest the positive link. Similarly,
we define a signed caterpillar as an “unbalanced path” when it has an odd number of negative links:
a negative (or positive) link closes it into a balanced (or unbalanced) signed butterfly, so balance
theory would suggest the negative link.
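The parity rule above can be sketched in a few lines. This is only an illustration: the function names and the encoding of link signs as +1/-1 are assumptions made here, not notation from the text.

```python
from itertools import product


def closing_sign(caterpillar):
    """Sign (+1 or -1) that balance theory suggests for closing a signed caterpillar."""
    if len(caterpillar) != 3 or any(sign not in (1, -1) for sign in caterpillar):
        raise ValueError("expected three link signs, each +1 or -1")
    negatives = sum(1 for sign in caterpillar if sign == -1)
    # Even number of negative links: balanced path, so a positive link closes it
    # into a balanced butterfly. Odd number: unbalanced path, so a negative link.
    return 1 if negatives % 2 == 0 else -1


def is_balanced_butterfly(signs):
    """A signed butterfly (cycle of length 4) is balanced with an even number of negative links."""
    return sum(1 for sign in signs if sign == -1) % 2 == 0


# The eight possible signed caterpillars (hypothetical encoding as sign triples).
ALL_CATERPILLARS = list(product((1, -1), repeat=3))
```

Closing any of the eight caterpillars with `closing_sign` yields a balanced butterfly, and closing it with the opposite sign yields an unbalanced one.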

4.2.2 Sign Prediction for Signed Bipartite Networks

With the aforementioned deﬁnitions and notations, we formally deﬁne the problem of sign prediction
in undirected signed bipartite networks as the following:

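Given an undirected signed bipartite network $G = (\mathcal{U}_B, \mathcal{U}_S, E^{+}, E^{-})$ with buyers
$\mathcal{U}_B$ and sellers $\mathcal{U}_S$, represented as a biadjacency matrix
$\mathbf{B} \in \mathbb{R}^{|\mathcal{U}_B| \times |\mathcal{U}_S|}$, we seek to predict the signs of the unlinked pairs
$(b_i, s_j) \in \{\mathcal{U}_B \times \mathcal{U}_S\} \setminus \{E^{+} \cup E^{-}\}$.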

Sign prediction in signed networks has been previously studied [61, 124, 125, 126, 4]. However,
in the signed bipartite setting, many of these methods are no longer applicable, since there are no
triangles. In Section 4.2.1.2, we validated that the large majority of signed butterﬂies in signed
bipartite networks are balanced. Methods for predicting link signs in unipartite signed networks
can be categorized into three main groups: 1) supervised methods; 2) low-rank approximation
methods; and 3) propagation based methods. Therefore, we develop a representative sign prediction
method specific to signed bipartite networks from each group. More specifically, we propose: 1)
a supervised classification method that uses signed caterpillars/butterflies; 2) a low-rank
modeling method extended to ensure the predicted signs favor creating more balanced signed butterflies; and
3) a random walk based approach that integrates one-mode projection networks for $\mathcal{U}_B$ and $\mathcal{U}_S$
constructed using balance theory.


4.2.2.1 Signed Caterpillars Based Classiﬁer

One common approach towards predicting links or link signs in both signed and unsigned networks
is to frame the task in terms of a supervised classiﬁcation problem [4, 61, 127, 128]. Here we
extend the idea to the signed bipartite setting by formulating the problem of predicting the sign
between a buyer $b_i$ and a seller $s_j$ using features extracted either from the individual nodes (i.e., their
positive and negative degrees) or from their local neighborhood based on balance theory (i.e., signed
caterpillars).

To train our model we construct a training dataset consisting of known signed links (between
a buyer and seller). Then, after having a trained model, we can extrapolate what we learned
from the training data to predict a positive or negative sign for an unknown buyer and seller pair.
More specifically, we use a logistic regression model, following the work on sign prediction in
directed signed unipartite networks in [4].

Feature Extraction. The two different sets of features we evaluate are based either on the two
nodes' degrees or on how many signed caterpillars they are the two
endpoints of (i.e., they would be the buyer and seller connection transforming the signed caterpillar
to a balanced or unbalanced signed butterfly). Thus, the feature vector $\mathbf{x}^{d}_{ij}$ for the pair $(b_i, s_j)$
includes the positive and negative degrees for both $b_i$ and $s_j$. In comparison, $\mathbf{x}^{sc}_{ij}$ contains
the counts for each of the 8 possible signed caterpillars that have $b_i$ and $s_j$ as the endpoints.
The expectation is that the features $\mathbf{x}^{sc}_{ij}$ will be more informative than those of $\mathbf{x}^{d}_{ij}$ because they
would provide a vast amount of information as to whether their link sign is likely to be positive or
negative according to balance theory when considering the types of signed butterflies that would
be constructed. This is in comparison to only using the degrees in a method similar in nature to
a signed preferential attachment model with $\mathbf{x}^{d}_{ij}$. We denote the supervised classifiers that use
$\mathbf{x}^{d}_{ij}$ and $\mathbf{x}^{sc}_{ij}$ as SCd and SCsc, respectively.
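As a rough illustration of the two feature sets described above, the sketch below computes the degree features and the eight signed caterpillar counts for one buyer and seller pair. The data layout (a dictionary mapping `(buyer, seller)` to +1/-1) and all function names are assumptions made for this example; the ordering of the eight caterpillar types is likewise arbitrary.

```python
from collections import defaultdict
from itertools import product

# Fixed (hypothetical) ordering of the eight caterpillar types as sign triples:
# (buyer--seller', seller'--buyer', buyer'--seller).
CATERPILLAR_TYPES = list(product((1, -1), repeat=3))


def index_links(links):
    """Build buyer -> {seller: sign} and seller -> {buyer: sign} lookups."""
    by_buyer, by_seller = defaultdict(dict), defaultdict(dict)
    for (buyer, seller), sign in links.items():
        by_buyer[buyer][seller] = sign
        by_seller[seller][buyer] = sign
    return dict(by_buyer), dict(by_seller)


def degree_features(by_buyer, by_seller, buyer, seller):
    """Positive and negative degrees of the buyer, then of the seller."""
    def pos_neg(signs):
        signs = list(signs)
        return [signs.count(1), signs.count(-1)]
    return pos_neg(by_buyer.get(buyer, {}).values()) + pos_neg(by_seller.get(seller, {}).values())


def caterpillar_features(by_buyer, by_seller, buyer, seller):
    """Counts of each signed caterpillar type having (buyer, seller) as endpoints."""
    counts = dict.fromkeys(CATERPILLAR_TYPES, 0)
    for mid_seller, sign1 in by_buyer.get(buyer, {}).items():
        if mid_seller == seller:
            continue
        for mid_buyer, sign2 in by_seller[mid_seller].items():
            if mid_buyer == buyer:
                continue
            sign3 = by_buyer[mid_buyer].get(seller)
            if sign3 is not None:  # path buyer - mid_seller - mid_buyer - seller exists
                counts[(sign1, sign2, sign3)] += 1
    return [counts[t] for t in CATERPILLAR_TYPES]
```

Either feature list could then be passed to any logistic regression implementation; the classifier itself is omitted here.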


4.2.2.2 Low-Rank Sign Prediction

In recent years the low-rank matrix factorization approaches have been gaining popularity for
numerous applications involving link related network predictions [129, 130, 131]. Although some
of these works have focused on signed networks [131, 132, 130], none are structured to select link
signs that would explicitly push towards more signed butterﬂies being balanced in signed bipartite
networks. Thus, we ﬁrst introduce a basic matrix factorization approach to model the signed
bipartite network using the biadjacency matrix B. Then we introduce how we can successfully
modify this model through the inclusion of additional pairs of buyers and sellers derived from
suggested implicit signed links that would construct the most balanced signed butterﬂies with the
suggested link sign.

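Basic Matrix Factorization Model: The existing edges in $\mathbf{B}$ are denoted by the set
$E = \{(b_i, s_j) \mid \mathbf{B}_{ij} \neq 0\}$. In terms of the link sign prediction task we would like to discover two latent
matrices $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{n_B}] \in \mathbb{R}^{d \times n_B}$ and
$\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{n_S}] \in \mathbb{R}^{d \times n_S}$ of dimension $d$ for
the sets of buyers and sellers, respectively (with $n_B = |\mathcal{U}_B|$ and $n_S = |\mathcal{U}_S|$), to solve the
following optimization problem:

\[
\min_{\mathbf{U},\mathbf{V}} \sum_{(b_i, s_j) \in E} \max\left(0,\; 1 - \mathbf{B}_{ij}\,(\mathbf{u}_i^{\top}\mathbf{v}_j)\right)^{2}
+ \lambda\left(\|\mathbf{U}\|_F^{2} + \|\mathbf{V}\|_F^{2}\right)
\tag{4.11}
\]

where $\mathbf{u}_i^{\top}\mathbf{v}_j$ is used to model the link sign between buyer $b_i$ and seller $s_j$, and $\lambda$ weights
the regularization term. Note that when the real
link sign (i.e., $\mathbf{B}_{ij}$) and the predicted link sign (i.e., $\mathbf{u}_i^{\top}\mathbf{v}_j$) are of the same sign (i.e., both positive
or both negative) then $\mathbf{B}_{ij}(\mathbf{u}_i^{\top}\mathbf{v}_j)$ is positive, and if it is at least 1 then there is no loss. However, when
the real and predicted values have differing signs then there is a higher loss value associated to
drive the minimization during the training process. Following the work in [131] we use Stochastic
Gradient Descent (SGD) to minimize the objective in Eq. (4.11).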
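A minimal sketch of minimizing the squared hinge objective of Eq. (4.11) with SGD is given below, in plain Python for readability. The hyperparameter values, the initialization, and the choice to apply the regularization gradient once per sampled link are assumptions of this example rather than settings reported in the text; buyers and sellers are indexed by integers, and each latent vector is stored as a row.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(U, V, links, lam):
    """Squared hinge loss over observed links plus squared Frobenius-norm regularization."""
    hinge = sum(max(0.0, 1.0 - sign * dot(U[i], V[j])) ** 2 for (i, j), sign in links.items())
    reg = sum(x * x for row in U for x in row) + sum(x * x for row in V for x in row)
    return hinge + lam * reg


def train_mf(links, n_buyers, n_sellers, dim=4, lam=0.01, lr=0.05, epochs=300, seed=0):
    """links maps (buyer index, seller index) to +1 or -1."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_buyers)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_sellers)]
    pairs = sorted(links.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (i, j), sign in pairs:
            # The hinge is inactive (zero gradient) once sign * score reaches 1.
            margin = max(0.0, 1.0 - sign * dot(U[i], V[j]))
            for k in range(dim):
                u_k, v_k = U[i][k], V[j][k]
                U[i][k] -= lr * (-2.0 * margin * sign * v_k + 2.0 * lam * u_k)
                V[j][k] -= lr * (-2.0 * margin * sign * u_k + 2.0 * lam * v_k)
    return U, V
```

The predicted sign for an unlinked pair is then the sign of `dot(U[i], V[j])`.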

This allows us to then utilize the learned low-dimensional representations for each buyer and
seller to predict the sign of unknown buyer and seller pairs. However, although this model
effectively learns representations that can accurately predict the existing links, it does not explicitly
control whether the signs predicted for non-existing links adhere
to balance theory (i.e., result in more signed butterflies being balanced than unbalanced). We
denote this method simply as MF. Next, we will present an extension to this basic framework to
further encourage the signed butterflies closed by the missing links to be balanced.

Matrix Factorization with Balance Theory: As previously discussed, the aforementioned
basic matrix factorization approach given in Eq. (4.11) does not explicitly enforce the non-existing
link signs to favor balanced relationships. Instead it can only focus on learning low-dimensional
representations for each buyer and seller such that the model minimizes the error on predicting the
existing link signs. The approach we have selected is to further encourage the model to learn link
signs for buyer and seller pairs that currently do not exist in the signed bipartite network, but would
convert many signed caterpillars into balanced signed butterflies if they were to exist.

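The first step is calculating whether balance theory would suggest a positive or negative link
for each buyer and seller pair $(b_i, s_j)$ that currently does not have a link between them, based on the
types of signed caterpillars that they are jointly the endpoints of.

Theorem 3. Given a signed undirected biadjacency matrix $\mathbf{B}$, the matrix
$\hat{\mathbf{S}} = \mathbf{B}\mathbf{B}^{\top}\mathbf{B} \odot \bar{\mathbf{B}}$ is
such that $\mathrm{sign}(\hat{\mathbf{S}}_{ij})$ suggests the sign of a non-existent link in $\mathbf{B}$ that would result in a net gain
of $|\hat{\mathbf{S}}_{ij}|$ additional balanced signed butterflies created (after subtracting the number of potential
unbalanced signed butterflies created simultaneously) if the suggested signed link were to be added
between $b_i$ and $s_j$, where we define $\bar{\mathbf{B}}$ as $\bar{\mathbf{B}}_{ij} = 0$ if $\mathbf{B}_{ij} \neq 0$ and
$\bar{\mathbf{B}}_{ij} = 1$ when $\mathbf{B}_{ij} = 0$.

Proof. Let

\[
\mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{B} \\ \mathbf{B}^{\top} & \mathbf{0} \end{bmatrix}
\]

be the adjacency matrix in $\mathbb{R}^{|\mathcal{U}| \times |\mathcal{U}|}$, where $\mathcal{U} = \mathcal{U}_B \cup \mathcal{U}_S$.
We can observe that

\[
\mathbf{A}^{3} = \begin{bmatrix} \mathbf{0} & \mathbf{B}\mathbf{B}^{\top}\mathbf{B} \\ \mathbf{B}^{\top}\mathbf{B}\mathbf{B}^{\top} & \mathbf{0} \end{bmatrix}.
\]

We note that in [133] it has been shown that
$\mathbf{A}^{l} = \mathbf{M}^{(l)}_{\mathrm{bal}} - \mathbf{M}^{(l)}_{\mathrm{unb}}$, where
$\mathbf{M}^{(l)}_{\mathrm{bal}}, \mathbf{M}^{(l)}_{\mathrm{unb}} \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{U}|}$
store the number of balanced and unbalanced paths of length $l$, respectively, between all
pairs of nodes in a signed network represented as $\mathbf{A}$. Thus, since
$[\mathbf{A}^{3}]_{ij'} = [\mathbf{M}^{(3)}_{\mathrm{bal}} - \mathbf{M}^{(3)}_{\mathrm{unb}}]_{ij'} = [\mathbf{B}\mathbf{B}^{\top}\mathbf{B}]_{ij}$
for some buyer $b_i$ and seller $s_j$ (where $j' = n_B + j$ indexes $s_j$ in $\mathbf{A}$), we observe that this
represents the number of balanced paths of length 3 subtracted
by the number of unbalanced paths of length 3. By definition of a signed caterpillar, if one is a
balanced path, then it would suggest a positive link to close to be a balanced signed butterfly, but
if it was formed by an unbalanced path it would require the closing link to be negative to form a
balanced butterfly. Therefore, it follows that
$\mathrm{sign}([\mathbf{B}\mathbf{B}^{\top}\mathbf{B}]_{ij}) = \mathrm{sign}([\mathbf{M}^{(3)}_{\mathrm{bal}} - \mathbf{M}^{(3)}_{\mathrm{unb}}]_{ij'})$
indeed represents
the sign that would promote the creation of more balanced signed butterflies, and similarly for
the net gain of balanced butterflies being formed equaling the absolute value of their difference
(i.e., $|[\mathbf{M}^{(3)}_{\mathrm{bal}} - \mathbf{M}^{(3)}_{\mathrm{unb}}]_{ij'}|$). It is then easy to extend to only the
buyer and seller pairs $b_i$ and $s_j$ in $[\mathbf{B}\mathbf{B}^{\top}\mathbf{B}] \odot \bar{\mathbf{B}}$
after taking the element-wise product with $\bar{\mathbf{B}}$ that zeros out the pairs that have an existing link. $\square$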
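Note that $\hat{\mathbf{S}}$ can also be calculated (sometimes more efficiently) using the following:

\[
\hat{\mathbf{S}}_{ij} =
\begin{cases}
[\mathbf{B}\mathbf{B}^{\top}\mathbf{B}]_{ij} & \text{if } \mathbf{B}_{ij} = 0 \\
0 & \text{otherwise}
\end{cases}
\]

to avoid using the potentially very dense matrix $\bar{\mathbf{B}}$ for sparse signed bipartite networks.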
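For a sparse network, the same quantity can be obtained by enumerating signed caterpillars directly instead of forming any dense matrix. The sketch below takes that route; the dictionary layout and the function name are assumptions of this example. It relies on the fact that, for an unlinked buyer and seller pair, every walk of length 3 between them is a proper caterpillar, so summing the sign products over those walks gives the same entries as the masked matrix product.

```python
from collections import defaultdict


def suggested_signs(links):
    """Map each unlinked (buyer, seller) pair to its nonzero net caterpillar count.

    links maps (buyer, seller) to +1 or -1. A positive value suggests a positive
    link, a negative value a negative link, and the magnitude is the net gain of
    balanced signed butterflies.
    """
    by_buyer, by_seller = defaultdict(dict), defaultdict(dict)
    for (buyer, seller), sign in links.items():
        by_buyer[buyer][seller] = sign
        by_seller[seller][buyer] = sign

    s_hat = defaultdict(int)
    for buyer, sellers in by_buyer.items():
        for mid_seller, sign1 in sellers.items():
            for mid_buyer, sign2 in by_seller[mid_seller].items():
                for seller, sign3 in by_buyer[mid_buyer].items():
                    if seller not in sellers:  # keep only pairs without an existing link
                        s_hat[(buyer, seller)] += sign1 * sign2 * sign3
    return {pair: value for pair, value in s_hat.items() if value != 0}
```

Pairs whose balanced and unbalanced caterpillars cancel out are dropped, since balance theory makes no suggestion for them.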

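Using Theorem 3 we can construct additional sets $\hat{E}^{+}$ and $\hat{E}^{-}$ of implicit positive and negative
links, respectively, suggested by balance theory that would create the highest net gain of balanced
signed butterflies in the signed bipartite network. We define these sets as follows:

\[
\begin{aligned}
\hat{E}^{+} &= \{(b_i, s_j) \mid \hat{\mathbf{S}}_{ij} > 0 \text{ and } \hat{\mathbf{S}}_{ij} \in \max\nolimits_{k}(\hat{\mathbf{S}})\} \\
\hat{E}^{-} &= \{(b_i, s_j) \mid \hat{\mathbf{S}}_{ij} < 0 \text{ and } \hat{\mathbf{S}}_{ij} \in \min\nolimits_{k}(\hat{\mathbf{S}})\}
\end{aligned}
\tag{4.12}
\]

where $\max_{k}(\hat{\mathbf{S}})$ and $\min_{k}(\hat{\mathbf{S}})$ are used to denote the $k$ largest and smallest values, respectively,
in $\hat{\mathbf{S}}$.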
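Selecting the implicit links amounts to a top-k and bottom-k selection over the suggested signs. A small sketch follows, assuming the suggestions are held in a dictionary keyed by `(buyer, seller)`; the function name and the separate size arguments `k_pos` and `k_neg` are hypothetical, and ties are broken arbitrarily.

```python
import heapq


def implicit_link_sets(s_hat, k_pos, k_neg):
    """Return (implicit positive pairs, implicit negative pairs).

    s_hat maps (buyer, seller) to the net balanced-butterfly gain; the k_pos
    largest positive values and the k_neg smallest negative values are kept.
    """
    positive = [(pair, value) for pair, value in s_hat.items() if value > 0]
    negative = [(pair, value) for pair, value in s_hat.items() if value < 0]
    e_pos = {pair for pair, _ in heapq.nlargest(k_pos, positive, key=lambda item: item[1])}
    e_neg = {pair for pair, _ in heapq.nsmallest(k_neg, negative, key=lambda item: item[1])}
    return e_pos, e_neg
```

Keeping the two sizes separate makes it possible to include different numbers of implicit positive and negative links.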

We formulate our objective that incorporates balance theory as follows:

\[
\begin{aligned}
\min_{\mathbf{U},\mathbf{V}} \;
& \sum_{(b_i, s_j) \in E} \max\left(0,\; 1 - \mathbf{B}_{ij}\,(\mathbf{u}_i^{\top}\mathbf{v}_j)\right)^{2}
+ \lambda\left(\|\mathbf{U}\|_F^{2} + \|\mathbf{V}\|_F^{2}\right) \\
& + \delta^{+} \sum_{(b_i, s_j) \in \hat{E}^{+}} \max\left(0,\; 1 - \hat{\mathbf{S}}_{ij}\,(\mathbf{u}_i^{\top}\mathbf{v}_j)\right)^{2}
+ \delta^{-} \sum_{(b_i, s_j) \in \hat{E}^{-}} \max\left(0,\; 1 - \hat{\mathbf{S}}_{ij}\,(\mathbf{u}_i^{\top}\mathbf{v}_j)\right)^{2}
\end{aligned}
\tag{4.13}
\]

where $\delta^{+}$ and $\delta^{-}$ are used to control the level at which we incorporate the modeling of signed
butterflies through the inclusion of the implicit positive and negative links, respectively. We again
note that these implicit positive and negative links are implied by balance theory by using $\hat{\mathbf{S}}$, which
effectively counts for each node pair $(b_i, s_j)$ what the net gain of total balanced signed butterflies
would be once including the link with the suggested sign (according to the majority count of signed
caterpillars being of balanced or unbalanced paths of length 3). We denote this matrix factorization
method using balance theory as MFwBT.
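To make the structure of Eq. (4.13) concrete, the sketch below evaluates the objective for given latent factors: the explicit-link loss and regularizer of Eq. (4.11), plus two weighted squared hinge terms over the implicit positive and negative links. All argument names (`lam`, `weight_pos`, `weight_neg`, and so on) are hypothetical stand-ins for the symbols in the equation, and latent vectors are stored as rows indexed by integer ids.

```python
def squared_hinge(target, score):
    return max(0.0, 1.0 - target * score) ** 2


def mfwbt_objective(U, V, links, s_hat, e_pos, e_neg, lam, weight_pos, weight_neg):
    """Evaluate the balance-aware objective for latent factors U (buyers) and V (sellers)."""
    def score(i, j):
        return sum(a * b for a, b in zip(U[i], V[j]))

    explicit = sum(squared_hinge(sign, score(i, j)) for (i, j), sign in links.items())
    reg = sum(x * x for row in U for x in row) + sum(x * x for row in V for x in row)
    # Implicit links use the (signed) net caterpillar count as the target value.
    implicit_pos = sum(squared_hinge(s_hat[pair], score(*pair)) for pair in e_pos)
    implicit_neg = sum(squared_hinge(s_hat[pair], score(*pair)) for pair in e_neg)
    return explicit + lam * reg + weight_pos * implicit_pos + weight_neg * implicit_neg
```

Setting both implicit-link weights to zero recovers the basic MF objective.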

4.2.2.3 Random Walk Based Sign Prediction

Typical propagation based methods, such as the random walk with restart [44], have seen many
variants and have been applied to solve link prediction and ranking related tasks in unsigned unipartite
networks. However, signed bipartite networks pose multiple challenges that prevent the typical
methods from being used directly. One such problem is that a random walk on a bipartite network is
periodic and thus does not converge to a stationary distribution [134]. One way of handling this problem in unsigned
bipartite networks is a “lazy” random walk, where the walker will probabilistically stay
at the same node. We will later use this method as a comparison against our proposed random
walk based method. Furthermore, as seen in previous sign prediction methods for unipartite
signed networks, balance theory is the key component towards obtaining higher performance when
predicting the sign of unknown links. Since our analysis of signed butterflies showed that
signed bipartite networks exhibit high levels of balance, we should also use
balance theory to guide the random walk based method for signed bipartite networks towards a
solution having more balanced relations.

Here we present a random walk based approach that integrates the $\mathcal{U}_B$ and $\mathcal{U}_S$ one-mode
projection adjacency matrices, which are constructed using balance theory, to aid in handling the
issues faced in the bipartite setting. This signed random walk not only
admits a proper transition matrix, but also has the random walker promote balance
theory. The first step will be the construction of a signed adjacency matrix $\mathbf{A}$ based on balance
theory, followed by defining a signed transition matrix that can further promote and propagate
balanced relations throughout the network.

Constructing the one-mode adjacency matrices: In unsigned bipartite network analysis one-mode
projections are typically used for both analysis and aiding to solve various tasks [135, 136,
137]. They are constructed by creating a projection network that creates implicit connections
between nodes of the same type. In terms of our definitions, two one-mode projection networks
can be constructed, one that connects the buyers in $\mathcal{U}_B$ amongst themselves and the other
for the sellers in $\mathcal{U}_S$ by constructing seller to seller links; these relations can be represented in the
adjacency matrices $\mathbf{P}_B \in \mathbb{R}^{|\mathcal{U}_B| \times |\mathcal{U}_B|}$ and
$\mathbf{P}_S \in \mathbb{R}^{|\mathcal{U}_S| \times |\mathcal{U}_S|}$. A visual example can be seen in
Figure 4.4 when going from $\mathbf{B}$ to $\mathbf{P}_B$ and $\mathbf{P}_S$ from left to right through the first arrow.

Figure 4.4: High-level overview of how we construct $\mathbf{A}$ from $\mathbf{B}$, $\mathbf{P}_B$ and $\mathbf{P}_S$.

We note that there is not just one way to discover these implicit connections between pairs of
users in the same set, and in fact there are many possible methods for one-mode projections [136,
137]. It has also been shown that using different methods to construct the projection networks
can cause drastic changes to the usability and performance [138]. In wanting to carefully construct
these projection networks, we choose to utilize balance theory in the form of signed triangles. Next
we will discuss the formation of the adjacency matrix $\mathbf{P}_B$; a similar process can be followed
for constructing $\mathbf{P}_S$.

Based on the ideas of common neighbor similarity in unsigned networks, we will possibly
connect two buyers $b_i$ and $b_j$ if they have at least one seller in common they are linked to. Let the
number of common sellers that $b_i$ and $b_j$ agree upon (in terms of link sign) be denoted as $a_{ij}$.
Similarly let $d_{ij}$ denote the number of sellers these two buyers disagree on in terms of link sign.
Then we define $[\mathbf{P}_B]_{ij} = [\mathbf{P}_B]_{ji} = a_{ij} - d_{ij}$, which we can see is taking the number of sellers they
agree upon in terms of signed connections (i.e., both either negatively or positively connected to
that seller) and subtracting the number of sellers they disagree on (i.e., the sellers where one buyer
has a positive link while the second buyer has a negative connection with that seller). We can now
further see the connection between $\mathbf{P}_B$ and the common neighbor similarity method. It is easy
to verify that out of all the triangles formed between $b_i$, $b_j$, and the sellers $s_k$ they are commonly
linked to, using the links $\mathbf{B}_{ik}$, $\mathbf{B}_{jk}$ and $[\mathbf{P}_B]_{ij}$, the majority will adhere to balance
theory. This is by design since if $a_{ij} > d_{ij}$ then $\mathrm{sign}([\mathbf{P}_B]_{ij})$ is positive, closing the $a_{ij}$
triangles to be balanced, while the lesser number of $d_{ij}$ will close to be unbalanced. Note that a
similar argument can be given when $a_{ij} < d_{ij}$, and if $a_{ij} = d_{ij}$ then $[\mathbf{P}_B]_{ij} = 0$ and no signed
triangles are formed. Ultimately, we construct a parameterized version as follows:

\[
[\mathbf{P}_B]_{ij} =
\begin{cases}
a_{ij} - d_{ij} & \text{if } \tau^{+} < a_{ij} - d_{ij} \\
a_{ij} - d_{ij} & \text{if } a_{ij} - d_{ij} < \tau^{-} \\
0 & \text{otherwise}
\end{cases}
\tag{4.14}
\]

where $\tau^{+}$ and $\tau^{-}$ are used to define thresholds for the necessary magnitude of $a_{ij} - d_{ij}$ to have
a non-zero value in $[\mathbf{P}_B]_{ij}$. This allows us to ignore adding smaller values (e.g., 2), since in some
settings having such a small value might not be very significant and thus we might not want to
construct a link between $b_i$ and $b_j$. Note that for simplicity we allow $\tau^{+}$ and $\tau^{-}$ to be shared for
constructing both $\mathbf{P}_B$ and $\mathbf{P}_S$.
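The thresholded buyer projection can be sketched as follows. Each common seller contributes the product of the two link signs (+1 when the buyers agree, -1 when they disagree), so the running sum is exactly the number of agreements minus the number of disagreements. The function and parameter names (`buyer_projection`, `tau_pos`, `tau_neg`) and the dictionary layout are assumptions of this example; the seller projection follows by swapping the roles of buyers and sellers.

```python
from collections import defaultdict
from itertools import combinations


def buyer_projection(links, tau_pos=0, tau_neg=0):
    """Signed one-mode projection onto buyers.

    links maps (buyer, seller) to +1 or -1. Returns {(buyer_a, buyer_b): value}
    with buyer_a < buyer_b, keeping only values above tau_pos or below tau_neg.
    """
    by_seller = defaultdict(dict)
    for (buyer, seller), sign in links.items():
        by_seller[seller][buyer] = sign

    net = defaultdict(int)  # agreements minus disagreements per buyer pair
    for buyers in by_seller.values():
        for (a, sign_a), (b, sign_b) in combinations(sorted(buyers.items()), 2):
            net[(a, b)] += sign_a * sign_b
    return {pair: value for pair, value in net.items()
            if value > tau_pos or value < tau_neg}
```

With both thresholds at zero, every buyer pair whose agreements and disagreements do not cancel receives a signed link.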


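Performing the random walk: Now having the two projection adjacency matrices $\mathbf{P}_B$ and $\mathbf{P}_S$,
we can use them to construct an adjacency matrix $\mathbf{A} \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{U}|}$, which will be the unipartite
signed network we perform our random walk on, where
$\mathcal{U} = \{b_1, \ldots, b_{n_B}, s_1, \ldots, s_{n_S}\}$. In Figure 4.4
we show the high-level intuition of how to construct $\mathbf{A}$. First we denote $\hat{\mathbf{B}}$ as the row normalized
biadjacency matrix where $\hat{\mathbf{B}}_{ij} = \mathbf{B}_{ij} / \sum_{k} |\mathbf{B}_{ik}|$. We similarly construct row normalized adjacency
matrices $\hat{\mathbf{P}}_B$ and $\hat{\mathbf{P}}_S$. Now we can formulate $\mathbf{A}$ as follows:

\[
\mathbf{A} =
\begin{bmatrix}
\hat{\mathbf{P}}_B & \alpha\hat{\mathbf{B}} \\
\alpha\hat{\mathbf{B}}^{\top} & \hat{\mathbf{P}}_S
\end{bmatrix}
\tag{4.15}
\]

where $\alpha$ is a parameter that can be used to bias the random walker to favor the real links in our
signed bipartite network as compared to the implicit links we obtained through the $\mathcal{U}_B$ and $\mathcal{U}_S$
one-mode projection networks. Next we construct a similar row normalized adjacency matrix $\hat{\mathbf{A}}$
where $\hat{\mathbf{A}}_{ij} = \mathbf{A}_{ij} / \sum_{k} |\mathbf{A}_{ik}|$.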

Finally, we utilize $\hat{\mathbf{A}}$ in a random walk propagation model where we define $\mathbf{Y}$ to be the matrix
holding the inferred link signs as follows:

\[
\mathbf{Y}_{ij} = \sum_{k} \hat{\mathbf{A}}_{ik}\,\mathbf{Y}_{kj}
\tag{4.16}
\]

Next we describe how the above adheres to balance theory in terms of triangles of the adjacency
matrix $\mathbf{A}$. For some $k$, if $\hat{\mathbf{A}}_{ik}\mathbf{Y}_{kj} > 0$ it increases $\mathbf{Y}_{ij}$, encouraging it to be positive, which
would form a triangle consisting of either three positives, or two negatives and a positive. Similarly,
when $\hat{\mathbf{A}}_{ik}\mathbf{Y}_{kj} < 0$ we are decreasing $\mathbf{Y}_{ij}$ and encouraging it to be negative, thus also following
balance theory. The closed form solution that includes the restart capability, with probability
$(1 - c)$, is given to be the following:

\[
\mathbf{Y} = (1 - c)\,(\mathbf{I} - c\,\hat{\mathbf{A}})^{-1}
\tag{4.17}
\]

Note that each signed butterfly involving $b_i$, $b_j$, $s_k$, and $s_l$ in the original network $\mathbf{B}$ now consists
of up to $\binom{4}{3}$ triangles in $\hat{\mathbf{A}}$. Thus, when we are encouraging balanced triangles here in $\mathbf{Y}$ this
correlates to having balanced signed butterflies in the upper right corner of $\mathbf{Y}$, which is where we
obtain the link sign predictions (i.e., when predicting the sign between $b_i$ and $s_j$ we use $\mathbf{Y}_{ij'}$
where $j' = (n_B + j)$). We denote this method as Signed Bipartite Random Walk (SBRW).

For comparison, if we set the two one-mode projection matrices to the identity matrix (i.e.,
$\mathbf{P}_B = \mathbf{P}_S = \mathbf{I}$) and set $\alpha = 1$ then Eq. 4.16 becomes the equation for a lazy random walk method,
which we denote as LazyRW.
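The construction in Eqs. (4.15) to (4.17) can be sketched on small dense matrices as below. Instead of inverting a matrix, this example iterates the restart update, which converges to the same closed form because the absolute row sums of the normalized matrix are at most one and the walk probability is below one. The function names, the parameter names `alpha` and `c`, and the use of nested lists are all assumptions of this sketch; a practical implementation would more naturally use a sparse linear algebra library.

```python
def row_normalize(M):
    """Divide each row by the sum of its absolute values (zero rows are left as zeros)."""
    out = []
    for row in M:
        total = sum(abs(x) for x in row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def build_walk_matrix(B, P_b, P_s, alpha):
    """Assemble [[P_b_hat, alpha * B_hat], [alpha * B_hat^T, P_s_hat]] and row normalize it."""
    B_hat, Pb_hat, Ps_hat = row_normalize(B), row_normalize(P_b), row_normalize(P_s)
    n_b, n_s = len(B), len(B[0])
    top = [Pb_hat[i] + [alpha * x for x in B_hat[i]] for i in range(n_b)]
    bottom = [[alpha * B_hat[i][j] for i in range(n_b)] + Ps_hat[j] for j in range(n_s)]
    return row_normalize(top + bottom)


def propagate(A_hat, c=0.3, iters=200):
    """Iterate Y <- c * A_hat @ Y + (1 - c) * I, whose fixed point is (1 - c)(I - c A_hat)^-1."""
    n = len(A_hat)
    Y = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        Y = [[c * sum(A_hat[i][k] * Y[k][j] for k in range(n)) + (1 - c) * (i == j)
              for j in range(n)] for i in range(n)]
    return Y
```

The prediction for buyer `i` and seller `j` is read from `Y[i][n_b + j]`. Passing identity matrices for both projections with `alpha = 1` gives the lazy random walk baseline.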

4.2.3 Experiments

In this section, we empirically evaluate our proposed sign prediction methods for signed bipartite
networks that harness balance theory. We seek to answer the following: (1) Does the extended
balance theory to signed butterﬂies in the bipartite setting provide an increase in performance for
sign prediction? and (2) How do the proposed methods work/compare? To address these questions
we perform experiments to measure the performance for each of the proposed sign prediction
methods across three real-world signed bipartite networks. To better understand our methods and
the contribution of balance theory, we also follow-up with a parameter sensitivity analysis for the
major parameters of our methods.

Experimental Settings: Here we discuss the settings used for our experiments on sign predic-
tion in signed bipartite networks. As previously discussed in Section 4.2.1.1 we have collected three
signed bipartite networks for this study, namely, Bonanza, U.S. Senate, and U.S. House. For our
sign prediction experiments we have randomly selected 10% of the links as test, utilized a random
5% for validation purposes of tuning the hyperparameters of our models, and the remaining 85% as
training for each of our datasets. More speciﬁcally, each method is only given access to the signed
bipartite network induced from the training links, then, for each edge in the testing set, we compare
the ground truth link sign with the link sign the speciﬁc method suggests for that undirected pair.
For evaluation we use both F1 and Area Under the receiver operating characteristic Curve (AUC),
since the positive and negative links are imbalanced, especially in the Bonanza dataset. To the best
of our knowledge this is the ﬁrst study of predicting link signs in signed bipartite networks; hence
other existing methods either for unipartite signed or unsigned bipartite networks are likely not
applicable. The main investigation is two-fold. First, we want to test the applicability of balance
theory (based on signed butterﬂies) to aid in sign prediction. Second, we want to provide insights
to guide practical usage of sign predictors with diﬀerent types of signed bipartite networks. Thus,
we only provide a comparison against the methods we have presented in this dissertation.
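The 85%/5%/10% split described above could be produced along the following lines. This is a sketch only: the function name, the fixed seed, and the dictionary layout of the links are assumptions, and the source does not state how ties in rounding or repeated runs were handled.

```python
import random


def split_links(links, test_frac=0.10, val_frac=0.05, seed=0):
    """Randomly split signed links into (train, validation, test) dictionaries."""
    pairs = sorted(links)  # sort first so the shuffle is reproducible for a given seed
    random.Random(seed).shuffle(pairs)
    n_test = round(len(pairs) * test_frac)
    n_val = round(len(pairs) * val_frac)
    test = {p: links[p] for p in pairs[:n_test]}
    val = {p: links[p] for p in pairs[n_test:n_test + n_val]}
    train = {p: links[p] for p in pairs[n_test + n_val:]}
    return train, val, test
```

Each method would then see only the network induced by `train`, with `val` used for hyperparameter tuning and `test` for the reported scores.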

4.2.3.1 Comparison Results

The results across our three signed bipartite networks in terms of AUC and F1 can be found
in Table 4.12 and the ﬁrst observation we make is that there is not one proposed method that
outperforms the others across all the datasets.

Table 4.12: Link sign prediction results in terms of (AUC, F1).

Method    Bonanza           U.S. Senate       U.S. House
SCd       (0.553, 0.959)    (0.638, 0.654)    (0.625, 0.635)
SCsc      (0.664, 0.674)    (0.812, 0.823)    (0.827, 0.837)
MF        (0.593, 0.903)    (0.792, 0.812)    (0.831, 0.846)
MFwBT     (0.608, 0.905)    (0.814, 0.827)    (0.834, 0.848)
LazyRW    (0.547, 0.979)    (0.808, 0.821)    (0.815, 0.827)
SBRW      (0.582, 0.949)    (0.836, 0.849)    (0.846, 0.858)

The second observation we make is that the three methods SCsc, MFwBT, and SBRW, which
receive aid in prediction from balance theory when defined using signed butterflies, always perform
better in terms of AUC than their respective baseline methods (i.e., SCd, MF, and LazyRW) that only
use generic signed network information, and only in two cases is the F1 worse. In the Bonanza
dataset we have SCd and LazyRW outperforming SCsc and SBRW, respectively, in terms of
F1 (although performing worse in AUC). The reason for this is the heavy imbalance between the
positive and negative links in this dataset; more specifically, almost 98% of the links are positive,
which is generally a setting where the AUC measurement is preferred to understand the performance
better. Therefore we can see that better detecting the few negative links comes at the sacrifice of
misclassifying some of the positive links, which is why the F1 of SCsc and SBRW is less than that of
SCd and LazyRW, but comes with a clear increase in AUC. In general we observe that
the usage of signed butterflies for sign prediction in signed bipartite networks improves AUC in all
nine method and dataset comparisons and F1 in seven of them. This fact suggests that we can give a positive answer to
our first question: the usage of balance theory in the form of signed butterflies for sign prediction
in signed bipartite networks indeed provides an empirically verifiable improvement.

In the U.S. Senate and U.S. House datasets, for the methods constructed based on intuitions
of how to ensure more balanced signed butterflies are being created when predicting
missing link signs (i.e., SCsc, MFwBT, and SBRW), we see the low-rank model outperforms
the supervised classifier approach, while the random walk method performs the best (for both AUC
and F1).

Figure 4.5: Parameter sensitivity on $\delta^{+}$ and $\delta^{-}$ in MFwBT on the U.S. Senate dataset. Panels: (a) MFwBT (AUC); (b) MFwBT (F1).

However, unlike the two U.S. Congress datasets, in the Bonanza dataset we actually observe the
complete opposite behavior (in terms of AUC) for the ranking of methods that utilize the signed
butterfly based balance theory. We hypothesize this is due to the heavy class imbalance between
the positive and negative links. With this imbalance the SBRW method might be unable to directly
handle this setting as the parameters only focus on separating real/implicit and balance/unbalance
through $\alpha$ and $\tau^{+}$/$\tau^{-}$. Furthermore, if most negative links are involved in balanced relationships then
this would cause even more positive links to be constructed in the two one-mode projection
matrices (since two negatives would result in a positive link being created). In comparison, MFwBT
is able to more accurately control the ratio of positive to negative implicit links being used in the
training procedure (through selecting the size of both $\hat{E}^{+}$ and $\hat{E}^{-}$) when extracting them from
investigating which links would cause the most signed caterpillars to turn into balanced signed
butterflies. Also, we note that in our study we fixed $\delta^{+} = \delta^{-}$, but this mechanism would further allow
MFwBT to balance the contribution of implicit positive and negative links towards learning the
most effective representations. Finally, although we see a drastic improvement in terms of AUC for
the SCsc method, we also observe this comes at great cost to the F1 measure, and thus this method
is just discovering a trade-off of predicting more negative links. This is because we have tuned
our logistic regression model to use weights on each training example inversely proportional to the
frequency of that link type.
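Weighting each training example inversely to the frequency of its link sign can be sketched as below. The exact normalization used in the experiments is not stated in the text; the version here (total count divided by the number of classes times the class count) is one common convention and is an assumption of this example, as is the function name.

```python
from collections import Counter


def inverse_frequency_weights(signs):
    """Per-example weights so that each link sign contributes the same total weight."""
    counts = Counter(signs)
    n_examples, n_classes = len(signs), len(counts)
    return [n_examples / (n_classes * counts[sign]) for sign in signs]
```

With 98 positive and 2 negative training links, each negative example receives a weight 49 times larger than each positive one, which pushes the classifier towards predicting more negative links.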

Figure 4.6: Parameter sensitivity on $\tau^{+}$ and $\tau^{-}$ in SBRW on the U.S. House dataset. Panels: (a) SBRW (AUC); (b) SBRW (F1).

4.2.3.2 Parameter Analysis

Among our three proposed sign prediction methods, the low-rank modeling with balance theory
(MFwBT) and random walk (SBRW) methods contain interesting hyperparameters from the
perspective of wanting to further understand balance theory in signed bipartite networks.

In our MFwBT method, we discussed that we can control the number of suggested implicit
positive and negative links from signed butterflies being included in $\hat{E}^{+}$ and $\hat{E}^{-}$, respectively. We
performed a grid search for both the size of $\hat{E}^{+}$ and $\hat{E}^{-}$, in the set {0, 1000, 10000}. We discovered
that the best setting when considering across the three datasets was having $|\hat{E}^{+}| = 1000$ and
$|\hat{E}^{-}| = 10000$, which we suspect is due to the class imbalance and having more explicit positive
links than negative links. Furthermore, the values of $\delta^{+}$ and $\delta^{-}$ were used to control the contribution
of training on both positive and negative links suggested based on signed butterflies (i.e., links in
$\hat{E}^{+}$ and $\hat{E}^{-}$), respectively. For simplicity of our analysis we set $\delta^{+} = \delta^{-}$ and report the performance
on our validation set for the U.S. Senate dataset in Figure 4.5. We observe that updating the
node representations using suggested signed links (that were selected since they would close the
most signed caterpillars into balanced signed butterflies) provides an improvement over not taking
balance theory into account (which is when $\delta^{+} = \delta^{-} = 0$), but care should be taken to not put too
much focus on these implicit links. We observe similar findings in our other datasets.

For our SBRW method, there are two main sets of parameters: $\alpha$ and the threshold pair $\tau^{+}$
with $\tau^{-}$. We varied $\alpha$ at a large granularity in the set {1, 2, 3, 5}, and observed there was not as
much significant difference as found in varying $\tau^{+}$ and $\tau^{-}$; thus we selected the best on average
across the three datasets of $\alpha = 2$ and fixed this value to investigate the impact of $\tau^{+}$ and $\tau^{-}$ on the
performance for predicting the missing link signs.

In Figure 4.6, for the U.S. House dataset, we varied $\tau^{+}$ in the set {0, 25, 50, 75, 100} and
similarly $\tau^{-}$ in the set {0, −25, −50, −75, −100}. Note that although we saw similar trends across
the three datasets, the specific magnitudes of $\tau^{+}$ and $\tau^{-}$ needed to be tuned separately for each dataset
due to the average magnitude of $a_{ij}$ and $d_{ij}$ (i.e., the number of common sellers $b_i$ and $b_j$ agree
or disagree on, respectively) for constructing $\mathbf{P}_B$ and similarly for $\mathbf{P}_S$, although we fixed $\tau^{+}$ and
$\tau^{-}$ for constructing both one-mode projection matrices. We observe in terms of both AUC and F1
from Figure 4.6 that indeed using these two thresholds to avoid implicit links that do not have a
significant amount of information (i.e., low magnitude of $|a_{ij} - d_{ij}|$) provides great improvement
to our method. It appears that implicit positive links that have low support are helpful to include.
However, it seems better to avoid inferred negative links, which we obtained based on balance
theory in the form of signed triangles between two buyers and a seller (similarly for the case of two
sellers and a buyer). Although including a few is helpful, ones that have a low amount of balance
theory support from the network are better left out of the propagation process.


CHAPTER 5

MINING NETWORKS WITH NEGATIVE LINKS

In this chapter¹,², we focus on the development of signed network mining methods. In traditional
network analysis there are two major directions for network mining, which focuses on utilizing
data mining techniques for graph data. More specifically, they can be categorized into either
link-oriented tasks or node-oriented tasks.

For signed network analysis the objective of link-oriented tasks is to reveal a fine-grained
and comprehensive understanding of positive and negative links. The availability of negative
links not only enriches the existing link-oriented tasks for unsigned networks such as tie strength
prediction [62, 63, 64], but also encourages novel link-oriented tasks specific to signed networks
such as sign prediction and negative link prediction. While positive/negative link prediction [3, 4,
61, 131] and sign prediction [26, 139, 140] have been extensively explored, research on signed tie
strength prediction is rather limited. We note that the signed node relevance measures we presented
in Section 3.1 were evaluated on these link-oriented mining tasks of sign and tie-strength prediction
in signed networks.

Node-oriented mining tasks provide the necessary means to better understand nodes in
networks. For signed networks, the major node-oriented tasks include community detection,
node classification, and node embedding, among which community detection [141, 142] is the
most extensively studied, while signed network embedding is in the earliest of stages compared
to the others. Hence, in Section 5.1 we first seek to develop state-of-the-art node embeddings by
combining signed social theories with the modern techniques of graph neural networks [143, 144,
22, 145, 75, 146], which are a class of deep learning methods specifically designed for graph-structured
data. Thereafter, we seek to develop a novel transformation-based signed network embedding
methodology that is able to first transform the network based on node roles associated with directed
labeled edges to take advantage of traditional network embedding models and then aggregate the
learned embeddings back to the original signed network.

¹ Tyler Derr, Yao Ma, and Jiliang Tang. “Signed Graph Convolutional Networks.” In Proceedings
of the 18th International Conference on Data Mining (ICDM). 2018.

² Amin Javari, Tyler Derr, Pouya Esmalian, Jiliang Tang, and Kevin Chen-Chuan Chang. “ROSE:
Role-based Signed Network Embedding.” In Proceedings of the 29th International Conference on
The World Wide Web (WWW). 2020.

5.1 Signed Graph Convolutional Networks

Recently there has been a large and growing interest in generalizing neural network models to
structured data, with one of the most prevalent structures being graphs (such as those found in social
media). The idea of generalizing neural network models to graph structures, namely graph neural
networks (GNNs) [143, 144, 22, 145, 75, 146, 147, 148], has lately become more developed
by overcoming the difficulties and trade-offs previously associated with fast heuristics compared to
slow and more principled approaches. One particular type of GNN is the graph convolutional network
(GCN), which is modeled after the classical convolutional neural networks [70]. The first GCN
introduced for learning representations at the node level was in [22], where GCNs were utilized
for the semi-supervised node classification problem. Furthermore, learning low-dimensional node
representations has previously been shown to be useful in many network analysis tasks, including
node classification [82, 77], link prediction [149, 150], community detection [151, 152],
and visualization [153, 149].

Previous work has mostly focused on using GCNs for unsigned graphs (i.e., graphs consisting of only positive links). However, especially with the ever-growing popularity of online social media, signed graphs are becoming increasingly ubiquitous. This naturally leads to the question of whether unsigned GCNs are suitable for use on signed networks. Unfortunately, there are many reasons why unsigned GCNs are not capable of learning meaningful node representations in signed networks. First, it is unclear how they would handle the negative links in signed networks; furthermore, negative links invalidate some of the key underlying assumptions of GCNs. For example, GCNs designed for unsigned networks learn a node representation using the fundamental social theory of homophily [8], which states that users having connections are more likely to be similar than those without links. Hence, the aggregation processes of GCNs use local neighborhood information when constructing the low-dimensional embedding for each node. However, homophily may not be applicable to signed networks [67]. Instead, in signed networks, there are specific social theories and principles defined in the context of having both positive and negative links. Therefore, dedicated efforts are needed to redesign GCNs specifically for signed networks.

Table 5.1: Notations in regards to signed graph convolutional networks.

Notations                                        Descriptions
$\mathbf{Z}$                                     Low-dimensional representation of signed network $\mathcal{G}$
$B_i(l)$ ($U_i(l)$)                              The set of users that can be reached from $u_i$ along a (un)balanced path of length $l$
$B(l)$ ($U(l)$)                                  The aggregator responsible for incorporating the information from the set of users $B_i(l)$ ($U_i(l)$)
$\mathbf{z}_i$                                   The final embedding of user $u_i$
$\mathcal{N}_i^+$ ($\mathcal{N}_i^-$)            Set of positive (negative) neighbors of $u_i$
$\mathbf{h}_i^{B(l)}$ ($\mathbf{h}_i^{U(l)}$)    The (un)balanced representation of $u_i$ at the $l$th layer
$\mathbf{W}^{B(l)}$ ($\mathbf{W}^{U(l)}$)        Weight matrices used for learning how to propagate (un)balanced information in the $l$th layer

Although it is now clear that GCNs need to be specifically redesigned in order to provide, when applied to signed networks, the same fruitful performance previously shown in unsigned networks, there are still tremendous challenges to overcome. When designing signed GCNs the primary challenges are: (1) how to correctly handle negative links, since their properties are inherently different from those of positive links; and (2) how to combine the positive and negative links into a single coherent model to learn effective node representations. We therefore turn our attention towards social theories specific to signed networks (similar to how the unsigned models were constructed using unsigned theories like homophily). More specifically, one fundamental signed network social theory developed in social psychology is balance theory [29, 30], and we seek to harness it to solve these two challenges of applying graph convolutional networks to signed graphs.


5.1.1 Problem Statement

With the aforementioned notations and definitions presented in previous chapters and those summarized in Table 5.1, we can formally define the problem of signed network embedding as follows:
Given a signed network $\mathcal{G} = (\mathcal{U}, \mathcal{E}^+, \mathcal{E}^-)$ represented as an adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, we seek to discover a low-dimensional vector for each node as

$$f : \mathbf{A} \rightarrow \mathbf{Z} \tag{5.1}$$

where $f$ is a learned transformation function that maps the signed network’s adjacency matrix $\mathbf{A}$ to a $d$-dimensional representation $\mathbf{Z} \in \mathbb{R}^{n \times d}$ for the $n$ nodes of the signed network.

5.1.2 The Proposed Signed Graph Convolutional Network Framework

Graph convolutional neural networks have recently become more developed and have already shown their superiority in extracting and aggregating information from graph data. Their use cases spread over the vast field of network analysis, but one domain that has been shown to be very influential recently is network embedding. The discovery of representative low-dimensional features for each node in the network has previously been shown to enhance many tasks, from link prediction and node classification to community detection and visualization. However, previous work has mostly focused on constructing GCNs for unsigned networks. Due to the inherent differences between unsigned and signed networks, this leaves a gap that we seek to bridge with the development of a signed graph convolutional network (SGCN).

Even with dedicated efforts towards the construction of a GCN specific to signed networks, there are still tremendous challenges we must face and overcome. The first is figuring out how to correctly incorporate negative links during the aggregation process. We cannot simply treat the negative links the same as positive links, since their properties and semantic meaning vastly differ. The second challenge is how to combine the two sets of links (i.e., positive and negative) into a single coherent model. This combination is essential because positive and negative links interact in the network structure in complex ways and are not segregated or isolated from each other.


In this work we propose to go to the roots of signed network analysis and utilize one of the most fundamental and indispensable signed social theories developed in social psychology, balance theory [30, 29]. We harness balance theory to bridge the gap between the ongoing development of GCNs for unsigned networks and signed networks. In the remainder of this section we first briefly discuss a general GCN framework in the unsigned network setting and discuss the relationship of this framework to the structure of signed networks. Then we introduce balance theory and how we can use this signed social theory to correctly capture both positive and negative links simultaneously during the aggregation process. Thereafter, we present how to learn the parameters of our SGCN: first through the construction of an objective function designed to effectively learn the node representations in signed networks, and finally through a discussion of the procedure used to optimize our proposed objective.

5.1.2.1 Unsigned Graph Convolutional Networks

Currently, most GCNs have a similar structure in that they utilize a convolutional operator that can
share weights across all locations in the graph. The beneﬁts of this neural network structure in
graphs as compared to the cumbersome fully connected models are at least three fold: 1) it avoids
the parameter explosion associated with fully connected layers (especially when handling larger
graphs); 2) it allows for parameter sharing across the network to avoid overﬁtting; and 3) a single
GCN is capable of handling as input graphs of varying structures and even sizes (in terms of the
number of nodes and edges).

Typically, the architecture of an unsigned GCN for learning node representations is of the form shown in Algorithm 5.1. In the process of generating the $d^{out}$-dimensional embedding matrix $\mathbf{Z} \in \mathbb{R}^{n \times d^{out}}$, it makes use of the unsigned adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and a feature matrix $\mathbf{X} \in \mathbb{R}^{n \times d^{in}}$, where $d^{in}$ is the length of the feature vector $\mathbf{x}_i$ for user $u_i$. The matrices $\mathbf{H}^{(l)} \in \mathbb{R}^{n \times d^{out}}$ for $l \in \{1, \dots, L\}$ represent the hidden representations for each of the $n$ nodes of the graph at each layer $l$ of the GCN. On line 1 we set the initial representation $\mathbf{H}^{(0)}$ equal to $\mathbf{X}$ to ease the notation in the remainder of the algorithm. Then, on line 2 we loop, updating the parameters of the GCN until convergence. Inside this loop, for each update iteration we propagate the graph features through the $L$ layers of the GCN using the unsigned adjacency matrix $\mathbf{A}$ and the neighborhood aggregation function $F()$. Note that the function $F()$ is where the variations of GCNs primarily differ. Finally, after the model converges, the embedding is taken as the last layer’s representation matrix $\mathbf{H}^{(L)}$.

Limitations of unsigned GCN for signed networks: Given the above discussion on the unsigned GCN framework, we note that applying it to a signed network would amount to applying the unsigned GCN on the positive-only adjacency matrix $\mathbf{A}^+$, where $\mathbf{A}^+_{ij} = 1$ if there exists a positive link between users $u_i$ and $u_j$, and 0 otherwise (i.e., when there exists either a negative link or no link between them). However, this would ignore the negative links.

Initially, our thoughts may lead to some naïve approaches for handling the negative links: ignoring them, treating them the same as positive links, treating them as the negation of positive links, or separately applying the GCN framework first to the positive network and then to the negative network, combining the two at the end. However, each of these methods is either based on incorrect assumptions or ignores parts of the rich information waiting to be extracted from the complex structure of signed networks towards learning an advantageous low-dimensional representation. For example, trivially treating the negative links the same as the positive links would be an incorrect assumption, since negative links have been shown to follow different principles and to carry vastly different semantic meaning. Similarly, treating negative links as the negation of positive links is likely an incorrect assumption [67]. This leaves the remaining two options of ignoring the negative links or applying an unsigned GCN separately on the positive-only and negative-only networks. Intuitively, the first choice certainly ignores a large amount of information, while the second overlooks the complex relations between positive and negative links that, based on signed social theories [2, 29, 30], exist and can provide fruitful results if extracted [4, 24]. Therefore, next we discuss one such signed social theory, balance theory [29, 30], and how we propose to harness it for capturing both the positive and negative links coherently together during the aggregation process.


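Algorithm 5.1: Typical unsigned GCN framework.
Input: An unsigned network adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$; a feature matrix $\mathbf{X} \in \mathbb{R}^{n \times d^{in}}$; number of aggregation layers $L$; neighborhood aggregation function $F()$
Output: Low-dimensional representation matrix $\mathbf{Z} \in \mathbb{R}^{n \times d^{out}}$
1 $\mathbf{H}^{(0)} \leftarrow \mathbf{X}$
2 while not convergent do
3     for $l \in \{0, \dots, L-1\}$ do
4         $\mathbf{H}^{(l+1)} \leftarrow F(\mathbf{H}^{(l)}, \mathbf{A})$
5     Update GCN parameters based on $loss(\mathbf{H}^{(L)})$
6 $\mathbf{Z} \leftarrow \mathbf{H}^{(L)}$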
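To make the propagation in Algorithm 5.1 concrete, the following is a minimal Python sketch rather than the implementation of any particular GCN. It assumes a parameter-free mean aggregator for the neighborhood aggregation function (the names `mean_aggregate`, `positive_only`, and `propagate` are hypothetical), omits the parameter update on line 5, and runs on the positive-only matrix to illustrate the limitation discussed above.

```python
from typing import Callable, List

Matrix = List[List[float]]

def positive_only(A: Matrix) -> Matrix:
    """Build A+ from a signed adjacency matrix: keep +1 entries, zero out the rest."""
    return [[1.0 if a > 0 else 0.0 for a in row] for row in A]

def mean_aggregate(H: Matrix, A: Matrix) -> Matrix:
    """Assumed aggregator: average a node's row of H with the rows of its neighbors."""
    n, d = len(H), len(H[0])
    out = []
    for i in range(n):
        members = [i] + [j for j in range(n) if j != i and A[i][j] != 0]
        out.append([sum(H[j][c] for j in members) / len(members) for c in range(d)])
    return out

def propagate(X: Matrix, A: Matrix, num_layers: int,
              F: Callable[[Matrix, Matrix], Matrix] = mean_aggregate) -> Matrix:
    """Lines 1, 3, 4 and 6 of Algorithm 5.1; the parameter update (line 5) is omitted."""
    H = X                          # line 1: H^(0) <- X
    for _ in range(num_layers):    # line 3: loop over the layers
        H = F(H, A)                # line 4: H^(l+1) <- F(H^(l), A)
    return H                       # line 6: Z <- H^(L)

# Signed toy graph: 0 -(+)- 1 and 1 -(-)- 2.
A_signed = [[0, 1, 0], [1, 0, -1], [0, -1, 0]]
X = [[1.0], [0.0], [4.0]]
Z = propagate(X, positive_only(A_signed), num_layers=2)
```

In this toy run the feature of node 2 never reaches node 1, because their only connection is a negative link that the positive-only matrix discards.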

Figure 5.1: An illustration of the aggregation paths according to balanced and unbalanced paths.

5.1.2.2 Aggregation Paths with Positive and Negative Links

Balance theory dates back to the seminal work in [29] and was later generalized in [30] with a graph-theoretical foundation. In general, balance theory implies “the friend of my friend is my friend” and “the enemy of my friend is my enemy”. The theory classifies cycles in a signed network as being either balanced or unbalanced, where a balanced cycle consists of an even number of negative links, while a cycle having an odd number of negative links is considered unbalanced. More details on balance theory and the four possible cycles that can be formed in a signed network can be found in Section 2.3.3.1, where in Figure 2.2 we can see that triangles (a) and (b) are balanced, while (c) and (d) are unbalanced. We define a balanced path as one that consists of an even number of negative links, and similarly an unbalanced path as one that has an odd number of negative links. With these definitions, along with balance theory, we can see that if there is a path of length $l$ from $u_i$ to $u_j$ that has an even number of negative links, then balance theory would suggest a positive link between $u_i$ and $u_j$. An example of a balanced path can be seen in Figure 2.2 triangle (b), where the path of length two from $u_i$ to $u_j$ (through $u_k$) consists of two negative links, and thus balance theory would suggest a positive link connecting $u_i$ and $u_j$ (to result in a balanced cycle between users $u_i$, $u_k$, and $u_j$). From the context of user $u_i$ we would then place $u_j$ into the set $B_i(2)$, which we use to denote the set of users that can be reached from user $u_i$ along a balanced path of length 2. In the general case, users that can be reached from $u_i$ along a balanced (or unbalanced) path of length $l$ are placed in the set $B_i(l)$ (or $U_i(l)$). In Figure 5.1 we provide an illustration of how all the signed paths of a given length would place users along paths from $u_i$ into their respective sets. Note that the arrows are only used to aid the illustration and that our definition is based on the more general undirected setting.


Before continuing, let us define $\mathcal{N}_i^+$ to be the set of positive neighbors of a user $u_i$, i.e., $u_j \in \mathcal{N}_i^+$ if $\mathbf{A}_{ij} = 1$. We similarly denote the set of negative neighbors for user $u_i$ as $\mathcal{N}_i^-$, where $u_j \in \mathcal{N}_i^-$ when $\mathbf{A}_{ij} = -1$. In Figure 5.1 we can see that when having a balanced path of length $l$ from $u_i$ to some user $u_j$ (i.e., $u_j \in B_i(l)$), then all the positively linked neighbors of $u_j$ (which we denoted as the set $\mathcal{N}_j^+$) would be placed in $B_i(l+1)$. This is because adding a positive link to a balanced path (i.e., a path consisting of an even number of negative links) still results in a balanced path, just of additional length. Similarly, when adding a negative link to a balanced path, we obtain an unbalanced path.

Another key observation from Figure 5.1 is how we can obtain the balanced and unbalanced sets $B_i(l+1)$ and $U_i(l+1)$ of length $l+1$, respectively for user $u_i$, from the sets $B_i(l)$ and $U_i(l)$ of length $l$. Below we provide a recursive definition for calculating the balanced and unbalanced sets from the perspective of user $u_i$ as follows:

When $l = 1$:

$$B_i(1) = \{u_j \mid u_j \in \mathcal{N}_i^+\}, \qquad U_i(1) = \{u_j \mid u_j \in \mathcal{N}_i^-\}$$

For $l \geq 1$:

$$\begin{aligned}
B_i(l+1) &= \{u_j \mid u_k \in B_i(l) \text{ and } u_j \in \mathcal{N}_k^+\} \cup \{u_j \mid u_k \in U_i(l) \text{ and } u_j \in \mathcal{N}_k^-\} \\
U_i(l+1) &= \{u_j \mid u_k \in U_i(l) \text{ and } u_j \in \mathcal{N}_k^+\} \cup \{u_j \mid u_k \in B_i(l) \text{ and } u_j \in \mathcal{N}_k^-\}
\end{aligned} \tag{5.2}$$

Given the above definition, we again note that the users in the balanced sets (which are reached along balanced paths) for a user $u_i$ are those that either: 1) have a positive link directly to $u_i$; or 2) are users for whom balance theory would suggest a positive link to $u_i$, since there is an even number of negative links along the path connecting them. For the unbalanced sets the definition is similar, except with direct/suggested negative links. We note that these definitions, based upon balance theory, now give us a principled way of aggregating and propagating information in signed networks using balanced and unbalanced paths/sets. Next we propose aggregation functions for our signed GCN and follow with the rest of the details of our framework.
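The recursion in Eq. (5.2) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used in our experiments: the function name `signed_path_sets` and the dictionary-of-sets graph encoding are hypothetical, and the recursion is applied literally, so it follows walks and a user can reappear in their own sets (e.g., $u_i \in B_i(2)$ via a friend and back).

```python
from typing import Dict, Set, Tuple

Adjacency = Dict[int, Set[int]]

def signed_path_sets(pos: Adjacency, neg: Adjacency, i: int,
                     max_len: int) -> Dict[int, Tuple[Set[int], Set[int]]]:
    """Return {l: (B_i(l), U_i(l))} for l = 1..max_len, following Eq. (5.2)."""
    B, U = set(pos.get(i, ())), set(neg.get(i, ()))   # base case, l = 1
    sets = {1: (B, U)}
    for l in range(1, max_len):
        # Balanced: balanced path + positive link, or unbalanced path + negative link.
        next_B = ({j for k in B for j in pos.get(k, ())}
                  | {j for k in U for j in neg.get(k, ())})
        # Unbalanced: unbalanced path + positive link, or balanced path + negative link.
        next_U = ({j for k in U for j in pos.get(k, ())}
                  | {j for k in B for j in neg.get(k, ())})
        B, U = next_B, next_U
        sets[l + 1] = (B, U)
    return sets

# Undirected toy graph: 0 -(+)- 1, 0 -(-)- 2, 1 -(-)- 3, 2 -(-)- 4.
pos = {0: {1}, 1: {0}}
neg = {0: {2}, 2: {0, 4}, 1: {3}, 3: {1}, 4: {2}}
sets = signed_path_sets(pos, neg, i=0, max_len=2)
```

Here user 4 lands in the balanced set of length 2 (a foe of a foe) and user 3 in the unbalanced set (a foe of a friend), matching the two cases described above.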

5.1.2.3 Signed Graph Convolutional Network

Before formalizing our signed graph convolutional network, we provide some insights and intuitions behind the construction in light of balanced and unbalanced sets and paths. The first insight is that unsigned GCNs, when constructing a node representation, aggregate the information of a node’s immediate local neighbors into a single representation and then, through the use of multiple layers, propagate this in the network, allowing a node to incorporate information from a multi-hop neighborhood (where the number of layers in the GCN denotes the number of hops away from which information is being aggregated). However, in signed networks, we cannot categorize all users the same. This is because, semantically, users that are connected through positive links to $u_i$ are thought of as their “friends” while neighbors across negative links are their “foes”. Similarly, for users in $u_i$’s balanced sets, balance theory would suggest they are their “friends” (even though they are not directly linked) and those in $u_i$’s unbalanced sets are suggested to be their “foes” based on this social theory. This phenomenon can be visualized in Figure 5.1. Therefore, rather than maintaining a single representation for each node, we propose to keep a representation of both their “friends” and “foes”, which successfully incorporates both the positive and negative links and gives a more thorough representation of a given user.

Figure 5.2: An illustration of how SGCN aggregates neighbor information in a signed network.

In Figure 5.2 we provide an illustration of how we plan to aggregate and propagate information in a signed network. Note that the circles labeled $l = 1, 2, \dots, L$ are used to denote how many hops away the user is from $u_i$ and simultaneously denote at which layer in our signed GCN that user’s information will be incorporated into the two learned representations for user $u_i$. We can observe that we could have a separate aggregator responsible for incorporating the information from each respective balanced and unbalanced set. For example, in the first layer of Figure 5.2 we can see that the two positive neighbors of $u_i$ will be incorporated into the level-one “friend” representation through the use of aggregator $B(1)$. Similarly, $u_i$’s single negatively linked neighbor is used for learning the level-one “foes” representation. Then, through the use of a second layer in our GCN, we can incorporate the two-hop neighbors. However, the crucial step here is that we must aggregate the information of these neighbors correctly to adhere to balance theory according to our defined balanced and unbalanced paths/sets. Therefore we employ a second set of aggregators, namely $B(2)$ and $U(2)$, which help propagate the information from users in sets $B_i(2)$ and $U_i(2)$, respectively. Notice that, just as shown in Figure 5.1, the users being included by the $B(2)$ aggregator are the users who are along a path of two consecutive positive links, or two consecutive negative links, because they are both suggested as “friends” according to balance theory. On the other hand, aggregator $U(2)$ (which is gathering information from users in the set $U_i(2)$) seeks to utilize the information from users along paths that consist of one positive and one negative link (in either ordering, since both fall into set $U_i(2)$). Now we can more formally discuss the aggregation functions used by our proposed SGCN.

While aggregating and propagating information in our SGCN, we maintain two representations at each layer, one for the corresponding balanced set of users (i.e., suggested “friends”), and one for the users in the respective unbalanced set (i.e., suggested “foes”). Similar to the unsigned GCN, we use $\mathbf{h}_i^{(0)} \in \mathbb{R}^{d^{in}}$ to represent the initial $d^{in}$ node features for user $u_i$. Thus, for the first aggregation layer (i.e., when $l = 1$), we utilize the following:

$$\mathbf{h}_i^{B(1)} = \sigma\left(\mathbf{W}^{B(1)}\left[\sum_{j \in \mathcal{N}_i^+} \frac{\mathbf{h}_j^{(0)}}{|\mathcal{N}_i^+|},\ \mathbf{h}_i^{(0)}\right]\right) \tag{5.3}$$

$$\mathbf{h}_i^{U(1)} = \sigma\left(\mathbf{W}^{U(1)}\left[\sum_{j \in \mathcal{N}_i^-} \frac{\mathbf{h}_j^{(0)}}{|\mathcal{N}_i^-|},\ \mathbf{h}_i^{(0)}\right]\right) \tag{5.4}$$

where $\sigma()$ is a non-linear activation function, $\mathbf{W}^{B(1)}, \mathbf{W}^{U(1)} \in \mathbb{R}^{d^{out} \times 2d^{in}}$ are the linear transformation matrices responsible for the “friends” and “foes” coming from sets $B_i(1)$ and $U_i(1)$, respectively, and $d^{out}$ is the length of the two internal hidden representations. More specifically, for determining the hidden representation $\mathbf{h}_i^{B(1)}$ we also concatenate the hidden representation of user $u_i$ (i.e., $\mathbf{h}_i^{(0)}$) along with the mean of the users in set $B_i(1)$.

In all subsequent layers, the aggregation is more complex, just as the definitions of $B_i(l)$ and $U_i(l)$ were more complex when $l > 1$ in Eq. (5.2). This is similarly due to the cross linking of negative links as seen in Figure 5.1. The aggregations for $l > 1$ are defined as follows:

$$\mathbf{h}_i^{B(l)} = \sigma\left(\mathbf{W}^{B(l)}\left[\sum_{j \in \mathcal{N}_i^+} \frac{\mathbf{h}_j^{B(l-1)}}{|\mathcal{N}_i^+|},\ \sum_{k \in \mathcal{N}_i^-} \frac{\mathbf{h}_k^{U(l-1)}}{|\mathcal{N}_i^-|},\ \mathbf{h}_i^{B(l-1)}\right]\right) \tag{5.5}$$

$$\mathbf{h}_i^{U(l)} = \sigma\left(\mathbf{W}^{U(l)}\left[\sum_{j \in \mathcal{N}_i^+} \frac{\mathbf{h}_j^{U(l-1)}}{|\mathcal{N}_i^+|},\ \sum_{k \in \mathcal{N}_i^-} \frac{\mathbf{h}_k^{B(l-1)}}{|\mathcal{N}_i^-|},\ \mathbf{h}_i^{U(l-1)}\right]\right) \tag{5.6}$$

where $\mathbf{W}^{B(l)}, \mathbf{W}^{U(l)} \in \mathbb{R}^{d^{out} \times 3d^{out}}$ for $l > 1$. Note that we are utilizing the same logic here as when defining the sets $B_i(l)$ and $U_i(l)$. When gathering user $u_i$’s “friend” representation (i.e., $\mathbf{h}_i^{B(l)}$) at layer $l$ (when $l > 1$), it is based upon aggregating the “friend” representation at layer $(l-1)$ (i.e., $\mathbf{h}_j^{B(l-1)}$) for all positively linked neighbors $u_j \in \mathcal{N}_i^+$ while simultaneously collecting the average amongst the “foes” level $(l-1)$ information (i.e., $\mathbf{h}_k^{U(l-1)}$) from all negatively linked neighbors $u_k \in \mathcal{N}_i^-$. Thus, for the case when $l = 2$ we can see the “friend” representation is in fact gathering information from not only their direct friends (i.e., positively linked neighbors), but also (at the two-hop level) friends of friends and foes of foes. Similarly, for the hidden representation $\mathbf{h}_i^{U(l)}$ (i.e., user $u_i$’s “foes” representation) in the case of $l = 2$, the first layer would have gathered direct negatively linked neighbor information, but in the second layer we are gathering from $u_i$’s friends’ foes and their foes’ friends.
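The aggregations in Eqs. (5.3) through (5.6) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than our implementation: it uses tanh as the non-linearity, random untrained weights, and a zero vector for an empty neighbor set, and the helper names (`mean`, `dense`, `sgcn_forward`, `random_matrix`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]
Matrix = List[Vector]
Adjacency = Dict[int, Set[int]]

def mean(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; an empty neighbor set yields zeros (an assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(dim)]

def dense(W: Matrix, x: Vector) -> Vector:
    """sigma(W x), with tanh standing in for the unspecified non-linearity."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

def sgcn_forward(x: Dict[int, Vector], pos: Adjacency, neg: Adjacency,
                 weights: List[Tuple[Matrix, Matrix]]) -> Dict[int, Vector]:
    """weights[l-1] holds (W^{B(l)}, W^{U(l)}); returns [h_i^{B(L)}, h_i^{U(L)}] per user."""
    d_in = len(next(iter(x.values())))
    W_B, W_U = weights[0]
    # First layer, Eqs. (5.3) and (5.4): neighbor mean concatenated with own features.
    h_B = {i: dense(W_B, mean([x[j] for j in pos.get(i, ())], d_in) + x[i]) for i in x}
    h_U = {i: dense(W_U, mean([x[j] for j in neg.get(i, ())], d_in) + x[i]) for i in x}
    # Subsequent layers, Eqs. (5.5) and (5.6): note the crossed use of h_B and h_U.
    for W_B, W_U in weights[1:]:
        d = len(W_B)
        new_B, new_U = {}, {}
        for i in x:
            P, N = pos.get(i, ()), neg.get(i, ())
            new_B[i] = dense(W_B, mean([h_B[j] for j in P], d)
                             + mean([h_U[k] for k in N], d) + h_B[i])
            new_U[i] = dense(W_U, mean([h_U[j] for j in P], d)
                             + mean([h_B[k] for k in N], d) + h_U[i])
        h_B, h_U = new_B, new_U
    return {i: h_B[i] + h_U[i] for i in x}

def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_out = 2, 3
weights = [
    (random_matrix(d_out, 2 * d_in, rng), random_matrix(d_out, 2 * d_in, rng)),
    (random_matrix(d_out, 3 * d_out, rng), random_matrix(d_out, 3 * d_out, rng)),
]
# Undirected toy graph: 0 -(+)- 1, 0 -(-)- 2, 2 -(-)- 4.
pos = {0: {1}, 1: {0}}
neg = {0: {2}, 2: {0, 4}, 4: {2}}
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5], 4: [0.2, -0.3]}
z = sgcn_forward(x, pos, neg, weights)
```

In this toy graph user 4 is a foe of a foe of user 0, so changing the features of user 4 alters only the “friend” half of user 0's two-layer representation and leaves the “foes” half untouched.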


With the above discussed aggregation methods, we can now present the entire framework of SGCN. First, in Algorithm 5.2 we discuss how to obtain the embedding for each user $u_i$ in the signed network. On line 1, we set $\mathbf{h}_i^{(0)}$ equal to $\mathbf{x}_i$ for ease in defining the rest of the algorithm. Then on lines 2 through 4 we show the first layer’s aggregation process. Next, if the total number of layers in the SGCN is greater than one (i.e., $L > 1$), then we perform the subsequent aggregations according to the higher-level aggregation functions we designed based on balance theory. Finally, on line 10 the last step is concatenating the two hidden representations for user $u_i$, namely $\mathbf{h}_i^{B(L)}$ and $\mathbf{h}_i^{U(L)}$, together into a single low-dimensional representation.

Algorithm 5.2: Signed Graph Convolutional Network (SGCN) embedding generation.
Input: $\mathcal{G} = (\mathcal{U}, \mathcal{E}^+, \mathcal{E}^-)$; an initial seed node representation $\{\mathbf{x}_i, \forall u_i \in \mathcal{U}\}$; number of aggregation layers $L$; weight matrices $\mathbf{W}^{B(l)}$ and $\mathbf{W}^{U(l)}$, $\forall l \in \{1, \dots, L\}$; non-linear function $\sigma$
Output: Low-dimensional representations $\mathbf{z}_i, \forall u_i \in \mathcal{U}$
1 $\mathbf{h}_i^{(0)} \leftarrow \mathbf{x}_i, \forall u_i \in \mathcal{U}$
2 for $u_i \in \mathcal{U}$ do
3     $\mathbf{h}_i^{B(1)} \leftarrow \sigma\left(\mathbf{W}^{B(1)}\left[\sum_{j \in \mathcal{N}_i^+} \frac{\mathbf{h}_j^{(0)}}{|\mathcal{N}_i^+|},\ \mathbf{h}_i^{(0)}\right]\right)$
4     $\mathbf{h}_i^{U(1)} \leftarrow \sigma\left(\mathbf{W}^{U(1)}\left[\sum_{j \in \mathcal{N}_i^-} \frac{\mathbf{h}_j^{(0)}}{|\mathcal{N}_i^-|},\ \mathbf{h}_i^{(0)}\right]\right)$
5 if $L > 1$ then
6     for $l = 2 \dots L$ do
7         for $u_i \in \mathcal{U}$ do
8             $\mathbf{h}_i^{B(l)} \leftarrow \sigma\left(\mathbf{W}^{B(l)}\left[\sum_{j \in \mathcal{N}_i^+} \frac{\mathbf{h}_j^{B(l-1)}}{|\mathcal{N}_i^+|},\ \sum_{k \in \mathcal{N}_i^-} \frac{\mathbf{h}_k^{U(l-1)}}{|\mathcal{N}_i^-|},\ \mathbf{h}_i^{B(l-1)}\right]\right)$
9             $\mathbf{h}_i^{U(l)} \leftarrow \sigma\left(\mathbf{W}^{U(l)}\left[\sum_{j \in \mathcal{N}_i^+} \frac{\mathbf{h}_j^{U(l-1)}}{|\mathcal{N}_i^+|},\ \sum_{k \in \mathcal{N}_i^-} \frac{\mathbf{h}_k^{B(l-1)}}{|\mathcal{N}_i^-|},\ \mathbf{h}_i^{U(l-1)}\right]\right)$
10 $\mathbf{z}_i \leftarrow [\mathbf{h}_i^{B(L)}, \mathbf{h}_i^{U(L)}], \forall u_i \in \mathcal{U}$

Next we design an objective function to learn the parameters of SGCN. The objective function for SGCN is based upon two components, both of which are based on the goal that we would like the representations to capture the relationships between pairs of users in the signed network’s embedded space. The first term incorporates an additional layer that acts as a weighted multinomial logistic regression (MLG) classifier. Here we wish to classify whether a pair of node embeddings is from users with a positive, negative, or no link between them. More specifically, we construct a mini-batch of users and then a set $\mathcal{M}$, which contains triplets of the form $(u_i, u_j, s)$, each denoting the pair of users $(u_i, u_j)$ along with $s \in \{+, -, ?\}$ for denoting whether there was a positive, negative, or no link between the pair of users. For input into the classifier, we use the final embeddings for users $u_i$ and $u_j$ concatenated together (i.e., $[\mathbf{z}_i, \mathbf{z}_j]$). We use $\omega_s$ to denote the weight associated with class $s$. We introduce a second term that is founded on extended structural balance theory. This term is controlled by $\lambda$ to balance its contribution towards the overall objective. The goal of this second term is to have positively linked users closer in the embedded space than the no-link pairs, and the no-link pairs should in turn be closer than users having a negative link between them. The overall objective is formalized in the following:

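$$\begin{aligned}
\mathcal{L}(\theta^W, \theta^{MLG}) = &-\frac{1}{|\mathcal{M}|} \sum_{(u_i, u_j, s) \in \mathcal{M}} \omega_s \log \frac{\exp\left([\mathbf{z}_i, \mathbf{z}_j]\, \theta_s^{MLG}\right)}{\sum_{q \in \{+,-,?\}} \exp\left([\mathbf{z}_i, \mathbf{z}_j]\, \theta_q^{MLG}\right)} \\
&+ \lambda \Bigg[ \frac{1}{|\mathcal{M}_{(+,?)}|} \sum_{(u_i, u_j, u_k) \in \mathcal{M}_{(+,?)}} \max\left(0,\ \|\mathbf{z}_i - \mathbf{z}_j\|_2^2 - \|\mathbf{z}_i - \mathbf{z}_k\|_2^2\right) \\
&\qquad + \frac{1}{|\mathcal{M}_{(-,?)}|} \sum_{(u_i, u_j, u_k) \in \mathcal{M}_{(-,?)}} \max\left(0,\ \|\mathbf{z}_i - \mathbf{z}_k\|_2^2 - \|\mathbf{z}_i - \mathbf{z}_j\|_2^2\right) \Bigg] \\
&+ Reg(\theta^W, \theta^{MLG})
\end{aligned} \tag{5.7}$$

Here, $\theta^W$ represents the weight matrices used in the layers of our SGCN, $\theta^{MLG}$ denotes the parameters of the MLG classifier, and $\omega_s$ is the weight associated with the class $s$ (with $s \in \{+, -, ?\}$ for the positive, negative, and no-link classes). $\mathcal{M}_{(+,?)}$ and $\mathcal{M}_{(-,?)}$ are the sets built from the pairs of positively and negatively linked users, respectively, where for every linked pair $(u_i, u_j)$ we further sample another user $u_k$ randomly (and differently in each epoch) that has no link to $u_i$. The term $Reg(\theta^W, \theta^{MLG})$ is used for regularization on the parameters of our model. For updating the parameters, we utilize the same SGD-style updating as presented in [75], since it has been shown to effectively update the parameters of a GCN using a mini-batch setting (as compared to previous work such as [22] that performed batch gradient descent).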
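As an illustration of how the terms of Eq. (5.7) behave, the following is a small Python sketch that evaluates the objective for fixed embeddings. It is a sketch under stated assumptions, not our training code: the function names are hypothetical, the regularization term is omitted, and no gradients or parameter updates are computed.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance ||a - b||_2^2."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def weighted_mlg_loss(z: Dict[int, Vector], triplets: List[Tuple[int, int, str]],
                      theta: Dict[str, Vector], omega: Dict[str, float]) -> float:
    """First term of Eq. (5.7): class-weighted softmax cross-entropy on [z_i, z_j]."""
    total = 0.0
    for i, j, s in triplets:
        pair = z[i] + z[j]                       # concatenation [z_i, z_j]
        logits = {q: sum(w * v for w, v in zip(theta[q], pair)) for q in theta}
        m = max(logits.values())                 # subtract the max for stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += omega[s] * (logits[s] - log_norm)
    return -total / len(triplets)

def balance_loss(z: Dict[int, Vector], pos_triplets: List[Tuple[int, int, int]],
                 neg_triplets: List[Tuple[int, int, int]]) -> float:
    """Bracketed term of Eq. (5.7); each triplet is (i, j, k) with k unlinked to i."""
    pos = sum(max(0.0, sq_dist(z[i], z[j]) - sq_dist(z[i], z[k]))
              for i, j, k in pos_triplets)
    neg = sum(max(0.0, sq_dist(z[i], z[k]) - sq_dist(z[i], z[j]))
              for i, j, k in neg_triplets)
    return pos / len(pos_triplets) + neg / len(neg_triplets)

def sgcn_objective(z, triplets, pos_triplets, neg_triplets, theta, omega, lam):
    """Eq. (5.7) without the regularization term (omitted in this sketch)."""
    return (weighted_mlg_loss(z, triplets, theta, omega)
            + lam * balance_loss(z, pos_triplets, neg_triplets))

# User 0 has a friend (1), a foe (2), and an unlinked user (3).
triplets = [(0, 1, "+"), (0, 2, "-"), (0, 3, "?")]
pos_triplets, neg_triplets = [(0, 1, 3)], [(0, 2, 3)]
theta = {s: [0.0] * 4 for s in "+-?"}            # untrained classifier
omega = {s: 1.0 for s in "+-?"}
z_good = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [3.0, 0.0], 3: [2.0, 0.0]}
z_bad = {0: [0.0, 0.0], 1: [3.0, 0.0], 2: [1.0, 0.0], 3: [2.0, 0.0]}
```

With `z_good` the friend is closest, the unlinked user is next, and the foe is farthest, so the balance term vanishes; `z_bad` swaps friend and foe and is penalized by both hinge terms.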

5.1.3 Experiments

In this section, we experimentally evaluate the eﬀectiveness of the proposed signed graph con-
volutional network (SGCN) in learning node representations. We seek to answer the following
questions: (1) Is SGCN capable of learning meaningful low-dimensional representations? and
(2) Does the introduction of balance theory into the aggregation process along with longer path
information provide a performance increase in learning the node embeddings?

To address the ﬁrst question, we conduct experiments to measure the learned embedding quality
by performing the most fundamental signed network analysis task, namely link sign prediction [4],


and compare against the signed network embedding state-of-the-art baseline methods. To answer
the second question, we investigate variants of our framework that do not exploit the longer paths
(i.e., only performing a single aggregation step) or that do not make use of balance theory (i.e., the
fundamental signed social network theory).

Experimental Settings: Next, we describe the datasets used, the link sign prediction problem, and the metrics used for evaluation. For our study of learning representations using signed graph convolutional networks, we conduct our experiments on four real-world signed network datasets, i.e., Bitcoin-Alpha, Bitcoin-OTC, Slashdot, and Epinions. For each of these datasets we perform our experiments on the undirected signed networks, and for the two larger networks (Slashdot and Epinions) we have further randomly filtered out users that had very few links. We summarize the new variants of the Slashdot and Epinions datasets in Table 5.2 with some basic statistics, while Bitcoin-Alpha and Bitcoin-OTC are the same as previously presented in Table 4.2. The problem of predicting the signs of links [4] is as follows: given a set of existing links in the signed network that have been held out of the training set, we wish to predict whether the sign between each of those pairs of users is positive or negative. Thus, a binary classifier is used to predict the sign based on a set of input features from the pair of users (more specifically, we employ a logistic regression model). In our case we concatenate the final embeddings of the two users together as the set of features. The model is trained using the labeled edges from the training data. For evaluation, since the numbers of positive and negative links are imbalanced (i.e., there are many more positive links than negative links), we utilize both F1 and Area Under the receiver operating characteristic Curve (AUC). We note that higher F1 and AUC both mean better performance. For each dataset, we randomly choose 20% of the data as test, and the remaining 80% as training. Note that we used a grid search along with cross validation on the training data to tune the hyperparameters of our model.
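To illustrate why both metrics are reported under class imbalance, the following Python sketch computes F1 and AUC from scratch. It is an illustrative sketch only: the function names and the toy labels and scores are hypothetical, and in practice a library implementation would typically be used on the logistic regression outputs.

```python
from typing import List, Sequence

def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random positive link is scored above a random negative
    link, counting ties as one half (equivalent to the area under the ROC curve)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out links: four positive (+1) and two negative (-1).
y_true = [1, 1, 1, 1, -1, -1]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]          # predicted P(sign = +)
y_pred = [1 if s >= 0.5 else -1 for s in scores]

model_f1, model_auc = f1_score(y_true, y_pred), auc(y_true, scores)
# A trivial predictor that labels every link positive with a constant score.
trivial_f1 = f1_score(y_true, [1] * len(y_true))
trivial_auc = auc(y_true, [1.0] * len(y_true))
```

On this toy data the trivial all-positive predictor reaches an F1 of 0.8, higher than the model's 0.75, while its AUC is only 0.5 against the model's 0.875, which is why F1 alone can be misleading when positive links dominate.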

5.1.3.1 Performance Comparison

Here we present some existing state-of-the-art signed network embedding methods so that we can study the effectiveness of our signed GCN (SGCN) in learning node representations in signed networks. For succinctness we do not include unsigned methods, since previous work has shown that dedicated signed network embedding methods outperform methods not designed for signed networks. The baselines are as follows:

• Signed Spectral Embedding (SSE) [153]: A spectral clustering algorithm based on the proposed signed version of the Laplacian matrix. We utilize the top-$d^{out}$ eigenvectors corresponding to the smallest eigenvalues as the embedding vectors for each node.

• SiNE [87]: This method is a deep learning framework that utilizes extended structural balance theory.

• SIDE [89]: A random walk based method that utilizes balance theory to obtain indirect connections for a likelihood formulation.
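For reference, the signed Laplacian underlying spectral baselines such as SSE can be sketched as below. This assumes the commonly used definition in which the degree of a node is the sum of the absolute values of its links (the signed degree matrix minus the signed adjacency matrix); the function name `signed_laplacian` is hypothetical, and the eigendecomposition step that yields the embedding is not shown.

```python
from typing import List

def signed_laplacian(A: List[List[int]]) -> List[List[int]]:
    """Return D_bar - A, where D_bar[i][i] = sum_j |A[i][j]| (assumed definition)."""
    n = len(A)
    return [[(sum(abs(v) for v in A[i]) if i == j else 0) - A[i][j]
             for j in range(n)] for i in range(n)]

# Toy signed graph: 0 -(+)- 1 and 0 -(-)- 2.
A = [[0, 1, -1],
     [1, 0, 0],
     [-1, 0, 0]]
L_signed = signed_laplacian(A)
```

Using absolute values in the degree keeps the diagonal positive even when a node has mostly negative links, which is what distinguishes this matrix from the ordinary Laplacian applied to a signed adjacency matrix.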

Furthermore, we propose to evaluate the following two variants of our model:

• SGCN-1: This method only makes use of the first single aggregation layer and therefore only separates the positive from the negative links (i.e., it does not yet make use of balance theory and our defined balanced paths).

• SGCN-1+: This method, similar to SGCN-1, does not make use of balance theory; instead it performs the naïve aggregation of the first layer, but twice. In other words, the final representation for each user is based on propagating information along the positive links twice, and the negative links twice, separately.

Some final notes are the following: 1) in our experiments we do not have node attributes, therefore we instead use the final embedding of the SSE model as the input feature matrix (i.e., $\mathbf{X}$) to all our SGCN variants; 2) for all embedding methods we fixed the dimension of the final low-dimensional representation to 64; 3) we used the authors’ released code for SiNE³ and their suggested hyperparameters [87] for our experiments; 4) for SIDE, we use the authors’ implementation⁴ and the suggested hyperparameter settings from [89], while for the parameters without suggested values we used a grid search around their code’s default settings; and 5) for our models we set $\lambda = 5$ and the “friend” and “foe” hidden representations were each set to size 32, such that the final embeddings were of size 64.

³ http://www.public.asu.edu/~swang187/codes/SiNE.zip
⁴ https://datalab.snu.ac.kr/side/resources/side.zip

Table 5.2: Statistics of two signed network dataset variants for SGCN.

Network     # Users    # Positive Links    # Negative Links
Slashdot    33,586     295,201             100,802
Epinions    16,992     276,309             50,918

Table 5.3: Link sign prediction results with AUC.

Embedding Method    Bitcoin-Alpha    Bitcoin-OTC    Slashdot    Epinions
SSE                 0.764            0.803          0.769       0.822
SiNE                0.778            0.814          0.792       0.849
SIDE                0.630            0.618          0.547       0.571
SGCN-1              0.780            0.818          0.784       0.663
SGCN-1+             0.785            0.817          0.804       0.722
SGCN-2              0.796            0.823          0.804       0.864

Table 5.4: Link sign prediction results with F1.

Embedding Method    Bitcoin-Alpha    Bitcoin-OTC    Slashdot    Epinions
SSE                 0.898            0.923          0.820       0.901
SiNE                0.888            0.878          0.854       0.914
SIDE                0.738            0.750          0.646       0.711
SGCN-1              0.910            0.918          0.853       0.851
SGCN-1+             0.912            0.923          0.865       0.893
SGCN-2              0.917            0.925          0.864       0.933

Comparison Results: The comparison results in terms of AUC and F1 are shown in Tables 5.3 and 5.4, respectively. From the tables, we make the following observations:

• SGCN-1, with only one step of aggregation from positive and negative links, obtains performance comparable to the best of the baselines in most cases. This observation suggests that it is necessary to separate positive and negative links.

• SGCN-1+ outperforms SGCN-1 in nearly all cases. The results indicate that propagating multiple steps during the aggregation can help improve the performance.

• Most of the time, SGCN-2 outperforms SGCN-1 and SGCN-1+. Aggregation following the longer balanced and unbalanced paths can boost the performance.

Figure 5.3: Parameter sensitivity when varying the parameter $\lambda$ on the Bitcoin-Alpha dataset. Panels: (a) SGCN-2 (F1); (b) SGCN-2 (AUC).

5.1.3.2 Parameter Analysis

The proposed signed GCN has one major hyperparameter, λ (besides the number of layers and the
aggregation types, which we have already investigated with our variants of SGCN). The parameter λ
controls the balance between the two terms in our objective function as given in Eq. (5.7).
More specifically, the first term is a multinomial logistic regression term that guides the learned
node embeddings to be separable, such that pairs of users that have a positive link, a negative
link, or no link are positioned so that the classifier can distinguish their relationship.
The second term is used to discover node embeddings that adhere to extended structural balance
theory [154]. With its contribution controlled by λ, this term forces pairs of users that have
positive links to be closer in the low-dimensional embedding space than users they have no link
with, and further seeks to push users with negative links farther apart by requiring pairs with no
link to be closer together than the negative pairs.
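The interplay of the two terms can be illustrated with a minimal sketch. It is not the exact form of Eq. (5.7): the function names, the class encoding, the squared Euclidean distance, the hinge form of the penalties, and the per-term averaging are all assumptions made for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(values):
    return sum(values) / len(values) if values else 0.0


def classification_term(class_probs, labels):
    """Average cross-entropy over node pairs; classes are 0 = positive,
    1 = negative, 2 = no link (a hypothetical encoding)."""
    return mean([-math.log(p[y]) for p, y in zip(class_probs, labels)])


def balance_term(z, pos_triples, neg_triples):
    """Hinge penalties over triples (i, j, k), where i and k share no link.

    pos_triples: i-j is positive, so i should be closer to j than to k.
    neg_triples: i-j is negative, so i should be closer to k than to j.
    """
    pos = [max(0.0, sq_dist(z[i], z[j]) - sq_dist(z[i], z[k]))
           for i, j, k in pos_triples]
    neg = [max(0.0, sq_dist(z[i], z[k]) - sq_dist(z[i], z[j]))
           for i, j, k in neg_triples]
    return mean(pos) + mean(neg)


def objective(class_probs, labels, z, pos_triples, neg_triples, lam):
    """Classification term plus lam times the balance-theory term."""
    return (classification_term(class_probs, labels)
            + lam * balance_term(z, pos_triples, neg_triples))
```

With `lam` set to zero the objective reduces to the classification term alone, which corresponds to the setting examined in the parameter analysis.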


In Figures 5.3(a) and 5.3(b) we report the results when varying λ for one of the signed network
datasets, namely Bitcoin-Alpha. We do not show results for other settings since they yield similar
observations. As we can see from these two figures, λ = 5 seems to offer a good balance between the
AUC and F1 performance. The second observation is that setting λ to zero causes a drastic decrease
in performance. Note that we saw similar results across all datasets, in that the contribution of
the second term, based on balance theory, provided an improvement.

5.2 Role-based Signed Network Embedding

Most existing network embedding methods have been designed for networks with only a single
edge type [155], where a relation between two nodes implies closeness. Hence, they primarily
try to encode an unsigned network in a way that neighboring nodes are closer in the embedding
space [155]. However, real-world networks might have more than one link type, i.e., a network can
have multiple types of links where each type represents a different quality of relation between the
nodes. Signed networks are an important class of such networks, having two types of links: positive
and negative [55, 7].

A variety of social media sites, such as Amazon, Wikipedia, and Epinions can be represented
as a signed network where positive signs represent trust, agreement or friendship while negative
ones may show distrust, disagreement or enmity. The underlying principles of signed networks can
be quite diﬀerent from those of classic networks due to having both positive and negative links.
Therefore, network embedding for signed networks cannot be carried out by simply applying classic
embedding models. While embedding of signed networks is challenging, it has the potential to
greatly advance network analysis tasks such as link/sign prediction [87].

Recently, signed network embedding has attracted increasing attention [156, 157, 158, 118].
Similar to many embedding models for unsigned networks, these models try to embed the network
by finding similarities between nodes, assuming connecting paths represent closeness. However,
in signed networks path-based similarities are challenging since signed paths can indicate
either closeness or distantness. Existing works [118, 156, 157] solve this challenge by relying on
two signed social theories, namely balance theory [29, 30] and status theory [2]. However, the
general architecture by which they define similarity between two nodes and use social theories is
associated with two major challenges. First, social theories are incomplete in explaining signed
network structure, so models built on them sometimes produce lower quality embeddings, depending
on how closely the networks align with these theories. Second, classic embedding models aim to
capture the presence/absence of links, while existing signed embedding models only use two of the
possible interaction states: positive and negative links. Thus, since they ignore link absence (the
third interaction state), they cannot reconstruct the presence/absence of links well, resulting in
low performance in link prediction.

Figure 5.4: Transformation of a signed network with two nodes to an unsigned bipartite network
of role-nodes.

To address these shortcomings, we lay out a new perspective for network embedding denoted
as network transformation based embedding: if embedding the original network is challenging, it
can be transformed into another network for which the embedding task has lower complexity. The
transformation can be done by mapping each node in the original network to multiple nodes in the
transformed network. Next, the transformed network can be embedded. Finally, the embedding
vectors obtained from the transformed network can be aggregated to encode the original network.
More specifically, to embed signed networks, we introduce ROle based Signed network Embedding
(ROSE), which bypasses the aforementioned challenges. The underlying idea is to transform the
signed network into a bipartite network where each node takes both “user” and “item” roles, for
which it is the giver and receiver of signed links, respectively. Therefore, each node of a signed
network can be modeled by a set of roles, denoted as role-nodes, where the relations between
role-nodes can be fully captured using unsigned links. Ultimately, this transformed network can
then be embedded using a state of the art unsigned embedding technique. Figure 5.4 is a toy example
of the transformation process.

Each role-node captures a certain aspect of a node in the original network. Hence, a comprehensive
embedding of a node can be obtained by aggregating the embeddings of its role-nodes.
We introduce two aggregation methods, denoted as fixed aggregation and target-aware aggregation.
The fixed aggregation simply concatenates all the role-node embeddings together. The target-aware
aggregation is based on a recent deep learning based recommendation model that introduced
target dependent encodings of users [159]. Based on this idea, we propose an attention
mechanism based model to aggregate the embeddings of role-nodes, in which the attention weights
are obtained with respect to the target entity. To the best of our knowledge, this is the first
work to build target-aware embeddings of nodes in a signed network.

5.2.1 Problem Statement
Let a graph be defined as $G(V, E)$ with a link type mapping function $\tau : E \to \mathcal{R}$,
where $V$ represents the nodes, $E$ represents the links, and each link $e \in E$ belongs to a link
type $\tau(e) \in \mathcal{R} = \{+, -\}$. In unsigned networks $\mathcal{R}$ has only one value,
but in signed networks it has two values: positive and negative. Given the graph $G$, the task of
node encoding is to learn a function $f : V \to \mathbb{R}^{d}$ that maps each node $v$ to a
$d$-dimensional embedding vector, which can be parametrized by the matrix $Z$ of size
$|V| \times d$.

5.2.1.1 Unsigned Network Embedding

Embedding models can be described as an encoding-decoding framework [160] having four
components: 1) a pairwise node similarity function; 2) an encoder function to create embeddings
from the similarity function; 3) a decoding function to recover the pairwise node similarities
from their embeddings; and 4) a loss function that evaluates the reconstructed similarity values.
The primary difference in the literature is how the embedding methods define node similarity.
However, the shared principle in unsigned similarity functions is that an unsigned path between two
nodes indicates their closeness (e.g., as in [161, 150, 149]).

5.2.1.2 Signed Network Embedding

The unsigned similarity functions cannot be directly applied to signed networks because negative
edges do not represent closeness. Thus, the challenge in embedding signed networks is how to
involve negative edges without hindering positive proximity. Existing methods have sought to
capture node similarity using the paths between them at length one, two, or higher order paths.

Single-length paths: A trivial approach to embed signed networks while involving negative
links is to embed a node based on its immediate neighbors [162, 88]. This, however, has limited
effectiveness because it cannot capture the higher order proximities between nodes. Capturing
global structures in signed networks is nevertheless challenging, e.g., given a path containing
$m$ positive links and $n$ negative links, how can the similarity of the nodes be defined? Does the
path indicate closeness or distantness between the nodes? To help answer this, previous works used
the two signed social theories, namely balance and status.

Paths of length two: As the above social theories are formed on triangle structures, existing
methods [87, 157] have used them to determine whether the signed path is representing closeness
or distantness between two nodes. However, these methods do not go beyond paths of length two.
As such, they have limited power in capturing global structures.

Longer paths: More recent work has tried to capture longer cycle paths in the embedding
process, mainly by relying on the extended versions of the social theories. Both [156] and [163]
run random walks on signed networks, similar to the node2vec algorithm, while [133] applied random
walks to node relevance and personalized ranking [164]. In addition, a graph convolutional network
method for embedding signed networks has been introduced which relies on balance theory [118].

In all, the shared strategy of all these works is that they embed nodes by analyzing the paths
between them. If a path indicates closeness, they embed the nodes closer, and distance them
otherwise. However, to interpret whether a path indicates closeness or distantness, they rely on
some strong assumptions, which naturally introduce noise into the embedding process. Also, this
strategy does not use a principled way to distance nodes based on the absence of links/paths
between them, i.e., it only focuses on capturing positive/negative paths.

5.2.2 Role-based Signed Network Embedding

In this section, we describe the structure of ROSE. Based on the drawbacks of the previous works,
we outline the following requirements: an effective universal network embedding model should
1) capture the higher order connectivity between nodes, 2) take into account the link labels
as well as the link structures (presence or absence of links), and 3) make no assumptions about
the origin of the network.

Network transformation based embedding: To address the requirements, we introduce the
general notion of network transformation based embedding. Rather than directly finding the
similarities of the nodes in the input network, the network can be transformed into another network
in which we do not encounter the embedding challenges present in the input network. One possible
way to perform the transformation is to define different roles for a node, denoted as “role-nodes”,
and build a network of role-nodes in a way that the similarities between role-nodes can be
determined by adopting the classic similarity functions. Since each role-node captures a certain
aspect of an original node, the embedding vector of a target node can be derived by aggregating the
embeddings of the corresponding role-nodes. In sum, a network transformation based embedding model
can be described in three main steps: 1) network transformation; 2) embedding the transformed
network; and 3) embedding the original network by aggregating the embeddings of the transformed
network. By relying on the general idea of network transformation, we propose ROSE. In the
following, ROSE is described based on the aforementioned three-step architecture. We then
illustrate how ROSE addresses the requirements of the problem.


Figure 5.5: Transformation process of the input signed network to the network of role-nodes.

5.2.2.1 Network Transformation

The way the role nodes are deﬁned is fundamental to the eﬀectiveness of ROSE. We aim to
deﬁne the transformation such that similarities between role-nodes can be obtained using classical
methods. Note that there can be multiple ways to deﬁne the transformation process. Various
embedding techniques have been introduced based on diﬀerent similarity measures. Similarly,
diﬀerent embedding methods can be developed based on the idea of transformation based embedding
by creating diﬀerent transformations. Our transformation idea is inspired by recommender systems.
Traditionally, user-item interactions in recommender systems are modeled by a user-item bipartite
network. A signed all-to-all connected network can also be viewed as a bipartite network where
each node plays a “user” role for the links it creates and plays an “item” role for the links it receives.
Based on this analogy, we capture user/item roles of a node separately through a transformation
process.

Step 1: Transformation to a bipartite network. Based on the user-item analogy, each node is
mapped to two role-nodes, i.e., the node $u$ is mapped to the role-nodes $u_{out}$ and $u_{in}$,
where a link from $u$ to $v$ in the original network is modeled as an undirected link between
$u_{out}$ and $v_{in}$. As can be seen in Figure 5.5(a), the input network is transformed to a
signed bipartite network [165] with two types of nodes, “in” and “out”. However, applying a classic
similarity measure on the transformed network is still a challenge due to the presence of positive
and negative links.

Step 2: Transformation to an unsigned network. We transform the network into an unsigned
network by defining new role-nodes. A role-node of type “in”, $u_{in}$, is mapped into two
role-nodes: $u_{in}^{+}$ (or $u_{in}^{-}$) representing its role when positive (or negative) links
point toward it. Accordingly, a link from $u$ to $v$ with label $s$ is modeled as an unlabeled,
undirected link between $u_{out}$ and $v_{in}^{s}$. This enables us to use the well-established
similarity functions to determine the similarities of role-nodes.

Step 3: Augmenting the network. Our strategy is to encode the original network by embedding
the transformed network. However, some of the role-nodes may have a very low degree. In
particular, role-nodes of type “in-” tend to have a very low degree because negative links are
often under-represented compared to positive links. According to our results, this can
dramatically hinder the accurate embedding of such role-nodes. To solve this, we leverage
implicit knowledge about the problem domain. If node $u_{out}$ has connections towards both
$v_{in}^{+}$ and $w_{in}^{-}$, this not only reflects the adjacency between these two role-nodes
but also implies dependence between their opposite role-nodes, $v_{in}^{-}$ and $w_{in}^{+}$. To
bring this knowledge into our embedding process, which can attenuate the sparsity problem, we
augment our unsigned network with a set of dummy nodes of type “out”, i.e., for each node of type
“out” in the unsigned network with the set of connections
$\{v_{1,in}^{s_1}, v_{2,in}^{s_2}, ..., v_{n,in}^{s_n}\}$, we add a node of type “out-dummy” with
the set of connections $\{v_{1,in}^{s'_1}, v_{2,in}^{s'_2}, ..., v_{n,in}^{s'_n}\}$, where $s'$ is
the inverse of $s$, i.e., if $s$ is “-”, $s'$ is “+” and vice versa.

Summary of transformation: In sum, $G(V, E)$ is transformed to a bipartite unsigned graph
$G'(V', E')$ where $|V'| = 4|V|$ and $|E'| = 2|E|$, i.e., a node $u \in V$ is mapped to four
role-nodes in $V'$: 1) $u_{out}$, which initiates a link, 2) $u_{out'}$, which initiates a dummy
link, 3) $u_{in}^{+}$, which receives a positive link, and 4) $u_{in}^{-}$, which receives a
negative link. A link $e_{u,v}$ with the label $s$ is transformed to the links
$(u_{out}, v_{in}^{s})$ and $(u_{out'}, v_{in}^{s'})$. Figure 5.5 depicts a toy example describing
the process. It should be noted that the transformation is lossless, i.e., $G(V, E)$ can be fully
reconstructed from $G'(V', E')$.
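The transformation steps above can be sketched in a few lines. This is a minimal illustration and not the implementation used in the experiments: the function names `transform` and `reconstruct`, the tuple encoding of role-nodes, and the assumption of at most one signed link per ordered node pair are hypothetical choices made for the example.

```python
def transform(signed_edges):
    """Map signed directed edges (u, v, s), with s in {'+', '-'}, to the
    unsigned role-node links, including the inverse-sign dummy links."""
    flip = {"+": "-", "-": "+"}
    role_edges = set()
    for u, v, s in signed_edges:
        # Real link: the "out" role of u to the signed "in" role of v.
        role_edges.add(((u, "out"), (v, "in" + s)))
        # Dummy link: the "out-dummy" role of u to the opposite "in" role of v.
        role_edges.add(((u, "out-dummy"), (v, "in" + flip[s])))
    return role_edges


def reconstruct(role_edges):
    """Recover the signed edges from the non-dummy role-node links."""
    return {(u, v, role[-1])
            for (u, kind), (v, role) in role_edges
            if kind == "out"}
```

Each signed edge yields exactly two unsigned links, and reading back only the links that start at an "out" role-node recovers the original signed edge set, which mirrors the lossless property stated in the summary.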

5.2.2.2 Embedding the Original Network

Analogous to unsigned networks, the links between role-nodes indicate their closeness. Hence, a
classic embedding model can be used to embed role-nodes. We employ node2vec [150]. Note that
more advanced embedding models [155, 166] can be used/designed for this purpose. However,
in this work, our focus is on introducing the structure of ROSE. Once we have the embeddings of
the role-nodes, a node’s embedding is created by aggregating the embeddings of the corresponding
role-nodes. In the following, we introduce two different aggregation models.

Fixed Aggregation: Each of the role-nodes represents a certain perspective/role of a node in the
original network. In general, if there are multiple representations of an entity, we can concatenate
them or linearly combine them to build a unified representation. Accordingly, a straightforward
way to build a comprehensive and unified embedding of a node is to concatenate the embedding
vectors of the corresponding role-nodes. As such, the fixed representation of node $u$ can be
defined as $z_u = z_{u_{out}} \,\|\, z_{u_{in}^{+}} \,\|\, z_{u_{in}^{-}}$, in which $z_x$ denotes
the embedding of role-node $x$ and $\|$ represents concatenation. Note that dummy role-nodes are
not used in the aggregation process. In fact, “out-dummy” role-nodes are inverses of the role-nodes
of type “out” and do not add extra knowledge about the representations of the original nodes.

Target Aware Aggregation: One important application of graph embedding is to use the
embedding vectors to predict the pairwise interactions of nodes, e.g., link prediction. Intuitively,
for such tasks, it is more accurate to encode the given initiator node with respect to the target
entity.
In fact, the idea of target-aware proﬁling is the basis for most of the recommendation
models. For example, in item-based collaborative ﬁltering, to predict the rating of a user towards
an item, her previous ratings are aggregated in a weighted way because not all of her interactions
are equally important in reﬂecting her taste towards the item [167]. Typically, the weight of a rating
is determined based on the similarity of the corresponding item to the target item [159].

Inspired by recommender systems, we introduce a target-aware embedding technique by proposing
a target-aware aggregation model. To the best of our knowledge, the existing techniques build
only fixed embeddings. In our framework, intuitively, predicting the pairwise interaction from $u$
to $v$ depends on the “out” role-node of $u$ and the “in” role-nodes of $v$. We propose that the
“out” role-node of $u$ can be embedded according to $v$, which is denoted by $z_{u_{out}}^{v}$.
Having $z_{u_{out}}^{v}$, the target dependent embedding of $u$ w.r.t. $v$ is defined as
$z_{u}^{v} = z_{u} \,\|\, z_{u_{out}}^{v}$. Indeed, we concatenate the fixed embedding of $u$ with
a component that depends on the target entity to build its target-aware embedding.
To build $z_{u_{out}}^{v}$, we design an attention mechanism. We suggest that $z_{u_{out}}^{v}$ can
be obtained by attending to the neighbors of $u_{out}$ based on their relevancy to the target
entity $v$. More formally, $z_{u_{out}}^{v}$ is defined as:

$$ z_{u_{out}}^{v} = \sum_{w \in N(u_{out})} \left[ a(w, v) \, z_{w} \right], $$

where $a(w, v)$ is the importance weight of $w$ w.r.t. $v$ and $N(u_{out})$ is the set of
role-nodes connected to $u_{out}$. To estimate the importance weights, we introduce an unsupervised
attention model. The intuition behind the model is that the “in” role-nodes of two nodes are more
related if they are more tightly connected in the network. Note that we do not take into account
the labels of connections to find the relevancy of two nodes. To systematically implement the idea,
given the target signed network, we assume the links are unsigned and transform it to a bipartite
network where the obtained network has two types of role-nodes: “in” and “out”. Next, the
transformed network is embedded using node2vec. Finally, $a(w_{in}, v_{in})$ is defined as follows:

$$ a(w_{in}, v_{in}) = \sigma(z_{w_{in}}, z_{v_{in}}) = \frac{1}{1 + \exp(-z_{w_{in}} \cdot z_{v_{in}})}. $$

In fact, this weight determines how tightly $w$ and $v$ are connected in terms of the nodes that
rated them, regardless of the rating values.
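The two aggregation schemes can be illustrated as follows. This is a sketch under stated assumptions, not the implementation used in the experiments: embeddings are plain Python lists, role-nodes are keyed as `(node, role)` tuples, the dictionary `z_unsigned` stands for the auxiliary embeddings of the unsigned “in”/“out” network, and all function and parameter names are hypothetical.

```python
import math


def fixed_embedding(z, u):
    """Concatenate the "out", "in+", and "in-" role-node embeddings of u."""
    return z[(u, "out")] + z[(u, "in+")] + z[(u, "in-")]


def attention_weight(p, q):
    """Sigmoid of the dot product of two embedding vectors."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(p, q))))


def target_aware_embedding(z, z_unsigned, neighbors, u, v):
    """Fixed embedding of u concatenated with a target-dependent component.

    neighbors[(u, "out")] lists the signed "in" role-nodes linked to the
    "out" role-node of u; each is weighted by its relevancy to target v.
    """
    context = [0.0] * len(z[(u, "out")])
    for w, role in neighbors[(u, "out")]:
        a = attention_weight(z_unsigned[(w, "in")], z_unsigned[(v, "in")])
        context = [c + a * x for c, x in zip(context, z[(w, role)])]
    return fixed_embedding(z, u) + context
```

The weighted sum is left unnormalized here; whether the weights should be normalized over the neighborhood is a design choice that this sketch does not settle.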

5.2.2.3 Model Justiﬁcation

We outlined three main requirements for an effective signed embedding technique, and ROSE fulfills
them. 1) Unlike existing models, ROSE does not rely on any assumption about the origin
of the network. 2) To obtain the embeddings of role-nodes, we use a random walk based model
to ensure the obtained embeddings capture the higher order proximities. 3) The model preserves
both link labels and link structures. To address the sign/link prediction tasks, the embeddings
from ROSE can be fed to a nonlinear function, such as an MLP, trained to determine the target
interaction state. In fact, the embeddings of role-nodes contain major patterns that can help fully
reconstruct the graph. We encode the role-nodes in a way that if a link with label $s$ exists from
$u$ to $v$, $u_{out}$ has higher proximity to $v_{in}^{s}$ than to $v_{in}^{s'}$ (where $s'$
denotes the opposite label). And if there is no link from $u$ to $v$, $u_{out}$ is expected to have
low proximities to both $v_{in}^{s}$ and $v_{in}^{s'}$. As such, having a function to find the
similarities of embeddings, the label of the link from $u$ to $v$ is expected to be $s$ if the
proximity of $u_{out}$ to $v_{in}^{s}$ is greater than its proximity to $v_{in}^{s'}$. And if
$u_{out}$ has low similarity to both $v_{in}^{+}$ and $v_{in}^{-}$, it indicates the absence of a
link. Moreover, we observe other interesting patterns in our experiments, which will be discussed
in the experiments section. In addition to addressing the requirements of the problem, the proposed
framework creates an avenue to make a connection between the recommender systems and signed
networks contexts. Lastly, the proposed model is quite generalizable. Since the model does not rely
on any assumption specific to signed networks, it can be generalized to networks with multiple
types of links.
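The proximity pattern described above suggests a simple decision rule, sketched below. This is only an illustration of the pattern: the experiments use a trained classifier rather than a fixed rule, and the Euclidean distance, the `absent_threshold` parameter, and the function names are assumptions introduced for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interaction_state(z, u, v, absent_threshold):
    """Guess the interaction state from u to v using role-node distances.

    z maps (node, role) tuples to embedding vectors. If the "out" role of u
    is far from both "in" roles of v, no link is predicted; otherwise the
    sign of the closer "in" role is returned.
    """
    d_pos = dist(z[(u, "out")], z[(v, "in+")])
    d_neg = dist(z[(u, "out")], z[(v, "in-")])
    if min(d_pos, d_neg) > absent_threshold:
        return "absent"
    return "+" if d_pos <= d_neg else "-"
```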


5.2.3 Experiments

We conducted experiments to verify the eﬀectiveness of the proposed framework and the ideas
behind the model. The experiments are focused on answering two key questions:

• How do the proposed embedding frameworks perform when compared to the state of the art

models in terms of link-label prediction and link prediction tasks?

• What is the interpretation of the embeddings obtained from the network of role-nodes?

Datasets: Three real-world datasets were used in the experiments: Epinions [3], WikiElection
[168], and Slashdot [169], all of which have been used in previous works [55]. WikiElection: In
Wikipedia elections, users may give positive or negative votes for the promotion of other users to
administrator. The WikiElection dataset is the signed network obtained from users’ votes in
administrator elections. Epinions: Epinions was an online product review site. Users can express
positive or negative votes to other users regarding the trustworthiness of their reviews; this
dataset is built from the positive/negative votes between users. Slashdot: The Slashdot dataset is
also obtained from an online service (a technology news website) where users can share comments and
flag each other as friend or foe. The flags indicate approval or disapproval of comments. Analogous
to Epinions, the Slashdot dataset models the interactions of users using a signed network.
Statistics of the datasets are given in Table 5.5.

Table 5.5: Statistics of three signed network dataset variants for ROSE.

Dataset         Nodes     Edges     Positive Edges    Negative Edges
WikiElection    7118      103747    78.7%             21.2%
Epinions        119217    841200    85.0%             15.0%
Slashdot        82144     549202    77.4%             22.6%

5.2.3.1 Performance Comparison

In this experiment, we compared the performance of ROSE with four recently introduced signed
network embedding models on two tasks: sign prediction and link prediction. Moreover, we
compared the target dependent variant of ROSE denoted as ROSE-UAT with the ﬁxed variant
represented by ROSE on both of the tasks. AUC (Area Under the Curve) was used as the evaluation
metric.

The following is the list of the models used in this experiment. SIDE is a random walk based
approach that aims to capture global structures in the embedding process [156]. BESIDE aims to
use both balance and status theories in a complementary manner to encode signed networks [157].
SiNE is a deep learning based framework that operates on undirected networks; the main
principle behind the model is that “users should sit closer to their friends than their foes” [87].
SIGNet is also a random walk based model that maintains structural balance using targeted negative
sampling [88].

Evaluation: In our datasets, the number of negative edges is much smaller than the number of
positive edges. Thus, comparing methods based on their original test set accuracy could be
misleading, especially for sign prediction. Therefore, analogous to previous works [4, 157], we
balanced the datasets by randomly removing positive links and used 5-fold cross-validation for our
experiments. The baselines were evaluated using the source code released by their authors. The
embedding dimension for all of the models was set to 30.
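The balancing and cross-validation protocol can be sketched as follows. This is an illustrative reading of the description above, not the evaluation code itself: the function names, the `(u, v, sign)` edge format, and the fixed random seeds are assumptions.

```python
import random


def balance_by_undersampling(edges, seed=0):
    """Randomly drop positive edges until they match the negative count."""
    rng = random.Random(seed)
    pos = [e for e in edges if e[2] == "+"]
    neg = [e for e in edges if e[2] == "-"]
    return rng.sample(pos, min(len(pos), len(neg))) + neg


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) pairs in which every item is tested exactly once."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, folds[i]
```

Only positive edges are removed, so the negative edges, which are the scarce class, are all retained.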


Table 5.6: AUC of the proposed model (ROSE) and the baseline methods on the WikiElection,
Slashdot and Epinions datasets.

Sign Prediction

Link Prediction

Model WikiElection Slashdot Epinions WikiElection Slashdot Epinions
0.9314
SIDE
0.9397
BESIDE
0.6488
SiNE
0.9205
0.9403
0.9444

0.7986
0.8953
0.8632
0.8943
0.9091
0.9116

0.8815
0.9012
0.8680
0.8997
0.9082
0.9095

0.9342
0.9265
0.5983
0.8862
0.9357
0.9391

SIGNET
ROSE

ROSE-UAT

0.8672
0.9342
0.8543
0.9181
0.9533
0.9547

0.9184
0.9092
0.5833
0.9099
0.9418
0.9426

Sign Prediction: The problem of sign prediction [3, 124] is the major task that has been used
to evaluate encoding models in previous works. Table 5.6 shows the AUC of the models on the
three datasets. First, we notice that ROSE-UAT and ROSE outperform all the baselines. For example,
ROSE-UAT outperforms BESIDE by 1.6%, 0.8%, and 2% in terms of AUC on the WikiElection,
Slashdot, and Epinions datasets, respectively. The higher accuracy of ROSE can be attributed to its
effectiveness in addressing the requirements of the problem. Additionally, we observe that ROSE-UAT
performs better than ROSE. In fact, encoding the nodes with respect to a target entity helps to
better analyze the interactions of the node and the entity.

Link Prediction: Although link prediction is an important task in network mining [54],
previous works have not evaluated their models on the link prediction task. To evaluate the
models for link prediction, we first fed the training graph to the models and obtained the
node encodings. Next, we created training and test sets. Each data instance in the training/test
sets is the concatenation of the encoding vectors of a node pair $(u, v)$, and the label of the
instance is 1 if there is a link from $u$ to $v$ and 0 otherwise. In both training and test sets,
50% of instances have label 1 and 50% have label 0. The node pairs with label 0 were randomly
selected. The training set obtained from each embedding method was fed to a multi-layer perceptron
classifier, and the AUC of the trained model was obtained on the test set. Table 5.6 shows the
results of the experiments. As can be seen, ROSE has performance superior to the baseline models on
all but one of the datasets, since the model systematically differentiates the three different
interaction-states between nodes in its embedding process.
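The construction of the link prediction instances and the AUC computation can be sketched as below. This is a minimal illustration, not the experimental pipeline: the function names and data layout are assumptions, the classifier itself is omitted, and the AUC is computed with a simple pairwise comparison that is only practical for small inputs.

```python
import random


def build_link_instances(z, edges, nodes, seed=0):
    """Return (features, label) pairs with as many non-links as links.

    z maps a node to its embedding (a list); features are the concatenated
    embeddings of the ordered pair (u, v). Assumes the graph has at least
    as many unlinked ordered pairs as links.
    """
    rng = random.Random(seed)
    linked = set(edges)
    instances = [(z[u] + z[v], 1) for u, v in sorted(linked)]
    negatives = set()
    while len(negatives) < len(linked):
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in linked:
            negatives.add((u, v))
    instances += [(z[u] + z[v], 0) for u, v in sorted(negatives)]
    return instances


def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The scores passed to `auc` would come from whichever classifier is trained on the instances, for example the predicted probability of label 1.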


Figure 5.6: The average pairwise distance of the encoding vectors of the role-nodes of a node pair
$(u, v)$ for different interaction-types between them: positive link, negative link, and absence of
a link. Panels: (a) Epinions, (b) Slashdot, (c) Wikipedia; each panel shows the positive, negative,
and absent cases over the role-nodes $u_{out}$, $u_{in}^{+}$, $u_{in}^{-}$ and $v_{out}$,
$v_{in}^{+}$, $v_{in}^{-}$.

5.2.3.2 Interpretation of the Encodings of Role-nodes

Given nodes $u$ as initiator and $v$ as receiver, three interaction-states can be considered between
them: absence of a link, positive link, and negative link. We introduced the major pattern extracted
from the distance/similarity of the encoding vectors of $u$ and $v$ that can help determine the
interaction type between them. Here we investigate the existence of such patterns. Figure 5.6 shows
the average distance of different encoding components of a node pair $(u, v)$ as a function of the
interaction-state between them. For example, in the Slashdot dataset, if the link is positive the
average distance of $u_{out}$ from $v_{in}^{+}$, denoted as $d(u_{out}, v_{in}^{+})$, is 1.2. The
average distances of the components are consistent with the introduced major patterns. If the link
from $u$ to $v$ is positive, $d(u_{out}, v_{in}^{+})$ is expected to be smaller than
$d(u_{out}, v_{in}^{-})$. For example, for a positive link in Epinions, $d(u_{out}, v_{in}^{+})$ is
1.1 while $d(u_{out}, v_{in}^{-})$ is 1.7. Moreover, we observe that if there is a link from $u$ to
$v$, $d(u_{out}, v_{in}^{-}) + d(u_{out}, v_{in}^{+})$ is smaller than in the case when there is no
link. For example, in Slashdot, $d(u_{out}, v_{in}^{-}) + d(u_{out}, v_{in}^{+})$ is 3.3 if there is
no link from $u$ to $v$ and it is 2.75 if there is a link. In addition to the major pattern, we
observe four other patterns. We call these implicit patterns because our model was not designed to
extract them.

First, if the sign of the link from $u$ to $v$ is positive, similar nodes rate them similarly, and
if it is negative, similar nodes rate them with different signs. In fact, a smaller distance between
the embeddings of the role-nodes of type “in+” and of type “in-” of two nodes means that they were
rated similarly by similar nodes. For example, in the Epinions dataset $d(u_{in}^{+}, v_{in}^{+})$
and $d(u_{in}^{-}, v_{in}^{-})$ are 1.1 and 1.3, respectively, when the sign of the edge from $u$ to
$v$ is positive, while those distances are 1.7 and 1.6, respectively, when the edge sign is
negative. This pattern is aligned with balance theory, i.e., the triangle structures described in
balance theory can be regarded as a special case of this pattern [170]. Second, $u$ and $v$ rate
similar nodes more similarly when there is a positive link between them than when there is a
negative link connecting them. Smaller distance values between the embeddings of the role-nodes of
type “out” of two nodes indicate that they have rated similar nodes similarly. As can be seen,
$d(u_{out}, v_{out})$ is smaller when there is a positive link from $u$ to $v$. Again, balance
theory can be regarded as a special case of this pattern. Third, the signs of the links between two
nodes in different directions are correlated: $d(v_{out}, u_{in}^{+}) - d(v_{out}, u_{in}^{-})$ is
smaller when there is a positive link from $u$ to $v$ than when there is a negative link. This
pattern is contradictory to status theory. Fourth, the average distance between the embeddings of
the role-nodes of two nodes is considerably larger when there is no link between them than when
there is a link. A large distance between the embeddings of two nodes implies they are not tightly
connected and belong to different clusters.


CHAPTER 6

SIGNED NETWORK APPLICATIONS

In this chapter¹,²,³, we investigate applying signed networks to understanding and solving real-world application problems. In traditional unsigned network analysis, many data mining applications are improved by harnessing these networks, such as information propagation [171] and recommendation [172]. Similarly, signed networks have been used to improve these traditional applications in information propagation [111, 173, 174] and recommendation [175, 132, 130]. In addition, algorithms have recently been developed both for predicting interaction polarity scores [176, 177, 178, 179] and for predicting link signs between users [61, 131, 4, 180]. However, there are still challenges associated with signed networks when predicting interaction/link polarities, such as the cold-start problem [181, 179], which arises when users first join the system and have not yet logged many interactions/links. Furthermore, most of these methods have tackled the two tasks independently, which does not utilize the linkage between the two when the content (i.e., the “items”) is generated by the users themselves. Thus, in Section 6.1 we present a joint model that predicts the interaction polarity scores between users and content while simultaneously predicting the polarity of links directly between users.
In addition to the traditional applications, there are also a plethora of tasks in other domains that can benefit from signed networks. More specifically, signed networks are used in chemistry (i.e., Möbius graphs) for studying molecular systems [182]; in ecology for analyzing community structure [183]; in physics for modeling frustration in spin glasses [184]; and in political science for analyzing balance and polarization [185]. In addition, other domain areas such as health (e.g., weight loss prediction using social networks [186]) and education (e.g., utilizing social networks to better understand teachers [187, 188]) have also benefited from network analysis. Thus, in Section 6.2 we present a comprehensive congressional vote prediction framework built around modeling both ideological and social factors, with the latter being modeled as a signed bipartite network.

¹ Tyler Derr, Zhiwei Wang, Jamell Dacon, and Jiliang Tang. “Link and Interaction Polarity Predictions in Signed Networks.” Social Network Analysis and Mining. 2020.

² Tyler Derr, Zhiwei Wang, and Jiliang Tang. “Opinions Power Opinions: Joint Link and Interaction Polarity Predictions in Signed Networks.” In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2018.

³ Tyler Derr*, Hamid Karimi*, Aaron Brookhouse, and Jiliang Tang. “Multi-Factor Congressional Vote Prediction.” In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2019.

6.1 Link and Interaction Polarity Prediction

Online social media has become an increasingly popular place for people to share and exchange
their opinions. These opinions among users can be expressed in two main ways, namely either
directly or indirectly. More speciﬁcally, in signed networks, users can directly specify opinions
to others via establishing positive or negative links; and they also can give opinions to content
generated by others via a variety of social interactions such as commenting and rating. Intuitively
these two types of opinions should be related. For example, users are likely to give positive (or
negative) opinions to content from those they have established positive (or negative) links; and users
tend to create positive (or negative) links with those that they frequently positively (or negatively)
interact with. Therefore, we can leverage one type of opinion to power the other by capturing the correlation between the two. Hence, a joint framework has the potential to mitigate the data sparsity and cold-start problems in both predictive tasks, namely link prediction and interaction polarity prediction.

6.1.1 Problem Statement
Let $\mathcal{U} = \{u_1, u_2, \ldots, u_n\}$ denote the set of $n$ users. We represent signed links between users in an adjacency matrix $\mathbf{T} \in \mathbb{R}^{n \times n}$, where $\mathbf{T}_{ij} = 1$ if $u_i$ creates a positive link to $u_j$, $-1$ if $u_i$ creates a negative link to $u_j$, and $0$ otherwise (i.e., when $u_i$ has shown no link to $u_j$). Let $\mathcal{R} = \{r_1, r_2, \ldots, r_m\}$ be the set of $m$ content items generated by $\mathcal{U}$. We use $\mathbf{A} \in \mathbb{R}^{n \times m}$ to denote the authorship matrix, where $\mathbf{A}_{jk} = 1$ if $u_j$ creates $r_k$ and $\mathbf{A}_{jk} = 0$ otherwise. Social media provides multiple ways for users to express their opinions on content items generated by other users. For example, Facebook and Twitter allow their users to comment on content; YouTube provides thumbs-up and thumbs-down buttons; and Epinions enables its users to rate the helpfulness of the content with scores from 1 to 6. We use $\mathbf{H} \in \mathbb{R}^{n \times m}$ to denote opinions expressed by $\mathcal{U}$ on $\mathcal{R}$, where $\mathbf{H}_{ik} = 1$ (or $-1$) if $u_i$ gives a positive (or negative) opinion to $r_k$, and $\mathbf{H}_{ik} = 0$ indicates that $u_i$ has expressed no explicit opinion on $r_k$. Note that in this work, we define positive (or negative) interactions between $u_i$ and $u_j$ as $u_i$ giving positive (or negative) opinions to content items generated by $u_j$. In other words, an interaction between users is defined as a triplet $(u_i, u_j, r_k)$, where $u_i$ expresses an opinion on $r_k$, which was generated by $u_j$.

Table 6.1: Extended Epinions dataset statistics.

Statistic                      Value
# of Users                     233,429
# of Positive Links            717,667
# of Negative Links            123,705
Density of T                   1.54 × 10^-5
# of Reviews                   755,722
# of Positive Interactions     12,581,553
# of Negative Interactions     1,086,551
Density of H                   7.75 × 10^-5

With the above notations and deﬁnitions, our problem is stated as follows: given the signed
relations T, the authorship matrix A and the user-item opinion matrix H, we aim to learn a predictor
that can infer signed links and interaction polarities simultaneously by leveraging T, A, and H.
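To make the inputs of the problem concrete, the following minimal sketch builds the set of linked pairs and the set of interaction triplets from small dense matrices. The toy matrices and the helper names `link_pairs` and `interaction_triplets` are hypothetical and introduced only for illustration; a dataset of realistic size would require sparse storage.

```python
# Hypothetical toy data: 3 users (indices 0..2) and 3 content items (indices 0..2).
# T[i][j] in {+1, -1, 0}: signed link from user i to user j.
T = [[0, 1, 0],
     [0, 0, -1],
     [1, 0, 0]]
# A[j][k] = 1 if user j authored item k.
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
# H[i][k] in {+1, -1, 0}: opinion of user i on item k.
H = [[0, 1, -1],
     [1, 0, 0],
     [0, -1, 0]]


def link_pairs(T):
    """Return {(i, j): sign} for every nonzero entry of T."""
    return {(i, j): s
            for i, row in enumerate(T)
            for j, s in enumerate(row) if s != 0}


def interaction_triplets(A, H):
    """Return {(i, j, k): sign}: user i gave an opinion on item k authored by user j."""
    # Each item is assumed to have exactly one author.
    author_of = {k: j
                 for j, row in enumerate(A)
                 for k, a in enumerate(row) if a != 0}
    return {(i, author_of[k], k): s
            for i, row in enumerate(H)
            for k, s in enumerate(row)
            if s != 0 and k in author_of}
```

The two dictionaries returned here play the roles of the link and interaction training data that a joint predictor would consume.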

Note that when the content of the items is available, we could also utilize the content of $\mathcal{R}$. However, in this work we focus on leveraging $\mathbf{T}$, $\mathbf{A}$, and $\mathbf{H}$, and leave the problem of exploiting content as future work.

6.1.2 Signed Network User Opinion Data Analysis

In this section, we conduct preliminary analysis on the correlation between signed links and
interaction polarities. We begin by introducing an extended version of our Epinions dataset as
previously described in Table 4.2.


6.1.2.1 Extended Epinions Dataset

We collected an extended dataset from Epinions for this investigation. Epinions users can give positive and negative links to each other, which we use to construct the T matrix. They can also write reviews, and we use this data to construct the authorship matrix A. For each review, other users can assign a score from 1 to 6 to indicate its helpfulness, and we use these scores to construct the matrix H. We define positive and negative helpfulness ratings to be {4, 5, 6} and {1, 2, 3}, respectively. Some statistics of the dataset are shown in Table 6.1. From the table, we can observe that (1) there are more positive links (or interactions) than negative ones; and (2) both links and interactions are very sparse. The act of creating (or receiving) a signed link can be thought of as an explicit form of expressing one’s opinion of others (or of receiving others’ opinions). In contrast, when a user interacts with the content authored by others, they are implicitly marking their opinion towards those authors. Therefore, it is reasonable to assume that the implicit and explicit opinions among users are correlated, so we investigate these correlations from both global and local perspectives.
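The mapping from helpfulness scores to interaction polarities, and the notion of matrix density used in the dataset statistics, can be sketched as follows. The function names `rating_to_polarity` and `density` are hypothetical, and density is assumed here to mean the fraction of nonzero entries in a matrix.

```python
def rating_to_polarity(score):
    """Map an Epinions helpfulness score (1..6) to an interaction polarity."""
    if not 1 <= score <= 6:
        raise ValueError("helpfulness scores range from 1 to 6")
    # Scores {4, 5, 6} are treated as positive and {1, 2, 3} as negative.
    return 1 if score >= 4 else -1


def density(num_nonzero, num_rows, num_cols):
    """Fraction of nonzero entries in a num_rows x num_cols matrix (assumed definition)."""
    return num_nonzero / (num_rows * num_cols)


# Usage with the counts reported for the extended Epinions dataset.
links_density = density(717_667 + 123_705, 233_429, 233_429)
interactions_density = density(12_581_553 + 1_086_551, 233_429, 755_722)
```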

6.1.2.2 Correlated User Opinions: A Global Perspective

From a global perspective, we want to examine the correlations between these explicit and implicit
opinions from one user. In particular, we aim to answer the following questions: (1) is a user who gives more positive (or negative) links likely to give more positive (or negative) opinions on content from others? and (2) is a user who receives more positive (or negative) links likely to receive more positive (or negative) opinions on their content? In this work, we refer to giving links or opinions on content as giving behaviors, and to receiving links or opinions on content as receiving behaviors.
To answer the first question, we group users into three classes based upon their outgoing links as follows: (1) users who have only positive outgoing links (76,819 users); (2) users who have only negative outgoing links (7,138 users); and (3) users who have both positive and negative outgoing links (11,361 users). Then, we calculate the opinions (or helpfulness ratings) they gave to content from others for each group, and we plot the kernel smoothing density estimate for each group in Figure 6.1(a). We note that, on average, users who only create positive links also tend to interact more positively with the content generated by other users as compared to users who only create negative links. Furthermore, users who create both positive and negative links show a higher variance than the only-positive and only-negative classes, and thus are more likely to express both positive and negative behaviors in their interactions.

Figure 6.1: Giving and receiving behaviors from the global perspective on opinion correlations. (a) Giving behaviors. (b) Receiving behaviors.

To answer the second question, we divide users into three groups based upon their incoming
links as follows: (1) users who only have positive incoming links (52,810 users); (2) users having
only negative incoming links (14,701 users); and (3) users who have both positive and negative
incoming links (17,090 users). Following a similar procedure, we plot the kernel smoothing density estimate of receiving behaviors for each group in Figure 6.1(b). From Figures 6.1(a) and 6.1(b), we can make very similar observations for receiving behaviors as for giving behaviors, which leads to a positive answer to the second question: users who receive more positive (or negative) links are likely to obtain more positive (or negative) opinions on their content.
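The grouping step of this global analysis can be sketched as follows. This is only an illustration under simplifying assumptions: the function names and the toy tuples are hypothetical, and the sketch summarizes each group by its mean rating rather than by the kernel density estimates plotted in Figure 6.1.

```python
from statistics import mean


def group_by_outgoing_sign(links):
    """links: iterable of (source, target, sign). Label each source user by the signs they give."""
    signs = {}
    for src, _dst, sign in links:
        signs.setdefault(src, set()).add(sign)
    labels = {frozenset({1}): "positive",
              frozenset({-1}): "negative",
              frozenset({1, -1}): "mixed"}
    return {user: labels[frozenset(s)] for user, s in signs.items()}


def mean_given_rating_per_group(groups, ratings):
    """ratings: iterable of (rater, item, score). Average score given by the users of each group."""
    per_group = {}
    for rater, _item, score in ratings:
        if rater in groups:
            per_group.setdefault(groups[rater], []).append(score)
    return {group: mean(scores) for group, scores in per_group.items()}


# Hypothetical toy data.
toy_links = [("a", "b", 1), ("a", "c", 1), ("b", "a", -1), ("c", "a", 1), ("c", "b", -1)]
toy_ratings = [("a", "r1", 6), ("a", "r2", 5), ("b", "r3", 2), ("c", "r4", 4), ("c", "r5", 1)]
toy_groups = group_by_outgoing_sign(toy_links)
toy_means = mean_given_rating_per_group(toy_groups, toy_ratings)
```

The receiving-behavior analysis follows the same pattern with incoming links and the ratings received on a user's own content.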

6.1.2.3 Correlated User Opinions: A Local Perspective

The global perspective in Subsection 6.1.2.2 focuses on correlations between one user and the remaining network. In this subsection, we focus on a pair of users and investigate whether the existence of a positive (or negative) link between them makes a difference in how they give (or receive) opinions on each other’s content. In particular, for a pair of users $u_i$ and $u_j$, we aim to answer: (1) if $u_i$ gives a positive (or negative) link to $u_j$, is $u_i$ likely to give positive (or negative) opinions to content from $u_j$? and (2) if $u_j$ receives a positive (or negative) link from $u_i$, is $u_j$ likely to give positive (or negative) opinions to the content from $u_i$? Note that in this work, we use $u_i + u_j$, $u_i - u_j$, and $u_i \,?\, u_j$ to denote a positive link, a negative link, and no link from $u_i$ to $u_j$, respectively.

Figure 6.2: Giving and receiving behaviors from the local perspective on opinion correlations. (a) Giving behaviors. (b) Receiving behaviors.

To answer the first question, we divide all pairs of users into three groups: (a) positive pairs $u_i + u_j$; (b) negative pairs $u_i - u_j$; and (c) no-link pairs $u_i \,?\, u_j$. For each pair in each group, we calculate the average opinion (or helpfulness rating) from $u_i$ to the content of $u_j$. We apply kernel smoothing density estimation to each group, and the distributions are shown in Figure 6.2(a). From this figure, we note that on average positive pairs have higher helpfulness scores than no-link pairs, which in turn have higher scores than negative pairs. Hence, it is quite evident from the figure that if $u_i$ gives a positive (or negative) link to $u_j$, $u_i$ is likely to give positive (or negative) opinions to the content from $u_j$.

Intuitively, if $u_j$ receives a positive link from $u_i$, $u_j$ is likely to be friendly to $u_i$, and as a consequence, $u_j$ is likely to give positive opinions to the content of $u_i$. On the other hand, if $u_j$ receives a negative link from $u_i$, $u_j$ could retaliate and give negative opinions to the content of $u_i$. To answer the second question, we follow a procedure similar to the one used for the first question. The results are shown in Figure 6.2(b). From this figure, we observe that, on average, $u_j$ mostly gives positive opinions to the content from those who give positive links to $u_j$, while $u_j$ mostly gives negative opinions to the content from those who give negative links to $u_j$. These observations support that if $u_j$ receives a positive (or negative) link from $u_i$, then $u_j$ is likely to give more positive (or negative) opinions to the content from $u_i$.

6.1.3 The Joint Link and Interaction Polarity Prediction (LIP) Framework

In Section 6.1.2, we validated that there exist correlations between a user’s opinion of other users
in regards to the links they form in signed social networks and the polarities of the interactions
between them. These findings naturally lead us to the question of whether this knowledge can benefit the two prediction tasks, namely link prediction and interaction polarity prediction.
In this section, we ﬁrst brieﬂy discuss a basic framework to solve the two tasks of
link and interaction polarity predictions individually. We then discuss how to model the opinion
correlations that enable us to have the opinions in one task power the other. Finally we present our
proposed framework LIP, which directly incorporates these correlations into a joint optimization
algorithm that can infer the polarities of links and interactions jointly.

6.1.3.1 Basic Link and Interaction Polarity Prediction Models

The low-rank matrix factorization approach has gained popularity recently and is now being used
across various applications such as link prediction [189, 131] and recommender systems [132,
179]. In this work, we choose to build the basic prediction models based on the low-rank matrix
factorization approach.

Link Prediction: Let $\mathcal{T} = \{(u_i, u_j) \mid \mathbf{T}_{ij} \neq 0\}$ be the set of pairs with links. In terms of the link prediction task, we would like to find two latent matrices $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n] \in \mathbb{R}^{d \times n}$ and $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n] \in \mathbb{R}^{d \times n}$, with $d$ being the number of latent dimensions, by solving the following optimization problem:

$$\min_{\mathbf{U},\mathbf{V}} \; \frac{1}{2} \sum_{(u_i,u_j)\in\mathcal{T}} \left(\mathbf{T}_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j\right)^2 + \frac{\lambda_1}{2}\left(\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2\right) \tag{6.1}$$

where $\mathbf{u}_i$ and $\mathbf{v}_i$ are the user latent vectors representing the giving and receiving link behaviors of $u_i$, respectively. Thus, $\mathbf{u}_i^{\top}\mathbf{v}_j$ models the sign of a link from $u_i$ to $u_j$, and therefore after optimizing the above formulation, we can use such inner products as a prediction for unknown user-user signed links in the network. Note that $\|\mathbf{U}\|_F^2$ denotes the squared Frobenius norm of $\mathbf{U}$ and is used as a regularization term to prevent overfitting, similarly for $\mathbf{V}$, and both are controlled by the hyperparameter $\lambda_1$.
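As an illustration of the basic link prediction model, the sketch below evaluates the regularized squared-error objective of Eq. (6.1) and turns the inner product of the giving and receiving latent vectors into a predicted sign. The names `link_mf_loss`, `predict_link_sign`, and `lam1` are hypothetical; latent vectors are stored one per user as Python lists, and thresholding the inner product at zero is an assumption made for this example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_norm(M):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in M for x in row)


def link_mf_loss(links, U, V, lam1):
    """Objective of Eq. (6.1). links: {(i, j): T_ij}; U[i], V[j]: latent vectors."""
    fit = sum((t - dot(U[i], V[j])) ** 2 for (i, j), t in links.items())
    return 0.5 * fit + 0.5 * lam1 * (sq_norm(U) + sq_norm(V))


def predict_link_sign(U, V, i, j):
    """Predict the sign of a link from user i to user j (threshold at 0 is an assumption)."""
    return 1 if dot(U[i], V[j]) >= 0 else -1


# Hypothetical toy example with two users and two latent dimensions.
toy_U = [[1.0, 0.0], [0.0, 1.0]]
toy_V = [[0.0, 1.0], [1.0, 0.0]]
toy_links = {(0, 1): 1, (1, 0): -1}
toy_loss = link_mf_loss(toy_links, toy_U, toy_V, lam1=0.1)
```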

Interaction Polarity Prediction: Let $\mathcal{H} = \{(u_i, u_j, r_k) \mid \mathbf{H}_{ik} \neq 0, \mathbf{A}_{jk} \neq 0\}$ be the set of interaction triplets, where $\mathbf{H}_{ik}$ denotes the opinion from $u_i$ on the content $r_k$ authored by $u_j$. The main difference between the basic model for this task and traditional matrix factorization based recommender systems is that we now have a third piece of information, the author. Thus, rather than taking the typical user-item formulation, we instead want to formulate the model so that we can include information about the author of the content.

In this problem, we wish to find three latent matrices $\mathbf{P} = [\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n] \in \mathbb{R}^{d \times n}$, $\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n] \in \mathbb{R}^{d \times n}$, and $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_m] \in \mathbb{R}^{d \times m}$, where $\mathbf{p}_i$ and $\mathbf{q}_i$ respectively denote the giving and receiving interaction behaviors of $u_i$, and $\mathbf{s}_k$ is the latent vector for content $r_k$. One way to model the interaction of user $u_i$ with content $r_k$ authored by $u_j$ would be to ignore the author and use $\mathbf{p}_i^{\top}\mathbf{s}_k$. Similarly, we could ignore the content and use only the author (i.e., $\mathbf{p}_i^{\top}\mathbf{q}_j$), but each of these is lacking information. Hence, we propose to use $\mathbf{p}_i^{\top}(\mathbf{q}_j + \mathbf{s}_k)$, which includes both the context of the author and the content itself. These three matrices can be obtained by solving the following optimization problem:

$$\min_{\mathbf{P},\mathbf{Q},\mathbf{S}} \; \frac{1}{2} \sum_{(u_i,u_j,r_k)\in\mathcal{H}} \left(\mathbf{H}_{ik} - \mathbf{p}_i^{\top}(\mathbf{q}_j + \mathbf{s}_k)\right)^2 + \frac{\lambda_2}{2}\left(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2 + \|\mathbf{S}\|_F^2\right) \tag{6.2}$$

where the term $\left(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2 + \|\mathbf{S}\|_F^2\right)$ is introduced to avoid overfitting and is controlled by $\lambda_2$. Note that another way of modeling could be to linearly combine the author and content representations. In that way, we could define $\mathbf{M} \in \mathbb{R}^{d \times 2d}$ and use $\mathbf{p}_i^{\top}(\mathbf{M}(\mathbf{q}_j \,\|\, \mathbf{s}_k))$, where $\|$ is used to denote concatenation. However, this would add extra complexity by needing to learn $\mathbf{M}$, so we use $\mathbf{p}_i^{\top}(\mathbf{q}_j + \mathbf{s}_k)$ and leave other formulations as future work. Next we will discuss how to capture correlations based on the two aforementioned basic models.
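The three candidate scoring functions discussed for the interaction polarity model can be compared in a few lines. This is a hypothetical sketch with made-up vectors; the function names are not taken from the original implementation.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_content_only(p_i, s_k):
    """Ignore the author: giving vector of the rater against the content vector."""
    return dot(p_i, s_k)


def score_author_only(p_i, q_j):
    """Ignore the content: giving vector of the rater against the author's receiving vector."""
    return dot(p_i, q_j)


def score_author_and_content(p_i, q_j, s_k):
    """Combined score: rater's giving vector against the sum of author and content vectors."""
    return dot(p_i, [q + s for q, s in zip(q_j, s_k)])


# Hypothetical latent vectors with two dimensions.
p = [1.0, 2.0]
q = [0.5, -1.0]
s = [0.5, 0.25]
```

Because the combined score is linear in its second argument, it equals the sum of the author-only and content-only scores, so both sources of information contribute additively to the prediction.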

6.1.3.2 Modeling User Opinion Correlations

In Section 6.1.2, we found that the giving (or receiving) behaviors in terms of links and interactions are correlated. In the basic models from Subsection 6.1.3.1, we use $\mathbf{u}_i$ and $\mathbf{v}_i$ to denote a user’s behaviors when giving and receiving links, respectively, while we use $\mathbf{p}_i$ and $\mathbf{q}_i$ to indicate a user’s behaviors when giving and receiving interactions, respectively. Therefore, we can capture the opinion correlations by bridging the two giving behaviors via $\mathbf{u}_i$ and $\mathbf{p}_i$, and the two receiving behaviors via $\mathbf{v}_i$ and $\mathbf{q}_i$.

Since the two giving behaviors are correlated, we can find a linear mapping matrix $\mathbf{W}^O \in \mathbb{R}^{d \times d}$ that maps $u_i$’s latent vector $\mathbf{u}_i$, which denotes their underlying behavior when creating links, to the latent vector $\mathbf{p}_i$, which captures their behavior when giving opinions on the content authored by other users in the network. Given a set of latent vectors for all users $u_i \in \mathcal{U}$, the linear mapping between them is a solution to the following optimization problem:

$$\min_{\mathbf{W}^O} \; \sum_{u_i\in\mathcal{U}} \left\|\mathbf{W}^O\mathbf{u}_i - \mathbf{p}_i\right\|_2^2 \tag{6.3}$$

Similarly, we seek to find a matrix $\mathbf{W}^I \in \mathbb{R}^{d \times d}$ to represent the mapping between user $u_j$’s latent vectors $\mathbf{v}_j$ and $\mathbf{q}_j$, which denote their behaviors when receiving links and interactions, respectively. The mapping $\mathbf{W}^I$ can be learned as follows:

$$\min_{\mathbf{W}^I} \; \sum_{u_j\in\mathcal{U}} \left\|\mathbf{W}^I\mathbf{v}_j - \mathbf{q}_j\right\|_2^2 \tag{6.4}$$

Eqs. (6.3) and (6.4) capture opinion correlations for links and interactions. They also allow us to bridge the two basic models for link and interaction polarity predictions together. Next we will introduce the proposed joint framework.
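For fixed latent vectors, learning one of these linear mappings is an ordinary least-squares problem. The sketch below fits such a mapping with plain batch gradient descent on half the summed squared error; the function name `fit_linear_map`, the learning rate, and the number of steps are hypothetical choices for this toy example, and in LIP itself the mappings are learned jointly with the latent vectors rather than in a separate step.

```python
def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_linear_map(src, dst, lr=0.1, steps=500):
    """Fit W so that W @ src[n] approximates dst[n] for every n."""
    d = len(src[0])
    W = [[0.0] * d for _ in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for x, y in zip(src, dst):
            err = [a - b for a, b in zip(matvec(W, x), y)]  # W x - y
            for r in range(d):
                for c in range(d):
                    grad[r][c] += err[r] * x[c]  # accumulate err x^T
        for r in range(d):
            for c in range(d):
                W[r][c] -= lr * grad[r][c]
    return W


# Hypothetical toy data: the "giving interaction" vectors are an exact linear
# image of the "giving link" vectors under true_map.
true_map = [[2.0, 0.0], [1.0, -1.0]]
giving_link = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
giving_interaction = [matvec(true_map, u) for u in giving_link]
learned_map = fit_linear_map(giving_link, giving_interaction)
```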

141

6.1.3.3 The Proposed Joint Framework

Now we have formulated a model on how to optimize a linear mapping between both the giving
and receiving behaviors in the two tasks. Next we show how these mappings can be used as two
additional terms in our joint matrix factorization framework, LIP, for the purpose of joint link and
interaction polarity prediction. LIP solves the following optimization problem:

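$$\begin{aligned}
\min_{\mathbf{U},\mathbf{V},\mathbf{P},\mathbf{Q},\mathbf{S},\mathbf{W}^O,\mathbf{W}^I} \; & \mathcal{L}(\mathbf{U},\mathbf{V},\mathbf{P},\mathbf{Q},\mathbf{S},\mathbf{W}^O,\mathbf{W}^I) \\
= \; & \frac{1}{2} \sum_{(u_i,u_j)\in\mathcal{T}} \left(\mathbf{T}_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j\right)^2 \\
& + \frac{\gamma}{2} \sum_{(u_i,u_j,r_k)\in\mathcal{H}} \left(\mathbf{H}_{ik} - \mathbf{p}_i^{\top}(\mathbf{q}_j + \mathbf{s}_k)\right)^2 \\
& + \frac{\alpha}{2} \Big( \sum_{u_i\in\mathcal{U}} \left\|\mathbf{W}^O\mathbf{u}_i - \mathbf{p}_i\right\|_2^2 + \sum_{u_j\in\mathcal{U}} \left\|\mathbf{W}^I\mathbf{v}_j - \mathbf{q}_j\right\|_2^2 \Big) \\
& + \frac{\lambda_1}{2}\left(\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2\right) + \frac{\lambda_2}{2}\left(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2 + \|\mathbf{S}\|_F^2\right) \\
& + \frac{\lambda_3}{2}\left(\|\mathbf{W}^O\|_F^2 + \|\mathbf{W}^I\|_F^2\right)
\end{aligned} \tag{6.5}$$

where the first term is a standard user-user matrix factorization model (as discussed in Subsection 6.1.3.1) for the link prediction problem. The second term is a modification to the user-review matrix factorization model that also incorporates the additional vector $\mathbf{q}_j$ for each $u_j \in \mathcal{U}$ to represent the influence of the author $u_j$ in the prediction of $u_i$’s opinion on $r_k$, when $r_k$ was written by $u_j$. The third and fourth terms capture the correlations of giving and receiving behaviors, respectively, and their contributions are controlled by the hyperparameter $\alpha$. The remaining terms in Eq. (6.5) are added to avoid overfitting.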
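The joint objective can be evaluated directly from its definition, which is convenient for monitoring convergence or checking gradients numerically. In the sketch below the argument names are hypothetical labels: `gamma` weights the interaction term, `alpha` weights the two correlation terms, and `lam1`, `lam2`, `lam3` are the regularization weights. Latent vectors are stored one per user (or per item) as Python lists, and the mappings `WO` and `WI` as lists of rows.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in W]


def sq_norm(M):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in M for x in row)


def lip_loss(links, triplets, U, V, P, Q, S, WO, WI,
             gamma, alpha, lam1, lam2, lam3):
    """Joint objective of Eq. (6.5).

    links: {(i, j): T_ij}; triplets: {(i, j, k): H_ik} with item k authored by user j.
    """
    link_fit = sum((t - dot(U[i], V[j])) ** 2 for (i, j), t in links.items())
    inter_fit = sum(
        (h - dot(P[i], [q + s for q, s in zip(Q[j], S[k])])) ** 2
        for (i, j, k), h in triplets.items())
    # Correlation terms: squared error of mapping link vectors to interaction vectors.
    giving = sum(sum((a - b) ** 2 for a, b in zip(matvec(WO, u), p))
                 for u, p in zip(U, P))
    receiving = sum(sum((a - b) ** 2 for a, b in zip(matvec(WI, v), q))
                    for v, q in zip(V, Q))
    reg = (lam1 * (sq_norm(U) + sq_norm(V))
           + lam2 * (sq_norm(P) + sq_norm(Q) + sq_norm(S))
           + lam3 * (sq_norm(WO) + sq_norm(WI)))
    return 0.5 * (link_fit + gamma * inter_fit + alpha * (giving + receiving) + reg)
```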

We note that the trade-off between the two tasks (signed link prediction and user interaction polarity prediction) is controlled by the hyperparameter $\gamma$: increasing its value increases the importance of the user interaction polarity prediction task, while decreasing its value shifts the emphasis towards the link prediction task. Also, the transfer of information between the two problems is achieved by the linear mappings used in LIP (more specifically, the terms controlled by $\alpha$ in Eq. (6.5)). If a user $u_i$ has no link information, they are deemed a cold-start user in the link prediction task. Thus, there is no way to learn $\mathbf{u}_i$ and $\mathbf{v}_i$ in the basic model, and we fail to do link prediction for $u_i$. However, if $u_i$ has had some interactions with other users in the network, we can learn $\mathbf{p}_i$ and $\mathbf{q}_i$ from their interaction data. Thus, the proposed framework LIP can also learn $\mathbf{u}_i$ and $\mathbf{v}_i$ through the giving and receiving correlation components, i.e., the third and fourth terms in Eq. (6.5). Similarly, LIP can also help when $u_i$ has no interaction data but has link information. From the above analysis, we note that LIP has the potential to mitigate the data sparsity and cold-start problems in either link prediction or interaction polarity prediction.

6.1.3.4 An Optimization Method for LIP

Given the optimization objective shown above, we now present how to solve this problem. We have chosen to use stochastic gradient descent (SGD) due to the non-convexity of the joint optimization formulation. First, we compute the partial derivatives with respect to each of the parameters (i.e., $\mathbf{u}_i$, $\mathbf{v}_j$, $\mathbf{p}_i$, $\mathbf{q}_j$, $\mathbf{s}_k$, $\mathbf{W}^O$, and $\mathbf{W}^I$) and then iteratively update them using SGD until convergence. We use the combined training data $\mathcal{X} = \mathcal{T} \cup \mathcal{H}$, where $\mathcal{T}$ and $\mathcal{H}$ are the link and interaction training data, respectively.

For simplicity in what follows, let $\epsilon^{T}_{ij} = (\mathbf{T}_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j)$ be the error of estimating the link (which in some social networks, such as Epinions, can represent trust-distrust) from user $u_i$ to user $u_j$; let $\epsilon^{H}_{ijk} = (\mathbf{H}_{ik} - \mathbf{p}_i^{\top}(\mathbf{q}_j + \mathbf{s}_k))$ be the error of estimating the interaction value user $u_i$ gave to content $r_k$ that had been authored by user $u_j$; let $\epsilon^{O}_{i} = (\mathbf{W}^O\mathbf{u}_i - \mathbf{p}_i)$ be the error of our linear mapping from user $u_i$’s latent vector $\mathbf{u}_i$ (representing the way they give links) to their latent vector $\mathbf{p}_i$ (representing how they interact with content created by others); and, finally, let $\epsilon^{I}_{j} = (\mathbf{W}^I\mathbf{v}_j - \mathbf{q}_j)$ be the error of our linear mapping from user $u_j$’s latent vector $\mathbf{v}_j$ (representing the way they receive links) to their latent vector $\mathbf{q}_j$ (representing how the content they had authored receives interactions).

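Gradients of $\mathcal{L}$ with respect to $\mathbf{U}$ and $\mathbf{V}$: The gradients of Eq. (6.5) w.r.t. $\mathbf{u}_i$ and $\mathbf{v}_j$ are as follows, respectively:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}_i} = \sum_{\{j \,\mid\, (u_i,u_j)\in\mathcal{T}\}} \left(-\epsilon^{T}_{ij}\,\mathbf{v}_j\right) + \alpha\,\mathbf{W}^{O\top}\epsilon^{O}_{i} + \lambda_1\mathbf{u}_i$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{v}_j} = \sum_{\{i \,\mid\, (u_i,u_j)\in\mathcal{T}\}} \left(-\epsilon^{T}_{ij}\,\mathbf{u}_i\right) + \alpha\,\mathbf{W}^{I\top}\epsilon^{I}_{j} + \lambda_1\mathbf{v}_j$$

Gradients of $\mathcal{L}$ with respect to $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{S}$: The gradients of Eq. (6.5) w.r.t. $\mathbf{p}_i$, $\mathbf{q}_j$, and $\mathbf{s}_k$ are the following, respectively:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{p}_i} = \gamma \sum_{\{(j,k) \,\mid\, (u_i,u_j,r_k)\in\mathcal{H}\}} \left(-\epsilon^{H}_{ijk}\,(\mathbf{q}_j + \mathbf{s}_k)\right) - \alpha\,\epsilon^{O}_{i} + \lambda_2\mathbf{p}_i$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{q}_j} = \gamma \sum_{\{(i,k) \,\mid\, (u_i,u_j,r_k)\in\mathcal{H}\}} \left(-\epsilon^{H}_{ijk}\,\mathbf{p}_i\right) - \alpha\,\epsilon^{I}_{j} + \lambda_2\mathbf{q}_j$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{s}_k} = \gamma \sum_{\{(i,j) \,\mid\, (u_i,u_j,r_k)\in\mathcal{H}\}} \left(-\epsilon^{H}_{ijk}\,\mathbf{p}_i\right) + \lambda_2\mathbf{s}_k$$

Gradients of $\mathcal{L}$ with respect to $\mathbf{W}^O$ and $\mathbf{W}^I$: Finally, we present the gradients of Eq. (6.5) w.r.t. $\mathbf{W}^O$ and $\mathbf{W}^I$, which are shown below, in respective order.

$$\frac{\partial \mathcal{L}}{\partial \mathbf{W}^O} = \alpha \sum_{u_i\in\mathcal{U}} \left(\epsilon^{O}_{i}\,\mathbf{u}_i^{\top}\right) + \lambda_3\mathbf{W}^O$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{W}^I} = \alpha \sum_{u_j\in\mathcal{U}} \left(\epsilon^{I}_{j}\,\mathbf{v}_j^{\top}\right) + \lambda_3\mathbf{W}^I$$

With these update rules for Eq. (6.5), we use SGD to optimize the framework using the combined training data $\mathcal{X} = \mathcal{T} \cup \mathcal{H}$, where $\mathcal{T}$ and $\mathcal{H}$ are the link and interaction training data, respectively.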

144

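Algorithm 6.1: The optimization method for the proposed LIP framework.
Input: $\mathcal{T} = \{(u_i, u_j) \mid \mathbf{T}_{ij} \neq 0\}$, the set of pairs with links, and
       $\mathcal{H} = \{(u_i, u_j, r_k) \mid \mathbf{H}_{ik} \neq 0, \mathbf{A}_{jk} \neq 0\}$, the set of interaction triplets
Output: $\mathbf{U}$ and $\mathbf{V}$ for link predictions; and $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{S}$ for interaction polarity predictions
 1  Randomly initialize $\mathbf{U}$, $\mathbf{V}$, $\mathbf{P}$, $\mathbf{Q}$, $\mathbf{S}$, $\mathbf{W}^O$, $\mathbf{W}^I$
 2  Construct the learning data set $\mathcal{X} = \mathcal{T} \cup \mathcal{H}$
 3  while not convergent do
 4      Shuffle($\mathcal{X}$)
 5      foreach $x \in \mathcal{X}$ do
 6          if $x \in \mathcal{T}$ then
 7              Calculate gradients of $\mathcal{L}(\mathbf{U}, \mathbf{V}, \mathbf{P}, \mathbf{Q}, \mathbf{S}, \mathbf{W}^O, \mathbf{W}^I)$ w.r.t. $\mathbf{u}_i$, $\mathbf{v}_j$, $\mathbf{W}^O$, and $\mathbf{W}^I$
 8          if $x \in \mathcal{H}$ then
 9              Calculate gradients of $\mathcal{L}(\mathbf{U}, \mathbf{V}, \mathbf{P}, \mathbf{Q}, \mathbf{S}, \mathbf{W}^O, \mathbf{W}^I)$ w.r.t. $\mathbf{p}_i$, $\mathbf{q}_j$, $\mathbf{s}_k$, $\mathbf{W}^O$, and $\mathbf{W}^I$
10          Update the respective parameters using gradient descent methods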

Note that although there are additional methods for optimizing matrix factorization based models, SGD has been shown to be both efficient and easy to tune (e.g., with adaptive learning rates).

With the gradients calculated above to optimize Eq. (6.5), the detailed optimization algorithm is presented in Algorithm 6.1, which we now briefly describe. In line 1, we randomly initialize the model parameters. In line 2, the learning data is constructed from the links and interactions. From line 3 to line 10, we use stochastic gradient descent to optimize the framework. In particular, in each iteration we first shuffle the data in line 4 and then update the model parameters using gradient descent methods from line 5 to line 10. For a signed user-user link training example, the algorithm uses lines 6 and 7 to calculate the gradients, whereas for an interaction training example, lines 8 and 9 are used. Then, in line 10, the model parameters for the respective part of the problem (based on whether we are updating on a signed link or an interaction) are updated using a gradient-based method.
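The training loop can be sketched in a few dozen lines. The sketch below is not the original implementation: it assumes per-example updates in which each training example contributes its own error term plus the correlation and regularization terms of the parameters it touches, it runs for a fixed number of epochs instead of testing convergence, and the names `m` (a dictionary of parameters) and `hp` (a dictionary holding `gamma`, `alpha`, `lam1`, `lam2`, `lam3`, and the learning rate `lr`) are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def matTvec(W, x):
    """Compute W^T x for a matrix stored as a list of rows."""
    return [sum(W[r][c] * x[r] for r in range(len(W))) for c in range(len(W[0]))]


def step(vec, grad, lr):
    """In-place gradient descent update: vec <- vec - lr * grad."""
    for n, g in enumerate(grad):
        vec[n] -= lr * g


def mapping_grads(m, i, j, hp):
    """Mapping errors for giver i and receiver j, and gradients w.r.t. WO and WI."""
    eps_o = [a - b for a, b in zip(matvec(m["WO"], m["U"][i]), m["P"][i])]
    eps_i = [a - b for a, b in zip(matvec(m["WI"], m["V"][j]), m["Q"][j])]
    g_wo = [[hp["alpha"] * e * x + hp["lam3"] * w for x, w in zip(m["U"][i], row)]
            for e, row in zip(eps_o, m["WO"])]
    g_wi = [[hp["alpha"] * e * x + hp["lam3"] * w for x, w in zip(m["V"][j], row)]
            for e, row in zip(eps_i, m["WI"])]
    return eps_o, eps_i, g_wo, g_wi


def update_mappings(m, g_wo, g_wi, lr):
    for row, g in zip(m["WO"], g_wo):
        step(row, g, lr)
    for row, g in zip(m["WI"], g_wi):
        step(row, g, lr)


def sgd_link_step(m, i, j, t, hp):
    """One update on a signed link (i, j) with sign t."""
    u, v = m["U"][i], m["V"][j]
    err = t - dot(u, v)
    eps_o, eps_i, g_wo, g_wi = mapping_grads(m, i, j, hp)
    g_u = [-err * b + hp["alpha"] * w + hp["lam1"] * a
           for a, b, w in zip(u, v, matTvec(m["WO"], eps_o))]
    g_v = [-err * a + hp["alpha"] * w + hp["lam1"] * b
           for a, b, w in zip(u, v, matTvec(m["WI"], eps_i))]
    # All gradients are computed before any parameter is changed.
    step(u, g_u, hp["lr"])
    step(v, g_v, hp["lr"])
    update_mappings(m, g_wo, g_wi, hp["lr"])


def sgd_interaction_step(m, i, j, k, h, hp):
    """One update on an interaction: user i rated item k (authored by user j) with polarity h."""
    p, q, s = m["P"][i], m["Q"][j], m["S"][k]
    err = h - dot(p, [a + b for a, b in zip(q, s)])
    eps_o, eps_i, g_wo, g_wi = mapping_grads(m, i, j, hp)
    g_p = [-hp["gamma"] * err * (b + c) - hp["alpha"] * e + hp["lam2"] * a
           for a, b, c, e in zip(p, q, s, eps_o)]
    g_q = [-hp["gamma"] * err * a - hp["alpha"] * e + hp["lam2"] * b
           for a, b, e in zip(p, q, eps_i)]
    g_s = [-hp["gamma"] * err * a + hp["lam2"] * c for a, c in zip(p, s)]
    step(p, g_p, hp["lr"])
    step(q, g_q, hp["lr"])
    step(s, g_s, hp["lr"])
    update_mappings(m, g_wo, g_wi, hp["lr"])


def train(links, triplets, m, hp, epochs, seed=0):
    """Shuffle the combined data each epoch and apply the matching update to each example."""
    rng = random.Random(seed)
    data = [("link", key, val) for key, val in links.items()]
    data += [("interaction", key, val) for key, val in triplets.items()]
    for _ in range(epochs):
        rng.shuffle(data)
        for kind, key, val in data:
            if kind == "link":
                sgd_link_step(m, *key, val, hp)
            else:
                sgd_interaction_step(m, *key, val, hp)
    return m
```

Here `m` is expected to hold the keys `"U"`, `"V"`, `"P"`, `"Q"` (one latent vector per user), `"S"` (one per item), and `"WO"`, `"WI"` (square matrices as lists of rows), all randomly initialized by the caller.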

6.1.4 Experiments

In this section, we conduct experiments to answer the following two questions: (1) Can our joint model help alleviate the sparsity problem in these two prediction tasks? (2) Do the terms based upon correlated user opinions/behaviors in LIP provide a transfer of information between the two problems? To address the first question, we perform experiments in which we increase the sparsity of the training data and compare the performance with representative baselines. We address the second question by examining whether our algorithm is robust when handling cold-start users. Below, we first describe our dataset and how it was used, along with the metric used in evaluating the two prediction tasks, and then introduce the experimental settings for the two types of experiments we have performed.

Experimental Settings: As mentioned in Section 6.1.2, we have collected a dataset from
Epinions for these experiments. Note that for the purpose of this study, we have ﬁltered our
collected Epinions dataset to form more dense user-user and user-content matrices. The ﬁrst step is
to pre-process the data such that we have the appropriate training, validation, and testing sets from
our dataset.

The filtering we perform keeps only users that have both given and received a link, and also requires users to have given at least one helpfulness rating and to have authored at least one review that has received at least one helpfulness rating. For each user that is filtered out, we remove all of their links, the reviews they had written, and the helpfulness ratings associated with them. The reason for this filtering is that it allows us to later remove portions of the data to artificially create training sets with varying percentages of cold-start users and different levels of sparsity, which makes them more similar to the raw dataset.

The initial extended Epinions dataset contained 233,429 users, 841,373 user-user links, and 13,668,105 helpfulness ratings. After the filtering process described above, we were left with 29,901 users, 600,976 user-user links, and 11,555,599 helpfulness ratings. The dataset was randomly split into 70% for training, 10% for validation, and 20% for testing. Note that we then balanced our testing dataset to be 50% positive and 50% negative, similar to what was done in [4].
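A random 70/10/20 split followed by class balancing of the test portion could look like the sketch below. The function names are hypothetical, and balancing by downsampling the majority class is an assumption made for this example; the exact balancing procedure follows [4].

```python
import random


def split_70_10_20(examples, seed=0):
    """Randomly split examples into training, validation, and testing lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def balance_by_sign(examples, seed=0):
    """Downsample the majority class so positive and negative examples are equally frequent.

    Each example is a (key, sign) pair with sign in {+1, -1}.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] > 0]
    neg = [e for e in examples if e[1] < 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


# Hypothetical toy data: 100 signed links, 25 of them negative.
toy_examples = [((i, i + 1), 1 if i % 4 else -1) for i in range(100)]
toy_train, toy_val, toy_test = split_70_10_20(toy_examples)
toy_balanced_test = balance_by_sign(toy_test)
```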

For all the models that required hyperparameters to be tuned, we used the validation set to obtain the best hyperparameters for each respective model. Also, the hyperparameter settings for each experiment were fixed (e.g., all LIP results for the five varying cold-start experiments were obtained with a single commonly “good” set of hyperparameters for all five percentages, rather than a separate set for each of the five). However, between the two experiments, we allowed for different hyperparameters, as the dynamics of cold-start users and of varying the amount of induced sparsity required a different set of hyperparameters for our model, and similarly for the baselines. To evaluate and compare the performance of LIP, we present the F1 measure for the interaction polarity and the link prediction tasks. Note that the higher the value, the better the performance.
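For reference, the F1 measure is the harmonic mean of precision and recall. The sketch below computes it for a single designated positive class; which class is treated as positive, and whether any averaging over classes is applied, is not specified here, so the `positive` argument is an assumption of this example.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 of the `positive` class for two equal-length label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```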

6.1.4.1 Sparsity Experiments

To answer the ﬁrst question, we compare the proposed framework, LIP, with existing interaction
polarity and link prediction methods. We ﬁrst present the baselines for the interaction polarity
prediction task followed by those for the link prediction task.

We choose the following representative interaction polarity prediction baselines for comparison:

• uCF: A user-based collaborative filtering approach, where we used the five most similar users (in terms of cosine similarity) based on their helpfulness rating history for making the predictions. For details on collaborative filtering, please see [190].

• MF: Our low-rank matrix factorization method as shown in Eq. (6.2), which attempts to find a lower-dimensional representation of the user-review matrix. The matrices P, Q, S, and H are used in the same way as they are in LIP for the predictions.
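The user-based collaborative filtering baseline can be illustrated with the following sketch. Several details are assumptions made for this example rather than a description of the evaluated baseline: neighbors are chosen among the users who rated the target item, the prediction is the sign of the similarity-weighted sum of their ratings, and ties at zero are resolved as positive. The function names and toy ratings are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries {item: rating}."""
    common = set(a) & set(b)
    num = sum(a[item] * b[item] for item in common)
    den = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return num / den if den else 0.0


def predict_ucf(ratings, user, item, k=5):
    """Predict the polarity of `user` on `item` from the k most similar raters of that item."""
    candidates = [(cosine(ratings[user], other_ratings), other_ratings[item])
                  for other, other_ratings in ratings.items()
                  if other != user and item in other_ratings]
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    score = sum(sim * value for sim, value in top)
    return 1 if score >= 0 else -1


# Hypothetical toy rating histories with polarities in {+1, -1}.
toy_ratings = {
    "a": {"x": 1, "y": -1},
    "b": {"x": 1, "y": -1, "z": 1},
    "c": {"x": -1, "y": 1, "z": -1},
    "d": {"x": -1, "y": 1},
}
```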

For link prediction, the representative baselines are presented below; details of each method can be found in the respective cited work.

• SSA: A spectral method using the signed Laplacian matrix [153] and the regularized Laplacian kernel [191]. Because this method was presented for undirected networks, we convert the directed link information into an undirected network by making T symmetric (i.e., removing the directions of the links), but keep the testing set the same.

• HOC-3: An approach based on the social balance and status theories [2]. Features for a supervised approach are extracted from triads along with node features (e.g., number of incoming positive edges). A total of 23 features are created based on 16 possible directed triad configurations and 7 node features. The details of this method can be found in [4].

• MF: The low-rank matrix factorization method shown in Eq. (6.1), which was first introduced for this problem in [131]. This is the natural baseline predictor for our model since LIP is built upon this MF technique. This method optimizes the squared error, has the regularization hyperparameter $\lambda_1$, and uses SGD. The matrices U, V, and T are used equivalently to those in LIP.

In the first experiment, since we have already limited our attention to a subset of the data that is denser than the original dataset, we are able to simulate a range of sparsity levels across each user. Specifically, we remove x% of the links and interactions for each user and vary x in {50, 60, 70, 80, 90}.
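Inducing sparsity per user can be sketched as follows. The function name is hypothetical, and the sketch assumes that each example is a tuple whose first element identifies the user it belongs to and that the number of removed examples is rounded down.

```python
import random


def sparsify_per_user(examples, percent_removed, seed=0):
    """Remove percent_removed% of each user's examples, chosen uniformly at random."""
    rng = random.Random(seed)
    by_user = {}
    for example in examples:
        by_user.setdefault(example[0], []).append(example)
    kept = []
    for user_examples in by_user.values():
        n_remove = len(user_examples) * percent_removed // 100
        kept.extend(rng.sample(user_examples, len(user_examples) - n_remove))
    return kept


# Hypothetical toy data: two users with ten signed links each.
toy_links = [(user, target, 1) for user in ("a", "b") for target in range(10)]
toy_sparse_50 = sparsify_per_user(toy_links, 50)
toy_sparse_90 = sparsify_per_user(toy_links, 90)
```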

Experimental Results: The interaction polarity prediction results can be found in Figure 6.3(a). Most of the time, the baseline MF method outperforms the user-based collaborative filtering method. Similarly, LIP achieves significant gains over MF across the levels of induced sparsity. Note also that, since we had first increased the density of the user-review matrix, it is not until 80% sparsity that the density of the network drops below that of the original matrix H. In fact, at 80% sparsity the density of this induced sparse network is quite similar to that of the original network. We report the results of the sparsity experiments for link prediction in Figure 6.3(b). LIP and MF obtain much better performance than SSA and HOC-3. We observe that LIP performs comparably to the MF method for the lower sparsity settings, but upon reaching the higher sparsity levels, LIP achieves better performance than MF.

Figure 6.3: Experimental results with varied sparsity settings. (a) Interaction Polarity Prediction. (b) Link Prediction.

The results of the sparsity experiment provide evidence that our joint framework is able to
partially alleviate the sparsity problem inherent in signed networks for both the interaction
polarity and link prediction tasks. More specifically, we see a significant improvement in the
interaction polarity predictions, and an improvement in link prediction that grows as the
sparsity increases.

6.1.4.2 Cold-Start Experiments

One of the main contributions of this work is the ability of the framework to handle not just
the data sparsity problem, but also to help alleviate the issues commonly faced with cold-start
users, who are quite common in signed network datasets. Therefore, to answer the second question,
we compare LIP with existing algorithms that are able to handle cold-start users in both of the
two prediction tasks.

For this experiment we want to empirically evaluate the robustness of LIP when faced with
networks having cold-start users. This is a very difficult problem to overcome because, if there
is no knowledge about a user in a certain domain, it becomes difficult, if not impossible, to
make reasonable predictions involving them. However, since LIP jointly predicts the signed links
and user interaction polarities, the opinions expressed in one task can power those in the other
task, so LIP should be able to gain information about users that previously had none in one of
the tasks.

For the cold-start setting, we choose the following user interaction polarity prediction baselines:

• RG: The random guessing method for user interactions ﬁrst calculates the class distributions,
and then selects randomly based on that distribution to make predictions for unknown values.

• AvgG: The average guessing method first calculates the average interaction value found in
the entire training set and then predicts that same value for every missing value, i.e., for
all edges in the network that have yet to be assigned.

• MFwRG: We note that the typical matrix factorization method would not be applicable in
this experiment, since if we have no training information for a given user, then the latent
vectors of such users would never be updated. Thus, this would leave the predicted value
to be assigned based on the dot product of two randomly initialized vectors. So instead we
modify MF by adding the condition that if either of the two users’ vectors have not been
updated (i.e., they had no data in the training set and thus are a cold-start user), then instead
of using the dot product as we normally would with MF for predicting links, we instead use
the RG method for the given link.
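
The three baselines above can be sketched in a few lines of Python. This is a hedged illustration rather than the original code: the function names, the representation of the latent vectors as plain lists, the `updated` set marking users seen during training, and the thresholding of the dot product at zero to obtain a polarity are all assumptions.

```python
import random
from collections import Counter


def make_rg(train_labels, seed=0):
    """RG: return a function that samples a label from the training class distribution."""
    counts = Counter(train_labels)
    labels = sorted(counts)
    weights = [counts[label] for label in labels]
    rng = random.Random(seed)
    return lambda: rng.choices(labels, weights=weights, k=1)[0]


def avg_guess(train_values):
    """AvgG: the mean training value, predicted for every unknown entry."""
    return sum(train_values) / len(train_values)


def mf_with_rg(u, v, vectors, updated, rg):
    """MFwRG: use the dot product unless either vector was never updated."""
    if u not in updated or v not in updated:
        return rg()  # cold-start case: fall back to random guessing
    score = sum(a * b for a, b in zip(vectors[u], vectors[v]))
    # Assumption: the sign of the dot product is used as the predicted polarity.
    return 1 if score >= 0 else -1


rg = make_rg([1, 1, 1, -1])
vectors = {"a": [1.0, 0.0], "b": [0.5, 2.0], "c": [0.3, -0.1]}
print(avg_guess([1, 1, -1, 1]))
print(mf_with_rg("a", "b", vectors, {"a", "b"}, rg))
print(mf_with_rg("a", "c", vectors, {"a", "b"}, rg))
```

The fallback makes explicit why plain MF is not applicable here: the vector for "c" still holds its random initialization, so its dot product carries no information.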

Table 6.2: Interaction polarity prediction cold-start results.

Induced Percent Cold Start Users
Method
5% 10% 15% 20% 25%
0.655
0.655
RG
0.667
AvgG
0.667
0.739
MFwRG 0.769
LIP
0.773
0.763

0.655
0.667
0.746
0.766

0.655
0.667
0.764
0.771

0.655
0.667
0.754
0.769

Table 6.3: Link prediction cold-start results.
Induced Percent Cold Start Users
Method
5% 10% 15% 20% 25%
0.640
RG
0.641
0.797
MFwRG 0.848
LIP
0.860
0.839

0.641
0.825
0.853

0.641
0.837
0.858

0.641
0.813
0.848

We compare the proposed framework LIP with the following link prediction baselines:

• RG: Randomly guess missing links to be positive or negative based on the training data
class distribution.

• MFwRG: This method has the identical extension for the cold-start users as described in
MFwRG for the interaction polarity prediction task.

For these experiments, we vary the percentage of users that become cold-start users in a given
task, but do not modify the testing set. We randomly select x% of the users and remove all their
links, then randomly select another x% of the users (from those we have not already selected) and
remove their interaction information. We vary x in {5, 10, 15, 20, 25}, i.e., the number of
cold-start users ranges from 5% to 25% of the training dataset users in intervals of 5%, thus
making 5 data subsets.
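
A minimal sketch of this cold-start induction is given below. The function name, the tuple layout of the records, and the choice to treat "their links" as the links created by the selected user are assumptions made for illustration only.

```python
import random


def induce_cold_start(users, links, interactions, percent, seed=0):
    """Make `percent`% of users cold-start in links and a disjoint `percent`% in interactions."""
    rng = random.Random(seed)
    users = sorted(users)
    k = len(users) * percent // 100
    chosen = rng.sample(users, 2 * k)  # 2k distinct users, so the two groups never overlap
    cold_link, cold_inter = set(chosen[:k]), set(chosen[k:])
    # Assumption: "their links" means links whose source is the selected user.
    kept_links = [link for link in links if link[0] not in cold_link]
    kept_inter = [rec for rec in interactions if rec[0] not in cold_inter]
    return kept_links, kept_inter, cold_link, cold_inter


# Toy data (hypothetical): 100 users, one link and one interaction each.
toy_users = list(range(100))
toy_links = [(u, (u + 1) % 100, 1) for u in toy_users]
toy_inter = [(u, "review", -1) for u in toy_users]
for x in (5, 10, 15, 20, 25):
    kl, ki, cl, ci = induce_cold_start(toy_users, toy_links, toy_inter, x)
    print(x, len(cl), len(ci), len(kl), len(ki))
```

Sampling both groups in a single draw guarantees that no user loses both their links and their interactions, so information from the remaining task is always available to transfer.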
Experimental Results: Table 6.2 holds the results of the cold-start experiments for the
interaction polarity prediction task when varying the number of cold-start users. The very naive
baseline RG is shown only to provide a reference for the F1 measure, whereas MFwRG is expected to
perform quite well. In this table, we observe LIP's superiority over the baseline methods in the
presence of cold-start users. We also see that LIP's advantage over the baselines grows as the
number of cold-start users increases, which is intuitive given the use of the correlation terms.
This is because even if a user has no helpfulness rating information, LIP is able to transfer
information (i.e., their opinions) through the linear mapping matrices and use what is known
about the user from their link information.

In Table 6.3, we present the link prediction results when varying the number of cold-start
users in the training set. Here the advantages of LIP over the baseline methods become even more
obvious. Whenever MFwRG is able to learn a low-dimensional representation for a user, it can
perform the prediction using its learned latent vectors. But when there is no link information
for a given user, the method must resort to random guessing. Similar to the interaction polarity
prediction task, as the percentage of cold-start users increases, the performance gap in terms of
F1 becomes larger in favor of LIP.

6.1.4.3 Experiment Discussions

This leads us back to our second question, where we set out to determine whether the linking
terms based upon the correlated user opinions in LIP are able to provide a transfer of
information between the two tasks, so that a user's opinions in one task ultimately power the
other. Based upon the results presented in this section, for both the sparsity and the cold-start
experiments, we have shown that LIP is indeed able to utilize the inherent correlations behind
the opinions expressed in the two tasks to boost the performance in both prediction tasks
simultaneously.

Based on the above experimental results, we have verified our claim that our joint matrix
factorization model, which uses additional terms to model the fact that users in social networks
express their opinions in correlated ways across tasks, improves performance when faced with
sparse datasets. However, the most obvious claim we are now able to make is that LIP does indeed
help alleviate the cold-start problem compared with the baseline MF method and the other
baselines. In the next subsection we perform a hyperparameter analysis to gain a better
understanding of not only the hyperparameter that balances the two prediction tasks, but also,
perhaps more importantly in this study, the hyperparameter that controls the amount of opinion
information transferred from one prediction task to the other.

6.1.4.4 Parameter Analysis

First we will discuss the hyperparameters used in LIP. Thereafter we discuss an analysis on some
of the important hyperparameters in our model.

In this work, the three regularization hyperparameters (subscripted 1, 2, and 3) play the
typical role, and we noticed they behave normally. In fact, they could be collapsed into a single
regularization hyperparameter without much change to the performance (as compared to splitting
them into three separate hyperparameters). The other hyperparameters are necessary and typical
for joint modeling (and similarly for cross-domain recommendation problems). The first of these
balances the two tasks, and varying it greatly is expected to result in large changes in
performance, because it controls to what extent the optimization favors higher performance for
one of the two problems (perhaps at the cost of the other). The second was introduced as a
Lagrange multiplier used to solve this challenging optimization problem. In other words, based on
our analysis, it appears there should be a transformation between the two domains of links and
interactions, and to solve this problem we have relaxed the constraint of finding such a mapping
to instead finding a mapping with minimal error (since we also assume the data is noisy). Hence,
we introduce this hyperparameter to solve the optimization problem. Finally, two hyperparameters
denote the length of the representations in the link and interaction domains, respectively.
These are typical hyperparameters for embedding-based methods, and we have observed results
similar to other methods that vary the embedding size, i.e., the performance starts to increase,
but then drops once the embedding becomes too large. Next we discuss an analysis of the
task-balance and correlation hyperparameters, as these are the most interesting hyperparameters
of LIP.

(a) Link prediction.

(b) Interaction polarity prediction.

(c) Trade-off between the two tasks (i.e., mean of 6.4(a) and 6.4(b)).

Figure 6.4: Performance variations of LIP on the 90% data sparsity experiment w.r.t. the
task-balance and correlation hyperparameters.

The task-balance and correlation hyperparameters control the balance between optimizing the
link prediction and user interaction polarity tasks, and how strongly the low-dimensional
representations of the two tasks are kept correlated, respectively. In this subsection, we
analyze how changing these two hyperparameters affects the performance of LIP. We first fix all
other hyperparameters (i.e., the three regularization hyperparameters and the two dimension
sizes) to the best values found against our validation set when performing a grid search over
the hyperparameter space. We then evaluate the performance on all pairs of values, varying the
task-balance hyperparameter over {0.25, 0.5, 0.75, 1.0, 1.25} and the correlation hyperparameter
over {0.0001, 0.001, 0.01, 0.1}, which provides 20 possible combinations for the grid search.
Although the best hyperparameter settings varied between the two above-mentioned experiments, we
only display one representative from the sparsity experiment, since we have similar observations
in every other experimental setting. We present the analysis on the 90% sparsity experiment
since it had the most variation in performance across the different settings.
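
The 5 x 4 grid described above can be enumerated as in the following sketch. The variable names and the `evaluate` callback are hypothetical, and the constant scores returned by `dummy_evaluate` are placeholders rather than reported results; only the two value sets and the use of the mean of the two F1 scores as the trade-off (as in Figure 6.4(c)) come from the text.

```python
from itertools import product

BALANCE_VALUES = [0.25, 0.5, 0.75, 1.0, 1.25]
CORRELATION_VALUES = [0.0001, 0.001, 0.01, 0.1]


def grid_search(evaluate):
    """Evaluate every (balance, correlation) pair and record both F1 scores and their mean."""
    results = {}
    for balance, correlation in product(BALANCE_VALUES, CORRELATION_VALUES):
        f1_link, f1_interaction = evaluate(balance, correlation)
        trade_off = (f1_link + f1_interaction) / 2
        results[(balance, correlation)] = (f1_link, f1_interaction, trade_off)
    return results


def dummy_evaluate(balance, correlation):
    """Placeholder for training LIP and scoring both tasks (values are not real results)."""
    return 0.8, 0.7


results = grid_search(dummy_evaluate)
print(len(results))
```

In practice `evaluate` would train the model with all other hyperparameters held fixed and return the F1 score of each task on held-out data.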

In Figure 6.4, we show the 3D surfaces for the mentioned combinations of hyperparameters. In
Figure 6.4(a), we can see that a correlation hyperparameter value of 0.01 is clearly in a good
region, as the link prediction performance in terms of F1 drops both to the left and to the right
of it. However, there is little to no significant difference between the link predictions when
varying the task-balance hyperparameter in the range provided. For the interaction polarity
prediction task (seen in Figure 6.4(b)), a larger task-balance hyperparameter leads to much
better performance, which intuitively makes sense because a larger value increases the weight of
optimizing the interaction polarity prediction as compared to the link prediction task. Unlike
what we observed in the link prediction task, the interaction polarity prediction performs better
with a smaller correlation hyperparameter, meaning the two tasks prefer different weights on the
correlation between the user latent vectors.

Finally, Figure 6.4(c) shows that there is a trade-off between the two tasks: if one of the
tasks has a large increase in F1, then the other task becomes slightly worse. Thus, to obtain
better performance in both tasks, we would want to choose a hyperparameter setting such that the
trade-off between the two tasks is balanced. Based on our analysis, such a point would have the
correlation hyperparameter set to 0.01, but there is no decisive value for the task-balance
hyperparameter. Thus, we have shown that the balance between optimizing the two tasks is not very
sensitive, although from the figure it appears that a task-balance value of 0.75 has a slight
advantage in both of the two tasks.

6.2 Congressional Vote Prediction

Recently there has been enormous interest in computational approaches to political science
problems, especially in relation to political elections and congressional voting. With the
seemingly ever-growing tension between the two dominant political parties in the U.S. [185],
congressional representatives are receiving immense social pressure to blindly follow their
political party and associated leaders. However, because some representatives refuse to give up
their beliefs and ethical grounds, they sometimes vote against their party or cast no vote,
resulting in a highly complex system.

Although the voting behaviors in the congressional system are undoubtedly complicated, we
remain diligent towards the goal of being able to predict and understand them. If we can
construct better vote prediction models, we could utilize this information to better inform the
public of the real intentions of those running for re-election on upcoming critical issues.
Similarly, congressional leadership could utilize these models for specifically targeting
potential swing voters. We identify two sets of influential factors. The first set is ideological
factors, which are well recognized to play an important role in the U.S. Congress [192, 193] and
come from both the ideology of the congressional representatives and the ideology of the bills,
whose values and beliefs are woven deep into their content. The second set is social factors,
which relate to 1) the party affiliations of representatives, and 2) how their past voting
records intertwine with those of other representatives. In relation to the first social factor,
it is well known that representatives in the U.S. Congress are polarized [194, 195, 196, 197,
185], and thus likely to follow their political affiliation when casting their votes (although
not all the time). As for the second social factor, we propose modeling the voting records as a
signed bipartite social network (i.e., one that contains both positive and negative connections)
between the representatives and the bills [116], which opens the door to extracting a plethora of
novel predictive features.

We propose an end-to-end framework Multi-Factor Congressional Vote Prediction (MFCVP)
that ﬁrst utilizes Wikipedia4 pages of the representatives to learn an embedding that encodes
ideological information associated with each representative. Then, for the bills, we use their texts
to directly learn an embedding that encodes their semantic ideological information. Next, we utilize
signed network analysis to ﬁrst construct a bipartite voting network between the representatives
and the bills, followed by harnessing powerful signed social theories to construct novel features.
Finally, all the extracted features coming from multiple factors are combined to be utilized for vote
prediction.

6.2.1 Problem Statement
To introduce the problem, we ﬁrst denote the set of  representatives as R = {1, 2, . . . , }.
We let B = [1, 2, . . . , ] denote the sequence of bills associated with the past  roll-call
votes for which we know the voting outcomes. These voting outcomes are denoted in the set

4https://www.wikipedia.org/

Table 6.4: Notations regarding congressional vote prediction.

Notations Descriptions

R
B
V
˜B
˜V

 
 ( ˜ )

The set of representatives.
The set of past roll-call votes and their bills.
The set of past votes R gave on B.
The set of future roll-call votes and their bills.
The future votes we seek to predict.
The representative  when they are voting.
The sponsor of bill  .
The set of cosponsors for the bill  .
The vote associated with voter  on bill   (˜ ).

V = { | voted on bill  } and   ∈ {+,−, }, which denotes a “yea”, “nay”, or “present”/“no
vote”, respectively. Furthermore, we have the sequence of
˜ future roll-call bills denoted as
˜V = {˜ | will vote on bill ˜ }
˜B = [ ˜1, ˜2, . . . ˜ ˜]. The sequence ˜B has corresponding votes
which we seek to predict . We then denote any additional contextual feature or those extracted from
the past votes as the set X. Note that these notations and others used throughout the paper can be
found in Table 6.4. Finally, we can formally deﬁne the congressional vote prediction problem as
follows:

Given a set of congressional representatives R, a sequence of past roll-call votes on the bills B
having associated votes V, features X, and a future sequence of the upcoming roll-call votes on
the bills ˜B, we seek to learn a model  as follows:

 : {R, B,V,X, ˜B} → ˜V

(6.6)

6.2.2 Overview of Multi-factor Congressional Vote Prediction (MFCVP)

For congressional vote prediction, we must overcome the challenges of how to represent the
underlying factors influencing the voting system and how to handle the added complexity
introduced by incorporating multiple factors. To address these, we propose the end-to-end
framework Multi-Factor Congressional Vote Prediction, shown in Figure 6.5. More specifically,
MFCVP utilizes ideological factors and social factors, where the latter consist of factors coming
from both a network and a political party affiliation perspective. We first explain how these
different factors are represented through both learned embeddings and novel hand-crafted
features. Thereafter, we discuss how the representations of the different factors are combined
and used for the vote classification.

Figure 6.5: The proposed Multi-Factor Congressional Vote Prediction (MFCVP) framework.

6.2.3 Ideology Factors of MFCVP

The first set of factors is ideology factors. Without a doubt, a representative's ideology and
the ideological information reflected in a bill influence how a voter will vote on that bill. To
effectively and comprehensively represent ideology factors in our framework, we propose the use
of two other entities (besides the bill and the voting representative) that are associated with
ideology factors, namely the sponsor and possible cosponsor(s). These two entities are
essentially the representatives who construct and promote a bill. Hence, we seek to learn
representations of the beliefs and values of the voters, sponsors, and cosponsors, along with
those that are present in the bills.

To represent the representatives, many previous works focused on ideal point models [192,
198, 199]. Nevertheless, ideal point methods require many assumptions about voter behaviors,
which are inherently highly complex, so it seems more natural and reasonable to instead extract
a vector representation from the raw data [195, 193] (e.g., Wikipedia pages that are collectively
written about the representative by the large online community). Furthermore, extracting vector
representations is practically more feasible than attempting to compute ideal points [192], which
are also open to biases in their human construction. Given the more recently developed deep
models for extracting meaningful representations of text documents, we propose to utilize doc2vec
[200] as an efficient embedding method to represent the ideological factors. Doc2vec has shown
significant improvement in many approaches [201, 202].

We use Wikipedia pages to learn a representation for each of the congressional representatives
using doc2vec as illustrated in Figure 6.5. We combine all textual information about a representa-
tive from their Wikipedia proﬁle page as a single document. Then, we train a doc2vec model which
learns a compact embedding about each entire document (i.e., a representative’s Wikipedia page)
encoding the semantic information about a representative including their political ideology. Due
to the fact that voters, sponsors, and cosponsors are all representatives, we utilize the same repre-
sentation obtained through the learned embeddings of our trained doc2vec model (i.e., the doc2vec
model fed with Wikipedia pages of representatives) as the ideology factors for the representative in
all three roles. We should emphasize that in our experiments we utilize historical Wikipedia pages
to ensure there is no data leakage.

The usefulness of Wikipedia is that this ideological perspective is less susceptible to biases or
falsehoods since it is maintained by a large community. However, other data sources could be used
to obtain the ideological representation, such as the content generated by voters on social media
(e.g., their tweets on Twitter5) or their campaign financial information as to which
organizations are supporting them. We leave connecting other sources of data about the
congressional representatives
as one future work. Finally, we let ,   , and   denote embeddings of the voter  ∈ R, the
sponsor   ∈ R sponsoring the bill   ∈ B ∪ ˜B, and the cosponsor   ∈ R cosponsoring the bill
  ∈ B ∪ ˜B for the votes   ∈ V ∪ ˜V.

5http://www.twitter.com

The textual content of the bill offers essential information. In fact, the text of a bill
reflects both the conscious and sometimes even subconsciously instilled ideologies of the sponsor
and cosponsors who prepared it. Therefore, it is of great importance to effectively represent the
semantic information about a bill in a compact and efficient way. To achieve this, similar to our
embeddings for the representatives, we utilize a doc2vec model to represent the bills, where each
bill’s textual data (after some preprocessing) is considered as a document. Let   be the learned
embedding of the bill   ∈ B ∪ ˜B. Note that we train the bill doc2vec model on B.

We can now succinctly represent the set of embedded ideological features that we will utilize
when considering the relation between a voter  and a bill   (along with their sponsor and
cosponsor(s),   and  , respectively) as E  = { ,  

,   ,   }.

6.2.4 Social Factors of MFCVP

Having discussed the ideological factors that get incorporated into MFCVP, here we discuss the
more novel social factors (with an emphasis on the network features) that have been commonly
overlooked by previous methodologies and analyses in relation to the predictions and understanding
of congressional votes. We propose to categorize these social factors into two main groups as
follows: 1) political party aﬃliation features, and 2) features coming from the network constructed
from the past voting records. Next we discuss these two feature categories.

6.2.4.1 Party Features

The inspiration for these features comes from the fact that voters are sometimes influenced by
their political party to cast votes aligned with the party's interest.

Given a single vote made by a voter on a bill that was sponsored by one representative and
cosponsored by a set of representatives, we construct three corresponding features to represent
the party affiliations of the voter, the sponsor, and the cosponsors, respectively. More
specifically, the voter and sponsor features are one-hot vectors indicating the affiliated party
of the voter and sponsor, respectively. For the set of cosponsors, we obtain the distribution of
the cosponsors across the party affiliations. Note that if there are no cosponsors, we simply use
a vector of zeros for the cosponsor feature. These three features form the set of party
features P.
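
As a small sketch of the party features, the snippet below builds the two one-hot vectors and the cosponsor distribution and concatenates them. The list of party labels, the function names, and the concatenation order are assumptions for illustration; the zero vector for a bill without cosponsors follows the description above.

```python
# Assumption: a fixed, ordered list of party labels (hypothetical codes).
PARTIES = ["D", "R", "I"]


def one_hot(party):
    """One-hot vector indicating the affiliated party."""
    return [1.0 if p == party else 0.0 for p in PARTIES]


def cosponsor_distribution(cosponsor_parties):
    """Distribution of the cosponsors across parties; zeros if there are no cosponsors."""
    if not cosponsor_parties:
        return [0.0] * len(PARTIES)
    n = len(cosponsor_parties)
    return [sum(1 for c in cosponsor_parties if c == p) / n for p in PARTIES]


def party_features(voter_party, sponsor_party, cosponsor_parties):
    """Concatenate the voter, sponsor, and cosponsor party features."""
    return (one_hot(voter_party)
            + one_hot(sponsor_party)
            + cosponsor_distribution(cosponsor_parties))


print(party_features("D", "R", ["D", "D", "R", "I"]))
print(party_features("R", "R", []))
```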

6.2.4.2 Signed Bipartite Network Features

Typical network representations that are used for congressional voting records are the two one-
mode networks coming from a bipartite network which ultimately separates and/or condenses
the “yea” and “nay” votes [203]. However, this inherently loses a substantial amount of
vital information that could be extracted by network analysis techniques that incorporate the
“yea” and “nay” votes simultaneously. Therefore, we propose a more advanced representation: a
signed bipartite network.

Let G = {{R ∪ B},V} denote the signed bipartite network that is constructed using the set
{R ∪ B} of  +  nodes (i.e., the representatives and bills), and set of links (i.e., votes V) between
them where we treat “yea”, “nay”, and “no vote” as a positive, negative, and non-existent link in
the signed network. Now, given that we have modeled the voting history in the form of a signed
network, we can utilize signed social theories to extract insightful features. More speciﬁcally, we
utilize balance theory, which colloquially can be summarized as “a friend of a friend is a friend”
while “an enemy of a friend is an enemy” [30, 29].

The first set of features we construct when considering the relationship between a voter and
a bill can be seen in Figure 6.6(a). We want to extract information on how the voter and the
bill's sponsor have interacted on other bills, to gain information on how the voter might vote on
the current bill. There are 9 possible situations when considering a triplet (voter, sponsor,
other bill), since both the voter and the sponsor can have either a positive, negative, or no
link to each of the other bills in B. We utilize this information to construct a feature vector
that represents the distribution over the nine aforementioned possibilities, where the triangles
involving an even number of negative links adhere to balance theory. The distribution over the
number of balanced and unbalanced triangles, along with the number of open structures (i.e.,
those involving at least one “no link”), should provide great insight for our model to discover
the patterns related to this fundamental social theory. Signed triangle distributions have also
recently been used in benchmarking generative signed network models [204], since they hold such
rich information about a signed network.

(a)

(b)

Figure 6.6: Illustrations of the signed bipartite network features.

We note that these features are similar to the ones utilized in the seminal work [4] that focused
on building a supervised model to predict the missing sign between two nodes, but here we use the
sponsor as a proxy for the bill they introduced. This relates to balance theory because the
signed social theory would suggest that if the voter has voted the same way as the sponsor, then
it is likely that the voter should think positively towards the sponsor's bill. Similarly, we
construct a second feature vector where, instead of using the sponsor, we obtain the average
over the cosponsors of the bill.
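
A minimal sketch of the nine-way distribution is shown below. The dictionary encoding of votes (+1 for "yea", -1 for "nay", absent for no link), the ordering of the nine cases, the function names, and the use of a zero vector when a bill has no cosponsors are assumptions made for illustration.

```python
SIGNS = (1, -1, 0)  # positive link, negative link, no link
CASES = [(a, b) for a in SIGNS for b in SIGNS]  # the nine (voter, sponsor) sign pairs


def link_sign(votes, rep, bill):
    """Sign of the link between a representative and a bill (0 if no link)."""
    return votes.get((rep, bill), 0)


def triad_distribution(votes, bills, voter, sponsor, current_bill):
    """Distribution over the nine sign pairs the voter and sponsor gave to the other bills."""
    counts = {case: 0 for case in CASES}
    others = [b for b in bills if b != current_bill]
    for b in others:
        counts[(link_sign(votes, voter, b), link_sign(votes, sponsor, b))] += 1
    total = len(others)
    return [counts[case] / total if total else 0.0 for case in CASES]


def cosponsor_triad_distribution(votes, bills, voter, cosponsors, current_bill):
    """Average of the nine-way distribution over all cosponsors."""
    if not cosponsors:
        return [0.0] * len(CASES)  # assumption: zeros when there are no cosponsors
    vecs = [triad_distribution(votes, bills, voter, c, current_bill) for c in cosponsors]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


toy_votes = {("v", "b1"): 1, ("s", "b1"): 1, ("v", "b2"): -1, ("s", "b2"): 1, ("s", "b3"): -1}
toy_bills = ["b1", "b2", "b3", "b4"]
print(triad_distribution(toy_votes, toy_bills, "v", "s", "b4"))
```

Pairs in which both signs are non-zero and equal correspond to balanced triangles, pairs with opposite non-zero signs to unbalanced ones, and any pair containing a 0 to an open structure.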

We furthermore extract a second type of feature from our constructed signed network. With
the first network features (described above), we sought to capture the overall distribution of
balance between the votes of the voter and those of the current sponsor and cosponsors towards
the rest of the bills. In contrast, here we want to directly observe how the voter has voted on
the bills sponsored by the current sponsor, or by someone among the current cosponsors (i.e., a
more personalized set of social features), which is related to the polarity of their interactions
in the signed network [205]. In Figure 6.6(b) we illustrate how we construct the resulting
feature vector of length 3. Given that we want to extract information about how the voter might
vote on the current bill, we observe the distribution over the three possible votes (i.e.,
positive, negative, or no link in terms of the signed network) that the voter has given to all
other bills that were also sponsored by the same sponsor. Similarly, we construct a second
feature vector, but rather than observing the vote distribution over the set of bills sponsored
by the sponsor, we instead average over the set of bills sponsored by the cosponsors of the
current bill.
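
The length-3 features can be sketched as follows. The `sponsor_of` mapping from bill to sponsor, the function names, the zero vector for an empty bill set, and the choice to pool all bills sponsored by any cosponsor before computing the distribution are assumptions; the text only states that the second vector averages over the bills sponsored by the cosponsors.

```python
def vote_distribution(votes, voter, bills):
    """Fraction of positive, negative, and missing votes the voter gave to `bills`."""
    bills = list(bills)
    if not bills:
        return [0.0, 0.0, 0.0]  # assumption: zeros when there are no such bills
    counts = [0, 0, 0]
    position = {1: 0, -1: 1, 0: 2}
    for b in bills:
        counts[position[votes.get((voter, b), 0)]] += 1
    return [c / len(bills) for c in counts]


def sponsor_polarity(votes, sponsor_of, voter, sponsor, current_bill):
    """Votes of the voter on all other bills sponsored by the same sponsor."""
    bills = [b for b, s in sponsor_of.items() if s == sponsor and b != current_bill]
    return vote_distribution(votes, voter, bills)


def cosponsor_polarity(votes, sponsor_of, voter, cosponsors, current_bill):
    """Votes of the voter on all other bills sponsored by any of the cosponsors."""
    group = set(cosponsors)
    bills = [b for b, s in sponsor_of.items() if s in group and b != current_bill]
    return vote_distribution(votes, voter, bills)


toy_votes = {("v", "b1"): 1, ("v", "b2"): -1, ("v", "b3"): 1}
toy_sponsor_of = {"b1": "s", "b2": "s", "b3": "c1", "b4": "s", "b5": "c2"}
print(sponsor_polarity(toy_votes, toy_sponsor_of, "v", "s", "b4"))
print(cosponsor_polarity(toy_votes, toy_sponsor_of, "v", ["c1", "c2"], "b4"))
```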

Finally, we construct the full set of network features N, which consists of the four feature
vectors described above (two of length 9 and two of length 3), where |N| = 24. Note that these
network features are in fact general and if given additional context (e.g., the connections
between the voters, sponsors, and cosponsors on Twitter), we could easily extend these ideas to
obtain a larger social context between the representatives; we leave this as future work along
with the use of advanced signed network embeddings [206].

6.2.5 Classification Details of MFCVP

Now that we have discussed all the features coming from multiple factors, we next discuss how we
can utilize them together for training a model for congressional vote prediction. We note that
our framework is flexible in that the choice of the classifier is not fixed and can be chosen
based on the desired outcome. One choice is to utilize a random forest [207], since it is
typically an easy off-the-shelf model to train and also has the added benefit of being
interpretable. More specifically, feature importance can be calculated from this model, which can
give insight into which features are more important for the correct classification of the votes
(this will be shown in Section 6.2.6.5). Another choice is to utilize the power of deep learning
[208] to obtain perhaps better prediction performance, at the cost of losing the ease of
interpretation (although we note that interpreting deep neural networks is currently an active
research topic in itself). In this work, we utilize both a random forest and a deep neural
network as classifiers.

Table 6.5: US Congress dataset statistics.

113th House of Representatives    Total Dataset    Train (80%)    Dev. (10%)    Test (10%)
# roll-call votes                 499              400            49            50
# total “Yea” votes               137,926          110,882        12,407        14,637
# total “Nay” votes               68,487           54,874         7,790         5,823
# total “Present”/No votes        8,929            6,934          902           1,093

6.2.6 Experiments

To evaluate the performance of the proposed framework MFCVP, we conduct a set of experiments
for predicting individual representative votes and the overall outcome of the roll-call vote for
a set of new incoming bills when given a training set of historical information. Through the
conducted experiments, we seek to answer the following research questions:

• Q1: How does the proposed framework perform on congressional vote prediction?

• Q2: How do different factors contribute to the congressional vote prediction?

Next, we describe the dataset followed by the experimental settings. Then, we describe the
baseline methods and comparison results. We conclude this section by presenting experiments and
discussions on factor analysis.

6.2.6.1 Dataset and Data Collection

For our experiments, we have focused on the 113th U.S. Congress House of Representatives. We
collected the roll-call vote data along with the sponsor, cosponsor, and party affiliation from
the Govtrack database6. After obtaining this dataset, we filtered out the roll-call votes not
associated with a bill, joint resolution, concurrent resolution, or simple resolution; for
example, roll-call votes related to amendments are not included in our dataset. We obtained
ideological embeddings for each of the bills based on the bill's text, which we obtained from the
Library of Congress7. Ultimately, we split the dataset chronologically into three sets, i.e., a
train set, a dev set, and a test set, as shown in Table 6.5. The training set is constructed with
roughly the first 80% of the roll-call votes, all of which happened before March 5, 2014. Thus,
as we mentioned before, to ensure no data leakage, we searched the historical Wikipedia profile
pages of each of the representatives to find the version with the date closest to but before
March 5, 2014; this data was then collected and used to obtain our ideological embeddings.

6https://www.govtrack.us

6.2.6.2 Experimental Settings

First, we obtain the results for the prediction of individual representative votes. Next, we utilize
these individual vote predictions to get the aggregated prediction as to whether the roll-call vote
will pass or fail (which is the overall outcome of the roll-call vote). Since our MFCVP framework
is flexible in utilizing different classifiers, we utilize a random forest and a deep neural network.
For the random forest we use the scikit-learn library, and for the neural network we use the
PyTorch library. We denote these two variants of our framework as MFCVP_RF and
MFCVP_NN, respectively. For the random forest, we use the library default settings. For the
deep neural network model (see Figure 6.5), we employ a multi-layer fully connected network with
Leaky ReLU (Rectified Linear Unit) [94] as the non-linear activation function. Hyperparameters
are set by grid search, evaluating the framework on the dev set. Based on the grid search, the
number of layers is set to 5 with 100 hidden units, and no regularization is utilized. We utilize
Adam [94] as the optimization algorithm, with a learning rate that starts from 0.01 and is adjusted
dynamically every 100 optimization steps with a decay rate of 0.9. Each simulation is run for 2000
steps with a batch size of 100 votes at each step. The embedding size of the doc2vec model is set to
50. We repeat each simulation five times and report the average F1 score and accuracy on
the test set.
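To make the learning rate setting concrete, the following is a minimal sketch of one plausible reading of the schedule described above. It assumes a staircase decay, in which the rate is multiplied by 0.9 after every completed block of 100 optimization steps; the text does not state the exact update rule, and the function name learning_rate is introduced here only for illustration.

```python
# Sketch of a staircase learning-rate decay (an assumed interpretation of the text).
BASE_LR = 0.01        # initial learning rate stated in the text
DECAY_RATE = 0.9      # decay rate stated in the text
DECAY_EVERY = 100     # optimization steps between adjustments
TOTAL_STEPS = 2000    # optimization steps per simulation


def learning_rate(step, base_lr=BASE_LR, decay_rate=DECAY_RATE, decay_every=DECAY_EVERY):
    """Return the learning rate in effect at a 0-indexed optimization step."""
    # Assumption: one multiplicative decay per completed block of `decay_every` steps.
    return base_lr * decay_rate ** (step // decay_every)


if __name__ == "__main__":
    for step in (0, 100, 500, 1000, TOTAL_STEPS - 1):
        print(f"step {step:4d}: lr = {learning_rate(step):.6f}")
```

Under this reading, the rate is adjusted 19 times over the 2000 steps of a simulation, so the final rate is 0.01 multiplied by 0.9 raised to the 19th power.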

Baseline Methods: To show the effectiveness of our proposed framework MFCVP, we present
a set of baseline congressional vote prediction methods and discuss why we have selected these
baselines from a political standpoint.

7https://www.congress.gov

• Random Guess: This method performs a random guess when presented with a vote in $\tilde{V}$
(the set of votes to be predicted) for a given voter on a future bill. The guess is drawn from the
class distribution of “yea”, “nay”, and “no vote” in the set of past votes $V$. This method is
selected simply to give context on how difficult this problem is relative to the most naïve approach.

• Personalized Random Guess: Extending the Random Guess method, here, rather than a
global class distribution, we extract a personalized class distribution for each voter. In
other words, to guess a vote in $\tilde{V}$, we extract the class distribution from the set of
that voter’s past votes in $V$ over all bills in $B$. This method is used to test whether individual voters
indeed have their own unique patterns in terms of their vote distribution (e.g., one representative
might abstain and not vote significantly more often than another).

• Party Voter: This method forces all the representatives to vote in alignment with the political
parties. More specifically, for predicting a vote in $\tilde{V}$, if the voter has the same party
affiliation as the sponsor of the bill, then we predict “yea”; otherwise, we predict “nay”.

• Sponsor Biased Voter: Given a vote in $\tilde{V}$ to be predicted, first the sponsor of the bill
is obtained, and then we obtain the set of all past votes in $V$ that the voter cast on bills in $B$
with that same sponsor. We then choose the most frequent vote type in the class distribution of this set. The Sponsor Biased
Voter does not necessarily adhere to political affiliation when voting; instead, they base their
vote on their past experiences with the sponsor of the current bill. In other words, if they
have liked (i.e., voted “yea” on) the past bills of this sponsor, then they will again vote “yea”,
with similar reasoning for voting “nay” or “no vote” on a bill.

• Top-K Bills: When seeking to predict a vote in $\tilde{V}$, this method first obtains the
ideological embedding of the bill and then finds the closest $K$ bills in $B$ based on their
embeddings. The Top-K Bills method bases its vote solely on the ideological factors of
the proposed bill’s text. That is to say, predicting the votes using the Top-K Bills method
ignores all direct or indirect party affiliations and allows the voter to cast their vote based only
on their ideologies. To select the hyperparameter $K$, we varied its value over the set
{1, 3, 5, 8, 10, 20, 30} while predicting on the dev set, which resulted in $K = 8$ being the best
performing value. We utilized the Euclidean distance for determining the closest $K$ bills
based on their embeddings.

Figure 6.7: Performance evaluation of MFCVP predicting individual representative votes.

To answer research question Q1, we compare the proposed framework MFCVP with the representative
baselines at both the local individual representative vote level and the global overall roll-call
vote level. Similar to the MFCVP variants, we repeat the Random Guess and Personalized Random Guess
methods five times and report the average F1 score and accuracy (since they are non-deterministic
methods).
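The baselines described in this section can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions, not the implementation used in our experiments: the record layout (tuples of voter, bill, and label), all function names, and the fallback of returning None when a voter has no relevant history are hypothetical. For Top-K Bills, the sketch further assumes that the neighbors are restricted to bills the voter has already voted on and that the prediction is the most frequent label among them.

```python
import math
import random
from collections import Counter

# Assumed (hypothetical) data layout, chosen only for illustration:
#   past_votes   : list of (voter, bill, label), label in {"yea", "nay", "no vote"}
#   sponsor_of   : dict mapping bill -> sponsor
#   party_of     : dict mapping person -> party
#   embedding_of : dict mapping bill -> list of floats


def random_guess(past_votes, rng):
    """Draw a label from the global class distribution of past votes."""
    return rng.choice([label for _, _, label in past_votes])


def personalized_random_guess(past_votes, voter, rng):
    """Draw a label from the voter's own past class distribution."""
    own = [label for v, _, label in past_votes if v == voter]
    return rng.choice(own) if own else None


def party_voter(voter, bill, sponsor_of, party_of):
    """Predict "yea" if voter and sponsor share a party, otherwise "nay"."""
    return "yea" if party_of[voter] == party_of[sponsor_of[bill]] else "nay"


def most_frequent(labels):
    """Most common label, or None with no history (this fallback is an assumption)."""
    return Counter(labels).most_common(1)[0][0] if labels else None


def sponsor_biased_voter(past_votes, voter, bill, sponsor_of):
    """Most frequent past vote of this voter on bills by the same sponsor."""
    sponsor = sponsor_of[bill]
    return most_frequent(
        [label for v, b, label in past_votes if v == voter and sponsor_of[b] == sponsor]
    )


def top_k_bills(past_votes, voter, bill, embedding_of, k=8):
    """Most frequent past vote of this voter on the k nearest bills (Euclidean)."""
    target = embedding_of[bill]
    history = {b: label for v, b, label in past_votes if v == voter}
    nearest = sorted(history, key=lambda b: math.dist(embedding_of[b], target))[:k]
    return most_frequent([history[b] for b in nearest])


if __name__ == "__main__":
    # Toy records, invented purely to exercise the functions.
    votes = [("rep_a", "bill_1", "yea"), ("rep_a", "bill_2", "nay")]
    print(random_guess(votes, random.Random(0)))
```

Sampling uniformly from the list of past labels is equivalent to sampling from the empirical class distribution, which is why the two random baselines need no explicit probability table.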

6.2.6.3 Individual Representative Vote Predictions

The results are shown in Figure 6.7. Based on the results presented in this ﬁgure, we make the
following observations:

• Among the baseline methods, the Sponsor Biased Voter approach outperforms the others. This shows
that the historical relations between a voter and a sponsor have a significant impact
on determining the vote of a voter on an upcoming bill. Further, as described before,
our proposed framework, unlike the Sponsor Biased Voter method, incorporates these relations in a
more sophisticated way by extracting more principled features from the constructed signed network.

• Comparing the Top-K Bills method with Party Voter, we note that the content of a bill is more
important than blindly voting based on the party of a bill’s sponsor. In fact, the low performance
of the Party Voter method supports the argument that, despite the polarized voting behavior of
the U.S. Congress, some representatives adhere to their prior beliefs and ideology instead
of always voting for or against a proposed bill based on the sponsor’s political
affiliation.

• Personalized Random Guess outperforms Random Guess. This is not surprising, as
Personalized Random Guess incorporates, albeit crudely, the prior history of how a
representative voted on past bills.

• The variants of the proposed framework MFCVP outperform all baseline methods, in
some cases significantly. This framework incorporates, in a comprehensive manner,
various influential political factors associated with congressional voting.
Although MFCVP_NN achieves slightly better performance than MFCVP_RF, we opt to
use the random forest for the rest of the experiments since it provides more interpretable
insights into the proposed factors.

Therefore, from a local congressional vote perspective, this shows that MFCVP can be utilized
as a reliable congressional vote prediction framework. Next, we investigate the global predictions
as to whether MFCVP can accurately detect when a proposed bill will pass or fail.

6.2.6.4 Overall Roll-call Vote Outcome Predictions

Figure 6.8: Performance evaluation of MFCVP predicting the overall roll-call vote outcome.

Here, we utilize the predictions from the local level (i.e., the individual representative vote
predictions) to obtain the overall global roll-call vote outcome of whether the bill will pass or fail. The
results are shown in Figure 6.8. Based on the results presented in this figure, we make the following
observations:

• The first observation is that although Personalized Random Guess performed worse than
Party Voter for determining individual representative votes, here it significantly outperforms
the Party Voter method. This means that, for individual representative votes, the better predictor
is their political party. However, when aggregating all representative votes to
predict whether the bill will pass or fail, using the representatives’ previous voting patterns
is better than just considering their party.

• Next, we observe a similar swap in that Top-K Bills now outperforms the Sponsor Biased
Voter model. This is interesting since it implies that the overall pass/fail decisions for
roll-call votes are driven more by the correlation between the voted-upon bill and
similar past bills than by the relationship the voters have with the sponsor of the bill.
More specifically, this indicates two phenomena: 1) the representatives are quite stable in
their ideologies; and 2) when averaged out, whether a bill will pass or fail is
better predicted through the representatives’ history with respect to bill content than through
their relation to the sponsor of the proposed bill.

• Although MFCVP_NN is better able to predict the local individual representative votes,
it is likely to have slightly overfit the training data (since the models were trained on
the local voting patterns) and thus cannot generalize as well when aggregated to the global
level. In contrast, when pairing the random forest model with our MFCVP framework (i.e.,
MFCVP_RF), we see it generalizes better than the neural network variant.

Therefore, based on the results for both the local individual representative predictions as well as
the global pass or fail aggregated predictions for the proposed bills, it is clear that our MFCVP
framework is a superior and eﬀective methodology for predicting the congressional votes.
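The aggregation from individual predictions to a roll-call outcome, as discussed in this subsection, can be illustrated with a short sketch. The decision rule below is an assumption made only for illustration: a roll-call vote is predicted to pass when the predicted “yea” votes strictly outnumber the predicted “nay” votes, with “no vote” predictions ignored. The text above does not spell out the rule, and the function names are hypothetical.

```python
from collections import Counter


def roll_call_outcome(predicted_votes):
    """Aggregate per-representative predictions into "pass" or "fail".

    Assumed rule (not stated in the text): the vote passes when predicted
    "yea" votes strictly outnumber predicted "nay" votes; "no vote" is ignored.
    """
    counts = Counter(predicted_votes)
    return "pass" if counts["yea"] > counts["nay"] else "fail"


def outcome_accuracy(predicted_outcomes, actual_outcomes):
    """Fraction of roll-call votes whose pass/fail outcome is predicted correctly."""
    if not actual_outcomes:
        raise ValueError("no roll-call votes to evaluate")
    hits = sum(p == a for p, a in zip(predicted_outcomes, actual_outcomes))
    return hits / len(actual_outcomes)


if __name__ == "__main__":
    # Toy predictions, invented purely to exercise the functions.
    print(roll_call_outcome(["yea", "yea", "nay", "no vote"]))
```

Because only the majority matters at this level, a model can mispredict many individual votes and still obtain the correct outcome, which is consistent with the ranking of methods differing between the local and global evaluations.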

6.2.6.5 Political Factor Analysis

The research question Q2 is concerned with the contribution of political factors for congressional
vote prediction. To answer this question, we conduct some experiments for the local individual
representative congressional vote prediction. We focus our attention on using the random forest
(i.e., MFCVP_RF), as it provides us with feature importance values in an explainable manner.

First, we compute the importance of the three essential factors in our framework, i.e., ideological,
network, and party factors (the latter two are social factors), using the Gini importance [209].
Figure 6.9 shows the importance of these three factors, where Embeddings indicates the contribution
of ideological factors. Based on this figure, we make the following observations:

• Embeddings (i.e., ideological factors) are the most important features in individual vote
prediction. This shows that ideological factors play a central role in determining a vote cast
on a bill and many representatives adhere to their ethics and beliefs.

• Quite interestingly, network features turn out to be very important. This indicates that 1) the
interactions and connections among U.S. House representatives have a significant
bearing on a voter’s voting behavior, and 2) political vote prediction should not focus merely
on ideological factors while overlooking the role of the social networks established
among representatives and their historical votes, since these are quite effective in vote prediction.

• In line with the performance of the Party Voter baseline (see Figure 6.7), party features
have an inconsequential effect on individual vote prediction. This is politically reassuring,
as representatives do not submissively follow the inclination of the political party of a bill’s
sponsor. In the future, we will follow up on this line of research to investigate whether such a
phenomenon persists at other points in time throughout history in the U.S. House and Senate,
or even in other countries’ political systems.

Figure 6.9: Feature analysis using the feature importance values from MFCVP_RF.
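As background for the factor-level analysis, the following sketch illustrates the quantity underlying the Gini importance and one way per-feature importances could be summed into factor-level scores. It is a minimal sketch, not the procedure used to produce Figure 6.9: the Gini importance of a feature in a tree ensemble accumulates the impurity decrease of every split made on that feature, and the summation by factor shown here, along with all function names and the feature-to-factor mapping, is an assumption for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity removed by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


def group_importance(importances, factor_of):
    """Sum per-feature importances into factor-level totals.

    `factor_of` maps each feature name to a factor name; both are hypothetical
    and stand in for the ideological, network, and party feature groups.
    """
    totals = {}
    for feature, value in importances.items():
        factor = factor_of[feature]
        totals[factor] = totals.get(factor, 0.0) + value
    return totals


if __name__ == "__main__":
    # A perfectly separating split on a toy node, invented for illustration.
    node = ["yea", "yea", "nay", "nay"]
    print(impurity_decrease(node, ["yea", "yea"], ["nay", "nay"]))
```

A split that separates the classes perfectly removes all of the parent node's impurity, while a split that leaves the class mix unchanged in both children contributes nothing to the importance of the feature it was made on.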

Now, we narrow down the feature analysis illustrated in Figure 6.9 to investigate contributing
features in each of the three overall factors in more detail. We make the following observations
according to the results shown in this ﬁgure:

• Among the ideological factors, the bill embedding has the highest contribution. This seems
reasonable since, after all, it is a bill that is being voted on.

• Interestingly, and somewhat surprisingly, the embeddings associated with cosponsors are more
effective than those of sponsors. It is known that over half of the bills introduced into
the U.S. Congress are cosponsored [210]. This allows our model
to distinguish whether a given bill has received no cosponsors and, when aggregated across
all cosponsors, the average representative embedding can provide insight into whether the bill has
received bipartisan support or support from only a single party. This is because the
embeddings of the representatives are designed to capture their ideology and are thus
likely to be easily separable in the embedded space for our model.


• Almost the entire contribution of the party features (though very insignificant compared to
other factors) stems from the party affiliations of the bills’ cosponsors. As with the ideological factors, this
indicates that cosponsors play an important role, to the extent that even their party affiliation should be taken
into account; but, as expected, the learned embeddings of a representative’s ideology are
significantly more important than just knowing the political party they are associated with.

• However, for the network features, we observe the opposite of the embedding
and party features, in that here the sponsors have a stronger signal. This is likely because
aggregating over all the votes a representative has given to bills proposed by any of the
current cosponsors (which tends to be a very large set of votes) results in a noisier signal
than the party or embedding features, which are aggregated over a set only on the order of the number of
cosponsors and can thus retain most of the information.

• When comparing the two types of network features, we observe that the features related
to balance theory were more insightful than the social ones, which looked at how a voter
behaved on bills previously proposed by the same (co)sponsor. One reason for this is
that the balance theory based features are more principled and take a more global view,
as compared to the local social features that only look at bills previously proposed by that
(co)sponsor. This is also likely because our balance related features are based on
pseudo-triangles we extracted from our constructed signed bipartite network (which, we note,
naturally does not contain triangles) and are related to the features extracted in [4], where they
were observed to be well suited for predicting whether the sign of a missing link would be
positive or negative.
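For readers less familiar with balance theory, the check at the heart of the balance related features can be sketched as follows. Under structural balance theory, a cycle is balanced when it contains an even number of negative links, i.e., when the product of its link signs is positive. The sketch only illustrates this check; how the pseudo-triangles are constructed from the signed bipartite network is defined earlier in this chapter and is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter
from math import prod


def is_balanced(signs):
    """Return True if a cycle with the given link signs (+1 or -1) is balanced.

    A cycle is balanced when the product of its signs is positive, which is
    the same as containing an even number of negative links.
    """
    return prod(signs) > 0


def balance_counts(cycles):
    """Count balanced and unbalanced cycles in an iterable of sign tuples."""
    counts = Counter("balanced" if is_balanced(c) else "unbalanced" for c in cycles)
    return counts["balanced"], counts["unbalanced"]


if __name__ == "__main__":
    # The four possible sign patterns of a triangle, up to reordering.
    triangles = [(1, 1, 1), (1, 1, -1), (1, -1, -1), (-1, -1, -1)]
    print(balance_counts(triangles))
```

The same product rule applies unchanged to the four-link signed butterflies of a signed bipartite network, since it depends only on the parity of the negative links along the cycle.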


CHAPTER 7

CONCLUSION AND FUTURE DIRECTIONS

In this chapter1, we provide a summary of our research results and present promising future research
directions.

7.1 Summary

In this dissertation, we proposed novel research in the four major directions of network analysis
with negative links: (1) signed network measuring; (2) signed network modeling; (3) signed
network mining; and (4) signed network applications.

For measuring signed networks, we first performed an analysis of the properties and theories in
signed networks, taking into consideration positive and negative links both individually and
holistically together. Next, we performed an initial and comprehensive study of node relevance
measurements in signed networks. We built numerous local and global measurements guided by
signed network properties and balance theory. We further studied the impact of signed relevance
measurements on two signed network analysis tasks, i.e., link sign prediction and tie strength
prediction. Experimental results demonstrated that (1) dedicated efforts are necessary to build
signed relevance measurements with negative links; and (2) global methods significantly outperform
local methods for link prediction. Thereafter, we developed a Deep Signed Centrality (DeSCent)
measure that allowed us to harness the power of deep learning to extract and learn the complex
patterns between positive and negative links while identifying a signed centrality score for each
user. Furthermore, the deep framework allows for centrality to be calculated across networks.
In other words, training a model on one signed network learns a general enough mapping such
that it can be efficiently applied to extract the centrality of users in other signed networks.
Thus, we developed a novel objective for DeSCent based on status theory and balance theory that
utilizes higher-order structures. Then, to evaluate the effectiveness of our approach, we conducted
experiments using the signed centrality scores for signed link prediction. These experimental results
on four real-world signed networks have shown the superiority of DeSCent over other recently
proposed signed centrality measures. Furthermore, experiments were performed to validate the
usefulness of the deep framework for learning parameters in one network that are general enough
to extract meaningful centrality scores in other signed networks, which further strengthens the
applicability of the proposed approach.

1Tyler Derr. “Network Analysis with Negative Links.” In Proceedings of the 13th ACM
International Conference on Web Search and Data Mining (WSDM). 2020.

For modeling signed networks, we proposed our Balanced Signed Chung-Lu model (BSCL),
which was the first signed generative network model. BSCL was designed with the objective of
preserving key properties of signed networks: (1) degree distribution; (2) local clustering
coefficient; (3) positive/negative link ratio; and (4) proportion of balanced/unbalanced triangles
suggested by balance theory. To achieve this, we introduced a triangle balancing parameter and a
sign balancing parameter to control the distribution of formed triangles and signed links,
respectively. An automated estimation approach for the two parameters, paired with another parameter
for controlling how often to create triangles versus inserting a random edge, allows BSCL to take
as input a signed network, learn the parameters needed to model the key properties, and
then output a similar network maintaining the desired properties. In addition, we also provided an
initial investigation of balance theory in signed bipartite networks that: (1) extended the definition
in the form of signed butterfly isomorphism classes; (2) validated that balanced signed
butterflies are indeed found significantly more often than unbalanced ones in signed bipartite
networks; (3) leveraged balance theory for the construction of multiple sign prediction methods; and
(4) performed experiments on three real-world signed bipartite networks to provide insight into
both balance theory and sign prediction in signed bipartite networks.

For mining signed networks, we proposed the first signed graph convolutional network (SGCN).
We observed that, since unsigned graph convolutional networks (GCNs) are built on the assumption
of homophily, their local neighborhood aggregation technique could not easily be
reused with the introduction of negative links. Hence, we built a novel aggregation scheme for
SGCN based on balance theory and on how the positive and negative links interact with each other
along the balanced/unbalanced paths we defined. This allowed us to bridge the gap between
the recent advances in unsigned GCNs and the domain of signed network analysis. Using our
constructed signed graph convolutional network, we performed empirical evaluations through
experiments on four real-world signed networks. Comparing against the state-of-the-art signed
network embedding algorithms, we showed the superiority of SGCN when performing the
classical link sign prediction task. Thereafter, seeking to develop a more universal signed network
embedding method, we proposed the idea of network transformation based embedding, namely
role-based signed network embedding (ROSE). Essentially, the idea is to transform the initial signed
network into a network in which each node appears multiple times (one instance
of the node per role, e.g., in/out perspective). This led to the creation of an unsigned network
consisting of multiple viewpoints for each node from the original network, and after performing a
traditional unsigned network embedding, we are able to aggregate back the multiple roles/viewpoints
to construct an embedding for each of the nodes in the original signed network. Empirically, we
discovered that this is surprisingly effective, achieving state-of-the-art performance on the link
and sign prediction tasks while not relying on social theories.

For applying signed networks, we proposed LIP, a joint model that can predict both the polarity
between users and the interaction polarity scores between users and content generated by
other users. The framework is built on harnessing the opinions from both problems, and since
we showed these opinions are correlated, we were empirically able to help alleviate the cold-start
problem that arises when seeking to predict the polarity of both the user-user and user-content links
in a real-world dataset. In addition, we presented a comprehensive congressional vote prediction
framework, MFCVP, that is capable of harnessing both ideological and social factors. We modeled
the historical votes in Congress between the members of Congress and the bills they vote on as a
signed bipartite network and then extracted network features, in addition to ideological features
coming from document embeddings and, finally, party affiliation features. Furthermore, we were
able to discover the most influential attributes in congressional vote prediction, while finding that
simply predicting based on party affiliations has significantly worse performance. Ultimately, we
showed that the proposed signed network features are informative and play a significant
role in congressional vote prediction.

7.2 Future Directions

In this section, we present some possible future directions for signed
network analysis in relation to each of the major directions of measuring, modeling, mining, and
applying.

• Tie Strength Prediction in Signed Networks: For measuring, in Chapter 2 we presented
a set of local and global signed node relevance measurements and performed an empirical
evaluation on both the sign prediction and tie strength prediction problems. However, this
initial effort just scratched the surface of what can be done in this direction with dedicated
efforts. For example, this task can be performed using only the network structure, or
with the added assumption of having additional side information associated with the links
and/or nodes, such as link creation timestamps, chat logs between users, comments associated
with the links, account creation time, etc.

• Deep Generative Modeling of Signed Networks: Recently, there has been a rapid
development of generative graph models for unsigned or node attributed networks that utilize
deep learning on graphs, with techniques ranging from generative adversarial networks (i.e.,
GAN-based) [211], recurrent neural networks (i.e., RNN-based) [79], graph recurrent attention
networks (i.e., GRAN-based) [212], and variational autoencoders (i.e., VAE-based) [78] to
self-attention-based methods [213]. Thus, to improve upon our BSCL method presented in Chapter 4, the
development of a deep generative model would be of interest. We refer the readers to a
recent survey on this general topic [214].

• Attack and Defense Methodologies in Signed Networks: Although there have been some
efforts both to develop adversarial attacks on, and to defend, traditional network
embedding and graph neural network models (e.g., [215]), this area is recently developed and
still evolving. Given the increased polarization online, and with social media networks
being one of the major areas of concern for such attacks, both the attacks and defenses
are likely to be improved by utilizing the additional information of negative links, such as
blocked users, unfollowings, etc. We refer the readers to a recent survey on this general
topic [216].

• Understanding and Predicting Unfollower Relations in Online Social Media: One
important aspect of signed network analysis is that it provides a more realistic viewpoint of
the underlying system we seek to model with a network (i.e., a set of nodes and edges linking
them together). However, most signed network analysis thus far has not been able to focus
on many mainstream social media sites (such as Twitter), given the lack of data available. For
example, Epinions was a product review website that, although having negative ties, only later
released this data anonymized for research purposes. Hence, to push the frontier forward
in signed social network analysis, it is important to understand and characterize which types
of interactions online can lead to the modeling of both direct (e.g., unfollowing [217]) and
indirect (e.g., negatively commenting on another user’s post [218]) negative links in social
media.


BIBLIOGRAPHY


[1] Jiliang Tang and Huan Liu. Trust in social computing. In Proceedings of the 23rd International Conference on World Wide Web, pages 207–208. ACM, 2014.

[2] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Signed networks in social media. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 1361–1370. ACM, 2010.

[3] Ramanathan Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. Propagation of trust and distrust. In Proceedings of the 13th International Conference on World Wide Web, pages 403–412. ACM, 2004.

[4] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Predicting positive and negative links in online social networks. In Proceedings of the 19th International conference on World wide web, pages 641–650. ACM, 2010.

[5] Patricia Victor, Chris Cornelis, Martine De Cock, and Ankur Teredesai. Trust- and distrust-based recommendations for controversial reviews. In Web Science Conference (WebSci’09: Society On-Line), 2009.

[6] Hao Ma, Michael R Lyu, and Irwin King. Learning to recommend with trust and distrust relationships. In Proceedings of the third ACM conference on Recommender systems, pages 189–196. ACM, 2009.

[7] Tyler Derr. Network analysis with negative links. In Proceedings of the 13th International Conference on Web Search and Data Mining, pages 917–918, 2020.

[8] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual review of sociology, pages 415–444, 2001.

[9] Anatol Rapoport and William J Horvath. A study of a large sociogram. Behavioral science, 6(4):279–291, 1961.

[10] Derek J De Solla Price. Networks of scientific papers. Science, pages 510–515, 1965.

[11] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[12] John Scott. Social network analysis. Sage, 2012.

[13] Jundong Li, Jiliang Tang, Yilin Wang, Yali Wan, Yi Chang, and Huan Liu. Understanding and predicting delay in reciprocal relations. arXiv preprint arXiv:1703.01393, 2017.

[14] Irawati Karmarkar Karve and Yashwant Bhaskar Damle. Group relations in village community, volume 24. Published by SM Katre for the Deccan College Post-graduate and Research, 1963.

[15] Leo Katz and James H Powell. Measurement of the tendency toward reciprocation of choice. Sociometry, 18(4):403–409, 1955.

[16] A Ramachandra Rao and Suraj Bandyopadhyay. Measures of reciprocity in a social network. Sankhyā: The Indian Journal of Statistics, Series A, pages 141–188, 1987.

[17] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.

[18] R Duncan Luce and Albert D Perry. A method of matrix analysis of group structure. Psychometrika, 14(2):95–116, 1949.

[19] Paul W Holland and Samuel Leinhardt. Holland and leinhardt reply: some evidence on the transitivity of positive interpersonal sentiment, 1972.

[20] PF Lazarsfeld and RK. Merton. Friendship as a social process: a substantive and methodological analysis interpersonal sentiment, 1954.

[21] Kibae Kim and Jörn Altmann. Effect of homophily on network formation. Communications in Nonlinear Science and Numerical Simulation, 44:482–494, 2017.

[22] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks, 2016.

[23] Srijan Kumar, Francesca Spezzano, VS Subrahmanian, and Christos Faloutsos. Edge weight prediction in weighted signed networks. In ICDM, pages 221–230, 2016.

[24] Jérôme Kunegis, Andreas Lommatzsch, and Christian Bauckhage. The slashdot zoo: mining a social network with negative edges. In Proceedings of the 18th International conference on World wide web, pages 741–750. ACM, 2009.

[25] Paolo Massa and Paolo Avesani. Controversial users demand local trust metrics: An experimental study on epinions.com community. In Proceedings of the National Conference on artificial Intelligence, volume 20, page 121. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 2005.

[26] Shuang-Hong Yang, Alexander J Smola, Bo Long, Hongyuan Zha, and Yi Chang. Friend or frenemy?: predicting signed ties in social networks. In Proceedings of the 35th International ACM SIGIR conference on Research and development in information retrieval, pages 555–564. ACM, 2012.

[27] Jiliang Tang, Shiyu Chang, Charu Aggarwal, and Huan Liu. Negative link prediction in social media. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 87–96. ACM, 2015.

[28] Michael Szell, Renaud Lambiotte, and Stefan Thurner. Multirelational organization of large-scale social networks in an online world. Proceedings of the National Academy of Sciences, 107(31):13636–13641, 2010.

[29] Fritz Heider. Attitudes and cognitive organization. The Journal of psychology, 21(1):107–

112, 1946.

[30] Dorwin Cartwright and Frank Harary. Structural balance: a generalization of heider’s theory.

Psychological review, 63(5):277, 1956.

[31] Frank Harary et al. On the notion of balance of a signed graph. The Michigan Mathematical

Journal, 2(2):143–146, 1953.

[32] Dorwin Cartwright and Terry C Gleason. The number of paths and cycles in a digraph.

Psychometrika, 31(2):179–199, 1966.

[33] Nancy M Henley, Robert B Horsfall, and Clinton B De Soto. Goodness of ﬁgure and social

structure. Psychological Review, 76(2):194, 1969.
James A Davis. Clustering and structural balance in graphs. Human relations, 1967.

[34]
[35] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine Learning,

56(1-3):89–113, 2004.

[36] Giuseppe Facchetti, Giovanni Iacono, and Claudio Altaﬁni. Computing global structural
balance in large-scale signed social networks. Proceedings of the National Academy of
Sciences, 108(52):20953–20958, 2011.

[37] Evimaria Terzi and Marco Winkler. A spectral algorithm for computing social balance. In

Algorithms and models for the web graph, pages 1–13. Springer, 2011.

[38] Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending
links in social networks. In Proceedings of the fourth ACM International conference on Web
search and data mining, pages 635–644. ACM, 2011.

[39] Zhijun Yin, Manish Gupta, Tim Weninger, and Jiawei Han. A uniﬁed framework for link
recommendation using random walks. In Advances in Social Networks Analysis and Mining
(ASONAM), 2010 International Conference on, pages 152–159. IEEE, 2010.

[40] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classiﬁcation in social
networks. In Social network data analytics, pages 115–148. Springer, 2011.

[41] Lei Tang and Huan Liu. Community detection and mining in social media. Synthesis
Lectures on Data Mining and Knowledge Discovery, 2(1):1–137, 2010.

[42] Hongzhi Yin, Bin Cui, Jing Li, Junjie Yao, and Chen Chen. Challenging the long tail
recommendation. Proc. VLDB Endow., 5(9):896–907, May 2012.

[43] Lada A Adamic and Eytan Adar. Friends and neighbors on the web. Social networks,
25(3):211–230, 2003.

[44] Hanghang Tong, Christos Faloutsos, and Jia-yu Pan. Fast random walk with restart and its
applications. In Data Mining, 2006. ICDM’06. Sixth International Conference on, pages
613–622. IEEE, 2006.

[45] Ranjay Gulati. Network location and learning: The inﬂuence of network resources and ﬁrm
capabilities on alliance formation. Strategic management journal, 20(5):397–420, 1999.

[46] Arzucan Özgür, Thuy Vu, Güneş Erkan, and Dragomir R Radev. Identifying gene-disease
associations using centrality on a literature mined gene-interaction network. Bioinformatics,
24(13):i277–i285, 2008.

[47] Paolo Crucitti, Vito Latora, and Sergio Porta. Centrality measures in spatial networks of
urban streets. Physical Review E, 73(3):036125, 2006.

[48] Roger Guimera, Stefano Mossa, Adrian Turtschi, and LA Nunes Amaral. The worldwide
air transportation network: Anomalous centrality, community structure, and cities’ global
roles. Proceedings of the National Academy of Sciences, 102(22):7794–7799, 2005.

[49] Stanley Wasserman and Katherine Faust. Social network analysis: Methods and applications,
volume 8. Cambridge university press, 1994.

[50] Kiyana Zolfaghar and Abdollah Aghaie. Mining trust and distrust relationships in social web
applications. In Intelligent Computer Communication and Processing (ICCP), 2010 IEEE
International Conference on, pages 73–80. IEEE, 2010.

[51] Vincent Traag, Yurii Nesterov, and Paul Van Dooren. Exponential ranking: Taking into
account negative links. Social Informatics, pages 192–202, 2010.

[52] Panagiotis Symeonidis and Eleftherios Tiakas. Transitive node similarity: predicting and
recommending links in signed social networks. World Wide Web, 17(4):743–776, 2014.

[53] Jinhong Jung, Woojeong Jin, Lee Sael, and U. Kang. Personalized ranking in signed
networks using signed random walk with restart. In IEEE 16th International Conference on
Data Mining, ICDM 2016, December 12-15, 2016, Barcelona, Spain, pages 973–978, 2016.

[54] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks.
Journal of the American society for information science and technology, 58(7):1019–1031,
2007.

[55] Jiliang Tang, Yi Chang, Charu Aggarwal, and Huan Liu. A survey of signed network mining
in social media. ACM Computing Surveys (CSUR), 49(3):42, 2016.

[56] Francois Lorrain and Harrison C White. Structural equivalence of individuals in social
networks. The Journal of mathematical sociology, 1(1):49–80, 1971.

[57] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–
43, 1953.

[58] Hung-Hsuan Chen and C Lee Giles. Ascos: an asymmetric network structure context
similarity measure. In Advances in Social Networks Analysis and Mining (ASONAM), 2013
IEEE/ACM International Conference on, pages 442–449. IEEE, 2013.

[59] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press,
Cambridge, second edition, 2013.

[60] Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A:
Statistical Mechanics and its Applications, 390(6):1150–1170, 2011.

[61] Kai-Yang Chiang, Nagarajan Natarajan, Ambuj Tewari, and Inderjit S Dhillon. Exploiting
longer cycles for link prediction in signed networks. In Proceedings of the 20th ACM
International conference on Information and knowledge management, pages 1157–1162.
ACM, 2011.

[62] Eric Gilbert and Karrie Karahalios. Predicting tie strength with social media. In Proceedings
of the SIGCHI conference on human factors in computing systems, pages 211–220. ACM,
2009.

[63] Rongjing Xiang, Jennifer Neville, and Monica Rogati. Modeling relationship strength in
online social networks. In Proceedings of the 19th International conference on World wide
web, pages 981–990. ACM, 2010.

[64] Indika Kahanda and Jennifer Neville. Using transactional information to predict link strength
in online social networks. In Third International AAAI Conference on Weblogs and Social
Media, 2009.

[65] Mohsen Shahriari and Mahdi Jalili. Ranking nodes in signed social networks. Social Network
Analysis and Mining, 4(1):172, 2014.

[66] Cristobald de Kerchove and Paul Van Dooren. The pagetrust algorithm: How to rank web
pages when negative links are allowed? In Proceedings of the 2008 SIAM International
Conference on Data Mining, pages 346–352. SIAM, 2008.

[67] Jiliang Tang, Xia Hu, and Huan Liu. Is distrust the negation of trust?: the value of distrust
in social media. In Proceedings of the 25th ACM conference on Hypertext and social media,
pages 148–157. ACM, 2014.

[68] Geoﬀrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with
neural networks. science, 313(5786):504–507, 2006.

[69] Yann Lecun, Yoshua Bengio, and Geoﬀrey Hinton. Deep learning. Nature, 521(7553):436–
444, 2015.

[70] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haﬀner. Gradient-based learning
applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[71] Brian D Ripley. Pattern recognition and neural networks. Cambridge university press, 2007.

[72] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics
of control, signals and systems, 2(4):303–314, 1989.

[73] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks
are universal approximators. Neural networks, 2(5):359–366, 1989.

[74] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community
preserving network embedding. In AAAI, pages 203–209, 2017.

[75] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In Advances in Neural Information Processing Systems (NIPS), pages 1025–1035,
2017.

[76] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S
Huang. Heterogeneous network embedding via deep architectures. In Proceedings of the
21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
pages 119–128. ACM, 2015.

[77] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social
representations. In Proceedings of the 20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 701–710. ACM, 2014.

[78] Aditya Grover, Aaron Zweig, and Stefano Ermon. Graphite: Iterative generative modeling
of graphs. In International Conference on Machine Learning, pages 2434–2444, 2019.

[79] Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and Jure Leskovec. Graphrnn:
Generating realistic graphs with deep auto-regressive models. In ICML, 2018.

[80] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep
generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.

[81] Thomas N Kipf and Max Welling. Variational graph auto-encoders. In NIPS Workshop on
Bayesian Deep Learning, 2016.

[82] Suhang Wang, Jiliang Tang, Charu Aggarwal, and Huan Liu. Linked document embed-
ding for classiﬁcation. In Proceedings of the 25th ACM International on Conference on
Information and Knowledge Management, pages 115–124. ACM, 2016.

[83] Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, and Qing Li. Deep adversarial
social recommendation. arXiv preprint arXiv:1905.13160, 2019.

[84] Tyler Derr, Hamid Karimi, Xiaorui Liu, Jiejun Xu, and Jiliang Tang. Deep adversarial
network alignment. arXiv preprint arXiv:1902.10307, 2019.

[85] Haochen Liu, Tyler Derr, Zitao Liu, and Jiliang Tang. Say what i want: Towards the dark
side of neural dialogue models. arXiv preprint arXiv:1909.06044, 2019.

[86] Hamid Karimi, Tyler Derr, and Jiliang Tang. Characterizing the decision boundary of deep
neural networks. arXiv preprint arXiv:1912.11460, 2019.

[87] Suhang Wang, Jiliang Tang, Charu Aggarwal, Yi Chang, and Huan Liu. Signed network
embedding in social media. In Proceedings of the 2017 SIAM International Conference on
Data Mining, pages 327–335. SIAM, 2017.

[88] Mohammad Raihanul Islam, B Aditya Prakash, and Naren Ramakrishnan. Signet: scalable
embeddings for signed networks. In Paciﬁc-Asia Conference on Knowledge Discovery and
Data Mining, pages 157–169. Springer, 2018.

[89] Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang. Side: Representation learning in
signed directed networks. In Proceedings of the 27th International Conference on World
Wide Web (The Web Conference), 2018.

[90] Phillip Bonacich. Power and centrality: A family of measures. American journal of sociology,
92(5):1170–1182, 1987.

[91] David Easley and Jon Kleinberg. Strong and weak ties. In Networks, Crowds, and Markets:
Reasoning about a Highly Connected World, pages 47–84, 2010.

[92] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation
ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

[93] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. Rectiﬁer nonlinearities improve
neural network acoustic models. In Proc. ICML, 2013.

[94] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980, 2014.

[95] Mark EJ Newman. Clustering and preferential attachment in growing networks. Physical
review E, 64(2):025102, 2001.

[96] Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang. Personalized ranking in signed net-
works using signed random walk with restart. In Data Mining (ICDM), 2016 IEEE 16th
International Conference on, pages 973–978. IEEE, 2016.

[97] Fan Chung and Linyuan Lu. The average distances in random graphs with given expected
degrees. Proceedings of the National Academy of Sciences, 99(25):15879–15882, 2002.

[98] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densiﬁcation
laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM
SIGKDD International conference on Knowledge discovery in data mining, pages 177–187.
ACM, 2005.

[99] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin
Ghahramani. Kronecker graphs: An approach to modeling networks. Journal of Machine
Learning Research, 11(Feb):985–1042, 2010.

[100] Stephen Mussmann, John Moore, Joseph J. Pfeiﬀer, III, and Jennifer Neville. Incorporating
assortativity and degree dependence into scalable network models. In Proceedings of the
Twenty-Ninth AAAI Conference on Artiﬁcial Intelligence, AAAI’15, pages 238–246. AAAI
Press, 2015.

[101] Mark EJ Newman. Assortative mixing in networks. Physical review letters, 89(20):208701,
2002.

[102] Joseph J Pfeiﬀer, Timothy La Fond, Sebastian Moreno, and Jennifer Neville. Fast generation
of large scale social networks while incorporating transitive closures. In Privacy, Security,
Risk and Trust (PASSAT), 2012 International Conference on and 2012 International
Conference on Social Computing (SocialCom), pages 154–165. IEEE, 2012.

[103] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks.
nature, 393(6684):440, 1998.

[104] Comandur Seshadhri, Tamara G Kolda, and Ali Pinar. Community structure and scale-free
collections of erdős-rényi graphs. Physical Review E, 85(5):056109, 2012.

[105] Petter Holme and Beom Jun Kim. Growing scale-free networks with tunable clustering.
Physical review E, 65(2):026107, 2002.

[106] Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. Community
structure in large networks: Natural cluster sizes and the absence of large well-deﬁned
clusters. Internet Mathematics, 6(1):29–123, 2009.

[107] Albert-Laszlo Barabasi and Zoltan N Oltvai. Network biology: understanding the cell’s
functional organization. Nature reviews genetics, 5(2):101, 2004.

[108] Erik Volz and Lauren Ancel Meyers. Susceptible–infected–recovered epidemics in dynamic
contact networks. Proceedings of the Royal Society B: Biological Sciences, 274(1628):2925–
2934, 2007.

[109] Mohammad Malekzadeh, M Fazli, P Jalali Khalilabadi, H Rabiee, and M Safari. Social
balance and signed network formation games. 2011.

[110] Jure Leskovec and Julian J Mcauley. Learning to discover social circles in ego networks. In
Advances in neural information processing systems, pages 539–547, 2012.

[111] Yanhua Li, Wei Chen, Yajun Wang, and Zhi-Li Zhang. Inﬂuence diﬀusion dynamics and
inﬂuence maximization in social networks with friend and foe relationships. In Proceedings
of the sixth ACM International conference on Web search and Data Mining, pages 657–666.
ACM, 2013.

[112] Ali Pinar, Comandur Seshadhri, and Tamara G Kolda. The similarity between stochastic
kronecker and chung-lu graph models. In Proceedings of the 2012 SIAM International
Conference on Data Mining, pages 1071–1082. SIAM, 2012.

[113] Thomas Schank and Dorothea Wagner. Finding, counting and listing all triangles in large
graphs, an experimental study. In WEA. Springer, 2005.

[114] Vida Vukašinović, Jurij Šilc, and Riste Škrekovski. Modeling acquaintance networks based
on balance theory. International Journal of Applied Mathematics and Computer Science,
24(3):683–696, 2014.

[115] Mark Ludwig and Peter Abell. An evolutionary model of social networks. European Physical
Journal B–Condensed Matter, 58(1), 2007.

[116] Tyler Derr and Jiliang Tang. Congressional vote analysis using signed networks. In 2018
IEEE International Conference on Data Mining Workshops (ICDMW), pages 1501–1502.
IEEE, 2018.

[117] Tyler Derr, Hamid Karimi, Aaron Brookhouse, and Jiliang Tang. Multi-factor congressional
vote prediction. In Advances in Social Networks Analysis and Mining (ASONAM), 2019
IEEE/ACM International Conference on. IEEE, 2019.

[118] Tyler Derr, Yao Ma, and Jiliang Tang. Signed graph convolutional networks. In 2018 IEEE
International Conference on Data Mining (ICDM), pages 929–934. IEEE, 2018.

[119] Phillip Bonacich and Paulette Lloyd. Calculating status with negative relations. Social
Networks, 26(4):331–338, 2004.

[120] Zhaoming Wu, Charu C Aggarwal, and Jimeng Sun. The troll-trust model for ranking in
signed networks. In Proceedings of the Ninth ACM International Conference on Web Search
and Data Mining, pages 447–456. ACM, 2016.

[121] Pranay Anchuri and Malik Magdon-Ismail. Communities and balance in signed networks:
A spectral approach. In Advances in Social Networks Analysis and Mining (ASONAM), 2012
IEEE/ACM International Conference on, pages 235–242. IEEE, 2012.

[122] Sinan G Aksoy, Tamara G Kolda, and Ali Pinar. Measuring and modeling bipartite graphs
with community structure. Journal of Complex Networks, 5(4):581–603, 2017.

[123] Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyuce, and Srikanta Tirthapura. Butterﬂy count-
ing in bipartite networks. In Proceedings of the 24th ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining, pages 2150–2159. ACM, 2018.

[124] Amin Javari and Mahdi Jalili. Cluster-based collaborative ﬁltering for sign prediction in
social networks with positive and negative links. ACM Transactions on Intelligent Systems
and Technology (TIST), 5(2):24, 2014.

[125] Tongda Zhang, Haomiao Jiang, Zhouxiao Bao, and Yingfeng Zhang. Characterization and
edge sign prediction in signed networks. Journal of Industrial and Intelligent Information
Vol, 1(1), 2013.

[126] Athanasios Papaoikonomou, Magdalini Kardara, Konstantinos Tserpes, and Dora Var-
varigou. Edge sign prediction in social networks via frequent subgraph discovery. IEEE
Internet Computing, 2014.

[127] Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, and Mohammed Zaki. Link prediction
using supervised learning. In SDM06: workshop on link analysis, counter-terrorism and
security, 2006.

[128] Zhengdong Lu, Berkant Savas, Wei Tang, and Inderjit S Dhillon. Supervised link prediction
using multiple sources. In Data Mining (ICDM), 2010 IEEE 10th International Conference
on, pages 923–928. IEEE, 2010.

[129] Aditya Krishna Menon and Charles Elkan. Link prediction via matrix factorization. In
Machine Learning and Knowledge Discovery in Databases, pages 437–452. Springer, 2011.

[130] Jiliang Tang, Charu Aggarwal, and Huan Liu. Recommendations in signed social networks.
In Proceedings of the 25th International Conference on World Wide Web, pages 31–40.
International World Wide Web Conferences Steering Committee, 2016.

[131] Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S Dhillon. Low rank modeling of signed
networks. In Proceedings of the 18th ACM SIGKDD International conference on Knowledge
discovery and Data Mining, pages 507–515. ACM, 2012.

[132] Rana Forsati, Iman Barjasteh, Farzan Masrour, Abdol-Hossein Esfahanian, and Hayder
Radha. Pushtrust: An eﬃcient recommendation algorithm by leveraging trust and distrust
relations. In Proceedings of the 9th ACM Conference on Recommender Systems, pages
51–58. ACM, 2015.

[133] Tyler Derr, Chenxing Wang, Suhang Wang, and Jiliang Tang. Relevance measurements in
online signed social networks. In Proceedings of the 14th International Workshop on Mining
and Learning with Graphs (MLG), 2018.

[134] László Lovász. Random walks on graphs: A survey. Combinatorics, Paul erdos is eighty,
2(1):1–46, 1993.

[135] Man Gao, Ling Chen, Bin Li, Yun Li, Wei Liu, and Yong-cheng Xu. Projection-based link
prediction in a bipartite network. Information Sciences, 376:158–171, 2017.

[136] Tao Zhou, Jie Ren, Matúš Medo, and Yi-Cheng Zhang. Bipartite network projection and
personal recommendation. Physical Review E, 76(4):046115, 2007.

[137] Katharina Anna Zweig and Michael Kaufmann. A systematic approach to the one-mode
projection of bipartite graphs. Social Network Analysis and Mining, 1(3):187–218, 2011.

[138] Muhammed A Yildirim and Michele Coscia. Using random walks to generate associations
between objects. PloS one, 9(8):e104813, 2014.

[139] Jie Tang, Tiancheng Lou, and Jon Kleinberg. Inferring social ties across heterogenous
networks. In Proceedings of the ﬁfth ACM International conference on Web search and Data
Mining, pages 743–752. ACM, 2012.

[140] Jihang Ye, Hong Cheng, Zhe Zhu, and Minghua Chen. Predicting positive and negative links
in signed social networks by transfer learning. In Proceedings of the 22nd International con-
ference on World Wide Web, pages 1477–1488. International World Wide Web Conferences
Steering Committee, 2013.

[141] Ahmed Hassan, Amjad Abu-Jbara, and Dragomir Radev. Detecting subgroups in online
discussions by modeling positive and negative relations among participants. In Proceedings
of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning, pages 59–70. Association for Computational
Linguistics, 2012.

[142] Kai-Yang Chiang, Joyce Jiyoung Whang, and Inderjit S Dhillon. Scalable clustering of
signed networks using balance normalized cut. In Proceedings of the 21st ACM International
conference on Information and knowledge management, pages 615–624. ACM, 2012.

[143] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural
networks for graphs. In International Conference on Machine Learning (ICML), pages 2014–
2023, 2016.

[144] Michaël Deﬀerrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural net-
works on graphs with fast localized spectral ﬁltering. In Advances in Neural Information
Processing Systems (NIPS), pages 3844–3852, 2016.

[145] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy
Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for
learning molecular ﬁngerprints. In Advances in Neural Information Processing Systems
(NIPS), pages 2224–2232, 2015.

[146] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. Spectral networks and
locally connected networks on graphs. In Proceedings of the International Conference on
Learning Representations (ICLR), 2014.

[147] Tyler Derr, Yao Ma, Wenqi Fan, Xiaorui Liu, Charu Aggarwal, and Jiliang Tang. Epidemic
graph convolutional network. In Proceedings of the 13th International Conference on Web
Search and Data Mining, pages 160–168, 2020.

[148] Wei Jin, Tyler Derr, Haochen Liu, Yiqi Wang, Suhang Wang, Zitao Liu, and Jiliang Tang.
Self-supervised learning on graphs: Deep insights and new direction. arXiv preprint
arXiv:2006.10141, 2020.

[149] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceed-
ings of the 22nd ACM SIGKDD International conference on Knowledge Discovery and Data
Mining, pages 1225–1234. ACM, 2016.

[150] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In
Proceedings of the 22nd ACM SIGKDD International conference on Knowledge Discovery
and Data Mining, pages 855–864. ACM, 2016.

[151] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with
global structural information. In Proceedings of the 24th ACM International on Conference
on Information and Knowledge Management (CIKM), pages 891–900. ACM, 2015.

[152] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community
preserving network embedding. In AAAI, pages 203–209, 2017.

[153] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto W
De Luca, and Sahin Albayrak. Spectral analysis of signed graphs for clustering, predic-
tion and visualization. In Proceedings of the 2010 SIAM International Conference on Data
Mining, pages 559–570. SIAM, 2010.

[154] Marek Cygan, Marcin Pilipczuk, Michał Pilipczuk, and Jakub Onufry Wojtaszczyk. Sitting
closer to friends than enemies, revisited. Theory of computing systems, 56(2):394–405,
2015.

[155] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A survey on network embedding. IEEE
Transactions on Knowledge and Data Engineering, 31(5):833–852, 2018.

[156] Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang. Side: Representation learning in
signed directed networks. In Proceedings of the 2018 World Wide Web Conference on World
Wide Web, pages 509–518. International World Wide Web Conferences Steering Committee,
2018.

[157] Yiqi Chen, Tieyun Qian, Huan Liu, and Ke Sun. Bridge: Enhanced signed directed network
embedding. In Proceedings of the 27th ACM International Conference on Information and
Knowledge Management, pages 773–782. ACM, 2018.

[158] Suhang Wang, Charu Aggarwal, Jiliang Tang, and Huan Liu. Attributed signed network
embedding. In Proceedings of the 2017 ACM on Conference on Information and Knowledge
Management, pages 137–146. ACM, 2017.

[159] Feng Xue, Xiangnan He, Xiang Wang, Jiandong Xu, Kai Liu, and Richang Hong. Deep item-
based collaborative ﬁltering for top-n recommendation. ACM Transactions on Information
Systems (TOIS), 37(3):33, 2019.

[160] William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs:
Methods and applications. arXiv preprint arXiv:1709.05584, 2017.

[161] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity
preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International
conference on Knowledge Discovery and Data Mining, pages 1105–1114. ACM, 2016.

[162] Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. Shine: signed
heterogeneous information network embedding for sentiment link prediction. In Proceedings
of the Eleventh ACM International Conference on Web Search and Data Mining, pages 592–
600. ACM, 2018.

[163] Shuhan Yuan, Xintao Wu, and Yang Xiang. Sne: signed network embedding. In Paciﬁc-Asia
conference on knowledge Discovery and Data Mining, pages 183–195. Springer, 2017.

[164] Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang. Personalized ranking in signed networks
using signed random walk with restart. In 2016 IEEE 16th International Conference on Data
Mining (ICDM), pages 973–978. IEEE, 2016.

[165] Tyler Derr, Cassidy Johnson, Yi Chang, and Jiliang Tang. Balance in signed bipartite
networks. In Proceedings of the 28th ACM International Conference on Information and
Knowledge Management, pages 1221–1230. ACM, 2019.

[166] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu.
A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.

[167] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based recommender system:
A survey and new perspectives. ACM Computing Surveys (CSUR), 52(1):5, 2019.

[168] Moira Burke and Robert Kraut. Mopping up: modeling wikipedia promotion decisions. In
Proceedings of the 2008 ACM conference on Computer supported cooperative work, pages
27–36, 2008.

[169] Cliﬀ AC Lampe, Erik Johnston, and Paul Resnick. Follow the reader: ﬁltering comments on
slashdot. In Proceedings of the SIGCHI conference on Human factors in computing systems,
pages 1253–1262, 2007.

[170] Alec Kirkley, George T Cantwell, and MEJ Newman. Balance in signed networks. Physical
Review E, 99(1):012320, 2019.

[171] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of inﬂuence through
a social network. In Proceedings of the ninth ACM SIGKDD International conference on
Knowledge discovery and Data Mining, pages 137–146. ACM, 2003.

[172] Jiliang Tang, Xia Hu, and Huan Liu. Social recommendation: a review. Social Network
Analysis and Mining, 3(4):1113–1133, 2013.

[173] Wei Li, Pengyi Fan, Pei Li, Hui Wang, and Yiguang Pan. An opinion spreading model in
signed networks. Modern Physics Letters B, 27(12), 2013.

[174] Yanhua Li, Wei Chen, Yajun Wang, and Zhi-Li Zhang. Voter model on signed social
networks. Internet Mathematics, 2014.

[175] Rana Forsati, Mehrdad Mahdavi, Mehrnoush Shamsfard, and Mohamed Sarwat. Matrix
factorization with explicit trust and distrust side information for improved social recommen-
dation. ACM Transactions on Information Systems (TOIS), 32(4):17, 2014.

[176] Samaneh Moghaddam, Mohsen Jamali, and Martin Ester. Etf: extended tensor factorization
model for personalizing prediction of review helpfulness. In Proceedings of the ﬁfth ACM
International conference on Web search and data mining, pages 163–172. ACM, 2012.

[177] Samaneh Moghaddam, Mohsen Jamali, and Martin Ester. Review recommendation: personalized
prediction of the quality of online reviews. In Proceedings of the 20th ACM
International conference on Information and knowledge management, pages 2249–2252.
ACM, 2011.

[178] Jiliang Tang, Huiji Gao, Xia Hu, and Huan Liu. Context-aware review helpfulness rating
prediction. In Proceedings of the 7th ACM conference on Recommender systems, pages 1–8.
ACM, 2013.

[179] Suhang Wang, Jiliang Tang, and Huan Liu. Toward dual roles of users in recommender
systems. In Proceedings of the 24th ACM International on Conference on Information and
Knowledge Management, pages 1651–1660. ACM, 2015.

[180] Panagiotis Symeonidis and Nikolaos Mantas. Spectral clustering for link prediction in social
networks with positive and negative links. Social Network Analysis and Mining, 3(4):1433–
1447, 2013.

[181] Kai-Yang Chiang, Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S Dhillon, and Ambuj
Tewari. Prediction and clustering in signed networks: a local to global perspective. Journal
of Machine Learning Research, 15(1):1177–1213, 2014.

[182] Nenad Trinajstic. Chemical graph theory. Routledge, 2018.

[183] Jeﬀrey M Dambacher, Hiram W Li, and Philippe A Rossignol. Relevance of community
structure in assessing indeterminacy of ecological predictions. Ecology, 83(5):1372–1385,
2002.

[184] G Toulouse. Theory of the frustration eﬀect in spin glasses: I. Spin Glass Theory and
Beyond: An Introduction to the Replica Method and Its Applications, 9:99, 1987.

[185] Zachary P Neal. A sign of the times? weak and strong polarization in the us congress,
1973–2016. Social Networks, 2018.

[186] Zhiwei Wang, Tyler Derr, Dawei Yin, and Jiliang Tang. Understanding and predicting weight
loss with mobile social networking data. In Proceedings of the 2017 ACM on Conference on
Information and Knowledge Management, pages 1269–1278. ACM, 2017.

[187] Hamid Karimi, Tyler Derr, Kaitlin T Torphy, Kenneth A Frank, and Jiliang Tang. Towards
improving sample representativeness of teachers on online social media: A case study on
pinterest. In International Conference on Artiﬁcial Intelligence in Education, pages 130–134.
Springer, 2020.

[188] Hamid Karimi, Kaitlin T Torphy, Tyler Derr, Kenneth A Frank, and Jiliang Tang. Character-
izing teacher connections in online social media: A case study on pinterest. In Proceedings
of the Seventh ACM Conference on Learning@ Scale, pages 249–252, 2020.

[189] Priyanka Agrawal, Vikas K Garg, and Ramasuri Narayanam. Link label prediction in signed
social networks. In Proceedings of the Twenty-Third International joint conference on
Artiﬁcial Intelligence, pages 2591–2597. AAAI Press, 2013.

[190] Xiaoyuan Su and Taghi M Khoshgoftaar. A survey of collaborative ﬁltering techniques.
Advances in artiﬁcial intelligence, 2009:4, 2009.

[191] Takahiko Ito, Masashi Shimbo, Taku Kudo, and Yuji Matsumoto. Application of kernels
to link analysis. In Proceedings of the eleventh ACM SIGKDD International conference on
Knowledge discovery in data mining, pages 586–592. ACM, 2005.

[192] Simon Jackman. Multidimensional analysis of roll call data via bayesian simulation: Iden-
tiﬁcation, estimation, inference, and model checking. Political Analysis, 9(3):227–241,
2001.

[193] Peter Kraft, Hirsh Jain, and Alexander M Rush. An embedding model for predicting roll-call
votes. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language
Processing, pages 2066–2070, 2016.

[194] Nolan McCarty, Keith T Poole, and Howard Rosenthal. Polarized America: The dance of
ideology and unequal riches. 2016.

[195] Joshua Clinton, Simon Jackman, and Douglas Rivers. The statistical analysis of roll call
data. American Political Science Review, 98(2):355–370, 2004.

[196] Mason A Porter, Peter J Mucha, Mark EJ Newman, and Casey M Warmbrand. A network
analysis of committees in the us house of representatives. Proceedings of the National
Academy of Sciences, 102(20):7057–7062, 2005.

[197] Carlo Dal Maso, Gabriele Pompa, Michelangelo Puliga, Gianni Riotta, and Alessandro
Chessa. Voting behavior, coalitions and government strength through a complex network
analysis. PloS one, 9(12):e116046, 2014.

[198] Sean Gerrish and David M Blei. Predicting legislative roll calls from text. In International
Conference on Machine Learning, pages 489–496, 2011.

[199] In Song Kim, John Londregan, and Marc Ratkovic. Voting, speechmaking, and the di-
mensions of conﬂict in the us senate. In Annual Meeting of the Midwest Political Science
Association, 2014.

[200] Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In
International Conference on Machine Learning, pages 1188–1196, 2014.

[201] Jey Han Lau and Timothy Baldwin. An empirical evaluation of doc2vec with practical
insights into document embedding generation. arXiv preprint arXiv:1607.05368, 2016.

[202] Hamid Karimi, Courtland VanDam, Liyang Ye, and Jiliang Tang. End-to-end compromised
account detection. In International Conference on Advances in Social Networks Analysis
and Mining, pages 314–321. IEEE, 2018.

[203] Clio Andris, David Lee, Marcus J Hamilton, Mauro Martino, Christian E Gunning, and
John Armistead Selden. The rise of partisanship and super-cooperators in the US House of
Representatives. PLoS ONE, 10(4):e0123507, 2015.

[204] Tyler Derr, Charu Aggarwal, and Jiliang Tang. Signed network modeling based on structural
balance theory. In Proceedings of the 27th ACM International Conference on Information
and Knowledge Management, pages 557–566. ACM, 2018.

[205] Tyler Derr, Zhiwei Wang, and Jiliang Tang. Opinions power opinions: Joint link and
interaction polarity predictions in signed networks. In 2018 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining (ASONAM), pages 363–366. IEEE,
2018.

[206] Tyler Derr, Yao Ma, and Jiliang Tang. Signed graph convolutional networks. In 2018 IEEE
International Conference on Data Mining (ICDM), pages 929–934. IEEE, 2018.

[207] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[208] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural networks,
61:85–117, 2015.

[209] L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees.
The Wadsworth and Brooks-Cole statistics-probability series. Taylor & Francis, 1984.

[210] Rick K Wilson and Cheryl D Young. Cosponsorship in the US Congress. Legislative Studies
Quarterly, pages 25–43, 1997.

[211] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann.
NetGAN: Generating graphs via random walks. In International Conference on Machine
Learning, pages 610–619, 2018.

[212] Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Will Hamilton, David K Duvenaud,
Raquel Urtasun, and Richard Zemel. Efficient graph generation with graph recurrent attention
networks. In Advances in Neural Information Processing Systems, pages 4255–4265, 2019.

[213] Sohil Atul Shah and Vladlen Koltun. Auto-decoding graphs. arXiv preprint
arXiv:2006.02879, 2020.

[214] Xiaojie Guo and Liang Zhao. A systematic survey on deep generative models for graph
generation. arXiv preprint arXiv:2007.06686, 2020.

[215] Yao Ma, Suhang Wang, Tyler Derr, Lingfei Wu, and Jiliang Tang. Attacking graph
convolutional networks via rewiring. arXiv preprint arXiv:1906.03750, 2019.

[216] Wei Jin, Yaxin Li, Han Xu, Yiqi Wang, and Jiliang Tang. Adversarial attacks and defenses
on graphs: A review and empirical study. arXiv preprint arXiv:2003.00653, 2020.

[217] Haewoon Kwak, Hyunwoo Chun, and Sue Moon. Fragile online relationship: a first look at
unfollow dynamics in Twitter. In Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems, pages 1091–1100, 2011.

[218] Betty van Aken, Julian Risch, Ralf Krestel, and Alexander Löser. Challenges for toxic
comment classification: An in-depth error analysis. arXiv preprint arXiv:1809.07572, 2018.
