FAIRNESS IN SOCIAL NETWORK ANALYSIS: MEASURES AND ALGORITHMS

By Farzan Masrour Shalmani

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science – Doctor of Philosophy

2021

ABSTRACT

FAIRNESS IN SOCIAL NETWORK ANALYSIS: MEASURES AND ALGORITHMS

By Farzan Masrour Shalmani

The use of machine learning in human subject-related tasks has resulted in growing concerns about the inherent biases within such automated decision-making algorithms. In response to these concerns, we are witnessing a growing body of literature that focuses on designing fairness-aware machine learning algorithms. However, current fairness research is mostly limited to non-relational, independent and identically distributed (i.i.d.) data. To overcome this limitation, this thesis aims to develop fairness measures and algorithms for analyzing social networks, which are an important class of relational data. In particular, this work investigates the challenges of ensuring fairness in link prediction, node classification, and network sampling, three important network analysis tasks. First, we develop a novel fairness-aware link prediction framework that combines adversarial network representation learning with supervised link prediction based on the network modularity measure. We show that this approach promotes more diverse links and addresses the filter bubble problem in social networks. Second, we investigate the node classification problem from a fairness perspective. We introduce a novel yet intuitive measure known as fairness perception and provide an axiomatic approach to analyze its properties. A fairness-aware classification algorithm is developed to balance the trade-off between maximizing accuracy and minimizing the perception of bias in the classification decisions. Using a graph-theoretic framework, we present a theoretical bound on the gap between the true positive rates for different groups of individuals when fairness perception is maximized. Finally, we investigate the network sampling problem from a fairness perspective. Specifically, we propose a novel fairness-aware network sampling framework that combines the structural preservability and group representativity objectives into a unified framework. We also present a fair greedy sampling algorithm with bounded approximation guarantees.

Copyright by FARZAN MASROUR SHALMANI 2021

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
CHAPTER 1 INTRODUCTION
1.1 Fairness in Network Analysis
1.2 Thesis Contributions
1.2.1 Metrics for Assessing Fairness in Network Data
1.2.2 Fairness-Aware Network Mining Tasks
1.2.2.1 Fairness-Aware Link Prediction
1.2.2.2 Fairness-Aware Node Classification
1.2.2.3 Fairness-Aware Network Sampling
1.3 Outline
1.4 Bibliographic Notes
CHAPTER 2 LITERATURE REVIEW
2.1 Networks
2.1.1 Basic Concepts and Definitions
2.1.2 Random Walk
2.1.3 Network Measures and Metrics
2.1.3.1 Network Centrality
2.1.3.2 Homophily and Assortative Mixing
2.2 Network Mining Tasks
2.2.1 Node Oriented Tasks
2.2.1.1 Classification
2.2.1.2 Clustering and Community Detection
2.2.1.3 Anomaly Detection
2.2.2 Link Oriented Tasks
2.2.2.1 Link prediction
2.2.2.2 Network Alignment
2.2.3 Representation Learning
2.3 Fairness in Machine Learning
CHAPTER 3 LINK PREDICTION: FAIRNESS-AWARE PERSPECTIVE
3.1 Introduction
3.2 Related Work
3.3 Fairness for Network Data
3.3.1 Dyadic-level Fairness
3.3.2 Network Modularity
3.3.3 Greedy Post-Processing
3.4 Adversarial Learning for Fair Link Prediction
3.4.1 Discriminator
3.4.2 Generator
3.4.3 Link prediction
3.5 Experimental Evaluation
3.5.1 Experiment Setup
3.5.1.1 Datasets
3.5.1.2 Baseline Algorithms
3.5.1.3 Sampling process and training
3.5.1.4 Evaluation Metric
3.5.2 Experimental Results
3.5.2.1 General Performance
3.6 Conclusions
CHAPTER 4 FAIRNESS PERCEPTION FROM A NETWORK-CENTRIC PERSPECTIVE
4.1 Introduction
4.2 Quantifying Fairness Perception
4.2.1 Axioms for Fairness Perception
4.2.2 Proposed Network-Centric Fairness Perception
4.3 Fairness Visibility
4.4 Application to Peer Review Networks
4.4.1 Data
4.4.2 Fairness Perception
4.4.3 Fairness Visibility
4.5 Conclusion
CHAPTER 5 MITIGATING PERCEPTION OF BIAS IN PEER-REVIEW DECISION MAKING
5.1 Introduction
5.2 Related Work
5.3 Preliminaries
5.3.1 Problem Statement
5.3.2 Fairness Measures
5.4 Methodology
5.4.1 Proposed Expected Function
5.4.2 Maximizing Fairness Perception
5.4.3 FPRank: Fair Peer Review Recommendation
5.5 Experimental Evaluation
5.5.1 Experimental Setup
5.5.1.1 Data
5.5.1.2 Baseline Algorithms
5.5.1.3 Evaluation Metrics
5.5.2 Experimental Results
5.6 Conclusions
CHAPTER 6 FAIR GRAPH SAMPLING
6.1 Introduction
6.2 Related Work
6.3 Preliminaries
6.3.1 Network Sampling as Subset Selection Problem
6.4 Proposed Sampling Framework
6.4.1 Node-level Sampling Measure
6.4.2 Fairness-aware Network Sampling Objective
6.5 Proposed Greedy Fair Network Sampling Algorithm
6.5.1 Greedy Algorithm for Max-min Subgraph Fairness
6.5.2 Theoretical Bounds on Greedy Approximation
6.6 Experimental Evaluation
6.6.1 Experiment Setup
6.6.1.1 Datasets
6.6.1.2 Evaluation Metric
6.6.1.3 Baseline Sampling Algorithms
6.6.2 Experimental Results
6.6.2.1 General Performance
6.6.2.2 Performance on Target Nodes
6.7 Conclusion
CHAPTER 7 FUTURE WORK
7.1 Network Analysis
7.2 Fairness and Explainability in Machine Learning
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1: The real social network data sets statistics
Table 3.2: Proximity based link prediction algorithms. For each node v, N(v) is the set of its immediate neighbors.
Table 3.3: Performance comparison of various link prediction algorithms on 3 real-world network datasets.
Table 4.1: Summary distribution of acceptable and accepted papers for the ICLR 2020 conference.
Table 5.1: Summary statistics of submitted and accepted papers for the ICLR conference from 2017 to 2020.
Table 5.2: Average precision of various methods compared to the conference decisions (using top institutions as protected attribute).
Table 5.3: Average precision of the various methods compared to the conference decisions (using famous authors as protected attribute).
Table 5.4: Comparison of average fairness perception (using top institutions as protected attribute)
Table 5.5: Comparison of average fairness perception (using famous authors as protected attribute)
Table 5.6: Comparison of weighted statistical disparity (using top institutions as protected attribute)
Table 5.7: Comparison of weighted statistical disparity (using famous authors as protected attribute)
Table 6.1: Statistics of network data used for experiments.
Table 6.2: German Credit Dataset
Table 6.3: Facebook Dataset
Table 6.4: Credit Dataset
Table 6.5: Tagged Dataset
Table 6.6: The average node-level harmonic measure for the German credit network

LIST OF FIGURES

Figure 3.1: Snapshots of friendship relations among students at a Dutch school taken in 2003 and 2004, along with their modularity values. The node color represents the student's gender. Darker dashed lines correspond to links between students of different gender, while the solid ones correspond to links between students of the same gender.
Figure 3.2: FLIP architecture
Figure 4.1: An example illustration of fairness perception.
Figure 4.2: The average rating distribution of submitted papers. The red line indicates the threshold used for classifying papers as acceptable (y = 1) or unacceptable (y = 0).
Figure 4.3: Degree distribution of nodes in the peer-review network.
Figure 4.4: Assessment of network-centric fairness perception
Figure 4.5: Comparison of the expected decision outcome for rejected papers by famous and non-famous authors.
Figure 4.6: Effect of neighborhood size on fairness visibility.
Figure 6.1: An example illustration of different sampling objectives.
Figure 6.2: The proposed Harmonic measure values for the samples with 10 target nodes.

LIST OF ALGORITHMS

Algorithm 1: Greedy Fair Network Sampling (GFNS)

CHAPTER 1 INTRODUCTION

The transformative impact of machine learning on our daily lives is undeniable. The integration of machine learning algorithms into decision making in sectors such as healthcare [175, 18], banking [103], and the criminal justice system [15, 62] is becoming increasingly popular. Nevertheless, it has also resulted in growing concerns regarding the potential implicit bias in the decision outcomes of these algorithms [11]. In particular, the output of these algorithms may discriminate against particular subgroups in the population, as identified by their protected attributes such as gender, race, or sexual orientation. For instance, a previous study [11] suggested that the risk assessment software used in courtrooms in Florida discriminates against African Americans. The study found that black defendants were nearly twice as likely to be misclassified as higher risk for recidivism than their white counterparts. In addition, discrimination in machine learning decisions has been reported in other settings such as hiring [42], credit scoring [2], and medical diagnosis [101]. There have been concerted efforts to address these concerns in recent years. However, despite the increasing body of literature on quantifying fairness and developing fair machine learning algorithms, existing studies are mostly limited to non-relational data, which assume that the data instances are independent and identically distributed (i.i.d.). Relational data violate this assumption of independence, as individual data instances can have direct relations with one another [51]. For instance, there is a correlation between the underlying content or topics of two papers in a citation network if one cites the other. An important class of relational data is network data. Networks are powerful tools for modeling interactions between entities in a complex system. Examples include the Internet as a physical network of interconnected computing devices, social media platforms such as Facebook for human communication, and gene regulatory networks in biochemical systems.
Analyzing the properties of these networks may provide useful insights into the underlying mechanisms governing the behavior of the complex system. In particular, substantial progress has been made to address fundamental questions such as: How are links established in a network? How are communities formed and sustained over time? To provide answers to these and many other network-related questions, innovative algorithms have been developed to mine the rich and rapidly growing repositories of network data. This thesis focuses on developing fairness measures and algorithms for social networks. The key research question of this thesis is as follows: How can we develop a fairness-aware network analysis framework? In other words, how can we detect and mitigate bias in algorithmic decisions on network data? In the rest of this chapter, I will first describe this challenge in more detail and explain why it is an important problem. I will then discuss the main contributions of this work and provide an outline of the rest of the thesis.

1.1 Fairness in Network Analysis

Social network data have proliferated over the past decade. By 2016, 80% of all mobile owners in the US had switched to smartphones, a 38% increase compared to just five years before (https://www.freshbooks.com/blog/smartphone-revolution-small-businesses). This means more online users, more online purchases, and more social networking apps. Many of these apps employ machine learning algorithms, including network-based models, to provide a variety of services to their customers. However, recent studies [10, 99] have shown that these algorithms can be prone to biases. For example, Ali et al. [10] showed that the ad targeting algorithms behind online digital advertising platforms such as Facebook have significant bias in their advertisement delivery along gender and racial lines. In order to address biases and discriminatory decisions in machine learning, the following questions need to be answered: (1) How can we objectively quantify whether an algorithmic decision is fair or unfair? (2) How can we design algorithms that mitigate unwarranted biases and prevent discrimination against individuals or underrepresented groups? For i.i.d. data, researchers have proposed several definitions of fairness that generally compare an algorithm's performance on different subgroups of the population. This allows us to answer questions such as "Is the minority group more likely to be misclassified than the majority group?" by comparing the disparity in false positive or false negative rates of classification models across different groups. After quantifying the fairness measure, the next step is to design algorithms that account for the fairness measure along with other performance criteria (e.g., model accuracy). While the same approach can be applied to relational data, answering these questions requires more careful consideration since the link structure often embeds information about the protected attribute. For example, individuals in a social network have a tendency to form ties with other individuals of the same gender, race, age group, educational background, etc., an effect known as the homophily principle in social network analysis.
This may introduce systematic bias in the network data, which, in turn, affects the network analysis results. For example, Karimi et al. [83] have shown how homophily can influence the ranking of minorities in real-world networks by restricting their ability to establish links with members of the majority group. Previous research has also suggested that many fairness metrics are incompatible with each other [92]. To illustrate the difficulty of this problem, consider the well-documented debate on COMPAS, risk assessment software for predicting recidivism among offenders [173]. ProPublica investigative journalists claimed that the COMPAS algorithm is racially biased due to the significant disparity in false positive and false negative rates between black and white offenders [11]. In response to this analysis [47], Northpointe asserted that their COMPAS software is indeed fair and that ProPublica's analysis had ignored the fairness criteria used by their software. Specifically, they showed that the risk score provided by COMPAS did not discriminate against blacks since the likelihood of recidivism predicted by the software is the same regardless of the race of the offenders. Indeed, Kleinberg et al. [92] showed that it would be impossible to satisfy these diverse fairness criteria simultaneously. The main takeaway here is that the notion of fairness should be application and context dependent. Thus, one should focus on applying the notion of fairness that makes the most sense for the problem domain at hand. This argument can be extended to social network data: since they violate the i.i.d. assumption, one should consider defining fairness measures that take into account the link structure of the network.

1.2 Thesis Contributions

This thesis aims to expand the study of fairness to the mining of network data. The network mining tasks to be investigated in this study include link prediction, node classification, and network sampling. Specifically, I will introduce novel fairness metrics for network data and develop learning algorithms that consider the trade-off between fairness and the utility of the models. The challenges and contributions of this dissertation are summarized in the remainder of this section.

1.2.1 Metrics for Assessing Fairness in Network Data

As previously mentioned, the choice of fairness criteria should depend on the problem at hand. In this dissertation, we introduce three novel fairness metrics depending on the application and context of the network analysis task, i.e., whether it is at the node level, link level, or subgraph level. The node-level fairness measure is applicable to node classification tasks, while the link-level fairness measure can be applied to link prediction problems. The subgraph-level measure, on the other hand, can be utilized for the network sampling problem. Chapter 3 presents a new link-based network fairness metric, which is an adaptation of the well-known network modularity measure. The modularity of a network is computed by comparing its link density against that of a random network with a similar degree distribution. The measure is extended to account for the homogeneity of the node pairs that form the links in the network. Specifically, a network with a large proportion of its links between nodes with the same protected attribute value will have a high modularity value, while one with more diverse links will have a lower modularity value.
This measure can be utilized to promote more heterogeneous (diverse) links, and thus prevent reinforcing segregation in link-based recommender systems.

Chapter 4 introduces the proposed node-level metric known as network-centric fairness perception. The idea here is that an individual's perception of fairness should depend on his/her expectation of the decision outcome. For example, if the individual expects a positive outcome and indeed receives a favorable decision from the algorithm, the individual will likely perceive the decision to be fair. However, if a positive decision is expected yet a negative outcome is received, the individual will likely perceive the decision as unfair. The question is, how do we determine the expected outcome of an individual? In social comparison theory, the expected outcome depends on the members of the reference group with whom the individual identifies. For network data, the reference group that helps shape an individual's expectation is defined by the node's local neighborhood. For example, a node expects a positive outcome if all of its neighbors also receive a positive decision outcome. The outcomes of neighboring nodes thus help to shape the expected outcome, which in turn can be used to compute the fairness perception of an individual node.

Chapter 6 introduces the proposed subgraph-level fairness metric known as max-min subgraph fairness. It is based on the idea that the fairness of a subgraph should be measured in terms of its worst-case quality across all subgroups of the protected attribute. The metric is inspired by the idea proposed in [118] for classification of i.i.d. data, which evaluates group fairness as a minimax problem instead of the standard approach of measuring disparity in the outcomes for different groups. In this thesis, the proposed max-min subgraph fairness measure is used for the network sampling problem.

1.2.2 Fairness-Aware Network Mining Tasks

After defining network fairness measures, the next challenge is to introduce fairness into network analysis tasks without significantly sacrificing accuracy. In this section we discuss the main findings concerning this challenge.

1.2.2.1 Fairness-Aware Link Prediction

Homophily, which is the tendency of individuals to form relations with others similar to them, is an essential characteristic of many social networks. Homophily can give rise to network community structure with potentially discriminative consequences in some situations. For example, consider professional networking sites. Certain professions, such as software engineering, are male-dominated and intensely segregated. This phenomenon, which is reflected in the network's community structure, can adversely affect the employment opportunities of talented female engineers and other minorities. Homophily can be quantified using the well-known network modularity (or assortative mixing) measure. As we discussed above, due to the homophily principle, current link prediction algorithms are susceptible to promoting links that may increase the network's segregation. To mitigate this problem, we propose two algorithmic solutions in Chapter 3. The first algorithm is a greedy postprocessing approach that utilizes the proposed fairness criterion based on the modularity measure. This algorithm can be applied to the output of any link prediction algorithm, and a hyper-parameter can specify the relative importance of the accuracy and fairness measures.
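As a rough illustration of this post-processing idea (and not the exact procedure developed in Chapter 3), the sketch below takes candidate links already scored by an arbitrary base predictor and greedily selects links by trading off the predictor score against the reduction in modularity of the partition induced by a protected attribute. The trade-off weight `alpha`, the attribute name, and the helper functions are hypothetical choices made for illustration.

```python
# A minimal sketch of fairness-aware post-processing for link prediction,
# assuming every node carries a protected attribute (here called "gender").
import networkx as nx
from networkx.algorithms.community import modularity


def attribute_modularity(G, attr="gender"):
    """Modularity of the partition induced by a protected node attribute."""
    groups = {}
    for v, data in G.nodes(data=True):
        groups.setdefault(data[attr], set()).add(v)
    return modularity(G, list(groups.values()))


def greedy_fair_selection(G, scored_pairs, k, alpha=0.5, attr="gender"):
    """Pick k candidate links balancing predictor score and modularity reduction.

    scored_pairs: dict mapping a candidate pair (u, v) to its predictor score.
    alpha: weight on accuracy; (1 - alpha) weights the fairness term.
    """
    selected, H = [], G.copy()
    for _ in range(k):
        base_q = attribute_modularity(H, attr)
        best_pair, best_gain = None, float("-inf")
        for (u, v), score in scored_pairs.items():
            if (u, v) in selected or H.has_edge(u, v):
                continue
            H.add_edge(u, v)
            modred = base_q - attribute_modularity(H, attr)  # drop in modularity
            H.remove_edge(u, v)
            gain = alpha * score + (1 - alpha) * modred
            if gain > best_gain:
                best_pair, best_gain = (u, v), gain
        if best_pair is None:
            break
        selected.append(best_pair)
        H.add_edge(*best_pair)
    return selected
```

Links that connect different protected groups tend to reduce the attribute modularity, so under this scheme they receive a boost proportional to 1 − alpha.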
The second algorithm is a novel framework that combines adversarial network representation learning with supervised link prediction. The architecture of the algorithm consists of three components: a generator that learns node representations, a discriminator that tries to predict link types, and a link prediction component for new link inference. Experimental results on several real-world datasets show the effectiveness of the proposed methods in reducing the predicted network's modularity without significantly degrading prediction accuracy.

1.2.2.2 Fairness-Aware Node Classification

The network-centric fairness perception measure described previously can be used as a criterion for assessing fairness in node classification decisions. In Chapter 4, we present a case study on a peer-review network to show how fairness perception can be exploited to mislead individuals into perceiving unfair algorithmic decisions as fair. In Chapter 5, we investigate this issue theoretically. The main theoretical result of this work is an upper bound on true positive rate disparity, i.e., the gap between the true positive rates for two distinct groups of the protected attribute, for a classifier that maximizes fairness perception. This upper bound is a function of the network structure and the distribution of nodes belonging to the protected groups in the network. When the upper bound is small, we show that maximizing fairness perception has a linear solution. In this case, fairness perception will not only ensure that individuals are satisfied with the classification decision but also guarantee fairness in terms of true positive rate parity. When the upper bound is large, we propose a novel multi-objective optimization algorithm to achieve a compromise between maximizing fairness perception and minimizing the true positive rate disparity.

1.2.2.3 Fairness-Aware Network Sampling

Network sampling aims to achieve the following two objectives: (1) to preserve specific properties of the original network, an objective we refer to as structural preservability; and (2) to obtain a representative subset of nodes, an objective we refer to as group representativity. A fair sample can be defined as a sample that satisfies both structural preservability and group representativity. Chapter 6 shows how the proposed max-min subgraph fairness measure can be used as a unifying framework that combines structural preservability and group representativity. A greedy algorithm is proposed to generate a representative and preservative subgraph given a target set of nodes. The algorithm utilizes a preservability measure based on the harmonic mean node centrality measure. An approximation guarantee for the output of the proposed greedy algorithm, based on submodularity and curvature ratios, is also presented. The proposed sampling strategy can also be employed during the mini-batch training process of a graph convolutional network (GCN) to generate embeddings of similar quality for each subgroup of the population.
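As a simplified, hypothetical illustration of the max-min idea (this is not the GFNS algorithm of Chapter 6, nor its exact preservability measure), the sketch below grows a sample around a set of target nodes, at each step adding the node that maximizes the worst group's average harmonic centrality within the sampled subgraph. The node attribute name `group` is assumed.

```python
# A simplified sketch of greedy max-min fair sampling; protected group labels
# are assumed to be stored in a hypothetical node attribute named "group".
import networkx as nx


def worst_group_quality(G, sample, attr="group"):
    """Minimum, over protected groups, of the average harmonic centrality
    of the group's members inside the sampled subgraph."""
    groups = {G.nodes[v][attr] for v in G}
    cent = nx.harmonic_centrality(G.subgraph(sample))
    scores = {g: [] for g in groups}
    for v in sample:
        scores[G.nodes[v][attr]].append(cent[v])
    # A group with no sampled members gets the worst possible score (0).
    return min(sum(s) / len(s) if s else 0.0 for s in scores.values())


def greedy_fair_sample(G, targets, budget, attr="group"):
    """Grow the sample one node at a time, maximizing the worst-group quality."""
    sample = set(targets)
    while len(sample) < budget:
        best_v, best_q = None, float("-inf")
        for v in set(G.nodes) - sample:
            q = worst_group_quality(G, sample | {v}, attr)
            if q > best_q:
                best_v, best_q = v, q
        sample.add(best_v)
    return sample
```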
1.3 Outline

The remainder of this thesis is organized as follows. Chapter 2 discusses the related work and provides a formal overview of the material related to this thesis. Chapter 3 turns to the main contributions of the thesis and discusses the filter bubble problem from a fairness perspective. Chapters 4 and 5 are related to the fairness perception problem: Chapter 4 focuses on introducing the notion of fairness perception and its properties, while Chapter 5 presents the proposed algorithm to mitigate the perception of bias. Chapter 6 investigates the fairness-aware network sampling problem. Finally, Chapter 7 discusses plans for future research directions in network analysis and fairness-aware machine learning.

1.4 Bibliographic Notes

The materials in this thesis are based on the following publications:

1. Mitigating Perception of Bias in Peer-Review Decision Making. F. Masrour, P. Tan, A. Esfahanian, in Proceedings of the IEEE International Conference on Data Mining (under review).
2. Algorithmic Fairness Perception: A Network-Centric Perspective. F. Masrour, P. Tan, A. Esfahanian, in Proceedings of the IEEE International Conference on Data Mining (ICDM), 2020.
3. Bursting the Filter Bubble: Fairness-Aware Network Link Prediction. F. Masrour, T. Wilson, H. Yan, P. Tan, A. Esfahanian, in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2020.

During my research at Michigan State University, I collaborated with several colleagues on multiple deep learning and optimization research projects with applications in network analysis. The following are my additional publications related to this research:

1. OPTANE: An OPtimal Transport Algorithm for NEtwork Alignment. F. Masrour, P. Tan, A. Esfahanian, in International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2019.
2. You have been CAUTE! Early Detection of Compromised Accounts on Social Media. C. VanDam, F. Masrour, P. Tan, T. Wilson, in International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2019.
3. Attributed Network Representation Learning Approaches for Link Prediction. F. Masrour, P. Tan, A. Esfahanian, C. VanDam, in International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2018.
4. Network Completion with Node Similarity: A Matrix Completion Approach with Provable Guarantees. F. Masrour, R. Forsati, I. Barjesteh, in International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2015.
5. PushTrust: Efficient Recommendation by Leveraging Trust and Distrust Relations. R. Forsati, I. Barjesteh, F. Masrour, in ACM Conference on Recommender Systems (RecSys), 2015.
6. Cold-Start Item and User Recommendation with Decoupled Completion and Transduction. I. Barjesteh, R. Forsati, F. Masrour, in ACM Conference on Recommender Systems (RecSys), 2015.

CHAPTER 2 LITERATURE REVIEW

This chapter has two goals: first, to provide a formal overview of the material related to this work; second, to give a comprehensive and critical synthesis of the state-of-the-art work related to network mining. In particular, we first introduce a formal definition of a network and then discuss the main network mining tasks.

2.1 Networks

In this section, we first introduce the basic concepts and definitions in the field. We then discuss some quantitative properties and theoretical ideas from graph theory necessary for the discussions in the following chapters.

2.1.1 Basic Concepts and Definitions

A network is an abstract representation of real-world concepts focusing on entities and the interactions between them. In the mathematics literature, a graph is a structure consisting of a set of objects and a set of object pairs that indicate relations between the objects. There is no difference between a graph and a network, and in this work we use these terms interchangeably. An information network is a particular category of networks in which the entities represent some form of data.
Examples of information networks include the World Wide Web, online social networks, and academic citation networks. In this thesis, we mostly focus on attributed information networks, which can be defined formally as follows:

Definition 1. An attributed network is a directed graph $G = (V, E, X)$, where $V$ is the set of nodes (vertices) in $G$, $E$ is the set of links (edges) between nodes in $G$, each element of which is a pair of nodes in $V$, and $X$ is the corresponding node attribute matrix, in which each row represents the feature vector of a node in $V$.

An undirected attributed network is a special case of an attributed network in which $(v_i, v_j) \in E$ implies $(v_j, v_i) \in E$. An alternative way to represent a network is by its adjacency matrix. The adjacency matrix $A$ of a network $G$ is an $n \times n$ matrix, with $n = |V|$, whose elements are

$$A_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise.} \end{cases} \quad (2.1)$$

We also define the set of immediate neighbors of node $v_i$ as

$$N(v_i) = \{ v_j \mid (v_i, v_j) \in E \}. \quad (2.2)$$

Another notation we will use in the rest of this work is the degree vector $d \in \mathbb{R}^n$, $n = |V|$, where

$$d_i = |N(v_i)| = \sum_j A_{ij}. \quad (2.3)$$

For directed networks, the above equation defines the out-degree vector; the in-degree vector is given by

$$d_i = \sum_j A_{ji}. \quad (2.4)$$

A walk on a network is a sequence of nodes $v_1, v_2, \ldots, v_k$ such that every consecutive pair $(v_i, v_{i+1})$ in the sequence belongs to $E$. A trail is a walk in which all links are distinct. A path is a trail in which all nodes are distinct. A cycle is a non-empty trail in which the only repeated nodes are the first and last nodes. The distance from node $v_i$ to node $v_j$, denoted $dist(v_i, v_j)$, is the length of the shortest path starting at $v_i$ and ending at $v_j$. A network is connected if there exists a path between every pair of nodes in the network. A network is said to be aperiodic if there is no integer $k > 1$ that divides the length of every cycle of the network.

2.1.2 Random Walk

A random walk is a sequence of nodes on a network generated by traversing the network randomly. In particular, let $P$ denote the transition matrix, where $P_{ij}$ is the probability of moving from node $v_i$ to node $v_j$, so that for every $v_i$ we have $\sum_j P_{ij} = 1$. A random walk is then the process of starting from a node in the network and traversing the network according to the transition matrix $P$. One can define $P = D^{-1} A$, where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$. Let $p(t) \in \mathbb{R}^n$ represent the probability vector (distribution) of a random walk on a given network, where $p_i(t)$ is the probability of visiting node $v_i$ at step $t$ of the walk. Then $p(t)$ can be expressed by the recursive equation

$$p(t+1) = P^\top p(t) = (D^{-1} A)^\top p(t), \quad (2.5)$$

and the stationary distribution is defined as

$$\pi = \lim_{t \to \infty} p(t). \quad (2.6)$$

By the fundamental theorem of Markov chains [38], a random walk converges to a unique stationary distribution, regardless of the choice of starting node, if the network is connected and aperiodic.
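To make the notation above concrete, the following sketch builds the adjacency matrix, degree vector, and transition matrix P = D⁻¹A for a small, arbitrarily chosen undirected graph, and then iterates Equation (2.5) until the distribution stabilizes at the stationary distribution of Equation (2.6).

```python
# Illustrative sketch of Eqs. (2.1)-(2.6) on an arbitrary undirected toy graph.
import numpy as np

edges = [(0, 1), (0, 2), (0, 3), (1, 2)]   # a connected, aperiodic toy graph
n = 4

A = np.zeros((n, n))                        # adjacency matrix, Eq. (2.1)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0                 # undirected: add both directions

d = A.sum(axis=1)                           # degree vector, Eq. (2.3)
P = A / d[:, None]                          # transition matrix P = D^{-1} A

p = np.array([1.0, 0.0, 0.0, 0.0])          # the walk starts at node v_0
for _ in range(1000):
    p_next = P.T @ p                        # one step of Eq. (2.5)
    if np.allclose(p_next, p, atol=1e-12):
        break
    p = p_next

# For a connected, aperiodic network this converges to pi_i = d_i / (2m).
print("stationary distribution:", p)
```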
2.1.3 Network Measures and Metrics

To gain a better understanding of a network and capture particular features of it, one can measure some useful quantities. In this section we introduce several measures that will be applied in developing baseline algorithms and analyzing networks in the following chapters.

2.1.3.1 Network Centrality

When dealing with networks, an important research question is: which nodes in the network are important or central? There is no unique answer to this question, and many centrality measures have been proposed [130]. Here we describe some of the most important ones. The simplest possible definition of the importance of a node in a network is its degree.

Definition 2. The degree centrality $C_d$ of a node $v_i$ in a network is $C_d(v_i) = d_i$.

For directed networks, one can define degree centrality based on in-degree, out-degree, or the sum of both. The fact that degree centrality is a simple concept does not undermine its importance. For instance, in a social network, users with a high number of connections have more influence or prestige than those with fewer connections. While the number of connections is an important measure of centrality, a generalized version of it is eigenvector centrality, which incorporates the importance of the neighbors of a node.

Definition 3. The eigenvector centrality $C_e$ of a node $v_i$ in a network is

$$C_e(v_i) = \frac{1}{\lambda} \sum_j A_{ij} C_e(v_j), \qquad \text{i.e., } C_e = \frac{1}{\lambda} A^\top C_e, \quad (2.7)$$

where $\lambda$ is some fixed constant.

If the graph is connected, the Perron–Frobenius theorem [184] shows that $C_e$ is the eigenvector corresponding to the largest eigenvalue of the adjacency matrix. One challenge with eigenvector centrality is that a high-centrality node with high degree passes high centrality to all of its neighbors; however, not everyone known by a well-known person is well known. One solution to this problem is PageRank centrality.

Definition 4. The PageRank centrality $C_p$ of a node $v_i$ in a network is

$$C_p(v_i) = \alpha \sum_j A_{ij} \frac{C_p(v_j)}{d_j} + \beta, \qquad \text{i.e., } C_p = \alpha A^\top D^{-1} C_p + \beta \mathbf{1}, \quad (2.8)$$

where $\mathbf{1}$ is the all-ones vector and $\alpha$ and $\beta$ are user-specified scalars. In practice $\alpha$ is chosen to be smaller than $1/\lambda$, where $\lambda$ is the largest eigenvalue of the adjacency matrix of the network. For directed networks, $d_j$ in the above definition is replaced by the out-degree, which must be nonzero.

A completely different way to measure the importance of a node is closeness centrality, which is based on the mean distance from a node to the other nodes in the network. Defined directly, this quantity assigns low values to the more central nodes, which is the opposite of the behavior of the centrality measures defined above. A common approach is therefore to consider the inverse of the mean distance.

Definition 5. The closeness centrality $C_c$ of a node $v_i$ in a network is

$$C_c(v_i) = \frac{n}{\sum_j dist(v_i, v_j)}. \quad (2.9)$$

Another centrality measure based on pairwise distances is betweenness centrality, which counts the number of times a node lies on shortest paths between other nodes.

Definition 6. The betweenness centrality $C_b$ of a node $v_i$ in a network is

$$C_b(v_i) = \sum_{j \neq i, k \neq i} \frac{n^i_{jk}}{n_{jk}}, \quad (2.10)$$

where $n^i_{jk}$ is the number of shortest paths from node $v_j$ to node $v_k$ that pass through node $v_i$, and $n_{jk}$ is the total number of shortest paths from $v_j$ to $v_k$.
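The measures in Definitions 2–6 are available in standard graph libraries. The sketch below computes them with networkx on a small example graph; note that networkx normalizes several of these scores, so the values differ from the raw definitions above by constant factors.

```python
# Illustrative sketch: the centrality measures of Definitions 2-6 via networkx.
import networkx as nx

G = nx.karate_club_graph()   # a standard small example network

degree      = nx.degree_centrality(G)        # Definition 2 (normalized by n - 1)
eigenvector = nx.eigenvector_centrality(G)   # Definition 3
pagerank    = nx.pagerank(G, alpha=0.85)     # Definition 4 (alpha is the damping factor)
closeness   = nx.closeness_centrality(G)     # Definition 5
betweenness = nx.betweenness_centrality(G)   # Definition 6 (normalized)

# Example use: rank nodes by PageRank centrality.
top5 = sorted(pagerank, key=pagerank.get, reverse=True)[:5]
print("five most central nodes by PageRank:", top5)
```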
2.1.3.2 Homophily and Assortative Mixing

Homophily [123, 100], which is the tendency of individuals to form relations with others similar to them, is an important characteristic of many social networks. This tendency can be quantified using the well-known network modularity (or assortative mixing) measure [134, 133]. The measure, which was originally developed for community detection in networks, is based on the idea that a random graph is not expected to contain any clustering structure. Any community structure in a given network can thus be validated by comparing its link density against the density expected if the link structure of the network were completely random. The modularity measure is defined as follows [134]:

$$Q = \frac{1}{2m} \sum_{ij} \Big( A_{ij} - \frac{d_i d_j}{2m} \Big) \, \delta(c_i, c_j), \quad (2.11)$$

where $\delta(c_i, c_j)$ is the Kronecker delta function, $c_i$ is the community of node $i$, and $m$ is the total number of links. Intuitively, a network is said to be assortative if a significant portion of its links are between nodes that belong to the same community.
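The following sketch evaluates Equation (2.11) directly on a small, arbitrarily chosen graph with two groups; the same quantity can also be obtained from networkx's modularity routine.

```python
# Illustrative sketch: computing the modularity Q of Eq. (2.11) on a toy graph.
import numpy as np

# Adjacency matrix of a small undirected graph with two dense groups
# (a triangle on nodes 0-2, a triangle on nodes 3-5, and one bridge edge 2-3).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
c = np.array([0, 0, 0, 1, 1, 1])   # community (or protected-group) label c_i

d = A.sum(axis=1)                  # degree vector
m = A.sum() / 2                    # total number of links

Q = 0.0
for i in range(len(c)):
    for j in range(len(c)):
        if c[i] == c[j]:           # Kronecker delta: delta(c_i, c_j) = 1
            Q += A[i, j] - d[i] * d[j] / (2 * m)
Q /= 2 * m
print("modularity Q =", Q)         # large Q indicates assortative (homophilous) links
```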
2.2 Network Mining Tasks

In this section we provide a comprehensive study of state-of-the-art work related to network mining. We categorize network mining tasks into three groups: (1) node oriented tasks, (2) link oriented tasks, and (3) sampling and summarizing tasks.

2.2.1 Node Oriented Tasks

The first category of network mining comprises tasks that focus on nodes and includes classification, clustering and community detection, anomaly and spam detection, and ranking. In this section, we review these tasks.

2.2.1.1 Classification

Node classification is a classic problem in network mining with applications ranging from online social networks and e-commerce to computational biology. In the node classification problem we have access to the labels of a subset of nodes, and the task is to infer the labels of the remaining nodes in the network. Because of the non-i.i.d. nature of network data, node classification differs from traditional classification research and has received considerable attention. In network classification, link information can be incorporated using label propagation [193, 168], logistic regression [115], and graph regularization [192]. With recent progress in deep learning and neural networks, graph neural networks have also gained a great deal of attention among researchers, resulting in novel deep models for node classification. DeepWalk [144] is the first well-known work in this group. Inspired by natural language processing, the DeepWalk method takes the output of multiple random walks on the network to learn latent representations of nodes. Kipf and Welling [89] introduced graph convolutional networks (GCNs), a variant of convolutional neural networks that operates directly on graphs. Fey et al. [55] introduced SplineCNN, a novel convolution operator based on B-splines that makes the computation time independent of the kernel size. Veličković et al. [164] presented graph attention networks (GATs), leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

2.2.1.2 Clustering and Community Detection

The cluster or community structure of a network is an important feature of real networks. It is based on the assumption that the network structure consists of modules, commonly called communities or clusters, and the goal of community detection or clustering algorithms is to partition nodes into these components. There is a variety of community detection algorithms, which can be roughly divided into two categories: structural clustering and attributed network clustering. The structural clustering methods only consider the network topology. Methods based on the min-cut problem [63] cut the network into several partitions and assume these partitions represent communities; the size of the cut is the number of edges that are being cut. Methods based on the graph Laplacian [132] assume that similar nodes should be mapped closer together. Quasi-clique detection methods are based on extracting dense subgraph structures [3, 142]. Methods based on matrix factorization [179] factorize the adjacency matrix into node embeddings. In contrast, attributed graph clustering takes into account both the network topology and node side information [181, 25, 180]. More recently, methods have been developed for network clustering and community detection using neural network models. Zhang et al. [190] proposed an adaptive graph convolution method for network clustering that utilizes high-order graph convolutions to detect cluster structure. Pan et al. [140] proposed the adversarially regularized graph autoencoder (ARGA) for graph clustering and other network mining tasks. Kipf and Welling [90] proposed VGAE, a framework for unsupervised learning on graph-structured data based on the variational autoencoder (VAE).

2.2.1.3 Anomaly Detection

An anomaly is defined as an unusual activity exhibiting a different behavior than others present in the same structure [84]. In the network setting, one application of anomaly detection is network intrusion detection, where computers send packets to each other and the goal is to find misbehaving nodes. Another application is detecting fake or compromised accounts in social networks. Network anomaly detection can be defined as the task of finding the network objects (nodes/edges/substructures) that are rare and that differ significantly from the majority of the reference objects in the network [8]. Similar to clustering and community detection, the network anomaly detection task can be divided into (1) plain network and (2) attributed network anomaly detection. For plain networks, a common approach for detecting anomalous network objects is feature extraction [7, 71, 82, 66]. Basically, these approaches use the network structure to compute measures associated with nodes, edges, subgraphs, and communities, and use these measures to find anomalous entities. For example, [7] proposed the oddball algorithm for finding anomalous nodes by extracting rules (power laws) in the density, weights, ranks, and eigenvalues that govern the ego networks of social network users.

2.2.2 Link Oriented Tasks

The second category of network mining comprises tasks that focus on links and relations and includes link prediction and network alignment. In this section, we review these tasks.

2.2.2.1 Link prediction

Link prediction is a well-studied problem in network analysis [111, 9, 125, 119]. Given a network $G = (V, E)$, let $E' \subset E$ denote the set of observed links and $E \setminus E'$ the set of unobserved links in the network, where $|E'| < |E|$. The goal of link prediction is to accurately predict the unobserved links from the partially observed network $(V, E')$. One of the major obstacles in the link prediction problem is class imbalance: in real networks, $|E|$ is significantly smaller than the number of non-existing links. Various algorithms for link prediction have been developed over the past two decades. These include simple algorithms that consider the pairwise similarities between nodes, where the similarity is defined according to the network topology or social science theories [171]. This group of algorithms includes neighborhood-based measures such as common neighbors [44] and the Jaccard coefficient, which calculates the proportion of common neighbors for a given node pair. Two other well-known algorithms in this category are Adamic/Adar (Ad-Ad), a similar measure that assigns less weight to more connected common neighbors, and preferential attachment (Pr-At) [128], which sets the probability of a connection between a pair of nodes to be correlated with the product of their degrees [131].
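The sketch below computes these classic proximity scores for a single candidate pair on an arbitrary example graph; in practice the scores would be computed for all unlinked pairs and the highest-scoring pairs predicted as new links.

```python
# Illustrative sketch: neighborhood-based link prediction scores for one pair.
import math
import networkx as nx


def heuristic_scores(G, u, v):
    """Common neighbors, Jaccard, Adamic/Adar, and preferential attachment."""
    Nu, Nv = set(G[u]), set(G[v])
    common = Nu & Nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(Nu | Nv) if Nu | Nv else 0.0,
        # Adamic/Adar: common neighbors with high degree contribute less.
        "adamic_adar": sum(1.0 / math.log(G.degree(w)) for w in common),
        # Preferential attachment: product of the two node degrees.
        "preferential_attachment": G.degree(u) * G.degree(v),
    }


G = nx.karate_club_graph()
print(heuristic_scores(G, 0, 33))
```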
Another group of link prediction algorithms is based on global similarity measures, defined in terms of the adjacency or Laplacian matrices of the network. Examples in this category include path-based methods [98, 137, 111] and random-walk-based methods [112, 111]. In addition, probabilistic graphical models [34] and matrix factorization [152, 125] methods have also been widely used to address the link prediction problem. Recent years have also witnessed the emergence of deep neural network methods for the link prediction task [107, 108, 109, 160, 188]. Many of these algorithms, however, suffer from scalability issues due to the training time required for learning the hidden layers of their architectures [65]. The knowledge graph (KG) is a type of network data structure that represents a semantic network: each node represents a concept, and each edge represents a relation between two entities. KGs have gained attention in recent years and have broad applications ranging from finance [59] to health [149] and semantic search [177]. The link prediction task can be applied to KGs to infer new relations and complete missing knowledge [172, 45, 191].

2.2.2.2 Network Alignment

Work on network alignment can generally be classified into three approaches: (1) quadratic programming, (2) IsoRank-based, and (3) graphlet-based. In principle, network alignment can be formulated as an integer quadratic programming problem [13], which unfortunately is computationally hard to solve. Various approximation methods have been proposed to overcome this problem. For example, Bayati et al. [13] employed a message passing algorithm based on belief propagation, while Klau [91] converted it into a linear programming problem. Other binary relaxation approaches include those by Koutra et al. [95], Zhang and Tong [189], and Vogelstein et al. [165]. IsoRank by Singh et al. [154] is one of the most well-known network alignment algorithms. IsoRank iteratively learns the alignment score between two nodes based on the scores of their neighbors, and it can be considered the pioneer of a class of algorithms in the field. The main idea of the IsoRank algorithm is similar to the PageRank algorithm proposed by Page et al. [139]: PageRank defines node importance recursively in terms of the importance of a node's neighbors, and IsoRank utilizes this idea by combining the two networks using a Kronecker product and calculating the alignment scores on that combined network. Several improved versions of IsoRank have since been presented. For instance, Kollias et al. [94] made significant improvements in terms of computational cost, interpretability of similarity scores, and the nature of supported queries, and IsoRankN by Liao et al. [110] improved the performance of the original IsoRank by utilizing an iterative spectral clustering algorithm. Another class of methods is known as GRAAL, which is based on graphlet degree signature vectors. Graphlets are small connected non-isomorphic induced subgraphs of a large network [?]. The first algorithm in this class was proposed by Kuchaiev et al. [96] and uses only topological structure to match the networks.

2.2.3 Representation Learning

Network representation learning is an important research problem, as the latent features can be used for various network mining tasks such as node classification, community detection, link prediction, etc.
The ultimate goal is to construct a set of features for the nodes that is comparable to or better than the hand-engineered features defined by domain experts [14]. Motivated by the word2vec algorithms [126, 127] and the analogy of representing a network as a document, a group of works has focused on representation learning for nodes in networks using random-walk-based methods [65, 144, 159]. For example, DeepWalk [144] used an approximation of the full softmax called the hierarchical softmax [126] for learning the node embedding, whereas the node2vec [65] algorithm uses an alternative method known as negative sampling [127]. This negative sampling strategy was shown to produce better performance compared to the hierarchical softmax approach. The previous methods consider only the link structure to learn the node representation. Since the nodes of a network contain rich information, Yang et al. [178] showed the equivalence between DeepWalk and matrix factorization and presented a framework that can incorporate node attributes into the network representation learning process. Similarly, Huang et al. [74] proposed the Label-informed Attributed Network Embedding (LANE) framework, which is capable of integrating both node attribute and label information with the graph structure to enhance the network embedding process. Additionally, Kipf and Welling [89] proposed a semi-supervised convolutional neural network approach that operates directly on graphs and encodes both the local graph structure and node attributes. This work was subsequently extended to an unsupervised learning approach known as GAE in [90]. Closely related to GCNs, Hamilton et al. [67] proposed an inductive framework that is capable of learning node embeddings for previously unseen data. There has been growing interest in attention models for graphs recently, and various techniques have been proposed [164, 32, 54, 68]. One of the benefits of attention mechanisms is that they allow for dealing with variable-sized inputs and enable the model to "focus on the most relevant parts of the input to make decisions" [164]. Moreover, inspired by generative adversarial networks (GANs), methods have recently been proposed for learning stable and robust graph representations [169, 41].

2.3 Fairness in Machine Learning

Quantifying fairness has been a subject of intense debate among AI and ML researchers in recent years [16, 49, 69, 97]. Previous work is primarily focused on non-relational data and can be classified into two types: individual-level or group-level fairness. The fairness definition at the individual level is based on the premise that similar people should be treated similarly. For example, Dwork et al. [49] defined a task-specific metric based on a probabilistic distance measure between individuals via a Lipschitz condition. The metric is used as a constraint to optimize a fairness-aware classifier. In contrast, the group-level approach quantifies fairness in terms of statistical measures such as demographic parity, equalized odds [69], or balanced error rate [53] with respect to the protected groups. These measures are typically computed from a confusion matrix [16] and are used to ensure that the average performance does not vary significantly among different groups of a protected attribute.
Let $Y$ be the target variable of interest (the true outcome) and $X$ be a set of input features. Conventional supervised learning algorithms are designed to predict the target outcome $Y$ from $X$ by learning a model $f$ such that $\hat{Y} = f(X)$ is the predicted outcome. Existing fairness-aware methods seek to ensure that the predictions generated by the model do not discriminate against one or more subgroups defined by a protected attribute $X_p$, such as gender, race, or sexual orientation.

A widely used criterion for assessing fairness is demographic parity (or statistical parity). Demographic parity constrains the output of the classification problem to be independent of the protected attribute; in particular, a predictor satisfies demographic parity if there is no correlation between the protected feature and the output. Demographic parity has been used in a number of papers [113, 80, 78, 50, 24] that consider the degree of independence between the model output and the protected attribute. Assuming both the target outcome and the protected attribute are binary-valued, demographic parity seeks to achieve

$$P(\hat{Y} = 1 \mid X_p) = P(\hat{Y} = 1).$$

In other words, demographic parity equalizes the positive decision rate across the different groups of the protected feature. Another well-known fairness criterion is equalized odds [69], which seeks to ensure that the predictions are conditionally independent of the protected attribute given the true outcome:

$$P(\hat{Y} = 1 \mid X_p = 0, Y = y) = P(\hat{Y} = 1 \mid X_p = 1, Y = y).$$

If we consider $Y = 1$ to be the advantaged outcome, such as a job offer or college admission, then one can relax equalized odds to require non-discrimination only within the advantaged outcome. This relaxation is called equal opportunity [69]: a predictor $\hat{Y}$ satisfies equal opportunity with respect to protected attribute $X_p$ and outcome $Y$ if

$$P(\hat{Y} = 1 \mid X_p = 0, Y = 1) = P(\hat{Y} = 1 \mid X_p = 1, Y = 1).$$
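The sketch below shows how the group fairness criteria defined above can be estimated from model predictions; the arrays are arbitrary toy data, with `x_p` playing the role of the binary protected attribute X_p.

```python
# Illustrative sketch: estimating demographic parity and equal opportunity gaps.
import numpy as np

y     = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # true outcomes Y
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model predictions Y_hat
x_p   = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # binary protected attribute X_p


def positive_rate(pred, mask):
    """P(Y_hat = 1) restricted to the rows selected by the boolean mask."""
    return pred[mask].mean()


# Demographic parity gap: | P(Y_hat=1 | X_p=0) - P(Y_hat=1 | X_p=1) |
dp_gap = abs(positive_rate(y_hat, x_p == 0) - positive_rate(y_hat, x_p == 1))

# Equal opportunity gap: the same comparison restricted to Y = 1.
eo_gap = abs(positive_rate(y_hat, (x_p == 0) & (y == 1))
             - positive_rate(y_hat, (x_p == 1) & (y == 1)))

print("demographic parity gap:", dp_gap)
print("equal opportunity gap:", eo_gap)
```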
The insights gained from such analysis will help businesses to increase engagement with their existing customers or connect with new customers. Despite its importance, recent studies have raised concerns about the potential biases and unintended consequences that may arise from such automated analysis. For example, link prediction methods [111, 9, 119, 120] are commonly employed by social networking sites to encourage users to expand their social circles. “Suggested for you" on Instagram and “People you may know" on LinkedIn are two example applications of such methods. However, the rise of link prediction systems have led to an effect known as filter bubble [141], which is the reinforced segregation and narrowing diversity of information exposed to online users. If left unchecked, the filter bubble may introduce systematic biases in the network data and its subsequent analysis. For instance, Hofstra et al. [72] examined the ethnic and gender diversity of social relationships on Facebook and showed that those who have ample opportunities to befriend other similar users often find themselves in highly segregated networks. This is due to the homophily 23 principle [123], which is the tendency of individuals to form social ties with other similar individuals in a network. As current algorithms are designed to promote links between similar users, their suggested links may exacerbate the user segregation problem. In addition to online social networks, the filter bubble problem is also prevalent in recommender systems, which can be viewed as a link prediction task applied to a bipartite network of users and items. A recent study by Nguyen et al. [135] concluded that recommender systems tend to expose users to “slightly narrowing set of items over time." For example, in movie recommendation, movies from a certain genre may only be recommended to users from a specific gender. By addressing the filter bubble problem in network link prediction, the proposed method can potentially be used to alleviate the filter bubble problem in other types of recommender systems. This work examines the filter bubble problem for network link prediction from algorithm fairness perspective. Specifically, we consider a link prediction algorithm to be unfair if it is biased towards promoting certain types of links (e.g., those between users with similar gender or other protected attributes). As a motivating example, consider the link prediction task on professional networking sites. Certain professions, such as software engineering, tend to be dominated by men, a fact that is likely to be reflected in the link structure of the professional network. As a result, the links recommended by the site may reinforce this gender-based segregation and primarily recommend links between individuals from the same gender while recommending comparatively fewer inter- gender links. Though such a system may be able to achieve high link prediction accuracy, it may unfairly disadvantage some users. For example, a female software engineer may be treated unfairly as they are seldom recommended to other male software engineers. Unfair practices due to the decisions generated by automated systems is a problem that has been well-documented in many application domains, including criminal justice, mortgage lending, and university admission. For example, Angwin et al. 
[11] warned about the potential biases against African Americans in the software used to predict the risk score of defendants who would likely re-offend again while O’Neil [136] cautioned against the manipulative marketing tactics used by for-profit colleges in online advertising that exploit vulnerable populations. These concerns have 24 brought increasing scrutiny into the issue of fairness in machine learning algorithms. Despite their growing research, existing works are primarily focused on independent and identically distributed (i.i.d) data, and thus, may not be suitable for link analysis problems. For example, previous works have considered the notion of fairness either at individual [49] or group [69, 53] level. In contrast, this work examines the notion of fairness at a dyadic-level, based on the pairwise interactions between users in a social network. Furthermore, previous approaches have considered fairness in terms of the unjust decisions against members of a specific underrepresented (protected) group. Instead, we consider fairness in terms of promoting inter-group connections in a network in order to alleviate the filter bubble problem. There are four major contributions of this work. First, we empirically assess the influence of protected attributes such as gender on the link structure of a network by measuring the homophily effect on several real-world network datasets. Second, we introduce modred as a fairness criterion for network link prediction. The metric is inspired by the well-known modularity measure [134] developed for network community detection. We consider the reduction in modularity measure as a way to determine whether the links predicted by an algorithm may lead to further segregation of the network. We then illustrate how the measure can be incorporated into a greedy algorithm for postprocessing the results of current link prediction algorithms. Finally, we present a novel Fairness-aware LInk Prediction (FLIP) framework that combines adversarial network representation learning with supervised link prediction to mitigate the filter bubble problem. 3.2 Related Work Link prediction is a well studied problem in network analysis with various algorithms been developed over the past two decades [111, 9, 119]. This includes heuristics methods that consider the pairwise similarities between nodes, where similarity is defined based on the network topology [131, 111] or node features [39, 27]. The main benefit of these methods is their simplicity and the fact that most of these approaches do not required training. Another class of link prediction methods employ machine learning methods, such as those based on probabilistic graphical models [34], 25 matrix factorization [152], and supervised classification [9, 167]. Despite their higher accuracy, these methods often suffer from the class imbalance problem as the number of links in a network is significantly fewer than the number of non-links. Recent years have also witnessed the emergence of deep neural network methods for the link prediction task [108, 109, 160]. These methods have been shown to achieve state of the art performance. Social networks are increasingly personalizing their content using automated machine learning techniques, which is a concern as the decisions may lead to adverse effects on the users. This is due to the so-called “filter bubble" or “echo chamber" effect [72, 141] in which individuals are increasingly isolated to consuming only information that conform to their own belief system. 
In online social networks, the effect of the filter bubble is exemplified by the recommendation decisions generated using link prediction algorithms. As link prediction algorithms are commonly used to encourage users to expand their networks, this may lead to adverse consequences such as segregation of users [72, 135].

Quantifying fairness has been a subject of intense debate among AI and ML researchers in recent years [16, 49, 69, 97]. Previous works are primarily focused on non-relational data and can be classified into two types—individual-level or group-level fairness. The definition of fairness at the individual level is based on the premise that similar people should be treated similarly. For example, Dwork et al. [49] defined a task-specific metric based on a probabilistic distance measure between individuals via a Lipschitz condition. The metric is used as a constraint to optimize a fairness-aware classifier. In contrast, the group-level approach quantifies fairness in terms of statistical measures such as demographic parity, equalized odds [69], or balanced error rate [53] with respect to the protected groups. These measures are typically computed from a confusion matrix [16] and are used to ensure that the average performance does not vary significantly among the different groups of a protected attribute.

In addition, there has been a growing literature on developing fairness-aware methods. Current methods can be divided into three categories. The first category includes preprocessing algorithms [186, 113, 116], motivated by the view that training data is the main cause of bias in machine learning. Zemel et al. [186] introduced an optimization algorithm that maps data points into a new space so that membership in the protected group is lost. [113] developed a variational autoencoder model for learning representations that are invariant to protected features while preserving as much of the remaining information as possible. Madras et al. [116] connected group fairness concepts to adversarial learning for constructing fair representations.

3.3 Fairness for Network Data

We first review the fairness criteria for i.i.d. data. Let $Y$ be the target variable of interest (true outcome) and $X$ be a set of input features. Conventional supervised learning algorithms are designed to predict the target outcome $Y$ from $X$ by learning a model $f$ such that $\hat{Y} = f(X)$ is the predicted outcome. Existing fairness-aware methods seek to ensure that the predictions generated by the model will not discriminate against one or more subgroups defined by a protected attribute $X_p \in X$, such as gender, race, or sexual orientation. A widely used criterion for assessing fairness is demographic parity [113, 80, 78, 50, 24], which considers the degree of independence between the model output and the protected attribute. Assuming both the target outcome and the protected attribute are binary-valued, demographic parity seeks to achieve:

$P(\hat{Y} = 1 \mid X_p) = P(\hat{Y} = 1)$

Another well-known fairness criterion is equalized odds [69], which seeks to ensure that the predictions are conditionally independent of the protected attribute given the true outcome:

$P(\hat{Y} = 1 \mid X_p = 0, Y = y) = P(\hat{Y} = 1 \mid X_p = 1, Y = y)$

If we consider $Y = 1$ to be the advantaged outcome, a special case of this criterion known as equal opportunity [69] is defined as follows:

$P(\hat{Y} = 1 \mid X_p = 0, Y = 1) = P(\hat{Y} = 1 \mid X_p = 1, Y = 1)$

3.3.1 Dyadic-level Fairness

In this work, we investigate the filter bubble problem from the perspective of algorithmic fairness.
Specifically, a dyadic-level fairness criterion can be defined based on the protected group membership of the individuals participating in the links. Below, we consider two such criteria:

• Subgroup dyadic-level protection, where fairness is assessed in terms of how representative each protected subgroup is in the formation of the links. For example, in applications such as link-based recommender systems, the fairness criterion could be to ensure that the recommended links do not favor certain subgroups in the population at the expense of other subgroups.

• Mixed dyadic-level protection, where fairness is determined based on the homogeneity of the nodes involved in each link. Specifically, a link is considered to be an intra-group link if it relates a pair of nodes with the same protected attribute values. Otherwise, it is known as an inter-group link. To prevent effects such as the filter bubble, inter-group (mixed) links should be favored to prevent segregation of the users.

In principle, subgroup dyadic-level protection can be implemented using existing group-level fairness criteria for i.i.d. data by applying them to the links instead of the individual nodes in the network. For mixed dyadic-level protection, we introduce the network modularity measure described in the next section.

3.3.2 Network Modularity

Homophily [123, 100], which is the tendency of individuals to form relations with others similar to them, is an important characteristic of many social networks. Such relationships can be quantified using the well-known network modularity (or assortative mixing) measure [134, 133]. The measure, which was originally developed for community detection in networks, is based on the idea that a random graph is not expected to contain any clustering structure. Any community structure in a given network can thus be validated by comparing its link density against the expected density if the link structure of the network were completely random. The modularity measure is defined as follows [134]:

$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{d_i d_j}{2m}\right)\delta(c_i, c_j)$   (3.1)

where $A$ is the adjacency matrix representation of the network, $\delta(c_i, c_j)$ is the Kronecker delta function, $c_i$ is the community of node $i$, $d_i$ is its corresponding degree, and $m$ is the total number of links. Intuitively, a network is said to be assortative if a significant portion of its links are between nodes that belong to the same community. The modularity measure can be used to determine whether a network is unfair in terms of mixed dyadic-level protection by replacing $\delta(c_i, c_j)$ in Equation (3.1) with $\delta(x_{p_i}, x_{p_j})$, where $x_{p_i}$ is the protected attribute value for node $i$. The $Q$ value is thus influenced only by those pairs of nodes belonging to the same protected class. Values of $Q$ close to one indicate high unfairness, due to the strong alignment between the link structure and the protected attribute, while values close to zero indicate high fairness. For numeric-valued protected attributes such as age or income level, the measure can be modified as follows:

$r = \frac{\sum_{ij}\left(A_{ij} - d_i d_j / 2m\right) x_{p_i} x_{p_j}}{\sum_{kl}\left(d_k \delta_{kl} - d_k d_l / 2m\right) x_{p_k} x_{p_l}}$

where $\delta_{kl}$ is the Kronecker delta. This is also known as the assortativity coefficient of the network.

To illustrate the use of modularity as a measure of unfairness, consider the networks shown in Figure 3.1. The data correspond to friendship relations among freshmen at a secondary school in the Netherlands from 2003-2004 [155].
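To make the computation concrete, the following is a minimal Python sketch of $Q$ from Eq. (3.1) with the Kronecker delta applied to a binary protected attribute; the toy edge list and attribute dictionary are hypothetical, and this is a sketch rather than the exact implementation used in our experiments.

```python
from collections import defaultdict
from itertools import product

def modularity_unfairness(edges, protected):
    """Q of Eq. (3.1) with delta(c_i, c_j) replaced by agreement of the
    protected attribute values: Q close to 1 signals strong alignment between
    the link structure and the protected attribute (unfair), while Q close to
    0 signals high fairness (mixed dyadic-level protection)."""
    adj = set()
    degree = defaultdict(int)
    for u, v in edges:
        adj.update([(u, v), (v, u)])
        degree[u] += 1
        degree[v] += 1
    two_m = 2.0 * len(edges)
    q = 0.0
    for i, j in product(degree, repeat=2):   # sum over all ordered node pairs
        if protected[i] == protected[j]:
            a_ij = 1.0 if (i, j) in adj else 0.0
            q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m

# Hypothetical toy network: two intra-group links and one inter-group link
edges = [("a", "b"), ("c", "d"), ("a", "c")]
protected = {"a": 0, "b": 0, "c": 1, "d": 1}
print(modularity_unfairness(edges, protected))
```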
Using gender as the protected attribute, the modularity value for the first network, shown in Figure 3.1(a), is equal to 0.3033, while the value for the second network is 0.0179. Note that the network with higher modularity has more links between students of the same gender compared to the one with the lower value, and is thus unfair from the perspective of mixed dyadic-level protection. Our proposed fairness-aware framework evaluates the reduction in the modularity measure to determine whether the modified network obtained from the link prediction results is biased towards creating more inter-group or intra-group links.

Figure 3.1: Snapshots of friendship relations among students at a Dutch school taken in (a) 2003 (modularity = 0.3033) and (b) 2004 (modularity = 0.0179). The node color represents the student's gender. Darker dashed lines correspond to links between students of different genders, while solid lines correspond to links between students of the same gender.

Specifically, we define the following metric:

$\mathrm{modred} = \frac{Q_{\mathrm{ref}} - Q_{\mathrm{pred}}}{Q_{\mathrm{ref}}}$   (3.2)

where $Q_{\mathrm{ref}}$ is the modularity measure of a reference network (e.g., the ground truth network when evaluating link prediction algorithms) and $Q_{\mathrm{pred}}$ is the modularity of the predicted network, i.e., the network obtained by augmenting the original network with the predicted links. A positive modred value indicates that the link prediction algorithm predicts more inter-group links than the ground truth network, while a negative value suggests that the algorithm is predicting more intra-group links than the ground truth network.

3.3.3 Greedy Post-Processing

One approach to promoting fairness in link prediction is to post-process the prediction results. To this end, we propose a greedy algorithm for reducing the modularity of the predicted network. It takes as input a set of binarized link predictions, $\{\hat{e}_{xy}\}$, and calculates the change in modularity resulting from flipping the prediction of each node pair. The change in modularity for flipping link $\hat{e}_{xy}$ is:

$\mathrm{score}(\hat{e}_{xy}) = (-1)^{\delta(\hat{e}_{xy})}\left[\frac{1}{2m}\left(1 - \frac{d_x + d_y + 1}{2m}\right)\delta(c_x, c_y) - \frac{1}{4m^2}\left(\sum_{j \in C_x} d_j + \sum_{i \in C_y} d_i\right)\right]$   (3.3)

where $C_x$ denotes the set of nodes with the same protected attribute value as $x$.

3.4.1 Discriminator

A conventional adversarial formulation trains a discriminator to predict each node's protected attribute value from its learned representation by minimizing the cross-entropy loss

$-\frac{1}{|V|}\sum_{u \in V}\Big[x_{p_u}\log(\hat{y}_u) + (1 - x_{p_u})\log(1 - \hat{y}_u)\Big]$

Here $\hat{y}_u = D(G(u))$ is the prediction of the discriminator for the binary protected attribute value of node $u$. However, this will not necessarily result in mixed dyadic-level protection, because intra-group links may still be favored in a homophilic network. To address this challenge, we propose the following adversarial loss:

$\mathcal{L}_D = -\frac{1}{|\mathcal{T}|}\sum_{(u,v) \in \mathcal{T}}\Big[p_{uv}\log(\hat{p}_{uv}) + (1 - p_{uv})\log(1 - \hat{p}_{uv})\Big]$   (3.4)

where $\mathcal{T} \subseteq V \times V$ is the set of node pairs in the training data, $p_{uv}$ is the actual type of node pair $(u, v)$ with respect to a given protected attribute (i.e., intra-group vs. inter-group), and $\hat{p}_{uv}$ is the discriminator's prediction. Instead of inferring a node's protected attribute, the discriminator receives a pair of node representations, which it passes to a two-layer fully connected network with leaky ReLU activation to determine the probability that the pair is an intra-group node pair.

3.4.2 Generator

In contrast to the original GAN framework proposed by [64], where the generator seeks to generate samples of data points that seem real, the generator in our framework tries to learn node representations that preserve important structural information of the network without implicit use of the protected attribute information.
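Before turning to the generator details, the following is a minimal PyTorch sketch of the pairwise discriminator and the loss in Eq. (3.4); the hidden-layer size, batch size, and randomly generated embeddings are illustrative assumptions rather than the exact FLIP configuration.

```python
import torch
import torch.nn as nn

class PairwiseDiscriminator(nn.Module):
    """Discriminator for Eq. (3.4): given the embeddings of a node pair,
    predict the probability that the pair is intra-group."""
    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),   # pair embeddings are concatenated
            nn.LeakyReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, z_u, z_v):
        return self.net(torch.cat([z_u, z_v], dim=-1)).squeeze(-1)

disc = PairwiseDiscriminator(emb_dim=128)
z_u, z_v = torch.randn(32, 128), torch.randn(32, 128)  # node embeddings from the generator
p_uv = torch.randint(0, 2, (32,)).float()              # 1 = intra-group, 0 = inter-group
loss_d = nn.functional.binary_cross_entropy(disc(z_u, z_v), p_uv)  # L_D of Eq. (3.4)
```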
For the generator, we utilized DeepWalk [144], which is a network representation learning method inspired by the Skip-gram model [126] from natural language processing. DeepWalk consists of two steps: the first step is to extract sequences of nodes from the network by performing a series of truncated random walks starting from each node in the input network. In the second step, the node sequences generated by the random walk process are used to learn the feature representation of each node. This is accomplished as follows. A sliding window of width $w$ scans the generated node sequences to generate all the node pairs $(u, v)$ in which node $v$ appears in the sliding window centered at node $u$. A fully connected neural network with a single hidden layer predicts the probability of the occurrence of node $v$ given the one-hot encoding, $\bar{u}$, of node $u$. Specifically, the network attempts to predict $p(v \mid u)$ for each $u$ as follows:

$p(v \mid u) \approx \frac{\exp\big(f'(v)^{\top} f(u)\big)}{\sum_{v' \in V}\exp\big(f'(v')^{\top} f(u)\big)}$   (3.5)

where $f(v) = W\bar{v}$ and $f'(u) = Z\bar{u}$, $W$ is the weight matrix between the input and hidden layers of the network, and $Z$ is the weight matrix between the hidden and output layers of the network. The rows of matrix $W$ are the node representations generated by the skip-gram model, so we have $G(u) = f(u)$. The parameters of DeepWalk are trained using the maximum likelihood estimation approach, with the following loss function:

$\mathcal{L}_{\mathrm{skip}} = \sum_{u \in V}\Big[\log\Big(\sum_{v' \in V}\exp\big(f'(v')^{\top} f(u)\big)\Big) - \sum_{v \in \Omega_w(u)} f'(v)^{\top} f(u)\Big]$

Here $\Omega_w(u)$ represents the set of all nodes that appear in the neighborhood of node $u$ in the generated random walk sequences, using a window of width $w$.

3.4.3 Link Prediction

This component takes a pair of node embeddings as input to predict whether the corresponding nodes should be linked or not. This is accomplished by adding a two-layer link prediction network to the GAN model. During the training phase, the link prediction component receives pairs of node embeddings and concatenates them into a feature vector, which is then passed to a two-layer fully connected network with leaky ReLU activation. The output of the network corresponds to the likelihood of a link existing between the node pair. Here we deploy the standard cross-entropy cost function:

$\mathcal{L}_L = -\frac{1}{|\mathcal{T}|}\sum_{(u,v) \in \mathcal{T}}\Big[e_{uv}\log(\hat{e}_{uv}) + (1 - e_{uv})\log(1 - \hat{e}_{uv})\Big]$   (3.6)

where $\hat{e}_{uv}$ is the output of the link prediction component for node pair $(u, v)$ and $e_{uv}$ is the binary ground truth link label. Putting everything together, the overall loss function for the proposed framework is given as follows:

$\mathcal{L}_G = (1 - \alpha)\,\mathcal{L}_{\mathrm{skip}} - \alpha\,\mathcal{L}_D + \beta\,\mathcal{L}_L$   (3.7)

where $\alpha$ and $\beta$ are hyperparameters. The generator, discriminator, and link prediction network are all trained end to end using Adam [88]. The generator and link prediction network are trained on the same batches, but every other batch is used to train the discriminator only, so that training alternates between updating the link predictor and generator together on one batch and updating the discriminator on the next batch.

Table 3.1: Summary statistics of the real-world social network data sets.

Network        #nodes   #edges    Protected feature
Dutch school   26       221       gender
Facebook       1,034    26,749    gender
Google+        4,938    547,923   gender

3.5 Experimental Evaluation

This section describes the experiments performed to evaluate the efficacy of our proposed methods in addressing the filter bubble problem in network link prediction.

3.5.1 Experiment Setup

We first discuss the experimental setup, including the data sets, baselines, and evaluation metrics used in our experiments.
3.5.1.1 Datasets

We evaluated our methods on three real-world social network data sets. Table 3.1 summarizes the main properties of these data sets. The first data set is a Facebook ego network [106], which contains 1,034 nodes and 26,749 friendship links. The second data set is Google+ [106], which has 4,938 nodes and more than 500,000 links. The third data set is the Dutch school network [155], which corresponds to friendship relations among 26 freshmen at a secondary school in the Netherlands. For all three data sets, we use gender as the protected attribute for inferring intra-group and inter-group links.

Table 3.2: Proximity-based link prediction algorithms. For each node $v$, $N(v)$ is the set of its immediate neighbors.

Method                    Definition
Jaccard                   $|N(u) \cap N(v)| \,/\, |N(u) \cup N(v)|$
Adamic/Adar               $\sum_{x \in N(u) \cap N(v)} 1/\log(d_x)$
Preferential attachment   $d_u \cdot d_v$

Table 3.3: Performance comparison of various link prediction algorithms on three real-world network data sets.

Method      Dutch school                      Facebook                          Google+
            AUC             modred            AUC             modred            AUC             modred
Jac         0.650 ± 0.000   -0.503 ± 0.005    0.830 ± 0.0     -0.149 ± 0.037    0.793 ± 0.0     0.093 ± 0.03
Ad-Ad       0.657 ± 0.001   -0.376 ± 0.004    0.84 ± 0.0      0.222 ± 0.009     0.869 ± 0.0     -0.305 ± 0.001
Pr-At       0.602 ± 0.002   0.443 ± 0.002     0.807 ± 0.0     0.46 ± 0.001      0.905 ± 0.0     -2.935 ± 0.011
DW          0.529 ± 0.007   -0.047 ± 0.242    0.951 ± 0.0     0.089 ± 0.002     0.771 ± 0.0     0.166 ± 0.039
Jac+PEO     0.536 ± 0.003   0.032 ± 0.052     0.799 ± 0.00    -0.557 ± 0.235    0.750 ± 0.0     3.47 ± 0.048
Ad-Ad+PEO   0.527 ± 0.002   -0.234 ± 0.052    0.799 ± 0.00    0.013 ± 0.06      0.829 ± 0.0     3.219 ± 0.013
Pr-At+PEO   0.505 ± 0.00    0.022 ± 0.109     0.682 ± 0.00    -0.21 ± 0.227     0.858 ± 0.0     3.854 ± 0.154
DW+PEO      0.491 ± 0.005   -0.121 ± 0.213    0.949 ± 0.0     0.014 ± 0.02      0.735 ± 0.0     3.55 ± 2.652
Jac+GM      0.657 ± 0.029   -0.218 ± 0.084    0.842 ± 0.00    0.661 ± 0.05      0.74 ± 0.0      0.966 ± 0.04
Ad-Ad+GM    0.653 ± 0.026   -0.211 ± 0.088    0.842 ± 0.00    0.661 ± 0.05      0.818 ± 0.0     1.369 ± 0.038
Pr-At+GM    0.583 ± 0.044   0.179 ± 0.165     0.740 ± 0.00    0.919 ± 0.06      0.842 ± 0.00    1.749 ± 0.096
DW+GM       0.536 ± 0.056   0.133 ± 0.091     0.901 ± 0.00    0.497 ± 0.062     0.725 ± 0.03    1.506 ± 0.591
FLIP        0.658 ± 0.004   0.359 ± 0.009     0.86 ± 0.00     0.348 ± 0.004     0.857 ± 0.0     0.207 ± 0.009

3.5.1.2 Baseline Algorithms

We considered four state-of-the-art link prediction algorithms as baselines. Three of them are well-known classical proximity-based methods that use neighborhood structural information. The first baseline is based on the well-known Jaccard (Jac) coefficient, which is deployed in the context of network link prediction by calculating the proportion of common neighbors for a given node pair. The second baseline is Adamic/Adar (Ad-Ad), a similar measure that assigns less weight to common neighbors that are more highly connected. The third proximity-based algorithm is preferential attachment (Pr-At) [128], which scores a connection between a pair of nodes by the product of their degrees [131]. Table 3.2 summarizes the formal definitions of these algorithms.
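For reference, the heuristics in Table 3.2 can be computed in a few lines; the sketch below assumes the network is stored as a simple adjacency dictionary, which is a simplification of the actual experimental pipeline.

```python
import math

def jaccard(G, u, v):
    """Jaccard coefficient: fraction of shared neighbors (Table 3.2)."""
    Nu, Nv = set(G[u]), set(G[v])
    return len(Nu & Nv) / len(Nu | Nv) if Nu | Nv else 0.0

def adamic_adar(G, u, v):
    """Adamic/Adar: common neighbors down-weighted by their degree (Table 3.2)."""
    return sum(1.0 / math.log(len(G[x])) for x in set(G[u]) & set(G[v]) if len(G[x]) > 1)

def preferential_attachment(G, u, v):
    """Preferential attachment: product of the two node degrees (Table 3.2)."""
    return len(G[u]) * len(G[v])

# G is an adjacency dictionary, e.g. {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
```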
We also considered the more recent DeepWalk (DW) algorithm [144], which learns $d$-dimensional feature representations of nodes by simulating uniform random walks. DeepWalk provides latent features for the nodes in the first step; then, similar to the approach proposed in [65], we construct an edge embedding by applying the binary Hadamard product to the embeddings of a given node pair and train a logistic regression classifier to perform link prediction. For evaluation, we use the settings suggested in the original DW paper for both the DW baseline and the proposed method's skip-gram model. These settings are: latent feature dimension (128), length of random walks (80), number of random walks per node (10), and window size (10) on all data sets. We also consider a traditional fairness algorithm based on equalized odds, which we use to post-process the predictions of our four baselines. As previously mentioned, imposing an equalized odds constraint on the predictions of a model is a popular way of ensuring fair predictions. For our task, we use a generalized version of equalized odds proposed in [145] to post-process the predictions of each baseline algorithm. To make the generalized equalized odds constraint compatible with the network data setting, we treat the link type, intra-group versus inter-group, as each link's binary protected attribute. We refer to this post-processing algorithm as PEO.

3.5.1.3 Sampling Process and Training

A big challenge for link prediction algorithms is the sparsity of real-world network data. In other words, since the number of existing links is significantly smaller than the number of non-existing links, it is difficult to train a model that is not biased toward negative examples. Given a graph $N = (V, E, X)$, we generate a training set with an equal number of negative and positive examples, $\langle N', E^+, E^- \rangle$. Here $N'$ is the sub-graph remaining after removing all sampled positive links $E^+$, and $E^-$ is a set of randomly sampled non-links such that $|E^+| = |E^-|$. Positive links are sampled from $N$ at random, with the restriction that $N'$ remains connected. For each data set, we generate 10 instances of $\langle N', E^+, E^- \rangle$ by deleting 80% of all links in $N$. For FLIP and DW, we learn node representations by performing random walks on the graph $N'$ and train the link predictor using 10% of the generated positive and negative samples. For the other baselines, we use 30% of $E^+$ and $E^-$ for calculating the proximity measures. We use the remaining 70% for testing.

3.5.1.4 Evaluation Metric

We evaluate the quality of the link predictions with two metrics: accuracy and the area under the ROC curve (AUC), which captures the trade-off between true and false positives across different thresholds. For the dyadic-level fairness metric, we consider the modred measure given in Equation (3.2).

3.5.2 Experimental Results

In the following subsections, we investigate the general performance of the proposed framework. Due to a lack of space, we provide extended experimental results, including a parameter sensitivity analysis of FLIP and a FLIP ablation study, in the supplementary file.

3.5.2.1 General Performance

We summarize our results for link prediction in Table 3.3. For FLIP, we report the results for $\alpha = 0.1$, $\beta = 0.2$, and 3 epochs. For greedy post-processing, we chose to flip the 3% of predictions that most reduce the modularity. Based on these results, we can make several observations. First, there is generally a trade-off between AUC and modred, so higher modred scores are only achievable by sacrificing accuracy.
Second, none of the baseline algorithms achieves consistently high modred scores. In particular, equalized odds post-processing provides highly inconsistent gains in modred that are heavily dependent on the data set. It provides significant gains on the Google+ data set, but on the Dutch school data set it provides only moderate gains, and it is a moderate impediment on the Facebook data set. In contrast, greedy post-processing and FLIP always achieve strong modred values and provide a good balance between AUC and modred. This is unsurprising, since FLIP and greedy post-processing are the only two techniques specifically designed for promoting fairness in link prediction. This demonstrates the importance of developing algorithms tailored specifically for network data and link prediction.

Third, of all the baselines, preferential attachment generally does the best job of achieving both good accuracy and good modred. One possible explanation is that all the other baselines make predictions based on the neighborhood structure of the nodes. In a network that is homophilic with respect to a protected attribute, nodes with the same protected attribute value are likely to have similar neighborhood structures. Since all of our networks are homophilic with respect to the protected attribute, link prediction methods based on neighborhood structure are more likely to reinforce the existing homophily and create intra-group links. In contrast, preferential attachment ignores the neighborhood structure of the nodes when making predictions, so it is less affected by pre-existing network homophily.

3.6 Conclusions

This work presents novel fairness-aware methods to alleviate the filter bubble problem in network link prediction. First, we present a fairness criterion based on the network modularity measure to determine whether inter-group links are well represented in the predicted output of a link prediction algorithm. We then consider two approaches to overcome the filter bubble problem—one based on a greedy postprocessing approach using the modred measure and the other based on an adversarial learning framework. Experimental results show that the proposed methods are promising, as they can reduce the modularity of the predicted network without significantly degrading prediction accuracy.

CHAPTER 4
FAIRNESS PERCEPTION FROM A NETWORK-CENTRIC PERSPECTIVE

In this chapter, we investigate the issue of algorithmic fairness from a network-centric perspective. Specifically, we introduce a novel yet intuitive function known as fairness perception and provide an axiomatic approach to analyze its properties. Using a peer-review network as a case study, we also examine its utility in terms of assessing the perception of fairness in paper acceptance decisions. We show how the function can be extended to a group fairness metric known as fairness visibility and demonstrate its relationship to demographic parity. We also discuss a potential pitfall of the fairness visibility measure that can be exploited to mislead individuals into perceiving that the algorithmic decisions are fair. We demonstrate how the problem can be alleviated by increasing the local neighborhood size of the fairness perception function.

4.1 Introduction

The influence of machine learning is pervasive across numerous applications, from healthcare and e-commerce to financial and criminal justice systems.
Despite its utility, previous studies have shown that the algorithmic decisions may contain unintended biases that discriminate against certain groups of the population [11, 77, 37]. As a result, the challenge of removing biases from the algorithmic decision-making process has gained significant attention in recent years. In particular, various mathematical formulations of fairness have been proposed. For instance, group fairness metrics such as demographic parity and equalized odds have been developed to assess the degree of prejudice against certain protected groups in the population. While each metric has its own merits, many of them are incompatible with each other [33, 58]. The group fairness metrics are designed to determine the level of equity among different groups of individuals who are harmed by or benefited from the algorithmic-driven decisions. However, the consequences of an unfair decision may extend beyond those individuals who are directly impacted by the decision. In fact, they may elicit negative responses from other individuals who identified 41 Figure 4.1: An example illustration of fairness perception. themselves to be in the same group as the affected individuals. For instance, hiring discrimination against a qualified member from an underrepresented group not only affects the well being of that individual, but will also have an adverse effect on other members of the underrepresented group who observed such behavior. This example suggests that fairness assessment must take into consideration the perception of other individuals who may not be directly impacted by the algorithmic decisions [143, 153]. Fairness perception is rooted in the social comparison theory. For instance, equity theory [4] argues that “humans do not base their satisfaction on what they receive but rather what they receive in relation to what they think they should receive". The reaction of an individual to the outcome of a decision process is based on the expectation of the individual and this expectation not only depends on one’s own outcome but also the outcomes of other individuals they are aligned with, which we refer to as the reference group. The choice of the reference group is typically influenced by the similarity measure we use, e.g., we may compare ourselves to our co-workers, friends, and family members. By observing the outcomes of other members in our reference group, this will help shape our expectation about what should be considered a fair outcome. In this paper, we examine the notion of fairness perception from a network analysis perspective. Networks provide a natural way to represent individuals and their connections to other individuals in the same reference group. For instance, Figure 4.1 shows a toy example of a network of students applying for college admission to a prestigious university. In this network, two students are linked 42 if they know each other. Suppose the admission committee of the college has decided to accept 3 of the applicants, denoted as nodes with green check marks, and to reject the other 4 applicants. For brevity, we assume all the students have similar qualifications. Consider the two students labeled as 1 and 2, respectively. Although both applicants were rejected, their expectations for admission and perceptions of fairness are very different. Student 1 has a higher expectation of being admitted compared to student 2 since all of his/her friends were accepted. Thus, the perception of fairness for student 1 is different than that for student 2. 
This paper introduces the notion of network-centric fairness perception and illustrates its application to the peer review process. Peer evaluation of scientific work has a significant effect on scientific advancement. However, similar to other systems designed by humans, it is potentially biased, favoring certain groups of individuals (e.g., famous researchers from top institutions) [157, 162]. In this study, we show how the proposed function can be used to assess the perception of authors about paper acceptance decisions. An axiomatic approach for analyzing the desirable properties of fairness perception functions is also presented. We then extend our proposed function to a group fairness measure known as fairness visibility and show its relationship to demographic parity under certain mild assumptions. We also describe a potential pitfall of assessing fairness from a local neighborhood perspective. Specifically, it can mislead individuals into thinking that the decision-making process is fair even though the overall decisions are biased toward certain groups of individuals. Finally, we show how to alleviate the problem by expanding the local neighborhood size of the fairness perception function.

4.2 Quantifying Fairness Perception

Let $G = \langle V, E, X \rangle$ be an attributed network, where $V$ is the set of nodes, $E \subseteq V \times V$ is the set of links (edges), and $X \in \mathbb{R}^{|V| \times d}$ is the feature matrix associated with the nodes. We further assume that $X = (X^{(p)}, X^{(u)})$, where $X^{(p)}$ are the protected attributes and $X^{(u)}$ are the unprotected ones. The set of links can also be represented by an adjacency matrix $A$, where $A_{ij} = 1$ if a link exists between nodes $i$ and $j$. Furthermore, we denote $A^k = \prod_{i=1}^{k} A$, where $A^k_{ij} > 0$ if there exists a path of length $k$ between $i$ and $j$, and $A^k_{ij} = 0$ otherwise. We also assume that each node $v$ is associated with a target outcome $y_v \in \{0, 1\}$.

As an example, in the context of a peer review network, each node corresponds to a submitted paper, and links between papers are established if the two papers share the same authors or have authors who had previously collaborated with each other. The outcome $y_v$ of a given paper $v$ may indicate whether the paper is acceptable or unacceptable based on the average ratings provided by the reviewers. We assume there exists a decision function $h: V \to \{0, 1\}$ associated with each node in the network. Let $\mathcal{H}$ be the hypothesis space of all decision functions. Our goal is to learn a decision function $h \in \mathcal{H}$ that is consistent with the set of outcomes $Y$ while satisfying some fairness criterion. From the perspective of a peer review network, the decision function $h$ may refer to the final decision on whether to accept or reject the paper. The true positive and false positive rates of the binary decision function $h$ can be computed as follows:

• True positive rate: $\mathrm{TPR} = \frac{\sum_v y_v\, h(v)}{\sum_v y_v}$

• False positive rate: $\mathrm{FPR} = \frac{\sum_v (1 - y_v)\, h(v)}{\sum_v (1 - y_v)}$

Our goal is to determine how the final decisions are perceived by the individual nodes in the network. Do they feel that the decisions are biased toward nodes that belong to certain groups? To answer this question, we assume each node $v$ is associated with a fairness perception function, $f(v, h)$, given a decision function $h$. The function provides a local, albeit myopic, view of the individual fairness of the nodes in a network.

4.2.1 Axioms for Fairness Perception

We first outline the desirable properties of the fairness perception function, $f(v, h)$, using the following set of axioms.
We assume each node $v \in V$ is associated with the following tuple, $(v.X_p, v.X_u, y_v, N(v))$, where $v.X_p$ denotes the value of its protected attribute, $v.X_u$ denotes the values of its other (unprotected) attributes, $y_v$ denotes its target outcome, and $N(v)$ denotes its $\delta$-neighborhood, which is defined as follows:

$N(v) = \{u \mid \exists k \le \delta: A^k_{uv} > 0\}$   (4.1)

For brevity, we assume $\delta = 1$, unless stated otherwise. Let $G_r = (V_r, E_r, X_r)$ be the ego-network for node $r$, where $V_r = N(r)$ is the 1-neighborhood of $r$, $E_r = \{(i, j) \mid i, j \in N(r) \text{ and } (i, j) \in E\}$, and $X_r$ is the feature matrix associated with the attributes of the nodes in $V_r$. We present a set of axioms on the fairness perception function.

1. Locality axiom: If $h(v) = h'(v)$ and $\forall u \in N(v): h(u) = h'(u)$, where $h, h' \in \mathcal{H}$, then $f(v, h) = f(v, h')$.

2. Monotonicity axiom: If $h(v) < h'(v)$ and $\forall u \in N(v): h(u) = h'(u)$, where $h, h' \in \mathcal{H}$, then $f(v, h) \le f(v, h')$.

3. Neighborhood expectation axiom: If $h(v) = h'(v)$ and $\forall u \in N(v): h(u) \le h'(u)$, where $h, h' \in \mathcal{H}$, then $f(v, h) \ge f(v, h')$.

4. Homogeneity axiom: Let $G_u$ and $G_v$ be the induced sub-graphs of $V_u = N(u) \cup \{u\}$ and $V_v = N(v) \cup \{v\}$, respectively. If $G_u$ and $G_v$ are isomorphic with respect to the decision function $h$, then $f(u, h) = f(v, h)$.

For the last axiom, we say that a pair of networks, $G_r = (V_r, E_r, X_r)$ and $G_s = (V_s, E_s, X_s)$, are isomorphic with respect to the decision function $h$ if there exists a bijection $m: V_r \to V_s$ such that:

• $\forall u \in V_r: h(u) = h(m(u))$, $y_u = y_{m(u)}$, and $X_u = X_{m(u)}$.

• $\forall (u_1, u_2) \in E_r: (m(u_1), m(u_2)) \in E_s$.

The locality axiom states that the perception of fairness for an individual depends on the decision outcomes for the individual and the other individuals in its neighborhood. As long as the outcomes for the node and its neighborhood remain unchanged, the fairness perception function should remain the same. The monotonicity axiom states that the perception of fairness for an individual never decreases if the decision changes in favor of the individual (assuming the decisions for its neighbors remain unchanged). For example, if a previous decision on a paper is overturned (say, from reject to accept), then one should expect the fairness perception to improve (or at least stay the same). In contrast, the neighborhood expectation axiom states that if the number of neighbors with favorable decisions increases, then fairness perception decreases monotonically. This is because, if more individuals in our reference group receive favorable decisions, we expect the decision outcome to be favorable for us as well. The increased expectation makes it less likely for us to perceive the decision as fair if our own paper is rejected. The fourth axiom ensures consistency of the fairness perception function when applied to different nodes in the network. It states that if two disparate nodes with similar neighborhoods receive the same decision outcomes, their perceptions of fairness should be the same.

4.2.2 Proposed Network-Centric Fairness Perception

Definition 7 (Network-Centric Fairness Perception). Given a network $G = \langle V, E, X \rangle$ and a decision function $h$, the network-centric fairness perception function is defined as:

$f(v, h) = \begin{cases} 1 & \text{if } \mathbb{E}[h(v)] \le h(v) \\ 0 & \text{otherwise} \end{cases}$   (4.2)

where $\mathbb{E}[h(v)]$ is the expected value of $h(v)$, which must satisfy the following properties:

1. If $\forall u \in N(v): h(u) = h'(u)$, then $\mathbb{E}[h(v)] = \mathbb{E}[h'(v)]$.

2. If $\forall u \in N(v): h(u) \le h'(u)$, then $\mathbb{E}[h(v)] \le \mathbb{E}[h'(v)]$.

3. Let $G_u$ and $G_v$ be the induced sub-graphs based on the node sets $V_u = N(u) \cup \{u\}$ and $V_v = N(v) \cup \{v\}$, respectively. If $G_u$ and $G_v$ are isomorphic with respect to the decision function $h$, then $\mathbb{E}[h(v)] = \mathbb{E}[h(u)]$.
Our fairness perception function can thus be viewed as a local measure of individual fairness for any given node $v$ in a network. If the decision $h(v)$ is at least as favorable as expected, then $v$ will perceive the decision as fair. Furthermore, the expected value of the decision outcome, $\mathbb{E}[h(v)]$, depends on the neighborhood of the node $v$.

Theorem 1. The network-centric fairness perception function given in Eqn. (4.2) satisfies the locality, monotonicity, neighborhood expectation, and homogeneity axioms.

Proof: The locality and monotonicity axioms are proven using the first property. Since $\mathbb{E}[h(v)]$ remains unchanged when $h(u) = h'(u)$ for all the nodes $u$ in the neighborhood $N(v)$, Eqn. (4.2) implies that $f(v, h)$ depends only on $h(v)$. If $h(v) = h'(v)$, then $f(v, h) = f(v, h')$, thereby proving that the locality axiom holds. Similarly, if $h(v) < h'(v)$, then $f(v, h) \le f(v, h')$, which satisfies the monotonicity axiom. For the neighborhood expectation axiom, the second property states that $\mathbb{E}[h(v)] \le \mathbb{E}[h'(v)]$ when $h(u) \le h'(u)$ for all the nodes $u \in N(v)$. Since $\mathbb{E}[h'(v)]$ is larger, nodes that initially satisfy the inequality $\mathbb{E}[h(v)] \le h(v)$ may no longer do so, given that $h'(v) = h(v)$. Thus, $f(v, h) \ge f(v, h')$. Finally, we use the third property to prove the homogeneity axiom. Let $G_u$ and $G_v$ be the induced sub-graphs based on the node sets $V_u = N(u) \cup \{u\}$ and $V_v = N(v) \cup \{v\}$, respectively. Since $G_u$ and $G_v$ are isomorphic with respect to the decision function $h$, and $\mathbb{E}[h(u)] = \mathbb{E}[h(v)]$ holds due to the third property, it follows that $f(u, h) = f(v, h)$. □

We consider the following neighborhood peer expectation approach to compute $\mathbb{E}[h(v)]$:

$\mathbb{E}[h(v)] = \frac{y_v}{k_1}\Big[\sum_{u \in N(v)} y_u\, h(u)\Big] + \frac{1 - y_v}{k_0}\Big[\sum_{u \in N(v)} (1 - y_u)\, h(u)\Big]$   (4.3)

where $k_0 = \sum_{u \in N(v)} (1 - y_u)$, $k_1 = \sum_{u \in N(v)} y_u$, and $y_u \in \{0, 1\}$. Note that if the target outcome $y_v = 1$, then $\mathbb{E}[h(v)]$ depends only on the first term (i.e., on the other nodes $u$ in its neighborhood with $y_u = 1$). On the other hand, if $y_v = 0$, then $\mathbb{E}[h(v)]$ depends only on the second term (i.e., on the other nodes $u$ in its neighborhood with $y_u = 0$). Intuitively, the neighborhood peer expectation considers the average decision over all neighbors with the same target outcome. For example, if $y_u$ denotes whether paper $u$ is acceptable (based on its review ratings) and $h(u)$ is its acceptance decision, then the expected value of $h(u)$ depends on the average decision for other papers in its neighborhood (e.g., papers co-authored by one of the authors or their collaborators) with the same degree of acceptability. Since the expectation is a monotonically increasing function of $h(u)$ for its neighbors, it can be trivially shown that the neighborhood peer expectation satisfies the first two properties given in Definition 7. The third property holds since the bijection function guarantees that the $y$ and $h$ values for the nodes in the neighborhoods of $u$ and $v$ are the same. Thus, their expected values, $\mathbb{E}[h(u)]$ and $\mathbb{E}[h(v)]$, will also be the same.

4.3 Fairness Visibility

We now introduce fairness visibility, which extends the fairness perception function to a group fairness measure.

Definition 8 (Fairness Visibility). Let $V_c = \{u \mid u \in V,\ u.X_p = c\}$, i.e., the set of nodes belonging to the protected attribute group $c$.
The fairness visibility of $h$ for group $c$ is defined as follows:

$\mathcal{V}(V_c) = \frac{\sum_{v \in V_c} f(v, h)}{|V_c|}$   (4.4)

Note that the fairness visibility for a given group $c$ can be viewed as the average fairness perception of all the nodes that belong to the protected group $c$. For example, the group $c$ may refer to all the papers written by well-established authors in the peer review network. To determine whether the decision function $h$ is fair, we compare the fairness visibility for different groups of nodes using the definition below.

Definition 9 (Fairness Visibility Parity). The decision function $h$ satisfies fairness visibility parity for $V_c$ and $V_{c'}$ if

$\mathcal{V}(V_c) = \mathcal{V}(V_{c'})$   (4.5)

For example, in a peer review network, we may categorize the papers into two groups: those written by famous authors and those written by less established researchers. If the average fairness perceptions for both groups of papers are the same, then fairness visibility parity holds. The larger the disparity, the more biased the decisions are perceived to be by the groups.

A standard approach for measuring group fairness is to compute demographic parity, which is defined as follows:

Definition 10 (Demographic Parity). The decision function $h$ satisfies demographic parity for $V_c$ and $V_{c'}$ if

$P(h(v) = 1 \mid v \in V_c) = P(h(v) = 1 \mid v \in V_{c'})$   (4.6)

Unlike fairness visibility parity, demographic parity is computed as for non-relational data, since it ignores the neighborhood structure of a node. In the context of a peer review network, each probability term in Eqn. (4.6) corresponds to the acceptance rate of the papers that belong to group $c$ or $c'$. For brevity, we refer to $P(h(v) = 1 \mid v \in V_c)$ as the acceptance probability for the group $V_c$. The theorem below illustrates the relationship between fairness visibility and acceptance probability.

Theorem 2. Assuming the network graph is connected and the decision function $h$ has non-zero true positive and false positive rates, the fairness visibility of group $V_c$, based on the neighborhood peer expectation, converges to the acceptance probability for $V_c$ as the $\delta$-neighborhood size increases.

Proof: Given a node $v$, note that $N(v) \to V$ as the $\delta$-neighborhood expands, since the network graph is assumed to be connected. Furthermore, if the true positive and false positive rates for $h$ are non-zero, then eventually $\mathbb{E}[h(v)] > 0$ for all $v \in V$ as the $\delta$-neighborhood expands. It follows that $f(v, h) = 1$ if $h(v) = 1$ and $f(v, h) = 0$ if $h(v) = 0$. Thus $\mathcal{V}(V_c)$ converges to $P(h(v) = 1 \mid v \in V_c)$. □

Corollary 2.1. Given a connected network $G$, the decision function $h$ satisfies demographic parity if and only if there exists a positive integer $k$ such that, for all $\delta \ge k$, fairness visibility parity holds for $h$ with the given $\delta$-neighborhood.

Figure 4.2: The average rating distribution of submitted papers. The red line indicates the threshold used for classifying papers as acceptable ($y = 1$) or unacceptable ($y = 0$).

4.4 Application to Peer Review Networks

This section presents a case study on the application of our proposed approach to a peer review network dataset.

4.4.1 Data

We constructed a network from the peer review dataset collected for the ICLR 2020 conference from the OpenReview.net website. Specifically, for each submitted paper, we gathered information about its title, abstract, and list of authors and their affiliations. In addition, the anonymized reviews and acceptance decision for each reviewed paper are also available. For the ICLR 2020 conference, the number of submitted papers is 2594.
However, 382 of the submissions were withdrawn. Our analysis is therefore restricted to the 2212 papers that were reviewed. We use this information to create a network that contains 2212 nodes, one for each peer-reviewed paper. The total number of accepted papers, either as oral or poster presentations, is 687, while the number of rejected papers is 1525. Thus, the conference acceptance rate is around 31%. We use the acceptance decision of each paper as the decision function $h$ to evaluate fairness perception. We consider the acceptability of a paper, in terms of its average review ratings, as the target outcome $y$. Our assumption here is that the reviewers are rational-minded individuals, whose average ratings given to a paper reflect the technical merits and acceptability level of the paper. Figure 4.2 shows histograms of the average review ratings for the accepted and rejected papers. Given that the number of accepted papers is 687, we choose an acceptability threshold of 6, since it gives a number of acceptable papers that most closely matches the actual number of accepted papers. With this threshold, all papers whose average ratings are larger than 5 are considered acceptable, i.e., $y = 1$. Table 4.1(a) shows a confusion matrix comparing the acceptability of the papers ($y$) with their acceptance decisions ($h$).

Table 4.1: Summary distribution of acceptable and accepted papers for the ICLR 2020 conference.

(a) All papers:              y = 1    y = 0
    h = 1                     589       98
    h = 0                     117     1408
(b) Famous authors:          y = 1    y = 0
    h = 1                      94       13
    h = 0                      12      153
(c) Non-famous authors:      y = 1    y = 0
    h = 1                     495       85
    h = 0                     105     1255
(d) Top institutions:        y = 1    y = 0
    h = 1                     190       34
    h = 0                      21      328
(e) Non-top institutions:    y = 1    y = 0
    h = 1                     399       64
    h = 0                      96     1080

The total number of authors who submitted papers to the conference was 6953. We extracted authorship information for each paper, such as the names and email addresses of the co-authors, affiliation, gender, and scholar id, by preprocessing the user profile pages on the OpenReview website. Based on this information, we classified the submitted papers into groups according to the following "protected" attributes:

• Famous author papers: If a paper includes one or more famous authors, its protected attribute value is $X_p = 0$; otherwise $X_p = 1$. We consider the top 500 authors with the highest h-index according to Google Scholar¹ as famous authors. With this designation, 272 of the submissions were classified as famous author papers.

• Top institution papers: If a paper has an author from a top-10 university according to the csrankings.org website², then its protected attribute value is $X_p = 0$; otherwise $X_p = 1$. We found that 573 of the submitted papers have at least one author from a top institution.

A breakdown of the number of acceptable and accepted or rejected papers for each group is shown in Tables 4.1(b)-(e). The results given in these tables are consistent with previous research, which has suggested that conference paper acceptance decisions are generally biased in favor of famous authors or papers written by authors from top institutions [157, 162]. In particular, the results suggest that the chance for an acceptable paper to be accepted is significantly higher for papers written by famous authors (88.7%) or authors from a top institution (90.1%) compared to those written by non-famous authors (82.5%) or authors from lower-ranked institutions (80.6%).
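These percentages follow directly from the confusion matrices in Tables 4.1(b)-(e); a short Python sketch of the computation, with the counts copied from the table, is shown below.

```python
def rates(tp, fp, fn, tn):
    """True and false positive rates, where y is acceptability and h is the decision."""
    return tp / (tp + fn), fp / (fp + tn)

# (TP, FP, FN, TN) counts from Table 4.1(b)-(e)
groups = {
    "famous authors":       (94, 13, 12, 153),
    "non-famous authors":   (495, 85, 105, 1255),
    "top institutions":     (190, 34, 21, 328),
    "non-top institutions": (399, 64, 96, 1080),
}
for name, counts in groups.items():
    tpr, fpr = rates(*counts)
    print(f"{name:22s} TPR = {tpr:.3f}  FPR = {fpr:.3f}")
```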
Papers by famous or top institution authors also have a higher chance of getting their unacceptable papers accepted compared to those by non-famous authors or authors from lower-ranked institutions, as reflected by their higher false positive rates.

We use the co-authorship information extracted from the authors' profile pages on OpenReview.net to construct the links between the nodes in the network. We consider two papers to be linked if they share a common co-author or if their authors have collaborated in the past. Figures 4.3(a) and 4.3(b) show the degree distributions of the network based on the famous author and top institution protected attributes. The results suggest that papers by famous authors or authors from top institutions tend to have a higher degree (on average) and a heavier tail in their distribution compared to those written by non-famous authors or authors from lower-ranked institutions.

¹ https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:machine_learning
² http://csrankings.org/#/index?all

Figure 4.3: Degree distribution of nodes in the peer-review network for (a) the famous author and (b) the top institution protected attributes.

4.4.2 Fairness Perception

We applied the proposed fairness perception function to the network and evaluated the proportion of papers that perceived the paper acceptance decision to be fair or unfair. The results are shown in Figure 4.4(a) for the famous author protected attribute. Despite the fact that papers by famous authors are generally favored (i.e., have higher true positive and false positive rates), the bar chart shown in Figure 4.4(a) suggests that the majority of them still perceived the decision to be unfair. According to the fairness perception function, the main source of their discontent is the unacceptable papers that were rejected (i.e., the blue bar), which they believe should have been accepted. For papers by non-famous authors, Figure 4.4(a) tells the opposite story, as the majority of them perceived the paper acceptance decisions to be fair. Although a significantly large number of them still perceived the decision to reject their unacceptable papers as unfair (see the blue bar for perceived unfair), the non-famous author papers are more amenable to accepting the decision to reject their unacceptable papers (see the proportion of the blue bar for perceived fair).

The preceding results show a potential pitfall of using the fairness perception function (with neighborhood size $\delta = 1$).

Figure 4.4: Assessment of network-centric fairness perception for (a) the famous author and (b) the top institution protected attributes.

Figure 4.5: Comparison of $\mathbb{E}[h(v)]$ for rejected papers by famous and non-famous authors.

Although the analysis of the confusion matrices given in Table 4.1 suggests that the decision is biased in favor of papers written by famous authors or authors from top institutions, the non-famous authors or those from non-top-tier institutions still perceived the decisions to be fair! This can be explained as follows. Since our fairness perception function depends on the computation of $\mathbb{E}[h(v)]$, we examine the distribution of $\mathbb{E}[h(v)]$ for rejected papers by famous and non-famous authors. The results are shown in Figure 4.5. More than 40% of the rejected papers by non-famous authors have an expected value close to 0, compared to around 10% of the rejected papers by famous authors. Based on the definition given in Eqn. (4.2), the larger the proportion of papers with $\mathbb{E}[h(v)]$ close to zero, the more likely they are to perceive the decision as fair.
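For reference, the following is a minimal sketch of how $\mathbb{E}[h(v)]$ in Eq. (4.3) and the perception function in Eq. (4.2) could be evaluated for a single node; the handling of nodes that have no neighbors with the same target outcome is an assumption not specified above.

```python
def peer_expectation(h, y, neighbors, v):
    """Neighborhood peer expectation E[h(v)] of Eq. (4.3): the average decision
    among neighbors of v that share v's target outcome y_v."""
    peers = [u for u in neighbors[v] if y[u] == y[v]]
    if not peers:               # assumption: no comparable peers -> expectation 0
        return 0.0
    return sum(h[u] for u in peers) / len(peers)

def fairness_perception(h, y, neighbors, v):
    """f(v, h) of Eq. (4.2): v perceives the decision as fair (1) if its own
    decision is at least as favorable as the neighborhood expectation."""
    return 1 if peer_expectation(h, y, neighbors, v) <= h[v] else 0
```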
One possible explanation for famous authors having a smaller proportion of papers with $\mathbb{E}[h(v)]$ close to zero is the degree distribution of their nodes (see Figure 4.3). Since papers by famous authors generally have a higher degree, this increases the number of nodes in their neighborhood, which in turn results in a higher expected value according to the formula used to compute the neighborhood peer expectation. In contrast, many papers by non-famous authors have low-degree nodes, thus producing more nodes with low $\mathbb{E}[h(v)]$.

4.4.3 Fairness Visibility

In this section, we empirically evaluate the theoretical results for fairness visibility, which provide a possible solution to the potential pitfall of using our fairness perception function. For this experiment, we vary the $\delta$-neighborhood size from 1 to 5 and compute the corresponding fairness visibility measure with respect to the protected attributes. The results are plotted in Figure 4.6. For Figure 4.6(a), observe that the fairness visibility of papers by famous authors is initially lower than that of papers by non-famous authors when $\delta = 1$. This means that, on average, the papers by famous authors have lower perceived fairness. As $\delta$ increases, fairness visibility decreases for both groups of papers. However, the rate of decrease is higher for papers by non-famous authors. According to Theorem 2, under mild assumptions, fairness visibility converges to the acceptance probability of each subgroup of the protected attribute as $\delta$ increases. Since the acceptance probability for $X_p = 0$ (famous authors) is higher than that for $X_p = 1$ (non-famous authors), the fairness visibility for famous authors will be higher for larger values of the neighborhood size $\delta$. This provides a strategy to counter the potential pitfall of using fairness perception, namely expanding the neighborhood size $\delta$. Furthermore, it is worth noting that the peer review network is not a connected graph. As a result, the fairness visibility does not converge exactly to the acceptance probability of each group, which is 0.2989 (for non-famous authors) and 0.3933 (for famous authors), when $\delta$ is sufficiently large.

A similar observation can be made when analyzing the effect of increasing the neighborhood size on fairness visibility using top institution as the protected attribute. As shown in Figure 4.6(b), increasing $\delta$ leads to lower fairness visibility. However, with sufficiently large $\delta$, the fairness visibility for papers by authors from top institutions is higher than that for papers by authors from lower-ranked institutions. By setting $\delta = 2$, the fairness visibility provides a good assessment of the true bias of the paper acceptance decisions.

Figure 4.6: Effect of neighborhood size on fairness visibility for (a) the famous author and (b) the top institution protected attributes.

4.5 Conclusion

This paper presents a novel approach for algorithmic fairness in network data. Motivated by equity theory in social science, we introduced the concept of fairness perception as a local formulation of fairness and quantified this notion through an axiomatic approach to analyze its properties. We also showed how our proposed network-centric fairness perception function can be extended to a group fairness measure known as fairness visibility. We provided theoretical analysis to demonstrate its relationship to demographic parity. Using a peer-review network as a case study, we also examined its utility in terms of assessing the perception of fairness in paper acceptance decisions.
We also highlighted a potential pitfall of using the fairness visibility measure, as it can be exploited to mislead individuals into perceiving that the algorithmic decisions are fair. Finally, we showed how to alleviate the problem by increasing the local neighborhood size.

CHAPTER 5
MITIGATING PERCEPTION OF BIAS IN PEER-REVIEW DECISION MAKING

This chapter presents a novel recommendation algorithm that considers both the ratings provided by the reviewers and the co-authorship network. Specifically, the proposed algorithm balances the trade-off between maximizing review ratings and minimizing the perception of bias in the peer review decisions. Motivated by the work on fairness perception in the previous chapter, we provide theoretical bounds on the gap between the average ratings for different groups of individuals when fairness perception is maximized. Finally, we demonstrate the effectiveness of the proposed algorithm using open peer review data for a premier conference in machine learning.

5.1 Introduction

Peer review has an undeniable role in the academic advancement of scholars. It helps to assess whether the quality of a submitted work merits acceptance for conference or journal publication. It is also commonly used to evaluate grant proposals submitted for funding allocation. Despite its benefits, similar to other subjective approaches, the peer review system suffers from potential biases among the reviewers [87]. For example, previous studies have suggested that conference peer review systems tend to be biased in favor of papers written by prominent authors or authors from well-known institutions [162]. To mitigate this problem, researchers have begun investigating algorithm-driven approaches to improve different stages of the peer review process. These include learning to order the submissions for display during the reviewers' bidding step [56], matching submissions to appropriate reviewers during the reviewer assignment step [76], and calibrating the scores provided by reviewers who have varying degrees of leniency [170]. Despite these efforts, there are still concerns regarding the fairness of peer review decisions, especially in top-tier computer science conferences, as the number of submissions has grown significantly in recent years. The goal of this chapter is to develop a recommendation algorithm that identifies highly-rated papers for acceptance by learning a ranking function that takes into account the acceptability of the papers and the fairness of the decision outcomes. Here, the notion of fairness considers both the disparity in the decision outcomes of similarly qualified papers by different groups of authors and the authors' perception of fairness in the decisions. The perception of fairness is important in the peer review process because it affects the credibility of the conference or journal. In particular, if the decisions are viewed as favoring certain groups of privileged individuals at the expense of others, this may discourage other authors from submitting their work to the same venue in the future. Despite its importance, quantifying and optimizing perceived fairness remains a challenge, as most conference management systems do not provide an option for authors to give feedback regarding the fairness of the paper acceptance decisions. While some conferences provide a rebuttal phase to solicit authors' feedback, it is often used as a mechanism to clarify or respond to comments by reviewers.
This chapter builds on the fairness perception criterion developed in chapter four. However, the fairness perception measure alone is insufficient for two reasons. First, maximizing the measure does not provide guarantees of global fairness for all groups of individuals, since the criterion is defined based on local neighborhood information only. This limitation was noted in chapter four, where we showed that the measure may mislead individuals into perceiving the overall decisions to be fair even though the decisions are still biased towards certain groups of individuals. Second, the proposed fairness perception measure was designed for classification rather than recommendation problems. It requires access to ground truth labels to train a model to predict the decision outcome. For recommendation problems, this assumption is no longer valid; a new formulation is therefore needed to define a continuous-valued score for ranking the papers.

In this chapter, we present a fairness-aware recommendation algorithm for mitigating the perception of bias in peer reviews. Specifically, given an initial set of review ratings and a co-authorship network, our goal is to provide a ranking of the papers based on a combination of factors, including fairness perception and statistical parity, which is a measure of group-level fairness. We examine how to improve the recommendation of papers for acceptance by balancing the trade-off between maximizing fairness perception and minimizing statistical disparity between different groups of individuals. Using a graph-theoretic approach, we provide theoretical bounds on the gap in statistical parity of the decisions when fairness perception is maximized for all the groups, and we show that the bounds are potentially large, depending on the network structure and how the nodes belonging to the protected group are distributed in the network. A novel multi-objective optimization algorithm is therefore proposed to overcome this problem. Finally, we conducted extensive experiments on real-world data from a multi-year computer science conference to demonstrate the effectiveness of our proposed algorithm in terms of the accuracy, fairness perception, and statistical parity of its recommendations.

5.2 Related Work

Algorithmic fairness is an important topic that has attracted considerable interest in recent years [16, 49, 69, 97]. Fairness can be defined at the individual level as the absence of any prejudice or favoritism towards an individual with certain characteristics in a decision-making task [124]. In other words, similar people should be treated similarly [49]. For group fairness, the individuals are divided into disjoint groups on the basis of their protected attribute values [16]. Given the decisions made on individuals belonging to each group, a confusion matrix can be created from which fairness metrics are computed to statistically compare the groups. If the decisions are fair, then the metrics for different groups are expected to be equal or close to each other [49, 53, 69]. Examples of group fairness metrics include demographic parity or statistical parity [49], equalized odds [69], and calibration fairness [36], which have been used in previous studies [113, 80, 78, 50, 24] to ensure the independence between the decision outcome (or risk score) and the group membership of the protected attribute. However, recent studies have shown that existing metrics are not always compatible with each other [33, 58].
Kearns et al. [85] described another challenge known as gerrymandering, in which a classifier may appear to be fair with respect to each protected attribute separately, but not when fairness is defined jointly over multiple protected attributes. There have been several recent attempts to address these limitations. For example, Hebert-Johnson et al. [70] introduced a multi-calibration approach, which aims to guarantee that statistical fairness definitions hold for an exponential or infinite class of groups defined by some class of functions with bounded complexity.

In addition to defining fairness metrics, methods for de-biasing the decision outcomes of machine learning models have also been developed. As the input data itself is potentially biased, preprocessing methods have been developed to alleviate this problem [186, 113, 116]. For example, Louizos et al. [113] employed variational auto-encoders to learn a latent representation of the data that is independent of the sensitive attributes in the data. Similarly, Zemel et al. [186] presented an algorithm for learning a fair representation that preserves as much information as possible about the original data except for the protected group membership information. However, these approaches are mostly designed for independent and identically distributed (i.i.d.) data. To alleviate this problem, in chapter three we introduced a dyadic-level fairness criterion based on a modified network modularity measure and showed how it can be utilized for network link prediction tasks to overcome the filter bubble problem.

The concept of fairness has also been extensively investigated in other fields such as social science and criminal justice [143, 153]. These include studies that focus on understanding how individuals perceive the notion of fairness. For example, Saxena et al. [151] investigated public opinions on three fairness measures developed by computer science researchers and concluded that there is a preference for calibration fairness. Lee and Baykal [102] also investigated users' perception of algorithmic fairness; their results suggest that one-third of the participants felt the algorithmic decisions to be unfair due to their failure to account for multiple concepts of fairness.

Ensuring fairness in the peer review of scholarly research is important, as unfairness can have an adverse effect on the quality of the decisions as well as the reputation of the organization conducting the review process (e.g., journals, conferences, or funding agencies). Recent years have witnessed growing attempts to improve the different stages of the peer review process. For example, Charlin and Zemel [26] proposed an approach for automated reviewer assignment of conference papers known as the Toronto paper matching system. Fiez et al. [56] argued that the ordering of conference papers presented to reviewers for bidding has a significant impact on the number of bids received for each paper, which in turn affects the quality of their reviews. Other similar works focusing on improving the reviewer assignment process include [158, 93]. A long-standing debate regarding the peer review process is whether to reveal the identities of the authors (single-blind review) and reviewers (double-blind review). Tomkins et al. [162] conducted an experimental study by partitioning the entire pool of reviewers for a conference into single-blind and double-blind groups. They concluded that the single-blind system tends to be biased in favor of papers authored by famous authors or those from top institutions.
However, a subsequent study by Stelmakh et al. [156] showed the limitations of the test procedures used in [162] and designed an approach to overcome these limitations. Finally, calibration of the reviewers' scores is another research direction that has been pursued recently [170] to account for the implicit biases in the ratings, as some reviewers are more lenient or stringent than others.

5.3 Preliminaries

This section formalizes our problem statement and reviews some of the available fairness measures.

5.3.1 Problem Statement

Let G = ⟨V, E, X⟩ be an attributed network, where V is the set of nodes, E ⊆ V × V is the set of links, and X is the set of node attributes. For paper acceptance recommendation, we assume each node is a paper, each link represents a co-authorship relation between two papers, and the node attributes X = (X^(p), X^(u)) are a combination of the protected attributes, X^(p), and the other attributes, X^(u). The protected attribute indicates whether the paper was written by well-known researchers (famous authors) or by authors from highly-ranked universities or research labs (top institutions). If so, then X^(p) = 0; otherwise X^(p) = 1. Our rationale for using these criteria as protected attributes is based on the results from chapter four and [162], which suggest that peer review decisions tend to favor famous authors or those from well-known institutions. The attributes in X^(u) include the review ratings and other assessment criteria associated with each paper. We assume there exists an evaluation function φ: X_v^(u) → R such that s_v = φ(X_v^(u)) represents the acceptability of paper v. For example, s_v may correspond to the average reviewer rating (or a weighted average rating based on the reviewers' confidence) of a paper. In addition, we also consider a kernel function K: X^(u) × X^(u) → R_+, which measures the similarity between a given pair of papers. In this chapter, we use the following Gaussian radial basis function as our kernel function:

K(u, v) = \exp\left( -\frac{\|u - v\|^2}{2\sigma^2} \right),

where σ is a hyperparameter.

The co-authorship relation is established by examining whether two papers share the same authors or have authors who have previously collaborated. The links are represented by an adjacency matrix A, where A_{ij} = 1 if papers i and j have a co-authorship relation and 0 otherwise.

Definition 11 (Peer-review recommendation). Given a co-authorship network G = ⟨V, E, X⟩, the peer-review recommendation task is to learn a ranking function h: V → [0, 1] based on the link structure E and the node attributes X = (X^(p), X^(u)).

The output of the ranking function can assist decision makers (i.e., conference program chairs) by providing an initial list of the top-k highest-ranked papers recommended for acceptance. Our goal is to provide accurate recommendations that minimize implicit biases in the peer review decisions while maximizing the fairness perception of the authors. While the discussion presented in this chapter focuses on peer review recommendation, in principle, the proposed approach is applicable to other network-based recommendation systems [114, 30].
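To make the similarity computation concrete, the following minimal Python sketch evaluates the Gaussian kernel for a pair of papers. It assumes the unprotected attributes X^(u) are available as numeric vectors (here, hypothetical review ratings); the function and variable names, as well as the choice of σ, are illustrative only.

```python
import numpy as np

def rbf_kernel(x_u, x_v, sigma=1.0):
    """Gaussian RBF similarity K(u, v) = exp(-||x_u - x_v||^2 / (2 * sigma^2))."""
    diff = np.asarray(x_u, dtype=float) - np.asarray(x_v, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# Toy example: hypothetical review-rating vectors for two papers.
x_paper_a = [6.0, 7.0, 5.0]
x_paper_b = [6.0, 6.0, 5.0]
print(rbf_kernel(x_paper_a, x_paper_b, sigma=2.0))
```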
5.3.2 Fairness Measures

Quantifying the notion of fairness is one of the pillars of ethical machine learning. A widely used criterion for assessing fairness is demographic parity or statistical parity (SP) [49]. Demographic parity constrains the output of the classification problem to be independent of the protected attribute. In particular, a predictor satisfies demographic parity if there is no correlation between the protected feature and the output. In other words, demographic parity seeks to equalize the positive decision rate across the different groups of the protected feature. Assuming both the target outcome Ŷ and the protected attribute are binary-valued, demographic parity can be formalized as follows:

P(Ŷ = 1 | X^(p) = 0) = P(Ŷ = 1 | X^(p) = 1).    (5.1)

Another well-known fairness criterion is equalized odds [69], which seeks to ensure that the predictions are conditionally independent of the protected attribute given the true outcome. Suppose Y = 1 is the advantaged outcome, such as a job offer or college admission. A classifier Ŷ satisfies equality of odds with respect to the protected attribute X^(p) and binary outcome Y if both conditions below hold:

P(Ŷ = 1 | X^(p) = 0, Y = 1) = P(Ŷ = 1 | X^(p) = 1, Y = 1),
P(Ŷ = 1 | X^(p) = 0, Y = 0) = P(Ŷ = 1 | X^(p) = 1, Y = 0).

The former equation, also known as equal opportunity [69], refers to equality of the true positive rates for the two groups, whereas the latter refers to equality of the false positive rates.

5.4 Methodology

This section presents our approach for mitigating bias in peer review recommendation. We first introduce the proposed neighborhood expectation function for recommendation problems and then present a simple algorithm that maximizes fairness perception. We examine its impact on the acceptance rates for different groups and theoretically show that the gap in acceptance rates can be potentially large, which motivates us to develop a multi-objective optimization algorithm that balances the two criteria.

5.4.1 Proposed Expectation Function

Instead of using Equation (4.3), which requires the availability of the ground truth label y, we consider the following approach to compute the neighborhood-based expectation E[h(v)] for our recommendation algorithm:

E[h(v)] = \frac{\sum_{u \in N(v)} h(u)\, K(v, u)}{\sum_{u \in N(v)} K(v, u)}.    (5.2)

Intuitively, the expected value of h is computed as the weighted average value of the neighboring nodes in the network, taking into account the node similarity values. Unlike chapter four, we consider h to be a continuous-valued ranking function instead of a binary decision function.

Lemma 3. The proposed expected value in Equation (5.2) is a well-behaved function satisfying properties P1-P3.

Proof. For the first property, since ∀u ∈ N(v): h(u) = h′(u), it follows that E[h(v)] = E[h′(v)], because K is independent of the ranking function h in Equation (5.2). The second property holds given that the co-domain of the kernel function is non-negative. Assuming ∀u ∈ N(v): h(u) ≤ h′(u), we have

\sum_{u \in N(v)} h(u)\, K(v, u) \le \sum_{u \in N(v)} h'(u)\, K(v, u).

The proof of property P2 follows since the denominator of the expected value in Equation (5.2) is the same for h and h′. The third property also holds based on the premise that the induced subgraphs G_u and G_v are isomorphic with respect to h. This guarantees that the h and K values for all the nodes in the neighborhoods of u and v are the same. As a consequence, E[h(u)] = E[h(v)], which completes the proof. □
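The following is a minimal Python sketch of the neighborhood expectation in Equation (5.2). It assumes the co-authorship neighborhoods and the kernel are given; returning h(v) itself for a node with no neighbors is a convention adopted here only for illustration.

```python
import numpy as np

def neighborhood_expectation(h, neighbors, kernel, v):
    """E[h(v)] from Eq. (5.2): kernel-weighted average of h over N(v)."""
    nbrs = neighbors[v]
    if not nbrs:                       # isolated node: no peers to compare against
        return h[v]
    weights = np.array([kernel(v, u) for u in nbrs], dtype=float)
    values = np.array([h[u] for u in nbrs], dtype=float)
    return float(weights @ values / weights.sum())

# Toy example with three papers; h holds current rank scores in [0, 1].
h = {0: 0.9, 1: 0.4, 2: 0.6}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
kernel = lambda v, u: 1.0              # uniform similarity for this illustration
print(neighborhood_expectation(h, neighbors, kernel, 0))   # (0.4 + 0.6) / 2 = 0.5
```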
5.4.2 Maximizing Fairness Perception

Let C = {C_1, . . . , C_k} be the set of all connected components in G. The theorem below presents a simple solution for the ranking function h that maximizes fairness perception.

Theorem 4. Given a ranking function h and a network G = ⟨V, E, X⟩, the function achieves its maximum fairness perception, i.e., ∀v ∈ V: f(v, h) = 1, if and only if ∀C ∈ C: h(v) = h(v′) for all v, v′ ∈ C.

Proof. If h(v) is identical for all the nodes v in the same connected component, then according to Equation (5.2), it is easy to see that ∀v: E[h(v)] = h(v). As a result, the fairness perception function f(v, h) = 1 for all v ∈ C. Since this property holds for all C ∈ C, it follows that ∀v ∈ V: f(v, h) = 1.

Next, we show that if ∀v ∈ V: f(v, h) = 1, then all the nodes v in the same connected component must have the same value of h(v). This is trivially satisfied if the connected component has only one node. Thus, we consider the case of connected components with at least two nodes. By contradiction, assume that fairness perception is maximized and there exists a connected component C ∈ C containing nodes with different rank values h. Let C_min = {u ∈ C | h(u) = m}, where m = min{h(u) | u ∈ C}. Since the rank values are not uniform, there must exist a node u_m ∈ C_min connected to another node v ∈ C \ C_min with h(u_m) < h(v). If there were no such u_m, then C_min would not be connected to C \ C_min, which cannot be true. Thus, ∀v ∈ N(u_m): h(u_m) ≤ h(v), and there exists a neighboring node v ∈ N(u_m) such that h(v) > h(u_m). Using the expected value function defined in Equation (5.2), we have

E[h(u_m)] = \frac{\sum_{v \in N(u_m)} h(v)\, K(u_m, v)}{\sum_{v \in N(u_m)} K(u_m, v)} > \frac{\sum_{v \in N(u_m)} h(u_m)\, K(u_m, v)}{\sum_{v \in N(u_m)} K(u_m, v)} = h(u_m).

Since h(u_m) < E[h(u_m)], we have f(u_m, h) = 0, which contradicts the assertion that fairness perception is maximized. Thus, the original assumption that the connected component has nodes with different rank values must be wrong. □

Theorem 4 suggests that a trivial solution for maximizing fairness perception is to assign the same rank value h to every node in a connected component. However, maximizing fairness perception alone is insufficient for several reasons. First, it does not guarantee good performance, since all the nodes (papers) in the same connected component have identical rank values irrespective of their acceptability scores s. Second, since fairness perception is an individual-level metric based on local neighborhood features rather than global features, it does not guarantee that similarly qualified papers by different groups of authors will be treated equally.

One way to measure equity across the different groups is to apply a group-level fairness criterion such as statistical parity (see Equation (5.1)), which was originally introduced for classification problems. Here, we propose the following weighted statistical disparity (WSD) metric to determine the disparity in the average rank scores h of similarly qualified papers for the different groups of authors. Specifically, we first partition the acceptability scores of the papers into a set of L ordinal values, ŝ_1 < ŝ_2 < · · · < ŝ_L. For example, if the average rating of a paper v ranges between 0 and 10, then ŝ_i may correspond to the rounded integer value of s_v, r(s_v) = ŝ_i. Let B_i^(j) = {v ∈ V | r(s_v) = ŝ_i, X_v^(p) = j} denote the set of papers with similar acceptability score ŝ_i, and let h̄_i^(j) = (1/|B_i^(j)|) Σ_{v∈B_i^(j)} h(v) denote the average rank h of such papers from protected group X^(p) = j. The following equation is used to measure the weighted statistical disparity across the different groups:

\mathrm{WSD}(h) = \frac{1}{L} \sum_{i=1}^{L} \hat{s}_i \left( \bar{h}_i^{(0)} - \bar{h}_i^{(1)} \right)^2.    (5.3)

Note that WSD(h) = 0 if the average ranks of similarly qualified papers are identical for the different groups of the protected attribute. The measure also emphasizes fairness in the ranking of papers with higher acceptability scores, since such papers are the ones more likely to be recommended for acceptance.
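The sketch below computes the WSD measure in Equation (5.3) for a set of papers. It assumes the normalization by the number of score levels L shown above, and it simply skips any level for which one of the two groups is empty; all names and the toy data are illustrative.

```python
import numpy as np

def weighted_statistical_disparity(scores, ranks, groups):
    """WSD from Eq. (5.3): for each rounded score level s_i, compare the mean
    rank h of the two protected groups and weight the squared gap by s_i."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.asarray(ranks, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(np.rint(scores))
    total = 0.0
    for s_hat in levels:
        in_level = np.rint(scores) == s_hat
        h0 = ranks[in_level & (groups == 0)]
        h1 = ranks[in_level & (groups == 1)]
        if len(h0) == 0 or len(h1) == 0:      # skip levels missing one group
            continue
        total += s_hat * (h0.mean() - h1.mean()) ** 2
    return total / len(levels)

# Toy example: average ratings, rank scores h, and protected group labels.
print(weighted_statistical_disparity([7.2, 6.8, 7.1, 6.9],
                                     [0.9, 0.5, 0.8, 0.6],
                                     [0, 1, 0, 1]))
```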
The theorem below illustrates the relationship between fairness perception and the weighted statistical disparity measure.

Theorem 5. Given a ranking function h that maximizes fairness perception, i.e., ∀v ∈ V: f(v, h) = 1, its weighted statistical disparity satisfies the following inequality:

\kappa \cdot \max\{\delta^*, 0\} \;\le\; \mathrm{WSD}(h) \;\le\; \kappa \cdot \Delta^*,

where κ := (1/L) Σ_{i=1}^{L} ŝ_i,

\delta^* = \mathrm{sign}\!\left( (h_{\min}^{(0)} - h_{\max}^{(1)})(h_{\max}^{(0)} - h_{\min}^{(1)}) \right) \times \min\left\{ (h_{\min}^{(0)} - h_{\max}^{(1)})^2,\; (h_{\max}^{(0)} - h_{\min}^{(1)})^2 \right\},

\Delta^* = \max\left\{ (h_{\min}^{(0)} - h_{\max}^{(1)})^2,\; (h_{\max}^{(0)} - h_{\min}^{(1)})^2 \right\}.

Here, we have denoted h_min^(j) = min{h(v) | X_v^(p) = j} and h_max^(j) = max{h(v) | X_v^(p) = j}.

Proof. First, note that h_min^(j) ≤ h̄_i^(j) ≤ h_max^(j), since

\bar{h}_i^{(j)} = \frac{1}{|B_i^{(j)}|} \sum_{v \in B_i^{(j)}} h(v) \le \frac{1}{|B_i^{(j)}|} \sum_{v \in B_i^{(j)}} h_{\max}^{(j)} = h_{\max}^{(j)},
\qquad
\bar{h}_i^{(j)} = \frac{1}{|B_i^{(j)}|} \sum_{v \in B_i^{(j)}} h(v) \ge \frac{1}{|B_i^{(j)}|} \sum_{v \in B_i^{(j)}} h_{\min}^{(j)} = h_{\min}^{(j)}.

Using these inequalities, it can easily be shown that

h_{\min}^{(0)} - h_{\max}^{(1)} \le \bar{h}_i^{(0)} - \bar{h}_i^{(1)} \le h_{\max}^{(0)} - h_{\min}^{(1)}

and

\left( \bar{h}_i^{(0)} - \bar{h}_i^{(1)} \right)^2 \le \max\left\{ (h_{\min}^{(0)} - h_{\max}^{(1)})^2,\; (h_{\max}^{(0)} - h_{\min}^{(1)})^2 \right\}.

Substituting the above inequality into Equation (5.3) and using the definition of Δ*, we obtain the following upper bound:

\mathrm{WSD}(h) \le \frac{1}{L} \sum_{i=1}^{L} \hat{s}_i \max\left\{ (h_{\min}^{(0)} - h_{\max}^{(1)})^2,\; (h_{\max}^{(0)} - h_{\min}^{(1)})^2 \right\} = \kappa \cdot \Delta^*.

To obtain the lower bound for WSD(h), we use the following:

\left( \bar{h}_i^{(0)} - \bar{h}_i^{(1)} \right)^2 \ge \mathrm{sign}\!\left( (h_{\min}^{(0)} - h_{\max}^{(1)})(h_{\max}^{(0)} - h_{\min}^{(1)}) \right) \times \min\left\{ (h_{\min}^{(0)} - h_{\max}^{(1)})^2,\; (h_{\max}^{(0)} - h_{\min}^{(1)})^2 \right\} = \delta^*.

If (h_min^(0) − h_max^(1)) and (h_max^(0) − h_min^(1)) have opposite signs, then δ* is negative. Since (h̄_i^(0) − h̄_i^(1))² is non-negative, we can obtain a tighter bound as follows:

\left( \bar{h}_i^{(0)} - \bar{h}_i^{(1)} \right)^2 \ge \max\{\delta^*, 0\}.    (5.4)

Substituting the above inequality into Equation (5.3) yields

\mathrm{WSD}(h) \ge \frac{1}{L} \sum_{i=1}^{L} \hat{s}_i \max\{\delta^*, 0\} = \kappa \cdot \max\{\delta^*, 0\},

which completes the proof. □

The preceding theorem shows that maximizing fairness perception does not guarantee parity in the recommended decisions for different groups. The latter also depends on the distribution of the protected attribute values across the network, which is beyond the control of the decision makers. Thus, it would be desirable to find a compromise that considers the trade-off between maximizing fairness perception and minimizing the weighted statistical disparity. In the next section, we describe our algorithm to achieve this balance.

5.4.3 FPRank: Fair Peer Review Recommendation

This section presents FPRank, our proposed recommendation algorithm, which learns a ranking function h that considers the acceptability scores of the papers and the trade-off between maximizing fairness perception and minimizing the weighted statistical disparity measure, WSD. First, we rewrite the expected value given in Equation (5.2) as E[h(v_i)] = Σ_j W_{ij} h(v_j), where W = D^{-1}(A ∘ K) and ∘ denotes the Hadamard product between the adjacency matrix A and the kernel matrix K. D is a diagonal matrix, in which D_{ii} = Σ_j [A ∘ K]_{ij}. A trivial approach to learn a ranking function that maximizes fairness perception is to solve h = Wh, or equivalently, (W − I)h = 0. The solution is obtained by finding the null space of the matrix W − I, which admits the null vector h = 0 as its trivial solution.

In addition to maximizing fairness perception, the ranking function h should also match the acceptability of the papers as given by s. In other words, the rank of a paper should be high if its acceptability score is high. Thus, one of our objectives is to minimize the discrepancy between s and h. Furthermore, the ranking function h must also be trained to minimize the weighted statistical disparity, WSD(h), to ensure that it is not biased against certain groups of nodes. Putting everything together, the objective function to be solved by our framework is as follows:

\operatorname*{argmin}_{h}\; \|s - h\|^2 + \alpha\, \mathrm{WSD}(h) + \beta \|z\|^2    (5.5)

\text{s.t. } \forall i: \sum_{j} W_{ij} h_j \le h_i + z_i, \quad h_i \ge 0, \quad z_i \ge 0,

where z is a slack variable that relaxes the constraint requiring the fairness perception to equal 1 for all the nodes. The hyperparameters α and β control the trade-off between matching h to the paper acceptability scores, ensuring group parity, and minimizing the number of constraint violations. During training, our algorithm finds the values of α and β that minimize the power mean of the three terms in the objective function. (The power mean of x_1, …, x_n is M_p(x_1, …, x_n) = ((1/n) Σ_{i=1}^{n} x_i^p)^{1/p}, where p is a hyperparameter.) The objective function can be solved using a standard quadratic programming package such as CVX [46, 6].
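A sketch of this optimization using CVXPY (one of several quadratic programming packages; the text above mentions CVX) is given below. It assumes the WSD term is expressed per rounded score level as in Equation (5.3); the function name fprank, the toy similarity matrix, and the data are illustrative, not the reference implementation.

```python
import cvxpy as cp
import numpy as np

def fprank(s, W, groups, alpha=1.0, beta=1.0):
    """Sketch of the objective in Eq. (5.5): match acceptability scores s,
    penalize weighted statistical disparity, and relax the perception
    constraint W h <= h + z with non-negative slacks z."""
    n = len(s)
    h = cp.Variable(n, nonneg=True)
    z = cp.Variable(n, nonneg=True)

    # WSD term: for each rounded score level, squared difference of group means.
    levels = np.unique(np.rint(s))
    wsd_terms = []
    for s_hat in levels:
        idx0 = (np.rint(s) == s_hat) & (groups == 0)
        idx1 = (np.rint(s) == s_hat) & (groups == 1)
        if idx0.sum() == 0 or idx1.sum() == 0:
            continue
        a = idx0 / idx0.sum() - idx1 / idx1.sum()   # difference of means as a linear map
        wsd_terms.append(s_hat * cp.square(a @ h))
    wsd = sum(wsd_terms) / len(levels) if wsd_terms else 0.0

    objective = cp.Minimize(cp.sum_squares(s - h) + alpha * wsd
                            + beta * cp.sum_squares(z))
    constraints = [W @ h <= h + z]
    cp.Problem(objective, constraints).solve()
    return h.value

# Toy run: 4 papers, a tiny row-normalized similarity matrix W, and group labels.
s = np.array([0.9, 0.6, 0.8, 0.5])
W = np.array([[0, .5, .5, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
print(fprank(s, W, groups=np.array([0, 1, 0, 1])))
```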
Table 5.1: Summary statistics of submitted and accepted papers for the ICLR conference from 2017 to 2020.
Year | # papers | # Accepted papers | % Top institution | % Famous authors
2017 | 488 | 243 | 25% | 22%
2018 | 402 | 229 | 33% | 17%
2019 | 1419 | 502 | 27% | 15%
2020 | 2212 | 687 | 26% | 12%

5.5 Experimental Evaluation

This section describes the experiments performed to investigate the effectiveness of our fairness-aware recommendation algorithm.

5.5.1 Experimental Setup

5.5.1.1 Data

We consider multi-year peer review data from a major computer science conference (ICLR) for our experiments. The dataset was scraped from the OpenReview.net website, which provides an API to access the list of submitted papers as well as their meta-data, including reviews, comments, and final decisions. Table 5.1 shows the summary statistics of the submitted and accepted papers from 2017 to 2020. We extracted the authorship information for each paper, including the author names, affiliations, and scholar IDs. We also identified the email addresses of their co-authors by preprocessing the authors' profile pages on the OpenReview website. Similar to the approach used in the previous chapter, we categorized the submitted papers into different groups based on the values of two protected attributes: famous author papers and top institution papers. We used Google Scholar to identify famous authors by examining the top-500 researchers with the highest h-index scores.² Thus, if a paper includes one or more famous authors, then X_author^(p) = 0; otherwise X_author^(p) = 1. To identify the top institutions, we used the ranking provided by csrankings.org.³ If a paper has an author from a top-10 highest-ranked university, then X_inst^(p) = 0; otherwise X_inst^(p) = 1. The co-authorship relation was established by combining the co-authorship information provided on the user profile pages of the OpenReview website with the co-authorship information gathered from the DBLP website.⁴

² https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:machine_learning
³ http://csrankings.org/#/index?all
⁴ https://dblp.uni-trier.de/xml/
We consider the acceptability score s_v of a paper in terms of its average review ratings.

5.5.1.2 Baseline Algorithms

We compare our proposed FPRank method against the following baselines: (1) Calibrated equalized odds post-processing, CEP [145], which is a method for calibrating the classifier output scores to ensure fairness in terms of the equalized odds measure; we use a variation of the method that ensures equality of false positive rates, CEP (FPR). (2) Reject option classification, ROC [79], which is a post-processing method that guarantees fairness in terms of statistical parity, ROC (SPD), or equal opportunity, ROC (EOD). (3) FairTop-k [185], which is a fairness-aware ranking algorithm that identifies a subset of k candidates from a large pool of candidates by selecting the best candidates based on a ranked group fairness criterion. We use the implementation provided by the AI Fairness 360 software⁵ for the CEP and ROC baselines. We also consider a variation of our method, Max Perception, which maximizes only fairness perception and consistency with the acceptability scores s without considering statistical parity (i.e., Equation (5.5) without the WSD(h) term and the slack variables).

⁵ https://aif360.mybluemix.net

5.5.1.3 Evaluation Metrics

We consider the conference decision as the ground truth outcome and calculate the widely used average precision measure of each algorithm based on its recommended papers, i.e., (1/m) Σ_k P@k, where m is the number of accepted papers and P@k is the precision based on the top-k recommended papers. We also compute the overall fairness perception by taking the average fairness perception of the papers, i.e., (1/|V|) Σ_{v∈V} f(v, h). This metric ranges between 0 and 1, with larger values suggesting a higher perception of fairness in the decision. Finally, we evaluate the weighted statistical disparity of each method using Equation (5.3). To compute the metric, the acceptability scores s_i were discretized into ŝ_i by rounding the average review ratings to the nearest integer value.
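As an illustration of how these two metrics can be computed from a ranking, the following minimal Python sketch implements the average precision formula above (summing P@k at the ranks where accepted papers appear, following the standard average-precision convention) and the mean fairness perception; the toy data are hypothetical.

```python
import numpy as np

def average_precision(ranking, accepted):
    """(1/m) * sum of P@k over the ranks k at which accepted papers appear,
    where m is the number of accepted papers."""
    accepted = set(accepted)
    hits, precisions = 0, []
    for k, paper in enumerate(ranking, start=1):
        if paper in accepted:
            hits += 1
            precisions.append(hits / k)      # P@k evaluated at each accepted paper
    return float(np.mean(precisions)) if precisions else 0.0

def mean_fairness_perception(perceptions):
    """(1/|V|) * sum of f(v, h): share of papers perceiving the decision as fair."""
    return float(np.mean(perceptions))

# Toy example: papers ranked by h, ground-truth accepted set, binary perceptions.
print(average_precision(ranking=[3, 1, 4, 2], accepted=[3, 2]))   # 0.75
print(mean_fairness_perception([1, 0, 1, 1]))                      # 0.75
```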
5.5.2 Experimental Results

The first goal of our experiments is to compare the rankings generated by the different methods against the conference decisions. Table 5.2 summarizes the average precision of FPRank and the baseline methods using top institution as the protected attribute, while Table 5.3 shows the corresponding results using famous author as the protected attribute. The results in both tables suggest that FPRank outperforms all the baseline methods in 3 out of 4 years. Although its precision is lower than FairTop-k in 2019, it outperforms FairTop-k in terms of average precision and the other fairness metrics described below in all other years. Observe also that the Max Perception baseline has a lower average precision than FPRank, which suggests that maximizing fairness perception does not guarantee high-quality recommendations.

Table 5.2: Average precision of various methods compared to the conference decisions (using top institutions as protected attribute).
Method | 2017 | 2018 | 2019 | 2020
FPRank | 0.9246 | 0.8333 | 0.7956 | 0.9093
Max Perception | 0.6901 | 0.6979 | 0.4601 | 0.3063
ROC (SPD) | 0.7687 | 0.7789 | 0.7830 | 0.7381
ROC (EOD) | 0.7769 | 0.7794 | 0.7797 | 0.7611
CEP (FPR) | 0.5606 | 0.6934 | 0.5074 | 0.6705
FairTop-k | 0.9004 | 0.8288 | 0.8810 | 0.8464

Table 5.3: Average precision of the various methods compared to the conference decisions (using famous authors as protected attribute).
Method | 2017 | 2018 | 2019 | 2020
FPRank | 0.9262 | 0.8356 | 0.7956 | 0.9085
Max Perception | 0.6901 | 0.6979 | 0.4601 | 0.3063
ROC (SPD) | 0.7738 | 0.7575 | 0.7684 | 0.7467
ROC (EOD) | 0.7769 | 0.7794 | 0.7797 | 0.7611
CEP (FPR) | 0.5641 | 0.6976 | 0.5074 | 0.6705
FairTop-k | 0.8971 | 0.8263 | 0.8790 | 0.8487

Next, we compare the average fairness perception of the different methods. Table 5.4 summarizes the results for top institution as the protected attribute. Except for 2020, Max Perception has the highest fairness perception, which is not surprising since the algorithm explicitly tries to maximize the measure. FPRank has the second highest fairness perception values and consistently outperforms the other baseline methods, including FairTop-k. Similar conclusions can be reached for the results shown in Table 5.5 for famous authors as the protected attribute. For 2020, it appears that the original conference decision has the highest fairness perception, though it is quite close to the other methods and is lower than in other years.

Table 5.4: Comparison of average fairness perception (using top institutions as protected attribute).
Method | 2017 | 2018 | 2019 | 2020
FPRank | 0.8709 | 0.8682 | 0.7463 | 0.5023
Max Perception | 0.9918 | 0.9577 | 0.8118 | 0.5054
ROC (SPD) | 0.7766 | 0.8159 | 0.6758 | 0.5027
ROC (EOD) | 0.8340 | 0.8159 | 0.6892 | 0.4973
CEP (FPR) | 0.7889 | 0.8358 | 0.6723 | 0.4991
FairTop-k | 0.8340 | 0.8184 | 0.6794 | 0.4986
Conference Decision | 0.8135 | 0.8308 | 0.6963 | 0.5077

Table 5.5: Comparison of average fairness perception (using famous authors as protected attribute).
Method | 2017 | 2018 | 2019 | 2020
FPRank | 0.8709 | 0.8607 | 0.7449 | 0.5023
Max Perception | 0.9918 | 0.9577 | 0.8118 | 0.5054
ROC (SPD) | 0.8094 | 0.8159 | 0.6737 | 0.4977
ROC (EOD) | 0.8340 | 0.8159 | 0.6892 | 0.4973
CEP (FPR) | 0.8197 | 0.8308 | 0.6723 | 0.4991
FairTop-k | 0.8320 | 0.8134 | 0.6758 | 0.4973
Conference Decision | 0.8135 | 0.8308 | 0.6963 | 0.5077

Finally, we compare the performance of the various methods in terms of their group fairness metric, the weighted statistical disparity. Tables 5.6 and 5.7 summarize the results using top institution and famous author, respectively, as the protected attribute. Observe that FPRank consistently achieves weighted statistical disparity values close to 0 in all years, which suggests that its recommendations are not biased against either of the protected groups. These results support our claim that FPRank can achieve high average precision in its recommendations while balancing the trade-off between maximizing fairness perception and minimizing the weighted statistical disparity between the different groups.

Table 5.6: Comparison of weighted statistical disparity (using top institutions as protected attribute).
Method | 2017 | 2018 | 2019 | 2020
FPRank | 0.0657 | 0.0049 | 0.0040 | 0.0000
Max Perception | 0.3707 | 0.1831 | 0.0165 | 0.0003
ROC (SPD) | 0.3619 | 0.1649 | 0.0065 | 0.0101
ROC (EOD) | 0.3466 | 0.1637 | 0.0009 | 0.0000
CEP (FPR) | 0.6164 | 0.1637 | 0.0000 | 0.0001
FairTop-k | 0.3459 | 0.1644 | 0.0040 | 0.0001
Conference Decision | 0.3464 | 0.0110 | 0.0012 | 0.0023

Table 5.7: Comparison of weighted statistical disparity (using famous authors as protected attribute).
Method | 2017 | 2018 | 2019 | 2020
FPRank | 0.0709 | 0.0391 | 0.0044 | 0.0000
Max Perception | 0.2835 | 0.1798 | 0.0557 | 0.0004
ROC (SPD) | 0.2375 | 0.1753 | 0.0102 | 0.0103
ROC (EOD) | 0.1825 | 0.1638 | 0.0002 | 0.0000
CEP (FPR) | 0.7653 | 0.1637 | 0.0002 | 0.0001
FairTop-k | 0.1834 | 0.1639 | 0.0038 | 0.0001
Conference Decision | 0.1851 | 0.0040 | 0.0009 | 0.0009

5.6 Conclusions

This chapter presents a novel fairness-aware peer review recommendation algorithm called FPRank that uses a multi-objective criterion based on acceptability scores, fairness perception, and weighted statistical disparity to generate its recommendations for paper acceptance.
We also provide theoretical analysis to show that maximizing fairness perception alone is insufficient, as it may lead to bias in terms of the group fairness criteria. Finally, we successfully demonstrate the effectiveness of FPRank in terms of balancing the competing requirements using open review data from a major conference in machine learning.

CHAPTER 6
FAIR GRAPH SAMPLING

This chapter investigates the problem of network sampling from a fairness perspective. First, we propose a novel fairness-aware network sampling framework that combines the structural preservability and group representativity objectives into a unified structure. Next, we develop a fair greedy sampling algorithm that aims to satisfy both objectives. We also provide a theoretical approximation guarantee for the proposed method.

6.1 Introduction

Networks are powerful tools for modeling interactions between entities in a complex social system. The digital revolution has provided a unique opportunity to collect a significant amount of information related to such systems. For example, the number of smartphone users has grown by 40% from 2016 to 2021, reaching up to 3.8 billion people [163]. The pervasiveness of such technology means more social network users, more online purchases, more user-generated content, and consequently, more massive networks. Executing even simple algorithms on such massive networks can be very expensive. In addition to computational complexity, many online social networks are not entirely visible due to privacy concerns, and are accessible only via crawling or through the use of APIs. However, such access is often limited due to restrictions on the number of queries or rate limits imposed by the data providers [1]. This makes accessing the whole network extremely hard or nearly impossible.

One way to address the above challenges is to use network sampling. Sampling can be used to extract a subset of the nodes and links of the network for subsequent analysis tasks. For example, the sampled network can be used for efficient training of sophisticated deep learning models such as graph convolutional networks (GCNs) or for network visualization purposes. Furthermore, it can be utilized as an efficient way to collect data with a limited budget. The most common sampling objective is to preserve specific topological properties of the original network, which will be referred to as structural preservability in this chapter. In the graph sampling literature, many network topological measures have been used to characterize structural preservability. These properties can be generally categorized into two groups: (1) vector properties such as the distribution of node degrees, clustering coefficients, and eigenvalues, and (2) scalar properties such as average degree, network diameter, and average clustering coefficient. The former can be assessed using probabilistic distance measures such as the Kullback-Leibler divergence and the Kolmogorov-Smirnov statistic, whereas the latter can be measured using measures such as the normalized root mean square error (NRMSE) [73]. Besides their link structure, many networks can also be characterized by properties of their node attributes. The sampling objective can therefore be expressed in terms of the representativeness of their node attribute values [73], which will be referred to herein as group representativity.

Figure 6.1: An example illustration of different sampling objectives. (a) The original network; (b) a structurally preservative sample; (c) a group representative sample.
The node attributes used to define the groups are known as protected attributes (e.g., gender, race, age group, etc.). Group representativity can be formalized as the ratio of nodes from each subgroup in the sampled subgraph to the original network. Figure 6.1 shows a toy example to illustrate the two potentially conflicting sampling objectives. The network in Figure 6.1(a) has 11 nodes, with the red and blue colors representing their group memberships. Observe that there are two major clusters in the network, one involving the nodes labeled 0-4 and the other involving nodes 5-10. Figure 6.1(b) illustrates a structurally preservable network sample that maintains the clustering structure but is not group representative, as it includes mostly nodes from the blue subgroup, with only one node from the red subgroup. On the other hand, Figure 6.1(c) illustrates a group representative sample that is not structurally preservable, since it contains nodes primarily from one of the two clusters.

Similar to other machine learning tasks, existing network sampling algorithms are not impervious to biases against certain demographic groups of the social network. Given its broad range of applications, the importance of fair network sampling cannot be overemphasized. If the sampled network is biased, this will adversely affect the results of downstream mining tasks. For example, Wagner et al. [166] showed that uninformed sampling may lead to biased estimation of node centrality values and unfair ranking of nodes from minority groups in a social network. They also proposed a metric based on the idea that an ideal sample should not "systematically rank nodes of one group higher and nodes of the other group lower than expected." However, the authors did not present any new sampling method to overcome the limitations of existing algorithms.

The main challenge in fair network sampling is how to combine the structural preservability and group representativity objectives in a principled way and to design an algorithm that optimizes for both. In this chapter, we develop an approach that measures structural preservability by comparing the centrality measures of the nodes in the sampled network against their values in the original network. The fairness of the sampled network, which corresponds to our group representativity measure, is then defined by a max-min subgroup fairness criterion [118] based on the worst-case structural preservability value among all the subgroups of the protected attribute. We have also developed a greedy algorithm to obtain an approximate solution for the max-min subgroup fairness criterion. A systematic evaluation of the proposed sampling algorithm was performed on various real-world social network data to empirically compare its effectiveness relative to other baseline methods.
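To make the intuitive notion of group representativity introduced above concrete, the following minimal Python sketch compares, for each protected group, the fraction of its members that made it into a sample. This is only the per-group share comparison hinted at here; the formal criterion actually optimized in this chapter is the max-min subgraph fairness measure defined later. The toy labels mirror the red/blue example of Figure 6.1.

```python
from collections import Counter

def group_representativity(groups_full, groups_sample):
    """For each protected group, the fraction of its members that appears in
    the sample; a representative sample keeps these fractions close to the
    overall sampling rate for every group."""
    full, sample = Counter(groups_full), Counter(groups_sample)
    return {g: sample.get(g, 0) / full[g] for g in full}

# Toy example mirroring Figure 6.1: 4 red and 7 blue nodes, 6 sampled.
print(group_representativity(["red"] * 4 + ["blue"] * 7, ["red"] + ["blue"] * 5))
# -> {'red': 0.25, 'blue': ~0.71}; the blue group is over-represented.
```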
6.2 Related Work

Previous methods for network sampling can be categorized as random node selection (RN), random edge selection (RE), or traversal-based sampling methods. In the first case, the RN sampling algorithm selects a subset of the nodes while maintaining all the edges among the selected nodes. The sampled nodes are selected according to some structural preservability criterion, e.g., in terms of their node degree, PageRank value [138], or other node centrality measures. In the second case, the sampling algorithm selects a subset of the edges to create the subgraph. The RE sampling process can be done by randomly choosing an edge, or by randomly choosing a node first before selecting an edge incident to that node [104]. Finally, a traversal-based sampling algorithm starts from an initial set of nodes (and/or edges) and expands the sample based on their connectivity to the nodes in the current sample. This category includes random walk based sampling [174, 122, 148], snowball sampling [57] (which is similar to breadth-first search, with the difference that it only expands a fixed number of neighbors), forest fire sampling [105] (which is a probabilistic version of snowball sampling), and Markov chain Monte Carlo methods (e.g., the Metropolis-Hastings algorithm [75, 61]).

Graph sampling methods can be utilized in training graph neural networks (GNNs). For example, Hamilton et al. [67] proposed a method for generating node representations by sampling and aggregating features from a node's local neighborhood. Ying et al. [183] combine random walks and graph convolutions to generate node representations. Chen et al. [28] proposed a sampling scheme in the reformulation of the loss and the gradient, whereas Zou et al. [194] proposed layer-dependent importance sampling. Alternative graph sampling methods have been proposed in [31, 187, 35]. These sampling approaches are designed for efficient training of graph neural networks, rather than for preserving the topological properties and group representativity of the network, which is the focus of this chapter.

Fairness in machine learning has received considerable attention in recent years [16, 49, 69, 97]. Prior works have mainly focused on classification tasks and can be categorized into individual and group-level fairness. Individual-level fairness was developed based on the proposition that similar people should be treated similarly. An example of such an approach is by Dwork et al. [49], who define a metric based on a probabilistic distance measure between individuals with a Lipschitz condition. Fairness from a group-level perspective was developed based on the idea that members of different demographic groups, such as race and gender, should be treated equally. These approaches quantify fairness in terms of statistical measures [16] such as statistical parity [49], equalized odds [69], and balanced error rate [53].

There is growing interest in the problem of fairness in network mining tasks [121, 147, 23, 81, 40]. Rahman et al. [147] proposed Fairwalk, a fairness-aware embedding method based on node2vec. Buyl and De Bie [23] proposed a Bayesian method for learning debiased embeddings by using a biased prior. Kang et al. [81] presented a definition of individual fairness for graph mining along with three algorithms for debiasing the input graph, debiasing the learning model, and debiasing the output results. Dai and Wang [40] proposed a debiasing framework for GNNs, whereas Masrour et al. [121] introduced a fairness metric based on a network modularity measure and utilized it for link prediction tasks in network mining to alleviate the filter bubble problem in social networks.

6.3 Preliminaries

Let G = ⟨V, E, X⟩ be an attributed network, where V is the set of nodes, E ⊆ V × V is the set of links, and X ∈ R^{|V|×d} is the feature matrix associated with the nodes in the network. We assume X can be partitioned into two groups X = (X^(p), X^(u)), where X^(p) denotes the protected attributes and X^(u) denotes the unprotected ones.
Let {P_1, . . . , P_k} be the partition of the nodes in V based on the values of the protected attribute X^(p), where k is the number of distinct combinations of its values. Given a pair of networks, G_1 = ⟨V_1, E_1, X_1⟩ and G_2 = ⟨V_2, E_2, X_2⟩, we say that G_1 is a subgraph of G_2, denoted as G_1 ⊆ G_2, if V_1 ⊆ V_2 and E_1 ⊆ E_2. If E_1 includes all the links in G_2 whose endpoints are both in V_1, then G_1 is an induced subgraph of G_2. We denote the set of all possible induced subgraphs of size n of a graph G as G_n. For brevity, we consider only undirected networks in this study, though the ideas can be extended to directed networks.

Definition 12 (Network Sampling). Given an attributed network G = ⟨V, E, X⟩ and a sample size n, the task of network sampling is to extract an induced subgraph G*_s = ⟨V*_s, E*_s, X*_s⟩ with n nodes that maximizes the following objective function o: G_n → R_+:

G^*_s = \operatorname*{argmax}_{G_s \in \mathcal{G}_n} o(G_s).    (6.1)

This chapter focuses on designing a network sampling algorithm that satisfies both the structural preservability and group representativity requirements. The former requires comparing the topological properties of the sampled graph against the original network, while the latter evaluates how representative the sample is with respect to the different subgroups, as defined by the node attributes. We discuss the proposed objective function for achieving this goal in detail in the next section.

6.3.1 Network Sampling as a Subset Selection Problem

The network sampling problem can be formulated as a subset selection problem, which is defined as the task of finding a subset A of a set S that maximizes an objective function f: 2^S → R_+ subject to |A| ≤ n. Network sampling is a form of subset selection problem where S is the set of nodes in the network. Besides network sampling, there are other machine learning tasks, such as feature subset selection, that can also be formulated as subset selection problems. However, due to its exponential search space, the subset selection problem is usually NP-hard [43]. Many algorithms have been developed to find a polynomial-time approximate solution for this problem. These algorithms can be generally categorized into two groups, convex relaxation and greedy algorithms [146]. Convex relaxation methods replace the non-convex subset size constraint with convex constraints [161, 195]. In contrast, greedy algorithms start from an empty set and iteratively select an element from the set V to be added in a way that maximizes the objective function [60, 43]. For submodular objective functions, greedy algorithms are guaranteed to find a reasonably good approximation to the optimal solution. Specifically, a function f is submodular if and only if

f(A \cup B) + f(A \cap B) \le f(A) + f(B)    (6.2)

for all subsets A, B ⊆ V. If f is a monotone, submodular, non-negative function on 2^V, then the greedy algorithm provides a solution within a 1 − 1/e factor of the optimum solution [129]. If the function does not have the submodular property, greedy algorithms may perform poorly. Nevertheless, there is a class of non-submodular functions that has received considerable attention recently. Bian et al. [19] provided approximation bounds for greedy maximization of nondecreasing set functions. Their bound is defined in terms of the (generalized) curvature and submodularity ratio of the function.
Chen et al. [29] showed that a randomized version of the greedy algorithm achieves an approximation ratio of (1 + 1/γ)^{-2} for weakly submodular maximization subject to a general matroid constraint, where γ is a parameter measuring the distance from submodularity.

6.4 Proposed Sampling Framework

Our goal is to design a network sampling framework that considers both the structural preservability and group representativity objectives. In this section, we first introduce the proposed node-level sampling measure for the structural preservability of a sample. We then describe our group representativity objective, which measures fairness in the sampled network in terms of the worst-case structural preservability measure over the different groups of the protected attribute. These criteria will be employed by our proposed greedy algorithm, described in Section 6.5.

6.4.1 Node-level Sampling Measure

Since the protected attribute is an individual-level feature defined for each node, a natural starting point for measuring a sample's goodness in terms of preserving the topological properties of a network is at the individual node level. More specifically, node centrality [130] has been widely used as a measure of the importance of the nodes in a network. By comparing the centrality values of all the nodes in the sampled graph to their corresponding values in the original network, the structural preservability of a sample can therefore be evaluated.

Definition 13 (Centrality Ratio). Given a network G = ⟨V, E, X⟩ and a subgraph G_s, the centrality ratio μ for a node v in G_s is

\mu(v, G_s) = \frac{C(v, G_s)}{C(v, G)},    (6.3)

where C corresponds to a node centrality measure.

One potential caveat of using Equation (6.3) is that it assumes the entire network is available in order to compute its denominator. If the sampling algorithm is restricted to have access only to the sampled network (e.g., while crawling the network) instead of full access to the entire network, we may relax the preceding definition to the following approximation instead:

\tilde{\mu}(v, G_s) \approx C(v, G_s).    (6.4)

The relaxed measure is typically used during sampling, whereas the true centrality ratio (Equation (6.3)) is used when evaluating the performance of the sampling algorithm. Note that numerous centrality measures have been introduced in the literature [130]. In this chapter, we consider harmonic centrality [22] as our node centrality measure. The harmonic centrality of a node u in a graph G is defined as the sum of the inverses of the shortest path distances, d_G(u, x), between u and all other nodes x in G:

H(u, G) \equiv \sum_{x \in V \setminus \{u\}} \frac{1}{d_G(u, x)}.    (6.5)

Note that although the discussion in this chapter focuses on harmonic centrality, our proposed framework is applicable to other centrality measures as well. We choose harmonic centrality as our measure for several reasons. First, unlike other centrality measures such as closeness and betweenness, which are restricted to connected networks, harmonic centrality is applicable to both connected and disconnected networks. This is because if d_G(u, x) = ∞, then the corresponding term in the sum given in Equation (6.5) is zero. The measure is also intuitive, as it considers the relative influence of the nodes in a network by giving higher weights to nodes that are closer to u than to those located further away.
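The centrality ratio of Equation (6.3) with harmonic centrality can be computed directly with networkx, as in the minimal sketch below; the path graph and sample used here are purely illustrative.

```python
import networkx as nx

def centrality_ratio(G, G_s, v):
    """mu(v, G_s) = H(v, G_s) / H(v, G): harmonic centrality of v in the sample
    relative to its harmonic centrality in the original network (Eq. 6.3)."""
    h_full = nx.harmonic_centrality(G, nbunch=[v])[v]
    if v not in G_s or h_full == 0:
        return 0.0
    h_sample = nx.harmonic_centrality(G_s, nbunch=[v])[v]
    return h_sample / h_full

# Toy example: a 5-node path graph and a 3-node induced sample.
G = nx.path_graph(5)                      # 0 - 1 - 2 - 3 - 4
G_s = G.subgraph([0, 1, 2])               # induced sample
print(centrality_ratio(G, G_s, 1))        # 2 / (1 + 1 + 1/2 + 1/3) ~= 0.71
```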
The strength of using harmonic distance compared 82 to other centrality measures was recently demonstrated in [22], where the authors presented an axiomatic approach to evaluate the various centrality measures according to a set of fundamental properties a good centrality measure should exhibit. They showed that only harmonic centrality measure satisfies all the axioms, which makes it an excellent measure for arbitrary networks [21]. For example, the authors showed that harmonic centrality is one of only two centrality measures (besides PageRank) that is strictly rank monotone, i.e., adding a new edge to a node will not demote its rank relative to other lower-ranked nodes in the network. Furthermore, the following lemma shows two additional properties of the harmonic centrality when used as he structural preservability measure (see Equation (6.4) by a sampling algorithm. Lemma 6. The harmonic centrality satisfies the following properties: 1. `(E, ˜ ⌧ B ) = 0, if E 8 +B . 2. `(E, ˜ ⌧ 1 )  `(E,˜ ⌧ 2 ) 8E 2 +1 , if ⌧ 1 ✓ ⌧ 2 . where `(E, ˜ ⌧ B) = (E, ⌧ B ). Proof : For the first property, it is easy to show that `(E,˜ ⌧ B ) = 0 for E 8 +B since 8D 2 +B : 3⌧ (E, D) = 1 ) (E, D) = 0. To prove the second property, let ⌧ 1 = (+1 , ⇢ 1 ) and ⌧ 2 = (+2 , ⇢ 2 ). If ⌧ 1 ✓ ⌧ 2 then 8E, D 2 +1 : 3⌧ 1 (E, D) 3⌧ 2 (E, D). As a result, 1 1 8E, D 2 +1 :  . 3⌧ 1 (E, D) 3⌧ 2 (E, D) Since +1 ✓ +2 , it is easy to show that (E, ⌧ 1 )  (E, ⌧ 2 ), 8E 2 +1 . Thus, the second property holds for Equation (6.4). ⇤ 6.4.2 Fairness-aware Network Sampling Objective To address the challenge of combining structural preservability with group representativity, we introduce a max-min subgraph fairness criterion, which is inspired by the idea of the minimax Pareto fairness concept proposed in [118] for satisfying group fairness. Specifically, our max-min 83 subgraph fairness criterion evaluates the average centrality value for each group of the protected attribute and uses its worse group performance (i.e., group with minimum average centrality value) as its fairness measure. Let {%1 , %2 , · · · , % } be a partitioning of the nodes in a network ⌧ based on their membership according to the protected attribute - ( ?) . For each group %8 , we define the following group centrality ratio as follows: Definition 14 (Group Centrality Ratio). Given a network ⌧ =< +, ⇢, - > and centrality ratio `, the group centrality ratio for the node group %8 is 1 ’ f8 (⌧ B ) = `(D, ⌧ B ) (6.6) |%8 | D2%8 For sampling algorithms without full access to the entire network, we may replace ` by `˜ in the above definition. Definition 15 (Subgraph Fairness Criterion). Given a network sample ⌧ B =< +B , ⇢ B , -B >, node groups {%1 , %2 , · · · , % }, and a group centrality ratio function f, we define the following subgraph fairness measure: o 5 08A (⌧ B ) = min {f8 (⌧ B )} (6.7) 18 Our network sampling goal is to maximize the fairness criterion defined in Equation (6.7). Replacing the measure into the objective function in Equation (6.1), our fairness-aware sampling objective is: ⌧ ⇤B = argmax⌧ B 2G= min (f8 (⌧ B )) (6.8) 8 Lemma 7. If `˜ satisfies the properties stated in Lemma 6, then the fair network sampling measure, o 5 08A , is a monotonically nondecreasing function. Proof : Let f <8= (⌧ B ) = min18 (f8) ). In order to show f <8= is a monotonically nonde- creasing function, we have to show that f <8= (⌧ 1 )  f <8= (⌧ 2 ) if ⌧ 1 ✓ ⌧ 2 . 
Since μ̃ satisfies the second property in Lemma 6 and σ_i is a summation over μ̃, σ_i must be a monotonically nondecreasing function for all 1 ≤ i ≤ k. Furthermore, as σ_min is the minimum value of σ_i over i, it must also be monotonically nondecreasing, which completes the proof. □

6.5 Proposed Greedy Fair Network Sampling Algorithm

This section describes our proposed greedy sampling algorithm for fair network sampling, which is designed to provide an approximate solution to the sampling objective given in (6.8).

6.5.1 Greedy Algorithm for Max-min Subgraph Fairness

We propose the following greedy algorithm for the max-min subgraph fairness problem. Our greedy algorithm is based on the following notion of the marginal gain of a set of nodes A ⊆ V.

Definition 16 (Marginal Gain). Given a network G = (V, E), a subgraph G_s = (V_s, E_s), and a set of nodes A ⊆ V \ V_s, the marginal gain δ_min(·) of adding the set A to G_s is

\delta_{\min}(A \mid G_s) = \sigma_{\min}(G_s \cup A) - \sigma_{\min}(G_s),    (6.9)

where σ_min(G_s) = min_{1≤i≤k} {σ_i(G_s)}.

A greedy algorithm can be developed to optimize (6.7) by incrementally adding a node v to the sample G_s in a way that maximizes the marginal gain. However, computing the harmonic centrality can be expensive when the sampled graph is large. To improve its efficiency, we present the following fast implementation of our greedy algorithm based on a reference set of target nodes.

Definition 17 (Target Set). Given a network G = ⟨V, E, X⟩, where V = ∪_{i=1}^{k} P_i, the target set T = {T_1, . . . , T_k} is a set of node subsets such that T_i ⊂ P_i, for 1 ≤ i ≤ k.

We use the target set to compute the following approximate group centrality ratio for each group: σ̃_i(G_s) = (1/|T_i|) Σ_{u∈T_i} μ(u, G_s), and compute the marginal gain in Equation (6.9) using the approximate group centrality ratio instead. Thus, our greedy algorithm starts from the initial induced subgraph of the target set T and then iteratively expands the subgraph by selecting a node from the candidate set χ(G_s) that maximizes the marginal gain. The candidate set χ(G_s) corresponds to all immediate neighbors of the nodes V_s in V \ V_s. Note that the target set T may correspond to the initial seed nodes of the sampling algorithm or to a subset of nodes that must be included in the network sample (e.g., nodes with high degrees or other important nodes specified by the user).

Algorithm 1 Greedy Fair Network Sampling (GFNS)
Input: graph G, sample size n, and target set T.
G_0 ← Induced-subgraph(T)
for t = 1 to n − 1 do
    χ ← {u | (u, v) ∈ E, u ∈ V \ V_{t−1}, v ∈ V_{t−1}}
    v* ← argmax_{v∈χ} δ_min(v | G_{t−1})
    G_t ← Induced-subgraph(G_{t−1} ∪ {v*})
end for
Output: G_t
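The following compact Python sketch is an unoptimized rendering of Algorithm 1. It assumes full access to G, evaluates the worst-group score using the harmonic centrality of the target nodes inside the current sample (i.e., the μ̃ / σ̃ approximation), and recomputes centralities from scratch for every candidate, unlike an efficient implementation; the function names, target sets, and the karate-club toy run are illustrative only.

```python
import networkx as nx
import numpy as np

def sigma_min(G, sample_nodes, targets):
    """Worst-group approximate centrality over the target sets (Eq. 6.7),
    using harmonic centrality of the target nodes inside the current sample."""
    cent = nx.harmonic_centrality(G.subgraph(sample_nodes))
    return min(np.mean([cent.get(u, 0.0) for u in T_i]) for T_i in targets)

def gfns(G, n, targets):
    """Greedy Fair Network Sampling (Algorithm 1): grow the sample from the
    target nodes, always adding the frontier node with the largest marginal
    gain in the worst-group centrality."""
    sample = set().union(*targets)
    while len(sample) < n:
        frontier = {u for v in sample for u in G.neighbors(v)} - sample
        if not frontier:
            break
        base = sigma_min(G, sample, targets)
        best = max(frontier,
                   key=lambda u: sigma_min(G, sample | {u}, targets) - base)
        sample.add(best)
    return G.subgraph(sample)

# Toy run: two groups seeded with one hypothetical target node each.
G = nx.karate_club_graph()
targets = [{0}, {33}]
print(sorted(gfns(G, n=8, targets=targets).nodes))
```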
6.5.2 Theoretical Bounds on Greedy Approximation
Unfortunately, the max-min subgraph fairness criterion is not a submodular function. However, as shown in Lemma 7, it is a monotonically nondecreasing function. This allows us to use the result of [19] to obtain a theoretical bound on the greedy approximate solution. Before presenting the main theorem, we first need to introduce some definitions.

Definition 18 (Greedy Submodularity Ratio [19]). The greedy submodularity ratio of a function $\sigma$ is the largest scalar $\gamma$ such that
$$\sum_{u \in A \setminus G^{(t)}} \delta(u \mid G^{(t)}) \ge \gamma\, \delta(A \mid G^{(t)}), \quad \forall |A| = n,\ t = 0, \ldots, n - 1$$
Remark. For nondecreasing $\sigma$ or $\sigma_{min}$ functions, $\gamma \in [0, 1]$.

Definition 19 (Greedy Curvature [19]). The greedy curvature is the smallest scalar $\alpha$ such that
$$\delta(v_t \mid G^{(t-1)} \cup A) \ge (1 - \alpha)\, \delta(v_t \mid G^{(t-1)}), \quad \forall |A| = n$$

Theorem 8. Let $\sigma$ be the group centrality measure defined in Equation (6.6) and $\delta_{min}(\cdot)$ be the marginal gain, with greedy submodularity ratio $\gamma$ and greedy curvature $\alpha$ defined in Definitions 18 and 19, respectively. The proposed greedy fair network sampling algorithm has the following approximation guarantee:
$$\sigma(T, G^{(n)}) \ge \frac{1}{\alpha}\Big[1 - \Big(1 - \frac{\alpha\gamma}{n}\Big)^{n}\Big]\, \sigma(T, G^{*}) \ge \frac{1}{\alpha}\big(1 - e^{-\alpha\gamma}\big)\, \sigma(T, G^{*})$$
where $G^{(n)}$ is the output of the greedy algorithm and $G^{*}$ is the optimal solution. The proof of the theorem follows from Lemma 7 of this chapter and Theorem 1 of [19].

6.6 Experimental Evaluation
This section describes the experiments performed to evaluate the efficacy of our proposed fair network sampling algorithm.

6.6.1 Experiment Setup
6.6.1.1 Datasets
We evaluated our methods on four real-world datasets, as summarized in Table 6.1. The first dataset corresponds to the Facebook ego-network of friendship relations [106]. The second dataset corresponds to a social network from tagged.com [52]. The third dataset is the German credit data [48], where the nodes represent clients and links between nodes are created based on the similarity of the clients' credit accounts [5]. Finally, the fourth dataset corresponds to the Credit Default data [182], where the nodes are credit applicants and the links are created based on the similarity of the applicants [5]. Gender is chosen as the protected attribute for the first three datasets, whereas age is the protected attribute for the last dataset.

Table 6.1: Statistics of network data used for experiments.
Network    #nodes      #edges        CC       Protected feature
Facebook   4,039       88,234        0.6055   gender
Tagged     5,607,448   912,280,409   0.0005   gender
German     1,000       24,970        0.3801   gender
Credit     30,000      2,174,014     0.6466   age

6.6.1.2 Evaluation Metrics
We evaluate the structural preservability of the sampling algorithms according to the following three metrics (a short computation sketch for the first two is given after this list):

• Degree distribution distance (Ddist): Using the Kolmogorov-Smirnov statistic, we compare the degree distribution of the sampled network to that of the original network. It is a nonparametric test of equality between continuous, one-dimensional probability distributions. Following [104], we simply use it to measure the distance between the two distributions as follows:
$$Ddist(G_s) = \sup_{d} \big| F_s(d) - F(d) \big|$$
where $F_s$ and $F$ are the cumulative distribution functions (CDFs) of the degree distributions of the sampled network and the original network, respectively.

• Clustering coefficient difference (Δ-CC): We compare the average clustering coefficient of the sampled graph with that of the original one:
$$\Delta\text{-CC} = \frac{1}{|V|} \sum_{v \in V} \frac{\lambda_G(v)}{\tau_G(v)} - \frac{1}{|V_s|} \sum_{v \in V_s} \frac{\lambda_{G_s}(v)}{\tau_{G_s}(v)}$$
where $\lambda_G(v)$ is the number of triangles in $G$ with $v$ being one of the nodes, and $\tau_G(v)$ is the number of subgraphs in $G$ with 2 links and 3 nodes, where one of the nodes is $v$ and $v$ is connected to the other two nodes.

• Harmonic: the average centrality ratio defined based on harmonic distance over all nodes in the sampled graph. This measure lies between zero and one; larger values are better.
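As an illustration of the first two metrics above, the sketch below computes Ddist as the Kolmogorov-Smirnov distance between the empirical degree CDFs and Δ-CC as the difference of average clustering coefficients. It assumes networkx and numpy; the function names are only illustrative and do not correspond to the code used in the experiments.

```python
# Sketch of the two structural-preservability metrics, assuming networkx and numpy.
import networkx as nx
import numpy as np

def ddist(G, G_s):
    """Kolmogorov-Smirnov distance between the degree CDFs of the sample and the original graph."""
    deg = np.array([d for _, d in G.degree()])
    deg_s = np.array([d for _, d in G_s.degree()])
    grid = np.arange(0, max(deg.max(), deg_s.max()) + 1)
    # empirical CDFs evaluated on the integer degree grid
    cdf = np.searchsorted(np.sort(deg), grid, side="right") / len(deg)
    cdf_s = np.searchsorted(np.sort(deg_s), grid, side="right") / len(deg_s)
    return float(np.abs(cdf_s - cdf).max())

def delta_cc(G, G_s):
    """Difference between the average clustering coefficients of the original graph and the sample."""
    return nx.average_clustering(G) - nx.average_clustering(G_s)
```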
For group representativity, we consider the following evaluation metrics:

• Normalized Cumulative Group Relevance (nCGR) [166]: measures the extent to which the ranking positions of the nodes in the sampled graph, and hence the visibility of a protected group, are preserved. To do this, the relevance of a node $v$ in a given graph is defined as the inverse of the node's rank, based on its degree, normalized by the rank sum of all nodes in the network:
$$rel(v) = \frac{\big(rank(v)\big)^{-1}}{\sum_{u \in V} rank(u)}$$
The cumulative protected group relevance for the sampled graph is compared against that of the original network as follows:
$$nCGR_i = \frac{\sum_{v \in topk(G_s) \cap P_i} rel(v) + \epsilon}{\sum_{v \in topk(G) \cap P_i} rel(v) + \epsilon}$$
where $nCGR_i$ measures the extent to which the relevance of a protected group $P_i$ is above or below the expectation from the original network with respect to the top-$k$ nodes. A value above 1 indicates that the group is more relevant in the sampled graph compared to the original network, whereas a value below 1 indicates that the group has become less relevant in the sampled network. The hyperparameter $\epsilon = 0.001$ is used to avoid division by zero [166]. Here we report the nCGR of the minority group; values closer to 1 are better.

• min-σ: the subgraph fairness criterion using the group centrality ratio defined in Equation (6.6). For evaluation, we consider all the nodes in the sampled subgraph when computing σ.

6.6.1.3 Baseline Sampling Algorithms
We consider the following commonly studied sampling techniques (a sketch of the fair random walk variant is given after this list):

• Random Node: We consider two variations. The first variation, which we refer to as NS, randomly selects a subset $S$ of the nodes without considering their topological properties. The second variation, NSD, randomly selects a subset $S$ of the nodes with probability proportional to the node degree.

• Breadth-first/Depth-first search (BFS/DFS): Both BFS and DFS are widely used for exploring large networks. Both algorithms start from an initial set of nodes and iteratively expand the sample based on their graph traversal strategy.

• Random Walk (RW): RW starts with an initial set of seed nodes and expands the sample by simulating a random walk on the network. Fair Random Walk (FRW) [147] is a variation of the method that accounts for fairness. Instead of randomly selecting the next node among all neighbors, it first partitions the neighbors into groups based on their protected attributes such that each group has the same probability of being chosen regardless of its size. A random node from the chosen group is then selected as the next node to visit.

• Metropolis-Hastings Random Walk (MHRW): Metropolis-Hastings is a Markov chain Monte Carlo (MCMC) technique for producing random samples from an arbitrary distribution. MHRW results in selecting a subset $S$ of nodes from a uniformly random distribution.

In all of the above methods, after selecting a subset $S$ of nodes, we extract the induced subgraph corresponding to the nodes in $S$.
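For concreteness, here is a minimal sketch of the Fair Random Walk baseline described above. It assumes networkx and a node attribute "protected", and it is one straightforward reading of the description in [147] rather than the reference implementation; the function and parameter names are hypothetical.

```python
# Minimal sketch of the Fair Random Walk (FRW) baseline, assuming networkx and a
# node attribute "protected"; groups are chosen uniformly, then a node within the group.
import random
import networkx as nx

def fair_random_walk_sample(G, seed, num_steps, attr="protected"):
    """Walk num_steps steps from seed; at each step pick a protected group uniformly
    among those present in the neighborhood, then a uniform neighbor from that group."""
    visited = {seed}
    current = seed
    for _ in range(num_steps):
        neighbors = list(G.neighbors(current))
        if not neighbors:
            break
        by_group = {}
        for u in neighbors:
            by_group.setdefault(G.nodes[u][attr], []).append(u)
        group = random.choice(list(by_group))      # each group equally likely
        current = random.choice(by_group[group])   # uniform within the chosen group
        visited.add(current)
    return G.subgraph(visited).copy()
```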
6.6.2 Experimental Results
In the following subsections, we investigate the performance of the proposed framework.

6.6.2.1 General Performance
In this subsection, we investigate the performance of the proposed sampling methods in preserving network properties. Tables 6.2, 6.3, 6.4, and 6.5 show the results on all measures for the four datasets. Based on the results, there is no single approach that performs better than all other sampling methods on all four datasets. Specifically, for the three structural preservability measures, GFNS performs relatively better than the other baselines. Although the results may vary depending on the dataset and evaluation metric, the GFNS algorithm consistently appears in the top-2 in 9 out of 12 settings, which shows the effectiveness of the proposed method in preserving the structural properties of the graph.

Table 6.2: German credit dataset.
Method   Δ-CC           Ddist          Harmonic       nCGR           min-σ
NS       0.008 ± 0.00   0.616 ± 0.00   0.368 ± 0.00   1.010 ± 0.00   0.365 ± 0.00
NSD      0.009 ± 0.00   0.412 ± 0.00   0.400 ± 0.00   1.018 ± 0.00   0.398 ± 0.00
DFS      0.008 ± 0.00   0.586 ± 0.00   0.372 ± 0.00   1.011 ± 0.00   0.369 ± 0.00
BFS      0.027 ± 0.00   0.333 ± 0.00   0.409 ± 0.00   1.018 ± 0.00   0.408 ± 0.00
RW       0.016 ± 0.00   0.351 ± 0.001  0.408 ± 0.00   1.014 ± 0.00   0.407 ± 0.00
MHRW     0.008 ± 0.00   0.355 ± 0.001  0.405 ± 0.00   1.008 ± 0.00   0.401 ± 0.00
FRW      0.011 ± 0.00   0.357 ± 0.001  0.406 ± 0.00   1.014 ± 0.00   0.405 ± 0.00
GFNS     0.037 ± 0.00   0.249 ± 0.00   0.443 ± 0.00   1.019 ± 0.00   0.441 ± 0.00

Table 6.3: Facebook dataset.
Method   Δ-CC           Ddist          Harmonic       nCGR           min-σ
NS       0.177 ± 0.002  0.659 ± 0.001  0.015 ± 0.00   1.039 ± 0.00   0.014 ± 0.00
NSD      0.077 ± 0.00   0.224 ± 0.001  0.079 ± 0.00   1.018 ± 0.00   0.078 ± 0.00
DFS      0.020 ± 0.00   0.538 ± 0.00   0.024 ± 0.00   1.025 ± 0.00   0.023 ± 0.00
BFS      0.105 ± 0.00   0.308 ± 0.00   0.109 ± 0.00   1.007 ± 0.00   0.109 ± 0.00
RW       0.103 ± 0.00   0.339 ± 0.001  0.093 ± 0.00   1.012 ± 0.00   0.092 ± 0.00
MHRW     0.041 ± 0.00   0.192 ± 0.005  0.145 ± 0.00   1.013 ± 0.00   0.144 ± 0.00
FRW      0.096 ± 0.00   0.341 ± 0.002  0.089 ± 0.00   1.012 ± 0.00   0.087 ± 0.00
GFNS     0.014 ± 0.00   0.182 ± 0.00   0.135 ± 0.00   1.009 ± 0.00   0.135 ± 0.00

For the sampling fairness evaluation, we consider the proposed min-σ criterion alongside nCGR as the evaluation metrics. The last two columns of Tables 6.2, 6.3, 6.4, and 6.5 show these results. Similar to structural preservability, the results indicate that no single approach performs better than the other sampling methods on all four datasets. The proposed greedy algorithm ranks among the top-2 in 6 out of 8 settings. On the Credit dataset, our algorithm achieves the best performance on both fairness criteria; on each of the other three datasets, it has the best result on one of the two fairness criteria.

6.6.2.2 Performance on Target Nodes
We first evaluate the performance of the sampling algorithms on target nodes. We ran each baseline algorithm ten times and plot the resulting patterns.
For the target set, we consider ten nodes from each gender. For the Facebook data, we consider the nodes with the highest harmonic centrality as the target set. For the Tagged network, we randomly selected five nodes from each gender.

Table 6.4: Credit dataset.
Method   Δ-CC           Ddist          Harmonic       nCGR           min-σ
NS       0.523 ± 0.006  0.999 ± 0.00   0.001 ± 0.00   1.012 ± 0.00   0.000 ± 0.00
NSD      0.242 ± 0.007  0.997 ± 0.00   0.001 ± 0.00   1.000 ± 0.00   0.001 ± 0.00
DFS      0.146 ± 0.00   0.984 ± 0.00   0.006 ± 0.00   1.000 ± 0.00   0.006 ± 0.00
BFS      0.121 ± 0.00   0.868 ± 0.00   0.008 ± 0.00   1.051 ± 0.00   0.007 ± 0.00
RW       0.019 ± 0.00   0.808 ± 0.00   0.007 ± 0.00   1.017 ± 0.00   0.007 ± 0.00
MHRW     0.081 ± 0.002  0.552 ± 0.018  0.016 ± 0.00   1.001 ± 0.00   0.012 ± 0.00
FRW      0.033 ± 0.00   0.768 ± 0.002  0.007 ± 0.00   1.004 ± 0.00   0.006 ± 0.00
GFNS     0.145 ± 0.00   0.182 ± 0.00   0.026 ± 0.00   1.000 ± 0.00   0.015 ± 0.00

Table 6.5: Tagged dataset.
Method   Δ-CC           Ddist          Harmonic       nCGR           min-σ
NS       0.001 ± 0.00   0.398 ± 0.00   0.217 ± 0.12   2.052 ± 0.00   0.208 ± 0.12
NSD      0.001 ± 0.00   0.105 ± 0.003  0.012 ± 0.001  1.951 ± 0.00   0.010 ± 0.00
DFS      0.001 ± 0.00   0.298 ± 0.00   0.026 ± 0.00   1.048 ± 0.00   0.107 ± 0.00
BFS      0.001 ± 0.00   0.193 ± 0.00   0.115 ± 0.00   1.072 ± 0.00   0.113 ± 0.00
RW       0.001 ± 0.00   0.300 ± 0.00   0.114 ± 0.00   1.053 ± 0.00   0.106 ± 0.00
MHRW     0.002 ± 0.00   0.267 ± 0.005  0.015 ± 0.00   1.067 ± 0.00   0.012 ± 0.00
FRW      0.001 ± 0.00   0.281 ± 0.00   0.114 ± 0.00   1.052 ± 0.00   0.111 ± 0.00
GFNS     0.001 ± 0.00   0.082 ± 0.00   0.079 ± 0.00   1.040 ± 0.00   0.068 ± 0.00

Figure 6.2 shows the value of $\mu$ for the ten target nodes on the Facebook dataset for a sample size of 5% of the original network. Here we sorted the values of $\mu$ for each sampling method. We observe that GFNS outperforms all the baseline methods: for all ten nodes in the target set, the value of $\mu$ defined based on harmonic distance is higher for GFNS, which agrees with our expectation. The second observation is that BFS performs better than most of the other baselines.

Figure 6.2: The proposed harmonic measure values for the samples with 10 target nodes.

Table 6.6 summarizes, for the German credit network, the performance of the algorithms based on the average of the proposed measure in Equation (6.7) for different sample sizes. Higher values indicate better performance in preserving the structural properties of the target nodes. It is evident that, regardless of the sample size and network type, the proposed greedy algorithm outperforms the other sampling methods, as expected. This result shows that the proposed greedy sampling is successful in achieving its objective.

Table 6.6: The average node-level harmonic measure for the German credit network.
Method   n = 100           n = 200           n = 400
NS       0.0617 ± 0.0001   0.1629 ± 0.0001   0.3639 ± 0.0001
NSD      0.0841 ± 0.0000   0.1900 ± 0.0000   0.3984 ± 0.0000
DFS      0.0765 ± 0.0000   0.1651 ± 0.0000   0.3672 ± 0.0000
BFS      0.0985 ± 0.0000   0.2033 ± 0.0000   0.4211 ± 0.0000
RW       0.0850 ± 0.0000   0.2013 ± 0.0000   0.4129 ± 0.0000
FRW      0.0870 ± 0.0000   0.1984 ± 0.0000   0.4114 ± 0.0000
GFNS     0.1147 ± 0.0000   0.2259 ± 0.0000   0.4396 ± 0.0000

6.7 Conclusion
This work presents a novel fairness-aware network sampling approach that combines the structural preservability and group representativity objectives into a unified learning framework. We introduced a new subgraph fairness criterion and developed a greedy fair network sampling algorithm with well-grounded theoretical bounds on the greedy approximation. Finally, we experimentally demonstrated the effectiveness of the proposed method on various real-world network data.
CHAPTER 7
FUTURE WORK

The use of machine learning algorithms in daily life is pervasive and entrenched in our society. Its vast range of applications spans fields such as healthcare, criminal justice, advertisement and recommender systems, banking and finance, dating and hiring, and online social media. This broad application creates a unique opportunity for machine learning researchers, with new data sources, problems, and challenges to address. Machine learning and data analysis can help us expand our knowledge about the complex environment we live in and make it more accessible and efficient. However, it is also exposed to the social, cultural, and institutional biases that have been established in our society.

I want to continue my research in two directions. First, I would like to extend my research in network analysis. Although network studies have played a central role in machine learning for quite a while, there are still multiple challenges in this field. In the first part of this chapter, I explain some of the related challenges that I would like to address in my research. Second, I would like to expand my line of research from fairness to explainability. During the research related to this thesis, I realized that explainability and fairness are related, and that explainability can play a significant role in mitigating machine learning-generated biases. In the second part of this chapter, I explain this connection and some research ideas that I would like to investigate.

7.1 Network Analysis
I would like to extend my line of research on fairness-aware network analysis in the near future. Many other problems in fairness-aware network analysis are non-i.i.d. problems and therefore call for an approach similar to the ones I have developed in my previous work. An interesting question I would like to investigate is whether the network fairness measures I have proposed can also be applied to dynamic networks. This is not a straightforward question, considering that static network centrality measures, such as PageRank, might be inappropriate in dynamic settings [117]. An example of a fairness-aware dynamic problem on which I would like to work is the node ranking problem, which has essential applications in recommender systems and information retrieval. While there is some recent work on fairness in the non-dynamic node ranking problem [176], to the best of my knowledge, there is no work on the dynamic node ranking problem.

In addition to the fairness-aware network research track, I also consider expanding the computational power of current network AI models beyond the message passing framework to be a critical problem that I would like to investigate. I am interested in developing new network learning algorithms that incorporate higher-order structural information about networks into the AI model. I believe the harmonic-based sampling measure, with an application to training graph convolutional networks, has connections to the notion of graph moments discussed in [9], which I would like to study further.

7.2 Fairness and Explainability in Machine Learning
The best way to quantify fairness remains an unresolved debate among experts. There are strong criticisms against both individual- and group-level approaches for quantifying fairness. For example, group-level metrics fail to guarantee fairness for individuals and subgroups, whereas individual fairness metrics typically rely on similarity measures.
However, experts have not reached a consensus on a general similarity measure that would be appropriate for a wide range of problems. In response to this challenge, instead of developing a model that explicitly satisfies a notion of fairness, I propose an indirect method that implicitly guarantees fairness. The proposed process, which I refer to as the "democratization of fairness in machine learning," solicits judgments on the fairness of the model's decisions from the users affected by these decisions. Maximizing the positive feedback of users can indirectly result in fairer model decisions. Currently, fairness in ML suffers from a lack of direct conversation with the users (subjects) affected by the model's decisions. The democratization of fairness in ML works to close this gap through two main steps. The first step is to ensure that subjects understand the algorithm's decision-making process. As a result of this understanding, well-informed subjects can make more reliable judgments about the fairness of algorithmic decisions. The second step is to update the model based on the feedback of the well-informed subjects.

Step one in the democratization of fairness is related to explainability. Fairness and explainability are intertwined concepts, and explanations, which try to help human users understand the functionality of ML models, can be utilized to reach fairness in ML models [12]. Previous research has shown that the explanation of an algorithmic decision process can help detect discriminatory behavior [20]. For instance, counterfactual explanations ask questions like, "Had an individual been of a different protected status, would the model have treated them differently?" [20] and then use the answer as a fairness measure. However, the kind of explanation depends on who the target of the explanation is. In the future, I aim to understand how to utilize explanations as a tool for involving the subjects in the process of detecting discrimination.

Step two in the democratization of fairness is integrating the subjects' feedback into the training and updating of the model. In recent years, a line of fairness research has used auditing to detect discrimination against subgroups of the population. In this body of research, the auditors are either an algorithm playing a zero-sum game against the model [86] or a group of experts who have the required skill set to make a judgment [150]. However, none of this research considers placing the users affected by the model's decisions in the position of auditor. I am interested in expanding the idea of auditing to include these users.

BIBLIOGRAPHY

[1] Developer agreement and policy – Twitter developers | Twitter developer platform.
[2] Goldman Sachs' misguided World Cup predictions could provide clues to the Apple Card controversy.
[3] Abello, J., Resende, M. G., and Sudarsky, S. Massive quasi-clique detection. In Latin American symposium on theoretical informatics (2002), Springer, pp. 598–612.
[4] Adams, J. S. Inequity in social exchange. In Advances in experimental social psychology, vol. 2. Elsevier, 1965, pp. 267–299.
[5] Agarwal, C., Lakkaraju*, H., and Zitnik*, M. Towards a unified framework for fair and stable graph representation learning.
[6] Agrawal, A., Verschueren, R., Diamond, S., and Boyd, S. A rewriting system for convex optimization problems. Journal of Control and Decision 5, 1 (2018), 42–60.
[7] Akoglu, L., McGlohon, M., and Faloutsos, C.
Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (2010), Springer, pp. 410–421. [8] Akoglu, L., Tong, H., and Koutra, D. Graph based anomaly detection and description: a survey. Data mining and knowledge discovery 29, 3 (2015), 626–688. [9] Al Hasan, M., Chaoji, V., Salem, S., and Zaki, M. Link prediction using supervised learning. In SDM06: workshop on link analysis, counter-terrorism and security (2006). [10] Ali, M., Sapiezynski, P., Bogen, M., Korolova, A., Mislove, A., and Rieke, A. Discrimination through optimization: How facebook’s ad delivery can lead to biased outcomes. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–30. [11] Angwin, J., Larson, J., Mattu, S., and Kirchner, L. Machine bias. ProPublica, May 23 (2016). [12] Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Information Fusion 58 (2020), 82–115. [13] Bayati, M., Gerritsen, M., Gleich, D. F., Saberi, A., and Wang, Y. Algorithms for large, sparse network alignment problems. In Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on (2009), IEEE, pp. 705–710. [14] Bengio, Y., Courville, A., and Vincent, P. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8 (2013), 1798–1828. 99 [15] Berk, R. Criminal justice forecasts of risk: A machine learning approach. Springer Science & Business Media, 2012. [16] Berk, R., Heidari, H., Jabbari, S., Kearns, M., and Roth, A. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research (2018). [17] Beutel, A., Chen, J., Zhao, Z., and Chi, E. H. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075 (2017). [18] Bhardwaj, R., Nambiar, A. R., and Dutta, D. A study of machine learning in healthcare. In 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC) (2017), vol. 2, IEEE, pp. 236–241. [19] Bian, A. A., Buhmann, J. M., Krause, A., and Tschiatschek, S. Guarantees for greedy maximization of non-submodular functions with applications. In International conference on machine learning (2017), PMLR, pp. 498–507. [20] Black, E., Yeom, S., and Fredrikson, M. Fliptest: fairness testing via optimal transport. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (2020), pp. 111–121. [21] Boldi, P., Luongo, A., and Vigna, S. Rank monotonicity in centrality measures. [22] Boldi, P., and Vigna, S. Axioms for centrality. Internet Mathematics 10, 3-4 (2014), 222–262. [23] Buyl, M., and De Bie, T. Debayes: a bayesian method for debiasing network embeddings. In International Conference on Machine Learning (2020), PMLR, pp. 1220–1229. [24] Calders, T., Kamiran, F., and Pechenizkiy, M. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops (2009). [25] Chang, J., and Blei, D. Relational topic models for document networks. In Artificial Intelligence and Statistics (2009), pp. 81–88. [26] Charlin, L., and Zemel, R. The toronto paper matching system: an automated paper-reviewer assignment system. [27] Chen, J., Geyer, W., Dugan, C., Muller, M., and Guy, I. 
Make new friends, but keep the old: recommending people on social networking sites. In SIGCHI (2009). [28] Chen, J., Ma, T., and Xiao, C. Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018). [29] Chen, L., Feldman, M., and Karbasi, A. Weakly submodular maximization beyond cardinal- ity constraints: Does randomization help greedy? In International Conference on Machine Learning (2018), PMLR, pp. 804–813. [30] Chen, R., Hua, Q., Chang, Y.-S., Wang, B., Zhang, L., and Kong, X. A survey of collaborative filtering-based recommender systems: From traditional methods to hybrid methods based on social networks. IEEE Access 6 (2018), 64301–64320. 100 [31] Chiang, W.-L., Liu, X., Si, S., Li, Y., Bengio, S., and Hsieh, C.-J. Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019), pp. 257–266. [32] Choi, E., Bahadori, M. T., Song, L., Stewart, W. F., and Sun, J. Gram: graph-based attention model for healthcare representation learning. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017), pp. 787–795. [33] Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data 5, 2 (2017), 153–163. [34] Clauset, A., Moore, C., and Newman, M. E. Hierarchical structure and the prediction of missing links in networks. arXiv preprint arXiv:0811.0484 (2008). [35] Cong, W., Forsati, R., Kandemir, M., and Mahdavi, M. Minimal variance sampling with provable guarantees for fast training of graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020), pp. 1393–1403. [36] Corbett-Davies, S., and Goel, S. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023 (2018). [37] Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., and Huq, A. Algorithmic decision making and the cost of fairness. In SIGKDD (2017), ACM, pp. 797–806. [38] Cramér, H., and Leadbetter, M. R. Stationary and related stochastic processes: Sample function properties and their applications. Courier Corporation, 2013. [39] Crandall, D. J., Backstrom, L., Cosley, D., Suri, S., Huttenlocher, D., and Kleinberg, J. Inferring social ties from geographic coincidences. PNAS 107, 52 (2010), 22436–22441. [40] Dai, E., and Wang, S. Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (2021), pp. 680–688. [41] Dai, Q., Li, Q., Tang, J., and Wang, D. Adversarial network embedding. In Thirty-Second AAAI Conference on Artificial Intelligence (2018). [42] Dastin, J. Amazon scraps secret ai recruiting tool that showed bias against women, Oct 2018. [43] Davis, G., Mallat, S., and Avellaneda, M. Adaptive greedy approximations. Constructive approximation 13, 1 (1997), 57–98. [44] De Sá, H. R., and Prudêncio, R. B. Supervised link prediction in weighted networks. In The 2011 international joint conference on neural networks (2011), IEEE, pp. 2281–2288. [45] Dettmers, T., Minervini, P., Stenetorp, P., and Riedel, S. Convolutional 2d knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence (2018). 101 [46] Diamond, S., and Boyd, S. 
Cvxpy: A python-embedded modeling language for convex optimization. The Journal of Machine Learning Research 17, 1 (2016), 2909–2913. [47] Dieterich, W., Mendoza, C., and Brennan, T. Compas risk scales: Demonstrating accuracy equity and predictive parity. Northpoint Inc 7, 7.4 (2016), 1. [48] Dua, C. Dheeru and graff,“. UCI Machine Learning Repository,” UCI Machine Learning Repository (2017). [49] Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference (2012). [50] Edwards, H., and Storkey, A. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897 (2015). [51] Eldardiry, H., and Neville, J. A resampling technique for relational data graphs. In Proceed- ings of the 2nd SNA workshop, 14th ACM SIGKDD conference on knowledge discovery and data mining (2008). [52] Fakhraei, S., Foulds, J., Shashanka, M., and Getoor, L. Collective spammer detection in evolving multi-relational social networks. In SIGKDD (2015), ACM, pp. 1769–1778. [53] Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., and Venkatasubramanian, S. Certifying and removing disparate impact. In SIGKDD (2015), ACM, pp. 259–268. [54] Feng, J., Huang, M., Yang, Y., and Zhu, X. Gake: Graph aware knowledge embedding. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (2016), pp. 641–651. [55] Fey, M., Eric Lenssen, J., Weichert, F., and Müller, H. Splinecnn: Fast geometric deep learning with continuous b-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 869–877. [56] Fiez, T., Shah, N., and Ratliff, L. A super* algorithm to optimize paper bidding in peer review. In Conference on Uncertainty in Artificial Intelligence (2020), PMLR, pp. 580–589. [57] Frank, O., and SnÚders, T. Estimating the size of hidden populations using snowball sampling. JOURNAL OF OFFICIAL STATISTICS-STOCKHOLM- 10 (1994), 53–53. [58] Friedler, S. A., Scheidegger, C., and Venkatasubramanian, S. On the (im) possibility of fairness. arXiv preprint arXiv:1609.07236 (2016). [59] Fu, X., Ren, X., Mengshoel, O. J., and Wu, X. Stochastic optimization for market return prediction using financial knowledge graph. In 2018 IEEE International Conference on Big Knowledge (ICBK) (2018), IEEE, pp. 25–32. [60] Gilbert, A. C., Muthukrishnan, S., and Strauss, M. J. Approximation of functions over redundant dictionaries using coherence. In SODA (2003), Citeseer, pp. 243–252. 102 [61] Gjoka, M., Kurant, M., Butts, C. T., and Markopoulou, A. Walking in facebook: A case study of unbiased sampling of osns. In 2010 Proceedings IEEE Infocom (2010), Ieee, pp. 1–9. [62] Goel, S., Rao, J. M., and Shroff, R. Personalized risk assessments in the criminal justice system. American Economic Review 106, 5 (2016), 119–23. [63] Goldberg, A. V., and Tarjan, R. E. A new approach to the maximum-flow problem. Journal of the ACM (JACM) 35, 4 (1988), 921–940. [64] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In NeurIPS (2014), pp. 2672–2680. [65] Grover, A., and Leskovec, J. node2vec: Scalable feature learning for networks. In Proc of KDD (2016), pp. 855–864. [66] Gupte, M., and Eliassi-Rad, T. Measuring tie strength in implicit social networks. In Proceedings of the 4th Annual ACM Web Science Conference (2012), pp. 109–118. [67] Hamilton, W. 
L., Ying, R., and Leskovec, J. Inductive representation learning on large graphs. In 31st Conf. on Neural Information Processing Systems (NIPS 2017) (2017). [68] Han, X., Liu, Z., and Sun, M. Neural knowledge acquisition via mutual attention between knowledge graph and text. In Thirty-Second AAAI Conference on Artificial Intelligence (2018). [69] Hardt, M., Price, E., Srebro, N., et al. Equality of opportunity in supervised learning. In NeurIPS (2016), pp. 3315–3323. [70] Hébert-Johnson, U., Kim, M. P., Reingold, O., and Rothblum, G. N. Calibration for the (computationally-identifiable) masses. arXiv preprint arXiv:1711.08513 (2017). [71] Henderson, K., Eliassi-Rad, T., Faloutsos, C., Akoglu, L., Li, L., Maruhashi, K., Prakash, B. A., and Tong, H. Metric forensics: a multi-level approach for mining volatile graphs. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (2010), pp. 163–172. [72] Hofstra, B., Corten, R., Van Tubergen, F., and Ellison, N. B. Sources of segregation in social networks: A novel approach using facebook. American Sociological Review 82, 3 (2017), 625–656. [73] Hu, P., and Lau, W. C. A survey and taxonomy of graph sampling. arXiv preprint arXiv:1308.5865 (2013). [74] Huang, X., Li, J., and Hu, X. Label informed attributed network embedding. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (2017), ACM, pp. 731–739. [75] Hübler, C., Kriegel, H.-P., Borgwardt, K., and Ghahramani, Z. Metropolis algorithms for representative subgraph sampling. In 2008 Eighth IEEE International Conference on Data Mining (2008), IEEE, pp. 283–292. 103 [76] Jecmen, S., Zhang, H., Liu, R., Shah, N. B., Conitzer, V., and Fang, F. Mitigating manipula- tion in peer review via randomized reviewer assignments. arXiv preprint arXiv:2006.16437 (2020). [77] Jefferson, B. J. Predictable policing: Predictive crime mapping and geographies of policing and race. Annals of the American Association of Geographers 108, 1 (2018), 1–16. [78] Johndrow, J. E., Lum, K., et al. An algorithm for removing sensitive information: application to race-independent recidivism prediction. The Annals of Applied Statistics 13, 1 (2019), 189–220. [79] Kamiran, F., Karim, A., and Zhang, X. Decision theory for discrimination-aware clas- sification. In 2012 IEEE 12th International Conference on Data Mining (2012), IEEE, pp. 924–929. [80] Kamishima, T., Akaho, S., and Sakuma, J. Fairness-aware learning through regularization approach. In ICDM Workshops (2011), IEEE, pp. 643–650. [81] Kang, J., He, J., Maciejewski, R., and Tong, H. Inform: Individual fairness on graph mining. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020), pp. 379–389. [82] Kang, U., Tsourakakis, C. E., Appel, A. P., Faloutsos, C., and Leskovec, J. Hadi: Mining radii of large graphs. ACM Transactions on Knowledge Discovery from Data (TKDD) 5, 2 (2011), 1–24. [83] Karimi, F., Génois, M., Wagner, C., Singer, P., and Strohmaier, M. Homophily influences ranking of minorities in social networks. Scientific reports 8 (2018). [84] Kaur, R., and Singh, S. A survey of data mining and social network analysis based anomaly detection techniques. Egyptian informatics journal 17, 2 (2016), 199–216. [85] Kearns, M., Neel, S., Roth, A., and Wu, Z. S. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144 (2017). [86] Kearns, M., Neel, S., Roth, A., and Wu, Z. S. 
Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning (2018), PMLR, pp. 2564–2572. [87] Kelly, J., Sadeghieh, T., and Adeli, K. Peer review in scientific publications: benefits, critiques, & a survival guide. EJIFCC 25, 3 (2014), 227. [88] Kingma, D., and Ba, J. Adam: A method for stochastic optimization. In ICLR (2014). [89] Kipf, T. N., and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016). [90] Kipf, T. N., and Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016). 104 [91] Klau, G. W. A new graph-based method for pairwise global network alignment. BMC bioinformatics 10, 1 (2009), S59. [92] Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determina- tion of risk scores. arXiv preprint arXiv:1609.05807 (2016). [93] Kobren, A., Saha, B., and McCallum, A. Paper matching with local fairness constraints. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019), pp. 1247–1257. [94] Kollias, G., Mohammadi, S., and Grama, A. Network similarity decomposition (nsd): A fast and scalable approach to network alignment. IEEE Transactions on Knowledge and Data Engineering 24, 12 (2011), 2232–2243. [95] Koutra, D., Tong, H., and Lubensky, D. Big-align: Fast bipartite graph alignment. In 2013 IEEE 13th International Conference on Data Mining (2013), IEEE, pp. 389–398. [96] Kuchaiev, O., Milenkovi∆, T., Memi evi∆, V., Hayes, W., and Pr ulj, N. Topological network alignment uncovers biological function and phylogeny. Journal of the Royal Society Interface (2010), rsif20100063. [97] Kusner, M. J., Loftus, J., Russell, C., and Silva, R. Counterfactual fairness. In NeurIPS (2017), pp. 4066–4076. [98] Lakshmi, T. J., and Bhavani, S. D. Link prediction in temporal heterogeneous networks. In Pacific-Asia Workshop on Intelligence and Security Informatics (2017), Springer, pp. 83–98. [99] Lambrecht, A., and Tucker, C. Algorithmic bias? an empirical study of apparent gender- based discrimination in the display of stem career ads. Management Science 65, 7 (2019), 2966–2981. [100] Laniado, D., Volkovich, Y., Kappler, K., and Kaltenbrunner, A. Gender homophily in online dyadic and triadic relationships. EPJ Data Science 5, 1 (2016), 19. [101] Lashbrook, A. Ai-driven dermatology could leave dark-skinned patients behind, Aug 2018. [102] Lee, M. K., and Baykal, S. Algorithmic mediation in group decisions: Fairness perceptions of algorithmically mediated vs. discussion-based social division. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (2017), pp. 1035–1048. [103] Leo, M., Sharma, S., and Maddulety, K. Machine learning in banking risk management: A literature review. Risks 7, 1 (2019), 29. [104] Leskovec, J., and Faloutsos, C. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (2006), pp. 631–636. 105 [105] Leskovec, J., Kleinberg, J., and Faloutsos, C. Graphs over time: densification laws, shrink- ing diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (2005), pp. 177–187. [106] Leskovec, J., and Mcauley, J. J. Learning to discover social circles in ego networks. In NeurIPS (2012), pp. 539–547. 
[107] Li, K., Gao, J., Guo, S., Du, N., Li, X., and Zhang, A. Lrbm: A restricted boltzmann machine based approach for representation learning on linked data. In Data Mining (ICDM), 2014 IEEE International Conference on (2014), IEEE, pp. 300–309. [108] Li, X., Du, N., Li, H., Li, K., Gao, J., and Zhang, A. A deep learning approach to link prediction in dynamic networks. In Proceedings of the 2014 SIAM International Conference on Data Mining (2014), SIAM, pp. 289–297. [109] Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015). [110] Liao, C.-S., Lu, K., Baym, M., Singh, R., and Berger, B. Isorankn: spectral methods for global alignment of multiple protein networks. Bioinformatics 25, 12 (2009), i253–i258. [111] Liben-Nowell, D., and Kleinberg, J. The link-prediction problem for social networks. journal of the Association for Information Science and Technology 58, 7 (2007), 1019–1031. [112] Lichtenwalter, R. N., Lussier, J. T., and Chawla, N. V. New perspectives and methods in link prediction. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (2010), pp. 243–252. [113] Louizos, C., Swersky, K., Li, Y., Welling, M., and Zemel, R. The variational fair autoencoder. arXiv preprint arXiv:1511.00830 (2015). [114] Lu, J., Wu, D., Mao, M., Wang, W., and Zhang, G. Recommender system application developments: a survey. Decision Support Systems 74 (2015), 12–32. [115] Lu, Q., and Getoor, L. Link-based classification. In Proceedings of the 20th International Conference on Machine Learning (ICML-03) (2003), pp. 496–503. [116] Madras, D., Creager, E., Pitassi, T., and Zemel, R. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309 (2018). [117] Mariani, M. S., Medo, M., and Zhang, Y.-C. Ranking nodes in growing networks: When pagerank fails. Scientific reports 5, 1 (2015), 1–10. [118] Martinez, N., Bertran, M., and Sapiro, G. Minimax pareto fairness: A multi objective perspective. In International Conference on Machine Learning (2020), PMLR, pp. 6755– 6764. [119] Masrour, F., Barjesteh, I., Forsati, R., Esfahanian, A.-H., and Radha, H. Network completion with node similarity: A matrix completion approach with provable guarantees. In Advances in Social Networks Analysis and Mining (ASONAM), 2015 IEEE/ACM International Con- ference on (2015), IEEE, pp. 302–307. 106 [120] Masrour, F., Tan, P.-N., Esfahanian, A.-H., and VanDam, C. Attributed network representa- tion learning approaches for link prediction. In ASONAM (2018), IEEE, pp. 560–563. [121] Masrour, F., Wilson, T., Yan, H., Tan, P.-N., and Esfahanian, A. Bursting the filter bubble: Fairness-aware network link prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (2020), vol. 34, pp. 841–848. [122] Massoulié, L., Le Merrer, E., Kermarrec, A.-M., and Ganesh, A. Peer counting and sampling in overlay networks: random walk methods. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing (2006), pp. 123–132. [123] McPherson, M., Smith-Lovin, L., and Cook, J. M. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444. [124] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635 (2019). [125] Menon, A., and Elkan, C. Link prediction via matrix factorization. 
Machine Learning and Knowledge Discovery in Databases (2011), 437–452. [126] Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013). [127] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representa- tions of words and phrases and their compositionality. In Advances in neural information processing systems (2013), pp. 3111–3119. [128] Mitzenmacher, M. A brief history of lognormal and power law distributions. In Proceedings of the Allerton conference on communication, control, and computing (2001), pp. 182–191. [129] Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions—i. Mathematical programming 14, 1 (1978), 265– 294. [130] Newman, M. Networks: an introduction. Oxford university press, 2010. [131] Newman, M. E. Clustering and preferential attachment in growing networks. Physical review E 64, 2 (2001), 025102. [132] Newman, M. E. Finding community structure in networks using the eigenvectors of matrices. Physical review E 74, 3 (2006), 036104. [133] Newman, M. E. Modularity and community structure in networks. PNAS 103, 23 (2006), 8577–8582. [134] Newman, M. E., and Girvan, M. Finding and evaluating community structure in networks. Physical review E 69, 2 (2004), 026113. [135] Nguyen, T. T., Hui, P.-M., Harper, F. M., Terveen, L., and Konstan, J. A. Exploring the filter bubble: the effect of using recommender systems on content diversity. In WWW (2014). 107 [136] O’Neil, C. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2017. [137] Özcan, A., and Öfiüdücü, . G. Multivariate temporal link prediction in evolving social networks. In 2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS) (2015), IEEE, pp. 185–190. [138] Page, L. Method for node ranking in a linked database, Sept. 4 2001. US Patent 6,285,999. [139] Page, L., Brin, S., Motwani, R., and Winograd, T. The pagerank citation ranking: Bringing order to the web. Tech. rep., Stanford InfoLab, 1999. [140] Pan, S., Hu, R., Long, G., Jiang, J., Yao, L., and Zhang, C. Adversarially regularized graph autoencoder for graph embedding. arXiv preprint arXiv:1802.04407 (2018). [141] Pariser, E. The filter bubble: What the internet is hiding, 2012. [142] Pattillo, J., Youssef, N., and Butenko, S. Clique relaxation models in social network analysis. In Handbook of Optimization in Complex Networks. Springer, 2012, pp. 143–162. [143] Peiró, J. M., Martínez-Tur, V., and Moliner, C. Perceived Fairness. Springer Netherlands, Dordrecht, 2014, pp. 4693–4696. [144] Perozzi, B., Al-Rfou, R., and Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (2014), pp. 701–710. [145] Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. Q. On fairness and calibration. In NeurIPS (2017). [146] Qian, C., Li, G., Feng, C., and Tang, K. Distributed pareto optimization for subset selection. In ÒCAI (2018), pp. 1492–1498. [147] Rahman, T. A., Surma, B., Backes, M., and Zhang, Y. Fairwalk: Towards fair graph embedding. In ÒCAI (2019), pp. 3289–3295. [148] Ribeiro, B., and Towsley, D. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement (2010), pp. 390–403. 
[149] Rotmensch, M., Halpern, Y., Tlimat, A., Horng, S., and Sontag, D. Learning a health knowledge graph from electronic medical records. Scientific reports 7, 1 (2017), 1–11. [150] Saleiro, P., Kuester, B., Hinkson, L., London, J., Stevens, A., Anisfeld, A., Rodolfa, K. T., and Ghani, R. Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577 (2018). [151] Saxena, N., Huang, K., DeFilippis, E., Radanovic, G., Parkes, D., and Liu, Y. How do fairness definitions fare? examining public attitudes towards algorithmic definitions of fairness, 2018. 108 [152] Scripps, J., Tan, P.-N., Chen, F., and Esfahanian, A.-H. A matrix alignment approach for link prediction. In Proc of ICPR (2008). [153] Shulga, L., and Tanford, S. Measuring perceptions of fairness of loyalty program members. Journal of Hospitality Marketing & Management 27, 3 (2018), 346–365. [154] Singh, R., Xu, J., and Berger, B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences 105, 35 (2008), 12763–12768. [155] SnÚders, T. A., Van de Bunt, G. G., and Steglich, C. E. Introduction to stochastic actor-based models for network dynamics. Social networks 32, 1 (2010), 44–60. [156] Stelmakh, I., Shah, N., and Singh, A. On testing for biases in peer review. In Advances in Neural Information Processing Systems (2019), pp. 5286–5296. [157] Stelmakh, I., Shah, N. B., and Singh, A. Peerreview4all: Fair and accurate reviewer assignment in peer review. arXiv preprint arXiv:1806.06237 (2018). [158] Stelmakh, I., Shah, N. B., and Singh, A. Peerreview4all: Fair and accurate reviewer assignment in peer review. In Algorithmic Learning Theory (2019), PMLR, pp. 828–856. [159] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. Line: Large-scale information network embedding. In Proc of WWW (2015), pp. 1067–1077. [160] Tian, F., Gao, B., Cui, Q., Chen, E., and Liu, T.-Y. Learning deep representations for graph clustering. In AAAI (2014), pp. 1293–1299. [161] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267–288. [162] Tomkins, A., Zhang, M., and Heavlin, W. D. Reviewer bias in single-versus double-blind peer review. Proceedings of the National Academy of Sciences 114, 48 (2017), 12708–12713. [163] Turner, A. How many people have smartphones worldwide (sept 2021). [164] VeliÃkovi∆, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017). [165] Vogelstein, J. T., Conroy, J. M., Podrazik, L. J., Kratzer, S. G., Fishkind, D. E., Vogel- stein, R. J., and Priebe, C. E. Fast inexact graph matching with applications in statistical connectomics. CoRR, abs/1112.5507 (2011). [166] Wagner, C., Singer, P., Karimi, F., Pfeffer, J., and Strohmaier, M. Sampling from social networks with attributes. In Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1181–1190. [167] Wang, D., Pedreschi, D., Song, C., Giannotti, F., and Barabasi, A.-L. Human mobility, social ties, and link prediction. In SIGKDD (2011), Acm, pp. 1100–1108. 109 [168] Wang, F., and Zhang, C. Label propagation through linear neighborhoods. IEEE Transactions on Knowledge and Data Engineering 20, 1 (2007), 55–67. [169] Wang, H., Wang, J., Wang, J., Zhao, M., Zhang, W., Zhang, F., Xie, X., and Guo, M. Graphgan: Graph representation learning with generative adversarial nets. 
In Thirty-second AAAI conference on artificial intelligence (2018). [170] Wang, J., and Shah, N. B. Your 2 is my 1, your 3 is my 9: Handling arbitrary miscalibrations in ratings. arXiv preprint arXiv:1806.05085 (2018). [171] Wang, P., Xu, B., Wu, Y., and Zhou, X. Link prediction in social networks: the state-of-the- art. Science China Information Sciences 58, 1 (2015), 1–38. [172] Wang, R., Li, B., Hu, S., Du, W., and Zhang, M. Knowledge graph embedding via graph attenuated attention networks. IEEE Access (2019). [173] Washington, A. L. How to argue with an algorithm: Lessons from the compas-propublica debate. Colo. Tech. LJ 17 (2018), 131. [174] Wei, W., Erenrich, J., and Selman, B. Towards efficient sampling: Exploiting random walk strategies. In AAAI (2004), vol. 4, pp. 670–676. [175] Wiens, J., and Shenoy, E. S. Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clinical Infectious Diseases 66, 1 (2018), 149–153. [176] Xie, T., Ma, Y., Tong, H., Thai, M. T., and Maciejewski, R. Auditing the sensitivity of graph-based ranking with visual analytics. IEEE Transactions on Visualization and Computer Graphics (2020). [177] Xiong, C., Power, R., and Callan, J. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th international conference on world wide web (2017), pp. 1271–1279. [178] Yang, C., Liu, Z., Zhao, D., Sun, M., and Chang, E. Y. Network representation learning with rich text information. In ÒCAI (2015), pp. 2111–2117. [179] Yang, J., and Leskovec, J. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the sixth ACM international conference on Web search and data mining (2013), pp. 587–596. [180] Yang, J., McAuley, J., and Leskovec, J. Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining (2013), IEEE, pp. 1151–1156. [181] Yang, T., Jin, R., Chi, Y., and Zhu, S. Combining link and content for community detection: a discriminative approach. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (2009), pp. 927–936. 110 [182] Yeh, I.-C., and Lien, C.-h. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36, 2 (2009), 2473–2480. [183] Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), pp. 974–983. [184] Zafarani, R., Abbasi, M. A., and Liu, H. Social media mining: an introduction. Cambridge University Press, 2014. [185] Zehlike, M., Bonchi, F., Castillo, C., Hajian, S., Megahed, M., and Baeza-Yates, R. Fa* ir: A fair top-k ranking algorithm. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017), pp. 1569–1578. [186] Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. Learning fair representations. In ICML (2013), pp. 325–333. [187] Zeng, H., Zhou, H., Srivastava, A., Kannan, R., and Prasanna, V. Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931 (2019). [188] Zhai, S., and Zhang, Z. Dropout training of matrix factorization and autoencoder for link prediction in sparse graphs. 
In Proceedings of the 2015 SIAM International Conference on Data Mining (2015), SIAM, pp. 451–459. [189] Zhang, S., and Tong, H. Final: Fast attributed network alignment. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), ACM, pp. 1345–1354. [190] Zhang, X., Liu, H., Li, Q., and Wu, X.-M. Attributed graph clustering via adaptive graph convolution. arXiv preprint arXiv:1906.01210 (2019). [191] Zhang, Y., Yao, Q., Dai, W., and Chen, L. Autokge: searching scoring functions for knowledge graph embedding. arXiv preprint arXiv:1904.11682 (2019). [192] Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Schölkopf, B. Learning with local and global consistency. In Advances in neural information processing systems (2004), pp. 321– 328. [193] Zhu, X., Ghahramani, Z., and Lafferty, J. D. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03) (2003), pp. 912–919. [194] Zou, D., Hu, Z., Wang, Y., Jiang, S., Sun, Y., and Gu, Q. Layer-dependent importance sampling for training deep and large graph convolutional networks. In Advances in Neural Information Processing Systems (2019), pp. 11249–11259. [195] Zou, H., and Hastie, T. Regularization and variable selection via the elastic net. Journal of the royal statistical society: series B (statistical methodology) 67, 2 (2005), 301–320. 111