NODE CLASSIFICATION IN SHIFTING LANDSCAPES: FAIRNESS AWARE DOMAIN ADAPTATION IN NETWORK DATA

Domain adaptation is an important task that considers how to apply a model trained on a labeled source dataset to an unlabeled target dataset. This is becoming an increasingly relevant concern, as many datasets do not contain a large number of reliable labels and models must draw on additional training data from other sources. The domain adaptation task is challenging because the distributions of the source and target datasets may not be fully aligned. Such discrepancy in data distribution is called concept drift. This thesis focuses on the problem of domain adaptation in node classification for network data, a task known as cross-network node classification (CNNC). Unlike approaches specific to independent and identically distributed (i.i.d.) data, CNNC is concerned with the additional challenges introduced by the link structure of the source and target networks and differences in their node distributions. Such differences may exacerbate unfairness in node classification and lead to one protected group receiving a disproportionately large number of positive outcomes. Additionally, existing CNNC approaches are computationally expensive, as they require access to the entire source and target graphs at once.

In this thesis, I first present OTGCN, a method for performing domain adaptation between multiple disconnected networks for the task of node classification. OTGCN combines the powerful graph convolutional network (GCN) architecture with techniques from optimal transport to design source node embeddings that are more aligned with nodes in the target graph. This makes it possible to train a more accurate node classifier within the target domain. I then present FOCI, an improvement to OTGCN that addresses fairness by implementing a novel optimal transport approach designed to directly target harmful link bias. Lastly, I introduce FastFOCI and SpFOCI, two further enhancements to FOCI, which directly address performance and statistical parity, a popular group fairness measure. I demonstrate the effectiveness of each of these methods on several real-world datasets and discuss their strengths and weaknesses.

In Collections: Electronic Theses & Dissertations

Copyright Status: Attribution 4.0 International

Material Type: Theses

Authors: Stephens, Anna Joy

Thesis Advisors: Tan, Pang-Ning
Esfahanian, Abdol-Hossein

Committee Members: Zhou, Jiayu

Date: 2023

Subjects: Computer science

Program of Study: Computer Science - Master of Science

Degree Level: Masters

Language: English

Pages: 58 pages

Permalink: https://doi.org/doi:10.25335/37zs-9b15