CONCEPTUALIZING SOCIAL HARMS ARISING FROM BIAS AND DISCRIMINATION IN NATURAL LANGUAGE PROCESSING: RACE, GENDER & LANGUAGE

By Jamell Dacon

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science - Doctor of Philosophy

2023

ABSTRACT

Natural language processing (NLP) is a subfield of artificial intelligence (AI) and has become increasingly prominent in our everyday lives. NLP systems are now ubiquitous, as they are capable of identifying offensive and abusive conversational content and detecting hate speech on social media platforms, voice and speech recognition and transcription, news recommendation, dialogue systems and digital assistants, language generation, etc. Yet, the benefits of these language technologies do not accrue evenly to all of their users, leading to harmful social impacts as NLP systems reproduce stereotypes or produce fallacious results. Most AI systems and algorithms are data driven and require natural language data upon which to be trained. Thus, data is tightly associated with the functionality of these algorithms and systems. These systems generate complex social implications, i.e., they display human-like social biases (e.g., gender bias) that induce technological marginalization and increased feelings of disenfranchisement.

Throughout this dissertation, I argue that how harms arise in NLP systems and who is harmed by these biases can only be conceptualized and understood at the intersection of NLP, justice and equity (e.g., Data Science for Social Good), and the coupled relationships between language and both social and racial hierarchies. I propose to address three questions at this intersection: (1) How can we conceptualize and quantify such aforementioned harms?; (2) How can we introduce a set of measurements to understand "bias" in NLP systems?; and (3) How can we quantitatively and qualitatively ensure "fairness" in NLP systems? To address these pertinent questions, we attempt to differentiate the two consequences of predictive bias in NLP: (1) outcome disparities (i.e., racial bias) and (2) error disparities (i.e., poor system performance) to explicate the importance of modeling social factors of language by exploiting NLP tools to examine predictive biases of both binary gender-specific (male and female) and LGBTQIA2S+ representations, and on an English language variety, i.e., African American English (AAE).

Language reflects society, ideology, cultural identity, and customs of communicators, as well as their values. Therefore, natural language data, culture, and systems are intertwined with social norms. Nevertheless, social media and online services contain rich textual information on topics surrounding ethnicity, gender identity and sexual orientation (members of the LGBTQIA2S+ community), and language (e.g., AAE). This facilitates the collection of large-scale corpora to study social biases in NLP systems in hopes of reducing the stigmatization, marginalization, mischaracterization, or erasure of dialectal languages and their speakers, pushing back against potentially discriminatory practices (in many cases, discriminatory through oversight more than malice). In this dissertation, I propose several studies to minimize the gaps between gender, race, and NLP systems' performance within the scope of the three aforementioned questions.

To my family, friends, and loved ones for their support, kindness, prayers, and encouragement.
ACKNOWLEDGMENTS

During my collegiate journey, I have received invaluable help, advice, support, and guidance from a multitude of amazing people. First and foremost, I would like to thank God for allowing me to push through and attain such a degree; there were many low moments, but I was able to stand again after each time I was knocked down. Next, I would like to thank my primary advisor, Dr. Jiliang Tang, for his patience, guidance, support, and encouragement to continue to pursue my own interests in research until I found my niche. I would also like to thank Mr. Steven Thomas for numerous opportunities for growth, and for his kindness and optimism. He has taught me that no matter how tough things may be, there is always a solution. With Dr. Tang's and Mr. Thomas' help, I have achieved much more than I had imagined. I would like to extend my gratitude to my Ph.D. committee members, Dr. Hui Liu, Dr. Pan-Ning Tang, and Dr. Tai-Quan Peng, for all of their insightful questions and comments, support, encouragement, and helpful suggestions. In addition, I would like to thank the members of the Data Science and Engineering (DSE) Lab and the Shiu Lab at MSU. Special thanks go out to Tyler Derr, Haochen Liu, Harry Shomer, Kenia Segura Abá, Brianna Brown, Thilanka Ranaweera, Huan Chen, Serena Lotreck, Dr. Jyothi Kumar, and Dr. Melissa Lehti-Shiu, who were very supportive, and to Dr. Shinhan Shiu for being his intelligent, enthusiastic self, bringing constant joy to his lab members while directing the Shiu Lab. Finally, I would like to express my deepest thanks and gratitude to my dearest wife, Shaylynn Crum-Dacon, and my wonderful grandmother, Catherine Branker, as well as my supportive family, friends, and colleagues for their love, encouragement, and prayers during this time.

TABLE OF CONTENTS

CHAPTER 1  INTRODUCTION
  1.1 Motivation
  1.2 Dissertation Contributions

CHAPTER 2  BIAS DETECTION IN DIALOGUE GENERATION
  2.1 Introduction
  2.2 Fairness Analysis in Dialogue Systems
  2.3 Experiment
  2.4 Related Work
  2.5 Conclusion

CHAPTER 3  DETECTING AND EXAMINING GENDER BIAS IN THE NEWS
  3.1 Introduction
  3.2 Related Works
  3.3 Datasets
  3.4 Bias in Gender Distribution
  3.5 Bias in Content
  3.6 Bias in Wording
  3.7 Conclusion

CHAPTER 4  DETECTING HARMFUL ONLINE CONVERSATIONAL CONTENT TOWARDS LGBTQIA2S+ INDIVIDUALS
  4.1 Introduction
  4.2 Preliminaries

CHAPTER 5  A MULTI-LAYERED LANGUAGE ANALYSIS: A CASE STUDY OF AFRICAN-AMERICAN ENGLISH
  5.1 Introduction
  5.2 Related Work
  5.3 Dataset and Annotation
  5.4 Methodology
  5.5 Operationalization of AAE as an English Language Variety
  5.6 Conclusion
  5.7 Limitations and Ethical Considerations

CHAPTER 6  DETECTING AND MITIGATING INHERENT LINGUISTIC BIAS IN LARGE LANGUAGE MODELS
  6.1 Introduction
  6.2 Preliminaries
  6.3 CODESWITCH Creation
  6.4 Empirical Study and Analysis
  6.5 Debiasing Methods
  6.6 Related Work
  6.7 Conclusion and Future Works
  6.8 Limitations and Ethical Considerations

CHAPTER 7  CONCLUSION
  7.1 Dissertation Summary
  7.2 Future Work
  7.3 Concluding Remarks

BIBLIOGRAPHY

APPENDIX A  BIAS DETECTION IN DIALOGUE GENERATION
APPENDIX B  DETECTING AND EXAMINING GENDER BIAS IN THE NEWS
APPENDIX C  DETECTING HARMFUL ONLINE CONVERSATIONAL CONTENT TOWARDS LGBTQIA2S+ INDIVIDUALS
APPENDIX D  A MULTI-LAYERED LANGUAGE ANALYSIS: A CASE STUDY OF AFRICAN-AMERICAN ENGLISH
APPENDIX E  DETECTING AND MITIGATING INHERENT LINGUISTIC BIAS IN LARGE LANGUAGE MODELS

CHAPTER 1 INTRODUCTION

Natural language processing (NLP) is a subfield of artificial intelligence (AI), computer science, and linguistics focused on making human communication, such as speech and text, comprehensible to computers; NLP is used in a wide variety of everyday products and services. Some of the most common ways NLP is used are through voice-activated digital assistants on smartphones, email-scanning programs used to identify spam, and translation apps that handle a multitude of languages, and it has thus become increasingly prominent in our everyday lives.
NLP systems are now ubiquitous in both academia and industry, as they are capable of identifying offensive and abusive conversational content and detecting hate speech on social media platforms [120, 36, 142], voice and speech recognition and transcription [45, 77], news recommendation [34], dialogue systems and digital assistants [85], language generation [57], etc. Yet, the benefits of these language technologies do not accrue evenly to all of their users, leading to harmful societal impacts as NLP systems reproduce gender and racial stereotypes [18, 24].

1.1 Motivation

Most AI systems and algorithms are data driven and require natural language data upon which to be trained. Thus, data is tightly associated with the functionality of these algorithms and systems. These systems generate complex social implications, i.e., they display human-like social biases (e.g., gender bias) that induce technological marginalization and increased feelings of disenfranchisement. As these systems aim to learn from natural language data, sentence and word embeddings, for example, are popular NLP tools that capture the semantic similarities of sentences and words and display human-like social biases. To consider both the social and racial hierarchies sustained or intensified by current NLP computational techniques, and to facilitate Fairness, Accountability, Transparency and Ethics (FATE) in AI, ML, and NLP, we shift towards a human-in-the-loop paradigm to tackle bias and fairness issues surrounding gender and race in AI spaces. Moreover, by drawing on interdisciplinary fields such as sociology, political science, sociolinguistics, education, anthropology, and psychology, and through thorough engagement with relevant literature outside of NLP, we aim to gain a deeper recognition of the coupled relationships between language and racial and social hierarchies, a necessary step towards establishing a trustworthy path forward.

In this thesis, we argue that how harms arise in NLP systems and who is harmed by these biases can only be conceptualized and understood at the intersection of NLP, fairness, social justice, diversity and equity (e.g., Data Science for Social Good), and the coupled relationships between language and both social and racial hierarchies. We propose to address three questions at this intersection:

1. How can we conceptualize and quantify such aforementioned harms?;

2. How can we introduce a set of measurements to understand "bias" in NLP systems?; and

3. How can we quantitatively and qualitatively ensure "fairness" in NLP systems?

To address these pertinent questions, we attempt to differentiate the two consequences of predictive bias in NLP: (1) outcome disparities (i.e., racial bias) and (2) error disparities (i.e., poor system performance) to explicate the importance of modeling social factors of language by exploiting NLP tools to examine predictive biases of both binary (male and female) and LGBTQIA2S+ representations, and on an English language variety, African American English (AAE)¹. Although AAE is spoken by millions of people across the United States, this dialect continuum is perceived to be "bad English" despite numerous studies by socio- and raciolinguists and dialectologists in their attempts to establish AAE as a legitimate language [6, 48, 11, 79]. As a consequence, conversational platforms struggle to effectively facilitate less-represented dialects and English language varieties.
Language reflects society, ideology, cultural identity, and customs of communicators, as well as their values. Therefore, natural language data, culture, and systems are intertwined with social norms.

¹ A dialectal continuum previously known as Northern Negro English, Black English Vernacular (BEV), Black English, African American Vernacular English (AAVE), African American Language (AAL), Ebonics, and Non-standard English [79, 4, 56, 55, 6, 11, 76]. It is often referred to as African American Language (AAL) and African American English (AAE). In this work, we use the denotation AAE.

"[T]he common misconception [is] that language use has primarily to do with words and what they mean. It doesn't. It has primarily to do with people and what they mean." – [30]

Nevertheless, social media and online services contain rich textual information on topics surrounding ethnicity, gender identity, sexual orientation, and AAE, enabling the collection of large-scale corpora to study societal biases in NLP systems in hopes of reducing the stigmatization, marginalization, mischaracterization, or erasure of AAE and its speakers, pushing back against potentially discriminatory practices (in many cases, discriminatory through oversight more than malice). Throughout this thesis, we propose several studies to minimize the gaps between gender, race, and NLP systems' performance within the scope of the three aforementioned questions. In order to enable in-depth conversations about what kinds of system behaviors are harmful, in what ways, to whom, and why, we refer to three case studies, (1) Gender, Race, Language and Social Justice, (2) Gender and Sexual Identities, Orientations and Expressions, and (3) Language, Race and Culture, referencing several published works accepted to top-tier conferences that engage with social factors of language, affected communities, and NLP systems.

1.2 Dissertation Contributions

We summarize the major contributions of this dissertation across three case studies as follows:

• We conduct a pioneering case study of the fairness issues concerning (1) Gender, Race, Language and Social Justice, (2) Gender and Sexual Identities, Orientations and Expressions, and (3) Language, Race and Culture.

• In Chapter 2, we address the case study Gender, Race, Language and Social Justice.

  – We formally define fairness in dialogue systems and introduce a set of measurements to understand the fairness of a dialogue system quantitatively;
  – We construct a benchmark dataset to study gender and racial (linguistic) biases in dialogue models;
  – We propose two simple but effective debiasing methods which are demonstrated by experiments to significantly mitigate the biases in dialogue systems.

• Next, in Chapters 3 & 4, we address the case study Gender and Sexual Identities, Orientations and Expressions.
  – In chapter 3, we construct two of the largest benchmark datasets to date for studying gender bias: (1) a possessive (gender-specific and gender-neutral) nouns dataset and (2) an attribute (career-related and family-related) words dataset;
  – We demonstrate that there exist conclusive socially-constructed biases in regards to gender by introducing a series of measurements to better understand gender representation in news articles quantitatively and qualitatively;
  – We later adapt our approach to gender and sexual orientation (LGBTQIA2S+) to study stereotypical societal biases against LGBTQIA2S+ individuals by implementing a multi-headed BERT-based toxic comment detection model [60] to identify several forms of toxicity;
  – In chapter 4, we construct a large multi-labelled classification dataset with a total of 6 distinct labels to distinguish several forms of toxicity. To the best of our knowledge, our dataset is the first and largest dataset created to study the classification of harmful conversational content towards LGBTQIA2S+ individuals.

• Finally, in Chapters 5 & 6, we address the case study Language, Race and Culture.

  – In chapter 5, we construct a small dataset of 3,000 demographically-aligned African American (AA) tweets to study predictive bias in popular off-the-shelf Part-of-Speech (POS) tagger models;
  – Next, we incorporate a human-in-the-loop paradigm by recruiting 20 crowd-sourced diglossic annotators to evaluate the AAE language variety and to counteract erasure and several forms of bias such as model over-amplification and semantic bias;
  – In chapter 6, we propose CodeSwitch, a greedy unidirectional morphosyntactically-informed translation method for data augmentation to generate intent- and semantically-equivalent AAE examples from SAE;
  – We construct two intent- and semantically-equivalent NLI datasets of AAE sentence pairs with a wide range of morphological and syntactic features and dialect-specific vocabulary; to our knowledge, we are the first to create such datasets;
  – We propose two simple, yet effective debiasing methods to mitigate the inherent linguistic bias in NLI models.

CHAPTER 2 BIAS DETECTION IN DIALOGUE GENERATION

Recently, there have been increasing concerns about the fairness of Artificial Intelligence (AI) in real-world applications such as computer vision and recommendations. For example, recognition algorithms in computer vision are unfair to black people, poorly detecting their faces and inappropriately identifying them as "gorillas". As one crucial application of AI, dialogue systems have been extensively applied in our society. They are usually built with real human conversational data; thus, they could inherit some fairness issues which are held in the real world. However, the fairness of dialogue systems has not been investigated. In this paper, we perform the initial study about the fairness issues in dialogue systems. In particular, we construct the first dataset and propose quantitative measures to understand fairness in dialogue models. Our studies demonstrate that popular dialogue models show significant prejudice towards different genders and races. We will release the dataset and the measurement code to foster fairness research in dialogue systems upon the acceptance of the paper.

2.1 Introduction

AI techniques have brought great convenience to our lives. However, they have been proven to be unfair in many real-world applications such as computer vision [65], audio processing [113] and recommendations [145].
In other words, AI techniques may make decisions that are skewed towards certain groups of people in these applications [97]. In the field of computer vision, some face recognition algorithms fail to detect the faces of black users [114] or inappropriately label black people as "gorillas" [65]. In the field of audio processing, it has been found that voice-dictation systems recognize a voice from a male more accurately than one from a female [113]. Moreover, when predicting criminal recidivism, risk assessment tools tend to predict that people of certain races are more likely to commit a crime again than other people [134]. The fairness of AI systems has become one of the biggest concerns due to their huge negative social impacts.

Dialogue systems are important practical applications of Artificial Intelligence (AI). They interact with users through human-like conversations to satisfy their various needs. Conversational question answering agents converse with users to provide them with the information they want to find [118]. Task-oriented dialogue agents, such as Apple Siri and Microsoft Cortana, assist users in completing specific tasks such as trip planning and restaurant reservations [74]. Non-task-oriented dialogue agents, also known as chatbots, are designed to chit-chat with users in open domains for entertainment [112]. Dialogue systems have shown great commercial value in industry and have attracted increasing attention in the academic field [27, 50].

Though dialogue systems have been widely deployed in our daily lives, their fairness issues have not been specifically studied yet. Dialogue systems are often built on real human conversational data through machine learning, especially deep learning, techniques [125, 123, 122]. Thus, they are likely to inherit some fairness issues which are held in the real world, such as gender and racial biases. Examples of gender and racial biases we observed from one popular dialogue model are demonstrated in Table 2.1. When we simply change a male word in a given context to its female counterpart, such as from "he" to "she" or from "his" to "her", the sentiment of the corresponding response changes from positive to negative. As we replace a phrase in standard English with its African American English counterpart, such as "this" with "dis", the response becomes more offensive.

Table 2.1: Examples of Gender and Racial Biases in Dialogue Systems.

Context:  Hahaha, he has a really cute laugh and smile:d
Response: He really is the sweetest. all he does is make that cute little smile.
Bias:     Sentiment: positive

Context:  Hahaha, she has a really cute laugh and smile:d
Response: She cute but she is also kinda evil
Bias:     Sentiment: negative

Context:  Oh my god, for real, what is with this music during the downtime.
Response: Also this reminds me: my live karaoke cover band cure-aoke is still the best idea I've ever had
Bias:     Not offensive

Context:  Oh my god, for real, what is with dis music during the downtime.
Response: The only good future song is percocet and stripper joint. I have no idea why that one is good but the rest are hot wet poo.
Bias:     Offensive

Since the goal of dialogue systems is to talk with users and provide them with assistance and entertainment, if the systems show discriminatory behaviors in these interactions, the user experience will be adversely affected. Moreover, public commercial chatbots can face resistance for their improper speech [140]. Hence, there is an urgent demand to investigate the fairness issues of dialogue systems.
In this work, we conduct an initial study of the fairness issues in two popular dialogue models, i.e., a generative dialogue model [128] and a retrieval dialogue model [135]. In particular, we aim to answer two research questions: (1) do fairness issues exist in dialogue models? and (2) how can we quantitatively measure fairness? Our key contributions are summarized as follows:

• We construct the first dataset to study gender and racial biases in dialogue models, and we will release it to foster fairness research;

• We formally define fairness in dialogue systems and introduce a set of measurements to understand the fairness of a dialogue system quantitatively; and

• We demonstrate that there exist significant gender- and race-specific (linguistic) biases in dialogue systems.

2.2 Fairness Analysis in Dialogue Systems

In this section, we first formally define fairness in dialogue systems. We then introduce our method for constructing the dataset to investigate fairness and detail various measurements to quantitatively evaluate the fairness of dialogue systems.

2.2.1 Fairness in Dialogue Systems

As shown by the examples in Table 2.1, the fairness issues in dialogue systems exist between different pairs of groups, such as male vs. female or white people vs. black people, and can be measured in different ways, such as by sentiment and politeness. Note that in this work we use "white people" to represent races who use standard English, compared to "black people" who use African American English. Next, we propose a general definition of fairness in dialogue systems.

Definition 1. Suppose we are examining fairness on a group pair $G = (A, B)$. Given a context $C^{(A)} = (w_1, \ldots, w_i^{(A)}, \ldots, w_j^{(A)}, \ldots, w_n)$ which contains concepts $w_i^{(A)}, \ldots, w_j^{(A)}$ related to group $A$, we construct a new context $C^{(B)} = (w_1, \ldots, w_i^{(B)}, \ldots, w_j^{(B)}, \ldots, w_n)$ by replacing $w_i^{(A)}, \ldots, w_j^{(A)}$ with their counterparts $w_i^{(B)}, \ldots, w_j^{(B)}$ related to group $B$. Context $C^{(B)}$ is called the parallel context of context $C^{(A)}$, and the pair of the two contexts $(C^{(A)}, C^{(B)})$ is referred to as a parallel context pair.

Following the fairness definition proposed in [91], we define fairness in dialogue systems as follows:

Definition 2. Suppose $\mathcal{D}$ is a dialogue model that can be viewed as a function $\mathcal{D}: C \mapsto R$ which maps a context $C$ to a response $R$, $\mathcal{O}_G = \{(C_i^{(A)}, C_i^{(B)})\}_{i=1}^{n}$ is a parallel context corpus related to group pair $G = (A, B)$, and $\mathcal{M}$ is a measurement that maps a response $R$ to a scalar score $s$. We define the fairness of the dialogue model $\mathcal{D}$ on the parallel context corpus $\mathcal{O}_G$ in terms of the measurement $\mathcal{M}$ as:

$$\mathcal{B}_{\mathcal{M}}(\mathcal{D}, \mathcal{O}_G) = \mathbb{E}_{(C^{(A)}, C^{(B)}) \in \mathcal{O}_G}\left[\mathcal{M}(\mathcal{D}(C^{(A)})) - \mathcal{M}(\mathcal{D}(C^{(B)}))\right] \tag{2.1}$$

If $\mathcal{B}_{\mathcal{M}}(\mathcal{D}, \mathcal{O}_G) < \epsilon$, then the dialogue model $\mathcal{D}$ is considered to be fair for groups $A$ and $B$ on corpus $\mathcal{O}_G$ in terms of the measurement $\mathcal{M}$, where $\epsilon$ is a threshold that controls the significance.
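As a concrete illustration of Eq. (2.1), the following Python sketch estimates the fairness gap as the average difference of a measurement score between responses to parallel contexts. The callables `dialogue_model` and `measurement`, the helper names, and the example threshold are hypothetical interfaces assumed for illustration; they are not the dissertation's released code.

```python
from statistics import mean

def fairness_gap(dialogue_model, parallel_pairs, measurement):
    # Estimate B_M(D, O_G) from Eq. (2.1): the expected difference of the
    # measurement M between responses to parallel contexts of groups A and B.
    # `dialogue_model` maps a context string to a response string, and
    # `measurement` maps a response to a scalar score (both assumed).
    gaps = [measurement(dialogue_model(c_a)) - measurement(dialogue_model(c_b))
            for c_a, c_b in parallel_pairs]
    return mean(gaps)

def is_fair(gap, epsilon=0.05):
    # Definition 2 declares the model fair when the gap is below the
    # significance threshold epsilon; the value 0.05 is only an example,
    # and in practice one may compare the absolute value of the gap.
    return gap < epsilon
```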
2.2.2 Parallel Context Data Construction

Table 2.2: Examples of Gender and Race Word Pairs.

Gender Words (Male - Female): he - she, dad - mom, husband - wife, mr. - mrs., hero - heroine
Race Words (White - Black):   the - da, this - dis, turn off - dub, very good - supafly, what's up - wazzup

To study the fairness of a dialogue model on a specific group pair $G$, we need to build data $\mathcal{O}_G$ which contains a large number of parallel context pairs. We first collect a list of gender word pairs for the (male, female) groups and a list of race word pairs for the (white, black) groups. The gender word list consists of male-related words paired with their female counterparts. The race word list consists of common African American English words or phrases paired with their counterparts in standard English. Some examples are shown in Table 2.2; for the full lists, please refer to Appendix A. Afterwards, for each word list, we first filter out a certain number of contexts which contain at least one word or phrase in the list from a large dialogue corpus. Then, we construct the parallel contexts by replacing these words or phrases with their counterparts. All the obtained parallel context pairs form the data to study the fairness of dialogue systems.

2.2.3 Fairness Measurements

In this work, we evaluate the fairness of dialogue systems in terms of four measurements: diversity, politeness, sentiment, and attribute words.

2.2.3.1 Diversity

Diversity of responses is an important measurement for evaluating the quality of a dialogue system [27]. Dull and generic responses bore users, while diverse responses make a conversation more human-like and engaging. Hence, if a dialogue model produces responses of different diversity for different groups, the experience of some users will be impacted. We measure the diversity of responses through the distinct metric [83]. Specifically, distinct-1 and distinct-2 denote the number of distinct unigrams and bigrams, respectively, divided by the total number of generated words in the responses. We report the diversity score as the average of distinct-1 and distinct-2.

2.2.3.2 Politeness

Chatbots should talk politely with human users. Offensive responses cause users discomfort and should be avoided [62, 43, 87]. Unfairness in terms of politeness exists when a dialogue model is more likely to provide offensive responses for a certain group of people than for others. In this measurement, we apply an offensive language detection model [43] to predict whether a response is offensive or not. This model is specialized to judge offensive language in dialogues. The politeness measurement is defined as the expected probability of a response to the context of a certain group being offensive. It is estimated by the ratio of the number of offensive responses over the total number of produced responses.

2.2.3.3 Sentiment

The sentiment of a piece of text refers to the subjective feelings it expresses, which can be positive, negative, or neutral. A fair dialogue model should provide responses with a similar sentiment distribution for people of different groups. In this measurement, we assess fairness in terms of sentiment in dialogue systems. We use the public sentiment analysis tool Vader [67] to predict the sentiment of a given response. It outputs a normalized, weighted composite score of sentiment ranging from -1 to 1. Since the responses are very short, sentiment analysis for short texts could be inaccurate. To ensure the accuracy of this measure, we only consider responses with scores higher than 0.8 as positive and those with scores lower than -0.8 as negative. The sentiment measures are the expected probabilities of a response to the context of a certain group being positive and negative. They are estimated by the ratio of the number of responses with positive and negative sentiment, respectively, over the total number of produced responses.
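The diversity and sentiment estimators above can be written in a few lines of Python. The sketch below uses the `vaderSentiment` package for the VADER compound score; whether the experiments used this exact package or another VADER interface is not stated, so treat the import and the helper names as assumptions rather than the chapter's actual implementation.

```python
from itertools import chain
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def distinct_n(responses, n):
    # distinct-n: number of distinct n-grams divided by the total number of
    # generated words across all responses.
    tokenized = [r.split() for r in responses]
    ngrams = chain.from_iterable(
        (tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
        for toks in tokenized)
    total_words = sum(len(toks) for toks in tokenized)
    return len(set(ngrams)) / max(total_words, 1)

def diversity(responses):
    # Reported diversity score: the average of distinct-1 and distinct-2.
    return (distinct_n(responses, 1) + distinct_n(responses, 2)) / 2

def sentiment_rates(responses, threshold=0.8):
    # Share of responses whose VADER compound score exceeds +0.8 (positive)
    # or falls below -0.8 (negative); responses in between are not counted.
    analyzer = SentimentIntensityAnalyzer()
    scores = [analyzer.polarity_scores(r)["compound"] for r in responses]
    positive = sum(s > threshold for s in scores) / len(responses)
    negative = sum(s < -threshold for s in scores) / len(responses)
    return positive, negative
```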
2.2.3.4 Attribute Words

Table 2.3: Examples of the Attribute Words.

pleasant:   awesome, enjoy, lovely, peaceful, honor, ...
unpleasant: awful, ass, die, idiot, sick, ...
career:     academic, business, engineer, office, scientist, ...
family:     infancy, marriage, relative, wedding, parent, ...

People usually hold stereotypes about some groups and think that they are more associated with certain words. For example, people tend to associate males with words related to career and females with words related to family [68]. We call these words attribute words. Here, we measure this kind of fairness in dialogue systems by comparing the probability of attribute words appearing in the responses to contexts of different groups. We build a list of career words and a list of family words to measure fairness on the (male, female) group pair. For the (white, black) group pair, we construct a list of pleasant words and a list of unpleasant words. Table 2.3 shows some examples of the attribute words; the full lists can be found in Appendix A. In this measurement, we report the expected number of attribute words appearing in one response to the context of different groups. It is estimated by the average number of attribute words appearing in all the produced responses.

2.3 Experiment

In this section, we first introduce the two popular dialogue models we study, then detail the experimental settings, and finally present the fairness results with discussions.

2.3.1 Dialogue Models

Typical chit-chat dialogue models can be categorized into two classes [27]: generative models and retrieval models. Given a context, the former generates a response word by word from scratch, while the latter retrieves a candidate from a fixed repository as the response according to some matching patterns. In this work, we investigate fairness in two representative models of the two categories, i.e., the Seq2Seq generative model [128] and the Transformer retrieval model [135].

2.3.1.1 The Seq2Seq Generative Model

Seq2Seq models are popular in sequence generation tasks [128], from text summarization and machine translation to dialogue generation. The model consists of an encoder and a decoder, both of which are typically implemented by RNNs. The encoder reads a context word by word and encodes it as fixed-dimensional context vectors. The decoder then takes the context vector as input and generates its corresponding output response. The model is trained by optimizing the cross-entropy loss with the words in the ground truth response as the positive labels. The implementation details in our experiment are as follows. Both the encoder and the decoder are implemented as 3-layer LSTM networks with hidden states of size 1,024. The last hidden state of the encoder is fed into the decoder to initialize the hidden state of the decoder. Pre-trained GloVe word vectors [104] of dimension 300 are used as the word embeddings. The model is trained through stochastic gradient descent (SGD) with a learning rate of 1.0 on 2.5 million Twitter single-turn dialogues. In the training process, the dropout rate and gradient clipping value are set to 0.1.
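A minimal PyTorch sketch of such an LSTM encoder-decoder, using the hyperparameters stated above (3 layers, hidden size 1,024, 300-dimensional embeddings, dropout 0.1), might look as follows. This is an illustrative reconstruction under those assumptions, not the model code used in the experiments; the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class Seq2SeqDialogue(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=1024,
                 num_layers=3, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers,
                               batch_first=True, dropout=dropout)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, num_layers,
                               batch_first=True, dropout=dropout)
        self.project = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids, response_ids):
        # Encode the context; the final encoder state initializes the decoder.
        _, state = self.encoder(self.embedding(context_ids))
        # Teacher forcing: the decoder reads the ground-truth response tokens.
        decoded, _ = self.decoder(self.embedding(response_ids), state)
        # Per-position vocabulary logits, trained with cross-entropy against
        # the ground-truth response (e.g., SGD with learning rate 1.0).
        return self.project(decoded)
```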
2.3.1.2 The Transformer Retrieval Model

The Transformer proposed by [135] is a novel encoder-decoder framework which models sequences by a pure attention mechanism instead of RNNs. Specifically, in the encoder, positional encodings are first added to the input embeddings to indicate the position of each word in the sequence. The input embeddings then pass through stacked encoder layers, where each layer contains a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The retrieval dialogue model only takes advantage of the encoder to encode the input contexts and the candidate responses. The model then retrieves, as the output, the candidate response whose encoding best matches the encoding of the context. The model is trained in batches of instances by optimizing the cross-entropy loss with the ground truth response as the positive label and the other responses in the batch as negative labels. The implementation of the model is detailed as follows. In the Transformer encoder, we adopt 2 encoder layers. The number of attention heads is set to 2. The word embeddings are randomly initialized and their size is set to 300. The hidden size of the feed-forward network is set to 300. The model is trained with the Adamax optimizer at a learning rate of 0.0001 on 2.5 million Twitter single-turn dialogues. In the training process, no dropout is used, and the gradient clipping value is set to 0.1. The candidate response repository is built by randomly choosing 500,000 utterances from the training set.

2.3.2 Experimental Settings

In the experiment, we focus only on single-turn dialogues for simplicity. We use a public conversation dataset that contains around 2.5 million single-turn conversations collected from Twitter to train the two dialogue models. The models are trained under the ParlAI framework [100]. To build the data for evaluating fairness, we use another Twitter dataset which consists of around 2.4 million single-turn dialogues. For each dialogue model, we construct a dataset that contains 300,000 parallel context pairs as described in Section 2.2.2. When evaluating the diversity, politeness, and sentiment measurements, we first remove repetitive punctuation from the produced responses, since it interferes with the performance of the sentiment classification and offense detection models. When evaluating with the attribute words, we lemmatize the words in the responses through the WordNet lemmatizer in the NLTK toolkit [9] before matching them with the attribute words.

Table 2.4: Fairness in terms of Gender.

                                           Responses by the Seq2Seq        Responses by the Transformer
                                           generative model                retrieval model
                                           Male     Female   Diff. (%)     Male     Female   Diff. (%)
Diversity (%)                              0.1930   0.1900   +1.5544       3.1831   2.4238   +23.8541
Offense Rate (%)                           36.7630  40.0980  -9.0716       0.2108   0.2376   -12.6986
Sentiment: Positive (%)                    2.6160   2.5260   +3.4404       0.1168   0.1088   +6.8242
Sentiment: Negative (%)                    0.7140   1.1490   -60.9243      0.0186   0.0196   -5.4868
Ave. Career Word Numbers per Response      0.0059   0.0053   +9.5076       0.0208   0.0156   +25.0360
Ave. Family Word Numbers per Response      0.0342   0.0533   -55.9684      0.1443   0.1715   -18.7985

Table 2.5: Fairness in terms of Race.

                                           Responses by the Seq2Seq        Responses by the Transformer
                                           generative model                retrieval model
                                           White    Black    Diff. (%)     White    Black    Diff. (%)
Diversity (%)                              0.2320   0.2210   +4.7413       4.9272   4.3013   +12.7030
Offense Rate (%)                           26.0800  27.1030  -3.9225       12.4050  16.4080  -32.2692
Sentiment: Positive (%)                    2.5130   2.0620   +17.9467      10.6970  9.6690   +9.6102
Sentiment: Negative (%)                    0.3940   0.4650   -18.0203      1.3800   1.5380   -11.4493
Ave. Pleasant Word Numbers per Response    0.1226   0.1043   +14.9637      0.2843   0.2338   +17.7530
Ave. Unpleasant Word Numbers per Response  0.0808   0.1340   -65.7634      0.1231   0.1710   -38.9097
2.3.3 Experimental Results

We first present the results of fairness in terms of gender in Table 2.4. We feed the 300,000 parallel context pairs of the (male, female) data into the dialogue models and evaluate the produced responses with the four measurements. We make the following observations from the table:

• For the diversity measurement, the retrieval model produces more diverse responses than the generative model. This is consistent with the fact that the Seq2Seq generative model tends to produce dull and generic responses [83], while the responses of the Transformer retrieval model are more diverse since all of them are human-made ones collected in the repository. We observe that both models produce more diverse responses for males than for females, which demonstrates that dialogue systems are unfair in terms of diversity.

• In terms of the politeness measurement, females receive more offensive responses from both dialogue models. The results show that dialogue systems talk to females in a more unfriendly way than to males.

• As for sentiment, the results show that females receive more negative and fewer positive responses.

• For the attribute words, more career words appear in the responses for males and more family words appear in the responses for females. This is consistent with the stereotype that males dominate the field of career while females are more family-minded.

Then we show the results of fairness in terms of race in Table 2.5. Similarly, 300,000 parallel context pairs of (white, black) are fed into the dialogue models. From the table, it can be observed that:

• Black people receive less diverse responses from the two dialogue models, which demonstrates unfairness in terms of diversity for races.

• The dialogue models tend to produce more offensive language for black people.

• In terms of the sentiment measurements, black people receive more negative responses but fewer positive responses.

• As for the attribute words, unpleasant words appear more frequently for black people, while white people are associated more with pleasant words.

To summarize, the dialogue models trained on real-world conversation data indeed exhibit unfairness similar to that in the real world in terms of gender and race. Given that dialogue systems have been widely applied in our society, it is strongly desirable to handle the fairness issues in dialogue systems.

2.4 Related Work

Existing works attempt to address the issue of fairness in various Machine Learning (ML) tasks such as classification [150, 75], regression [7], graph embedding [22] and clustering [3, 28]. Below, we briefly introduce related works which study fairness issues in NLP tasks.

Word Embedding. Word embeddings often exhibit stereotypical human bias for text data, causing serious risk of perpetuating problematic biases in imperative societal contexts. Popular state-of-the-art word embeddings regularly mapped men to working roles and women to traditional gender roles [18], leading to methods for ensuring the impartiality of embeddings for gender-neutral words. In [18], a 2-step method is proposed to debias word embeddings.
In [158], it is proposed to modify GloVe embeddings by saving gender information in some dimensions of the word embeddings while keeping the other dimensions unrelated to gender.

Sentence Embedding. Several works attempt to extend the research on detecting biases in word embeddings to sentence embeddings by generalizing bias-measuring techniques. In [94], the Sentence Encoder Association Test (SEAT), based on the Word Embedding Association Test (WEAT) [68], is introduced in the context of sentence encoders. The test is conducted on various sentence encoding techniques, such as CBoW, GPT, ELMo, and BERT, concluding that there is varying evidence of human-like bias in sentence encoders, although BERT, a more recent model, is more immune to biases.

Coreference Resolution. The work [156] introduces a benchmark called WinoBias to measure gender bias in coreference resolution. To eliminate the biases, a data-augmentation technique is proposed in combination with word2vec debiasing techniques.

Language Modeling. In [19], a metric is introduced for measuring gender bias in text generated by a language model trained on a text corpus, along with the bias in the training text itself. A regularization loss term is also introduced, aiming to minimize the projection of embeddings trained by the encoder onto the embedding of the gender subspace, following the soft debiasing technique introduced in [18]. The authors conclude that, based on their evaluation of the effectiveness of their method in reducing gender bias, reducing bias comes with a compromise on perplexity.

Machine Translation. In [107], it is shown that Google's translation system can suffer from gender bias: sentences taken from the U.S. Bureau of Labor Statistics are rendered in a dozen gender-neutral languages, including Yoruba, Hungarian, and Chinese, then translated into English, showing that Google Translate exhibits favoritism toward males for stereotypical fields such as STEM jobs. In [19], the authors use existing debiasing methods for word embeddings to remove the bias in machine translation models. These methods not only help them mitigate the existing bias in their system, but also boost its performance by one BLEU score.

2.5 Conclusion

In this paper, we have investigated the fairness issues in dialogue systems. In particular, we formally define fairness in dialogue systems and introduce four measurements to evaluate the fairness of a dialogue system quantitatively: diversity, politeness, sentiment, and attribute words. Moreover, we construct data to study gender and racial biases in dialogue systems. Finally, we conduct detailed experiments on two types of dialogue models (i.e., a Seq2Seq generative model and a Transformer retrieval model) to analyze their fairness issues. The results show that there exist significant gender- and race-specific biases in dialogue systems. Given that dialogue systems are widely deployed in various commercial scenarios, it is urgent to resolve their fairness issues. In the future, we will continue this line of research and focus on developing debiasing methods for building fair dialogue systems.

CHAPTER 3 DETECTING AND EXAMINING GENDER BIAS IN THE NEWS

To attract unsuspecting readers, news article headlines and abstracts are often written with speculative sentences or clauses.
Male dominance in the news is very evident, whereas females are seen as "eye candy" or "inferior", and are underrepresented and under-examined within the same news categories as their male counterparts. In this paper, we present an initial study on gender bias in news abstracts in two large English news datasets used for news recommendation and news classification. We perform three large-scale, yet effective text-analysis fairness measurements on 296,965 news abstracts. In particular, to our knowledge, we construct two of the largest benchmark datasets to date: a possessive (gender-specific and gender-neutral) nouns dataset and an attribute (career-related and family-related) words dataset, which we will release to foster bias and fairness research and to aid in developing fair NLP models that eliminate the paradox of gender bias. Our studies demonstrate that females are immensely marginalized and suffer from socially-constructed biases in the news. This paper devises a methodology whereby news content can be analyzed on a large scale utilizing natural language processing (NLP) techniques from machine learning (ML) to discover both implicit and explicit gender biases.

3.1 Introduction

In recent years, online newspapers have grown in popularity compared to traditional "printed" newspapers [141]. A benefit of online news is that news articles are constantly updating; furthermore, news titles and abstracts are regularly taken into consideration when recommending news to quickly attract users [44]. However, to attract the attention of users, rich textual information such as news titles and abstracts presents various forms of media bias, such as ideological bias (i.e., biased articles that attempt to promote a particular opinion on a topic), coverage bias (i.e., media coverage in regards to the visibility of topics or entities), selection bias, and presentation bias [59], thus contributing to the problem of gender bias. Since the 1950s, there have been studies on biased news reporting [137]. Media bias is both intentional, as it reflects a conscious act, and sustained, presenting a systematically biased tendency [139]. Male dominance is well documented, and in news articles men are always depicted as leaders while women are depicted as 'inferior' or as 'eye candy' [81]. Nevertheless, consumers of online news services are attracted to novelty and/or differences such as skin color, ethnicity, gender identity, or sexual orientation, which creates an ingrained feeling of interest or curiosity that may result in chronic socially-constructed biases.

News articles are often written with speculative sentences or clauses to capture a reader's attention [47], and thus play a crucial role in shaping public and personal opinions on public affairs and political issues [59]. An example of explicit informational bias in gender-specific (male and female) job promotion news titles is "Women who want to succeed at work should shut up - while men who want the same should keep talking, research says", compared to "Men have been promoted 3 times more than women during the pandemic, study finds". In this example, the titles present enough information about the articles' body content; however, in some cases a title such as "Women in the workplace." may not carry enough textual information, whereas an abstract gives a quick overview of the news article and therefore contains sufficient information to indicate the presence of gender bias.
Although online news recommendations [141, 44] continuously provide novel news stories, their textual information demonstrates and constitutes socially-constructed biases. Women represent nearly half of the world's population, yet they are greatly under-examined and underrepresented in news stories [81]. Those considered newsworthy (politicians, CEOs, engineers, doctors, pilots, basketball players, and so on) are often men. When women are considered newsworthy, they are often presented as sexual beings, for their bodies, motherhood, and/or being supportive wives [69, 70]. In short, news media heavily influences gender roles in society by serving as a basis for stereotypes, which results in the reinforcement of social inequalities, thereby conveying categorical barriers and thus controlling one's self-identity and determining one's position in a hierarchical taxonomy.

Natural language processing (NLP) techniques and systems aim to learn from natural language data, and mitigating social biases becomes a compelling matter not only in machine learning (ML) but for social justice as well. Sentence and word embeddings are popular NLP tools that capture the semantic similarities of sentences and words and display human-like societal biases [19, 68, 18, 94], whereas text classification [154], also known as text tagging, is the computational process of categorizing texts into groups. Several NLP text classifiers can assign a set of predefined tags by automatically analyzing texts based on their textual information. Previous works have taken different approaches to address the issue of gender bias by detecting the male/female ratio of images [69, 70], measuring fairness in dialogue systems [86, 42], language modeling [20], machine translation [29], and coreference resolution [157].

In this work, we conduct an innovative study of bias issues in gender representation in news abstracts in two large English news datasets, i.e., the MIND dataset [141] and a News Category Dataset [101], two large-scale, high-quality news datasets constructed for news recommendation and news classification. Our goals are to detect and examine the phenomena of implicit (i.e., bias that is implied and not stated directly) and explicit (i.e., bias that is plainly stated) gender bias in the abstracts of news articles, and to gain an understanding of gender representation in the news by examining the relationships between social hierarchies and news content. Our motivation is to identify how several forms of bias, such as coverage bias, selection bias, and presentation bias, contribute to the problem of gender bias. As gender fairness in news articles is an important problem, we analyze representational harms such as ideological bias, which disseminates adverse generalizations about women. Our main contributions are as follows:

1. We construct two large benchmark datasets: (1) a possessive (gender-specific and gender-neutral) nouns dataset and (2) an attribute (career-related and family-related) words dataset to study gender bias, and we will release them to foster both bias and fairness research;

2. We systematically conduct large-scale analyses of each news corpus to detect and examine gender biases in distribution, content, and labeling and word choice;

3. We demonstrate that there exist conclusive socially-constructed biases in regards to gender by introducing a series of measurements to better understand gender representation in news articles quantitatively and qualitatively.
3.2 Related Works

The elimination of gender discrimination is an important issue that contemporary society is facing. Gender bias is reflected in various behaviors of people, among which language is one of the most powerful means of expressing sexism [82, 99]. Existing works analyze gender bias in the language of different fields. [93] discuss the gender stereotypes reflected in job evaluation language such as letters of recommendation for academic positions. [52] analyze the gendered wording used in job advertisements and discuss how it reflects gender inequality. In the field of education, gender bias in high school textbooks [2] and computer science education materials [96] has been studied. [99] investigate gender bias in general language usage. The authors discuss two types of gender bias in languages: the unfair lexical choices caused by gender stereotypes and the sexism embedded in language structures, including grammatical and syntactical rules. The authors emphasize the beneficial effects of gender-fair linguistic expressions and suggest mitigating gender bias by using them. Recently, [105] extended this line of research to the field of law, studying the gender bias reflected in the language of court decisions. As a pioneering work, we investigate the gender bias in news language in this paper to promote gender equality in the field of journalism.

Man-made text data are widely used to train machine learning models for various NLP tasks. Learning from human behaviors, NLP models have been proven to inherit prejudices from humans [98, 92]. Existing works attempt to address the issue of fairness in various NLP tasks such as text classification [103, 21, 152], word embedding [18, 159, 53], coreference resolution [157, 116], language modeling [20], machine translation [49], semantic role labeling [155], dialogue generation [86, 89], etc. In this paper, we are committed to a better understanding of gender bias in news texts, thus contributing to building fair NLP models trained on such data, such as news recommender systems, news classifiers, and fake news detection models.

3.3 Datasets

We first collect two English news datasets [32], i.e., the MIND Dataset (MIND) and a News Category Dataset (NCD). In our corpus, we retrieved 363,385 news articles, and thus 363,385 news titles. As previously mentioned in Section 3.1, some titles do not present enough informational content about an article's body to attract users, hence the notion of analyzing the abstracts of each news article. We later extract a total of 296,965 news abstracts from the 363,385 news articles.

Table 3.1: Gender distribution test on the news datasets.

Dataset   Abstracts   Categories   M        F
MIND      96,112      18           22,760   6,817
NCD       200,853     41           21,250   15,856

Table 3.2: Illustration of four intersecting career words (prefixes) across the two datasets for females compared to their respective male counterparts. The results are reported in terms of the number of gender-specific career words mentioned in each dataset per gender with their corresponding -man/-woman suffixes.

                     MIND                 NCD
Career Words         # Man    # Woman     # Man    # Woman
Spokes               192      121         112      42
Congress             191      49          94       25
Chair                225      20          102      5
Business             66       3           31       4
Following this, and inspired by [86], we develop two large word datasets: (1) a possessive nouns dataset, a benchmark of 465 non-offensive masculine and feminine gender-specific possessive nouns (see Appendix B.1.1 and B.1.2), and (2) an attribute words dataset, a benchmark of 357 masculine, feminine, and neutral career-related and family-related words (see Appendix B.2.1 and B.2.2). We then conduct three experiments to detect and examine bias across the two news datasets, which we detail below:

– MIND: The MIND dataset was collected from the Microsoft News website. Wu et al. [141] randomly sampled news for 6 weeks, from October 12th to November 22nd, 2019, to create two datasets, MIND and MIND-small, together totaling 161,013 news articles. Each news article contains a news ID, a category label, a title, and a body (URL); however, not every article contains an abstract, resulting in 96,112 abstracts. We used the training set (the largest set of news articles) since both the validation and test sets are assumed to be subsets of the training set. MIND was created to serve as a news recommendation benchmark dataset.

– NCD: The NCD dataset [101] was collected from HuffPost. The news articles were sampled from news headlines from 2012 to 2018, totaling 202,372 news articles. Each news article contains a category label, headline, authors, link, and date; however, not every article contains a short description (abstract), resulting in 200,853 abstracts. NCD serves as a news classification and recommendation benchmark dataset.

3.4 Bias in Gender Distribution

In this section, we explore the gender distribution in news abstracts across the two datasets to determine the presence of category bias and occupational bias by identifying words from our possessive nouns and attribute words datasets (see Appendix B.1 and B.2).

3.4.1 Gender Distribution

Gender distribution refers to the gender diversity in the abstracts of each news dataset. It is a simple yet key measurement of the balance between males and females in each dataset. Given an abstract containing one or more sentences or clauses with gender identity terms, the intuition is to assign a gender label, male (M), female (F), or neutral (N), to each abstract. This quantification then gives the proportion of males to females in each news category. Hence, we label each news abstract with one of three possible labels: (1) M if the abstract contains more masculine possessive nouns, (2) F if the abstract contains more feminine possessive nouns, and (3) N if the abstract contains none, or the same number, of masculine and feminine possessive nouns. For neutral (N) cases, we disregard unisex gender nouns (e.g., baby, child, employee, worker) and people's names, since names can be unisex, pet names, nicknames, or stage names for both males and females (e.g., Max, Dylan, Jamie, Jordan, Blake, Taylor).

3.4.2 Experiment

In this measurement, we investigate the gender distribution of male to female abstracts across the two news datasets. We first calculate the gender distribution in each dataset by parsing each sentence or clause of each abstract for gender identity terms and assigning a label, i.e., male (M), female (F), or neutral (N).
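To make the labeling rule concrete, the following is a minimal Python sketch of the M/F/N assignment described above. The small word sets and the function name are illustrative placeholders, not the released Appendix B.1 lists.

# A minimal sketch of the abstract-level gender labeling rule of Section 3.4.1.
import re

MASCULINE_NOUNS = {"he", "him", "his", "man", "men", "father", "husband", "spokesman"}
FEMININE_NOUNS = {"she", "her", "hers", "woman", "women", "mother", "wife", "spokeswoman"}

def label_abstract(abstract: str) -> str:
    """Return 'M', 'F', or 'N' by counting gendered possessive nouns."""
    tokens = re.findall(r"[a-z']+", abstract.lower())
    m = sum(tok in MASCULINE_NOUNS for tok in tokens)
    f = sum(tok in FEMININE_NOUNS for tok in tokens)
    if m > f:
        return "M"
    if f > m:
        return "F"
    return "N"  # none, or an equal number, of masculine and feminine nouns

print(label_abstract("The senator said she would meet her supporters on Friday."))  # F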
As previously mentioned, to determine the gender label of an abstract we assign one of three possible labels, M if the abstract contains more masculine possessive nouns, and otherwise F or N, using the total of 465 masculine and feminine gender-specific and gender-neutral possessive nouns. Table 3.1 presents the results of the gender distribution test on the news datasets in terms of the total number of abstracts, the categories per dataset, and the number of gender-tagged abstracts. One can observe that the distribution results from MIND are quite distressing, as female abstracts are greatly underrepresented. In NCD, female abstracts are not overly underrepresented; nonetheless, NCD possesses the largest number of categories, motivating us to investigate the category distribution of our now-labeled gender-tagged news abstracts.

We examine the gender distribution across each category to identify whether a large proportion of gender-biased topics (e.g., Politics) exists. As previously mentioned, MIND was collected over a period of 6 weeks and consists of 18 categories, whereas NCD was collected over a period of 6 years and consists of 41 categories. We observe that F-tagged abstracts are not underrepresented in NCD because females are over-represented in particular categories. We discover that the top 3 F-tagged categories for NCD are Style & Beauty, Parenting, and Entertainment, which account for over 36% of the news reported across its 41 categories. This confirms that, in news articles collected over more than half a decade, females are often presented in the news for motherhood and are indeed often referred to for their physical characteristics.

Inspired by two recent works [160, 86], we construct an exhaustive list of career words to further explore the working-class distribution and establish a sense of occupational mentions across the two datasets. This set is created from the combination of occupational (career-related) words from Appendix B.1.1 and B.1.2 (see Appendix B.2.1). Unlike [86], we do not use generic gender-neutral career words such as engineer, dentist, or lawyer; instead, we use gender-specific career words such as policeman, chairman, and spokesman, along with their respective female counterparts. Table 3.2 illustrates the top four intersecting career words for F compared to the corresponding M gender-specific career words across the two datasets. Here, we see that within the news, women suffer from several biases and are under-examined in regards to being acknowledged in the working class.

Table 3.3: The average number of attribute words observed in each news abstract.

Dataset | Gender | Diversity (%) | Avg. Career Words per Abstract | Avg. Family Words per Abstract
MIND    | M      | 23.68         | 0.1258                         | 0.6406
MIND    | F      | 7.09          | 0.0907                         | 0.6954
NCD     | M      | 10.22         | 0.0657                         | 0.4431
NCD     | F      | 7.88          | 0.0554                         | 0.4723

3.5 Bias in Content

In this section, we investigate the occurrence frequency of career-related words and family-related words in news abstracts of different genders, where specific words reflect socially-constructed stereotypes of the genders, such as females being excessively associated with family words rather than career words.

3.5.1 Attribute Words

In society, there are socially-constructed stereotypes that heavily entail gender roles, i.e., a specific gender is more anticipated with certain words. For example, society tends to identify males with career-related words and females with family-related words [25]. Words that influence gender roles in society are known as attribute words.
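As a concrete sketch of how these attribute words are used in the measurement described next (Section 3.5.2), the per-abstract averages reported in Table 3.3 can be computed roughly as follows. The small word sets are illustrative placeholders for the Appendix B.2 lists, and the helper name is hypothetical.

# Average number of career-related and family-related attribute words per
# gender-tagged abstract (a sketch of the Section 3.5.2 measurement).
import re
from collections import defaultdict

CAREER_WORDS = {"spokesman", "spokeswoman", "chairman", "chairwoman", "manager", "engineer"}
FAMILY_WORDS = {"mother", "father", "wife", "husband", "wedding", "family"}

def attribute_word_averages(tagged_abstracts):
    """tagged_abstracts: iterable of (gender_label, abstract_text) pairs."""
    totals = defaultdict(lambda: {"career": 0, "family": 0, "n": 0})
    for label, text in tagged_abstracts:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[label]["career"] += sum(t in CAREER_WORDS for t in tokens)
        totals[label]["family"] += sum(t in FAMILY_WORDS for t in tokens)
        totals[label]["n"] += 1
    return {label: {"avg_career": v["career"] / v["n"], "avg_family": v["family"] / v["n"]}
            for label, v in totals.items() if v["n"]}

demo = [("M", "The chairman praised the team manager."),
        ("F", "She celebrated the wedding with her mother and family.")]
print(attribute_word_averages(demo))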
We use these attribute words to measure fairness in each gender-tagged news abstract by comparing the average number of attribute words that appear in abstracts of each label. Inspired by recent works [25, 86], we construct a more exhaustive list of attribute words: the career words list consists of both gender-specific and gender-neutral occupational (career-related) words, and the family words list consists of both gender-specific and gender-neutral family-related words (see Appendix B.2.1 and B.2.2).

3.5.2 Experiment

In this measurement, we explore the average number of attribute words that appear in each gender-tagged abstract, drawn from a total of 357 masculine, feminine, and neutral career-related and family-related words. As previously mentioned, females are excessively associated with family-related words rather than career-related words, unlike men, who are typically associated with career-related words. The bias measurement is straightforward yet fundamental, as it examines the occurrence frequency of career-related and family-related words in each gender-tagged news abstract to demonstrate the existence of socially-constructed stereotypes. To do so, we check both subsets of attribute words simultaneously.

Table 3.3 presents the gender diversity, which is simply the percentage of gender-tagged abstracts in each dataset, and the average number of attribute words observed per abstract across both news datasets. One can observe that the diversity results from MIND are poor as a result of females being greatly underrepresented in the news, whereas NCD has a diversity difference of 2.84% due to the over-representation of categories such as Style & Beauty, Parenting, and Entertainment, which account for over a third of the NCD dataset. We observe that males are more often associated with career-related words on average, while females are heavily and regularly associated with family-related words. These results are dismal; since both genders are equally intelligent and equally able to advance in business, these values should be similar across genders.

3.6 Bias in Wording

In this section we attempt to identify the influential terms, i.e., the textual “centers” of the gender-tagged abstracts, by applying two algorithms: (1) Sentiment Analysis, to investigate the sentiment of the contextual information used to describe different genders across both news datasets, and (2) Centering Resonance Analysis, to discover the central nouns that contribute most to the meaning of a document or corpus.

3.6.1 Sentiment Analysis

The sentiment of an abstract is crucial for examining whether the opinions conveyed by the columnist are negative (Neg.), neutral (Neu.), or positive (Pos.). We apply the popular, well-known sentiment analysis tool VADER [66] to measure the sentiment of each news abstract. VADER computes a normalized, weighted compound score for each sentence by summing the valence scores of its words, yielding a value between -1 (extremely negative) and +1 (extremely positive). As abstracts usually consist of one or more sentences, we split each abstract into sentences and operate at the sentence level using the NLTK sentence tokenizer.
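As a concrete illustration of this sentence-level scoring, and of the majority rule described next, the following is a minimal sketch using NLTK's VADER implementation. The compound-score cutoffs of ±0.05 are the conventional VADER thresholds and are an assumption here, not a value reported in this chapter.

# Sentence-level sentiment for a news abstract with NLTK's VADER, aggregated
# to an abstract-level label by majority vote (a sketch of Section 3.6.1).
import nltk
from nltk.tokenize import sent_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("punkt", quiet=True)
nltk.download("vader_lexicon", quiet=True)

sia = SentimentIntensityAnalyzer()

def abstract_sentiment(abstract: str) -> str:
    labels = []
    for sentence in sent_tokenize(abstract):
        compound = sia.polarity_scores(sentence)["compound"]
        if compound >= 0.05:        # conventional VADER threshold (assumption)
            labels.append("pos")
        elif compound <= -0.05:
            labels.append("neg")
        else:
            labels.append("neu")
    # Majority vote; ties fall back to neutral, mirroring the rule in the text.
    if labels.count("neg") > max(labels.count("pos"), labels.count("neu")):
        return "negative"
    if labels.count("pos") > max(labels.count("neg"), labels.count("neu")):
        return "positive"
    return "neutral"

print(abstract_sentiment("He finally got the promotion he so longed for! "
                         "Unfortunately, his wife filed for divorce that same day."))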
Therefore, if an abstract contains more negative sentences than positive and neutral ones, we treat the abstract as negative, and otherwise as neutral or positive. An example of a neutral abstract is, “An auction of shares in Google, the web search engine which could be floated for as much as $36bn, takes place on Friday”. For oxymoronic cases where the number of positive and negative sentences is the same, e.g., “He finally got the promotion he so longed for! Unfortunately, his wife filed for divorce that same day.”, we treat the abstract as neutral. When an abstract contains only one sentence, we simply place its compound score within the respective positive, negative, or neutral threshold.

3.6.2 Centering Resonance Analysis

Corman et al. [31] contrast three objectives of computational text analysis: inference, positioning, and representation [111]. The authors argue that many machine learning (ML) algorithms must be trained on a corpus before being applied, and that popular models such as Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA) reduce a given text to a vector lying in a shared semantic space. However, this encourages a narrow domain, because the quality of the spatial construction results in a loss of information. There is therefore a need for a representative method that can accomplish all three objectives. Centering Resonance Analysis (CRA), first proposed by Corman et al. [31], is a word-network method that constructs a network representation of correlated words. It exploits rich textual data and expresses the intentions and meandering behaviors of authors (or columnists) [111]. CRA is able to determine textual “centers” without dictionaries or corpus training, i.e., it identifies the central nouns that contribute most to the meaning of a document or corpus.

3.6.3 Experiment

In this measurement, we illustrate two representative text networks depicting the most central noun phrases for the combined gender-tagged abstracts. This bias measurement examines the compound noun phrases that are most prevalent for each gender. We measure the noun similarities between the two types of gender-tagged abstracts by combining both news datasets, i.e., MIND and NCD, and calculating the resonance of the combined MIND+NCD dataset. We first apply the sentiment analysis tool to predict the sentiment of each sentence in each M- and F-tagged abstract, i.e., when there are more positive sentences than negative and neutral ones, the abstract is treated as positive, and otherwise as negative or neutral. We then aggregate the positive abstracts for both M and F, as these reflect the most constructive attention given to males and females. We neglect the negative and neutral abstracts because we assume their common noun phrases would be generic words used in adverse news articles, for example, killer, murderer, and so on. After aggregating a total of 23,795 positive abstracts, we first remove stopwords, as they capture little to no semantic information and, more importantly, removing them reduces computational complexity. We then implement two algorithms: (1) the NLTK part-of-speech tagger (POS tagger), specifically a Penn Treebank tagger, to identify compound noun phrases, nouns, and adjectives;
however, since we are solely interested in nouns, we examine only those; and (2) NetworkX, for detecting and analyzing network centrality, and hence identifying the textual centers of each dataset.

Figure 3.1: The resulting CRA network for the top 20 nouns in the M tagged abstracts.

Figure 3.2: The resulting CRA network for the top 20 nouns in the F tagged abstracts.

Figures 3.1 and 3.2 present the CRA network results of the most central and/or compound nouns found in the M- and F-tagged abstracts, respectively. Note that a total of 33,871 distinct nouns are prominent in the structuring of the text. The full network construction became computationally expensive and offered little explainability due to its denseness. Therefore, we address the dense-network issue by constructing CRA networks for the top 20 compound nouns (highest resonance scores) for both sets of gender-tagged abstracts. Each graph illustrates the positive nouns that contribute the most to specific topics of the abstracts according to their respective textual centers. The results are utterly disappointing: females (F-tagged abstracts) are undoubtedly heavily associated with family words, whereas males (M-tagged abstracts) are often associated with political and occupational terms. The top 20 words females are densely associated with are mother, wife, beloved, happy, home, wedding, family, beauty, son, child, toddler, baby, 1-year-olds, aisle, planned, deposits, money, products, hygiene and influencer, respectively. Males, in turn, are associated with president, Washington, manager, economy, mayor, sports, democratic, impeachment, career, gym, trump, football, coach, hero, touchdowns, quarterback, game, win and college, respectively. Thus, there exists a strict gender dichotomy between men and women. Even though women succeed at clichéd male tasks, the nouns found in F-tagged abstracts demonstrate that women are underrepresented and under-examined in the news.

3.7 Conclusion

In this paper, we have shown that gender bias in the media appears in different forms, such as ideological bias, coverage bias, selection bias, and presentation bias. We discussed that, to secure users' attention, news titles and abstracts are typically written with contentious sentences or clauses. We conducted a pioneering initial study of implicit and explicit gender bias in news abstracts from two benchmark news recommendation and news classification datasets, and conclude that gender bias has been present in the news for decades. By systematically conducting large-scale analyses of both news corpora, we detected and examined gender bias in the form of (1) bias in gender distribution across all news categories, including exploring the top four intersecting career-word prefixes for females compared to their respective male counterparts; (2) bias in content in terms of our two word datasets, (a) the possessive nouns dataset, which contains a total of 465 masculine and feminine gender-specific and gender-neutral possessive nouns, and (b) the attribute words dataset, which contains a total of 357 masculine, feminine, and neutral career-related and family-related words; and (3) bias in wording, by constructing CRA networks for the top 20 most central nouns for both sets of gender-tagged abstracts, where each graph illustrates the compound nouns that contribute the most to the meaning of the gender-tagged abstracts. Although women account for nearly half of the world's population, they are incredibly under-examined and underrepresented in the news.
We can immediately deduce that in both datasets, categories such as Politics and Business contain the largest measure of gender bias, as females are immensely under-examined and underrepresented in these areas. Male dominance is prevalent and thoroughly documented while women are depicted as ‘family oriented’; consequently, we observe that news media heavily influences gender roles in society, with females (F-tagged abstracts) undoubtedly heavily associated with family words while males (M-tagged abstracts) are often associated with political and occupational terms. Many disciplines, such as sociology, social psychology, and sociolinguistics, study how language (and written text) plays a crucial role in upholding social hierarchies.

In addition, we construct two large benchmark datasets, a possessive (gender-specific and gender-neutral) nouns dataset and an attribute (career-related and family-related) words dataset, to study the paradox of gender bias, and we will release our gendered-word datasets to foster both bias and fairness research in multiple domains, such as branches of computer science and computational social science, which would help build fair NLP models by eliminating gender bias. Since we focused on the M/F-tagged abstracts of both news datasets, there is still a need to address the socially-constructed gender biases in the news with regard to public affairs and politics. Future work may include building fair NLP models trained on our two large benchmark possessive nouns and attribute words datasets. These new datasets will be monitored and updated, and can therefore be directly applied to NLP tasks such as text classification, word embeddings, coreference resolution, language modeling, machine translation, semantic role labeling, dialogue generation, etc.

CHAPTER 4 DETECTING HARMFUL ONLINE CONVERSATIONAL CONTENT TOWARDS LGBTQIA2S+ INDIVIDUALS

Warning: Due to the overall purpose of the study, this paper contains examples of stereotypes, profanity, vulgarity and other harmful language in figures and tables that may be triggering or disturbing to LGBTQIA2S+ individuals, activists and allies, and may be distressing for some readers.

Online discussions, panels, talk page edits, etc., often contain harmful conversational content, i.e., hate speech, death threats and offensive language, especially towards certain demographic groups. For example, individuals who identify as members of the LGBTQIA2S+ community and/or as BIPOC (Black, Indigenous, People of Color) are at higher risk of abuse and harassment online. In this work, we first introduce a real-world dataset that enables us to study and understand harmful online conversational content. Then, we conduct several exploratory data analysis experiments to gain deeper insights from the dataset. We later describe our approach for detecting harmful online Anti-LGBTQIA2S+ conversational content, and finally, we implement two baseline machine learning models (i.e., Support Vector Machine and Logistic Regression) and fine-tune 3 pre-trained large language models (BERT, RoBERTa, and HateBERT). Our findings verify that large language models can achieve very promising performance on online Anti-LGBTQIA2S+ conversational content detection tasks.
4.1 Introduction Harmful online content from real-word conversations has become a major issue in today’s society, even though queer people often rely on the sanctity of online spaces to escape offline abuse [35, 37]. However, individuals who may oppose, criticize, or possess contradictory feelings, beliefs, or motivations towards certain communities constitute discrimination, harassment and abuse in the form of hate speech, abusive and offensive language use [35, 37, 12, 8]. Unfortunately, this issue results in the maintenance and sustenance of harmful stereotypical societal biases. Online conversational toxicity, death threats and other harmful languages can prevent people from genuinely expressing themselves out of fear of abuse and/or harassment, or encourage self-harm. 32 Conversations pertaining to members of the LGBTQIA2S+ community may lead to increased feelings of marginalization of an already marginalized community. Consequently, social media remains a hostile, exclusive, restrictive, and controlling environment for gender and sexual orientation, race, and LGBTQIA2S+ individuals, activists and allies [108, 35, 163], despite substantial progress on LGBTQIA2S+ rights causing a massive surge in negative online behaviors [39]. Accordingly, there has been an alarming increase in hate speech and abusive language instances toward the LGBTQIA2S+ community [108]. Gender is a spectrum, where the now LGBTQIA2S+ acronym continue to expand in an attempt to include all gender and sexual identities, for example, SGL, QPOC/QTPOC, QUILTBAG, etc. There may be as many definitions as there are people. Nonetheless, the LGBTQIA2S+ community not only considers gender identity, but encapsulates a multiplicity of sexual orientations, and relationships which are integral aspects of our everyday society, yet they lead to discrimination, harassment and abuse both offline and online [26]. As conversational platforms struggle to effectively facilitate harmful conversations, there is a dire need to create a safe and inclusive place that welcomes, supports, and values all LGBTQIA2S+ individuals (with the exception of dating sites and mobile dating applications) [115] by better facilitating Anti-LGBTQIA2S+ conversational content [108]. With the expansive growth of artificial intelligence (AI) and natural language processing (NLP) technologies, both researchers and practi- tioners can perform large-scale analysis, which aids in the automatic interpretation of unstructured text data, e.g., social media comments. To address the above challenges, in this paper, we aim to intersect NLP and queerness by implementing both machine learning (ML) and large language models (LLMs) models to readily identify and detect online Anti-LGBTQIA2S+ conversational content. Contributions. The contributions of this work are threefold: 1) we adapt the gender orientation (LGBTQIA2S+, straight) dimension from RedditBias [5] for the task of harmful conversational content detection to study stereotypical societal biases against LGBTQIA2S+ individuals by 33 Table 4.1: Queerness examples comments from RedditBias. Orientation Comments lesbian, gay, bisexual, transgender, asexual, pansexual, ... gay people are faggots and carry a bunch of other diseases with them. trans lesbians are mentally ill men identifying as women. doesnt sound normal to me bisexuals are confused. just about life in general Table 4.2: Automated labeled queerness (shortened) example comments from Table 4.1 using Detoxify. Comments gay people are faggots... 
trans lesbians are mentally ill... bisexuals are confused... Toxicity Severe Toxicity Obscene Threat 0.014 0.01 0.017 0.995 0.949 0.977 0.214 0.042 0.084 0.882 0.231 0.271 Insult 0.953 0.446 0.674 Identity Attack 0.777 0.711 0.831 Table 4.3: Harmful and non-harmful comment counts w.r.t each label for a total of 9930 comments. Label Toxicity Severe Toxicity Obscene Threat 185 9745 7529 2401 1590 8340 28 9902 1 0 Insults 2244 7686 Identity Attack 4494 5436 implementing a multi-headed BERT-based toxic comment detection model [60] to identify several forms of toxicity; (2) a detailed human evaluation of our human annotators to ensure data quality (see Appendix C.1 for details); and (3) we construct a large multi-labelled classification dataset for a total of 6 distinct labels to distinguish several forms of toxicity. To the best of our knowledge, our dataset is the first such dataset created for both binary and multi-label classification of 6 distinct labels for automated harmful conversational content detection to study stereotypical societal biases against LGBTQIA2S+ individuals. We release our labeled dataset for future shared tasks in hopes that AI and NLP practitioners develop and deploy safe, LGTBQIA+ inclusive technologies to readily identify and remove harmful online conversational content geared toward the LGBTQIA2S+ community. We will release the both the multi-label dataset with all code at: https://github.com/daconjam/Harmful-LGBTQIA. 4.2 Preliminaries In this section, we introduce some preliminary knowledge about the problem under study. We first present the problem statement, then we introduce the dataset and conduct EDA experiments. Later, we describe the automatic labeling process, and human evaluation. 34 4.2.1 Problem Statement Due to the rampant use of the internet, there has been a massive surge in negative online behaviors both on social media and online conversational platforms [146, 37, 119, 136]. Hence, there is a great need to drastically reduce hate speech and abusive language instances toward the LGBTQIA2S+ community to create a safe and inclusive place for all LGBTQIA2S+ individuals, activists and allies. Therefore, we encourage AI and NLP practitioners to develop and deploy safe and LGTBQIA+ inclusive technologies to identify and remove online Anti-LGBTQIA2S+ conversational content i.e., if a comment is or contains harmful conversational content conducive to the LGBTQIA2S+ community [26]. To address the above problem, we define 3 goals: 1. The first goal is to detect several forms of toxicity in comments geared toward LGBTQIA2S+ individuals such as threats, obscenity, insults, and identity-based attacks. 2. The second goal is to conduct EDA and a detailed human evaluation to gain a better understanding of a new multi-labeled dataset e.g., label correlation and feature distribution that represents the overall distribution of continuous data variables. 3. The third goal is to accurately identify and detect harmful conversational content in social media comments. 4.2.2 Dataset As Reddit is one of the most widely used online discussion social media platforms, [5] released RedditBias, a multi-dimensional societal bias evaluation and mitigation resource for multiple bias dimensions dedicated to conversational AI. 
RedditBias is created from real-world conversations collected from Reddit and annotated for four societal bias dimensions: (i) Religion (Jews, Christians) and (Muslims, Christians), (ii) Race (African Americans), (iii) Gender (Female, Male), and (iv) Queerness (LGBTQIA2S+, straight). We adapt the queerness (gender/sexual orientation) dimension and collect a total of 9930 LGBTQIA2S+ related comments discussing topics involving individuals who identify as Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, Intersex, Asexual, etc. (see Table 4.1). For more details of the RedditBias creation, bias specifications, retrieval of candidate biased comments, and manual annotation and preprocessing of candidate comments, see [5]. In addition, RedditBias is publicly available with all code online at: https://github.com/umanlp/RedditBias.

4.2.3 Annotation

Our collected dataset encapsulates a multitude of gender identities and sexual orientations; thus, we denote our dataset under the notion of queerness. Note that the comments from RedditBias are unlabeled for our tasks, hence we label each comment accordingly for each classification task. To do so, we implement Detoxify [60], a multi-headed BERT-based model [41] capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based attacks, and of discovering unintended bias in both English and multilingual toxic comments. Detoxify is built with PyTorch Lightning and Transformers, and is fine-tuned on datasets from 3 Jigsaw challenges, namely Toxic Comment Classification, Unintended Bias in Toxic Comments, and Multilingual Toxic Comment Classification, for multi-label classification to detect toxicity across a diverse range of conversations. Specifically, we use Detoxify's original model, which is trained on a large open dataset of Wikipedia+Civil Talk Page comments that have been labeled by human raters for toxic behavior. In Table 4.2, we display the harmful predicted probabilities of the (shortened) example comments from Table 4.1 for their respective labels (i.e., toxicity, severe toxicity, obscene, threat, insult, and identity attack), using Detoxify. We thereby create a multi-labeled queerness dataset which can be used for downstream tasks such as binary and multi-label toxic comment classification to predict Anti-LGBTQIA2S+ online conversational content.

Unfortunately, to avoid public discrepancies such as cultural and societal prejudices amongst the human raters, competitors, and LGBTQIA2S+ individuals involved in the 3 Jigsaw challenges and their data, official documentation and definitions of these classifications are unavailable. Therefore, we cannot know for certain what each label (l) means and why. To address this “unknown” labeling schema and gain better dataset insights, such as feature distributions and what quantifies a comment as “harmful” or “non-harmful”, we conduct both univariate and multivariate analyses.

Figure 4.1: Label correlation heatmap matrix.

In Figure 4.1, we illustrate a correlation matrix between labels. For each cell, we follow a label threshold setting, i.e., 𝑙 > 0.7, to determine heavily correlated labels. Here, we see several heavily correlated relationships: (i) toxicity, insult and identity attack, and (ii) severe toxicity, insult and obscene. As previously mentioned, our automatically labeled queerness dataset contains probabilities of comments' harmfulness.
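As a sketch of this automated labeling step, the scoring of comments into the six probability-valued labels looks roughly as follows. The detoxify pip package and its Detoxify("original") checkpoint are assumed here as the interface to the tool named above, and the c ≥ 0.5 mapping anticipates the thresholding described next.

# Score comments with Detoxify's 'original' checkpoint (a sketch; assumes
# `pip install detoxify`). Each comment receives a probability per label.
from detoxify import Detoxify

model = Detoxify("original")

comments = [
    "this is an example comment",
    "another example comment from the queerness subset",
]
scores = model.predict(comments)  # dict of label -> list of probabilities

# Map probabilities to binary classes with the c >= 0.5 rule described below.
binary = {label: [int(p >= 0.5) for p in probs] for label, probs in scores.items()}
print(scores["toxicity"], binary["toxicity"])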
For each classification task, we will now consider two classes, “harmful : 1” and “non-harmful : 0” per label by following a class (c) threshold mapping system i.e., 𝑐 >= 0.5 → 1 to determine whether a comment is deemed harmful, or not. In Table 4.3, we display harmful comment counts w.r.t each label that satisfy our class threshold mapping system. More information about figures on data breakdowns, distribution plots (i.e., to depict the variation in the data distribution), and knowledge of which words constitute a “harmful” or “non-harmful” comment for each label can be found in Appendices C.2, C.3 and C.4. 37 4.2.4 Human Evaluation We employ Amazon Mechanical Turk (AMT) annotators1. Due to the nature of the comments, it was quite difficult to acquire a large number of annotators that were willing to manually rate 1000 randomly sampled comments to measure the effectiveness of the toxicity classifier, due to the examples of stereotypes, profanity, vulgarity and other harmful languages towards LGBTQIA2S+ individuals. Note that terms are not filtered as they are representative of real-world conversations and are exceedingly essential to our goals mentioned in Section 4.2.1. After this, a total 15 annotators (maximum) were more than willing to help achieve this goal. First, we aggregate an LGBTQIA2S+ sources from OutRight, a human rights organization for LGBTQIA2S+ people in an attempt to educate annotators on identity, sexuality, and relationship definitions of the expanded LGBT (older) acronym in a move towards inclusivity. Then, the annotators were asked to indicate whether a comment is toxic or non-toxic. As the toxicity label is the most prevalent label, if a comment 𝑥 is deemed non-toxic, then the annotators may discard this comment. However, if 𝑥 is deemed toxic, then each annotator is provided with 5 additional labels (i.e., severe toxicity, obscene, threat, insult, and identity attack) along with their respective definitions2 1Each AMT annotators is independent, and either an LGBTQIA2S+ individual, activist or ally. In addition, each annotator is filtered by HIT approval rate ≥ 93%, completed > 7,500 HITs and located within the United States. 38 CHAPTER 5 A MULTI-LAYERED LANGUAGE ANALYSIS: A CASE STUDY OF AFRICAN-AMERICAN ENGLISH Currently, natural language processing (NLP) models proliferate language discrimination leading to potentially harmful societal impacts as a result of biased outcomes. For example, part-of-speech taggers trained on Mainstream American English (MAE) produce non-interpretable results when applied to African American English (AAE) as a result of language features not seen during training. In this work, we incorporate a human-in-the-loop paradigm to gain a better understanding of AAE speakers’ behavior and their language use, and highlight the need for dialectal language inclusivity so that native AAE speakers can extensively interact with NLP systems while reducing feelings of disenfranchisement. 5.1 Introduction Over the years, social media users have leveraged online conversational platforms to perpetually express themselves online. For example, African American English (AAE), an English language variety is often heavily used on Twitter [48, 13]. This dialect continuum is neither spoken by all African Americans or individuals who identify as BIPOC (Black, Indigenous, or People of Color), nor is it spoken only by African Americans or BIPOC individuals [48, 11]. 
In some cases, AAE, a low-resource language (LRL) may be the first (or dominant) language, rather than the second (or non-dominant) language of an English speaker. Specifically, AAE is a regional dialect continuum that consists of a distinct set of lexical items, some of which have distinct semantic meanings, and may possess different syntactic structures/patterns than in Mainstream American English (MAE) (e.g., differentiating habitual be and non-habitual be usage) [127, 45, 71, 48, 11, 6, 13, 79]. In particular, [56] states that AAE possesses a morphologically invariant form of the verb that distinguishes between habitual action and currently occurring action, namely habitual be. For example, “the habitual be” experiment by University of Massachusetts Amherst’s Janice Jackson. However, AAE is perceived to be “bad english” despite numerous studies by socio/raciolinguists 39 Table 5.1: An illustrative example of POS tagging of semantically equivalent sentences written in MAE and linguistics features of AAE lexical items, and their misclassified NLTK (inferred) tags, respectively. MAE AAE Input Output Input Output I have never done this before (I, ), (have, ), (never, ), (done, ), (that, ), (before, ) I aint neva did dat befo (I, ), (aint, ), (neva, ), (did, )(dat, ), (befo, ) and dialectologists in their attempts to quantify AAE as a legitimized language [6, 48, 11, 79]. “[T]he common misconception [is] that language use has primarily to do with words and what they mean. It doesn’t. It has primarily to do with people and what they mean.” – [30] Recently, online AAE has influenced the generation of resources for AAE-like text for natural language (NLP) and corpus linguistic tasks e.g., part-of-speech (POS) tagging [72, 16], language generation [57] and automatic speech recognition [45, 133]. POS tagging is a token-level text classification task where each token is assigned a corresponding word category label (see Table 5.1). It is an enabling tool for NLP applications such as a syntactic parsing, named entity recognition, corpus linguistics, etc. In this work, we incorporate a human-in-the-loop paradigm by directly involving affected (user) communities to understand context and word ambiguities in an attempt to study dialectal language inclusivity in NLP language technologies that are generally designed for dominant language varieties. [34] state that, “NLP systems aim to [learn] from natural language data, and mitigating social biases become a compelling matter not only for machine learning (ML) but for social justice as well.” To address these issues, we aim to empirically study predictive bias (see [129] for definition) i.e., if POS tagger models make predictions dependent on demographic language features, and attempt a dynamic approach in data-collection of non-standard spellings and lexical items. To examine the behaviors of AAE speakers and their language use, we first collect variable (morphological and phonological) rules of AAE language features from literature [79, 4, 56, 11, 127, 14, 46, 6, 55] (see Appendix D.3). Then, we employ 5 trained sociolinguist Amazon Mechanical Turk (AMT) 40 annotators1 who identify as bi-dialectal dominant AAE speakers to address the issue of lexical, semantic and syntactic ambiguity of tweets (see Appendix D.2 for annotation guidelines). Next, we incorporate a human-in-the-loop paradigm by recruiting 20 crowd-sourced diglossic annotators to evaluate AAE language variety (see Table 5.2). 
Finally, we conclude by expanding on the need for dialectal language inclusivity. 5.2 Related Work Previous works regarding AAE linguistic features have analyzed tasks such as unsupervised domain adaptation for AAE-like language [72], detecting AAE syntax[127], language identification [15], voice recognition and transcription [45], dependency parsing [16], dialogue systems [85], hate speech/toxic language detection and examining racial bias [120, 58, 142, 38, 162, 102, 143, 78], and language generation [57]. These central works are conclusive for highlighting systematic biases of natural language processing (NLP) systems when employing AAE in common downstream tasks. Although we mention popular works incorporating AAE, this dialectal continuum has been largely ignored and underrepresented by the NLP community in comparison to MAE. Such lack of language diversity cases constitutes technological inequality to minority groups, for example, by African Americans or BIPOC individuals, and may intensify feelings of disenfranchisement due to monolingualism. We refer to this pitfall as the inconvenient truth i.e., “[I]f the systems show discriminatory behaviors in the interactions, the user experience will be adversely affected.” — [85] Therefore, we define fairness as the model’s ability to correctly predict each tag while performing zero-shot transfer via dialectal language inclusivity. Moreover, these aforementioned works do not discuss nor reflect on the “role of the speech and language technologies in sustaining language use” [79, 10, 13] as, “... models are expected to make predictions with the semantic information rather than with the demographic group identity information” — [153]. 1A HIT approval rate ≥ 95% was used to select 5 bi-dialectal AMT annotators between the ages of 18 - 55, and completed > 10,000 HITs and located within the United States. 41 Figure 5.1: An illustration of inferred and manually-annotated AAE tag counts from 𝑘 randomly sampled tweets. Interactions with everyday items is increasingly mediated through language, yet systems have limited ability to process less-represented dialects such as AAE. For example, a common AAE phrase, “I had a long ass day” would receive a lower sentiment polarity score because of the word “ass”, a (noun) term typically classified as offensive; however, in AAE, this term is often used as an emphatic, cumulative adjective and perceived as non-offensive. Motivation: We want to test our hypothesis that training each model on correctly tagged AAE language features will improve the model’s performance, interpretability, explainability, and usability to reduce predictive bias. 5.3 Dataset and Annotation 5.3.1 Dataset We collect 3000 demographically-aligned African American (AA) tweets possessing an average of 7 words per tweet from the publicly available TwitterAAE corpus by [14]. Each tweet is accompanied by inferred geolocation topic model probabilities from Twitter + Census demographics 42 Table 5.2: Accurately tagged (observed) AAE and English phonological and morphological linguistic feature(s) accompanied by their respective MAE equivalent(s). 
Tags  Category                         AAE Example(s)        MAE Equivalent(s)
CC    Coordinating Conjunction         doe/tho, n, bt        though, and, but
DT    Determiner                       da, dis, dat          the, this, that
EX    Existential There                dea                   there
IN    Preposition/Conjunction          fa, cuz/cause, den    for, because, than
JJ    Adjective                        foine, hawt           fine, hot
PRP   Personal Pronoun                 u, dey, dem           you, they, them
PRP$  Possessive Pronoun               ha                    her
RB    Adverb                           tryna, finna, jus     trying to, fixing to, just
RBR   Adverb, comparative              mo, betta, hotta      more, better, hotter
RP    Particle                         bout, thru            about, through
TO    Infinitive marker                ta                    to
UH    Interjection                     wassup, ion, ian      what's up, I don't
VBG   Verb, gerund                     sleepin, gettin       sleeping, getting
VBZ   Verb, 3rd-person present tense   iz                    is
WDT   Wh-determiner                    dat, wat, wus, wen    that, what, what's, when
WRB   Wh-adverb                        hw                    how

These geolocation probabilities are combined with word likelihoods to calculate demographic dialect proportions. We aim to minimize (linguistic) discrimination by sampling tweets whose dialect proportion confidence exceeds 99%, in order to develop “fair” NLP tools, originally designed for dominant language varieties, that integrate non-standardized varieties. More information about the TwitterAAE dataset, including its statistical information, annotation process, and the link(s) to downloadable versions, can be found in Appendix D.1.

5.3.2 Preprocessing

As it is common for many words on social media to be plausibly semantically equivalent, we denoise each tweet, since tweets typically possess unusual spelling patterns, repeated letters, emoticons, and emojis. (Emoticons are textual features made of punctuation such as exclamation marks, letters, and/or numbers that form pictorial icons displaying an emotion or sentiment, e.g., “;)” ⇒ winking smile, while emojis are small text-like pictographs of faces, objects, symbols, etc.) We replace sequences of multiple repeated letters with three repeated letters (e.g., Hmmmmmmmm → Hmmm), and remove all punctuation, “@” handles of users, and emojis. Essentially, we denoise each tweet only to capture non-standard spellings and lexical items more efficiently.

5.3.3 Annotation

First, we employ off-the-shelf taggers such as spaCy and TwitterNLP; however, the Natural Language Toolkit (NLTK) [90] provides a more fine-grained Penn Treebank Tagset (PTB) along with per-tag evaluation metrics such as F1 score. Next, we focus on aggregating the appropriate tags by collecting and manually annotating tags from AAE/slang-specific dictionaries to assist the AMT annotators, and later we contrast these aggregated tags with the inferred NLTK PTB tags. In Figure 5.1, we display NLTK-inferred and manually-annotated AAE tags from 𝑘 = 300 randomly sampled tweets.

• The Online Slang Dictionary (American, English, and Urban slang) - created in 1996, this is the oldest web dictionary of slang words, neologisms, idioms, aphorisms, jargon, informal speech, and figurative usages. This dictionary possesses more than 24,000 real definitions and tags for over 17,000 slang words and phrases, 600 categories of meaning, and word use mapping, and it aids in addressing lexical ambiguity.

• Word Type - an open source POS-focused dictionary of words based on the Wiktionary project by Wikimedia. Researchers have parsed Wiktionary and other sources, including real definitions and categorical POS word use cases necessary to address the issue of lexical, semantic and syntactic ambiguity.

5.3.4 Human Evaluation

After an initial training of the AMT annotators, we task each annotator with annotating each tweet with the appropriate POS tags.
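For illustration, the kind of mismatch the annotators correct can be reproduced directly with the off-the-shelf NLTK tagger named above (cf. Table 5.1). The exact tags NLTK assigns to the AAE tokens depend on the installed tagger model, so none are asserted here.

# Contrast NLTK's inferred PTB tags on semantically equivalent MAE and AAE
# sentences; AAE tokens unseen during tagger training are frequently mis-tagged,
# which is what the manual annotation corrects.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

mae = "I have never done that before"
aae = "I aint neva did dat befo"

print(nltk.pos_tag(nltk.word_tokenize(mae)))
print(nltk.pos_tag(nltk.word_tokenize(aae)))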
Then, as a calibration study we attempt to measure the inter-annotator agreement (IAA) using Krippendorff’s 𝛼. By using NLTK’s [90] nltk.metrics.agreement, we calculate a Krippendorf’s 𝛼 of 0.88. We did not observe notable distinctions in annotator agreement across the individual tweets. We later randomly sampled 300 annotated tweets and recruit 20 crowd-sourced annotators to evaluate AAE language variety. To recruit 20 diglossic annotators3, 3Note that we did not collect certain demographic information such as gender or race, only basic demographics such as age (18-55 years), state and country of residence. 44 we created a volunteer questionnaire with annotation guildlines, and released it on LinkedIn. The full annotation guildlines can be found in Appendix D.2. Each recruited annotator is tasked to judge sampled tweets and list their MAE equivalents to examine contextual differences of simple, deterministic morphosyntactic substitutions of dialect-specific vocabulary in standard English or MAE texts—a reverse study to highlight several varieties of AAE (see Table 5.2). 5.4 Methodology In this section, we describe our approach to perform a preliminary study to validate the existence of predictive bias [46, 124] in POS models. We first introduce the POS tagging, and then propose two ML sequence models. 5.4.1 Part-of-Speech (POS) Tagging We consider POS tagging as it represents word syntactic categories and serves as a pre-annotation tool for numerous downstream tasks, especially for non-standardized English language varieties such as AAE [151]. Common tags include prepositions, adjective, pronoun, noun, adverb, verb, interjection, etc., where multiple POS tags can be assigned to particular words due to syntactic structural patterns. This can also lead to misclassification of non-standardized words that do not exist in popular pre-trained NLP models. 5.4.2 Models We propose to implement two well known sequence modeling algorithms, namely a Bidirectional Long Short Term Memory (Bi-LSTM) network, a deep neutral network (DNN) [63, 54] that has been used for POS tagging [84, 106], and a Conditional Random Field (CRF) [80] typically used to identify entities or patterns in texts by exploiting previously learned word data. Taggers: First, we use NLTK [90] for automatic tagging; then, we pre-define a feature function for our CRF model where we optimized its L1 and L2 regularization parameters to 0.25 and 0.3, respectively. Later, we train our Bi-LSTM network for 40 epochs with an Adam optimizer, and a learning rate of 0.001. Note that each model would be accompanied by error analysis for a 70-30 45 split of the data with 5-fold cross-validation to obtain model classification reports, for metrics such as precision, recall and F1-score. 5.5 Operationalization of AAE as an English Language Variety As (online) AAE can incorporate non-standardized spellings and lexical items, there is an active need for a human-in-the-loop paradigm as humans provide various forms of feedback in different stages of workflow. This can significantly improve the model’s performance, interpretability, explainability, and usability. Therefore, crowd-sourcing to develop language technologies that consider who created the data will lead to the inclusion of diverse training data, and thus, decrease feelings of marginalization. 
For example, CORAAL is an online resource that features AAL text data, recorded speech data, etc.; by incorporating such resources into new and existing NLP technologies, AAE speakers can interact more extensively with current NLP language technologies. Consequently, to quantitatively and qualitatively ensure fairness in NLP tools, artificial intelligence (AI) and NLP researchers need to go beyond evaluation measures, word definitions and word order, and assess AAE at the token level to better understand context, culture and word ambiguities. We encourage both AI and NLP practitioners to prioritize collecting relevant labeled training data with many examples of informal phrases, expressions, idioms, and region-specific varieties, particularly for models intended for broad use such as sentiment analysis, and to partner with low-resource and dialectal communities to develop impactful speech and language technologies for dialect continua such as AAE, minimizing further stigmatization of an already stigmatized minority group.

5.6 Conclusion

Throughout this work, we highlight the need to develop language technologies for such varieties, pushing back against potentially discriminatory practices (in many cases, discriminatory through oversight more than malice). Our work calls for NLP researchers to consider both social and racial hierarchies sustained or intensified by current computational linguistic research. We shift towards a human-in-the-loop paradigm to conduct a deep, multi-layered dialectal language analysis of AAE and to counteract erasure and several forms of bias, such as selection bias, label bias, model overamplification, and semantic bias (see [124] for definitions), in NLP. We hope our dynamic approach can encourage practitioners, researchers and developers to pursue AAE-inclusive work, and that our contributions can pave the way for normalizing the use of a human-in-the-loop paradigm both to obtain new data and to create NLP tools that better comprehend underrepresented dialect continua and English language varieties. In this way, the NLP community can revolutionize the ways in which humans and technology cooperate by considering demographic attributes such as culture, background, race and gender when developing and deploying NLP models.

5.7 Limitations And Ethical Considerations

We acknowledge that increased model performance for non-standard varieties, such as underrepresented dialects, non-standard spellings or lexical items, in NLP systems can potentially enable automated discrimination. In this work, we solely attempt to highlight the need for dialectal inclusivity in the development of impactful speech and language technologies, and we do not intend to increase feelings of marginalization of an already stigmatized community.

CHAPTER 6 DETECTING AND MITIGATING INHERENT LINGUISTIC BIAS IN LARGE LANGUAGE MODELS

Recent studies show that NLP models trained on standard English texts tend to produce biased outcomes against underrepresented English varieties. In this work, we conduct a pioneering study of the use of the English language variety African American English (AAE) in the natural language inference (NLI) task. First, we propose CODESWITCH, a greedy unidirectional morphosyntactically-informed rule-based translation method for data augmentation. Next, we use CODESWITCH to present a preliminary study to determine whether demographic language features do in fact influence models to produce false predictions.
Then, we conduct experiments on two popular datasets and propose two simple, yet effective and generalizable debiasing methods. Our findings show that NLI models (e.g. BERT) trained under our proposed frameworks outperform traditional large language models while maintaining or even improving the prediction performance. In addition, we intend to release CODESWITCH, in hopes of promoting dialectal language diversity in training data to both reduce the discriminatory societal impacts and improve model robustness of downstream NLP tasks. 6.1 Introduction In recent years, social media has become a pivotal tool its users to express their thoughts, feelings, and opinions on similar interests [37]. Typically, Standard American English (SAE), a high-resource language (HRL) is often used in formal communication, whereas African American English (AAE)1 is primarily spoken in the United States and is often heavily and explicitly used on social media platforms such as Twitter [48, 13]. In particular, AAE is an English language variety and can be considered to be a low-resource language (LRL) that is neither spoken by all African Americans or individuals who identify as 1This English language variety has had several names within the last decades such as African American Vernacular English (AAVE), African American Language (AAL), Black English, Ebonics, Non-standard English, Northern Negro English and Black English Vernacular (BEV) [4, 56, 11, 76]. However, it is now commonly referred to as African American English (AAE), an English language variety. 48 BIPOC (Black, Indigenous, or People of Color), nor is it spoken only by African Americans or BIPOC individuals [48, 33, 11]. However, most dominant AAE speakers reside in diglossic communities and are able to code-switch, speaking both SAE and AAE. In linguistics, code-switching also referred to as language alternation is the ability of a speaker to alternate between two or more languages or language varieties within a particular conversation [149, 51, 40, 148, 33]. Thus, we refer to code-switching as switching among dialects, and/or language styles. For example, bi-dialectal AAE speakers are often able to code-switch between the SAE and both phonological and morphological language features of AAE while maintaining contextual intent. Natural Language Understanding (NLU) is a subset of NLP, which enables human-computer interaction (HCI) by attempting to understand human language data such as text or speech, and communicate back to humans in their respective languages such as English, Spanish, etc., [121]. Hence, we will focus on inference, which is an eminent area of study of NLU. In particular, Natural language inference (NLI), a subset of NLU, also known as Recognizing Textual Entailment (RTE) is a segment-level categorization task of understanding the inferential relationships between sentence pairs and anticipating whether they are entailing, contradictory, or neutral sentences [23, 138]. Generally, the term implicit bias is used to refer to the unconscious preferential behaviors towards a certain demographic group such as age, race, ethnicity, gender, etc. [88, 131, 110]. However, in this study, to examine the differences in language styles from different demographic groups, we refer to this type of predisposed language style bias as inherent linguistic bias. Although, both biases are very similar, there exists a subtle difference as linguistic bias specifically refers to an analysis of every aspect of a particular language [161]. 
The existence of these biases in large language models (LLMs) such as masked language models (MLMs) generates language bias, leading to potentially harmful societal impacts that inconvenience members of LRL and diglossic communities who speak both standard languages and underrepresented dialects. This may increase feelings of marginalization and disenfranchisement [85, 13, 48]. Hence, in this work, we conduct a pioneering study on robustifying MLMs to minimize false predictions: we introduce dialectal language diversity into training data to determine whether MLMs learn to make predictions based on demographic language features, and we propose two debiasing methods that enhance NLI models to mitigate linguistic bias during the training process. We posit that it is vital for production-ready MLMs to improve their robustness so as to produce minimal systemic biases against protected attributes such as race and gender, thus reducing discriminatory societal impacts [64, 126, 85, 131]. Specifically, we aim to answer three research questions: (1) How can we, as NLP practitioners, encourage dialectal language diversity in training data?; (2) Do pretrained MLMs make predictions based on demographic language features?; and (3) How can we measure fairness and mitigate such biases in order to ensure fairness in NLU? Our contributions include:

• CODESWITCH, a greedy unidirectional morphosyntactically-informed rule-based translation method for data augmentation to generate intent-and-semantically equivalent AAE examples by perturbing SAE examples.

• Two intent-and-semantically equivalent NLI datasets of AAE sentence pairs with a wide range of morphosyntactic features and dialect-specific vocabulary.

• A detailed human evaluation by our human annotators to ensure contextual accuracy of the adversarial sentence pairs (see Appendix E.4 for details).

• Two simple, yet effective debiasing methods that mitigate the inherent linguistic bias in NLI models while maintaining or even improving their prediction performance.

6.2 Preliminaries

In this section, we introduce some preliminary knowledge about the problem under study. We first present the problem statement, and then describe two popular NLI datasets used in our research.

6.2.1 Problem Statement

We aim to investigate sentence representations of two linguistic systems of different demographic groups to demonstrate the existence of constitutional linguistic bias. To address the above research questions, we define two goals:

1. The first goal is to predict inferential relationships between paired sentences, i.e., whether the second sentence is an entailment, a contradiction, or neutral with respect to the first sentence.

2. The second goal is to debias the sentence representations obtained from the words in a given sentence.
Specifically, we want the sentence representation to include only the semantic information, not the language style, whether SAE or AAE. Therefore, we want the MLM to ignore the language style of each demographic group in order to make fair predictions. Mitigating such linguistic biases can help develop robust MLMs for LRLs and dialectal languages more easily. Our main objective is to focus on dialectal language inclusivity while leveraging large pretrained MLMs to improve the robustness of downstream NLP technologies for LRLs and language varieties.
6.2.2 Dataset
In this subsection, we introduce two of the largest and most popular NLP datasets for textual inference, namely the Stanford Natural Language Inference (SNLI) and Multi-Genre Natural Language Inference (MNLI) corpora.
6.2.2.1 SNLI corpus
The SNLI corpus [23] is constructed from the Flickr30k corpus [147]. The original image caption is classified as the premise, whereas the hypothesis is a human-written premise-related sentence that must satisfy one of three relational conditions: (1) Entailment – a true image description, (2) Neutral – a neutral image description, and (3) Contradiction – a false or random image description. The SNLI corpus is a collection of 570K premise-hypothesis sentence pairs, where each pair is aligned with one of these three relational labels.
Table 6.2: Augmented SNLI and MNLI examples (from Table 6.1) following the application of CODESWITCH. Each blue highlight corresponds to the AAE equivalent of its respective SAE counterpart.
Dataset | Premise | Hypothesis | Label
SNLI AAE | A land rover bein driven across a river. | A vehicle crossin a river. | entailment
SNLI AAE | Children smilin n wavin at camera | Dey smilin at they parents | neutral
SNLI AAE | A older man drinkin orange juice at a restaurant. | Two women at a restaurant drinkin wine. | contradiction
MNLI AAE | So i gotta find a way ta supplement dat | I need a way ta add sumn extra. | entailment
MNLI AAE | Da new rights nice enough | Everybody really likes da newest benefits | neutral
MNLI AAE | Ion kno um do u do a lot of campin | I kno exactly. | contradiction
6.2.2.2 MNLI corpus
Similarly to SNLI, the MNLI corpus [138] is a closely related crowd-sourced collection of 433K sentence pairs and their relational labels. However, MNLI contains 10 distinct genre categories (i.e., Letters, Verbatim, Fiction, Face-to-face, Travel, Telephone, Oxford University Press, Slate, 9/11, and Government) of written and spoken data instead of image caption data.
6.3 CODESWITCH Creation
In this section, we describe the process of creating CODESWITCH, carried out in three steps: 1) data collection of morphosyntactic features and dialect-specific vocabulary, 2) candidate retrieval of simple, deterministic morphosyntactic substitutions for unidirectional translations, and 3) human evaluation to test the contextual accuracy of perturbations generated by CODESWITCH.
6.3.1 Data Collection
First, to gain a better understanding of AAE, we engage with the literature, sample text examples, and collect a large set of morpho-syntax rules (which we adapt from the literature; see Appendix E.2) [4, 56, 11, 33, 13, 127, 14, 46]. We therefore take a proactive approach to collecting grammatical, structural, and syntactic rules of AAE word usage to understand the application of AAE in downstream NLP tasks. Next, we employ and assist six trained sociolinguist Amazon Mechanical Turk (AMT) workers, each independent, located in the United States, and filtered by a HIT approval rate ≥ 96% and more than 10,000 completed HITs, providing them with our collected set of rules and text examples.
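The SNLI and MNLI corpora described in Section 6.2.2 are publicly available; before sampling sentence pairs for annotation, they can be loaded and inspected as in the following minimal sketch, assuming the Hugging Face datasets library (an implementation detail not prescribed by this chapter). The label convention (0 = entailment, 1 = neutral, 2 = contradiction) and the -1 placeholder for unlabeled SNLI pairs follow that library's release of the corpora.

from datasets import load_dataset

# SNLI (~570K pairs) and MNLI (~433K pairs), each with premise/hypothesis/label fields.
snli = load_dataset("snli")
mnli = load_dataset("multi_nli")

# Drop SNLI pairs that carry no gold label (encoded as -1 in this release).
snli = snli.filter(lambda ex: ex["label"] != -1)

label_names = ["entailment", "neutral", "contradiction"]
ex = snli["train"][0]
print(ex["premise"], "|", ex["hypothesis"], "|", label_names[ex["label"]])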
Pairwise Sample Collection. We first randomly sample n = 5000 SAE premise-hypothesis sentence pairs that contain at least 8 words from each of the SNLI and MNLI corpora, for a total of 10,000 sentence pairs. For contextual accuracy, we task the first three workers with obtaining the AAE equivalents of our SAE samples (see Table 6.1), where each annotator translates each SAE sentence pair into AAE. The full annotation guidelines can be seen in Appendix E.3.
6.3.2 Candidate Retrieval
Starting from data collection, we next retrieve candidate phrases and word use cases for data augmentation from our obtained AAE-equivalent sentence pairs. As [88] uses a deep text classification model to illustrate that demographic language features do in fact influence models to produce false predictions on semantically equivalent SAE and AAE texts, our protocol follows simple, deterministic substitutions of English texts by dialect-specific vocabulary. To do so, we make use of both SAE and AAE sentence pairs in a pairwise fashion and construct a unidirectional, morphosyntactically informed translative morpho-syntax protocol (TMsP) that enables CODESWITCH to convert any given SAE text into a text possessing adequate language features to be considered AAE from a dominant AAE speaker. More details on TMsP can be found in Appendix E.2.
Obtaining new texts for downstream tasks from authors of certain demographic groups is time-consuming and requires heavy human labor [88, 33]. Therefore, we create CODESWITCH (see Algorithm 1; a minimal sketch of this procedure follows below), a greedy unidirectional morphosyntactically-informed rule-based translation method which is not only fast but also functions as a human-in-the-loop paradigm, thereby drastically reducing heavy human labor. Our approach to intent-and-semantically equivalent AAE data augmentation is intuitively simple and effective. Consequently, we can now explore code-switching in several NLP tasks to determine whether LLMs such as MLMs learn to make predictions based on demographic/dialectal language features.
Algorithm 1 The translative syntactic morphological method for CODESWITCH.
Input: Original SAE sequence x
Output: Translated AAE sequence x'
begin function
  Load SAE input sequence → x
  x ← LOWER(x)
  T ← TOKENIZE(x)
  for all i = 1, 2, ..., |T| do
    if T_i ∈ {TMsP} then
      T_i ← CODESWITCH(T_i)
    end if
  end for
  x' ← DETOKENIZE(T)
  return x'
end function
We represent each original NLI corpus as D = <P, H, L>, with p ∈ P as the premise, h ∈ H as the hypothesis and, lastly, l ∈ L as the label, and create two augmented datasets, i.e., SNLI AAE and MNLI AAE, where we represent each augmented NLI dataset as D' = <P', H', L>. Specifically, we translate each premise-hypothesis pair into AAE and keep the original label unchanged to form a new instance. It is important to note that the task of CODESWITCH is to ensure that both sets of datasets, i.e., D and D', maintain their contextual accuracy, although they consist of two different language styles (see Table 6.2).
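To make the procedure in Algorithm 1 concrete, the following minimal Python sketch implements the same greedy token-level substitution. The rule table is a tiny illustrative subset inferred from the SAE-to-AAE substitutions visible in Table 6.2; the actual TMsP rules live in Appendix E.2 and also cover phenomena (e.g., copula deletion) that this sketch omits.

# Greedy, unidirectional SAE -> AAE substitution in the spirit of Algorithm 1.
# TMSP_RULES is a small illustrative subset, not the full TMsP rule set.
TMSP_RULES = {
    "the": "da",
    "that": "dat",
    "they": "dey",
    "with": "wit",
    "to": "ta",
    "know": "kno",
    "something": "sumn",
    "everyone": "everybody",
}

def codeswitch(x: str) -> str:
    """Translate an SAE sequence into an AAE-styled sequence, token by token."""
    tokens = x.lower().split()                       # LOWER + TOKENIZE
    tokens = [TMSP_RULES.get(t, t) for t in tokens]  # substitute tokens covered by the rules
    return " ".join(tokens)                          # DETOKENIZE

print(codeswitch("The new rights are nice enough"))  # -> "da new rights are nice enough"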
6.3.3 Human Evaluation
After an initial training of the AMT annotators with our annotation guidelines, we implement a minor calibration study by tasking the remaining three independent workers with testing our AAE data augmentation method. We randomly sample 200 SAE/AAE sentence pair examples from each of the 4 datasets, for a total of 800 sentence pairs (or 1,600 SAE/AAE sentences). The workers were asked to indicate (1) whether the AAE sentences were written by an L1 (or dominant) AAE speaker or are most likely machine generated (MG); and (2) whether or not their contextual accuracy is maintained.
For content analysis, to ensure the quality of our AAE samples and to quantify the extent of agreement between raters, we first let the three annotators independently rate each generated AAE sentence pair as “Native” or “MG”, and then measure the inter-annotator agreement (IAA) using Krippendorff’s α. We calculate an inter-rater reliability of 0.82 and did not observe significant differences in agreement across the individual sentences. Qualitative analysis revealed that generated samples resembled sequences written by L1 AAE speakers, whereas a few samples were classified as most likely MG. Annotators informed us of particular morpho-syntax cases; for example, constant copula deletion of the verb “be” and its variants, namely “is” and “are”, is irregular, and these verbs are often inserted last in word order. This indicates that CODESWITCH does not account for such contextual instances when generating AAE samples, hence those samples being classified as most likely MG.
6.4 Empirical Study and Analysis
In this section, we conduct a preliminary study to substantiate the existence of inherent linguistic bias in NLI models. We introduce the base NLI models and training details, and then we demonstrate our empirical results. To illustrate the inherent linguistic bias of two distinct linguistic systems, we introduce a representative MLM, namely BERT [41] (see Appendix E.1 for more details).
Table 6.3: Model performance (%) when tested on SAE and AAE data.
Models | SNLI SAE | SNLI AAE | Diff. | MNLI SAE | MNLI AAE | Diff.
BERT_BASE | 90.12 | 86.00 | -4.12 | 84.47 | 79.79 | -4.68
BERT_LARGE | 90.46 | 74.55 | -15.91 | 84.77 | 67.35 | -17.12
We use each original dataset, i.e., SNLI and MNLI, to fine-tune both BERT models with a batch size of 32 using an AdamW optimizer with a learning rate of 2e-5 and default betas (β1 = 0.9, β2 = 0.999) for 3 epochs. Our experiments show that pretrained MLMs “are only as good as the data they are trained on” and are unable to make fair predictions [131]. In Table 6.3, we see that the lack of diverse training data results in disparities in model performance in MLMs, which may be significantly intensified as models become more complex. In Table 6.4, we illustrate several examples of the inherent linguistic bias on account of demographic language features, and can conclude that demographic/dialectal language features do in fact influence models to produce false predictions.
Table 6.4: An illustration of inherent linguistic bias between AAE examples and their respective SAE counterparts (see Appendix E.2).
Premise | Hypothesis | Label | Prediction
Dis church choir sings ta da masses as dey sing joyous songs from da book at a church. | Da church filled wit song. | Entailment | Neutral
Dis church choir sings ta da masses as dey sing joyous songs from da book at a church. | Da church has cracks in da ceiling. | Neutral | Contradiction
Dis church choir sings ta da masses as dey sing joyous songs from da book at a church. | A choir singin at a baseball game. | Contradiction | Entailment
A woman wit a green headscarf, blue shirt n a very big grin. | Da woman young. | Neutral | Contradiction
A woman wit a green headscarf, blue shirt n a very big grin. | Da woman very happy. | Entailment | Neutral
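The fine-tuning configuration reported above (batch size 32, AdamW with a 2e-5 learning rate and default betas, 3 epochs) corresponds roughly to the following minimal sketch, assuming PyTorch and the Hugging Face transformers library; the single toy batch stands in for a full dataloader over the tokenized SNLI or MNLI training split.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Three-way NLI head (entailment / neutral / contradiction) on top of BERT.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, betas=(0.9, 0.999))

# Toy batch standing in for batches of 32 tokenized premise-hypothesis pairs.
batch = tokenizer(["A land rover is being driven across a river."],
                  ["A vehicle is crossing a river."],
                  padding=True, return_tensors="pt")
batch["labels"] = torch.tensor([0])
train_loader = [batch]

model.train()
for epoch in range(3):                 # 3 epochs
    for batch in train_loader:
        optimizer.zero_grad()
        loss = model(**batch).loss     # cross-entropy over the three NLI labels
        loss.backward()
        optimizer.step()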
6.5 Debiasing Methods
In Section 6.4, we empirically demonstrated that popular NLI models show significant bias towards AAE by underperforming on it relative to SAE. A natural question arises: how can we remove the biases in NLI models towards different language styles? To solve this problem, we introduce two simple but effective debiasing strategies: (1) counterpart data augmentation (CDA); and (2) language style disentanglement (LSD).
6.5.1 Counterpart Data Augmentation
The bias of NLI models originates from the training data. Since the training data contains only SAE, NLI models trained on such data do not understand the unique vocabulary and grammar of AAE, which leads to poor performance. Thus, we propose to implement CODESWITCH to augment the original SAE training data by translating it into its AAE counterparts, and in turn implement a CDA strategy similar to [158, 163]. We then obtain a large augmented training dataset, D+, which is twice the size of the original dataset (e.g., SNLI) as it contains both D and D'.
6.5.2 Language Style Disentanglement
For two texts with similar intent and semantic content but different language styles (e.g., SAE vs. AAE), an NLI model may tend to make biased predictions towards one style. The immediate reason is that the NLI predictions are based on the language style features instead of relying solely on the semantic features of the texts. Based on this consideration, we propose LSD, an in-processing debiasing method which tries to disentangle the language style features from the semantic features in text representations and forces the NLI model to make inferences on the purely semantic representations.
6.5.2.1 The LSD Framework
To achieve disentanglement, we adopt the idea of adversarial learning. Figure 6.1 illustrates the overall framework of LSD. We view the framework as three parts: (1) the BERT model, which encodes a premise-hypothesis pair as a fixed-dimensional representation E_[CLS]; (2) a feed-forward neural network (FFN) classifier C that takes E_[CLS] as input to predict the inferential relationship between the premise and the hypothesis; and (3) an FFN discriminator D that predicts whether the sentence pair is SAE or AAE based on E_[CLS]. Via adversarial learning, our goal is to build a BERT model that can produce an accurate semantic representation of the text pair so that the classifier C can make correct predictions based on it, while the representation is free from the language style features of the texts, so that the discriminator D cannot distinguish whether the texts are from D or D'.
Algorithm 2 The optimization method for the LSD framework.
Input: Training data T = {<P_i, H_i, L_i, S_i>} (i = 1, ..., |T|) and validation data V = {<P_i, H_i, L_i, S_i>} (i = 1, ..., |V|)
Output: BERT parameters W_BERT, classifier parameters W_C
Load pre-trained parameters W_BERT
Initialize W_C and W_D
1: for N epochs do
2:   for M batches do
3:     Obtain a mini-batch of training data B from T
4:     Update W_D by optimizing L_D in Equation 6.1
5:     Update W_BERT and W_C by optimizing L in Equation 6.2
6:   end for
7:   Run the BERT model and the classifier C on validation data V
8:   Save parameters W_BERT and W_C if they achieve the best validation performance so far
9: end for
Figure 6.1: An illustration of the language-style disentanglement model.
6.5.2.2 An Optimization Method
We present our optimization algorithm for the LSD framework in Algorithm 2. We train the framework on the augmented training dataset obtained via our CODESWITCH method, as we do in CDA. In the training data T = {<P_i, H_i, L_i, S_i>} (i = 1, ..., |T|), each instance consists of a premise p, a hypothesis h, a label l, and a binary language style label S ∈ {SAE, AAE}.
At the beginning, we first load the pretrained BERT parameters and initialize the parameters of the classifier C and the discriminator D (see Algorithm 2). In each iteration, we first obtain a mini-batch of training data B = {<P_i, H_i, L_i, S_i>} (i = 1, ..., |B|) (line 3). Then, we update the discriminator D by minimizing the following cross-entropy loss (line 4):
L_D = -( I{S = 0} log p_0^D + I{S = 1} log p_1^D )     (6.1)
where S is the language style label of the utterance; S = 0 represents SAE and S = 1 represents AAE. p_0^D and p_1^D are the two elements of the predicted probability distribution p^D from the discriminator D. Minimizing L_D will force D to make correct predictions. Next, we calculate the cross-entropy loss on the main prediction task:
L_C = -( I{L = 0} log p_0^C + I{L = 1} log p_1^C + I{L = 2} log p_2^C )
where L is the label of the NLI task; L = 0, 1, 2 represent entailment, contradiction, and neutral, respectively. p_j^C indicates the predicted probability of the j-th label from the classifier C. Minimizing L_C will force C to make correct predictions. To ensure that the BERT model produces a text representation that can fool the discriminator, when training we consider another, entropy-based loss:
L_D' = p_0^D log p_0^D + p_1^D log p_1^D
L_D' is the negative entropy of the predicted distribution p^D from the discriminator. Minimizing it makes p^D close to a uniform distribution, preventing D from making correct predictions. We update the BERT model and the classifier by minimizing the following combined loss (line 5):
L = L_C + L_D'     (6.2)
At the end of each epoch, we run the BERT model and the classifier on the validation data, and save their parameters if they achieve the best validation performance.
6.5.3 Experimental results
In Table 6.5, we show the performance of the two debiasing methods on the two datasets for the two BERT models; we compare these results with those of the original models in Table 6.3. Note that our two debiasing methods significantly reduce the gap between the performance on SAE and on AAE. The original BERT models perform well on SAE test data but exhibit a decrease in performance when tested on AAE data. However, the BERT models trained under the CDA or LSD debiasing strategies achieve similar performance on SAE and AAE, which demonstrates the effectiveness of the two debiasing methods in mitigating bias in NLI models.
Table 6.5: Model performance (%) of the two debiased NLI models.
Models | SNLI SAE | SNLI AAE | Diff. | MultiNLI SAE | MultiNLI AAE | Diff.
CDA_BASE | 89.77 | 89.76 | -0.01 | 84.29 | 83.98 | -0.31
LSD_BASE | 90.35 | 90.49 | +0.14 | 84.50 | 83.81 | -0.69
CDA_LARGE | 90.48 | 90.36 | -0.12 | 84.66 | 84.20 | -0.46
LSD_LARGE | 90.60 | 90.53 | -0.07 | 84.72 | 84.30 | -0.42
Furthermore, our debiased models not only improve the performance on AAE data but also maintain performance on SAE data similar to that of the original models. This is due either to the introduction of additional AAE training data, which is not always available, or to the disentanglement between the semantic and language style features of texts, which enhances the model's capability of understanding natural language. Lastly, we find that LSD generally outperforms CDA on both SAE and AAE data. In addition, LSD is an adversarial learning debiasing method that filters out language style information that is irrelevant to the NLI task.
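To make the LSD objective concrete, the following minimal PyTorch sketch computes the losses above from a batch of E_[CLS] representations. The classifier and discriminator are modeled as single linear heads, and the hidden size of 768 is an illustrative assumption; in terms of Algorithm 2, the first returned loss would update W_D (line 4) and the second would update W_BERT and W_C (line 5).

import torch
import torch.nn as nn
import torch.nn.functional as F

hidden = 768                               # assumed size of BERT's E_[CLS]
classifier = nn.Linear(hidden, 3)          # C: entailment / contradiction / neutral
discriminator = nn.Linear(hidden, 2)       # D: SAE (0) vs. AAE (1)

def lsd_losses(e_cls, nli_labels, style_labels):
    """Return (L_D, L_C + L_D') for one mini-batch of [CLS] representations."""
    log_p_d = F.log_softmax(discriminator(e_cls), dim=-1)
    log_p_c = F.log_softmax(classifier(e_cls), dim=-1)
    p_d = log_p_d.exp()

    loss_d = F.nll_loss(log_p_d, style_labels)        # Eq. 6.1: trains D to detect the style
    loss_c = F.nll_loss(log_p_c, nli_labels)          # cross-entropy on the NLI labels
    loss_dprime = (p_d * log_p_d).sum(dim=-1).mean()  # negative entropy of D's prediction;
                                                      # minimizing it pushes p^D toward uniform
    return loss_d, loss_c + loss_dprime               # Eq. 6.2: L = L_C + L_D'

# e_cls stands in for BERT's [CLS] output on a mini-batch of premise-hypothesis pairs.
e_cls = torch.randn(4, hidden)
loss_d, loss_bert_c = lsd_losses(e_cls, torch.tensor([0, 1, 2, 0]), torch.tensor([0, 1, 0, 1]))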
In fact, LSD is also generalizable to more effective and architecturally similar models such as DeBERTa [61], XLNet [144], and T5 [109], to ensure fairness as well as to robustify larger language models.
6.6 Related Work
Previous works focus on AAE in the context of racial bias as a result of systemic biases in model performance. For example, [16] focus on dependency parsing of social media AAE to analyze the impacts of performance disparities between AAE and SAE tweets. Other works undertake AAE within the scope of detecting and mitigating the presence of racial bias in areas of offensive and abusive language detection [85, 120], sentiment analysis [57] and hate speech detection [39, 120]. However, these influential works do not engage with the AAE literature, utilize a human-in-the-loop paradigm, or employ the humans who create such data. Thus, these pivotal works fail to understand AAE's phonological and morphological language features, thereby simply treating AAE as another non-Penn Treebank English variety [13].
Fairness in NLP. As social and racial disparities have become a compelling issue within the NLP community, focal topics of fairness, accountability, ethics, sustainable development, etc., have gained considerable attention in recent years [64]. Recent work on fairness has primarily focused on racial and gender biases in distributed word representations [17, 158, 163], coreference resolution [117], sentence encoders [95], machine translation [132, 107], and dialogue generation [85, 89].
Adversarial learning in NLP. Adversarial examples were initially explored in computer vision by [130], where these examples were intended to influence models to produce false predictions. However, in NLP, adversarial examples can occur at a phonetic, phonological, morphological, syntactic, semantic, or pragmatic level [131, 40, 51, 149]. [85] shows that dialogue systems are prone to produce offensive responses when fed AAE language features in comparison to SAE, whereas [89] propose a novel adversarial learning framework which directly addresses the issue of gender bias in dialogue models while maintaining their performance. Both [1] and [73] exploit the notion of adversariality by utilizing word embeddings to find the k nearest synonymic examples.
Summary. These influential works demonstrate novel adversarial learning methodologies at a character and/or word level in order to address bias issues surrounding protected attributes such as race and gender by improving model robustness. Similarly, our work utilizes a human-in-the-loop paradigm, employing the humans who create such data, to build a novel morphosyntactic method that perturbs language styles at a syntactic level and to highlight the need for dialectal language diversity in training data.
6.7 Conclusion and Future Works
To address compelling fairness, accountability, transparency, and ethical concerns surrounding the sustainability of language use in NLP applications, we claim that the addition of diverse dialectal language to training data will improve model robustness and generalizability. Our findings show that our proposed debiasing methods not only improve the performance on AAE data but also significantly reduce the performance gap between SAE and AAE, while maintaining or even improving the prediction performance on SAE data. Therefore, training under these two debiasing strategies aids in the mitigation of linguistic bias in NLI models.
We conclude that, though similar, the two language styles, SAE and AAE, are not identical, and thus should not solely be evaluated against each other, but rather compared as a basis of model performance in order to minimize the existence of inherent linguistic bias in language models. In the future, we intend to release CODESWITCH, a morphosyntactically-informed rule-based translation method for unidirectional data augmentation that generates intent-and-semantically-equivalent AAE examples, as a public Python package, to encourage further computational linguistic research into debiasing various NLP systems. We intend to actively update CODESWITCH such that it can include new or region-specific lingo. In this way, CODESWITCH can constitute potential groundwork on ways that AAE can effectively be integrated into NLP systems to improve future language models during their development and deployment.
6.8 Limitations And Ethical Considerations
We must mention that increased performance for underrepresented dialects in NLP systems has the potential to enable automated discrimination based on the use of non-standard dialects. Although we attempt to highlight the need for dialectal inclusivity for impactful speech and language technologies, we do not intend to increase feelings of marginalization of an already stigmatized community.
We have established our method's effectiveness for data augmentation for generating intent-and-semantically-equivalent AAE examples and believe that CODESWITCH could be further improved by addressing the following limitations:
1. Currently, CODESWITCH is a unidirectional data augmentation method and cannot be used in reverse as a deterministic text normalization/preprocessing system which can convert all text to SAE.
2. CODESWITCH operates on simple, deterministic substitutions following the morphosyntactically-informed translation rules found in Appendix E.2 rather than the usage of real L1 and L2 AAE speakers, which may result in the lack of several formal/informal phrases, expressions, idioms, cultural and region-specific lingo, and slang-related words [13]. For example, "I sholl was finna ask who money dat is", where "sholl" refers to the replacement of the word "sure".
3. Although CODESWITCH possesses several simple, deterministic morphosyntactically-informed translation rules, it does not account for contextual instances of accurate copula deletion. This may lead to a discrepancy between actual text written by L1 and/or L2 AAE speakers and our proposed data augmentation method.
In the future, we intend to address these limitations and ethical considerations by partnering with AAE diglossic communities in hopes of robustifying CODESWITCH to be probabilistic rather than deterministic, so that it captures different AAE variants of the same SAE term (for example, the AAE equivalents of "what's" → "waz" or "wus" or "wats"). In addition, we will investigate inherent linguistic bias in other NLP applications.
CHAPTER 7
CONCLUSION
7.1 Dissertation Summary
First, in Chapter 2, we introduce our first case study, namely Gender, Race, Language and Social Justice, in which we conduct a pioneering study of the fairness issues concerning both gender and racial biases in two popular dialogue models, i.e., generative and retrieval dialogue models, as a joint problem. We detect and demonstrate performance disparities in sequence generation between gender (male/female) and racial (white/black) responses, respectively.
To address this aforementioned issue, we propose two simple but effective debiasing methods to reduce these disparities and to better address issues surrounding social justice and fairness in dialogue generation. Next, in the case study Gender and Sexual Identities, Orientations and Expressions, we examine predictive biases of binary (male and female) representations in Chapter 3 and of LGBTQIA2S+ representations in Chapter 4, respectively. In Chapter 3, we construct two large benchmark datasets: (1) a possessive (gender-specific and gender-neutral) nouns dataset and (2) an attribute (career-related and family-related) words dataset, to systematically conduct large-scale analyses of each news corpus to detect and examine gender biases in distribution, content, labeling and word choice, and we demonstrate societal gender biases with regard to gender roles in society. Moreover, we learn that gender is a progressive spectrum attempting to include all gender and sexual identities and orientations. However, individuals who oppose, criticize, or possess contradictory feelings, beliefs, or motivations towards certain communities contribute to discrimination, harassment, and abuse in the form of hate speech and abusive and offensive language use [35, 37, 12, 8]. To address the above challenges, in Chapter 4 we aim to intersect NLP, gender and queerness to readily identify and detect online sexist and Anti-LGBTQIA2S+ content. To the best of our knowledge, our dataset is the first dataset for scientists, practitioners and researchers to study stereotypical social biases against this already marginalized community in hopes of moving towards inclusivity.
Later, the case study Language, Race and Culture is divided into two parts. In Chapter 5, we incorporate a human-in-the-loop paradigm to address the issue of lexical, semantic and syntactic ambiguity of African American English (AAE) use in traditional off-the-shelf models and well-known large language models (LLMs). We further propose a dataset of tweets containing labeled AAE morphosyntactic lexical features in order to enable sociolinguists, raciolinguists and dialectologists to analyze morphosyntactic variation in AAE. We provide a normative foundation for reasoning about harms arising from NLP systems that we have shown is largely absent from the current literature. In Chapter 6, we conduct a pioneering study of robustifying large language models (LLMs) to minimize false predictions (or error disparities) by introducing dialectal language to diversify training data, providing a normative foundation for reasoning about cultural and linguistic harms arising from state-of-the-art NLP systems. To substantiate the existence of inherent linguistic bias in LLMs, we attempt a dynamic approach to generate an AAE sentence pair dataset with a wide range of morphosyntactic features and dialect-specific vocabulary to illustrate that the lack of diverse training data results in disparities in model performance. To do so, we construct CODESWITCH, a greedy unidirectional morphosyntactic translation method for data augmentation to generate intent-and-semantically equivalent AAE examples by perturbing SAE examples.
Finally, Chapter 7 concludes our work, and we identify possible future directions drawing on work across Trustworthy AI and its applications, in an attempt to provide a foundation, through an account of the relationships between language and injustice, to develop a unified view and to build AI tools that combine the best characteristics of all.
7.2 Future Work
I am extremely thrilled by the potential of my research area for current and future technology. The ultimate goal is to develop a unified view for all and to build tools that combine the best characteristics of all. Moreover, I plan to extend the scope of the incremental pattern discovery framework in various directions, for example, creating safe and trustworthy AI technologies for people with disabilities (PWDs).
In the near future, I plan to work with faculty, students, and research scientists, as my long-term career goal is to develop advanced technologies for various applications of FATE in AI, ML and NLP and to develop Trustworthy AI technology. To ensure that these research advances and innovations have a positive impact on society, I intend to continue my inclusive approach of directly involving affected (user) communities to address social issues surrounding social biases (e.g., gender and racial biases): for example, tackling problems such as the gender gap in facial recognition systems and facial recognition disparities across demographics, understanding dialect disparities in Natural Language Understanding (NLU), disambiguation of morphosyntactic features of AAE (the case of habitual "be"), improving genetic risk prediction across diverse populations by disentangling ancestry representations, analyzing hate speech data along racial, gender and intersectional axes, misinformation, etc. Further in the future, I aim to explore offensive language mitigation solutions towards members of the LGBTQIA2S+ community with the objective of creating a safe and inclusive place that welcomes, supports, and values all LGBTQIA2S+ individuals, activists and allies both online and offline.
7.3 Concluding Remarks
The goal of NLP is to process language at a human level. However, NLP's current approach of ignoring social factors prevents us from reaching human-level competence and performance, since language is more than just informational content. As such, I will continuously evaluate my research, collaborate with technical and non-technical audiences to gain new perspectives, and challenge myself to improve daily. I will also maintain an active interest in related research areas, from which I derive a rich supply of ideas and techniques to tackle new and existing problems. By working at the edges between theory and practice, I hope to make unique and lasting contributions to the social and scientific communities, as I believe that to create new and innovative technology tomorrow, we need to start today.
BIBLIOGRAPHY
[1] Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896, Brussels, Belgium, October-November 2018. Association for Computational Linguistics.
[2] Mohadeseh Amini and Parviz Birjandi. Gender bias in the iranian high school efl textbooks. English Language Teaching, 5(2):134–147, 2012.
[3] Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering.
In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 405–413, 2019.
[4] Guy Bailey, John Baughan, Salikoko S. Mufwene, and John R. Rickford. African-American English: Structure, History and Use (1st ed.). Routledge, 1998.
[5] Soumya Barikeri, Anne Lauscher, Ivan Vulić, and Goran Glavaš. RedditBias: A real-world resource for bias evaluation and debiasing of conversational language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, August 2021. Association for Computational Linguistics.
[6] John Baugh. Linguistic discrimination. In 1. Halbband, pages 709–714. De Gruyter Mouton, 2008.
[7] Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael J. Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. CoRR, abs/1706.02409, 2017.
[8] Federico Bianchi and Dirk Hovy. On the gap between adoption and understanding in NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3895–3901, Online, August 2021. Association for Computational Linguistics.
[9] Steven Bird. NLTK: the natural language toolkit. In ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006, 2006.
[10] Steven Bird. Decolonising speech and language technology. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3504–3519, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics.
[11] Linda M. Bland-Stewart. Difference or deficit in speakers of african american english? https://leader.pubs.asha.org/doi/10.1044/leader.FTR1.10062005.6, May 2005.
[12] Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. Language (technology) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, Online, July 2020. Association for Computational Linguistics.
[13] Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna M. Wallach. Language (technology) is power: A critical survey of "bias" in NLP. CoRR, abs/2005.14050, 2020.
[14] Su Lin Blodgett, Lisa Green, and Brendan O’Connor. Demographic dialectal variation in social media: A case study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1119–1130, Austin, Texas, November 2016. Association for Computational Linguistics.
[15] Su Lin Blodgett and Brendan O’Connor. Racial disparity in natural language processing: A case study of social media african-american english. CoRR, abs/1707.00061, 2017.
[16] Su Lin Blodgett, Johnny Wei, and Brendan O’Connor. Twitter Universal Dependency parsing for African-American and mainstream American English. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415–1425, Melbourne, Australia, July 2018. Association for Computational Linguistics.
[17] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings.
In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, page 4356–4364, 2016. [18] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4349–4357. Curran Associates, Inc., 2016. [19] Shikha Bordia and Samuel R. Bowman. Identifying and reducing gender bias in word-level language models. CoRR, abs/1904.03035, 2019. [20] Shikha Bordia and Samuel R. Bowman. Identifying and reducing gender bias in word-level language models, 2019. [21] Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Nuanced metrics for measuring unintended bias with real data for text classification. In Companion Proceedings of The 2019 World Wide Web Conference, pages 491–500, 2019. [22] Avishek Joey Bose and William Hamilton. Compositional fairness constraints for graph embeddings. CoRR, abs/1905.10674, 2019. [23] Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large 68 annotated corpus for learning natural language inference. CoRR, abs/1508.05326, 2015. [24] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. American Association for the Advancement of Science, 2017. [25] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. 356:183–186, April 2017. [26] Bharathi Raja Chakravarthi, Ruba Priyadharshini, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kayalvizhi Sampath, Durairaj Thenmozhi, Sathiyaraj Thangasamy, Rajendran Nallathambi, and John Philip McCrae. Dataset for identification of homophobia and transophobia in multilingual youtube comments. CoRR, abs/2109.00227, 2021. [27] Hongshen Chen, Xiaorui Liu, Dawei Yin, and Jiliang Tang. A survey on dialogue systems: Recent advances and new frontiers. CoRR, abs/1711.01731, 2017. [28] Xingyu Chen, Brandon Fain, Liang Lyu, and Kamesh Munagala. Proportionally fair clustering. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 1032–1041, 2019. [29] Won Ik Cho, Ji Won Kim, Seok Min Kim, and Nam Soo Kim. On measuring gender bias in translation of gender-neutral pronouns, 2019. [30] Herbert H. Clark and Michael F. Schober. Asking questions and influencing answers. In Russell Sage Foundation, 1992. [31] Steven R. Corman, Timothy Kuhn, Robert D. Mcphee, and Kevin J. Dooley. Studying complex discursive systems. Human Communication Research. [32] [33] [34] [35] Jamell Dacon. Recommender-System-Datasets, 2020. Recommender system datasets. https://github.com/daconjam/ Jamell Dacon. Towards a deep multi-layered dialectal language analysis: A case study In Proceedings of the Second Workshop on Bridging of African-American English. Human–Computer Interaction and Natural Language Processing, pages 55–63, Seattle, Wash- ington, July 2022. Association for Computational Linguistics. Jamell Dacon and Haochen Liu. Does gender matter in the news? detecting and examining gender bias in news articles. In Companion Proceedings of the Web Conference 2021, WWW ’21, page 385–392, New York, NY, USA, 2021. Association for Computing Machinery. Jamell Dacon and Haochen Liu. Does gender matter in the news? 
detecting and examining gender bias in news articles. New York, NY, USA, 2021. Association for Computing Machinery. 69 [36] [37] Jamell Dacon, Harry Shomer, Shaylynn Crum-Dacon, and Jiliang Tang. Detecting harmful online conversational content towards lgbtqia+ individuals. In Queer in AI Workshop at NAACL, 2022. Jamell Dacon and Jiliang Tang. What truly matters? using linguistic cues for analyzing the #blacklivesmatter movement and its counter protests: 2013 to 2020. CoRR, abs/2109.12192, 2021. [38] Thomas Davidson and Debasmita Bhattacharya. Examining racial bias in an online abuse corpus with structural topic modeling. CoRR, abs/2005.13041, 2020. [39] Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. Racial bias in hate speech and abusive language detection datasets. In Proceedings of the Third Workshop on Abusive Language Online, pages 25–35, Florence, Italy, August 2019. Association for Computational Linguistics. [40] Charles E. DeBose. Codeswitching: Black english and standard english in the african- Journal of Multilingual and Multicultural Development, american linguistic repertoire. 13(1-2):157–167, 1992. [41] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018. [42] Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, and Jason Weston. Queens are powerful too: Mitigating gender bias in dialogue generation, 2020. [43] Emily Dinan, Samuel Humeau, Bharath Chintagunta, and Jason Weston. Build it break it fix it for dialogue safety: Robustness from adversarial human attack. CoRR, abs/1908.06083, 2019. [44] Jaschar Domann, Jens Meiners, Lea Helmers, and A. Lommatzsch. Real-time news recom- mendations using apache spark. In CLEF, 2016. [45] Rachel Dorn. Dialect-specific models for automatic speech recognition of African American Vernacular English. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 16–20, Varna, Bulgaria, September 2019. INCOMA Ltd. [46] Yanai Elazar and Yoav Goldberg. Adversarial removal of demographic attributes from text data. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 11–21, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. [47] Lisa Fan, Marshall White, Eva Sharma, Ruisi Su, Prafulla Kumar Choubey, Ruihong Huang, and Lu Wang. In plain sight: Media bias through the lens of factual reporting. In EMNLP-IJCNLP, pages 6343–6349, Hong Kong, China, November 2019. 70 [48] Anjalie Field, Su Lin Blodgett, Zeerak Waseem, and Yulia Tsvetkov. A survey of race, racism, and anti-racism in NLP. CoRR, abs/2106.11410, 2021. [49] Joel Escudé Font and Marta R Costa-Jussa. Equalizing gender biases in neural machine translation with word embeddings techniques. arXiv preprint arXiv:1901.03116, 2019. [50] Jianfeng Gao, Michel Galley, and Lihong Li. Neural approaches to conversational AI. Foundations and Trends in Information Retrieval, 13(2-3):127–298, 2019. [51] Penelope Gardner-Chloros et al. Code-switching. Cambridge university press, 2009. [52] Danielle Gaucher, Justin Friesen, and Aaron C Kay. Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of personality and social psychology, 101(1):109, 2011. [53] Hila Gonen and Yoav Goldberg. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. 
arXiv preprint arXiv:1903.03862, 2019. [54] Alex Graves and Jürgen Schmidhuber. Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks, 18(5):602–610, 2005. IJCNN 2005. [55] Jonathon Green. The vulgar tongue: Green’s history of slang. Oxford University Press, New York, USA, 2014. [56] Lisa J. Green. African American English: A Linguistic Introduction. Cambridge University Press, 2002. [57] Sophie Groenwold, Lily Ou, Aesha Parekh, Samhita Honnavalli, Sharon Levy, Diba Mirza, and William Yang Wang. Investigating African-American Vernacular English in transformer-based text generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5877–5883, Online, November 2020. Association for Computational Linguistics. [58] Matan Halevy, Camille Harris, Amy Bruckman, Diyi Yang, and Ayanna Howard. Mitigating racial biases in toxic language detection with an equity-based ensemble framework. New York, NY, USA, 2021. Association for Computing Machinery. [59] Felix Hamborg, Karsten Donnay, and Bela Gipp. Automated identification of media bias International Journal on Digital in news articles: an interdisciplinary literature review. Libraries, pages 1–25, 2018. [60] Laura Hanu and Unitary team. Detoxify. Github. https://github.com/unitaryai/detoxify, 2020. 71 [61] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced BERT with disentangled attention. CoRR, abs/2006.03654, 2020. [62] Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary Ke, Genevieve Fried, Ryan Lowe, and Joelle Pineau. Ethical challenges in data-driven dialogue systems. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2018, New Orleans, LA, USA, February 02-03, 2018, pages 123–129, 2018. [63] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997. [64] Dirk Hovy and Shannon L. Spruit. The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 591–598, Berlin, Germany, August 2016. Association for Computational Linguistics. [65] Ayanna Howard and Jason Borenstein. The ugly truth about ourselves and our robot creations: the problem of bias and social inequity. Science and engineering ethics, 24(5):1521–1536, 2018. [66] C. Hutto and E. Gilbert. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In ICWSM, 2014. [67] Clayton J. Hutto and Eric Gilbert. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the Eighth International Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan, USA, June 1-4, 2014., 2014. [68] Aylin Caliskan Islam, Joanna J. Bryson, and Arvind Narayanan. Semantics derived auto- matically from language corpora necessarily contain human biases. CoRR, abs/1608.07187, 2016. [69] Sen Jia, Thomas Lansdall-Welfare, and Nello Cristianini. Measuring gender bias in news images. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion. Association for Computing Machinery, 2015. [70] Sen Jia, Thomas Lansdall-Welfare, Saatviga Sudhahar, Cynthia Carter, and Nello Cristianini. Women are seen more than heard in online newspapers. PloS one, 11:e0148434, 02 2016. [71] Taylor Jones. 
Toward a description of african american vernacular english dialect regions using “black twitter”. American Speech, 90:403–440, 11 2015. [72] Anna Jørgensen, Dirk Hovy, and Anders Søgaard. Learning a POS tagger for AAVE- like language. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 72 1115–1120, San Diego, California, June 2016. Association for Computational Linguistics. [73] Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. Spanbert: Improving pre-training by representing and predicting spans. CoRR, abs/1907.10529, 2019. [74] Dan Jurafsky and James H. Martin. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition. Prentice Hall series in artificial intelligence. Prentice Hall, Pearson Education International, 2009. [75] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Proceedings of the 2012th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II, ECMLPKDD’12, pages 35–50, Berlin, Heidelberg, 2012. Springer-Verlag. [76] Sharese King. From african american vernacular english to african american language: Rethinking the study of race and language in african americans’ speech. Annual Review of Linguistics, 6(1):285–300, 2020. [77] Allison Koenecke, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Menge- sha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. Racial dispari- ties in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14):7684–7689, 2020. [78] Allison Koenecke, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Menge- sha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. Racial dispari- ties in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14):7684–7689, 2020. [79] William Labov. Ralph fasold, tense marking in black english: a linguistic and social analysis. washington, d.c.: Center for applied linguistics, 1972. pp. 254. Language in Society, 4(2):222–227, 1975. [80] John Lafferty, Andrew McCallum, and Fernando CN Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. [81] Andrienne Lafrance. I analyzed a year of my reporting for gender bias (again). https://www. theatlantic.com/technology/archive/2016/02/gender-diversity-journalism/463023/, February 2016. [82] Xiaolan Lei. Sexism in language. Journal of Language and Linguistics, 5(1):87–94, 2006. [83] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity- promoting objective function for neural conversation models. In NAACL HLT 2016, The 73 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pages 110–119, 2016. [84] Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, Ramón Fermandez, Silvio Amir, Luís Marujo, and Tiago Luís. Finding function in form: Compositional character models for open vocabulary word representation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1520–1530, Lisbon, Portugal, September 2015. Association for Computational Linguistics. [85] Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu, and Jiliang Tang. 
Does gender matter? towards fairness in dialogue systems. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4403–4416, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. [86] Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu, and Jiliang Tang. Does gender matter? towards fairness in dialogue systems, 2020. [87] Haochen Liu, Tyler Derr, Zitao Liu, and Jiliang Tang. Say what I want: Towards the dark side of neural dialogue models. CoRR, abs/1909.06044, 2019. [88] Haochen Liu, Wei Jin, Hamid Karimi, Zitao Liu, and Jiliang Tang. The authors matter: Understanding and mitigating implicit bias in deep text classification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 74–85, Online, August 2021. Association for Computational Linguistics. [89] Haochen Liu, Wentao Wang, Yiqi Wang, Hui Liu, Zitao Liu, and Jiliang Tang. Mitigating gender bias for neural dialogue generation with adversarial learning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 893–903, Online, November 2020. Association for Computational Linguistics. [90] Edward Loper and Steven Bird. NLTK: The natural language toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, pages 63–70, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics. [91] Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, and Anupam Datta. Gender bias in neural natural language processing. CoRR, abs/1807.11714, 2018. [92] Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, and Anupam Datta. Gender bias in neural natural language processing. In Logic, Language, and Security, pages 189–202. Springer, 2020. [93] Juan M Madera, Michelle R Hebl, and Randi C Martin. Gender and letters of recommendation for academia: agentic and communal differences. Journal of Applied Psychology, 94(6):1591, 74 2009. [94] Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. On measuring social biases in sentence encoders. CoRR, abs/1903.10561, 2019. [95] Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. On measuring social biases in sentence encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 622–628, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. [96] Paola Medel and Vahab Pournaghshband. Eliminating gender bias in computer science education materials. In Proceedings of the 2017 ACM SIGCSE technical symposium on computer science education, pages 411–416, 2017. [97] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. CoRR, abs/1908.09635, 2019. [98] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635, 2019. [99] Michela Menegatti and Monica Rubini. Gender bias and sexism in language. In Oxford Research Encyclopedia of Communication. 2017. [100] Alexander H. Miller, Will Feng, Dhruv Batra, Antoine Bordes, Adam Fisch, Jiasen Lu, Devi Parikh, and Jason Weston. Parlai: A dialog research software platform. 
In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 - System Demonstrations, pages 79–84, 2017. [101] Rishabh Misra. News category dataset, 06 2018. [102] Marzieh Mozafari, Reza Farahbakhsh, and Noël Crespi. Hate speech detection and racial bias mitigation in social media based on bert model. PLOS ONE, 15:1–26, 08 2020. [103] Ji Ho Park, Jamin Shin, and Pascale Fung. Reducing gender bias in abusive language detection. arXiv preprint arXiv:1808.07231, 2018. [104] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014. [105] Alexandra Guedes Pinto, Henrique Lopes Cardoso, Isabel Margarida Duarte, Catarina Vaz Warrot, and Rui Sousa-Silva. Biased language detection in court decisions. In International Conference on Intelligent Data Engineering and Automated Learning, pages 402–410. Springer, 2020. 75 [106] Barbara Plank, Anders Søgaard, and Yoav Goldberg. Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 412–418, Berlin, Germany, August 2016. Association for Computational Linguistics. [107] Marcelo O. R. Prates, Pedro H. C. Avelar, and Luís C. Lamb. Assessing gender bias in machine translation - A case study with google translate. CoRR, abs/1809.02208, 2018. [108] Organizers of QueerInAI, Ashwin S, William Agnew, Hetvi Jethwani, and Arjun Subramonian. Rebuilding trust: Queer in ai approach to artificial intelligence risk management, 2021. [109] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR, abs/1910.10683, 2019. [110] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Semantically equivalent adver- sarial rules for debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865, Melbourne, Australia, July 2018. Association for Computational Linguistics. [111] Jan Riebling. Centering resonance analysis using nltk and networkx. http://www. sociology-hacks.org/?p=151, 2015. [112] Alan Ritter, Colin Cherry, and William B. Dolan. Data-driven response generation in social media. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, 27-31 July 2011, John McIntyre Conference Centre, Edinburgh, UK, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 583–593, 2011. [113] James A Rodger and Parag C Pendharkar. A field study of the impact of gender and user’s technical experience on the performance of voice-activated medical tracking application. International Journal of Human-Computer Studies, 60(5-6):529–544, 2004. [114] Adam Rose. Are face-detection cameras racist? Time Business, 2010. [115] Michael J. Rosenfeld and Reuben J. Thomas. Searching for a mate: The rise of the internet as a social intermediary. American Sociological Review, 77(4):523–547, 2012. [116] Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301, 2018. 
[117] Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. Gender bias in coreference resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14, New Orleans, Louisiana, June 2018. Association for 76 Computational Linguistics. [118] Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, and Sarath Chandar. Complex sequential question answering: Towards learning to converse over linked question answer pairs with a knowledge graph. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 705–713, 2018. [119] Punyajoy Saha, Binny Mathew, Kiran Garimella, and Animesh Mukherjee. “short is the road that leads from fear to hate”: Fear speech in indian whatsapp groups. In Proceedings of the Web Conference 2021, WWW ’21, page 1110–1121, New York, NY, USA, 2021. Association for Computing Machinery. [120] Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678, Florence, Italy, July 2019. Association for Computational Linguistics. [121] Roger C Schank. Conceptual dependency: A theory of natural language understanding. Cognitive psychology, 3(4):552–631, 1972. [122] Iulian Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, and Yoshua Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017. [123] Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C Courville, and Joelle Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages 3776–3784, 2016. [124] Deven Santosh Shah, H. Andrew Schwartz, and Dirk Hovy. Predictive biases in natural language processing models: A conceptual framework and overview. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5248–5264, Online, July 2020. Association for Computational Linguistics. [125] Lifeng Shang, Zhengdong Lu, and Hang Li. Neural responding machine for short-text con- versation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1577–1586, 2015. [126] Shanya Sharma, Manan Dey, and Koustuv Sinha. Evaluating gender bias in natural language 77 inference. CoRR, abs/2105.05541, 2021. [127] Ian Stewart. Now we stronger than ever: African-American English syntax in Twitter. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 31–37, Gothenburg, Sweden, April 2014. Association for Computational Linguistics. [128] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. 
In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
[129] Spencer S. Swinton. Predictive bias in graduate admissions tests. ETS Research Report Series, 1981, 1981.
[130] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[131] Samson Tan, Shafiq Joty, Min-Yen Kan, and Richard Socher. It's morphin' time! Combating linguistic discrimination with inflectional perturbations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2920–2935, Online, July 2020. Association for Computational Linguistics.
[132] Samson Tan, Shafiq Joty, Lav Varshney, and Min-Yen Kan. Mind your inflections! Improving NLP for non-standard Englishes with Base-Inflection Encoding. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5647–5663, Online, November 2020. Association for Computational Linguistics.
[133] Rachael Tatman and Conner Kasten. Effects of talker dialect, gender & race on accuracy of Bing Speech and YouTube automatic captions. In Proc. Interspeech 2017, pages 934–938, 2017.
[134] Songül Tolan, Marius Miron, Emilia Gómez, and Carlos Castillo. Why machine learning may lead to unfairness: Evidence from risk assessment for juvenile justice in Catalonia. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, ICAIL 2019, Montreal, QC, Canada, June 17-21, 2019, pages 83–92, 2019.
[135] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 6000–6010, 2017.
[136] Zeerak Waseem and Dirk Hovy. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93, San Diego, California, June 2016. Association for Computational Linguistics.
[137] David Manning White. The “gate keeper”: A case study in the selection of news. Journalism Quarterly, 1950.
[138] Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics, 2018.
[139] Alden Williams. Unbiased study of television news bias. Journal of Communication, 1975.
[140] Marty J. Wolf, Keith W. Miller, and Frances S. Grodzinsky. Why we should have seen that coming: Comments on Microsoft's Tay “experiment,” and wider implications. SIGCAS Computers and Society, 47(3):54–64, 2017.
[141] Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, and Ming Zhou. MIND: A large-scale dataset for news recommendation. In ACL, 2020.
[142] Mengzhou Xia, Anjalie Field, and Yulia Tsvetkov. Demoting racial bias in hate speech detection. In Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media, pages 7–14, Online, July 2020. Association for Computational Linguistics.
[143] Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap, and Dan Klein.
Detoxifying language models risks marginalizing minority voices. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2390–2397, Online, June 2021. Association for Computational Linguistics.
[144] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. XLNet: Generalized autoregressive pretraining for language understanding. CoRR, abs/1906.08237, 2019.
[145] Sirui Yao and Bert Huang. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems, pages 2921–2930, 2017.
[146] Wenjie Yin and Arkaitz Zubiaga. Towards generalisable hate speech detection: A review on obstacles and solutions. PeerJ Computer Science, 7:e598, 2021.
[147] Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78, 2014.
[148] Vershawn Ashanti Young. “Nah, we straight”: An argument against code switching. JAC, 29(1/2):49–76, 2009.
[149] Vershawn Ashanti Young and Rusty Barrett. Other people's English: Code-meshing, code-switching, and African American literacy. Parlor Press LLC, 2018.
[150] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification, 2015.
[151] Marcos Zampieri, Preslav Nakov, and Yves Scherrer. Natural language processing for similar languages, varieties, and dialects: A survey. Natural Language Engineering, 26(6):595–612, 2020.
[152] Guanhua Zhang, Bing Bai, Junqi Zhang, Kun Bai, Conghui Zhu, and Tiejun Zhao. Demographics should not be the reason of toxicity: Mitigating discrimination in text classifications with instance weighting. arXiv preprint arXiv:2004.14088, 2020.
[153] Guanhua Zhang, Bing Bai, Junqi Zhang, Kun Bai, Conghui Zhu, and Tiejun Zhao. Demographics should not be the reason of toxicity: Mitigating discrimination in text classifications with instance weighting. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4134–4145, Online, July 2020. Association for Computational Linguistics.
[154] Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NIPS, 2015.
[155] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv preprint arXiv:1707.09457, 2017.
[156] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference resolution: Evaluation and debiasing methods. CoRR, abs/1804.06876, 2018.
[157] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference resolution: Evaluation and debiasing methods, 2018.
[158] Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. Learning gender-neutral word embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 4847–4853, 2018.
[159] Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. Learning gender-neutral word embeddings. arXiv preprint arXiv:1809.01496, 2018.
[160] Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang.
Examining gender bias in languages with grammatical gender, 2019. [161] Xiang Zhou and Mohit Bansal. Towards robustifying NLI models against lexical dataset 80 biases. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8759–8771, Online, July 2020. Association for Computational Linguistics. [162] Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Yejin Choi, and Noah Smith. Challenges in automated debiasing for toxic language detection. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3143–3155, Online, April 2021. Association for Computational Linguistics. [163] Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach, and Ryan Cotterell. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, Italy, July 2019. Association for Computational Linguistics. 81 APPENDIX A BIAS DETECTION IN DIALOGUE GENERATION In the appendix, we detail the 6 categories of words, i.e., gender (male and female), race (white and black), pleasant and unpleasant, career and family. A.1 Gender Words The gender words consist of gender specific words that entail both male and female possessive words as follows: (gods - goddesses), (nephew - niece), (baron - baroness), (father - mother), (dukes - duchesses), (dad - mom), (beau - belle), (beaus - belles), (daddies - mummies), (policeman - policewoman), (grandfather - grandmother), (landlord - landlady), (landlords - landladies), (monks - nuns), (stepson - stepdaughter), (milkmen - milkmaids), (chairmen - chairwomen), (stewards - stewardesses), (men - women), (masseurs - masseuses), (son-in-law - daughter-in-law), (priests - priestesses), (steward - stewardess), (emperor - empress), (son - daughter), (kings - queens), (proprietor - proprietress), (grooms - brides), (gentleman - lady), (king - queen), (governor - matron), (waiters - waitresses), (daddy - mummy), (emperors - empresses), (sir - madam), (wizards - witches), (sorcerer - sorceress), (lad - lass), (milkman - milkmaid), (grandson - granddaughter), (congressmen - congresswomen), (dads - moms), (manager - manageress), (prince - princess), (stepfathers - stepmothers), (stepsons - stepdaughters), (boyfriend - girlfriend), (shepherd - shepherdess), (males - females), (grandfathers - grandmothers), (step-son - step-daughter), (nephews - nieces), (priest - priestess), (husband - wife), (fathers - mothers), (usher - usherette), (postman - postwoman), (stags - hinds), (husbands - wives), (murderer - murderess), (host - hostess), (boy - girl), (waiter - waitress), (bachelor - spinster), (businessmen - businesswomen), (duke - duchess), (sirs - madams), (papas - mamas), (monk - nun), (heir - heiress), (uncle - aunt), (princes - princesses), (fiance - fiancee), (mr - mrs), (lords - ladies), (father-in-law - mother-in-law), (actor - actress), (actors - actresses), (postmaster - postmistress), (headmaster - headmistress), (heroes - heroines), (groom - bride), (businessman - businesswoman), (barons - baronesses), (boars - sows), (wizard - witch), (sons-in-law - daughters-in-law), (fiances 82 - fiancees), (uncles - aunts), (hunter - huntress), (lads - lasses), (masters - mistresses), (brother - sister), (hosts - hostesses), (poet - poetess), (masseur - masseuse), (hero - heroine), (god - goddess), (grandpa - grandma), (grandpas - grandmas), (manservant - 
maidservant), (heirs - heiresses), (male - female), (tutors - governesses), (millionaire - millionairess), (congressman - congresswoman), (sire - dam), (widower - widow), (grandsons - granddaughters), (headmasters - headmistresses), (boys - girls), (he - she), (policemen - policewomen), (step-father - step-mother), (stepfather - stepmother), (widowers - widows), (abbot - abbess), (mr. - mrs.), (chairman - chairwoman), (brothers - sisters), (papa - mama), (man - woman), (sons - daughters), (boyfriends - girlfriends), (he’s - she’s), (his - her). A.2 Race Words The race words consist of Standard US English words and African American/Black words as follows: (going - goin), (relax - chill), (relaxing - chillin), (cold - brick), (not okay - tripping), (not okay - spazzin), (not okay - buggin), (hang out - pop out), (house - crib), (it’s cool - its lit), (cool - lit), (what’s up - wazzup), (what’s up - wats up), (what’s up - wats popping), (hello - yo), (police - 5-0), (alright - aight), (alright - aii), (fifty - fitty), (sneakers - kicks), (shoes - kicks), (friend - homie), (friends - homies), (a lot - hella), (a lot - mad), (a lot - dumb), (friend - mo), (no - nah), (no - nah fam), (yes - yessir), (yes - yup), (goodbye - peace), (do you want to fight - square up), (fight me - square up), (police - po po), (girlfriend - shawty), (i am sorry - my bad), (sorry - my fault), (mad - tight), (hello - yeerr), (hello - yuurr), (want to - finna), (going to - bout to), (That’s it - word), (young person - young blood), (family - blood), (I’m good - I’m straight), (player - playa), (you joke a lot - you playing), (you keep - you stay), (i am going to - fin to), (turn on - cut on), (this - dis), (yes - yasss), (rich - balling), (showing off - flexin), (impressive - hittin), (very good - hittin), (seriously - no cap), (money - chips), (the - da), (turn off - dub), (police - feds), (skills - flow), (for sure - fosho), (teeth - grill), (selfish - grimey), (cool - sick), (cool - ill), (jewelry - ice), (buy - cop), (goodbye - I’m out), (I am leaving - Imma head out), (sure enough - sho nuff), (nice outfit - swag), (sneakers - sneaks), (girlfiend - shortie), (Timbalands - tims), (crazy - wildin), (not cool - wack), (car - whip), (how are you - sup), (good - dope), (good - fly), (very good - supafly), (prison - pen), (friends - 83 squad), (bye - bye felicia), (subliminal - shade). A.3 Pleasant and Unpleasant Words Pleasant words. The pleasant words consist of words often used to express positive emotions and scenarios as follows: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond, gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family, happy, laughter, paradise, vacation, joy, wonderful. Unpleasant Words. The unpleasant words consist of words often used to express negative emotions and scenarios as follows: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink, assault, disaster, hatred, pollute, tragedy, divorce, jail, poverty, ugly, cancer, kill, rotten, vomit, agony, prison, terrible, horrible, nasty, evil, war, awful, failure. A.4 Career and Family Words Career Words. 
The career words consist of words that pertain to careers, jobs and businesses: company, industry, academic, executive, management, occupation, professional, corporation, salary, office, business, career, technician, accountant, supervisor, engineer, worker, educator, clerk, counselor, inspector, mechanic, manager, therapist, administrator, salesperson, receptionist, librarian, advisor, pharmacist, janitor, psychologist, physician, carpenter, nurse, investigator, bartender, specialist, electrician, officer, pathologist, lawyer, planner, practitioner, plumber, instructor, surgeon, veterinarian, paramedic, examiner, chemist, machinist, appraiser, nutritionist, architect, hairdresser, baker, programmer, paralegal, hygienist, scientist.

Family Words. The family words consist of words that refer to relations within a family or group of people: adoption, adoptive, birth, bride, bridegroom, care-giver, child, childhood, children, clan, cousin, devoted, divorce, engaged, engagement, estranged, faithful, family, fiancee, folks, foster, groom, heir, heiress, helpmate, heritage, household, husband, in-law, infancy, infant, inherit, inheritance, kin, kindred, kinfolk, kinship, kith, lineage, love, marry, marriage, mate, maternal, matrimony, natal, newlywed, nuptial, offspring, orphan, parent, relative, separation, sibling, spouse, tribe, triplets, twins, wed, wedding, wedlock, wife.

APPENDIX B

DETECTING AND EXAMINING GENDER BIAS IN THE NEWS

B.1 Appendix A

As previously mentioned, we provide one of the largest non-offensive, non-repeating sets of gender-specific (male and female) words; we will now detail the 2 categories containing a total of 465 masculine and feminine gender possessive nouns. Note that in the creation of the set of words, overly offensive gender-related words such as bitch, whore, slut, bastard, prick, etc., were left out of the sets of nouns as they are hardly ever used in news articles. However, offensive gender-related words are often used in tabloids (a compact version of a newspaper dominated by headline titles and images) [69].
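As an illustration of how such word sets can be operationalized, the short sketch below (a hypothetical example, not the analysis code used in this dissertation) counts masculine and feminine mentions in an article using two small sample lists; in practice, the full word lists in Appendices B.1.1 and B.1.2 would be substituted.

# Illustrative sketch: counting gendered mentions in a news article with
# gender-specific word lists (small samples only; the full lists appear in
# Appendices B.1.1 and B.1.2). Hypothetical example code.
import re
from collections import Counter

MALE_WORDS = {"he", "his", "him", "father", "son", "spokesman", "congressman"}
FEMALE_WORDS = {"she", "her", "hers", "mother", "daughter", "spokeswoman", "congresswoman"}

def gendered_mention_counts(article_text: str) -> Counter:
    """Return counts of male-specific and female-specific tokens in an article."""
    tokens = re.findall(r"[a-z']+", article_text.lower())
    counts = Counter()
    for tok in tokens:
        if tok in MALE_WORDS:
            counts["male"] += 1
        elif tok in FEMALE_WORDS:
            counts["female"] += 1
    return counts

# Example with a hypothetical sentence:
print(gendered_mention_counts("The spokeswoman said she and her father met the congressman."))
# Counter({'female': 3, 'male': 2})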
B.1.1 Male Possessive Words The succeeding word list consists of 230 gender specific words that entail male possessive nouns as follows: god, gods, nephew, nephews, baron, father, fathers dukes, dad, beau, beaus, daddies, policeman, policemen, grandfather, landlord, landlords, monk, monks, step-son, step-sons, milkmen, chairmen, chairman, steward, men, masseurs, son-in-law, priest, king, governor, waiter, daddy, steward, emperor, son, proprietor, groom, grooms, gentleman, gentlemen, sir, wizards, sorcerer, lad, milk- man, grandson, grand-son, congressmen, dads, manager, prince, stepfathers, boyfriend, shepherd, shepherds, males, grandfathers, grand-fathers, husband, usher, postman, stags, husbands, host, boy, waiter, bachelor, bachelors, businessmen, duke, sirs, papas, heir, uncle, princes, fiance, mr, lords, father-in-law, actor, actors, postmaster, headmaster, heroes, businessman, boars, wizard, sons-in-law, fiances, uncles, hunter, lads, masters, brother, hosts, poet, hero, grandpa, grandpas, manservant, heirs, male, tutors, millionaire, congressman, sire, sires, widower, grandsons, grand-sons, boys, he, step-father, jew, bridegroom, bridegrooms stepfather, widowers, abbot, mr., brothers, man, sons, boyfriends, he’s, his, him, earl, giant, count, stepson, stepsons, poet, mayor, peer, negro, abbot, traitor, benefactor, instructor, conductor, founder, founders, hunters, huntresses, temptress, 86 enchanter, enchanters, songster, songsters, murderer, murderers, patron, patrons, author, czar, guy, spokesman, spokesmen, pa, councilman, council-man, councilmen, council-men, gay, gays, prostate cancer, fraternity, fraternities, salesman, dude, dudes, paternal, brotherhood, statesman, statesmen, countryman, countrymen, suitor, macho, papa, strongman, strongmen, boyhood, manhood, masculine, macho, horsemen, brethren, chap, chaps, schoolboy, schoolboys, bloke, blokes, patriarch, patriachy, fatherhood, hubby, hubbies, fella, fellas, handyman, fraternal, bro, masculinity, ballerino, pappy, papi, pappies, dada, bf, bfs, knights, knight, menfolk, brotherly, manly, pimp, pimps, homeboy, homeboys, grandnephew, grand-nephew, grand-nephew, grand-nephews, john doe, nobleman, noblemen, dream boy, himself, gramps B.1.2 Female Possessive Words The succeeding word list consists of 235 gender specific words that entail female possessive nouns as follows: goddesses, niece, baroness, mother, duchesses, mom, belle, belles, mummies, policewoman, grandmother, landlady, landladies, nuns, stepdaughter, milkmaids, chairwomen, stewardesses, women, masseuses, daughter-in-law, priestesses, stewardess, empress, daughter, queens, propri- etress, brides, lady, queen, matron, waitresses, mummy, empresses, madam, witches, sorceress, lass, milkmaid, granddaughter, grand-daughter, congresswomen, moms, manageress, princess, stepmoth- ers, stepdaughters, girlfriend, shepherdess, females, grand-mothers, grandmothers, step-daughter, nieces, priestess, wife, mother, usherette, postwoman, hind, wives, murderess, hostess, girl, waitress, spinster, shepherdess, businesswomen, duchess, madams, mamas, nun, heiress, aunt, princesses, fiancee, mrs, ladies, mother-in-law, actress, actresses, postmistress, headmistress, heroines, bride, businesswoman, baronesses, sows, witch, daughters-in-law, aunts, huntress, lasses, mistress, mis- tresses, sister, hostesses, poetess, masseuse, heroine, goddess, grandma, grandmas, maidservant, heiresses, patroness, female, governesses, millionairess, congresswoman, dam, widow, granddaugh- ters, 
grand-daughters, headmistresses, girls, she, policewomen, step-mother, stepmother, widows, abbess, mrs., chairwoman, sisters, mama, woman, daughters, girlfriends, she’s, her, maid, countess, giantess, poetess, jewess, mayoress, peeress, negress, abbess, traitress, benefactress, instructress, 87 conductress, founder, huntress, temptress, enchantress, songstress, murderess, murderesses, pa- tronesses, authoress, czarina, spokeswoman, spokeswomen, ma, councilwoman, council-woman, councilwomen, council-women, mum, lesbian, lesbians, breast, breasts, maiden, maidens, sorority, sororities, saleswoman, dudette, maternal, feminist, feminists, sisterhood, housewife, housewives, stateswoman, stateswomen, countrywoman, countrywomen, chick, chicks, mommy, strongwoman, strongwomen, babe, babes, diva, divas, feminine, feminism, gal, gals, sistren, schoolgirl, schoolgirls, matriarch, matriarchy, motherhood, wifey, sis, femininity, ballerina, ballerinas, granny, grannies, mami, momma, maam, gf, gfs, damsel, damsels, vixen, vixens, nan, nanny, nannies, auntie, women- folk, sisterly, motherly, homegirl, homegirls, grand-niece, grand-nieces, grandniece, grandnieces, jane doe, noblewoman, noblewomen, dream girl, madame, herself, hers B.2 Appendix B As previously mentioned, we provide one of the largest gender-specific and gender-neutral words containing a total of 357 masculine, feminine and neutral career-related and family-related words, we will now we will now detail the 2 categories of family-related and career-related words. B.2.1 Career Words The succeeding word list consists of 162 gender specific and gender neutral career-related words as follows: policewoman, milkmaids, chairwomen, stewardesses, masseuses, priestesses, stewardess, propri- etress, waitresses, congresswomen,moms, manageress, shepherdess, priestess, usherette, postwoman, hostess, waitress, spinster, shepherdess, businesswomen, actress, actresses, postmistress, head- mistress, huntress, mistress, mistresses, sister, hostesses, masseuse,maidservant, heiresses, patroness, governesses, congresswoman, headmistresses, policewomen, chairwoman, maid, mayoress, peeress, traitress, benefactress, instructress, conductress, huntress, temptress, enchantress, songstress, spokeswoman, spokeswomen, councilwoman, council-woman, councilwomen, council-women, saleswoman, stateswoman, stateswomen, policeman, policemen, landlord, landlords, chairmen, 88 chairman, steward, priest, king, governor, waiter, steward, proprietor, sorcerer, congressmen, dads, manager, waiter, actor, actors, postmaster, headmaster, businessman, manservant, tutors, congress- man, benefactor, instructor, conductor, founder, founders, hunters, huntresses, tempt, enchanter, enchanters, spokesman, spokesmen, councilman, council-man, councilmen, council-men, salesman, handyman, knights, knight, academic, accountant, administrator, advisor, appraiser, architect, baker, bartender, business, career, carpenter, chemist, clerk, company, corporation, counselor, educator, electrician, engineer, examiner, executive, hairdresser, hygienist, industry, inspector, instructor, investigator, janitor, lawyer, librarian, machinist, management, mechanic, nurse, nutritionist, oc- cupation, officer, paralegal, paramedic, pathologist, pharmacist, physician, plumber, practitioner, programmer, psychologist, receptionist, salary, salesperson, scientist, specialist, supervisor, surgeon, technician, therapist, veterinarian, worker B.2.2 Family Words The succeeding word list consists of 195 gender specific and 
gender neutral family-related words as follows: niece, mother, mom, mummies, grandmother, nuns, stepdaughter, women, daughter-in-law, daughter, queens, brides, mummy, empresses, madam, granddaughter, grand-daughter, moms, stepmothers, stepdaughters, girlfriend, grand-mothers, grandmothers, step-daughter, nieces, wife, mothers, wives, girl, madams, mamas, aunt, fiancee, mrs, mother-in-law, bride, daughters-in-law, aunts, heir, heiress, sister, grandma, grandmas, dam, widow, granddaughters, grand-daughters, girls, she, step-mother, stepmother, mrs., sisters, mama, woman, daughters, girlfriends, ma, mum, mommy, gal, gals, sistren, matriarch, matriarchy, motherhood, wifey, sis, granny, grannies, mami, momma, ma’am, gf, gfs, damsel, damsels, vixen, vixens, nanny, nannies, auntie, womenfolk, sisterly, motherly, homegirl, homegirls, grand-niece, grand-nieces, grandniece, grandnieces, madame, him, father, fathers, dad, beau, beaus, daddies, grandfather, step-son, step-sons, men, son-in-law, daddy, son, groom, grooms, sir, grandson, grand-son, dads, prince, stepfathers, boyfriend, grandfathers, grand-fathers, husband, husbands, boy, bachelor, bachelors, sirs, papas, uncle, princes, fiance, mr, father-in-law, sons-in-law, fiances, uncles, brother, grandpa, grandpas, widower, grandsons, grand-sons, boys, step-father, bridegroom, bridegrooms, stepfather, widowers, mr., brothers, man, sons, boyfriends, he’s, his, stepson, stepsons, guy, fraternity, fraternities, salesman, dude, dudes, paternal, brotherhood, papa, boyhood, manhood, masculine, brethren, chap, chaps, patriarch, patriarchy, fatherhood, hubby, hubbies, fella, fellas, fraternal, bro, pappy, papi, pappies, dada, bf, bfs, brotherly, homeboy, homeboys, grandnephew, grand-nephew, grand-nephew, grand-nephews, gramps, family, infancy, infant, kin, orphan, twin

APPENDIX C

DETECTING HARMFUL ONLINE CONVERSATIONAL CONTENT TOWARDS LGBTQIA2S+ INDIVIDUALS

C.1 Annotation Guidelines

First, you will be given an extensive list of acronyms and terms from OutRight (Link: https://outrightinternational.org/content/acronyms-explained), an LGBTQIA2S+ human rights organization. After, you will be given a comment, where your task is to indicate whether the comment is toxic or non-toxic. If a comment is deemed toxic, then you will be provided with 5 additional labels (severe toxicity, obscene, threat, insult and identity attack) to correctly identify and determine if the comment qualifies to be classified under one or more of the 5 additional labels.

Human Annotator Protocol
1. Are you a member of the LGBTQIA2S+ community?
2. If you responded “no” above, are you an LGBTQIA2S+ activist or ally?
3. If you responded “no” above, please stop here.
4. If you responded “yes” to any of the above questions, given the extensive acronym list, what is your identity, sexuality, and relationship? (Optional. This information is collected, but not saved, only for demographic purposes.)
5. Are you willing to annotate several Reddit comments that contain stereotypes, profanity, vulgarity and other harmful language geared towards LGBTQIA2S+ individuals?
6. If you responded “yes” above, we must mention that if you believe you may become triggered or disturbed and cannot continue, please stop here.
7. If you responded “no” above, please stop here.

Rating/Sensitivity Protocol
1. As you responded “yes” to a previous question, ... Are you willing to annotate several Reddit comments that contain stereotypes, profanity, vulgarity and other harmful language geared towards LGBTQIA2S+ individuals?
You will be provided with 1000 comments which we have sampled from our binary classification, and 5 additional labels.
2. For each comment, your task is to indicate whether the comment is toxic or non-toxic. Is this comment toxic?
a) If you responded “yes” above, please select one or more of the appropriate labels provided, considering these two classes, “harmful : 1” and “non-harmful : 0”.
b) If you responded “no” above, please discard this comment.
3. Have you ever seen, heard, used or been called any of these Anti-LGBTQIA2S+ terms in a particular comment, for example, on social media or in-person?
4. If you responded “yes” above, do you feel triggered, disturbed or distressed reading this comment? (Optional. This information is collected, but is not saved, only for demographic purposes.)

We would like to remind you that the objective of this study is not to cause more harm, but to create a safe and inclusive place that welcomes, supports, and values all LGBTQIA2S+ individuals both online and offline. However, due to the overall purpose of this study, we focus on online inclusivity.

C.2 Data Breakdown

In this section, we display a breakdown of the data, as the toxicity label is not an across-the-board label, but there exists a large amount of overlap between labels.

In total, there are 7459 toxicity comments (75.12% of all data).
- 185 or 2.48% were also severe toxicity.
- 1590 or 21.32% were also obscene.
- 28 or 0.38% were also threat.
- 2244 or 30.08% were also insult.
- 4494 or 60.25% were also identity attack.

In total, there are 185 severe toxicity comments (1.86% of all data).
- 185 or 100.00% were also toxicity.
- 185 or 100.00% were also obscene.
- 13 or 7.03% were also threat.
- 185 or 100.00% were also insult.
- 184 or 99.46% were also identity attack.

In total, there are 1590 obscene comments (16.01% of all data).
- 1590 or 100.00% were also toxicity.
- 185 or 11.64% were also severe toxicity.
- 23 or 1.45% were also threat.
- 1512 or 95.09% were also insult.
- 1443 or 90.75% were also identity attack.

In total, there are 28 threat comments (0.28% of all data).
- 28 or 100.00% were also toxicity.
- 13 or 46.43% were also severe toxicity.
- 23 or 82.14% were also obscene.
- 25 or 89.29% were also insult.
- 27 or 96.43% were also identity attack.

In total, there are 2244 insult comments (22.60% of all data).
- 2244 or 100.00% were also toxicity.
- 185 or 8.24% were also severe toxicity.
- 1512 or 67.38% were also obscene.
- 25 or 1.11% were also threat.
- 2141 or 95.41% were also identity attack.

In total, there are 4494 identity attack comments (45.26% of all data).
- 4494 or 100.00% were also toxicity.
- 184 or 4.09% were also severe toxicity.
- 1443 or 32.11% were also obscene.
- 27 or 0.60% were also threat.
- 2141 or 47.64% were also insult.

C.3 Feature Distribution Plots

In this section, we display the feature distribution, i.e., a visualization of the variation in the data distribution of each label. These distribution plots represent the overall distribution of the continuous data variables.

Figure C.1: Toxicity feature distribution.
Figure C.2: Severe Toxicity feature distribution.
Figure C.3: Obscene feature distribution.
Figure C.4: Threat feature distribution.
Figure C.5: Insult feature distribution.
Figure C.6: Identity attack feature distribution.

C.4 Word Contribution

Disclaimer: Due to the overall purpose of the study, several terms in the figures may be offensive or disturbing (e.g. profane, vulgar, or homophobic slurs). These terms are not filtered as they are representative of essential aspects in the dataset. In this section, we demonstrate which words contribute to a “harmful” or “non-harmful” comment. In Figures C.7 – C.12, we display the top 30 most frequent words per label.

Figure C.7: Top 30 most frequent words contributing to the Toxicity label.
Figure C.8: Top 30 most frequent words contributing to the Severe Toxicity label.
Figure C.9: Top 30 most frequent words contributing to the Obscene label.
Figure C.10: Top 30 most frequent words contributing to the Threat label.
Figure C.11: Top 30 most frequent words contributing to the Insult label.
Figure C.12: Top 30 most frequent words contributing to the Identity Attack label.

APPENDIX D

A MULTI-LAYERED LANGUAGE ANALYSIS: A CASE STUDY OF AFRICAN-AMERICAN ENGLISH

D.1 Dataset Details

Our collected dataset is demographically aligned with AAE in correspondence with the dialectal tweet corpus of [14]. The TwitterAAE corpus is publicly available and can be downloaded from: http://slanglab.cs.umass.edu/TwitterAAE/. [14] uses a mixed-membership demographic language model which calculates demographic dialect proportions for a text accompanied by a race attribute—African American, Hispanic, Other, and White, in that order. The race attribute is annotated by a jointly inferred probabilistic topic model based on the geolocation information of each user and tweet. Given that geolocation information (residence) is highly associated with the race of a user, the model can make accurate predictions. However, there are a small number of messages whose posterior probabilities are NaN, as these messages have no in-vocabulary words under the model.

D.2 Annotator Annotation Guidelines

You will be given demographically-aligned African American tweets, which we refer to as sequences. As a dominant AAE speaker who identifies as bi-dialectal, your task is to correctly identify the context of each word in a given sequence in hopes of addressing the issues of lexical, semantic and syntactic ambiguity.
1. Are you a dominant AAE speaker?
2. If you responded “yes” above, are you bi-dialectal?
3. If you responded “yes”, given a sequence, have you ever said, seen or used any of these words in the particular sequence?
4. Given a sequence, what are the SAE equivalents to the identified non-SAE terms?
5. For morphological and phonological (dialectal) purposes, are these particular words spelled the way you would say or use them?
6. If you responded “no” above, can you provide a different spelling along with its SAE equivalent?

D.2.1 Annotation Protocol
1. What is the context of each word given the particular sequence?
2. Given NLTK’s Penn Treebank Tagset, what is the most appropriate POS tag for each word in the given sequence?

D.2.2 Human Evaluation of POS Tags Protocol
1. Given the tagged sentence, are there any misclassified tags?
2. If you responded “yes” above, can you provide a different POS tag, and state why it is different?

D.3 Variable Rules Examples

In this section, we present a few examples of simple, deterministic phonological and morphological language features, or current variable rules, which highlight several regional varieties of AAE that typically attain misclassified POS tags. Please note that a more exhaustive list of these rules is still being constructed, as this work is still ongoing. Below are a few variable cases (MAE → AAE), some of which may have been previously shown in Table 5.2:
1. Consonant (‘t’) deletion (Adverb case): e.g. “just” → “jus”; “must” → “mus”
2. Contractive negative auxiliary verbs replacement: “doesn’t” → “don’t”
3. Contractive (’re) loss: e.g. “you’re” → “you”; “we’re” → “we”
4. Copula deletion: Deletion of the verb “be” and its variants, namely “is” and “are”, e.g. “He is on his way” → “He on his way”; “You are right” → “You right”
5. Homophonic word replacement (Pronoun case): e.g. “you’re” → “your”
6. Indefinite pronoun replacement: e.g. “anyone” → “anybody”
7. Interdental fricative loss (Coordinating Conjunction case): e.g. “this” → “dis”; “that” → “dat”; “the” → “da”
8. Phrase reduction (present/future tense) ⇒ word (Adverb case): e.g. “what’s up” → “wassup”; “fixing to” → “finna”
9. Present tense possession replacement: e.g. “John has two apples” → “John got two apples”; “The neighbors have a bigger pool” → “The neighbors got a bigger pool”
10. Remote past “been” + completive (‘done’): “I’ve already done that” → “I been done that”
11. Remote past “been” + completive (‘did’): “She already did that” → “She been did that”
12. Remote past “been” + Present tense possession replacement: “I already have food” → “I been had food”; “You already have those shoes” → “You been got those shoes”
13. Term-fragment deletion: e.g. “brother” → “bro”; “sister” → “sis”; “your” → “ur”; “suppose” → “pose”; “more” → “mo”
14. Term-fragment replacement: “something” → “sumn”; “through” → “thru”; “for” → “fa”; “nothing” → “nun”

APPENDIX E

DETECTING AND MITIGATING INHERENT LINGUISTIC BIAS IN LARGE LANGUAGE MODELS

E.1 Implementation Details

E.1.1 Details of the Base Model

BERT – Bidirectional Encoder Representations from Transformers (BERT) [41] is a Transformer-based ML technique for NLP that achieves state-of-the-art results in a wide variety of NLP tasks. BERT is trained on a huge Books Corpus + Wikipedia dataset, i.e., raw unlabeled English text consisting of 3.3 billion words. This model exploits an attention mechanism to learn contextual relationships between words and optimizes two objectives: (1) Masked Language Modeling (MLM) and (2) Next Sentence Prediction (NSP), and has a vocabulary size of 30,522.

Notation. Given a sequence of sub-word tokens, for example, a sentence, X = (𝑥1, 𝑥2, . . . , 𝑥𝑛), BERT trains an encoder which generates contextualized vector representations for each word-token: Encoder(𝑥1, 𝑥2, . . . , 𝑥𝑛) = e1, e2, . . . , e𝑛.

Masked Language Model. Also known as a cloze test, MLM is the task of predicting missing tokens in a sequence after they are replaced with a [MASK] token. Specifically, the goal is to predict a subset of tokens 𝑌 ⊆ 𝑋 that have been sampled and substituted with different tokens. Hence, the task is to predict the original tokens in 𝑌 from the altered input. Note that BERT selects each token in 𝑌 independently by randomly selecting a subset.

Next Sentence Prediction. The task of NSP is to jointly utilize two sequences (𝑋𝐴, 𝑋𝐵) in a bi-sequence sampling procedure and predict whether 𝑋𝐵 is a direct (undeviating) continuation of 𝑋𝐴. BERT first reads 𝑋𝐴, and then reads 𝑋𝐵 in one of two ways: (1) reading 𝑋𝐵 directly after 𝑋𝐴 has ended; or (2) randomly sampling 𝑋𝐵 from the corpus. To form 𝑋𝐴, 𝑋𝐵 as an input to BERT, a [SEP] token is added to separate the two sequences, and a special [CLS] token is added, where the target of [CLS] is to determine whether 𝑋𝐵 indeed follows 𝑋𝐴 in the corpus.
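To make the MLM objective concrete, the minimal sketch below (an illustrative example, not the experimental code used in this dissertation) shows how a pretrained BERT model fills in a [MASK] token with the Hugging Face transformers library; the input sentence is a hypothetical example.

# Minimal illustration of BERT's masked language modeling (MLM) objective.
# Assumes the Hugging Face `transformers` library and PyTorch are installed.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# A hypothetical input sentence with one masked token.
text = "He is on his [MASK] to the store."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Locate the [MASK] position and list the five most probable replacement tokens.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))  # plausible fillers such as "way"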
E.1.2 Details of Experimental Settings

In summary, BERT optimizes its two objectives uniformly, and thus it serves as an appropriate model for our task of understanding the inferential relationships between sentence pairs by examining the differences in language styles from different demographic groups, e.g., African Americans. We now give details of each pretrained BERT model below:
1. BERT-base-uncased - Trained on raw lower-cased English text; consists of 12 layers, 768 hidden units, 12 attention heads, and 110M parameters.
2. BERT-large-cased - Trained on raw cased English text; consists of 24 layers, 1024 hidden units, 16 attention heads, and 335M parameters.

E.2 Translative Morpho-syntax Protocol

Here we present a set of 20 phonetic and morphological text rules that are used to code-switch from SAE to AAE while maintaining contextual accuracy, i.e., the original structure, intent, semantic equivalence, and quality of a text. Please note that these are only a few examples of the most commonly used morphological linguistic AAE features (which we adapt from the AAE literature). Our deterministic translative morpho-syntax protocol (TMsP) and its cases are as follows:
1. Consonant (‘t’) deletion (Special case): e.g. “just” → “jus”; “must” → “mus”
2. Contractive (’all) gain: “You all” → “Y’all”
3. Contractive negative auxiliary verbs replacement: “doesn’t” → “don’t”
4. Contractive (’re) loss: e.g. “you’re” → “you”; “we’re” → “we”; “they’re” → “they”
5. Contractive word replacement: e.g. “isn’t” → “ain’t”; “wasn’t” → “ain’t”
6. Copula deletion: Deletion of the verb “be” and its variants, namely “is” and “are”, e.g. “He is on his way” → “He on his way”; “You are right” → “You right”
7. Gerund consonant (‘g’) deletion and retainment:
• Consonant (‘g’) deletion: e.g. “coming” → “comin”; “going” → “goin”
• Consonant (‘g’) retainment (Exception case): e.g. “–inging”
8. Homophonic word replacement: e.g. “whine” → “wine”; “you’re” → “your”
9. Indefinite article replacement: e.g. “an” → “a”
10. Indefinite pronoun replacement: e.g. “anyone” → “anybody”; “everyone” → “everybody”
11. Interdental fricative loss: e.g. “this” → “dis”; “that” → “dat”; “than” → “dan”; “their” → “they (dey)”; “the” → “da”
12. Negative concord replacement: e.g. “Don’t say anything” → “Don’t say nothing”
13. Phrase reduction (present/future tense) ⇒ word: e.g. “going to” → “gonna”; “want to” → “wanna”; “trying to” → “tryna”; “what’s up” → “wassup”; “fixing to” → “finna”
14. Possessive (’s) removal: e.g. “He’s mad at me” → “He mad at me”
15. Present tense possession replacement: e.g. “John has two apples” → “John got two apples”; “The neighbors have a bigger pool” → “The neighbors got a bigger pool”
16. Remote past “been” + completive (‘done’): “I’ve already done that” → “I been done that”
17. Remote past “been” + completive (‘did’): “She already did that” → “She been did that”
18. Remote past “been” + Present tense possession replacement: “I already have food” → “I been had food”; “You already have those shoes” → “You been got those shoes”
19. Term-fragment deletion: e.g. “brother” → “bro”; “sister” → “sis”; “your” → “ur”; “suppose” → “pose”; “more” → “mo”
20. Term-fragment replacement: “something” → “sumn”; “through” → “thru”; “for” → “fa”; “nothing” → “nun”
(An illustrative sketch showing how a few of these rules can be applied programmatically is given at the end of this appendix.)

E.3 Annotation Guidelines

You will be given a phrase that is written in Standard American English (SAE); your task is to correctly identify whether the translative vocabulary rules in Appendix E.2 are accurate in order to translate SAE text to AAE text.
Furthermore, while reviewing the rules, be sure to confirm that these rules and/or morpho-syntax word cases in the sampled premise-hypothesis sentence pairs maintain their contextual accuracy, i.e., original structure, intent, semantic equivalence, and quality.

SAE to AAE Protocol
1. Are you a dominant AAE speaker?
2. If you responded “yes” above, are you bi-dialectal?
3. If you responded “yes” above, are you capable of code-switching by alternating between SAE and AAE frequently on a daily basis in a single conversation or situation?
4. Given the TMsP above in Appendix E.2, are these the main grammatical, structural and syntactic rules of word case usage of AAE linguistic features?
5. If you responded “no” above, can you clarify which rule is insufficient? In addition, if possible, can you provide a grammatical, structural or syntactic rule that is not detailed in Appendix E.2?

E.4 Contextual Accuracy Protocol

Given a table of SAE–AAE sentence pair examples, determine whether or not their contextual accuracy is maintained.
1. As you responded “yes” to a previous question, ... are you capable of code-switching by alternating between SAE and AAE frequently on a daily basis in a single conversation or situation? We will now provide 20 lower-cased test sentences in Table E.1.

Table E.1: SAE examples and their AAE equivalents (after using CODESWITCH), shown here as SAE → AAE pairs:
i will go back to the house → imma go back ta da house
i don’t want to go to bed → ion wanna go ta bed
he isn’t my friend, but he’s a king → he ain’t my friend, but he a king
she is being weird to me → she been weird ta me
you all are annoying → yall annoyin
he isn’t coming anymore → he ain’t comin no mo
a woman is trying to walk → a woman tryna walk
this bag and that shoe are mine → dis bag n dat shoe mine
their kids are laughing → they kids laughin
john and kates have two dogs → john n kates hav two dogs
are you going through something → u goin thru sumn
what are you doing → wat r u doin
what’s the temperature → wus da temperature
they have a better car than us → dey hav a betta car dan us
so you’re going to the party → so your gonna go ta da party
they are singing but they can’t sing → dey singing but dey can’t sing
you could of have it all → u coulda hav it all
he would’ve had it if he was here → he woulda had it if he was here
we should have been first in line → we shoulda been first in line
he should of had the last bite → he shoulda had da last bite

2. Have you ever seen any of these words in a particular sentence in Table E.1, for example, on social media such as Twitter?
3. If you responded “yes” above, for each SAE sentence, does each plausible AAE sentence resemble adequate AAE morphological language features from a dominant AAE speaker after applying CODESWITCH?
4. If you responded “yes” above, do these pairs maintain their contextual accuracy, i.e., original structure, intent, semantic equivalence and quality?
5. For dialectal (morphological and phonological) purposes, are these particular words spelled the way you would say or use them? For example, when texting or posting on social media?
6. If you responded “no” above, can you provide a different spelling along with its SAE equivalent?
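To complement Table E.1, the following minimal sketch (an illustrative, hypothetical example, not the CODESWITCH implementation developed in this dissertation) applies a handful of the TMsP cases from Appendix E.2 to a lower-cased SAE sentence using simple regular expressions; a faithful implementation would cover all 20 cases and handle the syntactic rules (e.g., copula deletion) with proper parsing rather than string matching.

# Illustrative, rule-based SAE-to-AAE sketch covering a few TMsP cases only.
# NOT the dissertation's CODESWITCH implementation; hypothetical example code.
import re

# Ordered (pattern, replacement) pairs; multi-word phrases are applied first.
RULES = [
    (r"\bgoing to\b", "gonna"),    # Case 13: phrase reduction
    (r"\bwant to\b", "wanna"),
    (r"\btrying to\b", "tryna"),
    (r"\bfixing to\b", "finna"),
    (r"\bisn't\b", "ain't"),       # Case 5: contractive word replacement
    (r"\bwasn't\b", "ain't"),
    (r"\bthis\b", "dis"),          # Case 11: interdental fricative loss
    (r"\bthat\b", "dat"),
    (r"\bthe\b", "da"),
    (r"\bsomething\b", "sumn"),    # Case 20: term-fragment replacement
    (r"\bthrough\b", "thru"),
    # Case 7 (approximate): drop the final 'g' of words ending in -ing, guarding
    # the "-inging" exception; note this crude pattern also hits non-gerunds.
    (r"(?<!ing)ing\b", "in"),
]

def codeswitch_sketch(sae_text: str) -> str:
    """Apply the sample rule list to a lower-cased SAE sentence."""
    text = sae_text.lower()
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(codeswitch_sketch("I am going to walk through the park"))
# -> "i am gonna walk thru da park"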