HUMAN IN THE LOOP: THE ROLE OF INDIVIDUAL AND INSTITUTIONAL BEHAVIOR ON PREDICTIVE ALGORITHMS

By

William Isaac

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Political Science – Doctor of Philosophy

2018

ABSTRACT

HUMAN IN THE LOOP: THE ROLE OF INDIVIDUAL AND INSTITUTIONAL BEHAVIOR ON PREDICTIVE ALGORITHMS

By

William Isaac

Over the past decade, algorithmic decision systems (ADS) – applications of statistical or computational techniques designed to assist human decision-making processes – have moved from an obscure domain of statistics and computer science into the mainstream. The rapid decline in the cost of computer processing and the ubiquity of digital data storage have driven a dramatic rise in the adoption of ADS using applied machine learning algorithms, transforming various sectors of society from digital advertising and political campaigns to risk modeling for the banking sector, healthcare, and beyond. Many agencies and practitioners in the public sector turn to ADS as a means to stretch limited public resources amidst growing public demands for equity and accountability. However, recent research from multiple fields has found that social and institutional biases are often reflected in the input data used to generate predictions. The potential for discrimination perpetuated via input data is a particular concern in fields such as criminal justice, where historical biases against minorities have the potential to exacerbate existing racial inequalities. In a series of three essays, this dissertation seeks to outline how institutional norms often shape algorithmic predictions, examine how ADS alter the incentive structures of the agents who use these tools, and assess their ultimate impact on human decision-making.

Copyright by
WILLIAM ISAAC
2018

To Bobbi and Charlotte, who gave me the strength to finish the race.

ACKNOWLEDGEMENTS

While I had plenty of warning beforehand, it is shocking how many people support you while writing a dissertation. This experience has shown me that I have been blessed with a truly amazing support system and some of the most thoughtful and caring people in the world. Starting with my dissertation committee (Eric Juenke, Matt Grossmann, Josh Sapotichne, and Cory Smidt), who have mentored me and given me the space to develop into the scholar I always wanted to be. Even during the low moments of my graduate school tenure, they always believed in my potential and pushed me into positions to succeed as a person and an academic. I truly wish every doctoral student could have a committee as supportive as mine.

I also want to express my deepest thanks to my colleagues at the Human Rights Data Analysis Group (HRDAG), including Patrick Ball, Kristian Lum, and Megan Price. The opportunity to work at HRDAG changed my life. They taught me that it is possible to be rigorous in your research while still speaking truth to power. Regardless of where my adventures in life take me next, I will always be proudest of the work I did during my tenure there. I sincerely hope this institution can remain a beacon for data geeks who want to pursue statistics in the service of justice and progress.

I would like to thank my family. The countless nights spent in front of a computer writing this document or traveling around the world to communicate its findings would not have been possible without my entire family (Reta Adams, Jesse Adams, Joyce White, Jesse Adams Jr., Elvis Isaac, Jennie Isaac and Geraldine Isaac) and countless others filling in the gaps and chipping in.
Though they may not always understand what my research is about, they have always been extremely proud of me using my platform for change.

Last but certainly not least, I would like to thank my wife Bobbi. Before the success and achievements, you took a gamble on a kid cleaning cars inside of a dusty auto garage in Tuscaloosa, Alabama. Not many people would have believed my outlandish dream of becoming an academic or getting a PhD, but you stood by me the entire way and never wavered. For that I am eternally grateful. And Charlotte, you were perhaps the most important inspiration during this process. I work every day with the goal of creating a better world for you. I truly hope you can one day live in a world where people can judge you by the content of your character and not the color of your skin. While the current times may give us cause to question this possibility, I firmly believe we will achieve this promise one day. As Dr. Martin Luther King once said, "the arc of the moral universe is long, but it bends toward justice."

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1  HOPE, HYPE, AND FEAR: THE PROMISE AND POTENTIAL PITFALLS OF THE BIG DATA ERA IN CRIMINAL JUSTICE
  1.1 Background
  1.2 Case Study: Predictive Policing
  1.3 Discussion
CHAPTER 2  THE NUMBERS GAME: GOODHART'S LAW AND THE USE OF THE INDEX CRIMES AS PERFORMANCE METRICS
  2.1 Performance Metrics and Goodhart Effects
    2.1.1 Defining Goodhart Effects
      2.1.1.1 Goodhart's Law
      2.1.1.2 Campbell's Law
  2.2 Goodhart Effects and Policing
    2.2.1 Methods & Data
      2.2.1.1 Data
    2.2.2 Results
      2.2.2.1 Feedback Effects
  2.3 Conclusion
CHAPTER 3  BACK TO THE FUTURE: PREDICTIVE POLICING AND RESIDUAL DISCRIMINATION FROM RACE-NEUTRAL CLASSIFIERS
  3.1 This is Your Brain on Data
    3.1.1 Principles of Modern Technology and Human Behavior
    3.1.2 Technology-Behavior Interaction in Policing
  3.2 Case Study: LAPD Deployment of Predpol
  3.3 Conclusion
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Segmented Regression Table of MAD Values on Predictive Policing Deployment

LIST OF FIGURES

Figure 1.1: Estimated number of drug users, based on 2011 National Survey on Drug Use and Health
Figure 1.2: Number of drug arrests made by Oakland police department, 2010

Figure 1.3: Number of days with targeted policing in areas flagged by PredPol analysis of Oakland police data

Figure 1.4: Number of days with targeted policing in areas flagged by PredPol analysis of Oakland police data

Figure 1.5: Predicted odds of crime in locations targeted by PredPol algorithm, relative to non-targeted locations

Figure 2.1: Cumulative Digit Distribution for Reported Property Crimes, Foothill District

Figure 2.2: Monthly Digit Distribution for Reported Property Crimes, Foothill District

Figure 2.3: Cumulative Digit Distribution for Reported Property Crimes, Southwest District

Figure 2.4: Annual Digit Distribution for Reported Property Crimes, Southwest District

Figure 2.5: Cumulative Digit Distribution for Reported Property Crimes, North Hollywood District

Figure 2.6: Monthly Digit Distribution for Reported Property Crimes, North Hollywood District

Figure 2.7: Monthly Time Series of MAD Scores, North Hollywood District

Figure 3.1: Percentage of Arrests by Crime Type and Data Source, Foothill District 2010-2017

Figure 3.2: Percentage of Arrests by Crime Type and Data Source, Southwest District 2010-2017

Figure 3.3: Percentage of Arrests by Crime Type and Data Source, North Hollywood District 2010-2017

Figure 3.4: Demographic Composition of LAPD Districts

Figure 3.5: Arrest Rates in Control and Treatment Boxes for Predpol Deployed LAPD Divisions

Figure 3.6: Arrest Rates in Non-Predicted Boxes in Predpol Deployed LAPD Divisions

Figure 3.7: Pre-Post Monthly Arrest Rates by Race and Quadrant, Foothill District

Figure 3.8: Pre-Post Monthly Arrest Rates by Race and Quadrant, Southwest District

Figure 3.9: Pre-Post Monthly Arrest Rates by Race and Quadrant, North Hollywood District

CHAPTER 1

HOPE, HYPE, AND FEAR: THE PROMISE AND POTENTIAL PITFALLS OF THE BIG DATA ERA IN CRIMINAL JUSTICE

Crime and disorder are not natural phenomena. These events have to be observed, noticed, acted upon, collected, categorized, and recorded — while other events aren't.
— Elizabeth Joh, Feeding the Machine: Policing, Crime Data, & Algorithms

Over the past decade, algorithmic decision systems (ADS) – applications of statistical or computational techniques designed to assist human decision-making processes – have moved from an obscure domain of statistics and computer science into the mainstream (Munoz et al. 2016, O'Neil 2016). The rapid decline in the cost of computer processing and the ubiquity of digital data storage have driven a dramatic rise in the adoption of ADS using applied machine learning algorithms, transforming various sectors of society from digital advertising to political campaigns (Hersh 2015, Nickerson & Rogers 2014), risk modeling for the banking sector (Wei et al. 2016, pg.
234), healthcare (Powles & Hodson 2017, pg. 351), and beyond. Many agencies and practitioners in the public sector turn to ADS as a means to stretch limited public resources amidst growing public demands for equity and accountability. Advocates of these "intelligence-led" or "evidence-based" policy approaches assume these algorithmic tools will allow government agencies to use objective data to overcome historical inequalities and better serve underrepresented groups (Podesta et al. 2014, Chowdhry et al. 2016, Miller 2018).

However, the assumption of objective data is flawed. Every human behavior or social phenomenon that machine learning algorithms attempt to predict arises from a data-generating process (DGP) comprising trillions of complex interactions among the roughly 7 billion people that inhabit our planet. The DGP is typically unobserved by the analyst, but we make assumptions about it through our choice of statistical models and the inferences we derive from our analysis. Machine learning algorithms are often theorized and developed using simulated data, or data whose outcome variables leave little ambiguity in interpretation or method of collection. However, if our assumptions about the DGP are incorrect, the predictions and conclusions we generate will be highly inaccurate (Cederman & Weidmann 2017, pg. 475). Furthermore, because the true DGP is unobserved, it is nearly impossible to determine whether a proposed measure captures the phenomenon or outcome of interest for decision-makers.

1.1 Background

Defining what counts as objective data is a particularly acute problem in criminal justice. Dating back to the turn of the 20th century, statisticians and criminologists have raised concerns over the operationalization and measurement of crime (Morrison 1897). At its core, crime is a social phenomenon that has had multiple definitions and interpretations across time. Since the early 1930s, the United States Department of Justice uniform crime reporting (UCR) data – considered the official account of national crime trends – have been based on crimes known to the police, either reported by the public or witnessed by members of law enforcement (Beattie 1941, Mosher et al. 2010, Wormeli 2018). While this operationalization of crime is reliable in the statistical sense (i.e., it will consistently measure the same concept over time), multiple scholars have pointed out that this approach systematically undercounts crimes committed by some groups. These "hidden" or unreported instances are often referred to as the "dark figure of crime" (Mosher et al. 2010, pg. 45).

One of the most common reasons for the emergence of "dark figures" has been the policies and practices of individual police departments. Robison (1936), among the earliest scholars to link systemic bias in the criminal justice system to measurement, found that arrest data for juvenile delinquency in New York state (defined as truancy, theft, or malicious mischief) was not a direct function of a person's race or socioeconomic status – as previously theorized – but rather of the differential treatment of these individuals by the criminal justice system. Beattie (1941, pg.
21) also noted that, in addition to demographic factors, police statistics were likely manipulated based on local political conditions, often with a tendency to "report those facts which show a good administrative record on the part of the department." More recently, Levitt (1998) analyzed crime victimization and reporting data for 26 large American cities and found that the likelihood of a crime being reported to the police increases as the size of the city's police force increases. The study found that a 10% increase in the number of sworn officers per capita corresponds to a 1.54% increase in the reporting rate of household larcenies. MacDonald (2002) assessed the likelihood of reporting a crime to law enforcement in the United Kingdom and found that non-White (except Asian), unemployed, and low-income residents were less likely to report crimes. A longitudinal study by Baumer & Lauritsen (2010) of crime reporting in the US National Crime Victimization Survey (NCVS) between 1973 and 2005 had similar findings. While the authors found increasing rates of reporting over time, non-White victims and male victims were much less likely to report crimes to the police. More surprising was that, overall, less than half of nonlethal violent incidents (40%) and property crimes (32%) were reported to the police.

In addition to the underlying demographics and method of measurement, policing strategy and behavior can be significant factors in the measurement and perception of crime in a particular area. Golub et al. (2006) examined the spatial shifts in MPV (possession of marijuana in public view) arrests in New York City after the NYPD shifted its enforcement strategy in the early 1990s as part of Commissioner Bill Bratton's embrace of "broken windows" or quality-of-life policing (Golub et al. 2007, Harcourt 2009, Bornstein 2015). The authors found that as the NYPD shifted the focus of MPV enforcement from transit locations and tourist areas near lower Manhattan to public housing projects in Brooklyn and Queens, both the level and intensity of arrest patterns shifted significantly over time. As a result of the change in strategy, MPV arrests spiked in subsequent years, increasing from 1,851 in 1994 to 39,212 in 2003, with the highest proportion of arrests occurring in predominately Black and Latino neighborhoods. When comparing police units, the NYPD Housing police significantly increased their activity, going from zero recorded arrests in 1994 to 3,769 MPV arrests in 2003, accounting for 10% of the total that year. Conversely, arrests by transit police declined from 499 in 1995 to only 57 in 2003. Aggressive police policies can also have a chilling effect on community behavior. For example, Desmond et al. (2016) used an interrupted time series approach on administrative data to show that 911 calls from minority neighborhoods suffered a net loss of 22,200 calls after the beating of Frank Jude by police officers in Milwaukee, Wisconsin.

Simply put, crimes recorded by police are not a complete census of all criminal offenses, nor do they constitute a representative random sample. Instead, police records are a compilation of complex interactions between criminality, policing strategy, and community-police relations. Moreover, while questions about how to measure and operationalize crime are often seen as debates for academic criminologists, they have become more relevant as the use of criminal justice data has moved into the era of machine learning.
The next sections will discuss in detail how machine learning algorithms are often unaware of, and in many cases unable to adjust for, institutional biases and norms embedded within policing data. As a result, the presence of bias in the initial (training) dataset leads to predictions that are subject to the same biases that already exist within the dataset. Further, these biased forecasts can become amplified if practitioners begin to concentrate resources on an increasingly small subset of these forecasted targets (a toy simulation following Equation 1.1 illustrates this dynamic). Thus, a failure to understand the limitations of the data used in these predictive tools – and to create more transparent and accountable mechanisms to mitigate these potential harms – may perpetuate historical discrimination toward underrepresented groups and violate their civil and human rights.

1.2 Case Study: Predictive Policing

One of the most popular and fastest growing classes of ADS in criminal justice is "predictive policing" tools: applications designed to identify likely targets for police intervention and to prevent crime or solve past crimes by making statistical predictions (Perry 2013). In a survey of the nation's 50 largest police forces, Robinson & Koepke (2016) reported that more than 30 departments have either deployed or are actively exploring the deployment of a predictive policing system. Outside the United States, European jurisdictions such as Kent, London, and Berlin are considering the use of predictive policing (or precrime) tools to predict potential violent gang members (Baraniuk 2015). Winston (2018) found that the technology firm Palantir sold its predictive policing technology to the Israeli government for targeting Palestinian dissidents, and a report by Human Rights Watch (2018) uncovered the use of predictive policing by the Chinese government on the Muslim Uyghur population of the Xinjiang region.

While proponents of predictive policing have viewed this trend as a significant step towards transparency and pragmatic, data-driven policymaking, the use of predictive policing and other ADS within police departments has also raised serious concerns among activists and scholars (Ferguson 2014, Joh 2014, Robinson & Koepke 2016, ACLU 2016, Joh 2017) regarding this new intersection between statistical learning and public policy. Civil liberties advocates have argued that the growth of predictive policing means officers in the field are more likely to stop suspects who have yet to commit a crime, under the guise of historical crime patterns that are not representative of all criminal behavior.

Is there any evidence of these potential harms? In their analysis, Lum & Isaac (2016) replicate Predpol's algorithm, initially proposed in Mohler et al. (2011), to generate predictions with publicly available data on drug crimes in the city of Oakland from 2009 to 2011. Specifically, the algorithm at the center of this study is the Epidemic-Type Aftershock Sequence (ETAS) crime forecasting model developed by Predpol Inc., one of the largest vendors of predictive policing systems in the country and one of the few companies to publicly release details of its algorithm in a peer-reviewed journal (Mohler et al. 2015). The foundation of the ETAS model is a spatio-temporal branching or "self-exciting" Poisson process referred to as a Hawkes process, based on the seminal research of Hawkes (1971) into using seismographic activity to predict earthquake aftershocks. More recently, the Hawkes process has been used in a wide array of fields, from criminology (Mohler 2013, Mohler et al.
2015) to finance (Bacry et al. 2015), social media (Du et al. 2015), and counter-terrorism (Tench et al. 2016).

Equation 1.1 outlines Predpol's ETAS model as defined in Mohler et al. (2015). Predpol takes a defined geographic area, such as a city or police district, and divides it into discrete interlocking boxes or bins used to allocate additional police surveillance. The model determines which bins are selected for targeting by generating a conditional intensity rate of crime for each bin $n$ at time $t$, calculating $\lambda_n(t)$ as a function of the background rate $\mu_n$ and the triggering kernel $\theta\omega e^{-\omega(t - t_i^n)}$, summed over the times $t_i^n$ of previously recorded crimes in that bin. The background rate is a nonparametric histogram of the counts of recorded crimes in bin $n$ over time and can be thought of as the fixed level of crime in a given area. The triggering kernel captures the model's "near-repeat" or "contagion" effects in crime data. In particular, the decaying exponential function gives a higher weight to bins with recent spikes in recorded crimes than to bins with higher background rates but declining rates of recently recorded crimes. This component is very similar to the hotspot maps that have become common within police departments across the country. As Mohler et al. (2015) point out, a critical difference between the ETAS model and hotspot maps such as Compstat, which model near-repeat effects, is the introduction of the background rate $\mu_n$.

$$\lambda_n(t) = \mu_n + \sum_{t_i^n < t} \theta \omega \, e^{-\omega(t - t_i^n)} \qquad (1.1)$$
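To make the mechanics of Equation 1.1 concrete, the following Python sketch computes the conditional intensity for each bin and ranks bins for targeting. This is a minimal illustration under stated assumptions rather than Predpol's actual implementation: the grid, the bin names, and the values chosen for $\mu_n$, $\theta$, and $\omega$ are hypothetical, and the deployed system estimates these parameters from historical crime records rather than fixing them by hand.

```python
import numpy as np

def conditional_intensity(t, event_times, mu, theta, omega):
    """Conditional intensity lambda_n(t) from Equation 1.1: the background
    rate mu_n plus an exponentially decaying contribution from every prior
    recorded crime t_i^n < t in the same bin."""
    past = event_times[event_times < t]
    return mu + np.sum(theta * omega * np.exp(-omega * (t - past)))

def rank_bins(t, events_by_bin, mu_by_bin, theta, omega, k=20):
    """Score every bin at forecast time t and return the k highest-intensity
    bins, i.e. the areas the model would flag for additional patrol."""
    scores = {
        b: conditional_intensity(t, np.asarray(times), mu_by_bin[b], theta, omega)
        for b, times in events_by_bin.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example (hypothetical values): two bins share the same background rate,
# but bin_A has a recent cluster of recorded crimes and therefore ranks higher.
events = {"bin_A": [1.0, 2.0, 9.5, 9.8], "bin_B": [1.0, 3.0, 5.0, 7.0]}
background = {"bin_A": 0.1, "bin_B": 0.1}
print(rank_bins(t=10.0, events_by_bin=events, mu_by_bin=background,
                theta=0.5, omega=1.0, k=1))  # -> ['bin_A']
```

In the toy example, the bin with the more recent cluster of recorded events receives the higher intensity score even though both bins have the same background rate, which is precisely the near-repeat weighting the triggering kernel is designed to capture.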
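The feedback concern raised earlier in this chapter (forecasts trained on biased records concentrate patrols, which in turn generate more records in the same places) can also be illustrated with a deliberately simple simulation. The sketch below is a hypothetical toy model rather than an analysis of any real deployment: the two areas, detection rates, and patrol rule are invented solely to show how an initial disparity in recorded crime can compound even when underlying offending is identical.

```python
import numpy as np

# Hypothetical toy model of the feedback effect: two areas with identical
# underlying offense rates, but area 0 begins with more *recorded* crime.
# Each period the patrol is sent to the top-ranked area, officers record far
# more of what happens where they are deployed, and the updated records feed
# the next period's ranking.
true_rate = np.array([50.0, 50.0])   # identical underlying offending
recorded = np.array([60.0, 40.0])    # historical records over-represent area 0

for period in range(10):
    target = np.argmax(recorded)                             # forecast: patrol the top-ranked area
    detection = np.where(np.arange(2) == target, 0.9, 0.1)   # detection is much higher under patrol
    recorded = recorded + true_rate * detection              # new records feed the next forecast

print(recorded / recorded.sum())   # area 0's share of recorded crime keeps growing
```

After ten periods the initially over-represented area accounts for a steadily growing share of all recorded crime, even though the two areas generate offenses at the same underlying rate.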