Predictive Algorithms in the Criminal Justice System: Evaluating the Racial Bias Objection

Rebecca Berman

Courtrooms around the U.S. are increasingly using predictive algorithms (PAs). A PA is an AI tool that assigns a defendant a risk score, an estimate of the likelihood of future offending, based on various data about the defendant, not including race. These scores inform bail, sentencing, and parole decisions with the goals of increasing public safety, increasing fairness, and reducing mass incarceration. Although PAs are intended to introduce greater objectivity to the courtroom by more accurately and fairly predicting who is most likely to commit future crimes, many worry about the racial inequities that these algorithms may perpetuate. Here, I scrutinize and subsequently support the claim that PAs can operate in racially biased ways, which provides a strong ethical objection against their use. Then, I raise and consider the rejoinder that we should still utilize PAs because they are morally preferable to the alternative: leaving judges to their own devices. I conclude that the rejoinder adequately, but not conclusively, succeeds in rebutting the objection. Unfair racial bias in PAs is not sufficient grounds to reject their use outright, for we must evaluate the potential racial inequities perpetuated by utilizing these algorithms relative to the potentially greater racial inequities perpetuated without their use.

The Racial Bias Objection to Predictive Risk Assessment

ProPublica conducted research to support concerns that COMPAS (a leading predictive algorithm used in many courtrooms) is unfairly racially biased. Its research on risk scores for defendants in Florida showed:

a. 44.9% of black defendants who do not end up recidivating are mislabeled as “high risk” (defined as a score of 5 or above), while only 23.5% of white defendants who do not end up recidivating are mislabeled as “high risk.”

b. 47.7% of white defendants who end up recidivating are mislabeled as “low risk,” while only 28% of black defendants who end up recidivating are mislabeled as “low risk” (1).
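To make the size of these disparities concrete, the following minimal sketch computes the ratio between the two groups' error rates. It uses only the four percentages quoted above; the variable names are illustrative and are not taken from ProPublica's analysis.

```python
# Error rates quoted in the text above (in percent).
# False positive: did not recidivate but was labeled "high risk".
# False negative: recidivated but was labeled "low risk".
false_positive_rate = {"black": 44.9, "white": 23.5}
false_negative_rate = {"black": 28.0, "white": 47.7}

# How many times more likely a black non-recidivist is to be labeled "high risk".
fp_ratio = false_positive_rate["black"] / false_positive_rate["white"]

# How many times more likely a white recidivist is to be labeled "low risk".
fn_ratio = false_negative_rate["white"] / false_negative_rate["black"]

print(f"False positive ratio (black / white): {fp_ratio:.2f}")
print(f"False negative ratio (white / black): {fn_ratio:.2f}")
```

On these figures, the false positive rate for black defendants is roughly 1.9 times that for white defendants, and the false negative rate for white defendants is roughly 1.7 times that for black defendants.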

Intuitively, these findings strike us as an unfair racial disparity. COMPAS’s errors operate in different directions for white and black defendants: it disproportionately overestimates the risk of black defendants while disproportionately underestimating the risk of white defendants. In “Measuring Algorithmic Fairness,” Deborah Hellman further unpacks the unfairness of this kind of racialized error rate disparity.

First, different directions of error carry different costs. In the criminal justice system, we generally view false positives, which punish an innocent person or over-punish someone who deserves less punishment, as more costly and morally troublesome than false negatives, which fail to punish or under-punish someone who is guilty. The policies and practices we have constructed in the U.S. system reflect this view. Defendants are innocent until proven guilty, and there is a high burden of proof for conviction. Because of this, the judicial system errs on the side of producing more false negatives than false positives. Given this widely accepted view, we should be especially troubled by black defendants disproportionately receiving errors in the false positive direction (2). Mislabeling a black defendant as “high risk” may very well lead a judge to impose a much longer sentence or set higher bail than is fair or necessary, a cost that black defendants would shoulder disproportionately (in comparison to white defendants) given the error rate disparity produced by COMPAS.

Second, COMPAS’s lack of error rate parity is particularly problematic due to its links to structural biases in the data used by PAs. Mathematically, a calibrated algorithm will yield more false positives in the group with a higher base rate of the outcome being predicted. PAs act upon data that suggest a much higher base rate of black offending than white offending, and this base rate discrepancy can reflect structural injustices:

I. Measurement Error: Black communities are over-policed, so a crime committed by a black person is much more likely to lead to an arrest than a crime committed by a white person. Therefore, the measured difference in offending between black and white offenders is much greater than the real (statistically unknowable) difference in offending between black and white offenders, and PAs unavoidably utilize this racially biased arrest data (3).

II. Compounding Injustice: Due to historical and ongoing systemic racism, black Americans are more likely to live in conditions, such as poverty, certain neighborhoods, and low educational attainment, that correlate with higher predicted criminal behavior. Therefore, if and when PAs utilize criminogenic conditions as data points, relatively more black offenders will score “high risk” as a reflection of past injustices (4).
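The mathematical claim above, that a calibrated algorithm yields more false positives in the group with the higher base rate, can be illustrated with a toy model. The sketch below is not COMPAS and uses no real data: it assumes a hypothetical two-level score in which 60% of defendants labeled “high risk” and 20% of defendants labeled “low risk” go on to reoffend, in both groups alike (that is, the score is calibrated). The base rates of 50% and 30% and the function name `error_rates` are likewise assumptions chosen for illustration.

```python
def error_rates(base_rate, p_high=0.6, p_low=0.2):
    """Error rates of a hypothetical two-level calibrated risk score.

    base_rate: share of the group that reoffends (measured base rate).
    p_high:    share of "high risk" defendants who reoffend (same for all groups).
    p_low:     share of "low risk" defendants who reoffend (same for all groups).
    Returns (share_high, false_positive_rate, false_negative_rate).
    """
    # Calibration fixes the share labeled "high risk":
    # base_rate = share_high * p_high + (1 - share_high) * p_low
    share_high = (base_rate - p_low) / (p_high - p_low)

    # False positive rate: non-reoffenders who were labeled "high risk".
    fpr = share_high * (1 - p_high) / (1 - base_rate)

    # False negative rate: reoffenders who were labeled "low risk".
    fnr = (1 - share_high) * p_low / base_rate

    return share_high, fpr, fnr


# Two hypothetical groups that differ only in their measured base rate.
for name, base_rate in [("higher base rate group", 0.5),
                        ("lower base rate group", 0.3)]:
    share_high, fpr, fnr = error_rates(base_rate)
    print(f"{name}: labeled high risk {share_high:.0%}, "
          f"false positive rate {fpr:.0%}, false negative rate {fnr:.0%}")
```

Although a “high risk” label means the same thing in both hypothetical groups, the group with the higher measured base rate ends up with a much higher false positive rate and a much lower false negative rate. If the measured base rate is itself inflated by over-policing, that inflation passes directly into the error rates.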

To summarize, data reflecting unfair racial disparities are necessarily incorporated into COMPAS’s calculations, so unfair racial disparities will come out of COMPAS’s predictions. For all of these reasons (the high cost of false positives, measurement error, and compounding injustice), lack of error rate parity is a morally relevant attack on the fairness of COMPAS. By being nearly twice as likely to label black defendants who do not end up re-offending as “high risk” as it is to so label white defendants, COMPAS operates in an unfairly racially biased way. Consequently, the objection concludes, we should not use PAs like COMPAS in the criminal justice system.

Rejoinder to the Racial Bias Objection to Predictive Risk Assessment

The argument, however, is not that simple. An important rejoinder is based on the very reason why we find such tools appealing in the first place: humans are imperfect, biased decision-makers. We must consider the alternative to using risk tools in criminal justice settings: sole reliance on a human decision-maker, one who may be just as susceptible to racial bias, if not more so. Because historical and continuing forces in the U.S. have created an association between dark skin and criminality, and because judges are disproportionately white, implicit or even explicit bias is unavoidably ingrained in judges, leading them to perceive black defendants as more dangerous than their white counterparts. This bias inevitably seeps into judges’ highly subjective decisions. Many studies of judicial decision-making show racially disparate outcomes in bail, sentencing, and other key criminal justice decisions (5). For example:

a. Arnold, Dobbie, and Yang (2018) find, “black defendants are 3.6 percentage points more likely to be assigned monetary bail than white defendants and, conditional on being assigned monetary bail, receive bail amounts that are $9,923 greater” (6).

b. According to the Bureau of Justice Statistics, “between 2005 and 2012, black men received roughly 5% to 10% longer prison sentences than white men for similar crimes, after accounting for the facts surrounding the case” (7).

Consequently, the critical and challenging question is not whether PAs are tainted by racial biases, but rather which is the “lesser of two evils” in terms of racial justice: utilizing PAs or leaving judges to their own devices? I will argue the former, especially if we consider the long-term potential for improving our predictive decision-making through PAs.

First, although empirical data on this precise matter is limited, we have reason to believe that utilizing well-constructed PAs can reduce racial inequities in the criminal justice system. Kleinberg et al. (2017) modeled New York City pre-trial hearings and found that “a properly built algorithm can reduce crime and jail populations while simultaneously reducing racial disparities” (8). Even though the ProPublica analysis highlighted disconcerting racial data, it did not compare decision-making using COMPAS to decisions made by judges without such a tool.

Second, evidence-based algorithms present more readily available means for improvement than the subjective assessments of judges. Scholars and journalists can critically examine the metrics used by algorithms and their relative weights, and work to eliminate or reduce the weight of metrics that are found to be especially potent in producing racially skewed and inaccurate predictions. Also, as Hellman suggests, race can be soundly incorporated into PAs to increase their overall accuracy because certain metrics can be distinctly predictive of recidivism in white versus black offenders. For example, “housing stability” might be more predictive of recidivism in white offenders than black offenders (9). If an algorithm’s assessment of this metric were to occur in conjunction with information on race, its overall predictions would improve, reducing the level of unfair error rate disparity (10). Furthermore, PAs’ level of bias is consistent and uniform, while the biases of judges are highly variable and hard to predict or assess. Uniform bias is easier to ameliorate than variable, individual bias, for only one agent of bias has to be tackled rather than an abundance of agents of bias. All in all, there appear to be promising ways to reduce the unfairness of PAs, particularly if we construct these tools with a concern for systemic biases, while there do not currently appear to be ready means to better ensure a judiciary full of systematically less biased judges.

The question here is not “which is more biased: PAs or judges?” but rather “which produces more racially inequitable outcomes: judges utilizing PAs or judges alone?” Even if improved algorithms’ judgments are less biased than those of judges, we must consider how the human judge, who is still the final arbiter of decisions, interacts with the tool. Is a “high risk” score more salient to a judge when given to a black defendant, perhaps leading to continued or even heightened punitive treatment being disproportionately shown towards black offenders? Simultaneously, is a “low risk” score only salient to judges when given to a white defendant, or can it help a judge overcome implicit biases to also show more leniency towards a “low risk” black offender? In other words, does utilizing this tool serve to exacerbate, confirm, or ameliorate the perpetuation of racial inequity in judges’ decisions? Much more empirical data is required to explore these questions and come to more definitive conclusions. However, this uncertainty is no reason to completely abandon PAs at this stage, for PAs hold great promise for net gains in racial equity, and we can and should keep working to overcome their structural flaws.

In conclusion, while COMPAS in its current form operates in a racially biased way, this factor alone is not enough to forgo the use of PAs in the criminal justice system: we must consider the extent of unfair racial disparities perpetuated by tools like COMPAS relative to the extent of unfair racial disparities perpetuated when judges make decisions without the help of a tool like COMPAS. Despite PAs’ flaws, we must not instinctively fall back on the alternative of leaving judges to their own devices, where human cognitive biases reign unchecked. We must embrace the possibility that we can improve human decision-making by using ever-improving tools like properly crafted risk assessment instruments.


1 ProPublica, “Machine Bias.”

2 Hellman, “Measuring Algorithmic Fairness,” 832-836.

3 Ibid., 840-841.

4 Ibid., 840-841.

5 National Institute of Justice, “Relationship between Race, Ethnicity, and Sentencing Outcomes: A Meta-Analysis of Sentencing Research.”

6 Arnold, Dobbie, and Yang, “Racial Bias in Bail Decisions,” 1886.

7 Bureau of Justice Statistics, “Federal Sentencing Disparity: 2005-2012,” 1.

8 Kleinberg et al., “Human Decisions and Machine Predictions,” 241.

9 Corbett-Davies et al., “Algorithmic Decision Making and the Cost of Fairness,” 9.

10 Hellman, “Measuring Algorithmic Fairness,” 865.

Bibliography

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine Bias.” ProPublica. May 23, 2016. as-risk-assessments-in-criminal-sentencing.

Arnold, David, Will Dobbie, and Crystal S. Yang. “Racial Bias in Bail Decisions.” The Quarterly Journal of Economics 133, no. 4 (November 2018): 1885–1932.

Bureau of Justice Statistics. “Federal Sentencing Disparity: 2005-2012.” 248768. October 2015.

Corbett-Davies, Sam, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. “Algorithmic Decision Making and the Cost of Fairness.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797-806. 2017.

Hellman, Deborah. “Measuring Algorithmic Fairness.” Virginia Public Law and Legal Theory Research Paper, no. 2019-39 (July 2019).

Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “Human Decisions and Machine Predictions.” The Quarterly Journal of Economics 133, no. 1 (February 2018): 237–293. https://doi.org/10.1093/qje/qjx032.

National Institute of Justice. “Relationship between Race, Ethnicity, and Sentencing Outcomes: A Meta-Analysis of Sentencing Research.” Ojmarrh Mitchell, Doris L. MacKenzie. 208129. December 2004. https://www.

Acknowledgments

I would like to thank Professor Frick and Masny for teaching the seminar “The Ethics of Emerging Technologies” for which I wrote this paper. Thank you for bringing my attention to this topic and Hellman’s paper and for helping me clarify my argument. I would like to thank my dad for helping me talk through ideas and providing feedback on my first draft of this paper.