In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction

Original Paper, published in the Journal of Quantitative Criminology

Abstract

Objectives

We study interpretable recidivism prediction using machine learning (ML) models and analyze performance in terms of prediction ability, sparsity, and fairness. Unlike previous works, this study trains interpretable models that output probabilities rather than binary predictions, and uses quantitative fairness definitions to assess the models. This study also examines whether models can generalize across geographic locations.

Methods

We generated black-box and interpretable ML models on two different criminal recidivism datasets from Florida and Kentucky. We compared predictive performance and fairness of these models against two methods that are currently used in the justice system to predict pretrial recidivism: the Arnold PSA and COMPAS. We evaluated predictive performance of all models on predicting six different types of crime over two time spans.

Results

Several interpretable ML models can predict recidivism as well as black-box ML models and are more accurate than COMPAS or the Arnold PSA. These models are potentially useful in practice. Similar to the Arnold PSA, some of these interpretable models can be written down as a simple table. Others can be displayed using a set of visualizations. Our geographic analysis indicates that ML models should be trained separately for separate locations and updated over time. We also present a fairness analysis for the interpretable models.

Conclusions

Interpretable ML models can perform just as well as non-interpretable methods and currently-used risk assessment scales, in terms of both prediction accuracy and fairness. ML models might be more accurate when trained separately for distinct locations and kept up-to-date.


Data Availability Statement

The Broward County, FL dataset generated and analyzed during the current study is available from the corresponding author on request. The Kentucky dataset is not publicly available but can be accessed through a special data request to the Kentucky Department of Shared Services, Research and Statistics.

Notes

  1. Kentucky created and implemented its own tool in 2006 but transitioned to the Arnold PSA in 2013.

  2. For decreasing (respectively increasing) stumps, if the coefficient for the largest (respectively smallest) stump is negative, the function \(f\) will still be monotonic because the negative value will be subtracted from all values of the remaining stumps.

  3. We note that a real-valued score S between 0 and 1 is well-calibrated if \(P(Y = 1 | S = s) = s\). Well-calibration says that the predicted probability of recidivism should be the same as the true probability of recidivism (Verma and Rubin 2018). Although well-calibration is the definition of calibration that is standard in the statistics community, we consider monotonic-calibration here because any score that is monotonically-calibrated can be transformed to be well-calibrated.
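As an illustration of the transformation mentioned in this note, a monotonically-calibrated score can be remapped to an (approximately) well-calibrated one by isotonic regression of the outcome on the score. The sketch below uses synthetic data and is not taken from the paper.

```python
# Sketch (synthetic data, not the paper's code): converting a monotonically-
# calibrated score into an approximately well-calibrated one via isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
true_prob = rng.uniform(0.05, 0.6, size=5000)   # true recidivism probabilities
y = rng.binomial(1, true_prob)                  # observed outcomes
score = true_prob ** 2                          # monotone in true_prob, but miscalibrated

iso = IsotonicRegression(out_of_bounds="clip")  # monotone map from score to P(Y=1 | score)
calibrated = iso.fit_transform(score, y)
# 'calibrated' is approximately well-calibrated (P(Y = 1 | S = s) is close to s)
# while preserving the ranking induced by the original score.
```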

References

  • Agarwal A, Beygelzimer A, Dudík M, Langford J, Wallach H (2018) A reductions approach to fair classification. In: Proceedings of the 35th international conference on machine learning. https://proceedings.mlr.press/v80/agarwal18a.html

  • Agarwal A, Dudík M, Wu ZS (2019) Fair regression: quantitative definitions and reduction-based algorithms. In: Proceedings of the 36th international conference on machine learning. https://proceedings.mlr.press/v97/agarwal19d.html

  • Alfred B (2006) The crime drop in America: an explanation of some recent crime trends. J Scand Stud Criminol Crime Prev 7:17–35


  • American Law Institute (2017) Model penal code. https://www.ali.org/projects/show/sentencing/

  • Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2018) Certifiably optimal rule lists for categorical data. J Mach Learn Res 19:1–79


  • Angwin J, Larson J, Mattu S, Kirchner L (2016) Machine bias. Technical report, ProPublica

  • Barabas C, Dinakar K, Doyle C (2019) The problems with risk assessment tools. The New York Times. https://www.nytimes.com/2019/07/17/opinion/pretrial-ai.html

  • Barocas S, Selbst AD (2016) Big data’s disparate impact. Calif Law Rev 104:671–732


  • Berk R (2017) An impact assessment of machine learning risk forecasts on parole board decisions and recidivism. Exp Criminol 13:193–216


  • Berk RA, He Y, Sorenson SB (2005) Developing a practical forecasting screener for domestic violence incidents. Eval Rev 29(4):358–383


  • Berk R, Heidari H, Jabbari S, Joseph M, Kearns M, Morgenstern J, Neel S, Roth A (2017a) A convex framework for fair regression. arXiv:1706.02409

  • Berk R, Heidari H, Jabbari S, Kearns M, Roth A (2017b) Fairness in criminal justice risk assessments: the state of the art. Sociol Methods Res

  • Bindler A, Hjalmarsson R (2018) How punishment severity affects jury verdicts: evidence from two natural experiments. Am Econ J 10

  • Binns R (2018) Fairness in machine learning: lessons from political philosophy. J Mach Learn Res 81:1–11


  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, New York


  • Brennan T, Dieterich W, Ehret B (2009) Evaluating the predictive validity of the COMPAS risk and needs assessment system. Crim Justice Behav 36(1):21–40


  • Bureau of Justice Assistance (2020) History of risk assessment. Bureau of Justice Assistance. https://psrac.bja.ojp.gov/basics/history

  • Burgess EW (1928) Factors determining success or failure on parole

  • Bushway SD, Piehl AM (2007) The inextricable link between age and criminal history in sentencing. Crime Delinq 53(1):156–183


  • Cadigan TP, Lowenkamp CT (2011) Implementing risk assessment in the federal pretrial services system. Federal Probation 75(2)

  • Carollo J, Hines M, Hedlund J (2007) Expanded validation of a decision aid for pretrial conditional release. Technical report, Central Connecticut State University

  • Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794

  • Chouldechova A (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5(2):153–163


  • Cook P, Laub J (2002) After the epidemic recent trends in youth violence in the United States. Crime Justice 29:1–37


  • Corbett-Davies S, Goel S (2018) The measure and mismeasure of fairness: a critical review of fair machine learning. arXiv:180800023v2

  • Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A (2017) Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 797–806

  • CPAT of Pretrial Services (2015) The Colorado Pretrial Assessment Tool (CPAT): administration, scoring, and reporting manual. https://university.pretrial.org/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey=47e978bb-3945-9591-7a4f-77755959c5f5

  • Dawes RM, Faust D, Meehl PE (1989) Clinical versus actuarial judgment. Science 243(4899):1668–1674


  • Defronzo J (1984) Climate and crime: tests of an FBI assumption. Environ Behav 16

  • Desmarais S, Garrett B, Rudin C (2019) Risk assessment tools are not a failed ’minority report’. Law360. https://www.law360.com/access-to-justice/articles/1180373/risk-assessment-tools-are-not-a-failed-minority-report-

  • Dieterich W, Mendoza C, Brennan T (2016) COMPAS risk scales: demonstrating accuracy equity and predictive parity: performance of the COMPAS risk scales in Broward county. Technical report, Northpointe, Inc

  • Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, ITCS ’12, pp 214–226, New York. ACM

  • Electronic Privacy Information Center (2016) Algorithms in the criminal justice system. Electronic Privacy Information Center. https://epic.org/algorithmic-transparency/crim-justice/

  • Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874


  • Flores AW, Lowenkamp CT, Bechtel K (2016) False positives, false negatives, and false analyses: a rejoinder to “Machine bias: there’s software used across the country to predict future criminals”. Federal Probation 80(2)

  • Frase RS, Roberts J, Hester R, Mitchell KL (2015) Robina institute of criminal law and criminal justice, criminal history enhancements sourcebook. https://robinainstitute.umn.edu/publications/criminal-history-enhancements-sourcebook

  • Freeman K (2016) Algorithmic injustice: how the Wisconsin Supreme Court failed to protect due process rights in State v. Loomis. N C J Law Technol 18. http://ncjolt.org/wp-content/uploads/2016/12/Freeman_Final.pdf

  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139


  • Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378


  • Garrett B, Stevenson M (2020) Open risk assessments. Behav Sci Law. https://sites.law.duke.edu/justsciencelab/2019/09/15/comment-on-pattern-by-brandon-l-garrett-megan-t-stevenson/

  • Gelb A, Velazquez T (2018) The changing state of recidivism: fewer people going back to prison. The Pew Charitable Trusts

  • Goel S, Rao JM, Shroff R (2016) Precinct or prejudice? understanding racial disparities in New York city’s stop-and-frisk policy. Inst Math Stat 10(1):365–394


  • Grove WM, Meehl PE (1996) Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychol Public Policy Law 2(2):293


  • Hanson R, Thornton D (2003) Notes on the development of static-2002. Department of the Solicitor General of Canada, Ottawa

  • Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. In: Advances in neural information processing systems, pp 3315–3323

  • Harris GT, Rice ME (2008) Encyclopedia of Psychology and Law, chapter Violence Risk Appraisal Guide (VRAG), p 848. SAGE Publications, Inc.

  • Hart H (1924) Predicting parole success. J Crim Law Criminol 14

  • Hoffman PB, Adelberg S (1980) The salient factor score: a nontechnical overview. Federal Probation 44:44


  • Howard P, Francis B, Soothill K, Humphreys L (2009) OGRS 3: the revised offender group reconviction scale. Technical report, Ministry of Justice

  • James N (2018) Risk and needs assessment in the federal prison system. Technical report, Congressional Research Service

  • Kehl D, Guo P, Kessler S (2017) Algorithms in the criminal justice system: assessing the use of risk assessments in sentencing. https://cyber.harvard.edu/publications/2017/07/Algorithms

  • Kim J, Bushway S, Tsao H (2016) Identifying classes of explanation for crime drop: period and cohort effects for New York state. J Quant Criminol 32:357–375


  • Kleiman M, Ostrom BJ, Cheesman FL (2007) Using risk assessment to inform sentencing decisions for nonviolent offenders in Virginia. Crime Delinq 53(1):106–132


  • Kleinberg J, Mullainathan S, Raghavan M (2017) Inherent trade-offs in the fair determination of risk scores. In: Proceedings of the 8th conference on innovations in theoretical computer science

  • Lakkaraju H, Rudin C (2017) Learning cost-effective and interpretable treatment regimes. In: Singh A, Zhu J (eds) Proceedings of the 20th international conference on artificial intelligence and statistics, vol 54 of proceedings of machine learning research, pp 166–175, Fort Lauderdale. PMLR. http://proceedings.mlr.press/v54/lakkaraju17a.html

  • Larson J, Mattu S, Kirchner L, Angwin J (2016) How we analyzed the COMPAS recidivism algorithm. Technical report, ProPublica. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

  • Latessa E, Smith P, Lemke R, Makarios M, Lowenkamp C (2009) Creation and validation of the ohio risk assessment system. Technical report, University of Cincinnati School of Criminal Justice Center for Criminal Justice Research

  • Lazarsfeld PF (1974) An evaluation of the pretrial services agency of the Vera institute of justice. Vera Institute, New York


  • Lou Y, Caruana R, Gehrke J, Hooker G (2013) Accurate intelligible models with pairwise interactions. In: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 623–631. https://doi.org/10.1145/2487575.2487579

  • Ludwig J, Mullainathan S (2021) Fragile algorithms and fallible decision-makers: lessons from the justice system. J Econ Perspect 35(4):71–96


  • Matthews B, Minton J (2017) Rethinking one of criminology’s ‘brute facts’: the age-crime curve and the crime drop in Scotland. Eur J Criminol 15(3):296–320


  • MHS Assessments (2017) Level of service/case management inventory: an offender management system. MHS Public Safety. https://issuu.com/mhs-assessments/docs/ls-cmi.lsi-r.brochure_insequence

  • Milgram A (2014) Why smart statistics are the key to fighting crime

  • Mishra A (2014) Climate and crime. Global J Sci Front Res 14

  • Nafekh M, Motiuk LL (2002) The statistical information on recidivism, revised 1 (SIR-R1) scale: a psychometric examination. Correctional Service of Canada. Research Branch

  • Neuilly M-A, Zgoba KM, Tita GE, Lee SS (2011) Predicting recidivism in homicide offenders using classification tree analysis. Homicide Stud 15(2):154–176


  • Northpointe (2013) Practitioner’s Guide to COMPAS Core. http://www.northpointeinc.com/downloads/compas/Practitioners-Guide-COMPAS-Core-_031915.pdf

  • Northpointe Inc. (2009) Measurement & treatment implications of COMPAS core scales. Technical report, Northpointe Inc

  • O’Neil C (2016) Weapons of math destruction. Crown Books, New York


  • Orbis (2014) Service planning instrument: an innovative assessment and case planning tool. https://orbispartners.com/wp-content/uploads/2014/07/SPIn-Brochure.pdf

  • Palocsay W, PingWang S, Brookshire RG (2000) Predicting criminal recidivism using neural networks. Socio-Econ Plan Sci 34:271–284


  • Pleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger K (2017) On fairness and calibration. In: Advances in neural information processing systems, pp 5680–5689

  • Pretrial Justice Institute (2020) Updated position on pretrial risk assessment tools. Pretrial Justice Institute. https://university.pretrial.org/viewdocument/updated-statement-on-pretrial-risk

  • Public Safety Assessment (2019) Risk factors and formulas. Laura and John Arnold Foundation. https://www.psapretrial.org/about/

  • Ranson M (2014) Crime, weather, and climate change. J Environ Econ Manag 67

  • Richard B (2019) Accuracy and fairness for juvenile justice risk assessments. J Empir Leg Stud 16:174–194


  • Roberts J, von Hirsch A (2010) Previous convictions at sentencing - theoretical and applied perspectives. Bloomsbury Publishing, London


  • Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1:206–215


  • Rudin C, Wang C, Coker B (2020) The age of secrecy and unfairness in recidivism prediction. Harvard Data Sci Rev 2(1). https://hdsr.mitpress.mit.edu/pub/7z10o269

  • Sherman LW (2007) The power few: experimental criminology and the reduction of harm. J Exp Criminol 3(4):299–321


  • Singh A, Mohapatra S (2021) Development of risk assessment framework for first time offenders using ensemble learning. IEEE Access 9:135024–135033


  • Skeem J, Lin Z, Jung J, Goel S (2020) The limits of human predictions of recidivism. Sci Adv 6

  • Smith B (2016) Auditing deep neural networks to understand recidivism predictions. PhD thesis, Haverford College

  • Soares E, Angelov PP (2019) Fair-by-design explainable models for prediction of recidivism. arXiv:abs/1910.02043

  • Starr SB (2015) The risk assessment era: an overdue debate. Federal Sentencing Reporter 27:205–206


  • Stevenson M (2018) Assessing risk assessment in action. Minnesota Law Review. http://www.minnesotalawreview.org/wp-content/uploads/2019/01/13Stevenson_MLR.pdf

  • Stevenson MT, Slobogin C (2018) Algorithmic risk assessments and the double-edged sword of youth. Washington Univ Law Rev 96(18–36)

  • The Leadership Conference on Civil and Human Rights (2018) The use of pretrial “risk assessment” instrument: a shared statement of civil rights concerns. http://civilrightsdocs.info/pdf/criminal-justice/Pretrial-Risk-Assessment-Full.pdf

  • Tollenaar N, van der Heijden P (2013) Which method predicts recidivism best? A comparison of statistical, machine learning and data mining predictive models. J R Stat Soc A Stat Soc 176(2):565–584


  • Turner S, Hess J, Jannetta J (2009) Development of the California Static Risk Assessment Instrument (CSRA). CEBC Working Papers

  • United States Census Bureau (2015) Hispanic or latino origin by race 2011–2015 American community survey 5-year estimates. https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_B03002&prodType=table

  • United States Census Bureau (2019) QuickFacts: Kentucky; United States. https://www.census.gov/quickfacts/fact/table/KY,US/PST04521

  • Ustun B, Rudin C (2015) Supersparse linear integer models for optimized medical scoring systems. Mach Learn 1–43

  • Ustun B, Rudin C (2017) Optimized risk scores. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining

  • Ustun B, Rudin C (2019) Learning optimized risk scores. J Mach Learn Res 20(150):1–75


  • Vapnik V, Chervonenkis A (1964) A note on one class of perceptrons. Autom Remote Control 25

  • Verma S, Rubin J (2018) Fairness definitions explained. In: ACM/IEEE international workshop on software fairness, pp 1–7. ACM

  • Virginia Department of Criminal Justice Services (2018) Virginia pretrial risk assessment instrument - (vprai). https://www.dcjs.virginia.gov/sites/dcjs.virginia.gov/files/publications/corrections/virginia-pretrial-risk-assessment-instrument-vprai_0.pdf

  • Wexler R (2017) When a computer program keeps you in jail: how computers are harming criminal justice. New York Times, p 27. Section A

  • Wolfgang ME (1987) Delinquency in a birth cohort. University of Chicago Press, Chicago


  • Zemel R, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. In: International conference on machine learning, pp 325–333

  • Zeng J, Ustun B, Rudin C (2017) Interpretable classification models for recidivism prediction. J R Stat Soc A Stat Soc 180(3):689–722


  • Zweig J (2010) Extraordinary conditions of release under the bail reform act. Harvard J Legis 47:555–585



Acknowledgements

We thank the Broward County Sheriff’s office and the Kentucky Department of Shared Services, Research and Statistics for their assistance and provision of data. We would also like to thank Daniel Sturtevant from the Kentucky Department of Shared Services, Research and Statistics for providing significant insight into the Kentucky data set, and Berk Ustun for his advice on training RiskSLIM. Finally, we thank Brandon Garrett from Duke, Stuart Buck and Kristin Bechtel from Arnold Ventures, and Kathy Schiflett, Christy May, and Tara Blair from Kentucky Pretrial Services for their thoughtful comments on the article.

Funding

This study was partially supported by Arnold Ventures, the Department of Computer Science at Duke University, the Department of Electrical and Computer Engineering at Duke University, and the Lord Foundation of North Carolina. This report represents the findings of the authors and does not represent the views of any of the funding agencies.

Author information

Corresponding author

Correspondence to Bin Han.

Ethics declarations

Conflict of interest

No additional institutional conflicts.

Code Availability

Our code is here: https://github.com/BeanHam/interpretable-machine-learning.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Nested Cross Validation Procedure

We applied fivefold nested cross validation to tune hyperparameters. We split the entire data set into five equally-sized folds for the outer cross validation step. One fold was used as the holdout test set and the other four folds were used as the training set (the "outer training set"). The inner loop uses only the outer training set (\(\frac{4}{5}\)ths of the data). On this outer training set, we conducted fivefold cross validation and grid-searched over hyperparameter values. After this step, each hyperparameter setting had five validation results. We selected the hyperparameter setting with the highest average validation performance, trained the model with this best setting on the entire outer training set, and tested it on the holdout test set.

We repeated the process above until each one of the original five folds was used as the holdout test set. Ultimately, we had five holdout test results, with which we were able to calculate the average and standard deviation of the performance.
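As an illustration, the following sketch (in Python with scikit-learn) mirrors the fivefold nested cross validation described above; the estimator, hyperparameter grid, and data are placeholders rather than the configurations used in the paper.

```python
# Sketch of the fivefold nested cross validation described above.
# The estimator, hyperparameter grid, and data (X, y: numpy arrays) are
# placeholders, not the paper's actual configuration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def nested_cv_auc(X, y, estimator, param_grid, seed=0):
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    test_aucs = []
    for train_idx, test_idx in outer.split(X, y):
        # Inner loop: fivefold grid search restricted to the outer training set.
        inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=inner)
        search.fit(X[train_idx], y[train_idx])
        # GridSearchCV refits the best setting on the whole outer training set;
        # evaluate that model on the held-out outer fold.
        probs = search.predict_proba(X[test_idx])[:, 1]
        test_aucs.append(roc_auc_score(y[test_idx], probs))
    return np.mean(test_aucs), np.std(test_aucs)

# Example usage with a placeholder model and grid:
# mean_auc, sd_auc = nested_cv_auc(X, y, LogisticRegression(max_iter=1000),
#                                  {"C": [0.01, 0.1, 1, 10]})
```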

We applied a variant of the nested cross validation procedure described above to perform the analysis discussed in the "Recidivism Prediction Models Do Not Generalize Well Across Regions" section, where we trained models on one region and tested them on the other. For instance, when we trained models on Broward and tested them on Kentucky, the Kentucky data was treated as the holdout test set. We split the Broward data into five folds, used four of them for cross validation, and constructed the final model using the best parameters. We then tested the final model on the entire Kentucky data set, as well as on the holdout fold from Broward. We rotated the folds and repeated this process five times.

Broward Data Processing

The Broward County data set consists of publicly available criminal history, court data and COMPAS scores from Broward County, Florida. The criminal history and demographic information were computed from raw data released by ProPublica (Angwin et al. 2016). The probational history was computed from public criminal records released by the Broward Clerk’s Office.

The screening date is the date on which the COMPAS score was calculated. Features and labels were computed for an individual with respect to a particular screening date. For individuals with multiple screening dates, we computed features for each screening date, such that the set of events used to calculate features for earlier screening dates is included in the set of events for later screening dates. Occasionally, an individual has multiple COMPAS scores calculated on the same date; since there appears to be no information distinguishing these scores other than their identification number, we take the score with the larger identification number. The recidivism labels were computed for timescales of 6 months and 2 years. Some individuals were sentenced to prison as a result of their offense(s); we used only observations for which we have 6 months/2 years of data subsequent to the individual's release date.
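Two of these steps, keeping the COMPAS score with the larger identification number when several scores share a screening date and restricting to observations with sufficient follow-up, could be expressed as in the following sketch; the column names are hypothetical stand-ins for the raw fields, not the actual ProPublica schema.

```python
# Sketch of two Broward preprocessing steps described above, using pandas.
# Column names (person_id, screening_date, compas_id, release_date, last_data_date)
# are hypothetical stand-ins for the raw fields, not the actual ProPublica schema.
import pandas as pd

def dedupe_and_filter(compas: pd.DataFrame, followup_days: int = 730) -> pd.DataFrame:
    # When several COMPAS scores share a person and screening date, keep the row
    # with the larger identification number.
    compas = (compas.sort_values("compas_id")
                    .groupby(["person_id", "screening_date"], as_index=False)
                    .last())
    # Keep only observations with at least `followup_days` (6 months or 2 years)
    # of data after the individual's release date.
    followup = (compas["last_data_date"] - compas["release_date"]).dt.days
    return compas[followup >= followup_days]
```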

Below, we describe details of the feature and label generation process. The constructed features are presented in Table 4 at the end of this section.

  • Degree “(0)” charges seem to be very minor offenses, so we exclude these charges. We infer whether a charge is a felony, misdemeanor, or traffic charge based on the charge degree.

  • Some of our features rely on classifying the type of each offense (e.g., whether or not it is a violent offense). We infer this from the statute number, most of which correspond to statute numbers from the Florida state crime code.

  • The raw ProPublica data includes arrest data as well as charge data. Because the arrest data does not include the statute, which is necessary for us to determine offense type, we use the charge data to compute features that require the offense type. We use both charge and arrest data to predict recidivism.

  • For each person on each COMPAS screening date, we identify the offense—which we call the current offense—that most likely triggered the COMPAS screening. The current offense date is the date of the most recent charge that occurred on or before the COMPAS screening date. Any charge that occurred on the current offense date is part of the current offense. In some cases, there is no prior charge that occurred near the COMPAS screening date, suggesting charges may be missing from the data set. For this reason we consider charges that occurred within 30 days of the screening date for computing the current offense. If there are no charges in this range, we say the current offense is missing. We exclude observations with missing current offenses. We used some of the COMPAS subscale items as features for our ML models. All such components of the COMPAS subscales that we compute are based on data that occurred prior to (not including) the current offense date.

  • The events/documents data includes a number of events (e.g., “File Affidavit Of Defense” or “File Order Dismissing Appeal”) related to each case, and thus to each person. To determine how many prior offenses occurred while on probation, or whether the current offense occurred while on probation, we define a list of event descriptions indicating that an individual was taken on or off probation. Unfortunately, there appear to be missing events, as individuals often have consecutive “On” or consecutive “Off” events (e.g., two “On” events in a row, without an “Off” in between). To handle these cases, as well as cases where the first event is an “Off” event or the last event is an “On” event, we define two thresholds, \(t_{on}\) and \(t_{off}\). If an offense occurred within \(t_{on}\) days after an “On” event or \(t_{off}\) days before an “Off” event, we count the offense as occurring while on probation. We set \(t_{on}\) to 365 and \(t_{off}\) to 30 (see the code sketch after this list). On the other hand, the “number of times on probation” feature is just the count of “On” events and the “number of times the probation was revoked” feature is just the count of “File order of Revocation of Probation” event descriptions (i.e., we do not infer missing probation events for these two features).

  • Current age is defined as the age in years, rounded down to the nearest integer, on the COMPAS screening date.

  • A juvenile charge is defined as an offense that occurred prior to the defendant’s 18th birthday.

  • Labels and features were computed using charge data.

  • The final data set contains 1954 records and 41 features.
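The probation-window rule described above can be sketched as follows; the event and offense dates are simplified stand-ins for the Broward events/documents data.

```python
# Sketch of the probation-window rule described in the probation bullet above.
# The event and offense dates are simplified stand-ins for the Broward data.
from datetime import date, timedelta

T_ON = timedelta(days=365)   # window after an "On probation" event
T_OFF = timedelta(days=30)   # window before an "Off probation" event

def on_probation(offense_date: date, on_events: list, off_events: list) -> bool:
    """Count an offense as occurring while on probation if it falls within
    T_ON days after any "On" event or T_OFF days before any "Off" event."""
    after_on = any(on <= offense_date <= on + T_ON for on in on_events)
    before_off = any(off - T_OFF <= offense_date <= off for off in off_events)
    return after_on or before_off

# Example: an offense 200 days after being placed on probation counts.
print(on_probation(date(2015, 7, 20), [date(2015, 1, 1)], []))  # True
```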

Table 4 Features from Broward data set

Kentucky Data Processing

The Kentucky pretrial and criminal court data was provided by the Department of Shared Services, Research and Statistics in Kentucky. The Pretrial Services Information Management System (PRIM) data contains records regarding defendants, interviews, PRIM cases, bonds, etc., connected with the pretrial services interviews conducted between July 1, 2009 and June 30, 2018. Cases were restricted to those with misdemeanor, felony, and other-level charges. The data from another system, CourtNet, provided further information about cases, charges, sentences, dispositions, etc. for CourtNet cases matched in the PRIM system. The Kentucky data can be accessed through a special data request to the Kentucky Department of Shared Services, Research and Statistics. Please refer to Table 5 for all the raw datasets we processed, together with their sizes and the general information provided.

CourtNet and PRIM data were processed separately and then combined together. We describe the details below. The constructed features are presented in Table 6 at the end of this section.

  • For the CourtNet data, we filtered out cases with a filing date prior to Jan. 1st, 1996, which the Kentucky Department of Shared Services, Research and Statistics (which provided the data) indicated were less reliable records. To determine the type of crime involved in each charge (e.g., drug, property, or traffic-related crime), we used the Kentucky Uniform Crime Reporting Code (UOR Code), as well as keyword detection in the UOR description.

  • From the PRIM system data, we extracted the probation, failure to appear, case pending, and violent charge information at the PRIM case level, as well as the Arnold PSA risk scores computed at the time of each pretrial services interview. Since Kentucky did not use the Arnold PSA until July 1st, 2013, we filtered out records before this date. We omitted records without risk scores since we want to compare the performance of the PSA with that of other models; only 33 records are missing PSA scores, so the missing records are unlikely to affect the results. Additionally, some cases in the PRIM system have “indictment” as the arrest type, along with an “original” arrest case ID, indicating that those cases were not new arrests. We matched these cases with the records that correspond to the original arrests to avoid overcounting the number of prior arrests. We then inner-joined the data from the two systems using person-id and prim-case-id.

  • For each individual, we used the date that is 2 years before the latest charge date in the Kentucky data as the cutoff date. The data before the cutoff are used as criminal history information to compute features. The data after the cutoff are used to compute labels and check recidivism. In the data before the cutoff, the latest charge is treated as the current charge (i.e., the charge that would trigger a risk assessment) for each individual. We compute features and construct labels using only convicted charges; however, the current charge can be either convicted or non-convicted. This ensures that our analysis includes all individuals who would receive a risk assessment, even if they were later found innocent of the current charge that triggered it. It also ensures that criminal history features use only convicted charges, so that our risk assessments are not influenced by charges for crimes that the person may not have committed.

  • In order to compute the labels, we must ensure that there are at least 2 years of data following an individual’s current charge date. For individuals who are sentenced to prison due to their current charge, we consider their release date instead of the current charge date. We omitted individuals for whom there were less than 2 years of data between their current charge date or release date, and the last date recorded in the data set.

  • To obtain the age at the current charge, we first calculated the date of birth (DOB) for each individual, using the CourtNet case filing date and the age at the CourtNet case filing date. Then we calculated “age at current charge” using the DOB and the charge date (the charge date sometimes differs from the case filing date). Note that the age records contain many errors; for instance, some ages are recorded as over 150, which is certainly wrong but cannot be corrected. To ensure the quality of our data, we limited the final current age feature to be inclusively between 18 and 70, which is also consistent with the range used in the Broward analysis. If the person was not sentenced to prison, we define current age as the age at the current charge date. If the person was sentenced to prison, we compute current age by adding the sentence time to the age at the current charge date. Note that this differs from the way risk scores are computed in practice, since risk scores are usually computed prior to the sentencing decision. This helps to handle distributional shift between the individuals with no prison sentence (for whom a 2-year evaluation can be handled directly) and the full population (some of whom may have been sentenced to prison and cannot commit a crime during their sentence).

  • We computed features using the data before the current charge date. The CourtNet data is organized by CourtNet cases, and each CourtNet case has charge-level data. The PRIM data is organized by PRIM cases. Each CourtNet case can connect to multiple PRIM cases; this occurs because a new PRIM case is logged when an update occurs in the defendant’s CourtNet case (for example, if the defendant fails to appear in court). Therefore, to compute the criminal history information, we first grouped at the PRIM case level to summarize the charge information. Next, we grouped at the CourtNet case level to summarize the PRIM case level information. Last, we grouped at the individual level to summarize the criminal histories.

  • On computing the ADE feature: the ADE feature is the number of times the individual was assigned to alcohol and drug education (ADE) classes. Note that, by Kentucky state law, any individual convicted of a DUI is assigned to ADE classes. The feature does not indicate whether the individual successfully completed the classes.

  • We compute labels using the 2 years of data after the current charge date/release date. We constructed the general recidivism labels by checking whether a convicted charge occurred within 2 years or 6 months of the current charge date (or release date). Then, using the charge types of the convicted charges, we generated the other recidivism prediction labels, such as drug- or property-related recidivism (a sketch of this label construction appears after this list). The final data set contains 250,778 records and 40 features.

    Note: there are degrees of experimenter freedom in some of these data processing choices; exploring all the possible choices here is left for future studies.
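The cutoff and label construction described in this list can be sketched as follows; the column names are hypothetical stand-ins for the processed PRIM/CourtNet fields, and release-date handling is omitted for brevity.

```python
# Sketch of the Kentucky cutoff and label construction described in this list.
# Column names (person_id, charge_date, convicted) are hypothetical stand-ins for
# the processed PRIM/CourtNet fields; release-date handling is omitted for brevity.
import pandas as pd

def make_labels(charges: pd.DataFrame, window_days: int = 730) -> pd.DataFrame:
    """charges: one row per charge. Returns one recidivism label per person."""
    # Cutoff: 2 years before the latest charge date in the data set.
    cutoff = charges["charge_date"].max() - pd.Timedelta(days=730)
    rows = []
    for person_id, person in charges.groupby("person_id"):
        history = person[person["charge_date"] <= cutoff]
        if history.empty:
            continue                                   # no pre-cutoff charge to anchor on
        current_date = history["charge_date"].max()    # date of the "current charge"
        window_end = current_date + pd.Timedelta(days=window_days)
        # Label: any *convicted* charge within the window after the current charge date.
        recid = person[(person["charge_date"] > current_date) &
                       (person["charge_date"] <= window_end) &
                       (person["convicted"])]
        rows.append({"person_id": person_id, "recidivism": not recid.empty})
    return pd.DataFrame(rows)
```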

The Arnold PSA features that were included in the Kentucky data set (e.g., prior convictions, prior felony convictions etc.) were computed by pretrial officers who had access to criminal history data from both inside and outside of Kentucky. However, the Kentucky data set we received contained criminal history information from within Kentucky only. Thus, the Arnold PSA features for Kentucky (which are included in our models as well) use both in-state and out-of-state information, but the remaining features (which we compute directly from the Kentucky criminal history data) are limited to in-state criminal history.

Additionally, we were informed by the Kentucky Pretrial Services team that the data set's sentencing information may not be reliable due to unmeasured confounding, including shock probation and early releases that would allow a prisoner to be released much earlier than the end date of the sentence. Because the sentence actually served could be anywhere from zero days to the full length, we conducted a sensitivity analysis by excluding the sentence information in the data processing, which is equivalent to assuming that no prison sentence was served. For that analysis, the current age of each individual was calculated as the age at the current charge, and the prediction labels were generated from new charges within 6 months (or 2 years) of the current charge. The sensitivity analysis yielded predictive results that were almost exactly the same as the results in the main text, in which the sentence information was used to determine age and prediction interval.

Table 5 The table lists raw datasets obtained from the Kentucky Department of Shared Services, Research and Statistics, the number of records within each data frame, and general descriptions of the data
Table 6 Features from Kentucky data set

Why We Compare Only Against COMPAS and the PSA

The variables included in risk assessments are often categorized into static and dynamic factors. Static factors are defined as factors that cannot be reduced over time (e.g. criminal history, gender, and age-at-first-arrest). Dynamic factors are defined as variables that can change over time to decrease the risk of recidivism; they allow insight into whether a high-risk individual can lower their risk through rehabilitation, and sometimes improve prediction accuracy. Examples of dynamic factors include current age, treatment for substance abuse, and mental health status (Kehl et al. 2017). Dynamic factors are often included in risk-and-needs-assessments (RNAs), which in addition to identifying risk of recidivism, recommend interventions to practitioners (e.g., treatment programs, social services, diversion of individuals from jail).

With the exception of current age, our features all fall under the “static” classification. This renders us unable to compare against risk assessment tools that use dynamic factors, even those whose formulas are public. The risk assessments that we examined are listed in Table 7. Since we have only criminal history and age variables, the only model we could compute from our data was the Arnold PSA.

However, as we demonstrated in the main body of the paper, the fact that we do not possess dynamic factors is not necessarily harmful to the predictive performance of our models. The goal behind including dynamic factors in models is to improve prediction accuracy as well as be able to recommend interventions that reduce the probability of recidivism. While an admirable goal, the inclusion of dynamic factors does not come at zero cost and may not actually produce performance gains for recidivism prediction. In the Baseline Machine Learning Methods and “Recidivism Prediction Models Do Not Generalize Well Across Regions” sections, we show that standard machine learning techniques (using only the static factors) and interpretable ML models (using only static factors) are able to outperform a criminal justice model that utilizes both static and dynamic factors (COMPAS). Furthermore, the inclusion of additional, unnecessary factors increases the risk of data entry errors, or exposes models to additional feature bias (Corbett-Davies and Goel 2018). As Rudin et al. (2020) reveals, data entry errors appear to be common in COMPAS score calculations and could lead to scores that are either too high or too low.

Although the COMPAS suite is a proprietary (and thus black-box) risk-and-needs assessment, we were still able to compare against its risk assessments thanks to Florida’s strong open-records laws. Created by Northpointe (a subsidiary of Equivant), COMPAS is a recidivism prediction suite used in criminal justice systems throughout the United States. It comprises three scores: Risk of General Recidivism, Risk of Violent Recidivism, and Risk of Failure to Appear. In this work, we examine the two risk scores relating to violent recidivism and general recidivism. Each risk score is an integer from one to ten (Brennan et al. 2009).

Because COMPAS is a proprietary instrument, the precise forms of its models are not publicly available. However, it is known that the COMPAS scores are computed from a subset of 137 input variables that include vocational/educational status, substance abuse, and probational history, in addition to the standard criminal history variables (Brennan et al. 2009). As such, we cannot directly compute these risk scores, and instead use the COMPAS scores released by ProPublica in the Broward County recidivism data set. We do not compare against COMPAS on the Kentucky data set, as that data set does not include COMPAS scores.

The PSA was created by Arnold Ventures and is a publicly available risk assessment tool. Similar to the COMPAS suite, it comprises three risk scores: Failure to Appear, New Criminal Activity, and New Violent Criminal Activity. Again, we compare against the latter two scores. Both are additive integer models that take nine factors as input, relating to age, current charge, and criminal history. The New Criminal Activity model outputs a score from 1 to 6, while the New Violent Criminal Activity model outputs a binary score (Public Safety Assessment 2019). The PSA is an interpretable model.
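To illustrate the additive integer structure described above, the following sketch shows a generic point-score model; the factor names and point values are placeholders for illustration only and are not the published PSA weights.

```python
# Generic additive integer risk score, illustrating the structure described above.
# Factor names and point values are placeholders for illustration only; they are
# NOT the published PSA weights.
EXAMPLE_POINTS = {
    "age_at_arrest_under_23": 2,
    "pending_charge_at_offense": 1,
    "prior_misdemeanor_conviction": 1,
    "prior_felony_conviction": 1,
    "prior_violent_conviction": 2,
    "prior_failure_to_appear": 1,
    "prior_incarceration": 1,
}

def additive_score(factors, points=EXAMPLE_POINTS):
    """Sum the points for the factors that apply. In an instrument like the PSA,
    the raw total is then mapped to the reported scale via a conversion table."""
    return sum(pts for name, pts in points.items() if factors.get(name, False))

# Example: a defendant under 23 with one prior felony conviction scores 3 points.
print(additive_score({"age_at_arrest_under_23": True, "prior_felony_conviction": True}))
```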

Table 7 Variable comparison for currently-utilized actuarial risk assessments

Hyperparameters

Baseline Models, CART, EBM

We applied nested cross validation to tune the hyperparameters. Please refer to Table 8 for parameter details.

Table 8 Hyperparameters for \(\ell _1\) and \(\ell _2\) penalized logistic regression, linear SVM, CART, random forest, XGBoost, and EBM. RiskSLIM and additive stumps are discussed separately

Additive Stumps

Stumps were created for each feature as detailed in the “Preprocessing Features into Binary Stumps” section. An additive model was created from the stumps using \(\ell _1\)-penalized logistic regression, with no more than 15 original features involved in the additive models, although multiple stumps corresponding to each feature could be used. We chose to limit the model to 15 original features because at most 15 plots are then needed to visualize the full model, which is a reasonable number of visualizations for users to digest.

We started with the smallest regularization parameter on the \(\ell _1\) penalty that yields at most 15 original features in the model. This served as our lower bound for nested cross validation. From there, we performed nested cross validation over a grid of regularization parameters, all of which were greater than or equal to this minimum value. Please refer to Table 9 for more details.
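A minimal sketch of this construction, assuming numeric features and a user-supplied set of thresholds, is shown below; the thresholds, data, and grid of regularization values are placeholders rather than the paper's settings.

```python
# Sketch of the additive-stumps construction: expand each numeric feature into
# binary threshold indicators ("stumps"), fit l1-penalized logistic regression
# over the stumps, and shrink the penalty only as far as a 15-original-feature
# budget allows. Thresholds, data (stumps_df, y), and the grid of C values are
# placeholders, not the paper's settings.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def make_stumps(X: pd.DataFrame, thresholds: dict) -> pd.DataFrame:
    """Expand each feature into indicator columns 1[x >= t] for its thresholds."""
    stumps = {}
    for col, cuts in thresholds.items():
        for t in cuts:
            stumps[f"{col}>={t}"] = (X[col] >= t).astype(int)
    return pd.DataFrame(stumps, index=X.index)

def n_original_features(model, stump_cols):
    """Number of original features with at least one nonzero stump coefficient."""
    used = {c.split(">=")[0] for c, w in zip(stump_cols, model.coef_.ravel()) if w != 0}
    return len(used)

# Decrease C (i.e., increase the l1 penalty) until at most 15 original features remain:
# for C in [1.0, 0.5, 0.1, 0.05, 0.01]:
#     model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(stumps_df, y)
#     if n_original_features(model, stumps_df.columns) <= 15:
#         break
```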

Table 9 Hyperparameters for additive stumps

RiskSLIM

RiskSLIM is challenging to train because it uses the CPLEX optimization software, which can be difficult to install and requires a license. Moreover, since RiskSLIM solves a very difficult mixed-integer nonlinear optimization problem, it can be slow to prove optimality, which makes nested cross validation difficult because nested cross validation requires solving the optimization problem many times. A previous study (Smith 2016) noted similar problems with algorithms that use CPLEX (that study trained SLIM (Ustun and Rudin 2015), whose training process is similar to RiskSLIM’s in that both require CPLEX). Here we provide details of how we trained RiskSLIM to help others use the algorithm more efficiently.

  • We ran \(\ell _1\)-penalized logistic regression on the stumps training data with a relatively large regularization parameter to obtain a small subset of features (that is, we used \(\ell _1\)-penalized logistic regression for feature selection). Then we trained RiskSLIM using nested cross validation with this small subset of features. The maximum run-time, maximum offset, and penalty value were set to 1000 seconds, 100, and \(10^{-6}\), respectively. The coefficient range was set to \([-5, 5]\), which would give us small coefficients that are easy to add/subtract.

  • If the model converged to optimality (optimality gap less than 5%) within 1000 seconds, we then ran \(\ell _1\)-penalized logistic regression again with a smaller regularization parameter to obtain a slightly larger subset of features to work with. We then trained RiskSLIM with nested cross validation again on this larger subset of features. If RiskSLIM again achieved an optimality gap of less than 5% within 1000 seconds and had better validation performance, we repeated this procedure.

  • Once RiskSLIM either failed to converge to a 5% optimality gap within 1000 seconds or the validation performance did not improve by adding more stumps, we stopped and used the previously obtained RiskSLIM model as the final model.

  • This procedure generally stopped with between 12 and 20 stumps from \(\ell _1\)-penalized logistic regression. Beyond this number of stumps, we did not observe improvements in performance in validation.
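The following sketch outlines this iterative procedure; `train_riskslim` is a hypothetical stand-in for the RiskSLIM/CPLEX training call, and the stumps matrix, labels, and regularization grid are placeholders.

```python
# Sketch of the iterative procedure described above: use l1-penalized logistic
# regression on the stumps for feature selection, then train RiskSLIM on the
# selected subset, enlarging the subset while optimality and validation
# performance keep improving. `train_riskslim` is a hypothetical stand-in;
# the actual RiskSLIM/CPLEX configuration is not shown here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_riskslim(X, y, max_runtime, coef_range):
    """Hypothetical wrapper around the RiskSLIM/CPLEX solver; returns
    (model, optimality_gap, validation_auc). Not implemented in this sketch."""
    raise NotImplementedError

def select_stumps(stumps, y, C):
    """Column indices with nonzero l1-penalized logistic regression coefficients."""
    lr = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(stumps, y)
    return np.flatnonzero(lr.coef_.ravel())

best_model, best_auc = None, -np.inf
for C in [0.01, 0.02, 0.05, 0.1]:          # increasing C: weaker penalty, more stumps
    cols = select_stumps(stumps, y, C)      # stumps, y: placeholder numpy arrays
    model, gap, auc = train_riskslim(stumps[:, cols], y,
                                     max_runtime=1000, coef_range=(-5, 5))
    if gap > 0.05 or auc <= best_auc:       # no convergence or no validation gain
        break
    best_model, best_auc = model, auc
```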

See Figs. 8, 9, 10.

Fig. 8 Probabilities of 2-year and 6-month violent recidivism, given the age at current charge

Fig. 9 Base rates of all twelve types of recidivism on Kentucky data, conditioned (separately) on race and gender

Fig. 10 Calibration of the Arnold NVCA Raw, EBM and RiskSLIM for 2-year violent recidivism on Kentucky

See Tables 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26.

Table 10 Additive Stumps on two-year general recidivism
Table 11 Race and gender distributions for Kentucky
Table 12 Arnold Public Safety Assessment (PSA): New Criminal Activity (NCA)
Table 13 Arnold Public Safety Assessment (PSA): New Violent Criminal Activity (NVCA)
Table 14 Broward baseline models
Table 15 Kentucky baseline models
Table 16 AUCs of interpretable models on Broward data
Table 17 AUCs of interpretable models on Kentucky data
Table 18 Training baseline models and interpretable models on the Kentucky data set using fivefold nested cross validation and testing the best-performing model on the Broward data set
Table 19 Training baseline models and interpretable models on the Broward County data set using fivefold nested cross validation and testing the resulting best-performing model on a held out portion of the Broward data set
Table 20 Training baseline and interpretable models on the Broward County data set using fivefold nested cross validation and testing the resulting best-performing model on the Kentucky data set
Table 21 Training baseline models and interpretable models on the Kentucky data set using fivefold nested cross validation and testing the resulting best-performing model on a held out portion of the Kentucky data set
Table 22 AUCs of the Arnold NVCA Raw, EBM and RiskSLIM on Kentucky for two-year violent recidivism, conditioned on sensitive attributes. AUC ranges are also given for each sensitive attribute class
Table 23 Two year prediction problems—Kentucky
Table 24 Six month prediction problems—Kentucky
Table 25 Two year prediction problems—Broward
Table 26 Six Month Prediction Problems—Broward


Cite this article

Wang, C., Han, B., Patel, B. et al. In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction. J Quant Criminol 39, 519–581 (2023). https://doi.org/10.1007/s10940-022-09545-w
