DOI: 10.1145/3278721.3278742

Fair Forests: Regularized Tree Induction to Minimize Model Bias

Published: 27 December 2018

ABSTRACT

The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community and in society more broadly. Surprisingly, there is no prior work developing tree-induction algorithms for building fair decision trees or fair random forests. These methods are widely popular, as they are among the few that are simultaneously interpretable, non-linear, and easy to use. In this paper we develop, to our knowledge, the first technique for the induction of fair decision trees. We show that our "Fair Forest" retains the benefits of the tree-based approach while providing both greater accuracy and fairness than other alternatives, for both "group fairness" and "individual fairness." We also introduce new measures of fairness that can handle multinomial and continuous attributes as well as regression problems, as opposed to only binary attributes and labels. Finally, we demonstrate a new, more robust evaluation procedure for algorithms that considers the dataset in its entirety rather than only a specific protected attribute.
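The abstract describes regularizing tree induction so that splits which reveal a protected attribute are penalized. As a rough illustration of that idea only (not the paper's exact criterion), the sketch below scores a candidate split by its information gain on the label minus its information gain on the protected attribute; the function names (`fair_gain`, `info_gain`) and the simple subtractive combination are assumptions made for this example.

```python
# Illustrative sketch only: one plausible fairness-regularized split criterion.
# It rewards splits that are informative about the label y and penalizes
# splits that are informative about a protected attribute s. The subtractive
# combination and all names here are assumptions, not the paper's definitive
# formulation.
import numpy as np

def entropy(values):
    """Shannon entropy (in bits) of a discrete array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def info_gain(target, mask):
    """Information gain about `target` from a boolean split `mask`."""
    n = len(target)
    left, right = target[mask], target[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split carries no information
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(target) - child

def fair_gain(y, s, mask):
    """Score a split: label gain minus protected-attribute gain, so splits
    that separate protected groups are discouraged during induction."""
    return info_gain(y, mask) - info_gain(s, mask)

# Toy usage: pick the threshold on a feature x with the best fair gain.
y = np.array([0, 0, 1, 1, 1, 0])   # labels
s = np.array([0, 1, 0, 1, 0, 1])   # protected attribute
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
best = max(((t, fair_gain(y, s, x > t)) for t in x[:-1]), key=lambda kv: kv[1])
print("best threshold, fair gain:", best)
```

Under this reading, a split that perfectly separates the classes but also perfectly separates the protected groups scores zero, while a split equally informative about the label and uninformative about the protected attribute keeps its full gain.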


Published in

          AIES '18: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society
          December 2018
          406 pages
ISBN: 9781450360128
DOI: 10.1145/3278721

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • research-article

          Acceptance Rates

AIES '18 Paper Acceptance Rate: 61 of 162 submissions, 38%
