ABSTRACT
The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community and in society more broadly. Surprisingly, there is no prior work developing tree-induction algorithms for building fair decision trees or fair random forests. These methods are widely popular because they are among the few that are simultaneously interpretable, non-linear, and easy to use. In this paper we develop, to our knowledge, the first technique for the induction of fair decision trees. We show that our "Fair Forest" retains the benefits of the tree-based approach while providing both greater accuracy and greater fairness than the alternatives, for both "group fairness" and "individual fairness." We also introduce new measures of fairness that can handle multinomial and continuous attributes as well as regression problems, as opposed to binary attributes and labels only. Finally, we demonstrate a new, more robust evaluation procedure for algorithms that considers the dataset in its entirety rather than only a specific protected attribute.
Index Terms
- Fair Forests: Regularized Tree Induction to Minimize Model Bias