DOI: 10.1145/3097983.3098167
research-article

Achieving Non-Discrimination in Data Release

Published: 13 August 2017

ABSTRACT

Discrimination discovery and prevention/removal are increasingly important tasks in data mining. Discrimination discovery aims to unveil discriminatory practices on the protected attribute (e.g., gender) by analyzing the dataset of historical decision records, and discrimination prevention aims to remove discrimination by modifying the biased data before conducting predictive analysis. In this paper, we show that the key to discrimination discovery and prevention is to find the meaningful partitions that can be used to provide quantitative evidence for the judgment of discrimination. With the support of the causal graph, we present a graphical condition for identifying a meaningful partition. Based on that, we develop a simple criterion for the claim of non-discrimination, and propose discrimination removal algorithms that accurately remove discrimination while retaining good data utility. Experiments using real datasets show the effectiveness of our approaches.
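
To make the idea of partition-level quantitative evidence concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a binary protected attribute and a binary decision, takes the partitioning attributes as given rather than deriving them from a causal graph, and uses an arbitrary threshold `tau`. The function names, the attribute names (`gender`, `major`, `admit`), and the toy records are all hypothetical.

```python
from collections import defaultdict


def rate_gaps(records, protected, decision, partition_attrs):
    """Return, for each stratum of the partition, the difference in
    positive-decision rate between protected group 1 and group 0.

    Assumption: `protected` and `decision` are coded as 0/1.
    """
    # counts[stratum][group] = [number of positive decisions, number of records]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for r in records:
        stratum = tuple(r[a] for a in partition_attrs)
        cell = counts[stratum][r[protected]]
        cell[0] += r[decision]
        cell[1] += 1

    gaps = {}
    for stratum, groups in counts.items():
        (pos0, n0), (pos1, n1) = groups[0], groups[1]
        if n0 == 0 or n1 == 0:
            continue  # one group is absent, so no comparison is possible
        gaps[stratum] = pos1 / n1 - pos0 / n0
    return gaps


def within_threshold(gaps, tau):
    """True if every stratum's absolute gap is below tau (tau is an assumed cutoff)."""
    return all(abs(g) < tau for g in gaps.values())


def make(gender, major, admitted, total):
    """Build `total` toy records, the first `admitted` of which have admit=1."""
    return [
        {"gender": gender, "major": major, "admit": 1 if i < admitted else 0}
        for i in range(total)
    ]


# Hypothetical toy data: equal rates in major A, unequal rates in major B.
records = (
    make(0, "A", 2, 4) + make(1, "A", 2, 4)
    + make(0, "B", 1, 2) + make(1, "B", 1, 4)
)

gaps = rate_gaps(records, "gender", "admit", ["major"])
for stratum, gap in sorted(gaps.items()):
    print(stratum, gap)
print(within_threshold(gaps, tau=0.05))
```

In this sketch the choice of `partition_attrs` is left to the caller. According to the abstract, the paper's contribution is a graphical condition on the causal graph for deciding which partitions are meaningful, and that condition is not reproduced here.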


          Published in

          KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
          August 2017
          2240 pages
          ISBN: 9781450348874
          DOI: 10.1145/3097983

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
          Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
