Published in: Knowledge and Information Systems 3/2013

1 June 2013 | Regular Paper

Quantifying explainable discrimination and removing illegal discrimination in automated decision making

Authors: Faisal Kamiran, Indrė Žliobaitė, Toon Calders

Abstract

Recently, the following discrimination-aware classification problem was introduced. Historical data used for supervised learning may contain discrimination, for instance, with respect to gender. The question addressed by discrimination-aware techniques is: given a sensitive attribute, how can discrimination-free classifiers be trained on historical data that are discriminatory with respect to that attribute? Existing techniques that deal with this problem aim at removing all discrimination and do not take into account that part of the discrimination may be explainable by other attributes. For example, in a job application, the education level of a candidate could be such an explainable attribute. If the data contain many highly educated male candidates and only a few highly educated female candidates, a difference in acceptance rates between women and men does not necessarily reflect gender discrimination, as it could be explained by the different levels of education. Even though selecting on education level would result in more males being accepted, a difference with respect to such a criterion would not be considered undesirable or illegal. Current state-of-the-art techniques, however, do not take such gender-neutral explanations into account and tend to overreact and actually start reverse discriminating, as we will show in this paper. Therefore, we introduce and analyze the refined notion of conditional non-discrimination in classifier design. We show that some of the differences in decisions across the sensitive groups can be explainable and are hence tolerable. We therefore develop a methodology for quantifying the explainable discrimination and algorithmic techniques for removing the illegal discrimination when one or more attributes are considered explanatory. Experimental evaluation on synthetic and real-world classification datasets demonstrates that the new techniques are superior to the old ones in this new context: they succeed in removing almost exclusively the undesirable discrimination while leaving the explainable differences unchanged.
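
The education example in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the authors' implementation: the toy counts, the helper names (`make_records`, `acceptance_rate`, `education_share`, `total_gap`, `explainable_gap`), and the choice to use the mean of the two groups' per-level acceptance rates as the "gender-neutral" rate are all assumptions made for this example.

```python
def make_records(gender, education, n, n_accepted):
    """Build n toy applicant records, of which the first n_accepted are accepted."""
    return [{"gender": gender, "education": education, "accepted": i < n_accepted}
            for i in range(n)]


def acceptance_rate(records, gender, education=None):
    """P(+ | gender), or P(+ | gender, education) when a level is given.

    Assumes the selected subset is non-empty (true for the toy data below).
    """
    subset = [r for r in records
              if r["gender"] == gender
              and (education is None or r["education"] == education)]
    return sum(r["accepted"] for r in subset) / len(subset)


def education_share(records, gender, education):
    """P(education | gender): share of one gender that has this education level."""
    group = [r for r in records if r["gender"] == gender]
    return sum(r["education"] == education for r in group) / len(group)


def total_gap(records):
    """Overall difference in acceptance rates: P(+ | m) - P(+ | f)."""
    return acceptance_rate(records, "m") - acceptance_rate(records, "f")


def explainable_gap(records):
    """Part of the gap attributable to different education distributions.

    Assumption for this sketch: the neutral acceptance rate at each level
    is the mean of the male and female rates at that level.
    """
    gap = 0.0
    for level in sorted({r["education"] for r in records}):
        neutral = (acceptance_rate(records, "m", level)
                   + acceptance_rate(records, "f", level)) / 2
        gap += (education_share(records, "m", level)
                - education_share(records, "f", level)) * neutral
    return gap


# Hypothetical data: within each education level both genders are accepted
# at the same rate, but men are more often highly educated.
records = (make_records("m", "high", 80, 40) + make_records("m", "low", 20, 2)
           + make_records("f", "high", 20, 10) + make_records("f", "low", 80, 8))

total = total_gap(records)
explained = explainable_gap(records)
print("total gap:", total)
print("explainable gap:", explained)
print("unexplained gap:", total - explained)
```

In this toy data the acceptance rates within each education level are identical for both genders, so the whole gap is attributable to education and the unexplained remainder is zero up to floating-point rounding. A technique that forced the overall acceptance rates to be equal would, under these assumptions, have to favor one gender within an education level.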

Footnotes
1
This model does not express our belief about how admission procedures happen. We use it for the purpose of illustration only.
2
Short notation for probabilities: \(P(+|e_i)\) means \(P(y=+|e=e_i)\).
Metadata
Title
Quantifying explainable discrimination and removing illegal discrimination in automated decision making
Authors
Faisal Kamiran
Indrė Žliobaitė
Toon Calders
Publication date
1 June 2013
Publisher
Springer-Verlag
Published in
Knowledge and Information Systems / Issue 3/2013
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-012-0584-8
