Published in: Empirical Software Engineering 1/2024

01.02.2024

Detection and evaluation of bias-inducing features in machine learning

Authors: Moses Openja, Gabriel Laberge, Foutse Khomh



Abstract

Cause-to-effect analysis can help us decompose all the likely causes of a problem, such as an undesirable business situation or unintended harm to individuals. It lets us trace how problems arise, rank the causes to help prioritize fixes, simplify a complex problem, and visualize it. In the context of machine learning (ML), cause-to-effect analysis can be used to understand the reasons for a system's biased behavior. For example, we can examine the root causes of bias by checking whether each feature is a potential source of bias in the model. To do this, one can apply small changes to a given feature, or a pair of features, in the data, following some guidelines, and observe how the changes affect the decisions made by the model (i.e., the model's predictions). Cause-to-effect analysis can therefore identify potential bias-inducing features even when those features are not known in advance. This is important because most current methods require sensitive features to be pre-identified for bias assessment and can miss other relevant bias-inducing features, which is why systematic identification of such features is necessary. Moreover, achieving an equitable outcome often requires taking sensitive features into account in the model's decision. It should therefore be up to domain experts to decide, based on their knowledge of the decision context, whether bias induced by specific features is acceptable. In this study, we propose an approach for systematically identifying all bias-inducing features of a model to support the decision-making of domain experts. Our technique is based on swapping the values of features and computing the divergence in the distribution of the model's predictions using different distance functions.
We evaluated our technique on four well-known datasets to showcase how our contribution can support the standard procedure for developing, testing, maintaining, and deploying fair/equitable machine learning systems.
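The swap-and-observe idea from the abstract can be sketched in a few lines: permute one feature column at a time and measure how much the model's predictions move. The synthetic dataset, the logistic-regression model, and the mean-absolute-shift measure below are illustrative assumptions only; the paper itself compares prediction distributions with several distance functions.

```python
# Minimal sketch, assuming a mean-absolute-shift stand-in for the paper's
# distribution-divergence measures.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)

model = LogisticRegression().fit(X, y)
baseline = model.predict_proba(X)[:, 1]

def swap_score(model, X, baseline, feature, rng):
    """Permute one feature column and return the mean absolute shift in
    the predicted probabilities (a simple proxy for the divergence between
    the original and perturbed prediction distributions)."""
    X_swapped = X.copy()
    X_swapped[:, feature] = rng.permutation(X_swapped[:, feature])
    swapped = model.predict_proba(X_swapped)[:, 1]
    return float(np.mean(np.abs(baseline - swapped)))

scores = {f: swap_score(model, X, baseline, f, rng) for f in range(X.shape[1])}
# The label-driving feature should score far higher than the noise feature,
# flagging it as a candidate bias-inducing feature for expert review.
```

In a real audit, the scores would be computed on held-out data and ranked, leaving it to domain experts to judge whether the flagged features induce acceptable or unacceptable bias.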


Footnotes
1
Plaintiff’s expert report of Peter S. Arcidiacono, Professor of Economics at Duke University.
 
Metadata
Title
Detection and evaluation of bias-inducing features in machine learning
Authors
Moses Openja
Gabriel Laberge
Foutse Khomh
Publication date
01.02.2024
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10409-5
