Data Mining and Knowledge Discovery

17-09-2022

Fairness in vulnerable attribute prediction on social media

Authors: Mariano G. Beiró, Kyriaki Kalimeri

Published in: Data Mining and Knowledge Discovery | Issue 6/2022

Abstract

Historically, policymakers and practitioners relied exclusively on survey and census data to design and plan assistive interventions; now, social media offer a timely and cost-effective way to reach populations that would otherwise go unobserved. This study was designed to address the needs of a not-for-profit organisation seeking to reach young unemployed individuals in Italy with educational and job opportunities via communication channels that are more likely to appeal to younger generations. To this end, we developed an ad hoc Facebook application that administers questionnaires while gathering data about Likes on Facebook Pages. Then, we developed a machine learning framework that successfully predicts the unemployment status of an unseen individual (.74 AUC). However, blindly delegating the communication intervention to the machine learning model may lead to digital discrimination on the basis of socio-demographic characteristics. Here, we propose a framework that aims to optimise both the prediction performance and the most adequate fairness metric. Our framework is based on an adaptive threshold for gender, and we show that it can be extended to other socio-demographic attributes and generalised to other interventions of an assistive character. We present a doubly cross-validated setting that achieves out-of-sample stability and generalisability of results. We compare the behaviour of models that infer on different sets of data and provide an in-depth discussion of the most predictive features, demonstrating that the “fairness through unawareness” approach does not suffice to achieve a fair classification, since sensitive demographic information can be inferred not only via other socio-demographic attributes but also from behavioural digital patterns. Finally, we thoroughly assess the behaviour of the adaptive threshold approach and provide an in-depth discussion of the advantages and implications of such models, offering actionable insights. Our results show that fairness metrics should be carefully assessed, particularly when AI models are employed for policymaking.
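To make the idea of a group-specific ("adaptive") decision threshold concrete, the following is a minimal sketch and not the authors' implementation. The fairness criterion used here (an equal true positive rate across groups), the function names, and the toy scores are all assumptions made for illustration; the abstract does not state which metric the threshold is tuned for.

```python
from typing import Dict, Sequence


def true_positive_rate(scores: Sequence[float], labels: Sequence[int],
                       threshold: float) -> float:
    """Share of true positives (label 1) whose score reaches the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(s >= threshold for s in positives) / len(positives)


def pick_group_thresholds(scores: Sequence[float], labels: Sequence[int],
                          groups: Sequence[str],
                          target_tpr: float) -> Dict[str, float]:
    """Hypothetical helper: for each group, choose the highest observed score
    that, used as a cut-off, reaches the target true positive rate."""
    thresholds: Dict[str, float] = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, grp in zip(scores, groups) if grp == g]
        g_labels = [y for y, grp in zip(labels, groups) if grp == g]
        chosen = min(g_scores)  # fallback: accept everyone in the group
        for t in sorted(set(g_scores), reverse=True):
            if true_positive_rate(g_scores, g_labels, t) >= target_tpr:
                chosen = t
                break
        thresholds[g] = chosen
    return thresholds


# Toy data (invented for illustration): model scores, true labels, gender.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]


def group_tpr(group: str, threshold: float) -> float:
    """True positive rate restricted to one group of the toy data."""
    g_scores = [s for s, g in zip(scores, groups) if g == group]
    g_labels = [y for y, g in zip(labels, groups) if g == group]
    return true_positive_rate(g_scores, g_labels, threshold)


# One shared cut-off versus one cut-off per group.
single = {g: group_tpr(g, 0.6) for g in ("F", "M")}
thresholds = pick_group_thresholds(scores, labels, groups, target_tpr=1.0)
adaptive = {g: group_tpr(g, thresholds[g]) for g in ("F", "M")}
print(single, thresholds, adaptive)
```

On this toy data a shared cut-off of 0.6 reaches every positive case in group F but only half of those in group M, whereas the per-group cut-offs bring both groups to the same true positive rate. In practice the thresholds would be selected on held-out folds rather than on the data used for evaluation.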

https://ec.europa.eu/social/main.jsp?catId=1036

We conventionally refer to the AUROC values as “accuracy” throughout this paper.

The gender attribute is considered to be a binary variable since very few participants opted for the “Other” option.

A comparison between the geographical distribution of our sample per region and the expected values from the official Census is shown in the Supplementary Materials.

This choice is based on the fact that neither group actively searches for a job.

Link to the list of categories: https://developers.facebook.com/docs/commerce-platform/catalog/categories/google-product-category-to-facebook-product-category

The full ranges for each hyperparameter are reported in the Supplementary Materials.

All experiments are performed in Python (Van Rossum and Drake 2009) with scikit-learn (Pedregosa et al. 2011).

The baseline AUC for our tasks is .50.
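As a brief illustration of why .50 is the natural baseline, the sketch below computes the AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as one half. The function name and the toy data are assumptions for illustration; the paper's experiments use scikit-learn, not this helper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (invented for illustration).
toy_labels = [1, 0, 1, 0, 1, 0]

# An uninformative model that gives every individual the same score.
constant_auc = auc([0.3] * 6, toy_labels)

# A model that ranks every positive above every negative.
perfect_auc = auc([0.9, 0.1, 0.8, 0.2, 0.7, 0.3], toy_labels)

print(constant_auc, perfect_auc)
```

A scorer that carries no information about the label cannot order positive and negative cases, so it sits at .50 by construction; the reported .74 should be read against that reference point.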

Agarwal A, Beygelzimer A, Dudík M, Langford J, Wallach H (2018) A reductions approach to fair classification. In: International Conference on Machine Learning, pp 60–69. PMLR

Aiken E, Bellue S, Karlan D, Udry C, Blumenstock JE (2022) Machine learning and phone data can improve targeting of humanitarian aid. Nature 1–7

Akintande OJ (2021) Algorithm fairness through data inclusion, participation, and reciprocity. In: International Conference on Database Systems for Advanced Applications, Springer, pp 633–637

Baeza-Yates R, Ribeiro-Neto B et al (1999) Modern Information Retrieval, vol 463. ACM Press, New York

Barocas S, Selbst AD (2016) Big data’s disparate impact. Calif L Rev 104:671

Becker GS (2010) The Economics of Discrimination. University of Chicago Press, Chicago

Bento M, Martinez LM, Martinez LF (2018) Brand engagement and search for brands on social media: Comparing generations x and y in portugal. J of Retailing and Consum Serv 43:234–241

Beutel A, Chen J, Doshi T, Qian H, Woodruff A, Luu C, Kreitmann P, Bischof J, Chi EH (2019) Putting fairness principles into practice: Challenges, metrics, and improvements. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp 453–459

Bi B, Shokouhi M, Kosinski M, Graepel T (2013) Inferring the demographics of search users: Social data meets search queries. In: Proceedings of the 22nd International Conference on World Wide Web. WWW ’13, ACM, New York, NY, USA, pp 131–140. https://doi.org/10.1145/2488388.2488401

Bokányi E, Lábszki Z, Vattay G (2017) Prediction of employment and unemployment rates from twitter daily rhythms in the us. EPJ Data Sci 6(1):14

Bonanomi A, Rosina A, Cattuto C, Kalimeri K (2017) Understanding youth unemployment in italy via social media data. In: 28th IUSSP International Population Conference, Cape Town, South Africa

Calders T, Verwer S (2010) Three naive bayes approaches for discrimination-free classification. Data mining and knowl discov 21(2):277–292

Chhabra A, Masalkovaitė K, Mohapatra P (2021) An overview of fairness in clustering. IEEE Access

Chouldechova A (2017) Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data 5(2):153–163

Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A (2017) Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17, Association for Computing Machinery, New York, NY, USA pp 797–806. https://doi.org/10.1145/3097983.3098095

Desiere S, Langenbucher K, et al. (2018) Profiling tools for early identification of jobseekers who need extra support. OECD Policy Brief on Activation Policies (dec) 1–4

Desiere S, Struyven L (2020) Using artificial intelligence to classify jobseekers: The accuracy-equity trade-off. Journal Of Social Policy

Dong Y, Yang Y, Tang J, Yang Y, Chawla NV (2014) Inferring user demographics and social strategies in mobile social networks. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, USA, pp 15–24. https://doi.org/10.1145/2623330.2623703

Dutta S, Wei D, Yueksel H, Chen P-Y, Liu S, Varshney K (2020) Is there a trade-off between fairness and accuracy? a perspective using mismatched hypothesis testing. In: International Conference on Machine Learning, pp 2803–2813. PMLR

Eslami M, Krishna Kumaran SR, Sandvig C, Karahalios K (2018) Communicating algorithmic process in online behavioral advertising. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp 1–13

Fatehkia M, Kashyap R, Weber I (2018) Using facebook ad data to track the global digital gender gap. World Dev 107:189–209

Fatehkia M, Coles B, Ofli F, Weber I (2020) The relative value of facebook advertising data for poverty mapping. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, pp 934–938

Felbo B, Sundsøy P, Lehmann S, de Montjoye Y-A et al. (2017) Modeling the temporal nature of human behavior for demographics prediction. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 140–152

Gao J, Zhang Y-C, Zhou T (2019) Computational socioeconomics. Physics Reports

Goel S, Hofman J, Sirer MI (2012) Who does what on the web: Studying web browsing behavior at scale. In: International Conference on Weblogs and Social Media, pp 130–137

Goyat S (2011) The basis of market segmentation: A critical review of literature. Eur J of Bus and Management 3(9):45–54

Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16, Red Hook, NY, USA, pp 3323–3331

ISTAT (2020) ISTAT Database. Data on unemployed rate. http://dati.istat.it

Kalimeri K, Beiró MG, Delfino M, Raleigh R, Cattuto C (2019) Predicting demographics, moral foundations, and human values from digital behaviours. Comput in Human Behav 92:428–445

Kalimeri K, Beiró MG, Bonanomi A, Rosina A, Cattuto C (2020) Traditional versus facebook-based surveys: Evaluation of biases in self-reported demographic and psychometric information. Demogr Res 42(5):133–148

Kamiran F, Calders T (2012) Data preprocessing techniques for classification without discrimination. Knowl and Inf Syst 33(1):1–33

Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 35–50

Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) Lightgbm: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp 3146–3154

Kilbertus N, Rojas Carulla M, Parascandolo G, Hardt M, Janzing D, Schölkopf B (2017) Avoiding discrimination through causal reasoning. Advances in neural information processing systems 30

Kleinberg J, Mullainathan S, Raghavan M (2016) Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807

Kosinski M, Stillwell D, Graepel T (2013) Private traits and attributes are predictable from digital records of human behavior. Proc of the National Acad of Sci 110(15):5802–5805

Kuhn P (1987) Sex discrimination in labor markets: The role of statistical evidence. The American Economic Review 567–583

Leonelli S, Lovell R, Wheeler BW, Fleming L, Williams H (2021) From fair data to fair data use: Methodological data fairness in health-related social media research. Big Data & Soc 8(1):20539517211010310

Llorente A, Garcia-Herranz M, Cebrian M, Moro E (2015) Social media fingerprints of unemployment. PLOS ONE 10(5):1–13

Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S-I (2019) Explainable AI for Trees: From Local Explanations to Global Understanding

Lundberg SM, Lee S-I (2017a) A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems 30, pp 4765–4774

Lundberg S, Lee S-I (2017b) A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874

Malmi E, Weber I (2016) You are what apps you use: Demographic prediction based on user’s apps. ICWSM, 635–638

Mason SJ, Graham NE (2002) Areas beneath the relative operating characteristics (roc) and relative operating levels (rol) curves: Statistical significance and interpretation. Quarterly J of the Royal Meteorol Soc 128(584):2145–2166

Matz SC, Menges JI, Stillwell DJ, Schwartz HA (2019) Predicting individual-level income from facebook profiles. PloS one 14(3):0214369

Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V, Nejdl W, Vidal M-E, Ruggieri S, Turini F, Papadopoulos S, Krasanakis E et al (2020) Bias in data-driven artificial intelligence systems-an introductory survey. Wiley Int Rev: Data Mining and Knowl Discov 10(3):1356

Olteanu A, Castillo C, Diaz F, Kıcıman E (2019) Social data: Biases, methodological pitfalls, and ethical boundaries. Frontiers in Big Data 2:13

Olteanu A, Castillo C, Diaz F, Kiciman E (2016) Social data: Biases, methodological pitfalls, and ethical boundaries. https://doi.org/10.2139/ssrn.2886526

O’Neil C (2016) Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, New York

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J of Mach Learning Res 12:2825–2830

Pedreshi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 560–568

Pessach D, Shmueli E (2022) A review on fairness in machine learning. ACM Comput Surveys (CSUR) 55(3):1–44

Rama D, Mejova Y, Tizzoni M, Kalimeri K, Weber I (2020) Facebook ads as a demographic tool to measure the urban-rural divide. In: Proceedings of The Web Conference 2020, pp 327–338

Saleiro P, Kuester B, Stevens A, Anisfeld A, Hinkson L, London J, Ghani R (2018) Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577

Seneviratne S, Seneviratne A, Mohapatra P, Mahanti A (2015) Your installed apps reveal your gender and more! ACM SIGMOBILE Mobile Comput and Commun Rev 18(3):55–61

Stoll MA, Raphael S, Holzer HJ (2004) Black job applicants and the hiring officer’s race. ILR Rev 57(2):267–287

Sundsøy P, Bjelland J, Reme B-A, Jahani E, Wetter E, Bengtsson L (2016) Estimating individual employment status using mobile phone network data. arXiv preprint arXiv:1612.03870

Toole JL, Lin Y-R, Muehlegger E, Shoag D, González MC, Lazer D (2015) Tracking employment shocks using mobile phone data. J of The Royal Soc Int 12(107):20150185

Urbinati A, Kalimeri K, Bonanomi A, Rosina A, Cattuto C, Paolotti D (2020) Young adult unemployment through the lens of social media: Italy as a case study. In: International Conference on Social Informatics, Springer, Cham, pp 380–396

van Landeghem B, Desiere S, Struyven L (2021) Statistical profiling of unemployed jobseekers. IZA World of Labor, Germany

Van Rossum G, Drake FL (2009) Python 3 Reference Manual. CreateSpace, Scotts Valley, CA

Verma S, Rubin J (2018) Fairness definitions explained. In: 2018 IEEE/ACM International Workshop on Software Fairness (fairware), pp 1–7. IEEE

Wood R, Murch B, Betteridge R (2019) A comparison of population segmentation methods. Oper Res for Health Care 22:100192

Yeung K, Lodge M (2019) The Possibilities of Digital Discrimination: Research on E-commerce, Algorithms and Big Data. Oxford University Press, UK

Ying JJ-C, Chang Y-J, Huang C-M, Tseng VS (2012) Demographic prediction based on users mobile behaviors. Mobile Data Challenge

Zafar MB, Valera I, Gomez Rodriguez M, Gummadi KP (2017) Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web, pp 1171–1180

Zemel R, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. In: International Conference on Machine Learning, pp 325–333. PMLR

Zhang BH, Lemoine B, Mitchell M (2018) Mitigating unwanted biases with adversarial learning. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp 335–340

Zhong Y, Yuan NJ, Zhong W, Zhang F, Xie X (2015) You are where you go: Inferring demographic attributes from location check-ins. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. WSDM ’15, ACM, New York, NY, USA, pp 295–304

Title: Fairness in vulnerable attribute prediction on social media
Authors: Mariano G. Beiró, Kyriaki Kalimeri
Publication date: 17-09-2022
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 6/2022
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-022-00855-y
