ABSTRACT
There has been rapidly growing interest in the use of algorithms in hiring, especially as a means to address or mitigate bias. Yet, to date, little is known about how these methods are used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and analyze the claims and practices of companies offering algorithms for employment assessment. In particular, we identify vendors of algorithmic pre-employment assessments (i.e., algorithms to screen candidates), document what they have disclosed about their development and validation procedures, and evaluate their practices, focusing particularly on efforts to detect and mitigate bias. Our analysis considers both technical and legal perspectives. Technically, we consider the various choices vendors make regarding data collection and prediction targets, and explore the risks and trade-offs that these choices pose. We also discuss how algorithmic de-biasing techniques interface with, and create challenges for, antidiscrimination law.
Index Terms
- Mitigating bias in algorithmic hiring: evaluating claims and practices