research-article

Open Access

Bias in Automated Speaker Recognition

Authors:
Wiebke Toussaint Hutiri

Technology, Policy & Management / Engineering Systems & Services / Cyber Physical Intelligence Lab, Delft University of Technology, Netherlands

Aaron Yi Ding

Technology, Policy & Management / Engineering Systems & Services / Cyber Physical Intelligence Lab, Delft University of Technology, Netherlands


FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, June 2022, Pages 230–247. https://doi.org/10.1145/3531146.3533089

Published: 20 June 2022


ABSTRACT

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition is deployed on billions of smart devices and in services such as call centres. Despite its wide-scale deployment and known sources of bias in related domains like face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including data generation, model building, and implementation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.


Index Terms

Bias in Automated Speaker Recognition
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
2. Security and privacy

Index terms have been assigned to the content through auto-classification.


Published in

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN:9781450393522
DOI:10.1145/3531146

Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 June 2022
Author Tags
audit
bias
evaluation
fairness
speaker recognition
speaker verification
Qualifiers
- research-article
- Research
- Refereed limited

Article Metrics
- Total Citations: 14
- Total Downloads: 1,700
- Downloads (Last 12 months): 774
- Downloads (Last 6 weeks): 106