DOI: 10.1145/3531146.3533089
research-article
Open Access

Bias in Automated Speaker Recognition

Published: 20 June 2022

ABSTRACT

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition is deployed on billions of smart devices and in services such as call centres. Despite its wide-scale deployment and known sources of bias in related domains like face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including data generation, model building, and implementation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.

Published in

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

Copyright © 2022 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited
