
2020 | OriginalPaper | Book chapter

Exploring Algorithmic Fairness in Deep Speaker Verification

Authors: Gianni Fenu, Hicham Lafhouli, Mirko Marras

Published in: Computational Science and Its Applications – ICCSA 2020

Publisher: Springer International Publishing


Abstract

To allow individuals to complete voice-based tasks (e.g., sending messages or making payments), modern automated systems must match the speaker’s voice to a unique digital identity representation for verification. Despite the increasing accuracy achieved so far, it remains under-explored how the decisions made by such systems may be influenced by the inherent characteristics of the individual under consideration. In this paper, we investigate how state-of-the-art speaker verification models are susceptible to unfairness towards legally protected classes of individuals, characterized by a common sensitive attribute (i.e., gender, age, language). To this end, we first arranged a voice dataset with the aim of including and identifying various demographic classes. Then, we conducted a performance analysis at different levels, from equal error rates to verification score distributions. Experiments show that individuals belonging to certain demographic groups systematically experience higher error rates, highlighting the need for fairer speaker recognition models and, by extension, for proper evaluation frameworks.
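As an illustration of the group-level analysis the abstract describes, the following is a minimal sketch of how an equal error rate (EER) could be computed separately for each demographic group. It is not the authors' implementation: the function names `equal_error_rate` and `eer_by_group`, the `(group, score, label)` trial format, and the toy scores are all assumptions made for this example, and the EER is approximated by sweeping the observed scores as thresholds.

```python
from collections import defaultdict


def equal_error_rate(scores, labels):
    """Approximate the EER by using every observed score as a threshold.

    labels: 1 for a genuine (same-speaker) trial, 0 for an impostor trial.
    A trial is accepted when its score is >= the threshold.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor trial")

    best_gap, eer = None, None
    for thr in sorted(set(scores)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def eer_by_group(trials):
    """trials: iterable of (group, score, label) tuples (hypothetical format)."""
    grouped = defaultdict(lambda: ([], []))
    for group, score, label in trials:
        grouped[group][0].append(score)
        grouped[group][1].append(label)
    return {g: equal_error_rate(s, y) for g, (s, y) in grouped.items()}


# Toy, made-up trials: group "A" is perfectly separable, group "B" is not.
toy_trials = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.1, 0), ("A", 0.2, 0),
    ("B", 0.9, 1), ("B", 0.3, 1), ("B", 0.1, 0), ("B", 0.6, 0),
]
result = eer_by_group(toy_trials)
```

A gap between the per-group values returned by `eer_by_group` is the kind of disparity the abstract refers to; with real data, each group would need enough genuine and impostor trials for the estimate to be meaningful.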


Footnotes
1
In this paper, we will use the terms “individuals” and “users” interchangeably.
 
2
Code, data, and models are available at https://mirkomarras.github.io/fair-voice/.
 
4
While gender is by no means a binary construct, to the best of our knowledge no dataset for speaker recognition with non-binary genders exists. What we consider is therefore a binary feature, as offered by the currently publicly available datasets.
 
6
In the context of our work, where we are more interested in understanding algorithm characteristics beyond overall accuracy, the small further accuracy improvements that can probably be achieved through intensive hyper-parameter tuning would not substantially affect the main outcomes of our analyses.
 
7
Please note that the figures in this manuscript are best seen in color.
 
Metadata
Title
Exploring Algorithmic Fairness in Deep Speaker Verification
Authors
Gianni Fenu
Hicham Lafhouli
Mirko Marras
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-58811-3_6
