Published in: International Journal of Speech Technology 4/2018

28-08-2018

Three-stage speaker verification architecture in emotional talking environments

Authors: Ismail Shahin, Ali Bou Nassif

Abstract

Speaker verification performance is usually high in a neutral talking environment but degrades sharply in emotional talking environments. This degradation stems from the mismatch between training in a neutral environment and testing in emotional environments. In this work, a three-stage speaker verification architecture is proposed to enhance speaker verification performance in emotional environments. The architecture comprises three cascaded stages: a gender identification stage, followed by an emotion identification stage, followed by a speaker verification stage. The proposed framework has been evaluated on two distinct and independent emotional speech datasets: an in-house dataset and the "Emotional Prosody Speech and Transcripts" dataset. The results show that speaker verification based on both gender and emotion information is superior to verification based on gender information only, on emotion information only, or on neither. The average speaker verification performance attained with the proposed framework is very similar to that attained in a subjective assessment by human listeners.
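The cascade described in the abstract can be sketched as follows. This is a minimal structural illustration, not the authors' implementation: the stage models here are hypothetical stand-ins (simple callables keyed by gender and emotion), whereas the paper's system uses trained acoustic models for each stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = List[float]  # placeholder for an utterance's acoustic features

@dataclass
class ThreeStageVerifier:
    """Sketch of the cascade: gender ID -> emotion ID -> speaker verification."""
    identify_gender: Callable[[Features], str]            # stage 1
    emotion_models: Dict[str, Callable[[Features], str]]  # stage 2, one per gender
    # stage 3: verification scorers keyed by (gender, emotion, claimed speaker)
    speaker_models: Dict[Tuple[str, str, str], Callable[[Features], float]] = field(default_factory=dict)
    threshold: float = 0.5

    def verify(self, features: Features, claimed_id: str) -> bool:
        gender = self.identify_gender(features)          # stage 1 narrows the search
        emotion = self.emotion_models[gender](features)  # stage 2 narrows it further
        # stage 3 scores the claim against the matching gender/emotion model
        score = self.speaker_models[(gender, emotion, claimed_id)](features)
        return score >= self.threshold
```

Restricting each later stage to models matching the earlier stages' decisions is what lets the cascade exploit gender and emotion information jointly, which the abstract reports as superior to using either cue alone.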

Metadata
Title
Three-stage speaker verification architecture in emotional talking environments
Authors
Ismail Shahin
Ali Bou Nassif
Publication date
28-08-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9543-4
