
01.09.2013

Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments

Author: Ismail Mohd Adnan Shahin

Published in: International Journal of Speech Technology | Issue 3/2013

Abstract

Speaker recognition performance in emotional talking environments is not as high as in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance speaker identification performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as classifiers in this work. The approach has been tested on our collected emotional speech database, which comprises six emotions. The results show that speaker identification performance based on both gender and emotion cues is higher than that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22 %, 4.45 %, and 19.56 %, respectively. This work also shows that the optimum speaker identification performance is attained when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in emotional talking environments. The average speaker identification performance achieved with the proposed approach falls within 2.35 % of that obtained in a subjective evaluation by human judges.
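
The abstract describes a staged use of cues (infer gender and emotion first, then identify the speaker within the matching gender-emotion group) and a score fusion that can be fully biased towards the suprasegmental models. The following Python sketch illustrates that pipeline under stated assumptions: the paper publishes no code, so the model interface (a callable returning a log-likelihood), the function names, and the linear weighting with a bias factor alpha are illustrative choices, not the author's actual implementation.

    from typing import Callable, Dict, Tuple

    # Illustrative interface: a model is any callable mapping an observation
    # sequence to a log-likelihood, log P(obs | model). This is a hypothetical
    # stand-in for trained HMM / SPHMM scorers, not the paper's code.
    LogLikelihood = Callable[[object], float]

    def combined_score(obs, hmm_ll: LogLikelihood, sphmm_ll: LogLikelihood,
                       alpha: float = 1.0) -> float:
        # Linear fusion of acoustic (HMM) and suprasegmental (SPHMM) scores.
        # alpha = 1.0 reproduces the reported optimum: classifiers completely
        # biased towards suprasegmental models, no acoustic contribution.
        return (1.0 - alpha) * hmm_ll(obs) + alpha * sphmm_ll(obs)

    def identify_speaker(obs,
                         gender_models: Dict[str, LogLikelihood],
                         emotion_models: Dict[str, Dict[str, LogLikelihood]],
                         speaker_models: Dict[Tuple[str, str],
                                              Dict[str, Tuple[LogLikelihood,
                                                              LogLikelihood]]],
                         alpha: float = 1.0) -> str:
        # Stage 1: infer the gender cue.
        gender = max(gender_models, key=lambda g: gender_models[g](obs))
        # Stage 2: infer the emotion cue, conditioned on the inferred gender.
        emotion = max(emotion_models[gender],
                      key=lambda e: emotion_models[gender][e](obs))
        # Stage 3: identify the speaker among the models of the matching
        # gender-emotion group, fusing HMM and SPHMM log-likelihoods.
        group = speaker_models[(gender, emotion)]
        return max(group,
                   key=lambda s: combined_score(obs, *group[s], alpha=alpha))

With alpha set to 1.0 the acoustic term vanishes, which mirrors the abstract's finding that the best performance occurs when only the suprasegmental models contribute.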

Metadata
Title
Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments
Author
Ismail Mohd Adnan Shahin
Publication date
01.09.2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9188-2
