Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments

Author: Ismail Mohd Adnan Shahin

Published in: International Journal of Speech Technology | Issue 3/2013

Abstract

Speaker recognition performance in emotional talking environments is not as high as it is in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance speaker identification performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) have been used as classifiers in this work. The approach has been tested on our collected emotional speech database, which is composed of six emotions. The results show that speaker identification performance based on using both gender and emotion cues is higher than that based on using gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22 %, 4.45 %, and 19.56 %, respectively. This work also shows that the optimum speaker identification performance occurs when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in emotional talking environments. The average speaker identification performance achieved with the proposed approach falls within 2.35 % of that obtained in a subjective evaluation by human judges.
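The abstract outlines a cascaded decision rule: first infer the unknown speaker's gender, then the emotion given that gender, and finally the speaker identity among gender- and emotion-matched models, with each decision scored by a weighted mix of acoustic (HMM) and suprasegmental (SPHMM) log-likelihoods. The following is a minimal sketch of that pipeline, assuming a weighting factor alpha that mixes the two log-likelihoods (alpha = 1 corresponding to the fully suprasegmental setting reported as optimal); all model interfaces and names here are hypothetical illustrations, not the author's implementation.

from dataclasses import dataclass
from typing import Callable, Dict, Sequence

@dataclass
class Model:
    # Hypothetical wrappers: each callable returns the log-likelihood of a
    # feature sequence under the corresponding trained model.
    score_hmm: Callable[[Sequence[float]], float]    # acoustic HMM
    score_sphmm: Callable[[Sequence[float]], float]  # suprasegmental SPHMM

def combined_score(features, model: Model, alpha: float = 1.0) -> float:
    # alpha mixes acoustic and suprasegmental log-likelihoods; alpha = 1.0
    # biases the classifier entirely towards the suprasegmental models,
    # the setting the abstract reports as optimal in emotional environments.
    return (1.0 - alpha) * model.score_hmm(features) + alpha * model.score_sphmm(features)

def identify_speaker(features,
                     gender_models: Dict[str, Model],
                     emotion_models: Dict[str, Dict[str, Model]],
                     speaker_models: Dict[str, Dict[str, Dict[str, Model]]],
                     alpha: float = 1.0) -> str:
    # Stage 1: infer the unknown speaker's gender.
    gender = max(gender_models,
                 key=lambda g: combined_score(features, gender_models[g], alpha))
    # Stage 2: infer the emotion, restricted to that gender's emotion models.
    emotion = max(emotion_models[gender],
                  key=lambda e: combined_score(features, emotion_models[gender][e], alpha))
    # Stage 3: identify the speaker among gender- and emotion-matched models.
    candidates = speaker_models[gender][emotion]
    return max(candidates,
               key=lambda s: combined_score(features, candidates[s], alpha))

In this reading, the gender and emotion decisions prune the speaker search space to matched models, which is where the reported gains over gender-only and emotion-only cueing would come from.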


Metadata
Title: Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments
Author: Ismail Mohd Adnan Shahin
Publication date: 01-09-2013
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 3/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-013-9188-2
