Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments

Author: Ismail Mohd Adnan Shahin

Published in: International Journal of Speech Technology | Issue 3/2013

Abstract

Speaker recognition performance in emotional talking environments is not as high as it is in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance speaker identification performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) have been used as classifiers in this work. The approach has been tested on our collected emotional speech database, which is composed of six emotions. The results show that speaker identification performance based on using both gender and emotion cues is higher than that based on using gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22 %, 4.45 %, and 19.56 %, respectively. This work also shows that the optimum speaker identification performance occurs when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in emotional talking environments. The average speaker identification performance achieved with the proposed approach falls within 2.35 % of that obtained in a subjective evaluation by human judges.
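The abstract outlines a cascaded decision rule: first infer the unknown speaker's gender, then the emotion given that gender, and finally the speaker identity among gender- and emotion-matched models, with each decision scored by a weighted mix of acoustic (HMM) and suprasegmental (SPHMM) log-likelihoods. The following is a minimal sketch of that pipeline, assuming a weighting factor alpha that mixes the two log-likelihoods (alpha = 1 corresponding to the fully suprasegmental setting reported as optimal); all model interfaces and names here are hypothetical illustrations, not the author's implementation.

from dataclasses import dataclass
from typing import Callable, Dict, Sequence

@dataclass
class Model:
    # Hypothetical wrappers: each callable returns the log-likelihood of a
    # feature sequence under the corresponding trained model.
    score_hmm: Callable[[Sequence[float]], float]    # acoustic HMM
    score_sphmm: Callable[[Sequence[float]], float]  # suprasegmental SPHMM

def combined_score(features, model: Model, alpha: float = 1.0) -> float:
    # alpha mixes acoustic and suprasegmental log-likelihoods; alpha = 1.0
    # biases the classifier entirely towards the suprasegmental models,
    # the setting the abstract reports as optimal in emotional environments.
    return (1.0 - alpha) * model.score_hmm(features) + alpha * model.score_sphmm(features)

def identify_speaker(features,
                     gender_models: Dict[str, Model],
                     emotion_models: Dict[str, Dict[str, Model]],
                     speaker_models: Dict[str, Dict[str, Dict[str, Model]]],
                     alpha: float = 1.0) -> str:
    # Stage 1: infer the unknown speaker's gender.
    gender = max(gender_models,
                 key=lambda g: combined_score(features, gender_models[g], alpha))
    # Stage 2: infer the emotion, restricted to that gender's emotion models.
    emotion = max(emotion_models[gender],
                  key=lambda e: combined_score(features, emotion_models[gender][e], alpha))
    # Stage 3: identify the speaker among gender- and emotion-matched models.
    candidates = speaker_models[gender][emotion]
    return max(candidates,
               key=lambda s: combined_score(features, candidates[s], alpha))

In this reading, the gender and emotion decisions prune the speaker search space to matched models, which is where the reported gains over gender-only and emotion-only cueing would come from.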


Metadata
Title: Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments
Author: Ismail Mohd Adnan Shahin
Publication date: 01-09-2013
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 3/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-013-9188-2
