
01.09.2013

Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments

Author: Ismail Mohd Adnan Shahin

Published in: International Journal of Speech Technology | Issue 3/2013

Abstract

Speaker recognition performance in emotional talking environments is not as high as in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance speaker identification performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as classifiers in this work. The approach has been tested on our collected emotional speech database, which comprises six emotions. The results show that speaker identification performance based on both gender and emotion cues is higher than that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22 %, 4.45 %, and 19.56 %, respectively. This work also shows that the optimum speaker identification performance is attained when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in emotional talking environments. The average speaker identification performance achieved with the proposed approach falls within 2.35 % of that obtained in a subjective evaluation by human judges.
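
The abstract describes a staged use of cues (infer gender and emotion first, then identify the speaker within the matching gender-emotion group) and a score fusion that can be fully biased towards the suprasegmental models. The following Python sketch illustrates that pipeline under stated assumptions: the paper publishes no code, so the model interface (a callable returning a log-likelihood), the function names, and the linear weighting with a bias factor alpha are illustrative choices, not the author's actual implementation.

    from typing import Callable, Dict, Tuple

    # Illustrative interface: a model is any callable mapping an observation
    # sequence to a log-likelihood, log P(obs | model). This is a hypothetical
    # stand-in for trained HMM / SPHMM scorers, not the paper's code.
    LogLikelihood = Callable[[object], float]

    def combined_score(obs, hmm_ll: LogLikelihood, sphmm_ll: LogLikelihood,
                       alpha: float = 1.0) -> float:
        # Linear fusion of acoustic (HMM) and suprasegmental (SPHMM) scores.
        # alpha = 1.0 reproduces the reported optimum: classifiers completely
        # biased towards suprasegmental models, no acoustic contribution.
        return (1.0 - alpha) * hmm_ll(obs) + alpha * sphmm_ll(obs)

    def identify_speaker(obs,
                         gender_models: Dict[str, LogLikelihood],
                         emotion_models: Dict[str, Dict[str, LogLikelihood]],
                         speaker_models: Dict[Tuple[str, str],
                                              Dict[str, Tuple[LogLikelihood,
                                                              LogLikelihood]]],
                         alpha: float = 1.0) -> str:
        # Stage 1: infer the gender cue.
        gender = max(gender_models, key=lambda g: gender_models[g](obs))
        # Stage 2: infer the emotion cue, conditioned on the inferred gender.
        emotion = max(emotion_models[gender],
                      key=lambda e: emotion_models[gender][e](obs))
        # Stage 3: identify the speaker among the models of the matching
        # gender-emotion group, fusing HMM and SPHMM log-likelihoods.
        group = speaker_models[(gender, emotion)]
        return max(group,
                   key=lambda s: combined_score(obs, *group[s], alpha=alpha))

With alpha set to 1.0 the acoustic term vanishes, which mirrors the abstract's finding that the best performance occurs when only the suprasegmental models contribute.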

Metadata
Title
Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments
Author
Ismail Mohd Adnan Shahin
Publication date
01.09.2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9188-2
