
Published in: Cognitive Computation, 1 December 2012

Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation

Authors: Jae-Bok Kim, Jeong-Sik Park, Yung-Hwan Oh

Published in: Cognitive Computation | Issue 4/2012

Abstract

This paper proposes a novel speech emotion recognition (SER) framework for affective interaction between humans and personal devices. Most conventional SER techniques adopt a speaker-independent model framework because individual speech data are sparse. However, a large amount of individual data can be accumulated on a personal device, making it possible to construct speaker-characterized emotion models through a speaker adaptation procedure. In this study, to address problems associated with conventional adaptation approaches in SER tasks, we modified a representative adaptation technique, maximum likelihood linear regression (MLLR), on the basis of selective label refinement. We then carried out the modified MLLR procedure in an online and iterative manner, using accumulated individual data, to further enhance the speaker-characterized emotion models. In SER experiments based on an emotional corpus, our approach outperformed both conventional adaptation techniques and the speaker-independent model framework.
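
The abstract does not give the details of the modified MLLR procedure, so the following is only a minimal sketch of the general idea of iterative, unsupervised adaptation with confidence-based label selection, not the authors' method. It assumes one-dimensional features, one Gaussian per emotion with unit variance, a single global affine mean transform (a strong simplification of MLLR, for which the maximum-likelihood estimate under hard labels reduces to least squares), and a log-likelihood margin as the confidence score. All names (`classify`, `estimate_affine`, `adapt`, `threshold`) are hypothetical.

```python
def log_lik(x, mean, var=1.0):
    """Gaussian log-likelihood up to an additive constant (assumed unit variance)."""
    return -0.5 * (x - mean) ** 2 / var


def classify(x, means):
    """Return (label, margin); margin is best minus second-best log-likelihood."""
    scored = sorted(((log_lik(x, m), lab) for lab, m in means.items()), reverse=True)
    (best, label), (second, _) = scored[0], scored[1]
    return label, best - second


def estimate_affine(pairs):
    """Least-squares fit of x ~ a * mu + b over (mu, x) pairs.

    With hard labels and equal variances this is the maximum-likelihood
    estimate of a global affine mean transform.
    """
    n = len(pairs)
    s_mu = sum(mu for mu, _ in pairs)
    s_x = sum(x for _, x in pairs)
    s_mumu = sum(mu * mu for mu, _ in pairs)
    s_mux = sum(mu * x for mu, x in pairs)
    denom = n * s_mumu - s_mu ** 2
    if abs(denom) < 1e-12:
        # Only one distinct mean was observed: fall back to a bias-only shift.
        return 1.0, (s_x - s_mu) / n
    a = (n * s_mux - s_mu * s_x) / denom
    b = (s_x - a * s_mu) / n
    return a, b


def adapt(si_means, data, threshold=1.0, iterations=3):
    """Iteratively adapt speaker-independent means to unlabeled speaker data.

    Each pass labels the data with the current adapted models, keeps only
    samples whose margin reaches the (assumed) threshold, and re-estimates
    the transform from those samples.
    """
    a, b = 1.0, 0.0
    for _ in range(iterations):
        adapted = {lab: a * m + b for lab, m in si_means.items()}
        pairs = []
        for x in data:
            label, margin = classify(x, adapted)
            if margin >= threshold:
                pairs.append((si_means[label], x))
        if not pairs:
            break  # nothing confident enough to adapt on
        a, b = estimate_affine(pairs)
    return {lab: a * m + b for lab, m in si_means.items()}, (a, b)


# Toy data: a speaker whose features are shifted by about +1 from the SI means.
si_means = {"neutral": 0.0, "happy": 4.0, "sad": -4.0}
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, -3.0, -3.1, -2.9]
adapted_means, (a, b) = adapt(si_means, data)
```

In an online setting, `adapt` would be called again whenever new utterances have accumulated on the device; how much data to wait for and how to set `threshold` are open design choices that this sketch does not settle.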



Title: Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation
Authors: Jae-Bok Kim, Jeong-Sik Park, Yung-Hwan Oh
Publication date: 1 December 2012
Publisher: Springer-Verlag
Published in: Cognitive Computation, Issue 4/2012
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-012-9132-9


