Published in: Cognitive Computation 2/2009

1 June 2009

Time-Scale Feature Extractions for Emotional Speech Characterization

Applied to Human Centered Interaction Analysis

Authors: Mohamed Chetouani, Ammar Mahdhaoui, Fabien Ringeval


Abstract

Emotional speech characterization is an important issue for the understanding of interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the durations of the individual speech segments. This strategy is applied to a real-world task: the detection of Italian motherese in authentic, longitudinal parent–infant interactions recorded at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosodic parameters (fundamental frequency and energy), provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated through a phoneme-specific characterization process, motivated by the fact that emotional states vary at the phoneme level. A time-scale analysis based on both vowels and consonants is proposed, and it yields a relevant and discriminative feature space for acted emotion recognition. Experimental results on two different databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking phoneme dependency (vowels/consonants) into account for emotional speech characterization.
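To make the fusion rule concrete, the following minimal Python sketch shows one plausible reading of the duration-weighted decision described above. It is not the authors' implementation: it assumes that local a posteriori class probabilities have already been estimated for each speech segment by some upstream classifier, and that the weighting factors are simply proportional to segment duration. All function and variable names are illustrative.

import numpy as np

def fuse_segment_posteriors(posteriors, durations):
    """Fuse per-segment class posteriors into one utterance-level decision.

    posteriors : (n_segments, n_classes) local P(class | segment) estimates
    durations  : (n_segments,) segment durations in seconds
    Returns the index of the winning class.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    durations = np.asarray(durations, dtype=float)
    # Weighting factors directly related to segment duration (assumed
    # proportional here), normalized so the fused scores stay a distribution.
    weights = durations / durations.sum()
    fused = weights @ posteriors  # duration-weighted average per class
    return int(np.argmax(fused))

# Toy usage: three segments, two classes (e.g. motherese vs. other speech).
local_post = [[0.8, 0.2],   # short segment, confident in class 0
              [0.4, 0.6],   # medium segment, leaning toward class 1
              [0.3, 0.7]]   # long segment, leaning toward class 1
durs = [0.2, 0.5, 1.3]      # longer segments dominate the decision
print(fuse_segment_posteriors(local_post, durs))  # -> 1

In the framework described in the abstract, such local posteriors would come from the segmental (short-term spectral) and supra-segmental (prosodic) streams before the overall decision is taken; the sketch only illustrates the duration-based weighting step.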


Metadata
Title
Time-Scale Feature Extractions for Emotional Speech Characterization
Applied to Human Centered Interaction Analysis
Authors
Mohamed Chetouani
Ammar Mahdhaoui
Fabien Ringeval
Publication date
1 June 2009
Publisher
Springer-Verlag
Published in
Cognitive Computation / Issue 2/2009
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-009-9016-9
