Published in: Cognitive Computation 2/2009

1 June 2009

Time-Scale Feature Extractions for Emotional Speech Characterization

Applied to Human Centered Interaction Analysis

Authors: Mohamed Chetouani, Ammar Mahdhaoui, Fabien Ringeval


Abstract

Emotional speech characterization is an important issue for the understanding of interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the durations of the individual speech segments. This strategy is applied to a real-world task: the detection of Italian motherese in authentic, longitudinal parent–infant interactions recorded at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosodic parameters (fundamental frequency and energy), provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated through a phoneme-specific characterization process, motivated by the fact that emotional states vary at the phoneme level. A time-scale analysis based on both vowels and consonants is proposed, and it yields a relevant and discriminative feature space for acted emotion recognition. Experimental results on two different databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking phoneme dependency (vowels/consonants) into account for emotional speech characterization.
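To make the fusion rule concrete, the following minimal Python sketch shows one plausible reading of the duration-weighted decision described above. It is not the authors' implementation: it assumes that local a posteriori class probabilities have already been estimated for each speech segment by some upstream classifier, and that the weighting factors are simply proportional to segment duration. All function and variable names are illustrative.

import numpy as np

def fuse_segment_posteriors(posteriors, durations):
    """Fuse per-segment class posteriors into one utterance-level decision.

    posteriors : (n_segments, n_classes) local P(class | segment) estimates
    durations  : (n_segments,) segment durations in seconds
    Returns the index of the winning class.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    durations = np.asarray(durations, dtype=float)
    # Weighting factors directly related to segment duration (assumed
    # proportional here), normalized so the fused scores stay a distribution.
    weights = durations / durations.sum()
    fused = weights @ posteriors  # duration-weighted average per class
    return int(np.argmax(fused))

# Toy usage: three segments, two classes (e.g. motherese vs. other speech).
local_post = [[0.8, 0.2],   # short segment, confident in class 0
              [0.4, 0.6],   # medium segment, leaning toward class 1
              [0.3, 0.7]]   # long segment, leaning toward class 1
durs = [0.2, 0.5, 1.3]      # longer segments dominate the decision
print(fuse_segment_posteriors(local_post, durs))  # -> 1

In the framework described in the abstract, such local posteriors would come from the segmental (short-term spectral) and supra-segmental (prosodic) streams before the overall decision is taken; the sketch only illustrates the duration-based weighting step.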


Metadata
Title
Time-Scale Feature Extractions for Emotional Speech Characterization
Applied to Human Centered Interaction Analysis
Authors
Mohamed Chetouani
Ammar Mahdhaoui
Fabien Ringeval
Publication date
1 June 2009
Publisher
Springer-Verlag
Published in
Cognitive Computation / Issue 2/2009
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-009-9016-9
