2015 | OriginalPaper | Book Chapter

7. Speech Based Emotion Recognition

Authors: Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York

Abstract

This chapter examines current approaches to speech based emotion recognition. Following a brief introduction describing the most widely used approaches to building such systems, it groups the components commonly involved in emotion recognition systems by function (feature extraction, normalisation, classification, etc.) to give a broad view of the landscape. The next section then explains in more detail the components found in the most current systems. The chapter also gives an overview of how phonetic and speaker variability are dealt with in emotion recognition systems. Finally, it presents the authors' views on the current and future research challenges in the field.
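The pipeline structure named in the abstract (feature extraction, a normalisation stage, then a classifier) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the system described in the chapter: it assumes Python with librosa and scikit-learn, MFCC front-end features, simple per-utterance mean/variance normalisation standing in for speaker compensation, and one diagonal-covariance GMM per emotion class; the train_data dictionary (emotion label mapped to a list of wav paths) is a hypothetical input format.

# Minimal sketch of a conventional speech emotion recognition pipeline:
# feature extraction -> normalisation -> per-class statistical classifier.
# Illustrative only; library, features, and classifier are assumptions,
# not the chapter's specific choices.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def extract_features(wav_path, sr=16000, n_mfcc=13):
    """Frame-level MFCCs, a common front-end feature choice."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # shape: (frames, n_mfcc)

def utterance_znorm(feats):
    """Per-utterance mean/variance normalisation, a simple stand-in
    for speaker/channel compensation."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

def train_gmms(train_data, n_components=8):
    """Fit one diagonal-covariance GMM per emotion class.
    train_data: hypothetical dict {emotion_label: [wav_path, ...]}."""
    gmms = {}
    for emotion, paths in train_data.items():
        frames = np.vstack([utterance_znorm(extract_features(p)) for p in paths])
        gmms[emotion] = GaussianMixture(n_components=n_components,
                                        covariance_type="diag").fit(frames)
    return gmms

def classify(gmms, wav_path):
    """Label an utterance by the class whose GMM gives the highest
    average frame log-likelihood."""
    feats = utterance_znorm(extract_features(wav_path))
    return max(gmms, key=lambda emotion: gmms[emotion].score(feats))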

Metadata
Title
Speech Based Emotion Recognition
Authors
Vidhyasaharan Sethu
Julien Epps
Eliathamby Ambikairajah
Copyright year
2015
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4939-1456-2_7
