Published in: International Journal of Speech Technology 1/2018

19 January 2018

Databases, features and classifiers for speech emotion recognition: a review

Authors: Monorama Swain, Aurobinda Routray, P. Kabisatpathy


Abstract

Speech is an effective medium for expressing emotions and attitudes through language. Detecting the emotional content of a speech signal and identifying the emotions conveyed in utterances are important tasks for researchers. Speech emotion recognition has been regarded as an important research area over the last decade, and the automated analysis of human affective behaviour has attracted many researchers. Consequently, a number of systems, algorithms, and classifiers have been developed for identifying the emotional content of a person's speech. This study reviews the available literature on the databases, features, and classifiers used for speech emotion recognition across a range of languages.
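
To make the surveyed pipeline concrete, the following is a minimal, illustrative sketch of utterance-level feature extraction followed by classification. It assumes the librosa and scikit-learn libraries; the MFCC statistics, the SVM classifier, and the synthetic stand-in signals are illustrative choices for this sketch, not the method of any particular paper reviewed.

```python
# A minimal sketch of the feature-extraction + classifier pipeline common in
# speech emotion recognition. Assumes librosa and scikit-learn; parameter
# choices and data are illustrative, not taken from the review.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def utterance_features(y, sr):
    """Summarize an utterance with MFCC statistics (one common spectral feature)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # frame-level MFCCs, shape (13, n_frames)
    # Utterance-level statistics: per-coefficient mean and standard deviation.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Synthetic stand-in data: two "emotions" mimicked by tones of different pitch.
# Real work would instead load labelled utterances from an emotional speech corpus.
rng = np.random.default_rng(0)
sr = 16000
X, labels = [], []
for k, emotion in enumerate(["neutral", "anger"]):
    for _ in range(20):
        t = np.linspace(0, 1.0, sr, endpoint=False)
        y = np.sin(2 * np.pi * (120 + 80 * k) * t) + 0.05 * rng.standard_normal(sr)
        X.append(utterance_features(y, sr))
        labels.append(emotion)

X_train, X_test, y_train, y_test = train_test_split(np.array(X), labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)            # one classifier among many reviewed
print("held-out accuracy:", clf.score(X_test, y_test))
```

In practice, the synthetic signals would be replaced by labelled utterances from one of the emotional speech corpora discussed in the review, and the feature set and classifier would vary with the study.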


Literatur
Zurück zum Zitat Abrilian, S., Devillers, L., & Martin, J. C. (2006). Annotation of emotions in real-life video interviews: Variability between coders. In 5th international conference on language resources and evaluation (LREC 06), Genoa, pp. 2004–2009. Abrilian, S., Devillers, L., & Martin, J. C. (2006). Annotation of emotions in real-life video interviews: Variability between coders. In 5th international conference on language resources and evaluation (LREC 06), Genoa, pp. 2004–2009.
Zurück zum Zitat Agrawal, S. S. (2011). Emotions in Hindi speech-analysis, perception and recognition. In International conference on speech database and assessments (Oriental COCOSDA). Agrawal, S. S. (2011). Emotions in Hindi speech-analysis, perception and recognition. In International conference on speech database and assessments (Oriental COCOSDA).
Zurück zum Zitat Agrawal, S. S., Jain, A., & Arora, S. (2009). Acoustic and perceptual features of intonation patterns in Hindi speech. In International workshop on spoken language prosody (IWSLPR-09), Kolkata, pp. 25–27. Agrawal, S. S., Jain, A., & Arora, S. (2009). Acoustic and perceptual features of intonation patterns in Hindi speech. In International workshop on spoken language prosody (IWSLPR-09), Kolkata, pp. 25–27.
Zurück zum Zitat Alonso, J. B., Cabrera, J., Medina, M., & Travieso, C. M. (2015). New approach in quantification of emotional intensity from the speech signal: Emotional temperature. Experts Systems with Applications, 42, 9554–9564.CrossRef Alonso, J. B., Cabrera, J., Medina, M., & Travieso, C. M. (2015). New approach in quantification of emotional intensity from the speech signal: Emotional temperature. Experts Systems with Applications, 42, 9554–9564.CrossRef
Zurück zum Zitat Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014). Emotion detection in speech using deep networks. In IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 3724–3728. Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014). Emotion detection in speech using deep networks. In IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 3724–3728.
Zurück zum Zitat Amir, N., Ron, S., & Laor, N. (2000). Analysis of an emotional speech corpus in Hebrew based on objective criteria. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 29–33. Amir, N., Ron, S., & Laor, N. (2000). Analysis of an emotional speech corpus in Hebrew based on objective criteria. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 29–33.
Zurück zum Zitat Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. The Journal of the Acoustical Society of America, 52(6), 1687–1697.CrossRef Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. The Journal of the Acoustical Society of America, 52(6), 1687–1697.CrossRef
Zurück zum Zitat Atassi, H., & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In IEEE international conference on tools with artificial intelligence (ICTAI’08), Dayton, Ohio, USA, Vol 2, pp 147–152. Atassi, H., & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In IEEE international conference on tools with artificial intelligence (ICTAI’08), Dayton, Ohio, USA, Vol 2, pp 147–152.
Zurück zum Zitat Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.CrossRef Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.CrossRef
Zurück zum Zitat Bapineedu, G., Avinash, B., Gangashetty, S. V., & Yegnanarayana, B. (2009). Analysis of Lombard speech using excitation source information. In INTERSPEECH-09, Brighton, UK, pp. 1091–1094. Bapineedu, G., Avinash, B., Gangashetty, S. V., & Yegnanarayana, B. (2009). Analysis of Lombard speech using excitation source information. In INTERSPEECH-09, Brighton, UK, pp. 1091–1094.
Zurück zum Zitat Batliner, A., Biersack, S., & Steidl, S. (2006). The prosody of pet robot directed speech: Evidence from children. In Speech prosody, Dresden, pp. 1–4. Batliner, A., Biersack, S., & Steidl, S. (2006). The prosody of pet robot directed speech: Evidence from children. In Speech prosody, Dresden, pp. 1–4.
Zurück zum Zitat Batliner, A., Hacker, C., Steidl, S., Noth, E., D’Arcy, S., Russell, M., & Wong, M. (2004). You stupid tin box—children interacting with the AIBO robot: A cross-linguistic emotional speech corpus. In Proceedings of language resources and evaluation (LREC 04), Lisbon. Batliner, A., Hacker, C., Steidl, S., Noth, E., D’Arcy, S., Russell, M., & Wong, M. (2004). You stupid tin box—children interacting with the AIBO robot: A cross-linguistic emotional speech corpus. In Proceedings of language resources and evaluation (LREC 04), Lisbon.
Zurück zum Zitat Batliner, A., Huber, R., Niemann, H., Nöth, E., Spilker, J., & Fischer, K. (2000). The recognition of emotion. In Verbmobil: Foundations of speech-to-speech translation, pp. 122–130. Batliner, A., Huber, R., Niemann, H., Nöth, E., Spilker, J., & Fischer, K. (2000). The recognition of emotion. In Verbmobil: Foundations of speech-to-speech translation, pp. 122–130.
Zurück zum Zitat Bitouk, D., Verma, R., & Nenkova, A. (2010). Class-level spectral features for emotion recognition. Speech Communication, 52(7–8), 613–625. Bitouk, D., Verma, R., & Nenkova, A. (2010). Class-level spectral features for emotion recognition. Speech Communication, 52(7–8), 613–625.
Zurück zum Zitat Borden, G., Harris, K., & Raphael, L. (1994). Speech science primer: Physiology, acoustics, and perception of speech (3rd ed.). Baltimore: Williams and Wilkins. Borden, G., Harris, K., & Raphael, L. (1994). Speech science primer: Physiology, acoustics, and perception of speech (3rd ed.). Baltimore: Williams and Wilkins.
Zurück zum Zitat Bozkurt, E., Erzin, E., & Erdem, A. T. (2009). Improving automatic emotion recognition from speech signals. In 10th annual conference of the international speech communication association (INTERSPEECH), Brighton, UK, pp. 324–327. Bozkurt, E., Erzin, E., & Erdem, A. T. (2009). Improving automatic emotion recognition from speech signals. In 10th annual conference of the international speech communication association (INTERSPEECH), Brighton, UK, pp. 324–327.
Zurück zum Zitat Brester, C., Semenkin, E., & Sidorov, M. (2016). Multi-objective heuristic feature selection for speech-based multilingual emotion recognition. JAISCR, 6(4), 243–253. Brester, C., Semenkin, E., & Sidorov, M. (2016). Multi-objective heuristic feature selection for speech-based multilingual emotion recognition. JAISCR, 6(4), 243–253.
Zurück zum Zitat Buck, R. (1999). The biological affects, a typology. Psychological Review, 106(2), 301–336.CrossRef Buck, R. (1999). The biological affects, a typology. Psychological Review, 106(2), 301–336.CrossRef
Zurück zum Zitat Bulut, M., Narayanan, S. S., & Syrdal, A. K. (2002). Expressive speech synthesis using a concatenative synthesizer. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 2, pp. 1265–1268. Bulut, M., Narayanan, S. S., & Syrdal, A. K. (2002). Expressive speech synthesis using a concatenative synthesizer. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 2, pp. 1265–1268.
Zurück zum Zitat Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of the INTERSPEECH 2005, Lissabon, Portugal, pp. 1517–1520. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of the INTERSPEECH 2005, Lissabon, Portugal, pp. 1517–1520.
Zurück zum Zitat Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S. et al. (2008) IEMOCAP: Interactive emotional dyadic motion capture database. In: Language resources and evaluation. Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S. et al. (2008) IEMOCAP: Interactive emotional dyadic motion capture database. In: Language resources and evaluation.
Zurück zum Zitat Caballero-Morales, S. O. (2013) Recognition of emotions in Mexican Spanish speech: An approach based on acoustic modelling of emotion-specific vowels. The Scientific World Journal, 2013, 1–13. Caballero-Morales, S. O. (2013) Recognition of emotions in Mexican Spanish speech: An approach based on acoustic modelling of emotion-specific vowels. The Scientific World Journal, 2013, 1–13.
Zurück zum Zitat Caldognetto, E. M., Cosi, P., Drioli, C., Tisato, G., & Cavicchio, F. (2004). Modifications of phonetic labial targets in emotive speech: Effects of the co-production of speech and emotions. Speech Communication, 44, 173–185.CrossRef Caldognetto, E. M., Cosi, P., Drioli, C., Tisato, G., & Cavicchio, F. (2004). Modifications of phonetic labial targets in emotive speech: Effects of the co-production of speech and emotions. Speech Communication, 44, 173–185.CrossRef
Zurück zum Zitat Chauhan, A., Koolagudi, S. G., Kafley, S. & Rao, K. S. (2010). Emotion recognition using LP residual. In Proceedings of the 2010 IEEE students’ technology symposium, IIT Kharagpu. Chauhan, A., Koolagudi, S. G., Kafley, S. & Rao, K. S. (2010). Emotion recognition using LP residual. In Proceedings of the 2010 IEEE students’ technology symposium, IIT Kharagpu.
Zurück zum Zitat Chen, L., Mao, X., Xue, Y., & Lung, L. (2012). Speech emotion recognition: Features and classification models. Digital Signal Processing, 22(6), 1154–1160.MathSciNetCrossRef Chen, L., Mao, X., Xue, Y., & Lung, L. (2012). Speech emotion recognition: Features and classification models. Digital Signal Processing, 22(6), 1154–1160.MathSciNetCrossRef
Zurück zum Zitat Chuang, Z.-J., & Wu, C.-H. (2002). Emotion recognition from textual input using an emotional semantic network. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 3, pp. 2033–2036. Chuang, Z.-J., & Wu, C.-H. (2002). Emotion recognition from textual input using an emotional semantic network. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 3, pp. 2033–2036.
Zurück zum Zitat Cichosz, J., & Slot, K. (2005). Low-dimensional feature space derivation for emotion recognition. In INTERSPEECH’05, Lisbon, Portugal, pp. 477–480. Cichosz, J., & Slot, K. (2005). Low-dimensional feature space derivation for emotion recognition. In INTERSPEECH’05, Lisbon, Portugal, pp. 477–480.
Zurück zum Zitat Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO Corpus: An Italian emotional speech database. In Proceedings of the 9th international conference on language resources and evaluation—LREC 14, pp. 3501–3504. Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO Corpus: An Italian emotional speech database. In Proceedings of the 9th international conference on language resources and evaluation—LREC 14, pp. 3501–3504.
Zurück zum Zitat Cummings, K. E., & Clements, M. A. (1998). Analysis of the glottal excitation of emotionally styled and stressed speech. The Journal of the Acoustical Society of America, 98, 88–98. Cummings, K. E., & Clements, M. A. (1998). Analysis of the glottal excitation of emotionally styled and stressed speech. The Journal of the Acoustical Society of America, 98,  88–98.
Zurück zum Zitat Darwin, C. (1872/1965). The expression of the emotions in man and animals. Chicago University Press, Chicago. Darwin, C. (1872/1965). The expression of the emotions in man and animals. Chicago University Press, Chicago.
Zurück zum Zitat Dellaert, F., Polzin, T., & Waibel, A. (1996a). Recognising emotions in speech. In ICSLP 96. Dellaert, F., Polzin, T., & Waibel, A. (1996a). Recognising emotions in speech. In ICSLP 96.
Zurück zum Zitat Dellert, F., Polzin, T., & Waibel, A. (1996b). Recognizing emotion in speech. In 4th international conference on spoken language processing, Philadelphia, PA, USA, pp. 1970–1973. Dellert, F., Polzin, T., & Waibel, A. (1996b). Recognizing emotion in speech. In 4th international conference on spoken language processing, Philadelphia, PA, USA, pp. 1970–1973.
Zurück zum Zitat Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40, 33–60.CrossRefMATH Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40, 33–60.CrossRefMATH
Zurück zum Zitat Eckman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.CrossRef Eckman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.CrossRef
Zurück zum Zitat Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. Sussex: Wiley. Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. Sussex: Wiley.
Zurück zum Zitat EI Ayadi M, Kamel MS, Karray F (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 1(44), 572–587.CrossRefMATH EI Ayadi M, Kamel MS, Karray F (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 1(44), 572–587.CrossRefMATH
Zurück zum Zitat Esmaileyan, Z., & Marvi, H. (2014). A database for automatic Persian speech emotion recognition: Collection, processing and evaluation. IJE Transactions A: Bascis, 27(1), 79–90. Esmaileyan, Z., & Marvi, H. (2014). A database for automatic Persian speech emotion recognition: Collection, processing and evaluation. IJE Transactions A: Bascis, 27(1), 79–90.
Zurück zum Zitat Espinosa, H. P., Garcia, J. O., & Pineda, L. V. (2010). Features selection for primitives estimation on emotional speech. In ICASSP, Florence, Italy, pp. 5138–5141 Espinosa, H. P., Garcia, J. O., & Pineda, L. V. (2010). Features selection for primitives estimation on emotional speech. In ICASSP, Florence, Italy, pp. 5138–5141
Zurück zum Zitat Fernandez, R., & Picard, R. W. (2003). Modeling driver’s speech under stress. Speech Communication, 40, 145–159.CrossRefMATH Fernandez, R., & Picard, R. W. (2003). Modeling driver’s speech under stress. Speech Communication, 40, 145–159.CrossRefMATH
Zurück zum Zitat Shah, A. F., Vimal Krishnan, V. R., Sukumar, A. R., Jayakumar, A., & Anto, P. B. (2009). Speaker independent automatic emotion recognition in speech: A comparison of MFCCs and discrete wavelet transforms. In International conference on advances in recent technologies in communication and computing, ARTCom ‘09. Shah, A. F., Vimal Krishnan, V. R., Sukumar, A. R., Jayakumar, A., & Anto, P. B. (2009). Speaker independent automatic emotion recognition in speech: A comparison of MFCCs and discrete wavelet transforms. In International conference on advances in recent technologies in communication and computing, ARTCom ‘09.
Zurück zum Zitat Fontaine, J. R., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotion is not two dimensional. Psychological Science, 13, 1050–1057.CrossRef Fontaine, J. R., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotion is not two dimensional. Psychological Science, 13, 1050–1057.CrossRef
Zurück zum Zitat France, D. J., Shiavi, R. G., Silverman, S., Silverman, M., & Wilkes, M. (2000). Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Transactions on Biomedical Engineering, 7, 829–837.CrossRef France, D. J., Shiavi, R. G., Silverman, S., Silverman, M., & Wilkes, M. (2000). Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Transactions on Biomedical Engineering, 7, 829–837.CrossRef
Zurück zum Zitat Gangamohan, P., Kadiri, S. R., Gangashetty, S. V., & Yegnanarayana, B. (2014). Excitation source features for discrimination of anger and happy emotions. In: INTERSPEECH, Singapore, pp. 1253–1257. Gangamohan, P., Kadiri, S. R., Gangashetty, S. V., & Yegnanarayana, B. (2014). Excitation source features for discrimination of anger and happy emotions. In: INTERSPEECH, Singapore, pp. 1253–1257.
Zurück zum Zitat Gangamohan, P., Kadiri, S. R., & Yegnanarayana, B. (2013). Analysis of emotional speech at sub segmental level. In Interspeech, Lyon, France, pp. 1916–1920. Gangamohan, P., Kadiri, S. R., & Yegnanarayana, B. (2013). Analysis of emotional speech at sub segmental level. In Interspeech, Lyon, France, pp. 1916–1920.
Zurück zum Zitat Gomez, P., & Danuser, B. (2004). Relationships between musical structure and physiological measures of emotion. Emotion, 7(2), 377–387.CrossRef Gomez, P., & Danuser, B. (2004). Relationships between musical structure and physiological measures of emotion. Emotion, 7(2), 377–387.CrossRef
Zurück zum Zitat Grimm, M., Kroschel, K., & Narayanan, S. (2008). The Vera Ammittag German audio-visual emotional speech database. In International conference on multimedia and expo, pp. 865–868. Grimm, M., Kroschel, K., & Narayanan, S. (2008). The Vera Ammittag German audio-visual emotional speech database. In International conference on multimedia and expo, pp. 865–868.
Zurück zum Zitat Grimm, M., Mower, E., Kroschel, K., & Narayanan, S. (2006). Combining categorical and primitives-based emotion recognition. In 14th European signal processing conference (EUSIPCO 2006), Florence, Italy. Grimm, M., Mower, E., Kroschel, K., & Narayanan, S. (2006). Combining categorical and primitives-based emotion recognition. In 14th European signal processing conference (EUSIPCO 2006), Florence, Italy.
Zurück zum Zitat Haq, S., & Jackson, P. J. B. (2009). Speaker-dependent audio-visual emotion recognition. In Proceedings of international conference on auditory-visual speech processing, pp. 53–58. Haq, S., & Jackson, P. J. B. (2009). Speaker-dependent audio-visual emotion recognition. In Proceedings of international conference on auditory-visual speech processing, pp. 53–58.
Zurück zum Zitat He, L., Lech, M., & Allen, N. (2010). On the importance of glottal flow spectral energy for the recognition of emotions in speech. In INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 26–30. He, L., Lech, M., & Allen, N. (2010). On the importance of glottal flow spectral energy for the recognition of emotions in speech. In INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 26–30.
Zurück zum Zitat Hozjan, V., & Kacic, Z. (2003). Improved emotion recognition with large set of stastical features. Geneva: Eurospecch. Hozjan, V., & Kacic, Z. (2003). Improved emotion recognition with large set of stastical features. Geneva: Eurospecch.
Zurück zum Zitat Hozjan, V., Kacic, Z., Moreno, A., Bonafonte, A., & Nogueiras, A. (2002). Interface databases: Design and collection of a multilingual emotional speech database. In Proceedings of the 3rd international conference on language (LREC’02) Las Palmas de Gran Canaria, Spain, pp. 2019–2023. Hozjan, V., Kacic, Z., Moreno, A., Bonafonte, A., & Nogueiras, A. (2002). Interface databases: Design and collection of a multilingual emotional speech database. In Proceedings of the 3rd international conference on language (LREC’02) Las Palmas de Gran Canaria, Spain, pp. 2019–2023.
Zurück zum Zitat Iliev, A. I., Scordilis, M. S., Papa, J. P., & Falco, A. X. (2010). Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech and Language, 24(3), 445–460.CrossRef Iliev, A. I., Scordilis, M. S., Papa, J. P., & Falco, A. X. (2010). Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech and Language, 24(3), 445–460.CrossRef
Zurück zum Zitat Iliou, T., & Anagnostopoulos, C.-N. (2009). Statistical evaluation of speech features for emotion recognition. In Fourth international conference on digital telecommunications, Colmar, France, pp. 121–126. Iliou, T., & Anagnostopoulos, C.-N. (2009). Statistical evaluation of speech features for emotion recognition. In Fourth international conference on digital telecommunications, Colmar, France, pp. 121–126.
Zurück zum Zitat Iriondo, I., Guaus, R., & Rodriguez, A. (2000). Validation of an acoustical modeling of emotional expression in Spanish using speech synthesis techniques. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 161–166. Iriondo, I., Guaus, R., & Rodriguez, A. (2000). Validation of an acoustical modeling of emotional expression in Spanish using speech synthesis techniques. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 161–166.
Zurück zum Zitat Izard, C. E. (1992). Basic emotions, relations among emotions, and emotion-cognition relations. Psychological Review, 99, 561–565.CrossRef Izard, C. E. (1992). Basic emotions, relations among emotions, and emotion-cognition relations. Psychological Review, 99, 561–565.CrossRef
Zurück zum Zitat Jeon, J. H., Le, D., Xia, R., & Liu, Y. (2013). A preliminary study of cross-lingual emotion recognition from speech: Automatic classification versus human perception. In Interspeech, Layon, France, pp. 2837–2840. Jeon, J. H., Le, D., Xia, R., & Liu, Y. (2013). A preliminary study of cross-lingual emotion recognition from speech: Automatic classification versus human perception. In Interspeech, Layon, France, pp. 2837–2840.
Zurück zum Zitat Jiang, D.-N., & Cai, L. H. (2004). Classifying emotion in Chinese speech by decomposing prosodic features. In International conference on speech and language processing (ICSLP), Jeju, Korea. Jiang, D.-N., & Cai, L. H. (2004). Classifying emotion in Chinese speech by decomposing prosodic features. In International conference on speech and language processing (ICSLP), Jeju, Korea.
Zurück zum Zitat Jiang, D.-N., Zhang, W., Shen, L.-Q., & Cai, L.-H. (2005). Prosody analysis and modelling for emotional speech synthesis. In IEEE proceedings of ICASSP 2005, pp. 281–284. Jiang, D.-N., Zhang, W., Shen, L.-Q., & Cai, L.-H. (2005). Prosody analysis and modelling for emotional speech synthesis. In IEEE proceedings of ICASSP 2005, pp. 281–284.
Zurück zum Zitat Jin, X., & Wang, Z. (2005). An emotion space model for recognition of emotions in spoken Chinese (pp. 397–402). Berlin: Springer. Jin, X., & Wang, Z. (2005). An emotion space model for recognition of emotions in spoken Chinese (pp. 397–402). Berlin: Springer.
Zurück zum Zitat Jovičić, S. T., Kašić, Z., Đorđević, M., & Rajković, M. (2004). Serbian emotional speech database: Design, processing and evaluation. In SPECOM 9th conference speech and computer, St. Petersburg, Russia. Jovičić, S. T., Kašić, Z., Đorđević, M., & Rajković, M. (2004). Serbian emotional speech database: Design, processing and evaluation. In SPECOM 9th conference speech and computer, St. Petersburg, Russia.
Zurück zum Zitat Kadiri, S. R., Gangamohan, P., Gangashetty, S. V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In INTERSPEECH 2015, Dresden, pp. 1324–1328. Kadiri, S. R., Gangamohan, P., Gangashetty, S. V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In INTERSPEECH 2015, Dresden, pp. 1324–1328.
Zurück zum Zitat Kandali, A. B., Routray, A., & Basu, T. K. (2008a). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In Proceedings of IEEE region 10 conference on TENCHON. Kandali, A. B., Routray, A., & Basu, T. K. (2008a). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In Proceedings of IEEE region 10 conference on TENCHON.
Zurück zum Zitat Kandali, A. B., Routray, A., & Basu, T. K. (2008b). Emotion recognition from speeches of some native languages of ASSAM independent of text and speaker. In National seminar on Devices, Circuits, and Communications, B. I. T. Mesra, Ranchi, pp. 6–7. Kandali, A. B., Routray, A., & Basu, T. K. (2008b). Emotion recognition from speeches of some native languages of ASSAM independent of text and speaker. In National seminar on Devices, Circuits, and Communications, B. I. T. Mesra, Ranchi, pp. 6–7.
Zurück zum Zitat Kao, Y.-H., & Lee, L.-S. (2006). Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language. In INTERSPEECH-ICSLP, Pittsburgh, Pennsylvania, pp. 1814–1817. Kao, Y.-H., & Lee, L.-S. (2006). Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language. In INTERSPEECH-ICSLP, Pittsburgh, Pennsylvania, pp. 1814–1817.
Zurück zum Zitat Kim, J. B., Park, J. S., Oh, Y. H. (2011). On-line speaker adaptation based emotion recognition using incremental emotional information. In ICASSP, Prague, Czech Republic, pp. 4948–4951. Kim, J. B., Park, J. S., Oh, Y. H. (2011). On-line speaker adaptation based emotion recognition using incremental emotional information. In ICASSP, Prague, Czech Republic, pp. 4948–4951.
Zurück zum Zitat Koolagudi, S. G., Devliyal, S., Chawla, B., Barthwal, A., & Rao, K. S. (2012). Recognition of emotions from speech using excitation source features. Procedia Engineering, 38, 3409–3417.CrossRef Koolagudi, S. G., Devliyal, S., Chawla, B., Barthwal, A., & Rao, K. S. (2012). Recognition of emotions from speech using excitation source features. Procedia Engineering, 38, 3409–3417.CrossRef
Zurück zum Zitat Koolagudi, S. G., & Krothapalli, S. R. (2012). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology, 15(4), 495–511.CrossRef Koolagudi, S. G., & Krothapalli, S. R. (2012). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology, 15(4), 495–511.CrossRef
Zurück zum Zitat Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabati, S., & Rao, K. S. (2009). IITKGP-SESC: Speech database for emotion analysis. Communications in computer and information science, LNCS (pp. 485–492). Berlin: Springer. Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabati, S., & Rao, K. S. (2009). IITKGP-SESC: Speech database for emotion analysis. Communications in computer and information science, LNCS (pp. 485–492). Berlin: Springer.
Zurück zum Zitat Koolagudi, S. G., & Rao, K. S. (2012a). Emotion recognition from speech: A review. International Journal of Speech Technology, 15, 99–117.CrossRef Koolagudi, S. G., & Rao, K. S. (2012a). Emotion recognition from speech: A review. International Journal of Speech Technology, 15, 99–117.CrossRef
Zurück zum Zitat Koolagudi, S. G., & Rao, K. S. (2012b). Emotion recognition from speech using source, system, and prosodic features. International Journal of Speech Technology, 15(2), 265–289.CrossRef Koolagudi, S. G., & Rao, K. S. (2012b). Emotion recognition from speech using source, system, and prosodic features. International Journal of Speech Technology, 15(2), 265–289.CrossRef
Zurück zum Zitat Koolagudi, S. G., Reddy, R., & Rao, K. S. (2010). Emotion recognition from speech signal using epoch parameters. In International conference on signal processing and communications (SPCOM). Koolagudi, S. G., Reddy, R., & Rao, K. S. (2010). Emotion recognition from speech signal using epoch parameters. In International conference on signal processing and communications (SPCOM).
Zurück zum Zitat Krothapalli, S. R., & Koolagudi, S. G. (2013). Characterization and recognition of emotions from speech using excitation source information. International Journal of Speech Technology, 16(2), 181–201.CrossRef Krothapalli, S. R., & Koolagudi, S. G. (2013). Characterization and recognition of emotions from speech using excitation source information. International Journal of Speech Technology, 16(2), 181–201.CrossRef
Zurück zum Zitat Kwon, O.-W., Chan, K., Hao, J., & Lee, T.-W. (2003). Emotion recognition by speech signals. In EUROSPEECH, pp. 125–128,. Kwon, O.-W., Chan, K., Hao, J., & Lee, T.-W. (2003). Emotion recognition by speech signals. In EUROSPEECH, pp. 125–128,.
Zurück zum Zitat Lanjewar, R. B., Mauhurkar, S., & Patel, N. (2015). Implementation and comparison of speech emotion recognition system using Gaussian mixture model and K-nearest neighbor techniques. Procedia Computer Science, 49, 50–57.CrossRef Lanjewar, R. B., Mauhurkar, S., & Patel, N. (2015). Implementation and comparison of speech emotion recognition system using Gaussian mixture model and K-nearest neighbor techniques. Procedia Computer Science, 49, 50–57.CrossRef
Zurück zum Zitat Lazarus, R. S. (1991). Emotion & adaptation. New York: Oxford University Press. Lazarus, R. S. (1991). Emotion & adaptation. New York: Oxford University Press.
Zurück zum Zitat Lee, C. M., & Narayanan, S. (2003). Emotion recognition using a data-driven fuzzy inference system. In European conference on speech and language processing (EUROSPEECH), Geneva, Switzerland, pp. 157–160. Lee, C. M., & Narayanan, S. (2003). Emotion recognition using a data-driven fuzzy inference system. In European conference on speech and language processing (EUROSPEECH), Geneva, Switzerland, pp. 157–160.
Zurück zum Zitat Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2), 293–303.CrossRef Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2), 293–303.CrossRef
Zurück zum Zitat Lee, C. M., Narayanan, S., & Pieraccini, R. (2001). Recognition of negative emotion in the human speech signals. In Workshop on auto, speech recognition and understanding. Lee, C. M., Narayanan, S., & Pieraccini, R. (2001). Recognition of negative emotion in the human speech signals. In Workshop on auto, speech recognition and understanding.
Zurück zum Zitat Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z. et al. (2004). Emotion recognition based on phoneme classes. In 8th international conference on spoken language processing, INTERSPEECH 2004, Korea. Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z. et al. (2004). Emotion recognition based on phoneme classes. In 8th international conference on spoken language processing, INTERSPEECH 2004, Korea.
Zurück zum Zitat Lee, C.-C., Mower, E., Busso, C., Lee, S., & Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53, 1162–1171.CrossRef Lee, C.-C., Mower, E., Busso, C., Lee, S., & Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53, 1162–1171.CrossRef
Zurück zum Zitat Lida, A., Campbell, N., Higuchi, F., & Yasumura, M. (2003). A corpus based synthesis system with emotion. Speech Communication, 40, 161–187.CrossRefMATH Lida, A., Campbell, N., Higuchi, F., & Yasumura, M. (2003). A corpus based synthesis system with emotion. Speech Communication, 40, 161–187.CrossRefMATH
Zurück zum Zitat Lin, Y.-L., & Wei, G. (2005). Speech emotion recognition based on HMM and SVM. In: Fourth International conference on machine learning and cybernetics, Guangzhou, pp. 4898–4901. Lin, Y.-L., & Wei, G. (2005). Speech emotion recognition based on HMM and SVM. In: Fourth International conference on machine learning and cybernetics, Guangzhou, pp. 4898–4901.
Zurück zum Zitat Lotfian, R., & Busso, C. (2015). Emotion recognition using synthetic speech as neutral reference. In IEEE International conference on ICASSP, pp. 4759–4763. Lotfian, R., & Busso, C. (2015). Emotion recognition using synthetic speech as neutral reference. In IEEE International conference on ICASSP, pp. 4759–4763.
Zurück zum Zitat Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Automatic emotion recognition using prosodic parameters. In INTERSPEECH, Lisbon, Portugal, pp. 493–496. Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Automatic emotion recognition using prosodic parameters. In INTERSPEECH, Lisbon, Portugal, pp. 493–496.
Zurück zum Zitat Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In ICASSP, Honolulu, Hawaii, pp. IV17–IV20. Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In ICASSP, Honolulu, Hawaii, pp. IV17–IV20.
Zurück zum Zitat Makarova, V., & Petrushin, V. A. (2002). RUSLANA: A database of Russian emotional utterances. In 7th International conference on spoken language processing (ICSLP 02), pp. 2041–2044. Makarova, V., & Petrushin, V. A. (2002). RUSLANA: A database of Russian emotional utterances. In 7th International conference on spoken language processing (ICSLP 02), pp. 2041–2044.
Zurück zum Zitat Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580.CrossRef Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580.CrossRef
Zurück zum Zitat McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., & Stroeve, S. (2000) Approaching automatic recognition of emotion from voice: A rough benchmark. In Proceedings of ISCA workshop speech emotion, pp. 207–212. McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., & Stroeve, S. (2000) Approaching automatic recognition of emotion from voice: A rough benchmark. In Proceedings of ISCA workshop speech emotion, pp. 207–212.
Zurück zum Zitat McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2007). The SEMAINE database: Annotated multimodal records of emotionally coloured conversations between a person and a limited agent. Journal of LATEX Class Files, 6(1), 1–14. McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2007). The SEMAINE database: Annotated multimodal records of emotionally coloured conversations between a person and a limited agent. Journal of LATEX Class Files, 6(1), 1–14.
Zurück zum Zitat Mencattini, A., Martinelli, E., Costantini, G., Todisco, M., Basile, B., Bozzali, M., & Di Natale, C. (2014). Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure. Knowledge-Based Systems, 63, 68–81.CrossRef Mencattini, A., Martinelli, E., Costantini, G., Todisco, M., Basile, B., Bozzali, M., & Di Natale, C. (2014). Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure. Knowledge-Based Systems, 63, 68–81.CrossRef
Zurück zum Zitat Mirsamadi, S., Barsoum, E., & Zhang, C. (2017). Automatic speech emotion recognition using recurrent neural networks with local attention. In Proceedings of IEEE conference on ICASSP, pp. 2227–2231. Mirsamadi, S., Barsoum, E., & Zhang, C. (2017). Automatic speech emotion recognition using recurrent neural networks with local attention. In Proceedings of IEEE conference on ICASSP, pp. 2227–2231.
Zurück zum Zitat Mohanty, S., & Swain, B. K. (2010). Emotion recognition using fuzzy K-means from Oriya speech. In International Conference [ACCTA-2010] on Special Issue of IJCCT, Vol. 1 Issue 2–4. Mohanty, S., & Swain, B. K. (2010). Emotion recognition using fuzzy K-means from Oriya speech. In International Conference [ACCTA-2010] on Special Issue of IJCCT, Vol. 1 Issue 2–4.
Zurück zum Zitat Montero, J. M., Gutiérrez-Arriola, J., Colás, J., Enríquez, E., & Pardo, J. M. (1999). Analysis and modeling of emotional speech in Spanish. In Proceedings of international conference on phonetic sciences, pp. 957–960. Montero, J. M., Gutiérrez-Arriola, J., Colás, J., Enríquez, E., & Pardo, J. M. (1999). Analysis and modeling of emotional speech in Spanish. In Proceedings of international conference on phonetic sciences, pp. 957–960.
Zurück zum Zitat Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49, 98–112.CrossRef Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49, 98–112.CrossRef
Zurück zum Zitat Nakatsu, R., Nicholson, J., & Tosa, N. (2000). Emotion recognition and its application to computer agents with spontaneous interactive capabilities. Knowledge-Based Systems, 13, 497–504.CrossRef Nakatsu, R., Nicholson, J., & Tosa, N. (2000). Emotion recognition and its application to computer agents with spontaneous interactive capabilities. Knowledge-Based Systems, 13, 497–504.CrossRef
Zurück zum Zitat Nandi, D., Pati, D., & Rao, K. S. (2017). Parametric representation of excitation source information for language identification. Computer Speech and Language, 41, 88–115.CrossRef Nandi, D., Pati, D., & Rao, K. S. (2017). Parametric representation of excitation source information for language identification. Computer Speech and Language, 41, 88–115.CrossRef
Zurück zum Zitat Neiberg, D., Elenius, K., & Laskowski, K. (2006). Emotion recognition in spontaneous speech using GMMs. In INTERSPEECH 2006, ICSLP, Pittsburgh, Pennsylvania, pp. 809–812. Neiberg, D., Elenius, K., & Laskowski, K. (2006). Emotion recognition in spontaneous speech using GMMs. In INTERSPEECH 2006, ICSLP, Pittsburgh, Pennsylvania, pp. 809–812.
Zurück zum Zitat New, T. L., Wei, F. S., & De Silva, L. C. (2001). Speech based emotion classification. In Proceedings of the IEEE region 10 international conference on electrical and electronic technology (TENCON), Phuket Island, Singapore, Vol. 1, pp 297–301. New, T. L., Wei, F. S., & De Silva, L. C. (2001). Speech based emotion classification. In Proceedings of the IEEE region 10 international conference on electrical and electronic technology (TENCON), Phuket Island, Singapore, Vol. 1, pp 297–301.
Zurück zum Zitat New, T. L., Wei, F. S., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.CrossRef New, T. L., Wei, F. S., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.CrossRef
Zurück zum Zitat Nicholson, J., Takahashi, K., & Nakatsu, R. (2006). Emotion recognition in speech using neural networks. Neural Computing & Applications, 11, 290–296.MATH Nicholson, J., Takahashi, K., & Nakatsu, R. (2006). Emotion recognition in speech using neural networks. Neural Computing & Applications, 11, 290–296.MATH
Zurück zum Zitat Nogueiras, A., Marino, J. B., Moreno, A., & Bonafonte, A. (2001). Speech emotion recognition using hidden Markov models. In Proceedings of European conference on speech communication and technology (Eurospeech’01), Denmark. Nogueiras, A., Marino, J. B., Moreno, A., & Bonafonte, A. (2001). Speech emotion recognition using hidden Markov models. In Proceedings of European conference on speech communication and technology (Eurospeech’01), Denmark.
Zurück zum Zitat Nordstrand, L., Svanfeld, G., Granstrom, B., & House, D. (2004). Measurements of ariculatory variation in expressive speech for a set of Swedish vowels. Speech Communication, 44, 187–196.CrossRef Nordstrand, L., Svanfeld, G., Granstrom, B., & House, D. (2004). Measurements of ariculatory variation in expressive speech for a set of Swedish vowels. Speech Communication, 44, 187–196.CrossRef
Zurück zum Zitat Ooi, C. S., Seng, K. P., Ang, L.-M., & Chew, L. W. (2014). A new approach of audio emotion recognition. Experts Systems with Applications, 41, 5858–5869.CrossRef Ooi, C. S., Seng, K. P., Ang, L.-M., & Chew, L. W. (2014). A new approach of audio emotion recognition. Experts Systems with Applications, 41, 5858–5869.CrossRef
Zurück zum Zitat Pao, T.-L., Chen, Y.-T., Yeh, J.-H., & Liao, W.-Y. (2005). Combining acoustic features for improved emotion recognition in Mandarin speech. In International conference on affective computing and intelligent interaction, pp. 279–285. Pao, T.-L., Chen, Y.-T., Yeh, J.-H., & Liao, W.-Y. (2005). Combining acoustic features for improved emotion recognition in Mandarin speech. In International conference on affective computing and intelligent interaction, pp. 279–285.
Zurück zum Zitat Park, C.-H., & Sim, K.-B. (2003). Emotion recognition and acoustic analysis from speech signal. In Proceedings of the international joint conference on neural networks, pp. 2594–2598. Park, C.-H., & Sim, K.-B. (2003). Emotion recognition and acoustic analysis from speech signal. In Proceedings of the international joint conference on neural networks, pp. 2594–2598.
Zurück zum Zitat Pereira, C. (2000). Dimensions of emotional meaning in speech. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 25–28. Pereira, C. (2000). Dimensions of emotional meaning in speech. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 25–28.
Zurück zum Zitat Petrushin, V. A. (1999). Emotion in speech: Recognition and application to call centers. In Proceedings of the 1999 conference on artificial neural networks in engineering (ANNIE 99). Petrushin, V. A. (1999). Emotion in speech: Recognition and application to call centers. In Proceedings of the 1999 conference on artificial neural networks in engineering (ANNIE 99).
Zurück zum Zitat Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1175–1191.CrossRef Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1175–1191.CrossRef
Zurück zum Zitat Power, M., & Dalgleish, T. (2000). Cognition and emotion from order to disorder. New York: Psychology Press. Power, M., & Dalgleish, T. (2000). Cognition and emotion from order to disorder. New York: Psychology Press.
Zurück zum Zitat Prasanna, S. R. M., & Govind, D. (2010). Analysis of excitation source information in emotional speech. In INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 781–784. Prasanna, S. R. M., & Govind, D. (2010). Analysis of excitation source information in emotional speech. In INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 781–784.
Zurück zum Zitat Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.CrossRef Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.CrossRef
Zurück zum Zitat Pravena, D., & Govind, D. (2017). Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals. International Journal of Speech Technology, 20(4), 787–797.CrossRef Pravena, D., & Govind, D. (2017). Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals. International Journal of Speech Technology, 20(4), 787–797.CrossRef
Zurück zum Zitat Pravena, D., & Govind, D. (2017). Development of simulated emotion speech database for excitation source analysis. International Journal of Speech Technology, 20, 327–338.CrossRef Pravena, D., & Govind, D. (2017). Development of simulated emotion speech database for excitation source analysis. International Journal of Speech Technology, 20, 327–338.CrossRef
Zurück zum Zitat Quiros-Ramirez, M. A., Polikovsky, S., Kameda, Y., & Onisawa, T. (2014). A spontaneous cross-cultural emotion database: Latin-America vs. Japan. In International conference on Kansei Engineering and emotion research, pp. 1127–1134. Quiros-Ramirez, M. A., Polikovsky, S., Kameda, Y., & Onisawa, T. (2014). A spontaneous cross-cultural emotion database: Latin-America vs. Japan. In International conference on Kansei Engineering and emotion research, pp. 1127–1134.
Zurück zum Zitat Rabiner, L., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.MATH Rabiner, L., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.MATH
Zurück zum Zitat Rahurkar, M. A., & Hansen, J. H. (2002). Frequency band analysis for stress detection using a Teager energy operator based feature. Proceedings of International Conference on Spoken Language Processing (ICSLP’), Vol. 3, issue 02, pp. 2021–2024. Rahurkar, M. A., & Hansen, J. H. (2002). Frequency band analysis for stress detection using a Teager energy operator based feature. Proceedings of International Conference on Spoken Language Processing (ICSLP’), Vol. 3, issue 02, pp. 2021–2024.
Zurück zum Zitat Ramamohan, S., & Dandapat, S. (2006). Sinusoidal model-based analysis and classification of stressed speech. In IEEE transactions on audio, speech and language processing, Vol. 14, p. 3. Ramamohan, S., & Dandapat, S. (2006). Sinusoidal model-based analysis and classification of stressed speech. In IEEE transactions on audio, speech and language processing, Vol. 14, p. 3.
Zurück zum Zitat Rao, K. S., & Koolagudi, S. G. (2011). Identification of Hindi dialects and emotions using spectral and prosodic features of speech. Systemics, Cybernetics, and Informatics, 9(4), 24–33. Rao, K. S., & Koolagudi, S. G. (2011). Identification of Hindi dialects and emotions using spectral and prosodic features of speech. Systemics, Cybernetics, and Informatics, 9(4), 24–33.
Zurück zum Zitat Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.CrossRef Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.CrossRef
Zurück zum Zitat Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3, 3603–3607. Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3, 3603–3607.
Zurück zum Zitat Rao, K. S., Prasanna, S. R. M., & Yegnanarayana, B. (2007). Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal Processing Letters, 14, 762–765.CrossRef Rao, K. S., Prasanna, S. R. M., & Yegnanarayana, B. (2007). Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal Processing Letters, 14, 762–765.CrossRef
Zurück zum Zitat Rao, K. S., & Yegnanarayana, B. (2006). Prosody modification using instants of significant excitation. In IEEE transactions on audio and speech, pp. 972–980. Rao, K. S., & Yegnanarayana, B. (2006). Prosody modification using instants of significant excitation. In IEEE transactions on audio and speech, pp. 972–980.
Zurück zum Zitat Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing and Management, 45, 315–328.CrossRef Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing and Management, 45, 315–328.CrossRef
Zurück zum Zitat Rozgic, V., Ananthakrishnan, S., Saleem, S., Kumar, R., Vembu, A. N., & Prasad, R. (2012). Emotion recognition using acoustic and lexical features. In INTERSPEECH, Portland, USA. Rozgic, V., Ananthakrishnan, S., Saleem, S., Kumar, R., Vembu, A. N., & Prasad, R. (2012). Emotion recognition using acoustic and lexical features. In INTERSPEECH, Portland, USA.
Zurück zum Zitat Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178.CrossRef Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178.CrossRef
Zurück zum Zitat Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76, 805–819.CrossRef Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76, 805–819.CrossRef
Zurück zum Zitat Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.CrossRef Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.CrossRef
Zurück zum Zitat Salovey, P., Kokkonen, M., Lopes, P., & Mayer, J. (2004). Emotional Intelligence: What do we know? In ASR Manstead, N. H. Frijda & A. H. Fischer (Eds.), Feelings and emotions: The Amsterdam symposium (pp. 321–340). Cambridge: Cambridge University Press.CrossRef Salovey, P., Kokkonen, M., Lopes, P., & Mayer, J. (2004). Emotional Intelligence: What do we know? In ASR Manstead, N. H. Frijda & A. H. Fischer (Eds.), Feelings and emotions: The Amsterdam symposium (pp. 321–340). Cambridge: Cambridge University Press.CrossRef
Zurück zum Zitat Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399.CrossRef Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399.CrossRef
Zurück zum Zitat Scherer, K. R., Grandjean, D., Johnstone, T., Klasmeyer, G., & Banziger, T. (2002). Acoustic correlates of task load and stress. In Proceedings of international conference on spoken language processing (ICSLP’02), Colorado, Vol. 3, pp. 2017–2020. Scherer, K. R., Grandjean, D., Johnstone, T., Klasmeyer, G., & Banziger, T. (2002). Acoustic correlates of task load and stress. In Proceedings of international conference on spoken language processing (ICSLP’02), Colorado, Vol. 3, pp. 2017–2020.
Zurück zum Zitat Schroder, M. (2000). Experimental study of affect bursts. In Proceedings of ISCA workshop speech and emotion, Vol. 1, pp. 132–137. Schroder, M. (2000). Experimental study of affect bursts. In Proceedings of ISCA workshop speech and emotion, Vol. 1, pp. 132–137.
Zurück zum Zitat Schroder, M., & Grice, M. (2003). Expressing vocal effort in concatenative synthesis. In Proceedings of international conference on phonetic sciences (ICPhS’03), Barcelona, pp. 2589–2592. Schroder, M., & Grice, M. (2003). Expressing vocal effort in concatenative synthesis. In Proceedings of international conference on phonetic sciences (ICPhS’03), Barcelona, pp. 2589–2592.
Zurück zum Zitat Schubert, E. (1999). Measurement and time series analysis of emotion in music, Ph.D dissertation, school of Music education, University of New South Wales, Sydeny, Australia. Schubert, E. (1999). Measurement and time series analysis of emotion in music, Ph.D dissertation, school of Music education, University of New South Wales, Sydeny, Australia.
Zurück zum Zitat Schuller, B., Rigoll, G., & Lang, M. (2003). Hidden Markov model based speech emotion recognition. In Proceedings of the International conference on multimedia and Expo, ICME. Schuller, B., Rigoll, G., & Lang, M. (2003). Hidden Markov model based speech emotion recognition. In Proceedings of the International conference on multimedia and Expo, ICME.
Zurück zum Zitat Schuller, B., Rigoll, G., & Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistis information in a hybrid support vector machine-belief network architecture. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP’04), Vol. 1, pp. 557–560. Schuller, B., Rigoll, G., & Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistis information in a hybrid support vector machine-belief network architecture. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP’04), Vol. 1, pp. 557–560.
Zurück zum Zitat Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.CrossRef Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.CrossRef
Zurück zum Zitat Slaney, M., & McRoberts, G. (2003). Babyears: A recognition system for affective vocalizations. Speech Comunnication, 39, 367–384.CrossRefMATH Slaney, M., & McRoberts, G. (2003). Babyears: A recognition system for affective vocalizations. Speech Comunnication, 39, 367–384.CrossRefMATH
Zurück zum Zitat Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In Proceedings of IEEE international conference ICASSP, pp. 5180–5184. Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In Proceedings of IEEE international conference ICASSP, pp. 5180–5184.
Zurück zum Zitat Sun, R., & Moore, E. (2011). Investigating glottal parameters and teager energy operators in emotion recognition. In Affective Computing and Intelligent Interaction, pp. 425–434. Sun, R., & Moore, E. (2011). Investigating glottal parameters and teager energy operators in emotion recognition. In Affective Computing and Intelligent Interaction, pp. 425–434.
Zurück zum Zitat Takahashi, K. (2004). Remarks on SVM-based emotion recognition from multi-modal bio-potential signals. In 13th IEEE international workshop on robot and human interactive communication, Roman. Takahashi, K. (2004). Remarks on SVM-based emotion recognition from multi-modal bio-potential signals. In 13th IEEE international workshop on robot and human interactive communication, Roman.
Tao, J., & Kang, Y. (2005). Features importance analysis for emotional speech classification. In Affective computing and intelligent interaction, pp. 449–457.
Tato, R., Santos, R., Kompe, R., & Pardo, J. M. (2002). Emotional space improves emotion recognition. In Proceedings of international conference on spoken language processing (ICSLP’02), Colorado, Vol. 3, pp. 2029–2032.
Tomkins, S. (1962). Affect, imagery, and consciousness: The positive affects, Vol. 1. New York: Springer.
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features and methods. Speech Communication, 48, 1162–1181.
Ververidis, D., Kotropoulos, C., & Pitas, I. (2004). Automatic emotional speech classification. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP’04), Montreal, Vol. 1, pp. 593–596.
Vidrascu, L., & Devillers, L. (2005). Detection of real-life emotions in call centers. In INTERSPEECH, Lisbon, Portugal, pp. 1841–1844.
Vogt, T., & André, E. (2006). Improving automatic emotion recognition from speech via gender differentiation. In Proceedings of language resources and evaluation conference (LREC 2006), Genoa.
Wakita, H. (1976). Residual energy of linear prediction applied to vowel and speaker recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24, 270–271.
Wang, K., An, N., Li, B. N., Zhang, Y., & Li, L. (2015). Speech emotion recognition using Fourier parameters. IEEE Transactions on Affective Computing, 6(1), 69–75.
Wang, Y., Du, S., & Zhan, Y. (2008). Adaptive and optimal classification of speech emotion recognition. In Fourth international conference on natural computation, pp. 407–411.
Wang, Y., & Guan, L. (2004). An investigation of speech based human emotion recognition. In IEEE 6th workshop on multimedia signal processing.
Werner, S., & Keller, E. (1994). Prosodic aspects of speech. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition: Basic concepts, state of the art, the future challenges (pp. 23–40). Chichester: Wiley.
Wu, S., Falk, T. H., & Chan, W.-Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.
Wu, T., Yang, Y., Wu, Z., & Li, D. (2006). MASC: A speech corpus in Mandarin for emotion analysis and affective speaker recognition. In Speaker and language recognition workshop (IEEE Odyssey).
Wu, W., Zheng, T. F., Xu, M.-X., & Bao, H.-J. (2006). Study on speaker verification on emotional speech. In INTERSPEECH’06, Pittsburgh, Pennsylvania, pp. 2102–2105.
Wundt, W. (2013). An introduction to psychology. Read Books Ltd.
Yamagishi, J., Onishi, K., Masuko, T., & Kobayashi, T. (2003). Emotion recognition using a data-driven fuzzy inference system. In Eurospeech, Geneva.
Yegnanarayana, B., & Gangashetty, S. (2011). Epoch-based analysis of speech signals. Sādhanā, 36(5), 651–697.
Yegnanarayana, B., Swamy, R. K., & Murty, K. S. R. (2009). Determining mixing parameters from multispeaker data using speech-specific information. IEEE Transactions on Audio, Speech, and Language Processing, 17(6), 1196–1207.
Yeh, L., & Chi, T. (2010). Spectro-temporal modulations for robust speech emotion recognition. In INTERSPEECH, Chiba, Japan, pp. 789–792.
Yildirim, S., Bulut, M., Lee, C. M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., & Narayanan, S. (2004). An acoustic study of emotions expressed in speech. In Proceedings of international conference on spoken language processing (ICSLP’04), Korea, Vol. 1, pp. 2193–2196.
You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (1997). Getting started with SUSAS: A speech under simulated and actual stress database. In Eurospeech, Vol. 4, pp. 1743–1746.
Yu, F., Chang, E., Xu, Y.-Q., & Shum, H.-Y. (2001). Emotion detection from speech to enrich multimedia content. In Proceedings of IEEE Pacific-Rim conference on multimedia, Beijing, Vol. 1, pp. 550–557.
Yuan, J., Shen, L., & Chen, F. (2002). The acoustic realization of anger, fear, joy and sadness in Chinese. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 3, pp. 2025–2028.
Zhang, S. (2008). Emotion recognition in Chinese natural speech by combining prosody and voice quality features. In Sun et al. (Eds.), Advances in neural networks, Lecture notes in computer science (pp. 457–464). Berlin: Springer.
Zhang, T., Hasegawa-Johnson, M., & Levinson, S. E. (2004). Children’s emotion recognition in an intelligent tutoring scenario. In Proceedings of the eighth European conference on speech communication and technology, INTERSPEECH.
Zhu, A., & Luo, Q. (2007). Study on speech emotion recognition system in E-learning. In J. Jacko (Ed.), Human computer interaction, Part III, HCII (pp. 544–552). Berlin: Springer.
Metadata
Title
Databases, features and classifiers for speech emotion recognition: a review
Authors
Monorama Swain
Aurobinda Routray
P. Kabisatpathy
Publication date
19.01.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9491-z


