2018 | OriginalPaper | Book chapter

Combined Feature Representation for Emotion Classification from Russian Speech

Authors: Oxana Verkholyak, Alexey Karpov

Published in: Artificial Intelligence and Natural Language

Publisher: Springer International Publishing

Abstract

Acoustic features for emotion classification can be extracted at different levels. Frame-level features are low-level descriptors that preserve the temporal structure of the utterance. Utterance-level features, by contrast, are functionals applied to the low-level descriptors and carry important information about the speaker's emotional state. They are particularly useful for determining emotion intensity, but they lose information about temporal changes in the signal. A further drawback is that the number of feature vectors is often insufficient for complex classification tasks. One way to overcome these problems is to combine frame-level and utterance-level features and so take advantage of both. This paper proposes obtaining a low-level feature representation by feeding frame-level descriptor sequences to a Long Short-Term Memory (LSTM) network, combining the output with a Principal Component Analysis (PCA) representation of the utterance-level features, and making the final prediction with a logistic regression classifier.
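
The difference between the two feature levels can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `utterance_functionals`, the choice of four functionals (mean, standard deviation, minimum, maximum), and the toy pitch/energy values are all assumptions made for illustration. The sketch shows that two utterances with opposite temporal contours collapse to the same utterance-level vector, which is the loss of temporal information the abstract refers to.

```python
import math
import statistics


def utterance_functionals(frames):
    """Collapse a frame sequence into one fixed-length utterance-level vector.

    `frames` is a list of frames, each a list of low-level descriptor values.
    For every descriptor we compute mean, population standard deviation,
    minimum and maximum over all frames (an illustrative choice of functionals).
    """
    features = []
    for values in zip(*frames):  # iterate descriptor by descriptor
        features.extend([
            statistics.fmean(values),
            statistics.pstdev(values),
            min(values),
            max(values),
        ])
    return features


# Hypothetical frame-level descriptors: [pitch, energy] per frame.
rising = [[100.0, 1.0], [120.0, 2.0], [140.0, 3.0], [160.0, 4.0]]
falling = list(reversed(rising))  # same frames, opposite temporal contour

rising_vec = utterance_functionals(rising)
falling_vec = utterance_functionals(falling)

# The frame sequences differ, but the functionals cannot tell them apart.
same_functionals = all(
    math.isclose(a, b) for a, b in zip(rising_vec, falling_vec)
)
print(len(rising_vec))      # 2 descriptors x 4 functionals = 8 values
print(rising != falling)    # True: temporal order differs
print(same_functionals)     # True: utterance-level vectors coincide
```

A sequence model such as the LSTM named in the abstract reads the frames in order and can therefore separate the two contours, which is the motivation for combining both representations.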

Metadata
Title
Combined Feature Representation for Emotion Classification from Russian Speech
Authors
Oxana Verkholyak
Alexey Karpov
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-71746-3_6