
2018 | Original Paper | Book Chapter

Gender-Aware CNN-BLSTM for Speech Emotion Recognition

Authors: Linjuan Zhang, Longbiao Wang, Jianwu Dang, Lili Guo, Qiang Yu

Published in: Artificial Neural Networks and Machine Learning – ICANN 2018

Publisher: Springer International Publishing


Abstract

Gender information has been widely used to improve the performance of speech emotion recognition (SER) because men and women differ in how they express emotion. However, conventional methods cannot adequately exploit gender information, since they represent gender characteristics with only a fixed integer or a one-hot encoding. To emphasize gender factors for SER, we propose two types of features for our framework: a distributed-gender feature and a gender-driven feature. The distributed-gender feature is constructed to represent both the gender distribution and individual differences, while the gender-driven feature is extracted from acoustic signals through a deep neural network (DNN). Each of these features is then appended to the original spectrogram to serve as input for the decision-making network, a hybrid of a convolutional neural network (CNN) and bi-directional long short-term memory (BLSTM). Compared with the spectrogram alone, adding the distributed-gender feature and the gender-driven feature to the gender-aware CNN-BLSTM improved unweighted accuracy, with relative error reductions of 14.04% and 45.74%, respectively.
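The augmentation described in the abstract, concatenating a per-utterance gender feature onto every frame of the spectrogram, and the relative-error-reduction metric used to report the gains can be sketched as follows. All shapes and the numpy-based pipeline are illustrative assumptions, not the authors' implementation; the feature dimensions here are arbitrary placeholders.

```python
import numpy as np

# Hypothetical shapes: T frames and F frequency bins in the spectrogram,
# plus a G-dimensional gender feature (standing in for the paper's
# distributed-gender or gender-driven feature).
T, F, G = 100, 128, 10
rng = np.random.default_rng(0)
spectrogram = rng.random((T, F))
gender_feature = rng.random(G)  # one vector per utterance

# Augment: tile the gender feature along the time axis and concatenate it
# with the spectrogram, giving a (T, F + G) input for the CNN-BLSTM.
augmented = np.concatenate(
    [spectrogram, np.tile(gender_feature, (T, 1))], axis=1)
assert augmented.shape == (T, F + G)

def relative_error_reduction(base_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's error that the new system removes."""
    return (new_acc - base_acc) / (1.0 - base_acc)

# e.g. going from 60% to 80% accuracy halves the error rate:
print(relative_error_reduction(0.60, 0.80))  # -> 0.5
```

The relative error reduction is reported rather than the raw accuracy gain because it normalizes by how much room for improvement the baseline leaves.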


References
1. Brody, L.R.: Gender differences in emotional development: a review of theories and research. J. Pers. 53(2), 102–149 (1985)
3. Sidorov, M., Ultes, S., Schmitt, A.: Comparison of gender- and speaker-adaptive emotion recognition. In: Language Resources and Evaluation Conference, pp. 3476–3480 (2014)
4. Sidorov, M., Ultes, S., Schmitt, A.: Emotions are a personal thing: towards speaker-adaptive emotion recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4803–4807 (2014)
5. Vogt, T., André, E.: Improving automatic emotion recognition from speech via gender differentiation. In: Language Resources and Evaluation Conference, Genoa (2006)
6. Sidorov, M., Schmitt, A., Semenkin, E., et al.: Could speaker, gender or age awareness be beneficial in speech-based emotion recognition? In: Language Resources and Evaluation Conference (2016)
7. Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 emotion challenge. In: Tenth Annual Conference of the International Speech Communication Association (2009)
8. Hannun, A., Case, C., Casper, J., et al.: Deep speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014)
9. Amodei, D., Ananthanarayanan, S., Anubhai, R., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182 (2016)
10. Lim, W., Jang, D., Lee, T.: Speech emotion recognition using convolutional and recurrent neural networks. In: Signal and Information Processing Association Annual Summit and Conference, Asia-Pacific, pp. 1–4. IEEE (2016)
11. Satt, A., Rozenberg, S., Hoory, R.: Efficient emotion recognition from speech using deep learning on spectrograms. In: Proceedings of INTERSPEECH 2017, pp. 1089–1093 (2017)
12. Guo, L., Wang, L., Dang, J., Zhang, L., Guan, H.: A feature fusion method based on extreme learning machine for speech emotion recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2666–2670 (2018)
14. Grezl, F., Fousek, P.: Optimizing bottle-neck features for LVCSR. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4729–4732 (2008)
15. Petrushin, V.A.: Emotion recognition in speech signal: experimental study, development, and application. In: Sixth International Conference on Spoken Language Processing (2000)
16. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A database of German emotional speech. In: Ninth European Conference on Speech Communication and Technology (2005)
17. Yu, D., et al.: Deep convolutional neural networks with layer-wise context expansion and attention. In: INTERSPEECH, pp. 17–21 (2016)
18. Lee, J., Tashev, I.: High-level feature representation using recurrent neural network for speech emotion recognition. In: INTERSPEECH (2015)
Metadata
Title
Gender-Aware CNN-BLSTM for Speech Emotion Recognition
Authors
Linjuan Zhang
Longbiao Wang
Jianwu Dang
Lili Guo
Qiang Yu
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01418-6_76