
2018 | Original Paper | Book Chapter

Gender-Aware CNN-BLSTM for Speech Emotion Recognition

Authors: Linjuan Zhang, Longbiao Wang, Jianwu Dang, Lili Guo, Qiang Yu

Published in: Artificial Neural Networks and Machine Learning – ICANN 2018

Publisher: Springer International Publishing


Abstract

Gender information has been widely used to improve the performance of speech emotion recognition (SER) because men and women differ in how they express emotion. However, conventional methods cannot adequately exploit gender information, since they represent gender characteristics with only a fixed integer or a one-hot encoding. To emphasize gender factors for SER, we propose two types of features for our framework: a distributed-gender feature and a gender-driven feature. The distributed-gender feature is constructed to represent both the gender distribution and individual differences, while the gender-driven feature is extracted from acoustic signals through a deep neural network (DNN). Each of these features is then appended to the original spectrogram to serve as input for the decision-making network, a hybrid of a convolutional neural network (CNN) and bi-directional long short-term memory (BLSTM). Compared with the spectrogram alone, adding the distributed-gender feature and the gender-driven feature to the gender-aware CNN-BLSTM improved unweighted accuracy, with relative error reductions of 14.04% and 45.74%, respectively.
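The augmentation described in the abstract, concatenating a per-utterance gender feature onto every frame of the spectrogram, and the relative-error-reduction metric used to report the gains can be sketched as follows. All shapes and the numpy-based pipeline are illustrative assumptions, not the authors' implementation; the feature dimensions here are arbitrary placeholders.

```python
import numpy as np

# Hypothetical shapes: T frames and F frequency bins in the spectrogram,
# plus a G-dimensional gender feature (standing in for the paper's
# distributed-gender or gender-driven feature).
T, F, G = 100, 128, 10
rng = np.random.default_rng(0)
spectrogram = rng.random((T, F))
gender_feature = rng.random(G)  # one vector per utterance

# Augment: tile the gender feature along the time axis and concatenate it
# with the spectrogram, giving a (T, F + G) input for the CNN-BLSTM.
augmented = np.concatenate(
    [spectrogram, np.tile(gender_feature, (T, 1))], axis=1)
assert augmented.shape == (T, F + G)

def relative_error_reduction(base_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's error that the new system removes."""
    return (new_acc - base_acc) / (1.0 - base_acc)

# e.g. going from 60% to 80% accuracy halves the error rate:
print(relative_error_reduction(0.60, 0.80))  # -> 0.5
```

The relative error reduction is reported rather than the raw accuracy gain because it normalizes by how much room for improvement the baseline leaves.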


References
1. Brody, L.R.: Gender differences in emotional development: a review of theories and research. J. Pers. 53(2), 102–149 (1985)
3. Sidorov, M., Ultes, S., Schmitt, A.: Comparison of gender- and speaker-adaptive emotion recognition. In: Language Resources and Evaluation Conference, pp. 3476–3480 (2014)
4. Sidorov, M., Ultes, S., Schmitt, A.: Emotions are a personal thing: towards speaker-adaptive emotion recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4803–4807 (2014)
5. Vogt, T., André, E.: Improving automatic emotion recognition from speech via gender differentiation. In: Language Resources and Evaluation Conference, Genoa (2006)
6. Sidorov, M., Schmitt, A., Semenkin, E., et al.: Could speaker, gender or age awareness be beneficial in speech-based emotion recognition? In: Language Resources and Evaluation Conference (2016)
7. Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 emotion challenge. In: Tenth Annual Conference of the International Speech Communication Association (2009)
8. Hannun, A., Case, C., Casper, J., et al.: Deep speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014)
9. Amodei, D., Ananthanarayanan, S., Anubhai, R., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182 (2016)
10. Lim, W., Jang, D., Lee, T.: Speech emotion recognition using convolutional and recurrent neural networks. In: Signal and Information Processing Association Annual Summit and Conference, Asia-Pacific, pp. 1–4. IEEE (2016)
11. Satt, A., Rozenberg, S., Hoory, R.: Efficient emotion recognition from speech using deep learning on spectrograms. In: Proceedings of INTERSPEECH 2017, pp. 1089–1093 (2017)
12. Guo, L., Wang, L., Dang, J., Zhang, L., Guan, H.: A feature fusion method based on extreme learning machine for speech emotion recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2666–2670 (2018)
14. Grezl, F., Fousek, P.: Optimizing bottle-neck features for LVCSR. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4729–4732 (2008)
15. Petrushin, V.A.: Emotion recognition in speech signal: experimental study, development, and application. In: Sixth International Conference on Spoken Language Processing (2000)
16. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A database of German emotional speech. In: Ninth European Conference on Speech Communication and Technology (2005)
17. Yu, D., et al.: Deep convolutional neural networks with layer-wise context expansion and attention. In: INTERSPEECH, pp. 17–21 (2016)
18. Lee, J., Tashev, I.: High-level feature representation using recurrent neural network for speech emotion recognition. In: INTERSPEECH (2015)
Metadata
Title
Gender-Aware CNN-BLSTM for Speech Emotion Recognition
Authors
Linjuan Zhang
Longbiao Wang
Jianwu Dang
Lili Guo
Qiang Yu
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01418-6_76