
Speech Emotion Recognition Based on BLSTM and CNN Feature Fusion

Published: 10 September 2020

ABSTRACT

Speech emotion recognition (SER) remains challenging because of factors such as the emotional corpus, the choice of acoustic features, and SER modeling. Deep-learning-based SER is often limited to a single spectrogram or to handcrafted features as input, which cannot capture enough of the emotional information. This paper therefore proposes a feature fusion method based on Bidirectional Long Short-Term Memory (BLSTM) and Convolutional Neural Networks (CNN) that learns richer emotional features by combining contextual and spatial features. Statistical features are fed to the BLSTM network to extract the contextual features of the speech signal, while the log-mel spectrogram is fed to the CNN to extract its spatial features, so that emotional features with good recognition performance are learned jointly. Experimental results show that the proposed method achieves a weighted accuracy of 74.14% and an unweighted accuracy of 65.62% on the IEMOCAP data set. In addition, comparison with existing SER methods verifies the effectiveness of the proposed approach.
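The two-branch fusion described in the abstract (BLSTM over statistical features, CNN over the log-mel spectrogram, concatenation, joint classification) can be sketched roughly as below. This is an illustrative reconstruction, not the authors' implementation: all layer sizes, the statistical-feature dimension (34), and the four-class output are assumptions.

```python
import torch
import torch.nn as nn

class FusionSER(nn.Module):
    """Hypothetical sketch of BLSTM + CNN feature fusion for SER.

    Layer widths and input dimensions are illustrative assumptions,
    not values taken from the paper.
    """
    def __init__(self, stat_dim=34, n_classes=4):
        super().__init__()
        # Context branch: BLSTM over frame-level statistical features
        self.blstm = nn.LSTM(stat_dim, 128, batch_first=True,
                             bidirectional=True)
        # Spatial branch: small CNN over the log-mel spectrogram
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Joint classifier over concatenated context + spatial features
        self.fc = nn.Linear(2 * 128 + 32, n_classes)

    def forward(self, stats, logmel):
        # stats: (B, T, stat_dim); logmel: (B, 1, n_mels, n_frames)
        _, (h, _) = self.blstm(stats)
        context = torch.cat([h[0], h[1]], dim=1)   # (B, 256): both directions
        spatial = self.cnn(logmel).flatten(1)      # (B, 32)
        return self.fc(torch.cat([context, spatial], dim=1))

model = FusionSER()
logits = model(torch.randn(2, 100, 34),      # 2 utterances, 100 frames
               torch.randn(2, 1, 64, 100))   # 64 mel bins, 100 frames
print(logits.shape)  # torch.Size([2, 4])
```

The key design point the abstract argues for is that the final linear layer sees both feature views at once, so the branches are trained jointly rather than fused after separate classifiers.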



Published in

ICDSP '20: Proceedings of the 2020 4th International Conference on Digital Signal Processing
June 2020, 383 pages
ISBN: 9781450376877
DOI: 10.1145/3408127

          Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



          Qualifiers

          • research-article
          • Research
          • Refereed limited
