
2023 | OriginalPaper | Chapter

Analysis of Speech Emotion Recognition Using Deep Learning Algorithm

Authors: Rathnakar Achary, Manthan S. Naik, Tirth K. Pancholi

Published in: Intelligent Communication Technologies and Virtual Mobile Networks

Publisher: Springer Nature Singapore


Abstract

In this project, we propose an automated system for speech emotion recognition using a convolutional neural network (CNN). The system uses a five-layer CNN model, which is trained and tested on over 7000 speech samples stored as .wav files. The data required for the analysis is gathered from the RAVDESS dataset, which consists of speech and song samples from 24 male and female actors, speaking and singing with different emotions and at different intensities. Different CNN models were trained and tested on the RAVDESS dataset until the required accuracy was reached. The algorithm then classifies a given input audio file in .wav format into one of a range of emotions. Performance is evaluated by the model's accuracy and its validation accuracy; the algorithm must also have minimal loss. Applying the five-layer model to the dataset, the experimental results give an accuracy of about 99.8% and a validation accuracy of 93.33%. We obtain a model accuracy of 92.65%.
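As an illustration of how labels for such a classifier might be prepared, the sketch below decodes a RAVDESS file name into its vocal channel, emotion, intensity, and actor, and computes plain classification accuracy. It is not the authors' code. It assumes the publicly documented RAVDESS naming scheme of seven hyphen-separated two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), and the names `parse_ravdess_name`, `Sample`, and `accuracy` are hypothetical helpers introduced here.

```python
from pathlib import Path
from typing import NamedTuple, Sequence

# Assumption: emotion codes follow the public RAVDESS file-name convention
# (third field of the name). This table is not taken from the chapter itself.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
VOCAL_CHANNELS = {"01": "speech", "02": "song"}
INTENSITIES = {"01": "normal", "02": "strong"}


class Sample(NamedTuple):
    """Labels decoded from one RAVDESS file name (hypothetical container)."""
    vocal_channel: str
    emotion: str
    intensity: str
    actor: int
    actor_sex: str


def parse_ravdess_name(path: str) -> Sample:
    """Decode labels from a name such as '03-01-06-01-02-01-12.wav'."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {path!r}")
    _modality, channel, emotion, intensity, _statement, _repetition, actor = fields
    actor_id = int(actor)
    return Sample(
        vocal_channel=VOCAL_CHANNELS[channel],
        emotion=EMOTIONS[emotion],
        intensity=INTENSITIES[intensity],
        actor=actor_id,
        # In RAVDESS, odd-numbered actors are male and even-numbered are female.
        actor_sex="male" if actor_id % 2 else "female",
    )


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label sequences must be non-empty and of equal length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)
```

The decoded `emotion` field would serve as the class label for each .wav file. The same `accuracy` function can be applied to held-out labels to obtain a validation figure of the kind the abstract reports.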


Metadata
Title: Analysis of Speech Emotion Recognition Using Deep Learning Algorithm
Authors: Rathnakar Achary, Manthan S. Naik, Tirth K. Pancholi
Copyright Year: 2023
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-19-1844-5_42