2023 | OriginalPaper | Chapter

Analysis of Speech Emotion Recognition Using Deep Learning Algorithm

Authors: Rathnakar Achary, Manthan S. Naik, Tirth K. Pancholi

Published in: Intelligent Communication Technologies and Virtual Mobile Networks

Publisher: Springer Nature Singapore

Abstract

In this project, we propose an automated system for speech emotion recognition using a convolutional neural network (CNN). The system uses a five-layer CNN model, which is trained and tested on over 7000 speech samples stored as .wav files. The data required for the analysis is gathered from the RAVDESS dataset, which consists of speech and song samples from 24 male and female actors, performed in different emotions and at different intensities. Different CNN models were trained and tested on the RAVDESS dataset until the required accuracy was reached. The algorithm then classifies a given input audio file in .wav format into one of a range of emotions. Performance is evaluated by the model's accuracy and its validation accuracy, and the algorithm must also have minimum loss. The experimental results give an accuracy of about 99.8% and a validation accuracy of 93.33% when the five-layer model is applied to the dataset. We obtain a model accuracy of 92.65%.
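To make the labeling step concrete, the sketch below shows one way the emotion, intensity, and actor labels could be read from RAVDESS file names before any model is trained. It relies on the file-naming convention published with the RAVDESS dataset (seven two-digit fields separated by hyphens), not on anything stated in the chapter; the names `RavdessLabel` and `parse_ravdess_name` are hypothetical and are not the authors' code.

```python
from pathlib import PurePath
from typing import NamedTuple

# Code tables from the public RAVDESS naming convention (assumed, not from the chapter).
VOCAL_CHANNELS = {"01": "speech", "02": "song"}
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITIES = {"01": "normal", "02": "strong"}


class RavdessLabel(NamedTuple):
    """Labels recovered from a single RAVDESS file name (hypothetical helper type)."""
    vocal_channel: str
    emotion: str
    intensity: str
    actor: int
    actor_sex: str


def parse_ravdess_name(path: str) -> RavdessLabel:
    """Split a name such as '03-01-06-01-02-01-12.wav' into its labels."""
    fields = PurePath(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(fields)}: {path}")
    # Field order: modality, vocal channel, emotion, intensity,
    # statement, repetition, actor. Only four of them are used here.
    _modality, channel, emotion, intensity, _statement, _repetition, actor = fields
    actor_id = int(actor)
    return RavdessLabel(
        vocal_channel=VOCAL_CHANNELS[channel],
        emotion=EMOTIONS[emotion],
        intensity=INTENSITIES[intensity],
        actor=actor_id,
        # In RAVDESS, odd-numbered actors are male and even-numbered actors are female.
        actor_sex="male" if actor_id % 2 == 1 else "female",
    )


if __name__ == "__main__":
    print(parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav"))
```

The emotion field would serve as the classification target for a CNN, while the actor and intensity fields could be used to build actor-independent training and validation splits, if that is desired.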

Title: Analysis of Speech Emotion Recognition Using Deep Learning Algorithm
Authors: Rathnakar Achary, Manthan S. Naik, Tirth K. Pancholi
Publisher: Springer Nature Singapore
Book: Intelligent Communication Technologies and Virtual Mobile Networks
Print ISBN: 978-981-19-1843-8
Electronic ISBN: 978-981-19-1844-5

Copyright Year: 2023
DOI: https://doi.org/10.1007/978-981-19-1844-5_42
