2021 | OriginalPaper | Chapter

Speech Emotion Recognition Using CNN, k-NN, MLP and Random Forest

Authors: Jasmeet Kaur, Anil Kumar

Published in: Computer Networks and Inventive Communication Technologies

Publisher: Springer Nature Singapore

Abstract

Emotion recognition from speech has become an active research topic. This paper describes several methods for recognizing emotions from speech signals using four machine learning algorithms: k-nearest neighbour (k-NN), multi-layer perceptron (MLP), convolutional neural network (CNN) and random forest. Short-time Fourier transform (STFT) spectrograms and mel-frequency cepstral coefficients (MFCC) were extracted from the Berlin database of emotional speech. The spectrograms were used as input to the CNN, while the MFCC features were input to k-NN, MLP and random forest. Each classifier gave satisfactory results in classifying seven emotions (happy, sad, angry, neutral, disgust, boredom and fear), but the MLP classifier performed best, with an overall accuracy of 90.36%. A comparison of the performance of these classification algorithms is also presented.
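
The abstract names the two feature types but not the extraction settings, so the following is only a minimal sketch of how a spectrogram and MFCC features could be computed, not the authors' implementation. The 16 kHz sampling rate, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients and all function names are assumptions chosen for illustration, and a synthetic 440 Hz tone stands in for a recording from the database.

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160, n_fft=512):
    """Magnitude STFT: one row per frame, one column per frequency bin."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters with centres spaced evenly on the mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(spectrogram, sr, n_fft=512, n_mels=26, n_coeffs=13):
    """MFCCs from a magnitude spectrogram: mel energies, log, then DCT-II."""
    mel_energy = (spectrogram ** 2) @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    n = np.arange(n_mels)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # unnormalised DCT-II
    return log_mel @ dct_basis.T

# Synthetic stand-in for one utterance: 1 s of a 440 Hz tone at 16 kHz (assumed rate).
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(signal)           # shape (98, 257): image-like input for a CNN
coeffs = mfcc(spec, sr)                 # shape (98, 13): frame-level MFCCs
utterance_vector = coeffs.mean(axis=0)  # shape (13,): fixed-length input for k-NN, MLP, random forest
```

Averaging the frame-level coefficients is one simple way to obtain a fixed-length vector per utterance; whether the paper pools MFCCs this way is not stated in the abstract.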

DOI: https://doi.org/10.1007/978-981-15-9647-6_39