
24-11-2020

Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm

Author: M. Gomathy

Published in: International Journal of Speech Technology | Issue 1/2021


Abstract

Human interactions carry emotional cues that can be used to interpret the emotion expressed by a speaker. Because vocal expressions of emotion vary from one speaker to another, they are easily misinterpreted, and a speech emotion recognizer can be used to determine the emotion actually being expressed. Speech conveys the emotional state of the speaker along with the syntax and semantic content of linguistic sentences, so recognizing human emotion from the speech signal is feasible. Speech emotion recognition is a crucial and challenging task whose performance depends heavily on feature extraction. Determining emotional states from speech signals is difficult for several reasons. The first issue for any speech emotion system is selecting the best features, i.e. those powerful enough to distinguish the various emotions. Differences in language, pronunciation, sentence content, speaking style, and speaker add further difficulty, since they directly alter pitch, energy, and thereby most of the extracted features. Redundant features and high computational cost complicate emotion recognition further. Rather than the words themselves, the vocal changes and the communicative stress placed on the words should be the primary consideration. To address these issues, an Enhanced Cat Swarm Optimization (ECSO) algorithm for feature selection is proposed, one that has not been used in existing speech emotion recognition approaches. The proposed approach achieves excellent performance in terms of accuracy, recognition rate, sensitivity, and specificity.
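The abstract does not describe the ECSO update rules themselves, so the sketch below is only a generic binary cat swarm optimization loop applied to feature selection, not the paper's algorithm. Everything in it is an illustrative assumption: the function name binary_cso_select, the k-NN cross-validation fitness with a small penalty on the number of selected features, the mixture ratio, seeking/tracing parameters, and the synthetic stand-in data from make_classification (real inputs would be acoustic features such as MFCCs, pitch, and energy with emotion labels).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Score a binary feature mask: mean CV accuracy of a k-NN classifier
    on the selected columns, lightly penalised for the number of features."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.mean()   # assumed penalty weight, not from the paper

def binary_cso_select(X, y, n_cats=10, n_iter=20, mixture_ratio=0.2,
                      seeking_copies=5, flip_prob=0.2):
    """Generic binary cat swarm optimization for feature selection (assumed parameters)."""
    n_feat = X.shape[1]
    cats = rng.integers(0, 2, size=(n_cats, n_feat))   # each cat is a binary feature mask
    scores = np.array([fitness(c, X, y) for c in cats])
    best = cats[scores.argmax()].copy()

    for _ in range(n_iter):
        # Randomly assign each cat to tracing or seeking mode.
        tracing = rng.random(n_cats) < mixture_ratio
        for i in range(n_cats):
            if tracing[i]:
                # Tracing mode: copy roughly half of the global best's bits.
                take = rng.random(n_feat) < 0.5
                cats[i][take] = best[take]
            else:
                # Seeking mode: spawn mutated copies and keep the fittest one.
                copies = np.repeat(cats[i][None, :], seeking_copies, axis=0)
                flips = rng.random(copies.shape) < flip_prob
                flips[0] = False                      # keep one unmutated copy
                copies ^= flips
                copy_scores = [fitness(c, X, y) for c in copies]
                cats[i] = copies[int(np.argmax(copy_scores))]
            scores[i] = fitness(cats[i], X, y)
        if scores.max() > fitness(best, X, y):
            best = cats[scores.argmax()].copy()
    return best.astype(bool)

if __name__ == "__main__":
    # Stand-in for acoustic feature vectors and emotion labels.
    X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)
    mask = binary_cso_select(X, y)
    print("selected", mask.sum(), "of", mask.size, "features")
```

Reported metrics such as accuracy, sensitivity, and specificity could then be computed on a held-out split using only the selected columns; the enhancement the paper adds to plain CSO is not specified in the abstract.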


Metadata
Title
Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm
Author
M. Gomathy
Publication date
24-11-2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2021
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09776-x
