Published in: Wireless Personal Communications 2/2024

29.03.2024

An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques

Authors: Mohammed Jawad Al-Dujaili Al-Khazraji, Abbas Ebrahimi-Moghadam


Abstract

Speech is one of the primary modes of human communication, and one of its important functions is to convey the speaker's inner feelings to the listener: speech expressed by a speaker carries that person's emotions along with its verbal content. Speech Emotion Recognition (SER) is therefore a key problem in human–machine interaction, and the growing use of computers in everyday life has made this interaction the subject of extensive research. This article examines SER in English and Persian. Time-frequency features, namely Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Perceptual Linear Prediction (PLP) coefficients, are extracted from the data as feature vectors; these features are then fused and a suitable subset is selected. Principal Component Analysis (PCA) is used to reduce dimensionality and eliminate redundancy while retaining most of the intrinsic information content of the pattern. Each emotional state is then classified using the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) techniques. Combining MFCC + PLP features with PCA and HMM classification achieves the best average recognition rate on the English database, with a precision of 88.85% and a runtime of 0.3 s; likewise, PLP features with PCA and HMM classification achieve the best average recognition rate on the Persian database, with a precision of 90.21% and a runtime of 0.4 s. Based on the combination of features and classifiers, the experimental results demonstrate that the proposed approach attains stable, high detection performance for every emotional state.
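To make the described pipeline concrete, the sketch below shows the three stages the abstract names: per-frame spectral feature extraction, PCA dimensionality reduction, and per-emotion GMM/HMM models scored by log-likelihood. It is a minimal illustration, not the authors' implementation: it assumes the librosa, scikit-learn, and hmmlearn libraries, uses MFCCs alone to stand in for the fused MFCC/LPC/PLP vectors (librosa ships no PLP extractor), and all model sizes and helper names (extract_features, fit_pca, train_models, classify) are illustrative choices.

```python
# Minimal SER pipeline sketch: MFCC features -> PCA -> per-emotion GMM or HMM.
# Assumptions: librosa, scikit-learn, hmmlearn installed; all parameters illustrative.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from hmmlearn.hmm import GaussianHMM

def extract_features(path, sr=16000, n_mfcc=13):
    """Return per-frame MFCC vectors of shape (T, n_mfcc) for one utterance."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def fit_pca(train_utterances, variance=0.95):
    """Fit PCA on the pooled training frames, keeping 95% of the variance."""
    pca = PCA(n_components=variance, svd_solver="full")
    pca.fit(np.vstack(train_utterances))
    return pca

def train_models(features_by_emotion, pca, use_hmm=True):
    """Train one generative model per emotional state."""
    models = {}
    for emotion, utterances in features_by_emotion.items():
        X = np.vstack([pca.transform(f) for f in utterances])
        if use_hmm:
            lengths = [len(f) for f in utterances]  # frames per utterance
            model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
            model.fit(X, lengths)
        else:
            model = GaussianMixture(n_components=8, covariance_type="diag",
                                    random_state=0)
            model.fit(X)
        models[emotion] = model
    return models

def classify(models, pca, features):
    """Label an utterance with the emotion whose model scores it highest."""
    X = pca.transform(features)
    return max(models, key=lambda emotion: models[emotion].score(X))
```

The one-model-per-emotion design follows the usual GMM/HMM recognition recipe: each model learns the frame (GMM) or frame-sequence (HMM) distribution of its class, and a test utterance receives the label of the maximum-likelihood model.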


Metadata
Title
An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques
Authors
Mohammed Jawad Al-Dujaili Al-Khazraji
Abbas Ebrahimi-Moghadam
Publication date
29.03.2024
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 2/2024
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-024-10918-6
