Published in: International Journal of Speech Technology 3/2018

04.08.2018

Robust emotion recognition from speech: Gamma tone features and models

Authors: A. Revathi, N. Sasikaladevi, R. Nagakrishnan, C. Jeyalakshmi

Abstract

Affective computing is gaining paramount importance in ensuring better and more effective human–machine interaction. Since glottal and speech signals carry the emotional characteristics of the speaker in addition to the linguistic information, a speaker's emotions must be recognised for the system to give a meaningful response. This paper emphasises the effectiveness and efficiency of selecting energy features by passing speech through gammatone filters spaced on the equivalent rectangular bandwidth (ERB), MEL and BARK scales. Various modelling techniques are used to develop a robust, multi-speaker-independent emotion/stress recognition system. Since the Berlin EMO-DB database and the SAVEE emotional audio-visual database used in this work contain only a limited set of utterances, spoken by 10 and 4 actors respectively in different emotions, improving the performance of the stress/emotion recognition system is challenging. Speaker-independent emotion recognition is performed by extracting gammatone energy and cepstral features from the concatenated training speech passed through gammatone filters spaced on the ERB, MEL and BARK scales. Subsequently, VQ/fuzzy clustering models and continuous-density hidden Markov models are created for all emotions, and evaluation is done with the utterances of a speaker excluded from training. The proposed features for the test utterances are extracted and applied to the VQ/fuzzy/MHMM/SVM models, and testing is performed using the minimum-distance or maximum log-likelihood criterion. The proposed gammatone energy/cepstral features and modelling techniques provide complementary evidence in assessing the performance of the system.
This algorithm achieves weighted accuracy recall of 96%, 79% and 95.3% for the stress recognition system, with classification on emotion-specific group VQ/fuzzy/MHMM/SVM models using GTF energy features and gammatone filters spaced on the ERB, MEL and BARK scales respectively, evaluated on the EMO-DB database. Weighted accuracy recall is 91%, 93% and 94% for the same classification on emotion-specific group models evaluated on utterances chosen from the SAVEE database. Gammatone cepstral features provide overall accuracies of 92%, 90% and 92% for filters spaced on the ERB, MEL and BARK scales on Berlin EMO-DB. Decision-level fusion classification based on GTF energy features and modelling techniques provides an overall accuracy of 99.8% for the EMO-DB database and 100% for the SAVEE database.
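The back end combines two ideas from the abstract: minimum-distance classification against per-emotion VQ codebooks, and decision-level fusion of the individual systems' labels. A hedged sketch under assumed shapes (each codebook is an array of codewords; fusion is taken here as a simple majority vote, one plausible reading of "decision level fusion"):

```python
import numpy as np
from collections import Counter

def vq_distance(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def classify_vq(features, codebooks):
    """Minimum-distance criterion: pick the emotion whose codebook
    yields the smallest average distortion for the test utterance."""
    return min(codebooks, key=lambda emo: vq_distance(features, codebooks[emo]))

def fuse_decisions(labels):
    """Decision-level fusion: majority vote over per-system labels."""
    return Counter(labels).most_common(1)[0][0]
```

For the HMM-based systems, `classify_vq` would be replaced by a maximum log-likelihood decision over per-emotion models; `fuse_decisions` then combines the labels from the ERB-, MEL- and BARK-scale systems.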


Metadata
Title
Robust emotion recognition from speech: Gamma tone features and models
Authors
A. Revathi
N. Sasikaladevi
R. Nagakrishnan
C. Jeyalakshmi
Publication date
04.08.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9546-1
