Published in: International Journal of Speech Technology 3/2018

04.08.2018

Robust emotion recognition from speech: Gamma tone features and models

Authors: A. Revathi, N. Sasikaladevi, R. Nagakrishnan, C. Jeyalakshmi

Abstract

Affective computing is gaining paramount importance in ensuring better and more effective human–machine interaction. Since glottal and speech signals carry the emotional characteristics of the speaker in addition to the linguistic information, a speaker's emotions must be recognised for the system to give a meaningful response. This paper emphasises the effectiveness and efficiency of selecting energy features by passing speech through gammatone filters spaced on the equivalent rectangular bandwidth (ERB), MEL and BARK scales. Various modelling techniques are used to develop a robust, multi-speaker-independent emotion/stress recognition system. Since the Berlin EMO-DB database and the SAVEE emotional audio-visual database used in this work contain only a limited set of utterances, spoken by 10 and 4 actors respectively in different emotions, improving the performance of the stress/emotion recognition system is challenging. Speaker-independent emotion recognition is performed by extracting gammatone energy and cepstral features from the concatenated training speech passed through gammatone filters spaced on the ERB, MEL and BARK scales. Subsequently, VQ/fuzzy clustering models and continuous-density hidden Markov models are created for all emotions, and evaluation is done with the utterances of a speaker excluded from training. The proposed features for the test utterances are extracted and applied to the VQ/fuzzy/MHMM/SVM models, and testing is performed using the minimum-distance or maximum log-likelihood criterion. The proposed gammatone energy/cepstral features and modelling techniques provide complementary evidence in assessing the performance of the system.
This algorithm achieves weighted accuracy recall of 96%, 79% and 95.3% for the stress recognition system, with classification on emotion-specific group VQ/fuzzy/MHMM/SVM models using GTF energy features and gammatone filters spaced on the ERB, MEL and BARK scales respectively, evaluated on the EMO-DB database. Weighted accuracy recall is 91%, 93% and 94% for the same classification on emotion-specific group models evaluated on utterances chosen from the SAVEE database. Gammatone cepstral features provide overall accuracies of 92%, 90% and 92% for filters spaced on the ERB, MEL and BARK scales on Berlin EMO-DB. Decision-level fusion classification based on GTF energy features and modelling techniques provides an overall accuracy of 99.8% for the EMO-DB database and 100% for the SAVEE database.
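The back end combines two ideas from the abstract: minimum-distance classification against per-emotion VQ codebooks, and decision-level fusion of the individual systems' labels. A hedged sketch under assumed shapes (each codebook is an array of codewords; fusion is taken here as a simple majority vote, one plausible reading of "decision level fusion"):

```python
import numpy as np
from collections import Counter

def vq_distance(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def classify_vq(features, codebooks):
    """Minimum-distance criterion: pick the emotion whose codebook
    yields the smallest average distortion for the test utterance."""
    return min(codebooks, key=lambda emo: vq_distance(features, codebooks[emo]))

def fuse_decisions(labels):
    """Decision-level fusion: majority vote over per-system labels."""
    return Counter(labels).most_common(1)[0][0]
```

For the HMM-based systems, `classify_vq` would be replaced by a maximum log-likelihood decision over per-emotion models; `fuse_decisions` then combines the labels from the ERB-, MEL- and BARK-scale systems.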


Metadata
Title
Robust emotion recognition from speech: Gamma tone features and models
Authors
A. Revathi
N. Sasikaladevi
R. Nagakrishnan
C. Jeyalakshmi
Publication date
04.08.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9546-1
