10.04.2024

Survey on Arabic speech emotion recognition

Authors: Latifa Iben Nasr, Abir Masmoudi, Lamia Hadrich Belguith

Published in: International Journal of Speech Technology

Abstract

Emotions are a fundamental aspect of evaluating user satisfaction and collecting customer feedback, both in human interactions and in human–computer interaction (HCI) technologies. Moreover, as human beings, we possess a distinctive capacity for communication through spoken language. Speech Emotion Recognition (SER) has recently garnered substantial interest and gained significant traction within Natural Language Processing (NLP). Its primary objective is to identify emotions, such as sadness, neutrality, and anger, from speech audio using a diverse array of classifiers. This paper conducts a comprehensive critical analysis of existing Arabic SER studies, examines the performance and limitations of these previous works, and highlights current promising trends for improving speech emotion recognition methods. To the best of our knowledge, this research is a pioneering contribution to the SER field as the first review of existing Arabic studies.
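The pipeline the abstract describes — extract acoustic features from speech audio, then feed them to a classifier — can be sketched in a few lines. Everything below is an illustrative toy, not the paper's method: the "utterances" are synthetic noisy tones, the features are simple hand-rolled prosodic measures (log energy, zero-crossing rate, spectral centroid), and the two emotion labels are invented. Real SER systems typically use richer features such as MFCCs, but the structure — features in, labels out via a classifier such as an SVM — is the same.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
SR = 16000  # sample rate in Hz

def synth_utterance(pitch_hz, amp):
    """Toy stand-in for a recorded utterance: a noisy tone whose
    pitch and amplitude loosely mimic arousal differences."""
    t = np.arange(SR) / SR
    return amp * np.sin(2 * np.pi * pitch_hz * t) + 0.05 * rng.standard_normal(SR)

def features(x):
    """Simple prosodic-style features (real systems use MFCCs etc.)."""
    energy = np.log(np.mean(x ** 2) + 1e-12)            # log frame energy
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2      # zero-crossing rate
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / SR)
    centroid = np.sum(freqs * spec) / np.sum(spec)      # spectral centroid
    return [energy, zcr, centroid]

# Simulate two emotion classes: "angry" (higher pitch/energy) vs "neutral".
X, y = [], []
for _ in range(40):
    X.append(features(synth_utterance(rng.uniform(220, 300), 1.0))); y.append("angry")
    X.append(features(synth_utterance(rng.uniform(100, 160), 0.3))); y.append("neutral")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.2f}")
```

Because the synthetic classes differ strongly in energy, the classifier separates them easily; the point is the pipeline shape, not the score.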


Metadata
Title
Survey on Arabic speech emotion recognition
Authors
Latifa Iben Nasr
Abir Masmoudi
Lamia Hadrich Belguith
Publication date
10.04.2024
Publisher
Springer US
Published in
International Journal of Speech Technology
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-024-10088-7