
25-08-2023 | Manuscript

Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition

Authors: Adil Chakhtouna, Sara Sekkate, Abdellah Adib

Published in: International Journal of Speech Technology | Issue 3/2023


Abstract

During the difficult period shaped by COVID-19, many aspects of people's lives have been affected, including the economy, tourism and, in particular, the medical field. In healthcare, for example, many people have suffered from psychological and emotional disorders. Speech Emotion Recognition (SER) can therefore help medical teams understand the emotional state of their patients. The central contribution of this research is the creation of new features, called Stationary Mel Frequency Cepstral Coefficients (SMFCC) and Discrete Mel Frequency Cepstral Coefficients (DMFCC), obtained by combining the Multilevel Wavelet Transform (MWT) with conventional MFCC features. The proposed method was evaluated under several settings: within/cross-language, speaker dependency and gender dependency. Recognition rates of \(91.4\%\), \(74.4\%\) and \(80.8\%\) were reached for the EMO-DB (German), RAVDESS (English) and EMOVO (Italian) target databases, respectively, in speaker-dependent (SD) experiments covering both genders (female and male). A final performance matrix is also reported to give additional insight into the model's behavior across the various experiments. The experimental results show that the proposed SER system outperforms previous SER studies.
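The full paper is not accessible from this page, so the precise SMFCC/DMFCC pipeline (wavelet family, decomposition depth, which sub-bands feed the MFCC stage) is not visible here. The sketch below is one plausible reading of the abstract, not the authors' method: MFCCs computed on the approximation band of a multilevel discrete wavelet transform (a DMFCC-style feature) and on the approximation band of a stationary, undecimated wavelet transform (an SMFCC-style feature), using PyWavelets and librosa. The db4 wavelet, 3-level depth and 13 coefficients are illustrative assumptions.

```python
# Hedged sketch of DMFCC/SMFCC-style features; all settings are assumptions.
import numpy as np
import pywt
import librosa

def dmfcc(y, sr, wavelet="db4", level=3, n_mfcc=13):
    # Multilevel DWT: wavedec returns [cA_level, cD_level, ..., cD_1];
    # keep the low-frequency approximation band.
    approx = pywt.wavedec(y, wavelet, level=level)[0]
    # Each DWT level halves the effective sampling rate, so scale sr to match.
    return librosa.feature.mfcc(y=approx.astype(np.float32),
                                sr=sr // 2 ** level, n_mfcc=n_mfcc, n_fft=512)

def smfcc(y, sr, wavelet="db4", level=3, n_mfcc=13):
    # The stationary (undecimated) WT needs a length divisible by 2**level.
    n = len(y) - len(y) % 2 ** level
    # swt returns [(cA_level, cD_level), ..., (cA_1, cD_1)].
    approx = pywt.swt(y[:n], wavelet, level=level)[0][0]
    # No decimation in the SWT, so the original sampling rate still applies.
    return librosa.feature.mfcc(y=approx.astype(np.float32),
                                sr=sr, n_mfcc=n_mfcc)

# Quick check on a synthetic 2-second tone.
sr = 16000
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)
print(dmfcc(y, sr).shape, smfcc(y, sr).shape)  # (13, T1) (13, T2)
```

In a setup like this, the resulting coefficient matrices would typically be summarized (e.g., mean/variance over frames) and concatenated with plain MFCCs before classification; how the paper actually fuses them is not stated in the abstract.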


Metadata
Title
Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition
Authors
Adil Chakhtouna
Sara Sekkate
Abdellah Adib
Publication date
25-08-2023
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2023
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-023-10038-9
