08.08.2024

Novel SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Authors: Aron Ritesh, Indra Kiran Sigicharla, Chirag Periwal, Mohanaprasad Kothandaraman, P. S. Nithya Darisini, Sourabh Tiwari, Shivani Arora

Published in: Circuits, Systems, and Signal Processing | Issue 12/2024


Abstract

The interpretation of human voices is important across many applications. This study addresses the prediction of age, gender, and emotion from vocal cues, a field with broad applications. Advances in voice-analysis technology span many domains, from improving customer interactions to enhancing healthcare and retail experiences. Discerning emotion aids mental-health care, while age and gender detection are vital in many contexts. This paper explores deep learning models for these predictions, comparing single-output, multi-output, and sequential models. Sourcing suitable data posed challenges, leading to the amalgamation of the CREMA-D and EMO-DB datasets. Prior work showed promise on the individual prediction tasks, but little research has considered all three variables simultaneously. This paper identifies flaws in the individual-model approach and advocates a novel multi-output learning architecture, the Speech-based Emotion, Gender and Age Analysis (SEGAA) model. The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between the variables and the speech inputs while achieving improved runtime.
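To make the single-output versus multi-output contrast concrete: a multi-output model shares one feature-extraction trunk and attaches a separate prediction head per target, so one forward pass yields all three labels instead of three separate model runs. The following is a minimal Keras sketch of that pattern; the MFCC-style input shape, layer sizes, and class counts are illustrative assumptions, not the SEGAA architecture from the paper.

```python
# Minimal sketch of a multi-output speech model (shared trunk, three heads).
# Feature shape, layer sizes, and class counts are illustrative assumptions,
# not the SEGAA architecture described in the paper.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

N_FRAMES, N_MFCC = 200, 40          # assumed MFCC feature grid per utterance
N_EMOTIONS, N_AGE_GROUPS = 6, 4     # e.g., CREMA-D covers six emotions

inputs = layers.Input(shape=(N_FRAMES, N_MFCC))
x = layers.Conv1D(64, 5, activation="relu")(inputs)
x = layers.MaxPooling1D(2)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)
shared = layers.Dense(128, activation="relu")(x)   # trunk shared by all tasks

# One head per target variable.
emotion = layers.Dense(N_EMOTIONS, activation="softmax", name="emotion")(shared)
gender = layers.Dense(1, activation="sigmoid", name="gender")(shared)
age = layers.Dense(N_AGE_GROUPS, activation="softmax", name="age")(shared)

model = Model(inputs, [emotion, gender, age])
model.compile(
    optimizer="adam",
    loss={
        "emotion": "sparse_categorical_crossentropy",
        "gender": "binary_crossentropy",
        "age": "sparse_categorical_crossentropy",
    },
)

# A single forward pass returns all three predictions at once,
# which is where the runtime advantage over three separate models comes from.
dummy = np.random.rand(2, N_FRAMES, N_MFCC).astype("float32")
emotion_p, gender_p, age_p = model.predict(dummy)
```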

References
1. M.R. Ahmed, S. Islam, A.M. Islam, S. Shatabda, An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition. Expert Syst. Appl. 218, 119633 (2023)
2. F. Albu, D. Hagiescu, L. Vladutu, M.A. Puica, Neural network approaches for children's emotion recognition in intelligent learning applications, in EDULEARN15 Proceedings (IATED, 2015), pp. 3229–3239
3. R. Ardila, M. Branson, K. Davis, M. Henretty, M. Kohler, J. Meyer, R. Morais, L. Saunders, F.M. Tyers, G. Weber, Common voice: a massively-multilingual speech corpus. arXiv preprint arXiv:1912.06670 (2019)
4. R.M. Bădîrcea, A.G. Manta, N.M. Florea, J. Popescu, F.L. Manta, S. Puiu, E-commerce and the factors affecting its development in the age of digital technology: empirical evidence at EU-27 level. Sustainability 14(1), 101 (2021)
5. F. Burkhardt, A. Paeschke, M. Rolfes, W.F. Sendlmeier, B. Weiss, A database of German emotional speech, in Interspeech, vol. 5 (2005), pp. 1517–1520
6. C. Busso, M. Bulut, C.C. Lee, A. Kazemzadeh, E. Mower, S. Kim, S.S. Narayanan, IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)
7. S. Cachero-Martínez, R. Vázquez-Casielles, Building consumer loyalty through e-shopping experiences: the mediating role of emotions. J. Retail. Consum. Serv. 60, 102481 (2021)
8. H. Cao, D.G. Cooper, M.K. Keutmann, R.C. Gur, A. Nenkova, R. Verma, CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 5(4), 377–390 (2014)
9. K. Dupuis, M.K. Pichora-Fuller, Toronto Emotional Speech Set (TESS) (2010)
10. M. El Ayadi, M.S. Kamel, F. Karray, Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011)
11. A. Elena-Bucea, F. Cruz-Jesus, T. Oliveira, P.S. Coelho, Assessing the role of age, education, gender and income on the digital divide: evidence for the European Union. Inf. Syst. Front. 23, 1007–1021 (2021)
12. I.S. Engberg, A.V. Hansen, O. Andersen, P. Dalsgaard, Design, recording and verification of a Danish emotional speech database, in Fifth European Conference on Speech Communication and Technology (1997)
13. G. Gonzales, E.L. de Mola, K.A. Gavulic, T. McKay, C. Purcell, Mental health needs among lesbian, gay, bisexual, and transgender college students during the COVID-19 pandemic. J. Adolesc. Health 67(5), 645–648 (2020)
14. S. Goyal, V.V. Patage, S. Tiwari, Gender and age group predictions from speech features using multi-layer perceptron model, in 2020 IEEE 17th India Council International Conference (INDICON) (IEEE, 2020), pp. 1–6
15. W.Y. Jiao, L.N. Wang, J. Liu, S.F. Fang, F.Y. Jiao, M. Pettoello-Mantovani, E. Somekh, Behavioural and emotional disorders in children during the COVID-19 epidemic. J. Pediatr. 221, 264–266 (2020)
16. S.G. Koolagudi, K.S. Rao, Emotion recognition from speech: a review. Int. J. Speech Technol. 15, 99–117 (2012)
17. S.G. Koolagudi, R. Reddy, K.S. Rao, Emotion recognition from speech signal using epoch parameters, in 2010 International Conference on Signal Processing and Communications (SPCOM) (IEEE, 2010), pp. 1–5
18. S.R. Livingstone, K. Peck, F.A. Russo, RAVDESS: the Ryerson audio-visual database of emotional speech and song, in Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science (2012), pp. 205–211
20. R. Pappagari, J. Villalba, P. Żelasko, L. Moro-Velazquez, N. Dehak, CopyPaste: an augmentation method for speech emotion recognition, in ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2021), pp. 6324–6328
21. D.S. Park, W. Chan, Y. Zhang, C.C. Chiu, B. Zoph, E.D. Cubuk, Q.V. Le, SpecAugment: a simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019)
22. L. Schmid, A. Gerharz, A. Groll, M. Pauly, Machine learning for multi-output regression: when should a holistic multivariate approach be preferred over separate univariate ones? arXiv preprint arXiv:2201.05340 (2022)
23. M. Schroder, R. Cowie, Issues in emotion-oriented computing: towards a shared understanding, in Workshop on Emotion and Computing (Citeseer, 2006)
24. X. Song, Z. Wu, Y. Huang, D. Su, H. Meng, SpecSwap: a simple data augmentation method for end-to-end speech recognition, in Interspeech (2020), pp. 581–585
27. D. Ververidis, C. Kotropoulos, Emotional speech recognition: resources, features, and methods. Speech Commun. 48(9), 1162–1181 (2006)
28. S. Wang, Z. Wu, G. He, S. Wang, H. Sun, F. Fan, Semi-supervised classification-aware cross-modal deep adversarial data augmentation. Futur. Gener. Comput. Syst. 125, 194–205 (2021)
29. T.M. Wani, T.S. Gunawan, H. Mansor, S.A.A. Qadri, A. Sophian, E. Ambikairajah, E. Ihsanto, Multilanguage speech-based gender classification using time-frequency features and SVM classifier, in Advances in Robotics, Automation and Data Analytics: Selected Papers from iCITES 2020 (Springer, 2021), pp. 1–10
31. Q. Zheng, X. Tian, Z. Yu, N. Jiang, A. Elhanashi, S. Saponara, R. Yu, Application of wavelet-packet transform driven deep learning method in PM2.5 concentration prediction: a case study of Qingdao, China. Sustain. Cities Soc. 92, 104486 (2023)
32. Q. Zheng, X. Tian, Z. Yu, H. Wang, A. Elhanashi, S. Saponara, DL-PR: generalized automatic modulation classification method based on deep learning with priori regularization. Eng. Appl. Artif. Intell. 122, 106082 (2023)
33. Q. Zheng, P. Zhao, Y. Li, H. Wang, Y. Yang, Spectrum interference-based two-level data augmentation method in deep learning for automatic modulation classification. Neural Comput. Appl. 33(13), 7723–7745 (2021)
34. Q. Zheng, P. Zhao, H. Wang, A. Elhanashi, S. Saponara, Fine-grained modulation classification using multi-scale radio transformer with dual-channel representation. IEEE Commun. Lett. 26(6), 1298–1302 (2022)
35. Q. Zheng, P. Zhao, D. Zhang, H. Wang, MR-DCAE: manifold regularization-based deep convolutional autoencoder for unauthorized broadcasting identification. Int. J. Intell. Syst. 36(12), 7204–7238 (2021)
Metadata
Title
Novel SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech
Authors
Aron Ritesh
Indra Kiran Sigicharla
Chirag Periwal
Mohanaprasad Kothandaraman
P. S. Nithya Darisini
Sourabh Tiwari
Shivani Arora
Publication date
08.08.2024
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 12/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-024-02817-9