Published: 06.12.2023

Hilbert Domain Analysis of Wavelet Packets for Emotional Speech Classification

Authors: Biswajit Karan, Arvind Kumar

Published in: Circuits, Systems, and Signal Processing | Issue 4/2024

Abstract

This work investigates the significance of Hilbert-domain characterization of wavelet packets for classifying different emotions in speech signals. The goals of this paper are to create a new emotional speech database and to introduce a new feature extraction approach that can recognize various emotions. The proposed features, wavelet cepstral coefficients (WCC), are based on Hilbert spectrum analysis of the wavelet packets of the speech signal. Speaker-independent machine learning models are developed using multiclass support vector machine (SVM) and k-nearest neighbour (KNN) classifiers. The approach is tested on a newly developed Telugu (Indian) database and on the EMOVO (Italian emotional speech) database. The proposed wavelet features achieve a peak accuracy of 73.5%, which NCA feature selection further boosts by 3–5%, yielding an improved unweighted average recall (UAR) of 78% for database 1 and 87.50% for database 2 when the optimal wavelet features are combined with SVM classification. The proposed features outperform the baseline Mel-frequency cepstral coefficient (MFCC) features and existing methodologies evaluated on databases in different languages.
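
The pipeline outlined in the abstract can be illustrated with a short, hedged sketch. The block below is a minimal reconstruction in Python, not the authors' implementation: the wavelet family ('db4'), packet depth (4), number of retained coefficients (13), NCA dimensionality, and SVM hyperparameters are all assumptions, and scikit-learn's NeighborhoodComponentsAnalysis stands in for the NCA-based feature selection step mentioned above.

```python
# Hedged sketch of Hilbert-domain wavelet-packet features plus an NCA + SVM
# back end, following the abstract's description. Parameter choices below are
# illustrative assumptions, not values reported in the paper.
import numpy as np
import pywt
from scipy.signal import hilbert
from scipy.fft import dct
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.svm import SVC

def wavelet_hilbert_cepstrum(frame, wavelet="db4", level=4, n_coeffs=13):
    """Wavelet-packet decomposition -> Hilbert envelope per terminal sub-band
    -> DCT of the log envelope energies (a cepstrum-like summary)."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    log_energies = []
    for node in wp.get_level(level, order="freq"):   # terminal packet nodes
        env = np.abs(hilbert(node.data))             # Hilbert envelope of sub-band
        log_energies.append(np.log(np.sum(env ** 2) + 1e-12))
    return dct(np.asarray(log_energies), norm="ortho")[:n_coeffs]

# Speaker-independent classification stage: NCA projects the features into a
# more discriminative space before the multiclass (one-vs-one) SVM.
clf = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=10, random_state=0),
    SVC(kernel="rbf", C=10, gamma="scale"),
)
# X: matrix of per-utterance feature vectors, y: emotion labels
# clf.fit(X_train, y_train); print(clf.score(X_test, y_test))
```

In this sketch each speech frame yields one WCC-like vector; per-utterance statistics of such vectors would then feed the speaker-independent SVM (or KNN) models described in the abstract.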

Metadata
Title
Hilbert Domain Analysis of Wavelet Packets for Emotional Speech Classification
Authors
Biswajit Karan
Arvind Kumar
Publication date
06.12.2023
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 4/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-023-02544-7