
21.02.2017

Novel Sub-band Spectral Centroid Weighted Wavelet Packet Features with Importance-Weighted Support Vector Machines for Robust Speech Emotion Recognition

Authors: Yongming Huang, Wu Ao, Guobao Zhang

Published in: Wireless Personal Communications | Issue 3/2017

Abstract

In this paper, we propose novel sub-band spectral centroid weighted wavelet packet cepstral coefficients (W-WPCC) for robust speech emotion recognition. The wavelet packet transform (WPT), an effective tool for non-stationary signal analysis, is applied to speech using a wavelet packet filterbank structure based on human auditory perception. For each sub-band, the spectral centroid, which has been shown to be noise-robust, is calculated. On this basis, the W-WPCC feature is computed by combining the sub-band energies with the sub-band spectral centroids via a weighting scheme, yielding noise-robust acoustic features. An importance-weighted support vector machine (IW-SVM) is proposed to improve the robustness of the classifier to noise, where the importance weights compensate for the covariate shift between the test and training datasets. Experimental results show that the proposed W-WPCC feature performs comparably with conventional features in clean speech environments while demonstrating better noise-robustness in noisy environments, and that the IW-SVM improves robustness to white Gaussian noise in speech emotion recognition compared with conventional classifiers.
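The abstract compresses the feature pipeline into a few sentences; a minimal Python sketch (using NumPy, SciPy, and PyWavelets) may make the sequence of steps concrete. The db4 wavelet, the uniform depth-4 packet tree (a stand-in for the perceptually designed filterbank), the bin-index spectral centroid, and the blending formula below are all illustrative assumptions, not the authors' exact design:

```python
import numpy as np
import pywt
from scipy.fft import dct

def w_wpcc(frame, wavelet="db4", level=4, lam=0.5, n_ceps=13):
    """Illustrative W-WPCC-style feature for one windowed speech frame.

    Hypothetical weighting: each sub-band energy is blended with a term
    derived from that sub-band's spectral centroid. The paper's
    perceptual tree and exact weighting scheme may differ.
    """
    # Full wavelet packet decomposition of the frame into 2**level sub-bands.
    wp = pywt.WaveletPacket(frame, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")  # sub-bands, low to high

    energies, centroids = [], []
    for node in nodes:
        coeffs = np.asarray(node.data)
        energies.append(np.sum(coeffs ** 2))
        # Spectral centroid of the sub-band's power spectrum (bin index units).
        spec = np.abs(np.fft.rfft(coeffs)) ** 2
        bins = np.arange(spec.size)
        centroids.append(np.sum(bins * spec) / (np.sum(spec) + 1e-12))

    E = np.asarray(energies)
    C = np.asarray(centroids)
    # Blend sub-band energies with centroid-derived weights (assumed form).
    weighted = (1.0 - lam) * E + lam * (C / (C.max() + 1e-12)) * E
    # Log-compress and decorrelate with a DCT, as for cepstral features.
    return dct(np.log(weighted + 1e-12), norm="ortho")[:n_ceps]
```

For a 512-sample windowed frame this returns a 13-dimensional cepstral-style vector; frame-level vectors would then typically be aggregated into utterance-level statistics before classification.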
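The IW-SVM side reduces to two steps: estimate importance weights w(x) ≈ p_test(x)/p_train(x), then train an SVM whose per-sample loss is rescaled by those weights. The sketch below is a stand-in under stated assumptions: it approximates the density ratio with a probabilistic classifier that discriminates training from test inputs, rather than the dedicated density-ratio estimator the authors build on, and it relies on scikit-learn's sample_weight support in SVC:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def importance_weights(X_train, X_test):
    """Crude estimate of p_test(x) / p_train(x) for each training sample.

    A classifier separating train from test inputs gives P(test | x);
    Bayes' rule turns that into a density ratio. This is a stand-in for
    the paper's density-ratio estimator.
    """
    X = np.vstack([X_train, X_test])
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X_train)[:, 1]  # P(test | x) on training inputs
    ratio = (p / (1.0 - p)) * (len(X_train) / len(X_test))
    return np.clip(ratio, 0.0, 10.0)      # clip extremes for stability

def fit_iw_svm(X_train, y_train, X_test):
    """Weighted SVM: importance weights rescale each sample's loss."""
    w = importance_weights(X_train, X_test)
    svm = SVC(kernel="rbf", C=1.0)
    svm.fit(X_train, y_train, sample_weight=w)
    return svm
```

The weights up-weight training samples that resemble test-domain (e.g., noisy) inputs, which is how the classifier compensates for covariate shift between the two datasets.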

Metadata
Title
Novel Sub-band Spectral Centroid Weighted Wavelet Packet Features with Importance-Weighted Support Vector Machines for Robust Speech Emotion Recognition
Authors
Yongming Huang
Wu Ao
Guobao Zhang
Publication date
21.02.2017
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 3/2017
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-017-4052-3
