Published in: International Journal of Speech Technology 4/2016

02.08.2016

An optimal two stage feature selection for speech emotion recognition using acoustic features

Authors: Swarna Kuchibhotla, Hima Deepthi Vankayalapati, Koteswara Rao Anne

Abstract

Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for the task, such as energy, pitch, and mel frequency cepstral coefficients (MFCCs). However, the performance of the resulting system is not optimal, because the high-dimensional, correlated feature set produced by fusion makes the system computationally complex. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused together for speech emotion recognition. In the second stage, optimal feature subset selection techniques, sequential forward selection (SFS) and sequential floating forward selection (SFFS), are used to eliminate the curse-of-dimensionality problem caused by the high-dimensional feature vector after fusion. Finally, the emotions are classified with several classifiers: linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machines (SVM), and k-nearest neighbors (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases using classification rate. An optimal uncorrelated feature set is obtained with SFS and SFFS individually. The results reveal that SFFS is the better choice for feature subset selection because SFS suffers from the nesting problem: once a feature is retained in the set, it can never be discarded. SFFS eliminates this nesting problem by letting the set float up and down during selection, adding and removing features according to the objective function rather than keeping the set fixed at any stage. Experimental results show that the two-stage feature selection method improves classifier efficiency by 15–20% compared with the classifier applied to the fused features alone.
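To make the floating mechanism concrete, the sketch below is a minimal SFFS implementation in Python. It is an illustration under stated assumptions, not the authors' code: the objective J is taken here to be mean cross-validated accuracy of a KNN classifier (one of the classifiers compared in the paper), computed with scikit-learn; the feature matrix X, labels y, and target subset size are hypothetical inputs.

```python
# Minimal SFFS sketch (illustrative, not the authors' implementation).
# Objective J: mean 5-fold cross-validated accuracy of a KNN classifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def J(X, y, subset):
    """Objective function: mean CV accuracy on the selected feature columns."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, list(subset)], y, cv=5).mean()

def sffs(X, y, target_size):
    """Sequential floating forward selection over the column indices of X."""
    selected = set()
    best_at_size = {}  # best objective value recorded for each subset size
    while len(selected) < target_size:
        # Inclusion step: add the single feature that improves J the most.
        remaining = set(range(X.shape[1])) - selected
        best = max(remaining, key=lambda f: J(X, y, selected | {f}))
        selected |= {best}
        best_at_size[len(selected)] = J(X, y, selected)
        # Conditional exclusion ("floating") step: drop a feature whenever
        # doing so beats the best score recorded at the smaller size; this
        # is what lets the subset shrink again and avoids nesting.
        while len(selected) > 2:
            worst = max(selected, key=lambda f: J(X, y, selected - {f}))
            score = J(X, y, selected - {worst})
            if score > best_at_size.get(len(selected) - 1, -np.inf):
                selected -= {worst}
                best_at_size[len(selected)] = score
            else:
                break
    return sorted(selected)
```

Plain SFS is the same loop with the conditional-exclusion block removed, which is exactly the nesting problem described above: once a feature is added, SFS can never take it out of the set, whereas SFFS only removes a feature when the objective strictly improves over the best subset previously seen at that size, so the search terminates.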

Metadata
Title
An optimal two stage feature selection for speech emotion recognition using acoustic features
Authors
Swarna Kuchibhotla
Hima Deepthi Vankayalapati
Koteswara Rao Anne
Publication date
02.08.2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9358-0
