Published in: International Journal of Speech Technology 4/2015

29.07.2015

Four-stage feature selection to recognize emotion from speech signals

Authors: A. Milton, S. Tamil Selvi

Abstract

Feature selection plays an important role in emotion recognition from speech signals because it improves classification accuracy by choosing the best uncorrelated features. In the wrapper method of feature selection, candidate features are evaluated by a classifier. A high-dimensional feature vector increases the computational complexity of the classifier and, further, hinders the training of classifiers that require the inverse of a covariance matrix. We propose a four-stage feature selection method that avoids the curse of dimensionality through the principle of divide and conquer. In the proposed method, the dimension of the feature vector is reduced at every stage, so that even classifiers whose training suffers from large feature dimensions can be used to evaluate the features. Experimental results show that the four-stage feature selection method improves classification accuracy. Classification accuracy can also be improved by combining several classifiers through a fusion technique. The class-specific multiple classifiers scheme is one such method; it improves accuracy by pairing the optimum feature set and classifier for each emotional class. In this work, we improve the performance of the class-specific multiple classifiers scheme by embedding the proposed feature selection method in its structure.
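The abstract only outlines the divide-and-conquer idea, so the sketch below is a hedged illustration rather than the authors' algorithm. It partitions the feature set into subsets small enough for a wrapper search, greedily selects within each subset using cross-validated accuracy, and merges the survivors over four stages; the greedy search, split sizes, stage count, and the evaluating classifier (QDA, chosen here because its training involves a covariance-matrix inverse, the difficulty the abstract mentions) are all assumptions.

```python
# Minimal sketch of staged (divide-and-conquer) wrapper feature selection.
# NOT the paper's exact algorithm: split sizes, stage count, the greedy
# search, and the evaluating classifier (QDA) are illustrative assumptions.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def wrapper_select(X, y, candidates, clf, keep):
    """Greedy forward wrapper search: grow a feature set from `candidates`,
    keeping the `keep` features that most raise cross-validated accuracy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < keep:
        acc, best = max(
            (cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean(), f)
            for f in remaining
        )
        selected.append(best)
        remaining.remove(best)
    return selected

def staged_selection(X, y, clf, n_stages=4, n_subsets=8, keep_per_subset=10):
    """Each stage partitions the surviving features into subsets small enough
    for the wrapper, selects within every subset, and merges the winners."""
    features = list(range(X.shape[1]))
    for _ in range(n_stages):
        survivors = []
        for subset in np.array_split(features, n_subsets):
            survivors += wrapper_select(X, y, subset, clf, keep_per_subset)
        features = survivors
        n_subsets = max(1, n_subsets // 2)  # fewer, broader merges per stage
    return features

# usage with hypothetical data: X = utterances x acoustic features, y = labels
# best_features = staged_selection(X, y, QuadraticDiscriminantAnalysis())
```

Because each wrapper search sees only a small subset, the classifier is never trained on the full high-dimensional vector, which is the point of the staged reduction.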
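Similarly, the class-specific multiple classifiers scheme is only named, not specified, in the abstract. A minimal sketch of the general idea, under assumed details: train one one-vs-rest classifier per emotion on that emotion's previously selected feature subset, then fuse by choosing the emotion whose dedicated classifier is most confident. The per-class feature subsets, the base classifier, and the max-posterior fusion rule are illustrative assumptions, not the authors' published scheme.

```python
# Minimal sketch of a class-specific multiple classifiers scheme. The
# per-class feature subsets, the base classifier (Gaussian naive Bayes),
# and the max-posterior fusion rule are assumptions for illustration.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_class_specific(X, y, class_features):
    """Train one one-vs-rest classifier per emotion, each on the feature
    subset previously selected for that emotion."""
    models = {}
    for emotion, feats in class_features.items():
        clf = GaussianNB().fit(X[:, feats], (y == emotion).astype(int))
        models[emotion] = (clf, feats)
    return models

def predict_fused(models, X):
    """Fusion rule: assign each utterance the emotion whose dedicated
    classifier gives the highest posterior for its own class."""
    emotions = sorted(models)
    scores = np.column_stack(
        [models[e][0].predict_proba(X[:, models[e][1]])[:, 1] for e in emotions]
    )
    return np.asarray(emotions)[scores.argmax(axis=1)]

# usage sketch: class_features = {"anger": [0, 4, 7], "joy": [2, 3, 9], ...}
# models = train_class_specific(X_train, y_train, class_features)
# y_pred = predict_fused(models, X_test)
```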

Metadata
Title
Four-stage feature selection to recognize emotion from speech signals
Authors
A. Milton
S. Tamil Selvi
Publication date
29.07.2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2015
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9294-4
