Published in: International Journal of Speech Technology 2/2019

08.03.2019

Multistage classification scheme to enhance speech emotion recognition

Authors: S. S. Poorna, G. J. Nair


Abstract

During the past decades, emotion recognition from speech has become one of the most explored areas in affective computing. However, such systems lack universality because of multilingualism, and research in this direction is restrained by the unavailability of emotional speech databases in many spoken languages. Arabic is one language that faces this inadequacy. The proposed work aims at developing a speech emotion recognition system for the Arabic-speaking community. A speech database with elicited emotions (anger, happiness, sadness, disgust, surprise and neutrality) is recorded from 14 subjects, who are non-native but proficient speakers of the language. Prosodic, spectral and cepstral features are extracted after pre-processing. Subsequently, the features are subjected to single-stage classification using supervised learning methods, viz. the support vector machine and the extreme learning machine. The performance of the implemented speech emotion recognition systems is compared in terms of accuracy, specificity, precision and recall. Further analysis is carried out by adopting three multistage classification schemes. The first scheme performs a two-stage classification, initially identifying gender and then the emotion. The second uses a divide-and-conquer approach built on cascaded binary classifiers, and the third a parallel approach that classifies with individual features and combines the outputs through a decision logic. The results of the study show that these multistage classification schemes can improve the performance of a speech emotion recognition system compared to single-stage classification. Comparable results were obtained when the same experiments were carried out on the Emo-DB database.
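
To make the multistage schemes concrete, the following is a minimal sketch of the first (gender first, then a per-gender emotion model) and the third (parallel per-feature-set classifiers merged by a decision logic). It assumes precomputed per-utterance feature matrices and uses scikit-learn's SVC as the base learner; the names GenderThenEmotion and parallel_vote are illustrative assumptions, not the authors' implementation, and the paper's extreme learning machine variant is omitted.

    # Sketch of schemes 1 and 3, assuming precomputed NumPy feature matrices
    # and SVM base classifiers; names and structure are illustrative, not
    # taken from the paper.
    import numpy as np
    from sklearn.svm import SVC

    class GenderThenEmotion:
        """Scheme 1 (sketch): stage one predicts gender, stage two applies
        a gender-specific emotion classifier. X, gender and emotion are
        NumPy arrays; labels are strings."""

        def __init__(self, genders=("male", "female")):
            self.gender_clf = SVC(kernel="rbf")
            self.emotion_clf = {g: SVC(kernel="rbf") for g in genders}

        def fit(self, X, gender, emotion):
            self.gender_clf.fit(X, gender)
            for g, clf in self.emotion_clf.items():
                mask = gender == g        # train each emotion model on one gender
                clf.fit(X[mask], emotion[mask])
            return self

        def predict(self, X):
            g_hat = self.gender_clf.predict(X)
            y_hat = np.empty(len(X), dtype=object)
            for g, clf in self.emotion_clf.items():
                mask = g_hat == g         # route samples by predicted gender
                if mask.any():
                    y_hat[mask] = clf.predict(X[mask])
            return y_hat

    def parallel_vote(train_sets, y_train, test_sets):
        """Scheme 3 (sketch): one SVM per feature set (e.g. prosodic,
        spectral, cepstral), merged by majority-vote decision logic."""
        votes = [SVC(kernel="rbf").fit(X_tr, y_train).predict(X_te)
                 for X_tr, X_te in zip(train_sets, test_sets)]
        votes = np.stack(votes)           # shape: (n_feature_sets, n_samples)
        merged = []
        for col in votes.T:               # majority label per test sample
            labels, counts = np.unique(col, return_counts=True)
            merged.append(labels[np.argmax(counts)])
        return np.array(merged)

In the vote, ties are broken by np.argmax, which picks the first label in sorted order; the paper's actual decision logic may resolve ties differently.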


References
Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2011). Spoken emotion recognition using hierarchical classifiers. Computer Speech and Language, 25(3), 556–570.
Badshah, A. M., Ahmad, J., Lee, M. Y., & Baik, S. W. (2016). Divide-and-conquer based ensemble to spot emotions in speech using MFCC and random forest. arXiv preprint arXiv:1610.01382.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Ninth European Conference on Speech Communication and Technology (pp. 1517–1520). Lisbon, Portugal.
Chen, C., You, M., Song, M., Bu, J., & Liu, J. (2006). An enhanced speech emotion recognition system based on discourse information. In Computational Science – ICCS 2006 (pp. 449–456). New York: Springer.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572–587.
Fayek, H. M., Lech, M., & Cavedon, L. (2017). Evaluating deep learning architectures for speech emotion recognition. Neural Networks, 92, 60–68.
Ghazi, D., Inkpen, D., & Szpakowicz, S. (2010). Hierarchical approach to emotion recognition and classification in texts. In A. Farzindar & V. Kešelj (Eds.), Advances in artificial intelligence, Lecture Notes in Computer Science. Berlin: Springer.
Giannakopoulos, T. (2009). A method for silence removal and segmentation of speech signals, implemented in Matlab (p. 2). Athens: University of Athens.
Hassan, A., & Damper, R. I. (2010). Multi-class and hierarchical SVMs for emotion recognition. In Eleventh Annual Conference of the International Speech Communication Association.
Haykin, S. (1998). Neural networks: A comprehensive foundation (2nd ed.). Upper Saddle River: Prentice Hall.
Hozjan, V., & Kačič, Z. (2003). Context-independent multilingual emotion recognition from speech signals. International Journal of Speech Technology, 6(3), 311–320.
Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501.
Huang, K. Y., Wu, C. H., Su, M. H., & Kuo, Y. T. (2018). Detecting unipolar and bipolar depressive disorders from elicited speech responses using latent affective structure model. IEEE Transactions on Affective Computing.
Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (Vol. 2, pp. 985–990).
Huber, R., Anton, B., Jan, B., Elmar, N., Volker, W., & Heinrich, N. (2000). Recognition of emotion in a realistic dialogue scenario. In Proceedings of International Conference on Spoken Language Processing (pp. 665–668). Beijing, China.
Kadiri, S. R., Gangamohan, P., Gangashetty, S. V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In Sixteenth Annual Conference of the International Speech Communication Association.
Kim, E. H., Hyun, K. H., Kim, S. H., & Kwak, Y. K. (2009). Improved emotion recognition with a novel speaker-independent feature. IEEE/ASME Transactions on Mechatronics, 14(3), 317–325.
Kotti, M., & Paternò, F. (2012). Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema. International Journal of Speech Technology, 15(2), 131–150.
Lausen, A., & Schacht, A. (2018). Gender differences in the recognition of vocal emotions. Frontiers in Psychology, 9, 882.
Lee, C. C., Mower, E., Busso, C., Lee, S., & Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53(9–10), 1162–1171.
Lugger, M., Janoir, M. E., & Yang, B. (2009, August). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In 17th European Signal Processing Conference (pp. 1225–1229).
Mayoraz, E., & Alpaydin, E. (1999). Support vector machines for multi-class classification. In International Work-Conference on Artificial Neural Networks (pp. 833–842). Berlin, Heidelberg: Springer.
Meddeb, M., Hichem, K., & Alimi, A. (2014). Intelligent remote control for TV program based on emotion in Arabic speech. International Journal of Scientific Research and Engineering Technology (IJSET), 1. ISSN 2277-1581.
Miguel Signorelli, C. (2018). Can computers become conscious and overcome humans? Frontiers in Robotics and AI, 5, 45.
Milton, A., & Selvi, S. T. (2014). Class-specific multiple classifiers scheme to recognize emotions from speech signals. Computer Speech and Language, 28(3), 727–742.
Morrison, D., Wang, R., Xu, W., & Silva, L. C. D. (2007). Incremental learning for spoken affect classification and its application in call-centres. International Journal of Intelligent Systems Technologies and Applications, 2, 242–254.
Mundt, J. C., Snyder, P. J., Cannizzaro, M. S., Chappie, K., & Geralts, D. S. (2007). Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. Journal of Neurolinguistics, 20(1), 50–64.
Padhi, D. R., & Gupta, R. (2017). IVR Wizard of OZ field experiment with less-literate telecom customers. In IFIP Conference on Human-Computer Interaction (pp. 492–495). Cham: Springer.
Picard, R. W. (1997). Affective computing. Cambridge: The MIT Press.
Poorna, S. S., Jeevitha, C. Y., Nair, S. J., Santhosh, S., & Nair, G. J. (2015). Emotion recognition using multi-parameter speech feature classification. In IEEE International Conference on Computers, Communications, and Systems, India. ISBN 978-1-4673-9756-8.
Poorna, S. S., Anuraj, K., & Nair, G. J. (2018). A weight based approach for emotion recognition from speech: An analysis using South Indian languages. In Soft computing systems, ICSCS 2018. Communications in Computer and Information Science (Vol. 837). Springer.
Rabiner, L. R., & Schafer, R. W. (2011). Theory and application of digital speech processing (1st ed.). New York: Prentice Hall.
Siddiqui, S., Monem, A. A., & Shaalan, K. (2017). Towards improving sentiment analysis in Arabic. In A. Hassanien, K. Shaalan, T. Gaber, A. Azar, & M. Tolba (Eds.), Proceedings of the International Conference on Advanced Intelligent Systems and Informatics. Advances in Intelligent Systems and Computing (Vol. 533). Cham: Springer.
Swain, M., Routray, A., & Kabisatpathy, P. (2018). Databases, features and classifiers for speech emotion recognition: A review. International Journal of Speech Technology, 21(1), 93–120.
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181.
Vlasenko, B., Schuller, B., Wendemuth, A., & Rigoll, G. (2007). Frame vs. turn-level: Emotion recognition from speech considering static and dynamic processing. In International Conference on Affective Computing and Intelligent Interaction (pp. 139–147). Berlin: Springer.
Wolpert, D. H. (2002). The supervised learning no-free-lunch theorems. In R. Roy, M. Köppen, S. Ovaska, T. Furuhashi, & F. Hoffmann (Eds.), Soft computing and industry (pp. 25–42). London: Springer.
Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2011). Classification of emotional speech based on an automatically elaborated hierarchical classifier. ISRN Signal Processing, Article ID 753819.
Metadata
Title
Multistage classification scheme to enhance speech emotion recognition
Authors
S. S. Poorna
G. J. Nair
Publication date
08.03.2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09605-w
