Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification

Authors: Na Yang, Jianbo Yuan, Yun Zhou, Ilker Demirkol, Zhiyao Duan, Wendi Heinzelman, Melissa Sturge-Apple

Published in: International Journal of Speech Technology, Issue 1/2017
Publication date: 28-10-2016

Abstract

As an essential approach to understanding human interactions, emotion classification is a vital component of behavioral studies and an important element in the design of context-aware systems. Recent studies have shown that speech carries rich information about emotion, and numerous speech-based emotion classification methods have been proposed. However, classification performance still falls short of what is needed for these algorithms to be used in real systems. We present an emotion classification system that uses several one-against-all support vector machines with a thresholding fusion mechanism to combine their individual outputs; the fusion mechanism can effectively increase emotion classification accuracy at the expense of rejecting some samples as unclassified. Results show that the proposed system outperforms three state-of-the-art methods and that the thresholding fusion mechanism effectively improves classification accuracy, which is important for applications that require very high accuracy but do not require that every sample be classified. We evaluate the system in several challenging scenarios, including speaker-independent tests, tests on noisy speech signals, and tests on non-professional acted recordings, to demonstrate its performance and the effectiveness of the thresholding fusion mechanism in real scenarios.
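
To make the classification scheme concrete, the sketch below shows one way to realize one-against-all SVMs with thresholding fusion, the core mechanism described in the abstract. It is a minimal illustration in Python with scikit-learn (an assumed toolchain, not necessarily the authors'); the RBF kernel, the use of posterior probabilities as confidence scores, and the threshold value of 0.7 are placeholder assumptions rather than the paper's configuration.

  from sklearn.svm import SVC

  def train_oaa_svms(X, y, emotions):
      # Train one binary SVM per emotion class (one-against-all):
      # the target emotion is the positive class, all other emotions are negative.
      models = {}
      for emo in emotions:
          clf = SVC(kernel='rbf', probability=True)  # kernel choice is an assumption
          clf.fit(X, (y == emo).astype(int))
          models[emo] = clf
      return models

  def classify_with_threshold(models, x, threshold=0.7):
      # Thresholding fusion: take the most confident per-class score and
      # reject the sample as unclassified if that score falls below the threshold.
      scores = {emo: clf.predict_proba(x.reshape(1, -1))[0, 1]
                for emo, clf in models.items()}
      best = max(scores, key=scores.get)
      return best if scores[best] >= threshold else None

Raising the threshold trades coverage for accuracy: more samples are rejected as unclassified, but those that are classified receive labels with higher confidence, which is the trade-off the thresholding fusion mechanism exposes.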

Literature
go back to reference Al Machot, F., Mosa, A. H., Dabbour, K., Fasih, A., Schwarzlmuller, C., Ali, M., & Kyamakya, K. (2011). A novel real-time emotion detection system from audio streams based on Bayesian quadratic discriminate classifier for ADAS. In Nonlinear Dynamics and Synchronization 16th Int’l Symposium on Theoretical Electrical Engineering, Joint 3rd Int’l Workshop on. Al Machot, F., Mosa, A. H., Dabbour, K., Fasih, A., Schwarzlmuller, C., Ali, M., & Kyamakya, K. (2011). A novel real-time emotion detection system from audio streams based on Bayesian quadratic discriminate classifier for ADAS. In Nonlinear Dynamics and Synchronization 16th Int’l Symposium on Theoretical Electrical Engineering, Joint 3rd Int’l Workshop on.
go back to reference Ang, J., Dhillon, R., Krupski, A., Shriberg, E., & Stolcke, A. (2002). Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In Proceeings of International Conference on Spoken Language Processing (pp. 2037–2040). Ang, J., Dhillon, R., Krupski, A., Shriberg, E., & Stolcke, A. (2002). Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In Proceeings of International Conference on Spoken Language Processing (pp. 2037–2040).
go back to reference Bakeman, R. (1997). Behavioral observation and coding. Handbook of research methods in social psychology. Cambridge: Cambridge University Press. Bakeman, R. (1997). Behavioral observation and coding. Handbook of research methods in social psychology. Cambridge: Cambridge University Press.
go back to reference Bänziger, T., Patel, S., & Scherer, K. R. (2014). The role of perceived voice and speech characteristics in vocal emotion communication. Journal of nonverbal behavior, 38(1), 31–52.CrossRef Bänziger, T., Patel, S., & Scherer, K. R. (2014). The role of perceived voice and speech characteristics in vocal emotion communication. Journal of nonverbal behavior, 38(1), 31–52.CrossRef
go back to reference Bao, H., Xu, M. X., & Zheng, T. F. (2007). Emotion attribute projection for speaker recognition on emotional speech. In Procceedings of Interspeech (pp. 758–761). Bao, H., Xu, M. X., & Zheng, T. F. (2007). Emotion attribute projection for speaker recognition on emotional speech. In Procceedings of Interspeech (pp. 758–761).
go back to reference Barra-Chicote, R., Yamagishi, J., King, S., Montero, J. M., & Macias-Guarasa, J. (2010). Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech. Speech Communication, 52(5), 394–404.CrossRef Barra-Chicote, R., Yamagishi, J., King, S., Montero, J. M., & Macias-Guarasa, J. (2010). Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech. Speech Communication, 52(5), 394–404.CrossRef
go back to reference Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., & Aharonson, V. (2006). Combining efforts for improving automatic classification of emotional user states. In Proceedings of the Fifth Slovenian and First International Language Technologies Conference. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., & Aharonson, V. (2006). Combining efforts for improving automatic classification of emotional user states. In Proceedings of the Fifth Slovenian and First International Language Technologies Conference.
go back to reference Bellegarda, J. R. (2013). Data-driven analysis of emotion in text using latent affective folding and embedding. Computational Intelligence, 29(3), 506–526.MathSciNetCrossRef Bellegarda, J. R. (2013). Data-driven analysis of emotion in text using latent affective folding and embedding. Computational Intelligence, 29(3), 506–526.MathSciNetCrossRef
go back to reference Bitouk, D., Ragini, V., & Ani, N. (2010). Class-level spectral features for emotion recognition. Journal of Speech Communication, 52(7–8), 613–625.CrossRef Bitouk, D., Ragini, V., & Ani, N. (2010). Class-level spectral features for emotion recognition. Journal of Speech Communication, 52(7–8), 613–625.CrossRef
go back to reference Black, M. P., Katsamanis, A., Baucom, B. R., Lee, C. C., Lammert, A. C., & Christensen, A. (2013). Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features. Speech communication, 55(1), 1–21.CrossRef Black, M. P., Katsamanis, A., Baucom, B. R., Lee, C. C., Lammert, A. C., & Christensen, A. (2013). Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features. Speech communication, 55(1), 1–21.CrossRef
go back to reference Chang, K., Fisher, D., & Canny, J. (2011). AMMON: a speech analysis library for analyzing affect, stress, and mental health on mobile phones. In 2nd International Workshop on Sensing Applications on Mobile Phones. Chang, K., Fisher, D., & Canny, J. (2011). AMMON: a speech analysis library for analyzing affect, stress, and mental health on mobile phones. In 2nd International Workshop on Sensing Applications on Mobile Phones.
go back to reference Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16(1), 321–357.MATH Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16(1), 321–357.MATH
go back to reference Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schröder, M. (2000). ‘FEELTRACE’: an instrument for recording perceived emotion in real time. In ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schröder, M. (2000). ‘FEELTRACE’: an instrument for recording perceived emotion in real time. In ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion.
go back to reference Eskimez, S. E., Imade, K., Yang, N., Sturge-Appley, M., Duan, Z., & Heinzelman, W. (2016). Emotion classification: How does an automated system compare to naive human coders? In Acoustics, Speech and Signal Processing, Proceedings of the IEEE International Conference on. Eskimez, S. E., Imade, K., Yang, N., Sturge-Appley, M., Duan, Z., & Heinzelman, W. (2016). Emotion classification: How does an automated system compare to naive human coders? In Acoustics, Speech and Signal Processing, Proceedings of the IEEE International Conference on.
go back to reference Farrús, M., Ejarque, P., Temko, A., & Hernando, J. (2007). Histogram equalization in SVM multimodal person verification. In Proceedings of IAPR/IEEE International Conference on Biometrics. Farrús, M., Ejarque, P., Temko, A., & Hernando, J. (2007). Histogram equalization in SVM multimodal person verification. In Proceedings of IAPR/IEEE International Conference on Biometrics.
go back to reference Goudbeek, M., Goldman, J. P., & Scherer, K. R. (2009). Emotion dimensions and formant position. In INTERSPEECH (pp. 1575–1578). Goudbeek, M., Goldman, J. P., & Scherer, K. R. (2009). Emotion dimensions and formant position. In INTERSPEECH (pp. 1575–1578).
go back to reference Goyal, A., Riloff, E., Daumé III, H., & Gilbert, N. (2010). Toward plot units: automatic affect state analysis. In Proceedings of HLT/NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (CAET). Goyal, A., Riloff, E., Daumé III, H., & Gilbert, N. (2010). Toward plot units: automatic affect state analysis. In Proceedings of HLT/NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (CAET).
go back to reference Gupta P., & Rajput N. (2007). Two-stream emotion recognition for call center monitoring. In INTERSPEECH (pp. 2241–2244). Gupta P., & Rajput N. (2007). Two-stream emotion recognition for call center monitoring. In INTERSPEECH (pp. 2241–2244).
go back to reference Hoque, M., Yeasin, M., & Louwerse, M. (2006). Robust recognition of emotion from speech. Intelligent virtual agents (pp. 42–53). Berlin: Springer. Lecture notes in computer science.CrossRef Hoque, M., Yeasin, M., & Louwerse, M. (2006). Robust recognition of emotion from speech. Intelligent virtual agents (pp. 42–53). Berlin: Springer. Lecture notes in computer science.CrossRef
go back to reference Hsu, C.W., Chang ,C.C., & Lin, C.J. (2003). A practical guide to support vector classification. Hsu, C.W., Chang ,C.C., & Lin, C.J. (2003). A practical guide to support vector classification.
go back to reference Huang, X., Acero, A., & Hon, H. W. (2001). Spoken language processing. New Jersey: Prentice Hall PTR. Huang, X., Acero, A., & Hon, H. W. (2001). Spoken language processing. New Jersey: Prentice Hall PTR.
go back to reference Huisman, G., Van Hout, M., van Dijk, E., van der Geest, T., & Heylen, D. (2013). Lemtool—measuring emotions in visual interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Huisman, G., Van Hout, M., van Dijk, E., van der Geest, T., & Heylen, D. (2013). Lemtool—measuring emotions in visual interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
go back to reference Jong, N. H. D., & Wempe, T. (2007). Automatic measurement of speech rate in spoken Dutch. ACLC Working Papers, 2(2), 49–58. Jong, N. H. D., & Wempe, T. (2007). Automatic measurement of speech rate in spoken Dutch. ACLC Working Papers, 2(2), 49–58.
go back to reference Kawanami, H., Iwami, Y., Toda, T., Saruwatari, H., & Shikano, K. (2003). GMM-based voice conversion applied to emotional speech synthesis. In Proceedings of Eurospeech. Kawanami, H., Iwami, Y., Toda, T., Saruwatari, H., & Shikano, K. (2003). GMM-based voice conversion applied to emotional speech synthesis. In Proceedings of Eurospeech.
go back to reference Kerig, P., & Baucom, D. (2004). Couple observational coding systems. Abington: Routledge. Kerig, P., & Baucom, D. (2004). Couple observational coding systems. Abington: Routledge.
go back to reference Kwon, O.W., Chan, K., Hao, J., & Lee, T. W. (2003). Emotion recognition by speech signals. In EUROSPEECH. (pp. 125–128). Kwon, O.W., Chan, K., Hao, J., & Lee, T. W. (2003). Emotion recognition by speech signals. In EUROSPEECH. (pp. 125–128).
go back to reference Lee, C., & Lee, G. G. (2007). Emotion recognition for affective user interfaces using natural language dialogs. In Procceedings of IEEE International Symposium on Robot and Human interactive Communication. (pp. 798–801). Lee, C., & Lee, G. G. (2007). Emotion recognition for affective user interfaces using natural language dialogs. In Procceedings of IEEE International Symposium on Robot and Human interactive Communication. (pp. 798–801).
go back to reference Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2), 293–303.CrossRef Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2), 293–303.CrossRef
go back to reference Lee, C. M., Narayanan, S. S., & Pieraccini, R. (2002). Combining acoustic and language information for emotion recognition. In Proceeding of 7th International Conference on Spoken Language Processing. Lee, C. M., Narayanan, S. S., & Pieraccini, R. (2002). Combining acoustic and language information for emotion recognition. In Proceeding of 7th International Conference on Spoken Language Processing.
go back to reference Lee, L., & Rose, R. C. (1996). Speaker normalization using efficient frequency warping procedures. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1 (pp. 353–356). Lee, L., & Rose, R. C. (1996). Speaker normalization using efficient frequency warping procedures. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1 (pp. 353–356).
go back to reference Liberman, M., Davis, K., Grossman, M., Martey, N., & Bell, J. (2002). Emotional prosody speech and transcripts. Philadelphia: Linguistic Data Consortium (LDC). Liberman, M., Davis, K., Grossman, M., Martey, N., & Bell, J. (2002). Emotional prosody speech and transcripts. Philadelphia: Linguistic Data Consortium (LDC).
go back to reference Ling, C., Dong, M., Li, H., Yu, Z. L., & Chan, P. (2010). Machine learning methods in the application of speech emotion recognition. In Application of Machine Learning (pp. 1–19). Ling, C., Dong, M., Li, H., Yu, Z. L., & Chan, P. (2010). Machine learning methods in the application of speech emotion recognition. In Application of Machine Learning (pp. 1–19).
go back to reference Özkul, S., Bozkurt, E., Asta, S., Yemez, Y., & Erzin, E. (2012). Multimodal analysis of upper-body gestures, facial expressions and speech. In Procceedings of the 4th International Workshop on Corpora for Research on Emotion Sentiment and Social Signals. Özkul, S., Bozkurt, E., Asta, S., Yemez, Y., & Erzin, E. (2012). Multimodal analysis of upper-body gestures, facial expressions and speech. In Procceedings of the 4th International Workshop on Corpora for Research on Emotion Sentiment and Social Signals.
go back to reference Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Mchine Intelligence, 27(8), 1226–1238.CrossRef Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Mchine Intelligence, 27(8), 1226–1238.CrossRef
go back to reference Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods (pp. 185–208). Cambridge: MIT Press. Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods (pp. 185–208). Cambridge: MIT Press.
go back to reference Qin, L., Ling, Z. H., Wu, Y. J., Zhang, B. F., & Wang, R. H. (2006). Hmm-based emotional speech synthesis using average emotion model. In Procceedings of Chinese Spoken Language Processing (pp. 233–240). Qin, L., Ling, Z. H., Wu, Y. J., Zhang, B. F., & Wang, R. H. (2006). Hmm-based emotional speech synthesis using average emotion model. In Procceedings of Chinese Spoken Language Processing (pp. 233–240).
go back to reference Rachuri, K. K., Musolesi, M., Mascolo, C., Rentfrow, P. J., Longworth, C., & Aucinas, A. (2010). EmotionSense: a mobile phones based adaptive platform for experimental social psychology research. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing (pp. 281–290). Rachuri, K. K., Musolesi, M., Mascolo, C., Rentfrow, P. J., Longworth, C., & Aucinas, A. (2010). EmotionSense: a mobile phones based adaptive platform for experimental social psychology research. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing (pp. 281–290).
go back to reference Roberto, B. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4), 537–550.CrossRef Roberto, B. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4), 537–550.CrossRef
go back to reference Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing and Management, 45(3), 315–328.CrossRef Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing and Management, 45(3), 315–328.CrossRef
go back to reference Sauter, D. A., Eisner, F., Calder, A. J., & Scott, S. K. (2010). Perceptual cues in nonverbal vocal expressions of emotion, 63(11), 2251–2272. Sauter, D. A., Eisner, F., Calder, A. J., & Scott, S. K. (2010). Perceptual cues in nonverbal vocal expressions of emotion, 63(11), 2251–2272.
go back to reference Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Communication, 40(1–2), 227–256.CrossRefMATH Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Communication, 40(1–2), 227–256.CrossRefMATH
go back to reference Scherer, K. R. (2005). What are emotions? And how can they be measured? Social Science Information, 44(4), 695–729.MathSciNetCrossRef Scherer, K. R. (2005). What are emotions? And how can they be measured? Social Science Information, 44(4), 695–729.MathSciNetCrossRef
go back to reference Schuller, B., Rigoll, G., & Lang, M. (2003). Hidden markov model-based speech emotion recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 2 (p. 1). Schuller, B., Rigoll, G., & Lang, M. (2003). Hidden markov model-based speech emotion recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 2 (p. 1).
go back to reference Schuller, B., Rigoll, G., & Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1 (p. 577). Schuller, B., Rigoll, G., & Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1 (p. 577).
go back to reference Schuller, B., Vlasenko, B., Minguez, R., Rigoll, G., & Wendemuth, A. (2007). Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. In: IEEE Workshop on Automatic Speech Recognition Understanding (pp. 596–600). Schuller, B., Vlasenko, B., Minguez, R., Rigoll, G., & Wendemuth, A. (2007). Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. In: IEEE Workshop on Automatic Speech Recognition Understanding (pp. 596–600).
go back to reference Sethu, V., Ambikairajah, E., & Epps, J. (2008). Empirical mode decomposition based weighted frequency feature for speech-based emotion classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5017–5020). Sethu, V., Ambikairajah, E., & Epps, J. (2008). Empirical mode decomposition based weighted frequency feature for speech-based emotion classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5017–5020).
go back to reference Shafran, I. (2005). A comparison of classifiers for detecting emotion from speech. In IEEE International Conference on Acoustics, Speech and Signal Processing. Shafran, I. (2005). A comparison of classifiers for detecting emotion from speech. In IEEE International Conference on Acoustics, Speech and Signal Processing.
go back to reference Shrawankar, U., & Thakare, V.M. (2013). Adverse conditions and ASR techniques for robust speech user interface. arXiv preprint arXiv:13035515. Shrawankar, U., & Thakare, V.M. (2013). Adverse conditions and ASR techniques for robust speech user interface. arXiv preprint arXiv:​13035515.
go back to reference Steidl, S., Polzehl, T., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Perry, D., Prahallad, K., Vaughn, C., Black, A. W., & Metze, F. (2012). Emotion identification for evaluation of synthesized emotional speech. In Procceedings of Speech Prosody. Steidl, S., Polzehl, T., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Perry, D., Prahallad, K., Vaughn, C., Black, A. W., & Metze, F. (2012). Emotion identification for evaluation of synthesized emotional speech. In Procceedings of Speech Prosody.
go back to reference Tacconi, D., Mayora, O., Lukowicz, P., Arnrich, B., Setz, C., Troster, G., & Haring, C. (2008). Activity and emotion recognition to support early diagnosis of psychiatric diseases. In Pervasive Computing Technologies for Healthcare (PervasiveHealth), Second International Conference on (pp. 100–102). Tacconi, D., Mayora, O., Lukowicz, P., Arnrich, B., Setz, C., Troster, G., & Haring, C. (2008). Activity and emotion recognition to support early diagnosis of psychiatric diseases. In Pervasive Computing Technologies for Healthcare (PervasiveHealth), Second International Conference on (pp. 100–102).
go back to reference Tang, H., Chu, S. M., Hasegawa-Johnson, M., & Huang, T. S. (2009). Emotion recognition from speech via boosted Gaussian mixture models. In IEEE International Conference on Multimedia and Expo (ICME) (pp. 294–297). Tang, H., Chu, S. M., Hasegawa-Johnson, M., & Huang, T. S. (2009). Emotion recognition from speech via boosted Gaussian mixture models. In IEEE International Conference on Multimedia and Expo (ICME) (pp. 294–297).
go back to reference Vapnik, V. N. (1998). Statistical learning theory. New Jersey: Wiley.MATH Vapnik, V. N. (1998). Statistical learning theory. New Jersey: Wiley.MATH
go back to reference Vlasenko, B., Schuller, B., Wendemuth, A., & Rigoll, G. (2007). Combining frame and turn-level information for robust recognition of emotions within speech. In INTERSPEECH. (pp. 2249–2252). Vlasenko, B., Schuller, B., Wendemuth, A., & Rigoll, G. (2007). Combining frame and turn-level information for robust recognition of emotions within speech. In INTERSPEECH. (pp. 2249–2252).
go back to reference Wu, C. H., Kung, C., Lin, J. C., & Wei, W. L. (2013). Two-level hierarchical alignment for semi-coupled hmm-based audiovisual emotion recognition with temporal course. IEEE Transactions on Multimedia, 15(8), 1880–1895.CrossRef Wu, C. H., Kung, C., Lin, J. C., & Wei, W. L. (2013). Two-level hierarchical alignment for semi-coupled hmm-based audiovisual emotion recognition with temporal course. IEEE Transactions on Multimedia, 15(8), 1880–1895.CrossRef
go back to reference Wu, G., & Chang, E. Y. (2003). Class-boundary alignment for imbalanced dataset learning. In Workshop on Learning from Imbalanced Datasets II, ICML (pp. 49–56). Wu, G., & Chang, E. Y. (2003). Class-boundary alignment for imbalanced dataset learning. In Workshop on Learning from Imbalanced Datasets II, ICML (pp. 49–56).
go back to reference Wu, S., Falk, T. H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In Procceedings of the 16th International Conference on Digital Signal Processing. Wu, S., Falk, T. H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In Procceedings of the 16th International Conference on Digital Signal Processing.
go back to reference Xia, R., & Liu, Y. (2012). Using i-vector space model for emotion recognition. In Procceedings of Interspeech. Xia, R., & Liu, Y. (2012). Using i-vector space model for emotion recognition. In Procceedings of Interspeech.
go back to reference Yang, N., Ba, H., Cai, W., Demirkol, I., & Heinzelman, W. (2014). BaNa: a noise resilient fundamental frequency detection algorithm for speech and music. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1833–1848.CrossRef Yang, N., Ba, H., Cai, W., Demirkol, I., & Heinzelman, W. (2014). BaNa: a noise resilient fundamental frequency detection algorithm for speech and music. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1833–1848.CrossRef
go back to reference Yang, N., Muraleedharan, R., Kohl, J., Demirkol, I., Heinzelman, W., & Sturge-Apple, M. (2012). Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion. In Spoken Language Technology Workshop (SLT), 2012 IEEE (pp. 455–460). Yang, N., Muraleedharan, R., Kohl, J., Demirkol, I., Heinzelman, W., & Sturge-Apple, M. (2012). Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion. In Spoken Language Technology Workshop (SLT), 2012 IEEE (pp. 455–460).
go back to reference Yang, Y., Fairbairn, C., & Cohn, J. F. (2013). Detecting depression severity from vocal prosody. IEEE Transactions on Affective Computing, 4(2), 142–150.CrossRef Yang, Y., Fairbairn, C., & Cohn, J. F. (2013). Detecting depression severity from vocal prosody. IEEE Transactions on Affective Computing, 4(2), 142–150.CrossRef
go back to reference Yun, S., & Yoo, C. D. (2012). Loss-scaled large-margin gaussian mixture models for speech emotion classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 585–598.CrossRef Yun, S., & Yoo, C. D. (2012). Loss-scaled large-margin gaussian mixture models for speech emotion classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 585–598.CrossRef
go back to reference Zhang S., Zhao X., & Lei B. (2013). Speech emotion recognition using an enhanced kernel isomap for human-robot interaction. International Journal of Advanced Robotic Systems. doi:10.5772/55403. Zhang S., Zhao X., & Lei B. (2013). Speech emotion recognition using an enhanced kernel isomap for human-robot interaction. International Journal of Advanced Robotic Systems. doi:10.​5772/​55403.
Metadata

Title: Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification
Authors: Na Yang, Jianbo Yuan, Yun Zhou, Ilker Demirkol, Zhiyao Duan, Wendi Heinzelman, Melissa Sturge-Apple
Publication date: 28-10-2016
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 1/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9364-2
