Published in: Neural Computing and Applications, Issue 1/2013

01.07.2013 | Original Article

Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method

Authors: Mansour Sheikhan, Mahdi Bejani, Davood Gharavian

Abstract

The speech signal carries not only linguistic information but also paralinguistic information such as emotion. Modern automatic speech recognition systems achieve high performance on neutral-style speech, but they cannot maintain this recognition rate for spontaneous speech. Emotion recognition is therefore an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its emotion recognition performance is compared to Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are selected using the analysis of variance (ANOVA) method. The proposed modular scheme is derived from a comparative study of different features and the characteristics of each individual emotional state, with the aim of improving recognition performance. Empirical results show that even after discarding 22% of the features, the average emotion recognition accuracy improves by 2.2%. Moreover, the proposed modular neural-SVM classifier improves recognition accuracy by at least 8% compared to the simulated monolithic classifiers.
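The abstract describes ANOVA-based feature selection in which roughly 22% of the features are discarded before classification. A minimal sketch of that idea, using the ANOVA F-test as implemented in scikit-learn on synthetic data (the dataset, feature counts, and API calls here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for an emotional speech feature matrix:
# 200 utterances, 50 acoustic features, 4 emotional states.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 50
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 4, size=n_samples)
# Make the first 5 features informative about the class label.
X[:, :5] += y[:, None]

# Keep ~78% of features, i.e. discard ~22% as reported in the abstract.
k = int(round(0.78 * n_features))
selector = SelectKBest(score_func=f_classif, k=k)
X_sel = selector.fit_transform(X, y)
print(X_sel.shape)  # (200, 39)
```

The ANOVA F-test ranks each feature by the ratio of between-class to within-class variance, so features whose distributions differ most across emotional states survive; `f_classif` computes exactly this per-feature F statistic.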


85.
Zurück zum Zitat Schuller B, Batliner A, Steidl S, Seppi D (2011) Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun (article in press). doi:10.1016/j.specom.2011.01.011 Schuller B, Batliner A, Steidl S, Seppi D (2011) Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun (article in press). doi:10.​1016/​j.​specom.​2011.​01.​011
86.
Zurück zum Zitat Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH
87.
90.
Zurück zum Zitat Ghanem AS, Venkatesh S, West G (2010) Multi-class pattern classification in imbalanced data. In: The proceedings of the international conference on pattern recognition, pp 2881–2884 Ghanem AS, Venkatesh S, West G (2010) Multi-class pattern classification in imbalanced data. In: The proceedings of the international conference on pattern recognition, pp 2881–2884
91.
Zurück zum Zitat Wang Y, Guan L (2005) Recognizing human emotion from audiovisual information. In: The proceedings of the international conference on acoustics, speech, and signal processing, pp 1125–1128 Wang Y, Guan L (2005) Recognizing human emotion from audiovisual information. In: The proceedings of the international conference on acoustics, speech, and signal processing, pp 1125–1128
92.
Zurück zum Zitat Kittler J, Hojjatoleslami A, Windeatt T (1997) Weighting factors in multiple expert fusion. In: The proceedings of the British machine vision conference, pp 42–50 Kittler J, Hojjatoleslami A, Windeatt T (1997) Weighting factors in multiple expert fusion. In: The proceedings of the British machine vision conference, pp 42–50
93.
Zurück zum Zitat Yu F, Chang E, Xu Y, Shum H (2001) Emotion detection from speech to enrich multimedia content. In: The proceedings of the IEEE Pacific Rim conference on multimedia: advances in multimedia information processing, pp 550–557 Yu F, Chang E, Xu Y, Shum H (2001) Emotion detection from speech to enrich multimedia content. In: The proceedings of the IEEE Pacific Rim conference on multimedia: advances in multimedia information processing, pp 550–557
94.
Zurück zum Zitat Kwon OW, Chan K, Hao J, Lee TW (2003) Emotion recognition by speech signal. In: The proceedings of the European conference on speech communication and technology, pp 125–128 Kwon OW, Chan K, Hao J, Lee TW (2003) Emotion recognition by speech signal. In: The proceedings of the European conference on speech communication and technology, pp 125–128
95.
Zurück zum Zitat Ayadi M, Kamel S, Karray F (2007) Speech emotion recognition using Gaussian mixture vector autoregressive models. In: The proceedings of the international conference on acoustics, speech, and signal processing, vol 5, pp 957–960 Ayadi M, Kamel S, Karray F (2007) Speech emotion recognition using Gaussian mixture vector autoregressive models. In: The proceedings of the international conference on acoustics, speech, and signal processing, vol 5, pp 957–960
Metadata
Title
Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method
Authors
Mansour Sheikhan
Mahdi Bejani
Davood Gharavian
Publication date
01.07.2013
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 1/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-012-0814-8
