Published in: Neural Computing and Applications 6/2013

01-05-2013 | Original Article

Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model

Authors: Davood Gharavian, Mansour Sheikhan, Farhad Ashoftedel

Abstract

Over the past four decades, enormous effort has been devoted to developing automatic speech recognition systems that extract linguistic information, but much research is still needed to decode paralinguistic information such as speaking style and emotion. This paper investigates the effect of using the first three normalized formant frequencies and the pitch frequency as supplementary features on the performance of an emotion recognition system whose feature vector comprises Mel-frequency cepstral coefficients and energy-related features. The normalization is performed using a dynamic time warping-multi-layer perceptron (DTW-MLP) hybrid model after determining the frequency range most affected by emotion. To reduce the number of features, fast correlation-based filter (FCBF) and analysis of variance (ANOVA) methods are used. Recognition of the emotional states is performed using a Gaussian mixture model (GMM). Experimental results show that first-formant (F1)-based warping combined with ANOVA-based feature selection yields the best performance among the systems simulated in this study, and the average emotion recognition accuracy is competitive with most recent research in this field.
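The abstract's pipeline (feature vector → ANOVA-based feature selection → one GMM per emotion, classified by maximum likelihood) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature values are synthetic stand-ins for MFCC/energy/normalized-formant vectors, only two emotion classes are used, and the number of GMM components and selected features are arbitrary choices.

```python
# Sketch of ANOVA feature selection followed by per-class Gaussian mixture
# models, with classification by maximum log-likelihood (synthetic data).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 20-dimensional "feature vectors" for two emotions; only the
# first 5 dimensions actually differ between the classes.
n_per_class, n_dims, n_informative = 200, 20, 5
X_a = rng.normal(0.0, 1.0, (n_per_class, n_dims))
X_b = rng.normal(0.0, 1.0, (n_per_class, n_dims))
X_b[:, :n_informative] += 2.0  # class separation in the informative dims
X = np.vstack([X_a, X_b])
y = np.array([0] * n_per_class + [1] * n_per_class)

# ANOVA (F-test) feature selection, analogous to the paper's ANOVA step.
selector = SelectKBest(f_classif, k=n_informative).fit(X, y)
X_sel = selector.transform(X)

# Train one GMM per emotion class on that class's selected features.
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X_sel[y == c])
        for c in (0, 1)}

# Classify each sample by the class whose GMM gives the highest
# log-likelihood.
scores = np.column_stack([gmms[c].score_samples(X_sel) for c in (0, 1)])
y_pred = scores.argmax(axis=1)
accuracy = (y_pred == y).mean()
print(f"training-set accuracy: {accuracy:.2f}")
```

In the actual system the front end would first compute MFCC, energy, pitch, and DTW-MLP-normalized formant features per utterance; the selection and classification stages above are structurally the same.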


Metadata
Title
Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model
Authors
Davood Gharavian
Mansour Sheikhan
Farhad Ashoftedel
Publication date
01-05-2013
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 6/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-012-0884-7
