Published in: Soft Computing 17/2017

21.03.2016 | Methodologies and Application

Feature extraction based on bio-inspired model for robust emotion recognition

Authors: Enrique M. Albornoz, Diego H. Milone, Hugo L. Rufiner


Abstract

Identifying the speaker's emotional state is an important step toward more natural speech-based interactive systems. Ideally, such systems should also work in real environments, where some kind of noise is generally present. Several bio-inspired representations have been applied to artificial systems for speech processing under noisy conditions. In this work, an auditory signal representation is used to obtain a novel bio-inspired feature set for emotional speech signals. These features, together with other spectral and prosodic features, are used for emotion recognition under noise conditions. Neural models were trained as classifiers, and the results were compared with those obtained using the well-known mel-frequency cepstral coefficients. The results show that the proposed representations significantly improve the robustness of an emotion recognition system. They were also validated in a speaker-independent scheme and on two emotional speech corpora.
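The abstract compares the proposed bio-inspired features against mel-frequency cepstral coefficients (MFCCs) as a baseline. For reference, a minimal numpy-only MFCC sketch follows; the sampling rate, frame length, hop size, filterbank size, and FFT length are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame
    nfft = 512
    spec = np.abs(np.fft.rfft(frames, nfft)) ** 2
    # Triangular mel-spaced filterbank
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate; keep n_ceps coefficients
    logE = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return logE @ dct.T  # shape: (n_frames, n_ceps)
```

Utterance-level statistics over these frame vectors (e.g., per-coefficient mean and standard deviation) are a common way to obtain a fixed-length input for a neural classifier.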


Footnotes
1. Baseline feature set for the INTERSPEECH 2013 Computational Paralinguistics Evaluation Challenge.
2. Using Matlab.
4. Each partition has 196 utterances for training, 63 utterances for the generalization test, and 63 utterances for the final validation.
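The 196/63/63 partition described in the footnote can be sketched as a simple split of the 322 utterances; the random shuffling and seed here are illustrative assumptions, not the authors' protocol:

```python
import random

def split_partition(utterances, seed=0):
    # Split 322 utterances into 196 training, 63 generalization-test,
    # and 63 final-validation items, as described in footnote 4.
    assert len(utterances) == 196 + 63 + 63
    rng = random.Random(seed)
    idx = list(range(len(utterances)))
    rng.shuffle(idx)
    train = [utterances[i] for i in idx[:196]]
    test = [utterances[i] for i in idx[196:259]]
    valid = [utterances[i] for i in idx[259:]]
    return train, test, valid
```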
 
Metadata
Title
Feature extraction based on bio-inspired model for robust emotion recognition
Authors
Enrique M. Albornoz
Diego H. Milone
Hugo L. Rufiner
Publication date
21.03.2016
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 17/2017
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-016-2110-5
