Published in: Journal on Multimodal User Interfaces 4/2016

01.12.2016 | Original Paper

Audio-visual emotion recognition using multi-directional regression and Ridgelet transform

Authors: M. Shamim Hossain, Ghulam Muhammad

Abstract

In this paper, we propose an audio-visual emotion recognition system using multi-directional regression (MDR) audio features and ridgelet transform based face image features. MDR features capture directional derivative information in the spectro-temporal domain of speech and are therefore suitable for encoding different levels of increasing or decreasing pitch and formant frequencies. For video inputs, interest points in a time frame are detected using spectro-temporal filters, and the ridgelet transform is applied to cuboids around the interest points. Two separate extreme learning machine classifiers are used, one for the speech modality and the other for the face modality. The scores of these two classifiers are fused using a Bayesian sum rule to make the final decision. Experimental results on the eNTERFACE database show that the proposed method achieves an accuracy of 85.06 % using bimodal inputs, 64.04 % using speech only, and 58.38 % using face only; these accuracies exceed those reported by other state-of-the-art systems on the same database.
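The decision step described in the abstract — fusing per-modality classifier scores with a sum rule — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label set follows the six eNTERFACE emotion classes, the score vectors in the example are hypothetical, and each modality's scores are simply normalized to sum to one before addition.

```python
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def sum_rule_fusion(audio_scores, video_scores):
    """Fuse per-class scores from two classifiers with a sum rule.

    Each score list is normalized to sum to 1, so neither modality
    dominates by scale; the fused decision is the argmax of the
    summed (pseudo-posterior) scores.
    """
    a_total = sum(audio_scores)
    v_total = sum(video_scores)
    fused = [a / a_total + v / v_total
             for a, v in zip(audio_scores, video_scores)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused

# Hypothetical scores for one clip: speech strongly favors "anger",
# face weakly favors "fear"; the sum rule decides jointly.
label, fused = sum_rule_fusion([0.5, 0.1, 0.2, 0.1, 0.05, 0.05],
                               [0.2, 0.1, 0.3, 0.2, 0.1, 0.1])
```

In this sketch the fused score for "anger" (0.7) outweighs "fear" (0.5), so the bimodal decision follows the speech modality; a prior-weighted Bayesian variant would scale each normalized score by a class prior before summing.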


References
1.
Schuller B, Rigoll G, Lang M (2004) Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine—belief network architecture. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp I-577–580
2.
Zhou Y, Sun Y, Zhang J, Yan Y (2009) Speech emotion recognition using both spectral and prosodic features. In: Proceedings of International Conference on Information Engineering and Computer Science (ICIECS), pp 1–4
3.
Devillers L, Vidrascu V (2006) Real-life emotion detection with lexical and paralinguistic cues on human-human call center dialogs. In: Proceedings of Interspeech 2006, Pittsburgh
4.
Gharavian D, Sheikhan M, Nazerieh AR, Garoucy S (2012) Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Comput Appl 21(8):2115–2126. doi:10.1007/s00521-011-0643-1
5.
Albornoz EM, Milone DH, Rufiner HL (2011) Spoken emotion recognition using hierarchical classifiers. Comput Speech Lang 25(3):556–570
6.
Burkhardt F, Paeschke A, Rolfes M, Sendlmeier W, Weiss B (2005) A database of German emotional speech. In: Proceedings of Interspeech 2005, Lisbon
7.
Bettadapura V (2012) Face expression recognition and analysis: the state of the art. College of Computing, Georgia Institute of Technology. arXiv:1203.6722v1
8.
Senechal T, Rapp V, Salam H, Seguier R, Bailly K, Prevost L (2012) Facial action recognition combining heterogeneous features via multikernel learning. IEEE Trans Syst Man Cybern B 42(4):993–1005
9.
Agrawal S, Khatri P (2015) Facial expression detection techniques: based on Viola and Jones algorithm and principal component analysis. In: Proceedings of 2015 Fifth International Conference on Advanced Computing & Communication Technologies (ACCT), pp 108–112
10.
Majumder A, Behera L, Subramanian VK (2014) Emotion recognition from geometric facial features using self-organizing map. Pattern Recogn 47(3):1282–1293
11.
Pantic M, Valstar MF, Rademaker R, Maat L (2005) Web-based database for facial expression analysis. In: Proceedings of 13th ACM International Conference on Multimedia '05, pp 317–321. Database available at http://www.mmifacedb.com/
12.
Bejani M, Gharavian D, Charkari NM (2014) Audiovisual emotion recognition using ANOVA feature selection method and multi-classifier neural networks. Neural Comput Appl 24(2):399–412
13.
Martin O, Kotsia I, Macq B, Pitas I (2006) The eNTERFACE'05 audiovisual emotion database. In: Proceedings of ICDEW 2006, p 8, Atlanta, April 3–8
14.
Kachele M, Glodek M, Zharkov D, Meudt S, Schwenker F (2014) Fusion of audio-visual features using hierarchical classifier systems for the recognition of affective states and the state of depression. In: Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM), pp 671–678
15.
Jeremie N, Vincent R, Kevin B, Lionel P, Mohamed C (2014) Audio-visual emotion recognition: a dynamic, multimodal approach. In: Proceedings of 26th French Conference on Human–Machine Interaction (IHM '14), Lille
16.
Lin J-C, Wu C-H, Wei W-L (2012) Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition. IEEE Trans Multimed 14(1):142–156
17.
Kim Y, Lee H, Provost EM (2013) Deep learning for robust feature generation in audiovisual emotion recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 3687–3691, 26–31 May 2013
18.
Metallinou A, Wollmer M, Katsamanis A, Eyben F, Schuller B, Narayanan S (2012) Context-sensitive learning for enhanced audiovisual emotion classification. IEEE Trans Affect Comput 3(2):184–198
19.
Mesgarani N, David S, Fritz J, Shamma S (2008) Phoneme representation and classification in primary cortex. J Acoust Soc Am 123:899–909
20.
Muhammad G, Mesallam T, Almalki K, Farahat M, Mahmood A, Alsulaiman M (2012) Multi directional regression (MDR) based features for automatic voice disorder detection. J Voice 26(6):817.e19–817.e27
21.
22.
Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
23.
Dollar P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse spatio-temporal features. In: Proceedings of IEEE VS-PETS 2005, pp 65–72, Beijing, 15–16 Oct 2005
24.
25.
Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern B 42(2):513–529
26.
Huang W, Li N, Lin Z, Huang G-B, Zong W, Zhou J, Duan Y (2013) Liver tumor detection and segmentation using kernel-based extreme learning machine. In: Proceedings of 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '13), pp 3662–3665, Osaka
27.
Savojardo C, Fariselli P, Casadio R (2013) BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes. Bioinformatics 29(4):504–505
28.
Yin XX, Hadjiloucas S, Zhang Y (2014) Complex extreme learning machine applications in terahertz pulsed signals feature sets. Comput Methods Programs Biomed 117(2):387–403
29.
Hossain MS, Muhammad G, Song B, Hassan M, Alelaiwi A, Alamri A (2015) Audio-visual emotion-aware cloud gaming framework. IEEE Trans Circuits Syst Video Technol. doi:10.1109/TCSVT.2015.2444731
30.
Kanade T, Cohn J, Tian Y (2000) Comprehensive database for facial expression analysis. In: Proceedings of IEEE International Conference on Face and Gesture Recognition (AFGR '00), pp 46–53
31.
Mansoorizadeh M, Charkari NM (2010) Multimodal information fusion application to human emotion recognition from face and speech. Multimed Tools Appl 49(2):277–297
32.
Jiang D, Cui Y, Zhang X, Fan P, Ganzalez I, Sahli H (2011) Audio visual emotion recognition based on triple-stream dynamic Bayesian network models. In: D'Mello S et al (eds) ACII 2011, Part I, LNCS 6974, pp 609–618
33.
Paleari M, Huet B (2008) Toward emotion indexing of multimedia excerpts. In: Proceedings of International Workshop on Content-Based Multimedia Indexing (CBMI), pp 425–432, London, June 2008
34.
Muhammad G, Masud M, Alelaiwi A, Rahman MA, Karime A, Alamri A, Hossain MS (2015) Spectro-temporal directional derivative based automatic speech recognition for a serious game scenario. Multimed Tools Appl 74(14):5313–5327. doi:10.1007/s11042-014-1973-7
35.
Jin Q, Li C, Chen S, Wu H (2015) Speech emotion recognition with acoustic and lexical features. In: Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4749–4753, 19–24 Apr 2015
Metadata
Title
Audio-visual emotion recognition using multi-directional regression and Ridgelet transform
Authors
M. Shamim Hossain
Ghulam Muhammad
Publication date
01.12.2016
Publisher
Springer International Publishing
Published in
Journal on Multimodal User Interfaces / Issue 4/2016
Print ISSN: 1783-7677
Electronic ISSN: 1783-8738
DOI
https://doi.org/10.1007/s12193-015-0207-2
