Published in: Neural Computing and Applications 14/2021

04.01.2021 | Original Article

HEU Emotion: a large-scale database for multimodal emotion recognition in the wild

Authors: Jing Chen, Chenhui Wang, Kejun Wang, Chaoqun Yin, Cong Zhao, Tao Xu, Xinyi Zhang, Ziqiang Huang, Meichen Liu, Tao Yang


Abstract

Research on affective computing in the wild is underpinned by databases, yet existing multimodal emotion databases recorded under real-world conditions are few and small, covering a limited number of subjects and only a single language. To address this gap, we collected, annotated, and prepared to release a new natural-state video database, called HEU Emotion. HEU Emotion contains 19,004 video clips in total, divided into two parts by data source. The first part contains videos downloaded from Tumblr, Google, and Giphy, covering 10 emotions and two modalities (facial expression and body posture). The second part consists of clips extracted manually from movies, TV series, and variety shows, covering 10 emotions and three modalities (facial expression, body posture, and emotional speech). With 9951 subjects, HEU Emotion is by far the most extensive multimodal emotion database. To provide a benchmark for emotion recognition, we evaluated HEU Emotion with a range of conventional machine learning and deep learning methods. We also propose a multimodal attention module that fuses multimodal features adaptively. After multimodal fusion, the recognition accuracies on the two parts increased by 2.19% and 4.01%, respectively, over single-modal facial expression recognition.
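The abstract does not specify the internals of the proposed multimodal attention module, but the general idea it names, adaptively weighting per-modality features before fusing them, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the scoring vector stands in for learned parameters, and the fusion is a simple attention-weighted sum over modality feature vectors, not the authors' actual architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(modality_feats, score_weights):
    """Attention-weighted fusion of equal-length per-modality feature vectors.

    modality_feats: list of feature vectors, one per modality
    score_weights:  vector used to score each modality; a hypothetical
                    stand-in for learned parameters (not from the paper)
    """
    # One scalar relevance score per modality (dot product with score vector)
    scores = [sum(w * f for w, f in zip(score_weights, feats))
              for feats in modality_feats]
    alphas = softmax(scores)  # attention weights, sum to 1
    dim = len(modality_feats[0])
    # Weighted sum of modality features -> fused representation
    fused = [sum(a * feats[i] for a, feats in zip(alphas, modality_feats))
             for i in range(dim)]
    return fused, alphas

# Example: fuse face, posture, and speech feature vectors (toy values)
face = [0.9, 0.1, 0.3]
posture = [0.2, 0.8, 0.5]
speech = [0.4, 0.4, 0.7]
fused, alphas = attention_fuse([face, posture, speech], [1.0, 0.5, -0.2])
```

Because the weights are computed from the features themselves, the fused vector can lean on whichever modality is most informative for a given clip, which is the adaptivity the abstract describes.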


Metadata
Title
HEU Emotion: a large-scale database for multimodal emotion recognition in the wild
Authors
Jing Chen
Chenhui Wang
Kejun Wang
Chaoqun Yin
Cong Zhao
Tao Xu
Xinyi Zhang
Ziqiang Huang
Meichen Liu
Tao Yang
Publication date
04.01.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05616-w
