Published in: Pattern Analysis and Applications 3/2023

18.06.2023 | Theoretical Advances

A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos

Authors: Shubao Qi, Baolin Liu


Abstract

Video-based group emotion recognition is an important research area in computer vision and is of great significance for the intelligent understanding of videos and for human–computer interaction. Previous studies have adopted a traditional two-stage shallow pipeline that extracts visual or audio features and trains a classifier. One or two features are insufficient to comprehensively represent video information, and the sparse expression of emotions in videos has not been addressed effectively. Therefore, in this study, we propose a novel deep convolutional neural network (CNN) architecture for video-based group emotion recognition that fuses multimodal feature information from vision, audio, optical flow, and faces. To address the sparsity of emotional expressions in videos, we construct an improved keyframe extraction algorithm for the visual stream that extracts keyframes carrying more emotional information. A subnetwork incorporating spatial and channel attention is designed to automatically concentrate on the regions and channels carrying distinctive information in each keyframe, representing the emotional features of the visual stream more accurately. The proposed model was evaluated in extensive experiments on a video group-affect dataset, where it outperformed other video-based group emotion recognition methods.
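The keyframe idea described in the abstract can be illustrated with a minimal difference-based selector. This is only a sketch of the general technique, not the authors' improved algorithm: it keeps frames whose change from the previous frame exceeds an adaptive threshold (`select_keyframes` and the parameter `k` are hypothetical names introduced here for illustration).

```python
import numpy as np

def select_keyframes(frames, k=1.0):
    """Pick frames whose change from the previous frame exceeds an
    adaptive threshold (mean + k * std of all frame differences)."""
    diffs = np.array([np.abs(frames[i] - frames[i - 1]).mean()
                      for i in range(1, len(frames))])
    thresh = diffs.mean() + k * diffs.std()
    # Frame 0 is always kept as a reference keyframe.
    return [0] + [i for i in range(1, len(frames)) if diffs[i - 1] > thresh]

# Synthetic clip: five dark frames followed by five bright frames.
frames = [np.zeros((4, 4))] * 5 + [np.ones((4, 4))] * 5
print(select_keyframes(frames))  # the scene change at frame 5 is detected
```

In a real system the per-frame score would come from richer cues (e.g., histogram or feature-space distance), but the adaptive-threshold selection step works the same way.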
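The spatial and channel attention mechanism mentioned in the abstract can likewise be sketched in a few lines. The sketch below is a simplified, parameter-free variant (in the paper's subnetwork the gates would be produced by learned layers); it shows the two characteristic steps: gating channels after global spatial pooling, then gating spatial positions after channel pooling. All function names here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    # x: (C, H, W). Pool each channel globally, then gate the channels.
    gate = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))  # shape (C,)
    return x * gate[:, None, None]

def spatial_attention(x):
    # x: (C, H, W). Pool across channels, then gate spatial positions.
    gate = sigmoid(x.mean(axis=0) + x.max(axis=0))            # shape (H, W)
    return x * gate[None, :, :]

feat = np.random.rand(8, 4, 4)  # stand-in for a CNN feature map
out = spatial_attention(channel_attention(feat))
assert out.shape == feat.shape  # attention reweights but preserves shape
```

Because both gates lie in (0, 1), the module can only suppress uninformative channels and regions; which ones are suppressed is what the learned version optimizes during training.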


Metadata
Title
A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
Authors
Shubao Qi
Baolin Liu
Publication date
18.06.2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01178-4
