
2020 | OriginalPaper | Chapter

Global Affective Video Content Regression Based on Complementary Audio-Visual Features

Authors: Xiaona Guo, Wei Zhong, Long Ye, Li Fang, Yan Heng, Qin Zhang

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

In this paper, we propose a new framework for global affective video content regression based on five complementary audio-visual features. For the audio modality, we select the global audio feature eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, key frames of the original images and of the optical flow images are both used to extract VGG-19 features with fine-tuned models, so that the original visual cues are represented in conjunction with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16) and compare our results with those of the competing teams in EIMT16 and the state-of-the-art method. The experimental results show that the fusion of the five features achieves better regression results in both the arousal and valence dimensions, indicating that the selected features complement each other across the audio-visual modalities. Furthermore, the proposed approach outperforms the state-of-the-art method on both evaluation metrics, MSE and PCC, in the arousal dimension, and achieves comparable MSE results in the valence dimension. Although our approach obtains a slightly lower PCC than the state-of-the-art method in the valence dimension, the fused feature vectors used in our framework have a much lower dimensionality, 1752 in total, only five thousandths of the feature dimensionality of the state-of-the-art method, substantially reducing the memory requirements and computational burden.
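To make the fusion-and-evaluation step concrete, the sketch below concatenates five per-clip feature blocks (early fusion), fits a standard regressor, and scores the predictions with MSE and PCC. This is a minimal sketch, not the authors' pipeline: the per-block dimensions (chosen only so they sum to the 1752 total stated in the abstract), the random stand-in features, and the choice of ridge regression are all assumptions for illustration.

# Minimal sketch of early fusion plus regression, as described in the abstract.
# Feature matrices are assumed precomputed per video clip; block sizes and the
# ridge regressor are illustrative assumptions, not the authors' exact setup.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def fuse_features(*feature_blocks):
    """Early fusion: concatenate per-clip feature vectors along the feature axis."""
    return np.concatenate(feature_blocks, axis=1)

rng = np.random.default_rng(0)
n_train, n_test = 800, 200

# Hypothetical per-block dimensions for the five features; only the 1752 total
# comes from the abstract (88 + 256 + 128 + 640 + 640 = 1752).
dims = {"egemaps": 88, "soundnet": 256, "vggish": 128,
        "vgg19_rgb": 640, "vgg19_flow": 640}

def random_block(n, d):
    # Stand-in for real extracted features of shape (n_clips, d).
    return rng.standard_normal((n, d))

X_train = fuse_features(*(random_block(n_train, d) for d in dims.values()))
X_test = fuse_features(*(random_block(n_test, d) for d in dims.values()))
y_train = rng.uniform(-1.0, 1.0, n_train)  # e.g., arousal labels in [-1, 1]
y_test = rng.uniform(-1.0, 1.0, n_test)

model = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # lower is better
pcc, _ = pearsonr(y_test, y_pred)         # higher is better
print(f"MSE={mse:.4f}  PCC={pcc:.4f}")

With early fusion, the regressor operates on a single 1752-dimensional vector per clip, which is what keeps the memory and compute cost low relative to higher-dimensional fused representations; a separate model would be trained for each of the arousal and valence dimensions.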


Metadata
DOI
https://doi.org/10.1007/978-3-030-37734-2_44