
2020 | OriginalPaper | Chapter

Global Affective Video Content Regression Based on Complementary Audio-Visual Features

Authors: Xiaona Guo, Wei Zhong, Long Ye, Li Fang, Yan Heng, Qin Zhang

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

In this paper, we propose a new framework for global affective video content regression based on five complementary audio-visual features. For the audio modality, we select the global audio feature eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, key frames of the original images and of the optical flow images are both used to extract VGG-19 features with fine-tuned models, so that the original visual cues are represented in conjunction with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16) and compare our results with those of the competing teams in EIMT16 and the state-of-the-art method. The experimental results show that the fusion of the five features achieves better regression results in both the arousal and valence dimensions, indicating that the selected features complement each other across the audio-visual modalities. Furthermore, the proposed approach outperforms the state-of-the-art method on both evaluation metrics, MSE and PCC, in the arousal dimension, and achieves comparable MSE results in the valence dimension. Although our approach obtains a slightly lower PCC than the state-of-the-art method in the valence dimension, the fused feature vectors used in our framework have a much lower dimensionality, 1752 in total, only five thousandths of the feature dimensionality of the state-of-the-art method, substantially reducing the memory requirements and computational burden.
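To make the fusion-and-evaluation step concrete, the sketch below concatenates five per-clip feature blocks (early fusion), fits a standard regressor, and scores the predictions with MSE and PCC. This is a minimal sketch, not the authors' pipeline: the per-block dimensions (chosen only so they sum to the 1752 total stated in the abstract), the random stand-in features, and the choice of ridge regression are all assumptions for illustration.

# Minimal sketch of early fusion plus regression, as described in the abstract.
# Feature matrices are assumed precomputed per video clip; block sizes and the
# ridge regressor are illustrative assumptions, not the authors' exact setup.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def fuse_features(*feature_blocks):
    """Early fusion: concatenate per-clip feature vectors along the feature axis."""
    return np.concatenate(feature_blocks, axis=1)

rng = np.random.default_rng(0)
n_train, n_test = 800, 200

# Hypothetical per-block dimensions for the five features; only the 1752 total
# comes from the abstract (88 + 256 + 128 + 640 + 640 = 1752).
dims = {"egemaps": 88, "soundnet": 256, "vggish": 128,
        "vgg19_rgb": 640, "vgg19_flow": 640}

def random_block(n, d):
    # Stand-in for real extracted features of shape (n_clips, d).
    return rng.standard_normal((n, d))

X_train = fuse_features(*(random_block(n_train, d) for d in dims.values()))
X_test = fuse_features(*(random_block(n_test, d) for d in dims.values()))
y_train = rng.uniform(-1.0, 1.0, n_train)  # e.g., arousal labels in [-1, 1]
y_test = rng.uniform(-1.0, 1.0, n_test)

model = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # lower is better
pcc, _ = pearsonr(y_test, y_pred)         # higher is better
print(f"MSE={mse:.4f}  PCC={pcc:.4f}")

With early fusion, the regressor operates on a single 1752-dimensional vector per clip, which is what keeps the memory and compute cost low relative to higher-dimensional fused representations; a separate model would be trained for each of the arousal and valence dimensions.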


Metadata
DOI
https://doi.org/10.1007/978-3-030-37734-2_44