Skip to main content
Top

2014 | OriginalPaper | Chapter

8. Multimodal Violence Detection in Hollywood Movies: State-of-the-Art and Benchmarking

Authors : Claire-Hélène Demarty, Cédric Penet, Bogdan Ionescu, Guillaume Gravier, Mohammad Soleymani

Published in: Fusion in Computer Vision

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This chapter introduces a benchmark evaluation targeting the detection of violent scenes in Hollywood movies. The evaluation was implemented in 2011 and 2012 as an affect task in the framework of the international MediaEval benchmark initiative. We report on these 2 years of evaluation, providing a detailed description of the dataset created, describing the state of the art by studying the results achieved by participants and providing a detailed analysis of two of the best performing multimodal systems. We elaborate on the lessons learned after 2 years to provide insights on future work emphasizing multimodal modeling and fusion.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
4
The development data is intended for designing and training the approaches.
 
5
The test set data is intended for the official benckmarking.
 
6
The Yaafe toolkit for audio feature extraction was used.
 
Literature
1.
go back to reference Acar E, Albayrak S (2012) Dai lab at mediaeval 2012 affect task: the detection of violent scenes using affective features. In: MediaEval 2012, multimedia benchmark workshop Acar E, Albayrak S (2012) Dai lab at mediaeval 2012 affect task: the detection of violent scenes using affective features. In: MediaEval 2012, multimedia benchmark workshop
2.
go back to reference Acar E, Spiegel S, Albayrak S (2011) Mediaeval 2011 affect task: Violent scene detection combining audio and visual features with svm. In: MediaEval 2011, multimedia benchmark workshop Acar E, Spiegel S, Albayrak S (2011) Mediaeval 2011 affect task: Violent scene detection combining audio and visual features with svm. In: MediaEval 2011, multimedia benchmark workshop
3.
go back to reference Baveye Y, Urban F, Chamaret C, Demoulin V, Hellier P (2013) Saliency-guided consistent color harmonization. Computational color imaging, Lecture notes in computer science, vol 7786. Springer, Berlin, pp 105–118 Baveye Y, Urban F, Chamaret C, Demoulin V, Hellier P (2013) Saliency-guided consistent color harmonization. Computational color imaging, Lecture notes in computer science, vol 7786. Springer, Berlin, pp 105–118
4.
go back to reference Chen LH, Hsu HW, Wang LY, Su CW (2011) Violence detection in movies. In: 8th IEEE international conference on computer graphics, imaging and visualization (CGIV 2011), pp 119–124 Chen LH, Hsu HW, Wang LY, Su CW (2011) Violence detection in movies. In: 8th IEEE international conference on computer graphics, imaging and visualization (CGIV 2011), pp 119–124
6.
go back to reference Cheng WH, Chu WT, Wu JL (2003) Semantic context detection based on hierarchical audio models. In: Proceedings of the 5th ACM SIGMM international workshop on multimedia information retrieval, pp 109–115 Cheng WH, Chu WT, Wu JL (2003) Semantic context detection based on hierarchical audio models. In: Proceedings of the 5th ACM SIGMM international workshop on multimedia information retrieval, pp 109–115
8.
go back to reference Datta A, Shah M, Da Vitoria Lobo N (2002) Person-on-person violence detection in video data. In: Proceedings of 16th IEEE international conference on pattern recognition, vol 1. pp 433–438 Datta A, Shah M, Da Vitoria Lobo N (2002) Person-on-person violence detection in video data. In: Proceedings of 16th IEEE international conference on pattern recognition, vol 1. pp 433–438
9.
go back to reference Demarty CH, Penet C, Gravier G, Soleymani M (2012) A benchmarking campaign for the multimodal detection of violent scenes in movies. In: Computer Vision-ECCV 2012. Workshops and demonstrations, Springer, pp 416–425 Demarty CH, Penet C, Gravier G, Soleymani M (2012) A benchmarking campaign for the multimodal detection of violent scenes in movies. In: Computer Vision-ECCV 2012. Workshops and demonstrations, Springer, pp 416–425
10.
go back to reference Derbas N, Thollard F, Safadi B, Quénot G (2012) Lig at mediaeval 2012 affect task: use of a generic method. In: MediaEval 2012, multimedia benchmark workshop Derbas N, Thollard F, Safadi B, Quénot G (2012) Lig at mediaeval 2012 affect task: use of a generic method. In: MediaEval 2012, multimedia benchmark workshop
11.
go back to reference de Souza FDM, Chávez GC, do Valle E, de A Araujo A (2010) Violence detection in video using spatio-temporal features. In: 23rd IEEE conference on graphics, patterns and images (SIBGRAPI 2010), pp 224–230 de Souza FDM, Chávez GC, do Valle E, de A Araujo A (2010) Violence detection in video using spatio-temporal features. In: 23rd IEEE conference on graphics, patterns and images (SIBGRAPI 2010), pp 224–230
12.
go back to reference de Weijer JV, Schmid C, Verbeek J, Larlus D (2009) Learning color names for real-world applications. IEEE Trans Image Process 18(7):1512–1523CrossRefMathSciNet de Weijer JV, Schmid C, Verbeek J, Larlus D (2009) Learning color names for real-world applications. IEEE Trans Image Process 18(7):1512–1523CrossRefMathSciNet
13.
go back to reference Eyben F, Weninger F, Lehment N, Rigoll G, Schuller B (2012) Violent scenes detection with large, brute-forced acoustic and visual feature sets. In: MediaEval 2012, multimedia benchmark workshop Eyben F, Weninger F, Lehment N, Rigoll G, Schuller B (2012) Violent scenes detection with large, brute-forced acoustic and visual feature sets. In: MediaEval 2012, multimedia benchmark workshop
14.
go back to reference Giannakopoulos T, Makris A, Kosmopoulos D, Perantonis S, Theodoridis S (2010) Audio-visual fusion for detecting violent scenes in videos. In: Konstantopoulos S et al (eds) Artificial intelligence: theories, models and applications, Lecture notes in computer scienc, vol 6040. Springer, pp 91–100 Giannakopoulos T, Makris A, Kosmopoulos D, Perantonis S, Theodoridis S (2010) Audio-visual fusion for detecting violent scenes in videos. In: Konstantopoulos S et al (eds) Artificial intelligence: theories, models and applications, Lecture notes in computer scienc, vol 6040. Springer, pp 91–100
15.
go back to reference Glotin H, Razik J, Paris S, Prevot JM (2011) Real-time entropic unsupervised violent scenes detection in hollywood movies - dyni @ mediaeval affect task 2011. In: MediaEval 2011, multimedia benchmark workshop Glotin H, Razik J, Paris S, Prevot JM (2011) Real-time entropic unsupervised violent scenes detection in hollywood movies - dyni @ mediaeval affect task 2011. In: MediaEval 2011, multimedia benchmark workshop
16.
go back to reference Gninkoun G, Soleymani M (2011) Automatic violence scenes detection: a multi-modal approach. In: MediaEval 2011, multimedia benchmark workshop Gninkoun G, Soleymani M (2011) Automatic violence scenes detection: a multi-modal approach. In: MediaEval 2011, multimedia benchmark workshop
17.
go back to reference Gong Y, Wang W, Jiang S, Huang Q, Gao W (2008) Detecting violent scenes in movies by auditory and visual cues. In: Huang YM et al (eds) Advances in multimedia information processing - (PCM 2008), Lecture notes in computer science, vol 5353. Springer, pp 317–326 Gong Y, Wang W, Jiang S, Huang Q, Gao W (2008) Detecting violent scenes in movies by auditory and visual cues. In: Huang YM et al (eds) Advances in multimedia information processing - (PCM 2008), Lecture notes in computer science, vol 5353. Springer, pp 317–326
20.
go back to reference Ionescu B, Buzuloiu V, Lambert P, Coquin D (2006) Improved cut detection for the segmentation of animation movies. In: IEEE international conference on acoustics, speech, and signal processing Ionescu B, Buzuloiu V, Lambert P, Coquin D (2006) Improved cut detection for the segmentation of animation movies. In: IEEE international conference on acoustics, speech, and signal processing
21.
go back to reference Ionescu B, Schlüter J, Mironică I, Schedl M (2013) A naive mid-level concept-based fusion approach to violence detection in hollywood movies. In: Proceedings of the 3rd ACM international conference on multimedia retrieval, pp 215–222 Ionescu B, Schlüter J, Mironică I, Schedl M (2013) A naive mid-level concept-based fusion approach to violence detection in hollywood movies. In: Proceedings of the 3rd ACM international conference on multimedia retrieval, pp 215–222
22.
go back to reference Jiang YG, Dai Q, Tan CC, Xue X, Ngo CW (2012) The shanghai-hongkong team at mediaeval2012: Violent scene detection using trajectory-based features. In: MediaEval 2012, multimedia benchmark workshop Jiang YG, Dai Q, Tan CC, Xue X, Ngo CW (2012) The shanghai-hongkong team at mediaeval2012: Violent scene detection using trajectory-based features. In: MediaEval 2012, multimedia benchmark workshop
23.
go back to reference Kriegel B (2003) La violence à la télévision. rapport de la mission d’évaluation, d’analyse et de propositions relative aux représentations violentes à la télévision. Technical report, Ministère de la Culture et de la Communication, Paris Kriegel B (2003) La violence à la télévision. rapport de la mission d’évaluation, d’analyse et de propositions relative aux représentations violentes à la télévision. Technical report, Ministère de la Culture et de la Communication, Paris
25.
go back to reference Lam V, Le DD, Le SP, Satoh S, Duong DA (2012) Nii, Japan at mediaeval 2012 violent scenes detection affect task. In: MediaEval 2011, multimedia benchmark workshop Lam V, Le DD, Le SP, Satoh S, Duong DA (2012) Nii, Japan at mediaeval 2012 violent scenes detection affect task. In: MediaEval 2011, multimedia benchmark workshop
26.
go back to reference Lam V, Le DD, Satoh S, Duong, DA (2011) Nii, Japan at mediaeval 2011 violent scenes detection task. In: MediaEval 2011, multimedia benchmark workshop Lam V, Le DD, Satoh S, Duong, DA (2011) Nii, Japan at mediaeval 2011 violent scenes detection task. In: MediaEval 2011, multimedia benchmark workshop
27.
go back to reference Lin J, Wang W (2009) Weakly-supervised violence detection in movies with audio and video based co-training. In: Advances in multimedia information processing-PCM 2009, Springer, pp 930–935 Lin J, Wang W (2009) Weakly-supervised violence detection in movies with audio and video based co-training. In: Advances in multimedia information processing-PCM 2009, Springer, pp 930–935
28.
go back to reference Lucas P (2002) Restricted Bayesian network structure learning. In: Advances in Bayesian networks, studies in fuzziness and soft computing, pp 217–232 Lucas P (2002) Restricted Bayesian network structure learning. In: Advances in Bayesian networks, studies in fuzziness and soft computing, pp 217–232
29.
go back to reference Ludwig O, Delgado D, Goncalves V, Nunes U (2009) Trainable classifier-fusion schemes: An application to pedestrian detection. In: IEEE internation conference on intelligent transportation systems, pp 432–437 Ludwig O, Delgado D, Goncalves V, Nunes U (2009) Trainable classifier-fusion schemes: An application to pedestrian detection. In: IEEE internation conference on intelligent transportation systems, pp 432–437
30.
go back to reference Martin V, Glotin H, Paris S, Halkias X, Prevot JM (2012) Violence detection in video by large scale multi-scale local binary pattern dynamics. In: MediaEval 2012, multimedia benchmark workshop Martin V, Glotin H, Paris S, Halkias X, Prevot JM (2012) Violence detection in video by large scale multi-scale local binary pattern dynamics. In: MediaEval 2012, multimedia benchmark workshop
31.
go back to reference Nam J, Alghoniemy M, Tewfik AH (1998) Audio-visual content-based violent scene characterization. In: Proceedings of IEEE international conference on image processing (ICIP-98), vol 1. pp 353–357 Nam J, Alghoniemy M, Tewfik AH (1998) Audio-visual content-based violent scene characterization. In: Proceedings of IEEE international conference on image processing (ICIP-98), vol 1. pp 353–357
32.
go back to reference Nievas EB, Suarez OD, García GB, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: Computer analysis of images and patterns, Springer, pp 332–339 Nievas EB, Suarez OD, García GB, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: Computer analysis of images and patterns, Springer, pp 332–339
33.
go back to reference Penet C, Demarty CH, Gravier G, Gros P (2011) Technicolor and inria/irisa at mediaeval 2011: learning temporal modality integration with bayesian networks. In: MediaEval 2011, Multimedia Benchmark Workshop, CEUR Workshop Proceedings, vol 807. http://CEUR-WS.org Penet C, Demarty CH, Gravier G, Gros P (2011) Technicolor and inria/irisa at mediaeval 2011: learning temporal modality integration with bayesian networks. In: MediaEval 2011, Multimedia Benchmark Workshop, CEUR Workshop Proceedings, vol 807. http://​CEUR-WS.​org
34.
go back to reference Penet C, Demarty CH, Gravier G, Gros P (2013) Audio event detection in movies using multiple audio words and contextual Bayesian networks. In: Workshop on content-based multimedia indexing Penet C, Demarty CH, Gravier G, Gros P (2013) Audio event detection in movies using multiple audio words and contextual Bayesian networks. In: Workshop on content-based multimedia indexing
35.
go back to reference Penet C, Demarty CH, Soleymani M, Gravier G, Gros P (2012) Technicolor/inria/imperial college london at the mediaeval 2012 violent scene detection task. In: MediaEval 2012, multimedia benchmark workshop Penet C, Demarty CH, Soleymani M, Gravier G, Gros P (2012) Technicolor/inria/imperial college london at the mediaeval 2012 violent scene detection task. In: MediaEval 2012, multimedia benchmark workshop
36.
go back to reference Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536CrossRef Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536CrossRef
37.
go back to reference Safadi B, Quéenot G (2011) Lig at mediaeval 2011 affect task: use of a generic method. In: MediaEval 2011, multimedia benchmark, workshop Safadi B, Quéenot G (2011) Lig at mediaeval 2011 affect task: use of a generic method. In: MediaEval 2011, multimedia benchmark, workshop
38.
go back to reference Schlüter J, Ionescu B, Mironică I, Schedl M (2012) Arf @ mediaeval 2012: an uninformed approach to violence detection in hollywood movies. In: MediaEval 2012, multimedia benchmark, workshop Schlüter J, Ionescu B, Mironică I, Schedl M (2012) Arf @ mediaeval 2012: an uninformed approach to violence detection in hollywood movies. In: MediaEval 2012, multimedia benchmark, workshop
39.
go back to reference Violence (1996) A public health priority. Technical Report, World Health Organization, Geneva, WHO/EHA/SPI.POA.2 Violence (1996) A public health priority. Technical Report, World Health Organization, Geneva, WHO/EHA/SPI.POA.2
40.
go back to reference Zajdel W, Krijnders JD, Andringa T, Gavrila DM (2007) Cassandra: audio-video sensor fusion for aggression detection. In: IEEE conference on advanced video and signal based surveillance (AVSS 2007), pp 200–205 Zajdel W, Krijnders JD, Andringa T, Gavrila DM (2007) Cassandra: audio-video sensor fusion for aggression detection. In: IEEE conference on advanced video and signal based surveillance (AVSS 2007), pp 200–205
Metadata
Title
Multimodal Violence Detection in Hollywood Movies: State-of-the-Art and Benchmarking
Authors
Claire-Hélène Demarty
Cédric Penet
Bogdan Ionescu
Guillaume Gravier
Mohammad Soleymani
Copyright Year
2014
DOI
https://doi.org/10.1007/978-3-319-05696-8_8

Premium Partner