Published in: Pattern Analysis and Applications 2/2024

1 June 2024 | Original Article

A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs

Authors: Javad Mahmoodi, Hossein Nezamabadi-pour

Abstract

Violence detection is difficult because it requires analyzing video from many security cameras that are installed in different locations and operate continuously. When a violent crime occurs, a system should detect it reliably in real time and immediately alert a surveillance team. Researchers currently use deep learning models to detect violent behavior. Many of these approaches extract spatio-temporal information from video using either 3D Convolutional Neural Networks (CNNs) or multi-stream networks. Despite their success, these techniques require many more parameters than 2D CNNs and have high computational complexity. We therefore present a simple spatio-temporal attention mechanism combined with a 2D CNN for effective violence detection. We propose a Squeeze Temporal Attention block that allows a 2D CNN to learn spatio-temporal features in videos. The block uses squeeze and temporal attention modules to summarize a video stream into three channels. In addition, we introduce spatial attention and feature fusion modules to improve the performance of the proposed system. The spatial attention module, the Entropy Spatial Module, uses an entropy filter and frame differences to focus on spatial regions of the video with more movement. The fusion module places two dense layers in parallel with a 2D CNN to enhance the classifier's performance. As a result, the proposed model achieves higher accuracy than Long Short-Term Memory networks, multi-stream networks, and current 3D CNNs.
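
The abstract does not give the internals of the Squeeze Temporal Attention block or the Entropy Spatial Module, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes grayscale frames in [0, 1], a block-wise Shannon entropy of frame differences as the "entropy filter", and a softmax over per-frame motion energy as the temporal attention; the function names `block_entropy_mask` and `squeeze_to_three_channels`, the window size, and the bin count are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def block_entropy_mask(diff, win=4, bins=8):
    """Block-wise Shannon entropy of a frame-difference image, scaled to [0, 1].

    Assumption: diff has values in [0, 1] and its height and width are
    divisible by win. Blocks with varied motion get values near 1,
    static blocks get 0.
    """
    h, w = diff.shape[0] // win, diff.shape[1] // win
    blocks = (diff[:h * win, :w * win]
              .reshape(h, win, w, win)
              .transpose(0, 2, 1, 3)
              .reshape(h, w, win * win))
    # Quantize each pixel into one of `bins` intensity levels.
    q = np.clip((blocks * bins).astype(int), 0, bins - 1)
    ent = np.zeros((h, w))
    for b in range(bins):
        p = (q == b).mean(axis=-1)      # probability of level b per block
        nz = p > 0
        ent[nz] -= p[nz] * np.log2(p[nz])
    mask = np.clip(ent / np.log2(bins), 0.0, 1.0)
    # Expand each block value back to pixel resolution.
    return np.kron(mask, np.ones((win, win)))

def squeeze_to_three_channels(frames, win=4, bins=8):
    """Summarize a (T, H, W) clip into one (H, W, 3) image.

    Assumption: T >= 4. The T-1 frame differences are split into three
    contiguous temporal groups; within each group, frames are averaged
    with softmax weights from their motion energy, then multiplied by
    the entropy mask of the group's mean frame difference.
    """
    diffs = np.abs(np.diff(frames, axis=0))      # (T-1, H, W)
    scores = diffs.mean(axis=(1, 2))             # motion energy per step
    channels = []
    for idx in np.array_split(np.arange(len(diffs)), 3):
        weights = softmax(scores[idx])           # temporal attention
        pooled = (weights[:, None, None] * frames[idx + 1]).sum(axis=0)
        mask = block_entropy_mask(diffs[idx].mean(axis=0), win, bins)
        channels.append(pooled * mask)           # spatial attention
    return np.stack(channels, axis=-1)
```

The resulting three-channel image has the same layout as an RGB input, which is what would let an off-the-shelf 2D CNN consume it; in this sketch a completely static clip maps to an all-zero image because every frame difference has zero entropy.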


Metadata
Title
A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs
Authors
Javad Mahmoodi
Hossein Nezamabadi-pour
Publication date
1 June 2024
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2024
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-024-01265-0
