Skip to main content
Top
Published in: Pattern Analysis and Applications 2/2024

01-06-2024 | Original Article

A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs

Authors: Javad Mahmoodi, Hossein Nezamabadi-pour

Published in: Pattern Analysis and Applications | Issue 2/2024

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Violence detection is a difficult task because it involves analyzing video clips from multiple security cameras, which are located in various places and operate continuously. When violent crimes occur, a system should be able to reliably detect them in real-time and immediately alert a surveillance team. Currently, researchers employ deep learning models to detect violent behavior. Notably, a large number of deep learning approaches are based on extracting spatio-temporal information from a video by exploiting either 3D Convolutional Neural Networks (CNNs) or multi-stream networks. Despite their success, these techniques require a lot of parameters than 2D CNNs and have high computational complexity. Therefore, we present a simple spatio-temporal attention mechanism combined with a 2D CNN for an effective violence detection system. We propose a Squeeze Temporal Attention block that allows a 2D CNN to learn spatiotemporal features in videos. This effective block uses squeeze and temporal attention modules to summarize a video stream into three channels. In addition, we introduce spatial attention and feature fusion modules to improve the performance of the proposed system. The spatial attention module, Entropy Spatial Module, utilizes an entropy filter and frame differences to focus on spatial regions of the video with more movement. The fusion module parallelizes two dense layers with a 2D CNN to effectively enhance the classifier's performance. As a result, our proposed model achieves improved performance results in terms of accuracy when compared to Long Short-Term Memory, multi-stream networks, and current 3D CNNs.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
2.
go back to reference Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd international conference on learning representations ICLR 2015—conference track proceedings, pp 1–14 Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd international conference on learning representations ICLR 2015—conference track proceedings, pp 1–14
3.
4.
9.
go back to reference Zhu Y, Lan Z, Newsam S, Hauptmann A (2017) Hidden two-stream convolutional networks for action recognition. Lecture notes in computer science (including subseries lecture notes artificial intelligence and lecture notes in bioinformatics), vol 11363. LNCS, pp 363–378. https://doi.org/10.1007/978-3-030-20893-6_23 Zhu Y, Lan Z, Newsam S, Hauptmann A (2017) Hidden two-stream convolutional networks for action recognition. Lecture notes in computer science (including subseries lecture notes artificial intelligence and lecture notes in bioinformatics), vol 11363. LNCS, pp 363–378. https://​doi.​org/​10.​1007/​978-3-030-20893-6_​23
11.
go back to reference Song W, Zhang D, Zhao X, Yu J, Zheng R, Wang A (2019) A novel violent video detection scheme based on modified 3D Convolutional Neural Networks. IEEE Access 7:39172–39179CrossRef Song W, Zhang D, Zhao X, Yu J, Zheng R, Wang A (2019) A novel violent video detection scheme based on modified 3D Convolutional Neural Networks. IEEE Access 7:39172–39179CrossRef
13.
go back to reference Ding C, Fan S, Zhu M, Feng W, Jia B (2014) Violence detection in video by using 3D Convolutional Neural Networks. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 8888, pp 551–558. https://doi.org/10.1007/978-3-319-14364-4_53 Ding C, Fan S, Zhu M, Feng W, Jia B (2014) Violence detection in video by using 3D Convolutional Neural Networks. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 8888, pp 551–558. https://​doi.​org/​10.​1007/​978-3-319-14364-4_​53
16.
go back to reference Xie S, Sun C, Huang J, Tu Z, Murphy K (2018) Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification. Lecture notes in computer science (including subseries lecture notes artificial intelligence and lecture notes in bioinformatics), vol 11219. LNCS, no 1, pp 318–335. https://doi.org/10.1007/978-3-030-01267-0_19 Xie S, Sun C, Huang J, Tu Z, Murphy K (2018) Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification. Lecture notes in computer science (including subseries lecture notes artificial intelligence and lecture notes in bioinformatics), vol 11219. LNCS, no 1, pp 318–335. https://​doi.​org/​10.​1007/​978-3-030-01267-0_​19
20.
go back to reference Chen M, Hauptmann A (2009) MoSIFT: recognizing human actions in surveillance videos. Informedia@TRECVID, pp 1–16 Chen M, Hauptmann A (2009) MoSIFT: recognizing human actions in surveillance videos. Informedia@TRECVID, pp 1–16
21.
go back to reference Bermejo Nievas E, Deniz Suarez O, Bueno García G, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 6855. LNCS, no PART 2, pp 332–339. https://doi.org/10.1007/978-3-642-23678-5_39 Bermejo Nievas E, Deniz Suarez O, Bueno García G, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 6855. LNCS, no PART 2, pp 332–339. https://​doi.​org/​10.​1007/​978-3-642-23678-5_​39
31.
go back to reference Traore A, Akhloufi MA, Traoré A, Akhloufi MA, Traore A, Akhloufi MA (2020) Violence detection in videos using deep recurrent and Convolutional Neural Networks. In: 2020 IEEE international conference on systems, man, and cybernetics (SMC), vol 2020-Octob, pp 154–159. https://doi.org/10.1109/SMC42975.2020.9282971 Traore A, Akhloufi MA, Traoré A, Akhloufi MA, Traore A, Akhloufi MA (2020) Violence detection in videos using deep recurrent and Convolutional Neural Networks. In: 2020 IEEE international conference on systems, man, and cybernetics (SMC), vol 2020-Octob, pp 154–159. https://​doi.​org/​10.​1109/​SMC42975.​2020.​9282971
32.
go back to reference Tan M, Le QV (2019) EfficientNet: rethinking model scaling for Convolutional Neural Networks. In: 36th international conference on machine learning, ICML 2019, vol 2019-June, pp 10691–10700 Tan M, Le QV (2019) EfficientNet: rethinking model scaling for Convolutional Neural Networks. In: 36th international conference on machine learning, ICML 2019, vol 2019-June, pp 10691–10700
Metadata
Title
A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs
Authors
Javad Mahmoodi
Hossein Nezamabadi-pour
Publication date
01-06-2024
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2024
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-024-01265-0

Other articles of this Issue 2/2024

Pattern Analysis and Applications 2/2024 Go to the issue

Premium Partner