
2017 | Original Paper | Book Chapter

Trajectory-Pooled Deep Convolutional Networks for Violence Detection in Videos

Authors: Zihan Meng, Jiabin Yuan, Zhen Li

Published in: Computer Vision Systems

Publisher: Springer International Publishing


Abstract

Violence detection in videos is of great importance in many applications, ranging from the protection of teenagers to online media filtering and searching to surveillance systems. Typical methods mostly rely on hand-crafted features, which may lack sufficient discriminative capacity for the specific task of violent action recognition. Inspired by the good performance of deep models for human action recognition, we propose a novel method for detecting violent human behaviour in videos by integrating trajectories and deep convolutional neural networks, which takes advantage of both hand-crafted features [21] and deep-learned features [23]. To evaluate this method, we carry out experiments on two different violence datasets: the Hockey Fights dataset and the Crowd Violence dataset. The results demonstrate the advantage of our method over state-of-the-art methods on these datasets.
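To make the pooling idea concrete, the sketch below illustrates how per-frame convolutional feature maps could be pooled along improved-trajectory points in the spirit of [21, 23]. The function name, tensor shapes, mean pooling, and the coordinate mapping are illustrative assumptions rather than the authors' exact pipeline, which additionally normalizes the feature maps and encodes the pooled descriptors (e.g. with Fisher vectors [17]) before classification.

```python
import numpy as np


def trajectory_pooled_descriptor(feature_maps, trajectories, frame_size):
    """Pool CNN feature maps along trajectory points (illustrative sketch only).

    feature_maps : float array of shape (T, H, W, C), one spatial feature map
                   per frame from some convolutional layer (shape assumed).
    trajectories : list of trajectories, each a list of (t, x, y) points given
                   in original-frame pixel coordinates.
    frame_size   : (frame_h, frame_w) of the original video frames.
    Returns one C-dimensional descriptor per trajectory.
    """
    T, H, W, C = feature_maps.shape
    frame_h, frame_w = frame_size
    descriptors = []
    for traj in trajectories:
        pooled = np.zeros(C, dtype=np.float32)
        for t, x, y in traj:
            # Map frame coordinates onto the coarser feature-map grid.
            fx = min(int(x / frame_w * W), W - 1)
            fy = min(int(y / frame_h * H), H - 1)
            pooled += feature_maps[t, fy, fx, :]
        # Mean-pool the sampled activations along the trajectory.
        descriptors.append(pooled / max(len(traj), 1))
    # In a full pipeline these per-trajectory descriptors would be aggregated
    # over the video (e.g. with Fisher vector encoding) and fed to a classifier.
    return np.stack(descriptors) if descriptors else np.empty((0, C))


# Purely illustrative usage with random data:
if __name__ == "__main__":
    fmaps = np.random.rand(10, 14, 14, 512).astype(np.float32)   # 10 frames
    trajs = [[(t, 40.0 + t, 60.0) for t in range(10)]]            # one trajectory
    print(trajectory_pooled_descriptor(fmaps, trajs, (224, 224)).shape)  # (1, 512)
```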


References
1.
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, pp. 568–576 (2014)
2.
Chen, M.Y., Hauptmann, A.: MoSIFT: recognizing human actions in surveillance videos. Technical report, Carnegie Mellon University (2009)
3.
Cheng, W.H., Chu, W.T., Wu, J.L.: Semantic context detection based on hierarchical audio models. In: Proceedings of ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 109–115 (2003)
4.
Datta, A., Shah, M., Lobo, N.D.V.: Person-on-person violence detection in video data. In: Proceedings of International Conference on Pattern Recognition, vol. 1, pp. 433–438 (2002)
5.
Deniz, O., Serrano, I., Bueno, G., Kim, T.K.: Fast violence detection in video. In: The International Conference on Computer Vision Theory and Applications, pp. 478–485 (2014)
6.
Ding, C., Fan, S., Zhu, M., Feng, W., Jia, B.: Violence detection in video by using 3D convolutional neural networks. In: Bebis, G., et al. (eds.) ISVC 2014. LNCS, vol. 8888, pp. 551–558. Springer, Cham (2014). doi:10.1007/978-3-319-14364-4_53
7.
Dong, Z., Qin, J., Wang, Y.: Multi-stream deep networks for person to person violence detection in videos. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds.) CCPR 2016. CCIS, vol. 662, pp. 517–531. Springer, Singapore (2016). doi:10.1007/978-981-10-3002-4_43
8.
Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 363–370. Springer, Heidelberg (2003). doi:10.1007/3-540-45103-X_50
9.
Garcia-Garcia, A., Gomez-Donoso, F., Garcia-Rodriguez, J., Orts-Escolano, S., Cazorla, M., Azorin-Lopez, J.: PointNet: a 3D convolutional neural network for real-time object class recognition. In: International Joint Conference on Neural Networks, pp. 1578–1584 (2016)
10.
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer Vision and Pattern Recognition, pp. 580–587 (2014)
11.
Hassner, T., Itcher, Y., Kliper-Gross, O.: Violent flows: real-time detection of violent crowd behavior. In: Computer Vision and Pattern Recognition Workshops, pp. 1–6 (2012)
12.
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
13.
Lev, G., Sadeh, G., Klein, B., Wolf, L.: RNN Fisher vectors for action recognition and image annotation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 833–850. Springer, Cham (2016). doi:10.1007/978-3-319-46466-4_50
14.
Li, W., Li, X., Qiu, J.: Human action recognition based on dense of spatio-temporal interest points and HOG-3D descriptor. In: International Conference on Internet Multimedia Computing and Service, p. 44 (2015)
15.
Liu, P., Wang, J., She, M., Liu, H.: Human action recognition based on 3D SIFT and LDA model. In: Robotic Intelligence in Informationally Structured Space, pp. 12–17 (2011)
16.
Nam, J., Alghoniemy, M., Tewfik, A.H.: Audio-visual content-based violent scene characterization. In: Proceedings of International Conference on Image Processing, ICIP 1998, pp. 353–357 (1998)
17.
Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the Fisher vector: theory and practice. Int. J. Comput. Vis. 105(3), 222–245 (2013)
18.
Nievas, E.B., Suarez, O.D., García, G.B., Sukthankar, R.: Violence detection in video using computer vision techniques. In: Real, P., Diaz-Pernil, D., Molina-Abril, H., Berciano, A., Kropatsch, W. (eds.) CAIP 2011. LNCS, vol. 6855, pp. 332–339. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23678-5_39
19.
Sadlier, D.A., O'Connor, N.E.: Event detection in field sports video using audio-visual features and a support vector machine. IEEE Trans. Circ. Syst. Video Technol. 15(10), 1225–1233 (2005)
20.
Samanta, S., Chanda, B.: FaSTIP: a new method for detection and description of space-time interest points for human activity classification. In: Eighth Indian Conference on Computer Vision, Graphics and Image Processing, p. 8 (2012)
21.
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: IEEE International Conference on Computer Vision, pp. 3551–3558 (2013)
22.
Wang, L., Guo, S., Huang, W., Qiao, Y.: Places205-VGGNet models for scene recognition. arXiv preprint arXiv:1508.01667 (2015)
23.
Wang, L., Qiao, Y., Tang, X.: Action recognition with trajectory-pooled deep-convolutional descriptors. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4305–4314 (2015)
24.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., Van Gool, L.: Temporal segment networks: towards good practices for deep action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 20–36. Springer, Cham (2016). doi:10.1007/978-3-319-46484-8_2
25.
Yue, H., Chen, W., Wu, X., Wang, J.: Visualizing bag-of-words for high-resolution remote sensing image classification. J. Appl. Remote Sens. 10(1), 015022 (2016)
26.
Zhang, T., Yang, Z., Jia, W., Yang, B., Yang, J., He, X.: A new method for violence detection in surveillance scenes. Multimedia Tools Appl. 75(12), 7327–7349 (2016)
Metadata
Title
Trajectory-Pooled Deep Convolutional Networks for Violence Detection in Videos
Authors
Zihan Meng
Jiabin Yuan
Zhen Li
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-68345-4_39