2019 | OriginalPaper | Chapter

Natural Language Description of Surveillance Events

Authors: Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy

Published in: Information Technology and Applied Mathematics

Publisher: Springer Singapore

Abstract

This paper presents a novel method for representing hours of surveillance video as a pattern-based text log. We present a tag- and template-based technique that automatically generates natural language descriptions of surveillance events. We combine the output of existing object trackers, deep-learning-guided object and action classifiers, and graph-based scene knowledge to assign hierarchical tags and generate natural language descriptions of surveillance events. Unlike some state-of-the-art image and short-video description methods, our approach can describe videos, specifically surveillance videos, by combining frame-level, temporal-level, and behavior-level target tags/features. We evaluate our method against two baseline video descriptors, and our analysis suggests that supervised scene knowledge and templates can improve video descriptions, especially for surveillance videos.
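
The tag-and-template idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `EventTags` fields, the single sentence template, and the helper names are hypothetical and are not taken from the chapter, which does not give its tag hierarchy or templates on this page. The sketch only shows how frame-level, temporal-level, and scene-level tags could be slotted into a fixed template to build a text log.

```python
from dataclasses import dataclass


@dataclass
class EventTags:
    """Hypothetical tags for one tracked target (not the chapter's schema)."""
    object_class: str  # frame-level tag, e.g. from an object classifier
    action: str        # temporal-level tag, e.g. from an action classifier
    region: str        # scene-knowledge tag, e.g. a labeled scene region
    start_s: float     # event start time in seconds
    end_s: float       # event end time in seconds


# Hypothetical sentence template; the chapter's real templates are not shown here.
TEMPLATE = "A {object_class} is {action} near the {region} from {start} to {end}."


def fmt_time(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def describe(tags: EventTags) -> str:
    """Fill the template with the tags of a single event."""
    return TEMPLATE.format(
        object_class=tags.object_class,
        action=tags.action,
        region=tags.region,
        start=fmt_time(tags.start_s),
        end=fmt_time(tags.end_s),
    )


def build_log(events: list[EventTags]) -> list[str]:
    """Return one sentence per event, ordered by start time."""
    return [describe(e) for e in sorted(events, key=lambda e: e.start_s)]
```

For example, `describe(EventTags("person", "walking", "entrance", 65, 80))` fills the template with those five values. A real system would need many templates and a richer tag hierarchy; the fixed template here is only meant to show why supervised scene tags make the output predictable.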

https://github.com/tiny-dnn.

https://github.com/cmu-mtlab/meteor.

Title: Natural Language Description of Surveillance Events
Authors: Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy
Publisher: Springer Singapore
Book: Information Technology and Applied Mathematics
Print ISBN: 978-981-10-7589-6

Electronic ISBN: 978-981-10-7590-2

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-981-10-7590-2_10
