Top

Multimedia Systems

Published in:

30-03-2022 | Regular Paper

Predicting skeleton trajectories using a Skeleton-Transformer for video anomaly detection

Authors: Wenfeng Pang, Qianhua He, Yanxiong Li

Published in: Multimedia Systems | Issue 4/2022

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Video anomaly detection detects video contents that do not conform to normal patterns offered by the training set. Because appearance-based features are susceptible to background interference, unlike most papers applying appearance-based methods, this paper proposes a novel Skeleton-Transformer (SkT) to predict future pose components in video frames and take errors between predicted pose components and corresponding expected values as anomaly scores. In SkT, we apply the multi-head self-attention (MSA) module and temporal convolutional layer (TCL), which are complementary because they focus on processing information from different viewpoints, to compose a skeleton attention (SkA) block. The MSA module can capture long-range dependencies between arbitrary pairwise pose components on spatial and temporal dimensions from different perspectives, while the TCL concentrates on local temporal information. Finally, multiple SkA blocks are stacked to form the major constituent of the SkT. To the best of our knowledge, the proposed approach is the first work applying Transformer framework to anomaly detection based on pose components, and we conduct experiments to determine the optimal structure. The proposed method achieves a frame-level AUC of 77.65% on the HR-ShanghaiTech dataset, exceeding state-of-the-art methods. Moreover, ablation studies validate each module’s effectiveness in the SkT, further verifying that the Transformer-based method is promising for anomaly detection.

previous article Deep learning in multimedia healthcare applications: a review

next article A deep learning-based framework for detecting COVID-19 patients using chest X-rays

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Available only for authorised users

Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41, 1–58 (2009)CrossRef

Nayak, R., Chandra Pati, U., Kumar Das, S.: A comprehensive review on deep learning-based methods for video anomaly detection. Image Vis. Comput. 106, 104078 (2021). https://doi.org/10.1016/j.imavis.2020.104078CrossRef

Morais, R., Le, V., Tran, T., Saha, B., Mansour, M., Venkatesh, S.: Learning regularity in skeleton trajectories for anomaly detection in videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11996–12004 (2019). github https://github.com/RomeroBarata/skeleton_based_anomaly_detection

Lu, C., Shi, J., Jia, J.: Abnormal event detection at 150 fps in matlab. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2720–2727 (2013)

Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S.: Learning temporal regularity in video sequences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 733–742 (2016)

Zhao, Y., Deng, B., Shen, C., Liu, Y., Lu, H., Hua, X.-S.: Spatio-temporal autoencoder for video anomaly detection. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1933–1941 (2017)

Zhou, S., Shen, W., Zeng, D., Zhang, Z.: Unusual event detection in crowded scenes by trajectory analysis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1300–1304 (2015)

Kim, J., Grauman, K.: Observe locally, infer globally: a space-time mrf for detecting abnormal activities with incremental updates. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2928 (2009)

Schölkopf, B., Williamson, R.C., Smola, A., Shawe-Taylor, J., Platt, J.: Support vector method for novelty detection. Adv. Neural. Inf. Process. Syst. 12, 582–588 (1999)

10.

Luo, W., Liu, W., Gao, S.: Remembering history with convolutional lstm for anomaly detection. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 439–444 (2017)

11.

Medel, J.R., Savakis, A.: Anomaly detection in video using predictive convolutional long short-term memory networks. arXiv:1612.00390 (arXiv preprint) (2016)

12.

Markovitz, A., Sharir, G., Friedman, I., Zelnik-Manor, L., Avidan, S.: Graph embedded pose clustering for anomaly detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10539–10547 (2020)

13.

Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (ICLR) (2017)

14.

Xie, J., Ross, G., Ali, F.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487 (2016)

15.

Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: International Conference on Machine Learning (ICML), pp. 7354–7363 (2019)

16.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł, Polosukhin, I.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 6000–6010 (2017)

17.

Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) (2019)

18.

Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. arXiv:2005.14165 (arXiv preprint) (2020)

19.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 (arXiv preprint) (2020)

20.

He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: Transreid: transformer-based object re-identification. arXiv:2102.04378 (arXiv preprint) (2021)

21.

Dong, L., Xu, S., Xu, B.: Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5884–5888 (2018). https://doi.org/10.1109/ICASSP.2018.8462506

22.

Li, N., Liu, S., Liu, Y., Zhao, S., Liu, M.: Neural speech synthesis with transformer network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 6706–6713 (2019)

23.

Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271 (arXiv preprint) (2018)

24.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078 (arXiv preprint) (2014)

25.

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef

26.

Smeureanu, S., Ionescu, R.T., Popescu, M., Alexe, B.: Deep appearance features for abnormal behavior detection in video. In: International Conference on Image Analysis and Processing, pp. 779–789 (2017)

27.

Hinami, R., Mei, T., Satoh, S.: Joint detection and recounting of abnormal events by learning deep generic knowledge. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 3619–3627 (2017)

28.

Mansour, R.F., Escorcia-Gutierrez, J., Gamarra, M., Villanueva, J.A., Leal, N.: Intelligent video anomaly detection and classification using faster rcnn with deep reinforcement learning model. Image Vis. Comput. 112, 104229–104229 (2021)CrossRef

29.

Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)CrossRef

30.

Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4489–4497 (2015)

31.

Liu, W., Luo, W., Lian, D., Gao, S.: Future frame prediction for anomaly detection–a new baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6536–6545 (2018)

32.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27, 2672–2680 (2014)

33.

Zhong, Y., Chen, X., Jiang, J., Ren, F.: A cascade reconstruction model with generalization ability evaluation for anomaly detection in videos. Pattern Recogn. 122, 108336 (2022)CrossRef

34.

Liu, Z., Nie, Y., Long, C., Zhang, Q., Li, G.: A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 13588–13597 (2021)

35.

Huang, J., Ling, C.X.: Using auc and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 17(3), 299–310 (2005)CrossRef

36.

Luo, W., Liu, W., Gao, S.: A revisit of sparse coding based anomaly detection in stacked rnn framework. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 341–349 (2017)

37.

Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)

38.

Fang, H.-S., Xie, S., Tai, Y.-W., Lu, C.: RMPE: regional multi-person pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2353–2362 (2017)

39.

Mahadevan, V., Li, W., Bhalodia, V., Vasconcelos, N.: Anomaly detection in crowded scenes. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1975–1981 (2010)

40.

Kingma, P.D., Ba, L.J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)

41.

Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (ICML), pp. 448–456 (2015)

42.

Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv:1607.06450 (arXiv preprint) (2016)

43.

Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: the missing ingredient for fast stylization. arXiv:1607.08022 (arXiv preprint) (2016)

44.

Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)

Title: Predicting skeleton trajectories using a Skeleton-Transformer for video anomaly detection
Authors: Wenfeng Pang
Qianhua He
Yanxiong Li
Publication date: 30-03-2022
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 4/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-022-00915-9

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 4/2022

A deep learning-based framework for detecting COVID-19 patients using chest X-rays

Automatic segmentation of melanoma skin cancer using transfer learning and fine-tuning

From coarse to fine: multi-level feature fusion network for fine-grained image retrieval

A light-weight convolutional Neural Network Architecture for classification of COVID-19 chest X-Ray images

Fusion of AI techniques to tackle COVID-19 pandemic: models, incidence rates, and future trends

Special issue deep learning for multimedia healthcare