DOI: 10.1145/3474085.3475693
research-article

Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection

Published: 17 October 2021

ABSTRACT

Detecting abnormal activities in real-world surveillance videos is an important yet challenging task, as prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to address this problem, few of them capture normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at the frame level and the global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components: a convolutional encoder to capture the spatial information of the input video clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame. Next, a dual discriminator based adversarial training procedure is employed to enhance the future frame prediction; it jointly considers an image discriminator that maintains local consistency at the frame level and a video discriminator that enforces the global coherence of temporal dynamics. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, i.e., UCSD Ped2, CUHK Avenue, and ShanghaiTech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
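The final step of the abstract, using the prediction error to identify abnormal frames, can be illustrated with a minimal sketch. The abstract does not say how the error is computed or thresholded, so everything below is an assumption made for illustration: frames are flattened lists of pixel intensities, the error is mean squared error, scores are min-max normalised within one video, and the helper names (`mse`, `anomaly_scores`, `flag_frames`) and the threshold of 0.5 are hypothetical. It is not the authors' implementation.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and an observed frame.

    Frames are flattened pixel lists here (an assumption for brevity).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise per-frame errors to [0, 1]; higher = more anomalous."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were predicted equally well: nothing stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


def flag_frames(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy data: three 2-pixel frames; the model's predictions are constant.
predicted = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
observed = [[0.5, 0.5], [0.5, 0.4], [1.0, 0.0]]

errors = [mse(p, t) for p, t in zip(predicted, observed)]
scores = anomaly_scores(errors)
print(flag_frames(scores))  # the third frame (index 2) is flagged
```

The idea behind this kind of scoring is that a predictor trained only on normal video should predict normal frames well and abnormal frames poorly, so a large prediction error serves as evidence of an anomaly.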

Supplemental Material

ct-d2gan-presentation.mp4 (mp4, 19 MB)

Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
