ABSTRACT
Detecting abnormal activities in real-world surveillance videos is an important yet challenging task, as prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to address this problem, few of them can capture the normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at frame level and the global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components: a convolutional encoder to capture the spatial information of the input video clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame. Next, a dual discriminator based adversarial training procedure is employed to enhance the future frame prediction; it jointly considers an image discriminator that maintains local consistency at frame level and a video discriminator that enforces global coherence of temporal dynamics. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, i.e., UCSD Ped2, CUHK Avenue, and Shanghai Tech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
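The final step above, using prediction error to identify abnormal frames, can be illustrated with a minimal sketch. The abstract does not specify the error measure or the decision rule, so everything here is an assumption made for illustration: per-frame mean squared error, min-max normalization within one video, a threshold of 0.5, and the hypothetical helper names `frame_errors` and `anomaly_scores`. The predicted frames are simulated with noise rather than produced by a trained generator.

```python
import numpy as np

def frame_errors(predicted, actual):
    """Mean squared error per frame; inputs have shape (T, H, W, C).

    Assumption: MSE stands in for whatever error measure the paper uses.
    """
    diff = predicted.astype(np.float64) - actual.astype(np.float64)
    return (diff ** 2).mean(axis=(1, 2, 3))

def anomaly_scores(errors, eps=1e-12):
    """Min-max normalize errors within one video; higher means more anomalous.

    Assumption: per-video normalization is an illustrative choice.
    """
    lo, hi = errors.min(), errors.max()
    return (errors - lo) / (hi - lo + eps)

# Simulated data: five small frames and near-perfect predictions of them.
rng = np.random.default_rng(0)
actual = rng.random((5, 8, 8, 3))
predicted = actual + 0.01 * rng.standard_normal(actual.shape)
predicted[3] += 0.5  # simulate one poorly predicted (abnormal) frame

scores = anomaly_scores(frame_errors(predicted, actual))
flagged = np.flatnonzero(scores > 0.5)  # illustrative threshold, not from the paper
print(flagged)
```

In this toy setup only the frame with the large simulated prediction error exceeds the threshold. In practice the threshold would be chosen on validation data, or avoided entirely by reporting a threshold-free metric.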
Index Terms
- Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection