research-article

Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection

Authors:
Xinyang Feng (Columbia University, New York, NY, USA)
Dongjin Song (University of Connecticut, Storrs, CT, USA)
Yuncong Chen (NEC Laboratories America, Inc., Princeton, NJ, USA)
Zhengzhang Chen (NEC Laboratories America, Inc., Princeton, NJ, USA)
Jingchao Ni (NEC Laboratories America, Inc., Princeton, NJ, USA)
Haifeng Chen (NEC Laboratories America, Inc., Princeton, NJ, USA)

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 5546–5554, https://doi.org/10.1145/3474085.3475693

Published: 17 October 2021

ABSTRACT

Detecting abnormal activities in real-world surveillance videos is an important yet challenging task because prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to address this problem, few of them capture normal spatio-temporal patterns both effectively and efficiently. Moreover, existing works seldom explicitly consider local consistency at the frame level or global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components: a convolutional encoder that captures the spatial information of the input video clips, a temporal self-attention module that encodes the temporal dynamics, and a convolutional decoder that integrates spatio-temporal features and predicts the future frame. Next, we employ a dual-discriminator adversarial training procedure to enhance future frame prediction; it jointly uses an image discriminator that maintains local consistency at the frame level and a video discriminator that enforces global coherence of temporal dynamics. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, UCSD Ped2, CUHK Avenue, and ShanghaiTech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
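The abstract names two computational steps that can be illustrated without a deep-learning framework: temporal self-attention over per-frame features, and turning per-frame prediction error into an anomaly signal. The following is a minimal sketch under stated assumptions, not the authors' implementation. Temporal self-attention is written as plain scaled dot-product attention along the time axis for a single spatial location, with no learned query/key/value projections and no positional encoding. Prediction error is scored with PSNR followed by per-video min-max normalisation, a common choice in future-frame-prediction work on anomaly detection; the paper's exact scoring may differ. The function names `temporal_self_attention`, `psnr`, and `regularity_scores` and all toy data are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention along the time axis.

    `frames` holds one feature vector per time step for a single spatial
    location. Queries, keys and values are the raw vectors themselves
    (assumption: a learned module would project them first).
    """
    d = len(frames[0])
    out = []
    for q in frames:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        # Attention-weighted sum of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)])
    return out


def psnr(pred: Vector, target: Vector, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between a predicted and an observed frame."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    mse = max(mse, 1e-10)  # avoid division by zero; caps PSNR at 100 dB when max_val=1
    return 10.0 * math.log10(max_val ** 2 / mse)


def regularity_scores(psnrs: Vector) -> Vector:
    """Min-max normalise PSNR within one video; low values suggest anomalies."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0] * len(psnrs)
    return [(p - lo) / (hi - lo) for p in psnrs]


# Toy 4-pixel "frames" (hypothetical data): the third prediction misses badly,
# as a predictor trained only on normal clips might for an unexpected event.
observed = [[0.2, 0.4, 0.6, 0.8]] * 3
predicted = [[0.21, 0.39, 0.6, 0.8], [0.2, 0.41, 0.61, 0.8], [0.9, 0.1, 0.0, 0.3]]
scores = regularity_scores([psnr(p, o) for p, o in zip(predicted, observed)])
print([round(s, 3) for s in scores])

# Attention over three time steps of a 2-dimensional feature.
print(temporal_self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))
```

Because the normalisation is per video, a frame's score is meaningful only relative to other frames of the same clip; frames whose score falls below a chosen threshold would be flagged as abnormal.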

Supplemental Material

ct-d2gan-presentation.mp4 (MP4, 19 MB)

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene anomaly detection
  2. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
        Adversarial learning
      2. Unsupervised learning
        Anomaly detection
    2. Machine learning approaches
      1. Neural networks

Published in
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
General Chairs: Heng Tao Shen (University of Electronic Science and Technology of China, China), Yueting Zhuang (Zhejiang University, China), John R. Smith (IBM, USA)
Program Chairs: Yang Yang (University of Electronic Science and Technology of China, China), Pablo Cesar (CWI & TU Delft, The Netherlands), Florian Metze (Facebook, Inc., USA), Balakrishnan Prabhakaran (University of Texas at Dallas, USA)
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
convolutional neural network
generative adversarial networks
spatio-temporal modeling
transformer model
video anomaly detection
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- 17 total citations
- 615 total downloads
- Downloads (last 12 months): 167
- Downloads (last 6 weeks): 21
