DOI: 10.1145/3394171.3413973
research-article

Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events

Published: 12 October 2020

ABSTRACT

As a vital topic in media content interpretation, video anomaly detection (VAD) has made fruitful progress via deep neural networks (DNNs). However, existing methods usually follow a reconstruction or frame prediction routine and suffer from two gaps: (1) they cannot localize video activities in a manner that is both precise and comprehensive; (2) they lack sufficient ability to utilize high-level semantics and temporal context information. Inspired by the cloze test frequently used in language study, we propose a brand-new VAD solution named Video Event Completion (VEC) to bridge the gaps above. First, we propose a novel pipeline to achieve both precise and comprehensive enclosure of video activities. Appearance and motion are exploited as mutually complementary cues to localize regions of interest (RoIs). A normalized spatio-temporal cube (STC) is built from each RoI as a video event, which lays the foundation of VEC and serves as its basic processing unit. Second, we encourage the DNN to capture high-level semantics by solving a visual cloze test. To build such a visual cloze test, a certain patch of the STC is erased to yield an incomplete event (IE). The DNN learns to restore the original video event from the IE by inferring the missing patch. Third, to incorporate richer motion dynamics, another DNN is trained to infer the erased patches' optical flow. Finally, two ensemble strategies using different types of IE and modalities are proposed to boost VAD performance by fully exploiting temporal context and modality information. VEC consistently outperforms state-of-the-art methods by a notable margin (typically 1.5%-5% AUROC) on commonly used VAD benchmarks. Our code and results can be verified at github.com/yuguangnudt/VEC_VAD
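The visual cloze test described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cube layout `(T, H, W, C)`, the function names, the mean-patch stand-in for the trained DNN, and the plain averaging over erased positions are all assumptions made here for illustration only.

```python
import numpy as np


def make_incomplete_event(stc, erase_idx):
    """Erase one patch from a spatio-temporal cube (STC).

    Assumed layout: stc has shape (T, H, W, C), i.e. T patches taken from
    the same RoI in consecutive frames and resized to a common H x W.
    Returns the incomplete event (T-1 patches) and the erased patch.
    """
    if not 0 <= erase_idx < stc.shape[0]:
        raise IndexError("erase_idx out of range")
    target = stc[erase_idx].copy()
    incomplete = np.delete(stc, erase_idx, axis=0)
    return incomplete, target


def mean_patch_predictor(incomplete, erase_idx):
    """Hypothetical stand-in for the completion DNN: average the kept patches."""
    return incomplete.mean(axis=0)


def completion_error(predicted, target):
    """Mean squared error between the inferred patch and the erased patch."""
    return float(np.mean((predicted - target) ** 2))


def ensemble_score(stc, predict_fn):
    """Average the completion error over every possible erased position.

    This averaging is an assumed, simplified form of an ensemble over
    different types of incomplete event.
    """
    errors = []
    for idx in range(stc.shape[0]):
        incomplete, target = make_incomplete_event(stc, idx)
        errors.append(completion_error(predict_fn(incomplete, idx), target))
    return float(np.mean(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stc = rng.random((5, 32, 32, 3)).astype(np.float32)
    incomplete, target = make_incomplete_event(stc, erase_idx=4)
    print(incomplete.shape, target.shape)  # (4, 32, 32, 3) (32, 32, 3)
    print(ensemble_score(stc, mean_patch_predictor))
```

In this sketch a larger completion error would be read as a higher anomaly score, since an event that is hard to complete deviates from what the predictor has seen. A motion branch would follow the same pattern with the erased patch's optical flow as the target.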

Supplemental Material

3394171.3413973.mp4 (mp4, 25.4 MB)

        Published in

        MM '20: Proceedings of the 28th ACM International Conference on Multimedia
        October 2020
        4889 pages
        ISBN: 9781450379885
        DOI: 10.1145/3394171

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 995 of 4,171 submissions, 24%
