DOI: 10.1145/3133956.3134015
research-article

DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning

Published: 30 October 2017

ABSTRACT

Anomaly detection is a critical step towards building a secure and trustworthy system. The primary purpose of a system log is to record system states and significant events at various critical points to help debug system failures and perform root cause analysis. Such log data is available in nearly all computer systems. Log data is an important and valuable resource for understanding system status and performance issues; therefore, system logs are naturally an excellent source of information for online monitoring and anomaly detection. We propose DeepLog, a deep neural network model utilizing Long Short-Term Memory (LSTM), to model a system log as a natural language sequence. This allows DeepLog to automatically learn log patterns from normal execution, and to detect anomalies when log patterns deviate from the model trained on log data from normal execution. In addition, we demonstrate how to incrementally update the DeepLog model in an online fashion so that it can adapt to new log patterns over time. Furthermore, DeepLog constructs workflows from the underlying system log so that once an anomaly is detected, users can diagnose it and perform root cause analysis effectively. Extensive experimental evaluations over large log data show that DeepLog outperforms other existing log-based anomaly detection methods based on traditional data mining methodologies.
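
The idea of learning log patterns from normal execution and flagging deviations can be illustrated with a minimal sketch. This is not the DeepLog implementation: a simple successor-frequency table is swapped in for the LSTM so the example runs with the standard library only. The function names `train` and `detect`, the window length `h`, the candidate count `g`, and the integer log keys are all assumptions made for illustration.

```python
from collections import Counter

def train(normal_sequences, h=2):
    """Count which log key follows each window of h keys in normal runs.

    Stand-in for the sequence model: DeepLog uses an LSTM here,
    this sketch uses plain frequency counts (an assumption).
    """
    model = {}
    for seq in normal_sequences:
        for i in range(len(seq) - h):
            window = tuple(seq[i:i + h])
            model.setdefault(window, Counter())[seq[i + h]] += 1
    return model

def detect(model, seq, h=2, g=1):
    """Return positions whose log key is not among the g most
    frequent successors of the preceding window of h keys."""
    anomalies = []
    for i in range(len(seq) - h):
        window = tuple(seq[i:i + h])
        # Unseen windows have no candidates, so the next key is flagged.
        candidates = [k for k, _ in model.get(window, Counter()).most_common(g)]
        if seq[i + h] not in candidates:
            anomalies.append(i + h)
    return anomalies

# Hypothetical log-key sequences from normal execution.
normal = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 5]]
model = train(normal, h=2)
flagged = detect(model, [1, 2, 4, 3], h=2, g=1)
```

Raising `g` makes the check more tolerant: with `g=2`, the less frequent but still normal continuation `5` after `(2, 3)` is accepted, whereas `g=1` flags it.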

      • Published in

        CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
        October 2017
        2682 pages
        ISBN:9781450349468
        DOI:10.1145/3133956

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%. Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%.
