skip to main content
10.1145/2939672.2939712acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs

Published:13 August 2016Publication History

ABSTRACT

We focus on the problem of detecting anomalous run-time behavior of distributed applications from their execution logs. Specifically we mine templates and template sequences from logs to form a control flow graph (cfg) spanning distributed components. This cfg represents the baseline healthy system state and is used to flag deviations from the expected behavior of runtime logs. The novelty in our work stems from the new techniques employed to: (1) overcome the instrumentation requirements or application specific assumptions made in prior log mining approaches, (2) improve the accuracy of mined templates and the cfg in the presence of long parameters and high amount of interleaving respectively, and (3) improve by orders of magnitude the scalability of the cfg mining process in terms of volume of log data that can be processed per day. We evaluate our approach using (a) synthetic log traces and (b) multiple real-world log datasets collected at different layers of application stack. Results demonstrate that our template mining, cfg mining, and anomaly detection algorithms have high accuracy. The distributed implementation of our pipeline is highly scalable and has more than 500 GB/day of log data processing capability even on a 10 low-end VM based (Spark + Hadoop) cluster. We also demonstrate the efficacy of our end-to-end system using a case study with the Openstack VM provisioning system.

Skip Supplemental Material Section

Supplemental Material

kdd2016_nandi_graph_mining_01-acm.mp4

mp4

435.4 MB

References

  1. B. Abrahao, F. ChieriChetti, R. Kleinberg, and A. Panconesi. Trace complexity of network inference. In KDD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Appdynamics. www.appdynamics.com.Google ScholarGoogle Scholar
  3. I. Beschastnikh, Y. Brun, M. D. Ernst, A. Krishnamurthy, and T. E. Anderson. Mining temporal invariants from partially ordered logs. In SLAML, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. Bosnacki, W. Lightenberg, M. Odenbrett, A. Wijs, and P. Hilbers. Parallel algorithms for transitive reduction of weighted graphs. In Math Maced, 2010.Google ScholarGoogle Scholar
  5. E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. In TKDE, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W. V. der Aalst, T. Weijters, and L. Maruster. Workflow mining: Discovering process models from event logs. In TKDE, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Elasticsearch: Search and analyze data in real time. https://www.elastic.co/products/elasticsearch.Google ScholarGoogle Scholar
  8. D. R. Ferreira and D. Gillblad. Discovering process models from unlabelled event logs. In BPM, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Q. Fu, J.-G. Lou, Y. Wang, and J. Li. Execution anomaly detection in distributed systems through unstructured log analysis. In ICDM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In KDD, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. W. Gunther and W. M. van der Aalst. Fuzzy mining - adaptive process simplification based on multi-perspective metrics. In BPM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Indyk and R. Motwani. Approximate nearest neighbor - towards removing the curse of dimensionality. In STOC, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. X. Ju, L. Soares, K. G. Shin, K. D. Ryu, and D. D. Silva. On fault resilience of openstack. In SOCC, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Kafka: A high-throughput distributed messaging system. http://kafka.apache.org.Google ScholarGoogle Scholar
  15. T. Li, F. Liang, S. Ma, and W. Peng. An integrated framework on mining log files for computing system management. In KDD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Logstash: Collect, enrich and transform data. https://www.elastic.co/products/logstash.Google ScholarGoogle Scholar
  17. J.-G. Lou, Q. Fu, Y. Wang, and J. Li. Mining dependency in distributed systems through unstructured logs analysis. In SIGOPS Operation Systems Review, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J.-G. Lou, Q. Fu, S. Yang, J. Li, and B. Wu. Mining program workflow from interleaved traces. In KDD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Makanju, A. Z. Heywood, and E. E. Milios. Clustering event logs using iterative partitioning. In KDD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. In DMKD, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Openstack: Open-source software for creating public and private clouds. www.openstack.org.Google ScholarGoogle Scholar
  22. J. Pei, J. Han, B. Mortazavi, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. Mining sequential patterns efficiently by prefix-projected pattern growth. In ICDE, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Reidimeister, M. Jiang, and P. A. Ward. Mining unstructured log files for recurrent fault diagnosis. In IM, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  24. Spark: Lightning-fast cluster computing. http://spark.apache.org.Google ScholarGoogle Scholar
  25. J. Stearley. Towards informatic analysis of syslogs. In Cluster, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. R. Vaarandi. A data clustering algorithm for mining patterns from event logs. In IPOM, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  27. R. Vaarandi. A breadth first algorithm for mining frequent patterns from event logs. In Intell. Comm., 2004.Google ScholarGoogle ScholarCross RefCross Ref
  28. R. Vaarandi and M. Pihelgas. Logcluster - a data clustering and pattern mining algorithm for event logs. In CNSM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. T. Wu, Y. Chen, and J. Han. Association mining in large datasets: A re-examination of its measures. In PKDD, 2009.Google ScholarGoogle Scholar
  30. W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan. Online system problem detection by mining patterns of console logs. In ICDM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan. Detecting large scale system problems by mining console logs. In ICML, 2010.Google ScholarGoogle Scholar

Index Terms

  1. Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
            August 2016
            2176 pages
            ISBN:9781450342322
            DOI:10.1145/2939672

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 13 August 2016

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            KDD '16 Paper Acceptance Rate66of1,115submissions,6%Overall Acceptance Rate1,133of8,635submissions,13%

            Upcoming Conference

            KDD '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader