ABSTRACT
Many systems, such as distributed operating systems, complex networks, and high-throughput web-based applications, continuously generate large volumes of event logs. These logs contain useful information that helps system administrators understand the running status of a system and pinpoint its failures. However, given the scale and complexity of modern systems, the volume of generated logs is beyond what humans can analyze manually. It is therefore imperative to develop a comprehensive log analysis system to support effective system management. Although a number of log mining techniques have been proposed to address specific log analysis use cases, few research or industrial efforts have been devoted to providing integrated systems with an end-to-end solution that facilitates routine log analysis.
In this paper, we design and implement an integrated system, called FIU Log Analysis Platform (a.k.a. FLAP), that aims to facilitate data analytics for system event logs. FLAP provides an end-to-end solution that uses advanced data mining techniques to help log analysts conduct event log knowledge discovery, system status investigation, and system failure diagnosis conveniently, promptly, and accurately. Specifically, in FLAP, state-of-the-art template learning techniques are used to extract useful information from unstructured raw logs; advanced data transformation techniques are proposed and leveraged for event transformation and storage; effective event pattern mining, event summarization, event querying, and failure prediction techniques are designed and integrated for log analytics; and user-friendly interfaces present the analysis results intuitively. Since 2016, FLAP has been used by Huawei Technologies Co. Ltd for internal event log analysis, and has provided effective support for its system operation and workflow optimization.
Index Terms
- FLAP: An End-to-End Event Log Analysis Platform for System Management