ABSTRACT
As software grows larger and more complex, finding the cause of defects becomes increasingly difficult. Defects are also hard to reproduce when many components are involved, such as processes in a platform environment or devices in an IoT environment. In such cases, analyzing logs is the only way to gain debugging insights, but manual log analysis is highly labor-intensive. In this paper, we propose a new log analysis system called historian, which works from the history of test logs. Our system first computes an importance score and a noise score for each log line using statistical text mining techniques, and then highlights abnormal log lines based on the computed scores to provide debugging insights. We applied historian to Tizen Native API test logs, and our system highlighted only about 4% of log lines on average. We also provided the highlighted logs of failed tests to Tizen developers, who reported that the failure-related log lines were highlighted well. These experimental results show that our system effectively highlights abnormal log lines and provides debugging insights to developers.
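To make the idea of scoring log lines against a history of test logs concrete, the following is a minimal Python sketch. It is not historian's actual scoring method, which the abstract does not specify. It assumes a simple IDF-style rarity score in place of the importance and noise scores, and the names `normalize`, `document_frequency`, `rarity`, `highlight`, and the `threshold` parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def normalize(line: str) -> str:
    """Mask hex values and digit runs so lines that differ only in IDs match."""
    return re.sub(r"0x[0-9a-fA-F]+|\d+", "<N>", line.strip())


def document_frequency(history):
    """Count, for each normalized line, how many past logs contain it."""
    df = Counter()
    for log in history:
        # A set ensures each log contributes at most once per distinct line.
        df.update({normalize(line) for line in log})
    return df


def rarity(line, df, n_logs):
    """IDF-style score (an assumption): 0 for lines seen in every past log."""
    return math.log((1 + n_logs) / (1 + df[normalize(line)]))


def highlight(new_log, history, threshold=1.0):
    """Return the lines of new_log whose rarity reaches the threshold."""
    df = document_frequency(history)
    return [line for line in new_log
            if rarity(line, df, len(history)) >= threshold]


# Hypothetical history of three earlier logs and one new log to inspect.
history = [
    ["test 1 start", "alloc 0x1f ok", "test 1 end"],
    ["test 2 start", "alloc 0x2a ok", "test 2 end"],
    ["test 3 start", "alloc 0x3b ok", "test 3 end"],
]
new_log = ["test 7 start", "alloc 0xab ok", "segfault at 0xdead", "test 7 end"]
highlighted = highlight(new_log, history)
```

In this sketch, lines that appear in every past log score 0 and are treated as noise, while a line never seen before scores highest and is highlighted. A real system would need a more careful notion of importance than rarity alone.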
Index Terms
- Automatic abnormal log detection by analyzing log history for providing debugging insight