2021 | OriginalPaper | Chapter

Multi-source Log Clustering in Distributed Systems

Authors: Jackson Raffety, Brooklynn Stone, Jan Svacina, Connor Woodahl, Tomas Cerny, Pavel Tisnovsky

Published in: Information Science and Applications

Publisher: Springer Singapore

Abstract

Distributed systems are seeing wider use as software becomes more complex and cloud systems increase in popularity. Performing anomaly detection and other log analysis procedures on distributed systems has not seen much research. To this end, we propose a simple and generic method of clustering log statements from separate log files to enable future log analysis. We identify variable components of log statements and find matches of these variables between the sources. After scoring the variables, we select the one with the highest score as the clustering basis. We performed a case study of our method on two open-source projects, in which the method produced successful results, and we created an open-source project, log-matcher.
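
The pipeline outlined in the abstract (identify variables, match them across sources, score, cluster on the best variable) can be illustrated with a minimal Python sketch. It is not the log-matcher implementation. The regular expressions, the scoring rule (number of distinct values shared by both sources), the function names, and the sample log lines are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical variable patterns; the paper's variable detection may differ.
# Order matters: text claimed by an earlier pattern is removed before later ones run.
PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
    ),
    "ip": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    "number": re.compile(r"\b\d+\b"),
}


def extract_variables(line):
    """Return {kind: set of values} for the variable parts of one log line."""
    found = defaultdict(set)
    rest = line
    for kind, pattern in PATTERNS.items():
        found[kind].update(pattern.findall(rest))
        rest = pattern.sub(" ", rest)  # avoid re-matching inside claimed text
    return found


def index_source(lines):
    """Map kind -> value -> list of lines from one source containing that value."""
    index = defaultdict(lambda: defaultdict(list))
    for line in lines:
        for kind, values in extract_variables(line).items():
            for value in values:
                index[kind][value].append(line)
    return index


def score_variables(index_a, index_b):
    """Assumed scoring rule: count distinct values that occur in both sources."""
    return {kind: len(set(index_a[kind]) & set(index_b[kind])) for kind in PATTERNS}


def cluster_by_best_variable(lines_a, lines_b):
    """Pick the highest-scoring variable kind and group lines by its shared values."""
    index_a, index_b = index_source(lines_a), index_source(lines_b)
    scores = score_variables(index_a, index_b)
    best = max(scores, key=scores.get)
    shared = sorted(set(index_a[best]) & set(index_b[best]))
    clusters = {value: index_a[best][value] + index_b[best][value] for value in shared}
    return best, clusters


# Made-up log lines standing in for two separate log files.
SOURCE_A = [
    "request 41 received from 10.0.0.5",
    "request 42 received from 10.0.0.5",
    "request 43 received from 10.0.0.7",
]
SOURCE_B = [
    "worker handling job for request 42",
    "worker finished request 43",
    "heartbeat from 10.0.0.9",
]

best, clusters = cluster_by_best_variable(SOURCE_A, SOURCE_B)
```

In this toy input the request number is the only variable whose values appear in both sources, so it becomes the clustering basis and each cluster holds the lines from both files that mention the same request. A real scoring function would likely also weigh how selective a variable is, which this sketch does not attempt.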


Metadata
Title
Multi-source Log Clustering in Distributed Systems
Authors
Jackson Raffety
Brooklynn Stone
Jan Svacina
Connor Woodahl
Tomas Cerny
Pavel Tisnovsky
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-33-6385-4_4
