Published in: Empirical Software Engineering 1/2018

15.06.2017

Examining the stability of logging statements

Authors: Suhas Kabinna, Cor-Paul Bezemer, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan

Abstract

Logging statements (embedded in the source code) produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior work showcases the importance of logging statements in operating, understanding and improving software systems. The wide dependence on logs has led to a new market of log processing and management tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without considering other stakeholders, causing sudden failures of log processing tools and increasing the maintenance costs of such tools. We examine the stability of logging statements in four open source applications: Liferay, ActiveMQ, Camel and CloudStack. We find that 20–45% of their logging statements change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which their tools depend. In order to effectively mitigate the issues that are caused by unstable logging statements, we make an important first step towards determining whether a logging statement is likely to remain unchanged in the future. First, we use a random forest classifier to determine whether a just-introduced logging statement will change in the future, based solely on metrics that are calculated when it is introduced. Second, we examine whether a long-lived logging statement is likely to change based on its change history. We leverage Cox proportional hazards models (Cox models) to determine the change risk of long-lived logging statements in the source code. Through our case study on four open source applications, we show that our random forest classifier achieves a precision of 83–91%, a recall of 65–85% and an AUC of 0.95–0.96.
We find that file ownership, developer experience, log density and SLOC are important metrics in our studied projects for determining the stability of logging statements in both our random forest classifiers and Cox models. Developers can use our approach to determine the risk of a logging statement changing in their own projects, to construct more robust log processing tools, by ensuring that these tools depend on logs that are generated by more stable logging statements.
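
To make the reported evaluation measures concrete, the following minimal Python sketch shows how precision, recall and AUC can be computed for a classifier that predicts whether a logging statement will change. It is an illustration only, not the authors' implementation: the function names (precision_recall, auc), the labels, the scores and the 0.5 threshold are all made up for this example, and no random forest is trained here.

```python
# Toy illustration of precision, recall and AUC for a "will this logging
# statement change?" classifier. All data below is invented for the example.

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = statement changed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = changed later, 0 = stayed unchanged)
# and hypothetical classifier scores for eight logging statements.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Assumed decision threshold of 0.5 to turn scores into predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"auc={auc(y_true, scores):.4f}")
```

Precision and recall depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is why the two kinds of measure are reported separately.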

Metadata
Title
Examining the stability of logging statements
Authors
Suhas Kabinna
Cor-Paul Bezemer
Weiyi Shang
Mark D. Syer
Ahmed E. Hassan
Publication date
15.06.2017
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9518-0
