
Empirical Software Engineering

28 February 2019

RETRACTED ARTICLE: The smell of fear: on the relation between test smells and flaky tests

Written by: Fabio Palomba, Andy Zaidman

Published in: Empirical Software Engineering | Issue 5/2019


Abstract

Regression testing is the activity developers perform to check that new modifications have not introduced bugs. A crucial requirement for effective regression testing is that test cases are deterministic. Unfortunately, this is not always the case: some tests suffer from so-called flakiness, i.e., they exhibit both a passing and a failing outcome with the same code. Flaky tests are widely recognized as a serious issue, since they hide real bugs and increase software inspection costs. While previous research has focused on understanding the root causes of test flakiness and devising techniques that automatically fix them, in this paper we explore an orthogonal perspective: the relation between flaky tests and test smells, i.e., suboptimal development choices made when writing tests. Relying on (1) an analysis of the state of the art and (2) interviews with industrial developers, we first identify five flakiness-inducing test smell types, namely Resource Optimism, Indirect Testing, Test Run War, Fire and Forget, and Conditional Test Logic, and automate their detection. Then, we perform a large-scale empirical study on 19,532 JUnit test methods of 18 software systems, discovering that the five considered test smells causally co-occur with flaky tests in 75% of the cases. Furthermore, we evaluate the effect of refactoring, showing that it not only removes design flaws, but also fixes all of the flaky tests (75%) that causally co-occur with test smells.




Title: RETRACTED ARTICLE: The smell of fear: on the relation between test smells and flaky tests
Written by: Fabio Palomba, Andy Zaidman
Publication date: 28 February 2019
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 5/2019
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-019-09683-z
