ABSTRACT
Mutation testing has been used and studied for over four decades as a method to assess the strength of a test suite. The technique introduces an artificial bug (i.e., a mutation) into a program to produce a mutant, then runs the test suite to determine whether any of its test cases detect the mutation (i.e., kill the mutant). A test case that fails in this process is one that kills the mutant. However, little is known about the nature of these kills. In this paper, we present an empirical study that investigates the nature of mutant kills. We seek to answer questions such as: How do test cases fail so that they contribute to mutant kills? How many test cases fail for each killed mutant, given that only a single failure is required to kill it? How do program crashes contribute to kills, and what are the origins and nature of these crashes? We found several revealing results across all subjects, including the substantial contribution of "crashes" to the test failures that lead to mutant kills, the existence of diverse causes of test failure even for a single mutation, and the specific types of exceptions that commonly instigate crashes. We posit that these results should inform practitioners in their use of mutation testing and their interpretation of its mutation score, as well as researchers who study and leverage mutation testing in future work.
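To make the terminology concrete, the following is a minimal illustrative sketch (not code or data from the study; the class and method names are invented for this example). A single relational-operator mutation, changing a loop bound from `<` to `<=`, can be killed in the two distinct ways the abstract distinguishes: by an assertion-style failure (the mutant computes a wrong value) or by a crash (the mutant raises an uncaught exception).

```java
public class MutantKillDemo {

    // Original: average of the first `count` elements.
    static int average(int[] values, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += values[i];
        return sum / count;
    }

    // Mutant: a relational-operator mutation changes `i < count` to `i <= count`.
    static int averageMutant(int[] values, int count) {
        int sum = 0;
        for (int i = 0; i <= count; i++) sum += values[i]; // mutated bound
        return sum / count;
    }

    public static void main(String[] args) {
        int[] data = {2, 4, 6};

        // Kill 1 (assertion-style): the mutant silently includes one extra
        // element, so it returns a wrong value and an assertion on the
        // result fails without any crash.
        System.out.println("wrong value: " + (averageMutant(data, 2) != average(data, 2)));

        // Kill 2 (crash): with count == data.length, the mutated loop reads
        // past the end of the array, and the test fails via an uncaught
        // ArrayIndexOutOfBoundsException.
        boolean crashed = false;
        try {
            averageMutant(data, 3);
        } catch (ArrayIndexOutOfBoundsException e) {
            crashed = true;
        }
        System.out.println("crash: " + crashed);
    }
}
```

A mutation testing tool records only that the mutant was killed; as the study's questions highlight, the two failures above have very different natures, and only the second is a crash.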
Index Terms
- To Kill a Mutant: An Empirical Study of Mutation Testing Kills