
To Kill a Mutant: An Empirical Study of Mutation Testing Kills

Published: 13 July 2023 · DOI: 10.1145/3597926.3598090

ABSTRACT

Mutation testing has been used and studied for over four decades as a method to assess the strength of a test suite. The technique inserts an artificial bug (i.e., a mutation) into a program to produce a mutant, then runs the test suite to determine whether any of its test cases detect the mutation (i.e., kill the mutant); a test case that fails is one that kills the mutant. However, little is known about the nature of these kills. In this paper, we present an empirical study that investigates them. We seek to answer questions such as: How do test cases fail when they contribute to mutant kills? How many test cases fail for each killed mutant, given that only a single failure is required to kill it? How do program crashes contribute to kills, and what are the origins and nature of those crashes? We found several revealing results across all subjects, including the substantial contribution of "crashes" to the test failures that lead to mutant kills, the existence of diverse causes of test failure even for a single mutation, and the specific types of exceptions that commonly instigate crashes. We posit that this study and its results should inform practitioners in their use of mutation testing and their interpretation of its mutation score, as well as researchers who study and leverage mutation testing in future work.
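To make the kill mechanism concrete, here is a minimal sketch (illustrative only, not drawn from the paper or its subjects; the class and test names are hypothetical). It shows a conditional-boundary mutation applied to a small Java method, a JUnit 4 test that kills the mutant through an assertion failure, and a second test that kills it through a crash, i.e., an unhandled exception, the two failure modes the abstract distinguishes.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical code under test.
class Accumulator {
    private final int[] values;
    Accumulator(int[] values) { this.values = values; }

    // Sums the first n elements.
    int sumFirst(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {  // mutant: `i < n` becomes `i <= n`
            sum += values[i];
        }
        return sum;
    }
}

public class AccumulatorTest {
    // Kill via assertion failure: under the mutant (`i <= n`) with n = 2,
    // the loop also adds values[2], so sumFirst(2) returns 6 instead of 3
    // and the assertion fails.
    @Test
    public void killsByAssertion() {
        Accumulator acc = new Accumulator(new int[] {1, 2, 3});
        assertEquals(3, acc.sumFirst(2));
    }

    // Kill via crash: with n = 3 the mutant reads values[3], which is out
    // of bounds, so the test fails with an unhandled
    // ArrayIndexOutOfBoundsException before any assertion is checked.
    @Test
    public void killsByCrash() {
        Accumulator acc = new Accumulator(new int[] {1, 2, 3});
        assertEquals(6, acc.sumFirst(3));
    }
}
```

Note that both tests pass on the original program; each failure is therefore attributable to the mutation, which is what allows either failure, assertion or crash, to count as a kill.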
