
To Kill a Mutant: An Empirical Study of Mutation Testing Kills

Published: 13 July 2023 · DOI: 10.1145/3597926.3598090

ABSTRACT

Mutation testing has been used and studied for over four decades as a method to assess the strength of a test suite. The technique inserts an artificial bug (i.e., a mutation) into a program to produce a mutant, then runs the test suite to determine whether any of its test cases detect the mutation (i.e., kill the mutant); a test case that fails is one that kills the mutant. However, little is known about the nature of these kills. In this paper, we present an empirical study that investigates them. We seek to answer questions such as: How do test cases fail when they contribute to mutant kills? How many test cases fail for each killed mutant, given that only a single failure is required to kill it? How do program crashes contribute to kills, and what are the origins and nature of those crashes? We found several revealing results across all subjects, including the substantial contribution of "crashes" to the test failures that lead to mutant kills, the existence of diverse causes of test failure even for a single mutation, and the specific types of exceptions that commonly instigate crashes. We posit that this study and its results should inform practitioners in their use of mutation testing and their interpretation of its mutation score, as well as researchers who study and leverage mutation testing in future work.
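To make the kill mechanism concrete, here is a minimal sketch (illustrative only, not drawn from the paper or its subjects; the class and test names are hypothetical). It shows a conditional-boundary mutation applied to a small Java method, a JUnit 4 test that kills the mutant through an assertion failure, and a second test that kills it through a crash, i.e., an unhandled exception, the two failure modes the abstract distinguishes.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical code under test.
class Accumulator {
    private final int[] values;
    Accumulator(int[] values) { this.values = values; }

    // Sums the first n elements.
    int sumFirst(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {  // mutant: `i < n` becomes `i <= n`
            sum += values[i];
        }
        return sum;
    }
}

public class AccumulatorTest {
    // Kill via assertion failure: under the mutant (`i <= n`) with n = 2,
    // the loop also adds values[2], so sumFirst(2) returns 6 instead of 3
    // and the assertion fails.
    @Test
    public void killsByAssertion() {
        Accumulator acc = new Accumulator(new int[] {1, 2, 3});
        assertEquals(3, acc.sumFirst(2));
    }

    // Kill via crash: with n = 3 the mutant reads values[3], which is out
    // of bounds, so the test fails with an unhandled
    // ArrayIndexOutOfBoundsException before any assertion is checked.
    @Test
    public void killsByCrash() {
        Accumulator acc = new Accumulator(new int[] {1, 2, 3});
        assertEquals(6, acc.sumFirst(3));
    }
}
```

Note that both tests pass on the original program; each failure is therefore attributable to the mutation, which is what allows either failure, assertion or crash, to count as a kill.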
