DOI: 10.1145/3460319.3464825

DeepCrime: mutation testing of deep learning systems based on real faults

Published: 11 July 2021

ABSTRACT

Deep Learning (DL) solutions are increasingly adopted, but how to test them remains a major open research problem. Existing and new testing techniques have been proposed for and adapted to DL systems, including mutation testing. However, no approach has investigated the possibility of simulating the effects of real DL faults by means of mutation operators. We have defined 35 DL mutation operators based on 3 empirical studies of real faults in DL systems. We followed a systematic process to extract the mutation operators from the existing fault taxonomies, with a formal conflict-resolution phase in case of disagreement. We have implemented 24 of these DL mutation operators in DeepCrime, the first source-level pre-training mutation tool based on real DL faults. We have assessed our mutation operators to understand their characteristics, in particular whether they produce interesting (i.e., killable but not trivial) mutations. Then, we compared the sensitivity of our tool to changes in the quality of test data with that of DeepMutation++, an existing post-training DL mutation tool.
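The notion of a mutant being "killable" can be made concrete with a small sketch. Because DL training is stochastic, one common approach is to train the original and the mutated model several times and decide killing statistically over the resulting accuracies, rather than from a single run. The Python below is a minimal illustration under explicit assumptions: `TrainingConfig` and `mutate_learning_rate` are hypothetical stand-ins for a source-level, pre-training mutation (here, a changed learning rate); the permutation test, Cohen's d, and the thresholds `alpha=0.05` and `min_effect=0.5` are illustrative choices, not necessarily the test DeepCrime itself applies; and the accuracy lists are made-up inputs, not results from the paper. The sketch does not model triviality.

```python
import random
import statistics
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical training hyperparameters of a model under test (illustrative values)."""
    learning_rate: float = 0.001
    epochs: int = 12
    batch_size: int = 128


def mutate_learning_rate(cfg: TrainingConfig, factor: float) -> TrainingConfig:
    """Hypothetical pre-training mutation operator: scale the learning rate before training.

    Returns a new config; the original is left untouched because the dataclass is frozen.
    """
    return replace(cfg, learning_rate=cfg.learning_rate * factor)


def effect_size(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute Cohen's d: difference of means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = abs(statistics.fmean(a) - statistics.fmean(b))
    if pooled_var == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_var ** 0.5


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of mean accuracies."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of runs as "original" or "mutant"
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


def is_killed(original_acc: Sequence[float], mutant_acc: Sequence[float],
              alpha: float = 0.05, min_effect: float = 0.5) -> bool:
    """Assumed kill criterion: significant difference AND non-negligible effect size."""
    return (permutation_p_value(original_acc, mutant_acc) < alpha
            and effect_size(original_acc, mutant_acc) >= min_effect)


if __name__ == "__main__":
    base = TrainingConfig()
    mutant_cfg = mutate_learning_rate(base, factor=10.0)
    print(base.learning_rate, mutant_cfg.learning_rate)
    # Made-up accuracies from 5 repeated training runs each (illustration only).
    original_runs = [0.980, 0.985, 0.979, 0.982, 0.981]
    mutant_runs = [0.900, 0.910, 0.905, 0.895, 0.900]
    print(is_killed(original_runs, mutant_runs))    # True
    print(is_killed(original_runs, original_runs))  # False
```

A permutation test is used here only because it needs nothing beyond the Python standard library; a parametric test over the same repeated-run accuracies would fit the same `is_killed` interface.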

    Published in

      ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
      July 2021
      685 pages
      ISBN: 9781450384599
      DOI: 10.1145/3460319

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 58 of 213 submissions, 27%
