DOI: 10.1145/3324884.3416629
Research article · Open Access

Legion: best-first concolic testing

Published: 27 January 2021

ABSTRACT

Concolic execution and fuzzing are two complementary coverage-based testing techniques. How to achieve the best of both remains an open challenge. To address this research problem, we propose and evaluate Legion. Legion re-engineers the Monte Carlo tree search (MCTS) framework from the AI literature to treat automated test generation as a problem of sequential decision-making under uncertainty. Its best-first search strategy provides a principled way to learn the most promising program states to investigate at each search iteration, based on observed rewards from previous iterations. Legion incorporates a form of directed fuzzing that we call approximate path-preserving fuzzing (APPFuzzing) to investigate program states selected by MCTS. APPFuzzing serves as the Monte Carlo simulation technique and is implemented by extending prior work on constrained sampling. We evaluate Legion against competitors on 2531 benchmarks from the coverage category of Test-Comp 2020 and measure its sensitivity to hyperparameters, demonstrating its effectiveness on a wide variety of input programs.
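In standard MCTS, the best-first selection the abstract describes is typically governed by the UCB1 rule from the UCT algorithm: each node is scored by its mean observed reward plus an exploration bonus of the form C * sqrt(ln(parent visits) / visits). The Python sketch below illustrates that selection-simulation-backpropagation loop over a toy tree. It is an illustration of the general technique under stated assumptions, not Legion's implementation: the Node class, the exploration constant, and the random-reward simulate stub (standing in for APPFuzzing) are all hypothetical.

import math
import random

# Minimal sketch of MCTS-style best-first selection over program states.
# Names and structure are illustrative, not Legion's actual API.

EXPLORATION = math.sqrt(2)  # a common default for the UCB1 constant

class Node:
    """Stands in for a program state in the search tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def ucb1(self):
        """Mean observed reward plus an exploration bonus."""
        if self.visits == 0:
            return float("inf")  # explore unvisited states first
        exploit = self.total_reward / self.visits
        explore = EXPLORATION * math.sqrt(
            math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select(root):
    """Best-first descent: repeatedly take the highest-scoring child."""
    node = root
    while node.children:
        node = max(node.children, key=Node.ucb1)
    return node

def simulate(node):
    """Placeholder for the simulation step. In Legion this role is played
    by APPFuzzing; here a random reward merely keeps the sketch runnable."""
    return random.random()

def backpropagate(node, reward):
    """Propagate the observed reward from the leaf up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

if __name__ == "__main__":
    root = Node()
    root.children = [Node(parent=root) for _ in range(3)]
    for _ in range(100):
        leaf = select(root)
        backpropagate(leaf, simulate(leaf))
    print([round(c.total_reward / c.visits, 2) for c in root.children])

In Legion proper, one would expect the reward to reflect coverage gained by fuzzing the selected state, so that states whose neighbourhoods keep yielding new coverage accumulate higher scores and are revisited preferentially.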
