ABSTRACT
Mutation-based greybox fuzzing, unquestionably the most widely used fuzzing technique, relies on a set of non-crashing seed inputs (a corpus) to bootstrap the bug-finding process. When evaluating a fuzzer, common approaches for constructing this corpus include: (i) using an empty file; (ii) using a single seed representative of the target's input format; or (iii) collecting a large number of seeds (e.g., by crawling the Internet). Little thought is given to how this seed choice affects the fuzzing process, and there is no consensus on which approach is best (or even whether a best approach exists).
To address this gap in knowledge, we investigate and evaluate how seed selection affects a fuzzer's ability to find bugs in real-world software. This includes a systematic review of seed selection practices used in both evaluation and deployment contexts, and a large-scale empirical evaluation (over 33 CPU-years) of six seed selection approaches. Three of these six approaches are corpus minimization techniques, which select the smallest subset of seeds that triggers the same range of instrumentation data points as the full corpus.
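Finding the truly smallest such subset is an instance of the NP-hard set cover problem, so practical tools generally approximate it. The sketch below is only an illustration and not necessarily any of the three minimization techniques evaluated here: it assumes each seed's coverage has already been collected as a set of integer instrumentation IDs (e.g., edge IDs) and applies the classic greedy set-cover heuristic. The function name `greedy_minimize` and the example file names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def greedy_minimize(coverage: Dict[str, FrozenSet[int]]) -> List[str]:
    """Greedily select seeds until they cover every data point the full corpus covers.

    `coverage` maps a (hypothetical) seed name to the set of instrumentation
    data points it triggers. Assumes coverage was collected beforehand.
    """
    # Every data point triggered by at least one seed in the full corpus.
    uncovered = set().union(*coverage.values())
    remaining = dict(coverage)
    selected: List[str] = []
    while uncovered:
        # Pick the seed that covers the most still-uncovered data points;
        # iterating over sorted names breaks ties deterministically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] & uncovered))
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


if __name__ == "__main__":
    corpus = {
        "a.png": frozenset({1, 2, 3}),
        "b.png": frozenset({3, 4}),
        "c.png": frozenset({4, 5, 6}),
        "d.png": frozenset({1, 2}),
    }
    print(greedy_minimize(corpus))  # ['a.png', 'c.png']
```

Greedy selection preserves the full corpus's total coverage but is only guaranteed to land within a logarithmic factor of the minimum corpus size; an exact minimum requires an exact solver, for example an integer linear programming or MaxSAT formulation of set cover.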
Our results demonstrate that fuzzing outcomes vary significantly depending on the initial seeds used to bootstrap the fuzzer, with minimized corpora outperforming singleton, empty, and large (on the order of thousands of files) seed sets. Consequently, we encourage seed selection to be foremost in mind when evaluating or deploying fuzzers, and recommend that (a) seed choice be carefully considered and explicitly documented, and (b) fuzzers never be evaluated with only a single seed.
Index Terms
- Seed selection for successful fuzzing