ABSTRACT
This paper reports on our experience implementing a technique for sifting through static analysis reports using dynamic symbolic execution. Our insight is that if a static analysis tool produces a partial trace through the program under analysis, annotated with conditions that the analyser believes are important for the bug to trigger, then a dynamic symbolic execution tool may be able to exploit the trace by (a) guiding the search heuristically so that paths that follow the trace most closely are prioritised for exploration, and (b) pruning the search using the conditions associated with each step of the trace. This may allow the bug to be quickly confirmed using dynamic symbolic execution, if it turns out to be a true positive, yielding an input that triggers the bug.
To experiment with this approach, we have implemented the idea in a tool chain that allows the popular open-source static analysis tools Clang Static Analyzer (CSA) and Infer to be combined with the popular open-source dynamic symbolic execution engine KLEE. Our findings highlight two interesting negative results. First, while fault injection experiments show the promise of our technique, they also reveal that the traces provided by static analysis tools are not that useful in guiding search. Second, we have systematically applied CSA and Infer to a large corpus of real-world applications that are suitable for analysis with KLEE, and find that the static analysers are rarely able to find non-trivial true positive bugs for this set of applications.
We believe our case study can inform static analysis and dynamic symbolic execution tool developers as to where improvements may be necessary, and serve as a call to arms for researchers interested in combining symbolic execution and static analysis to identify more suitable benchmark suites for evaluation of research ideas.
- Domagoj Babić, Lorenzo Martignoni, Stephen McCamant, and Dawn Song. 2011. Statically-directed Dynamic Automated Test Generation. In Proc. of the International Symposium on Software Testing and Analysis (ISSTA’11). https://doi.org/10.1145/2001420.2001423 Google ScholarDigital Library
- Bernhard Beckert, Mihai Herda, Michael Kirsten, and Shmuel S. Tyszberowicz. 2020. Integration of Static and Dynamic Analysis Techniques for Checking Noninterference. In Deductive Software Verification: Future Perspectives - Reflections on the Occasion of 20 Years of KeY (Lecture Notes in Computer Science, Vol. 12345). Springer, 287–312. https://doi.org/10.1007/978-3-030-64354-6_12 Google ScholarCross Ref
- Marcel Böhme and Abhik Roychoudhury. 2014. Corebench: Studying complexity of regression errors". In Proc. of the International Symposium on Software Testing and Analysis (ISSTA’14). https://doi.org/10.1145/2610384.2628058 Google ScholarDigital Library
- Fraser Brown, Deian Stefan, and Dawson Engler. 2020. Sys: A Static/Symbolic Tool for Finding Good Bugs in Good (Browser) Code. In Proc. of the 29th USENIX Security Symposium (USENIX Security’20). Google Scholar
- Cristian Cadar. 2015. Targeted program transformations for symbolic execution. In Proc. of the Joint Meeting of the European Software Engineering Conference and the ACM Symposium on the Foundations of Software Engineering, New Ideas Track (ESEC/FSE NI’15). https://doi.org/10.1145/2786805.2803205 Google ScholarDigital Library
- Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proc. of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI’08). Google ScholarDigital Library
- Cristian Cadar and Dawson Engler. 2005. Execution Generated Test Cases: How to Make Systems Code Crash Itself. In Proc. of the 12th International SPIN Workshop on Model Checking of Software (SPIN’05). https://doi.org/10.1007/11537328_2 Google ScholarDigital Library
- Cristian Cadar, Vijay Ganesh, Peter Pawlowski, David Dill, and Dawson Engler. 2006. EXE: Automatically Generating Inputs of Death. In Proc. of the 13th ACM Conference on Computer and Communications Security (CCS’06). https://doi.org/10.1145/1455518.1455522 Google ScholarDigital Library
- Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Communications of the Association for Computing Machinery (CACM), 56, 2 (2013), 82–90. https://doi.org/10.1145/2408776.2408795 Google ScholarDigital Library
- Cristiano Calcagno and Dino Distefano. 2011. Infer: An Automatic Program Verifier for Memory Safety of C Programs. In Proc. of the 3rd International Conference on NASA Formal Methods (NFM’11). Google ScholarCross Ref
- Cristiano Calcagno, Dino Distefano, Peter W. O’Hearn, and Hongseok Yang. 2011. Compositional Shape Analysis by Means of Bi-Abduction. J. ACM, 58, 6 (2011), 26:1–26:66. https://doi.org/10.1145/2049697.2049700 Google ScholarDigital Library
- Omar Chebaro, Nikolai Kosmatov, Alain Giorgetti, and Jacques Julliand. 2011. The SANTE Tool: Value Analysis, Program Slicing and Test Generation for C Program Debugging. In Proc. of the 5st International Conference on Tests and Proofs (TAP’11). Google ScholarCross Ref
- Maria Christakis, Peter Müller, and Valentin Wüstholz. 2006. Guiding Dynamic Symbolic Execution Toward Unverified Program Executions. In Proc. of the 28th International Conference on Software Engineering (ICSE’06). https://doi.org/10.1145/2884781.2884843 Google ScholarDigital Library
- 2022. https://clang-analyzer.llvm.org Google Scholar
- 2021. Clang 12 documentation: Cross Translation Unit (CTU) Analysis. https://clang.llvm.org/docs/analyzer/user-docs/CrossTranslationUnit.html Google Scholar
- 2022. https://scan.coverity.com/ Google Scholar
- Christoph Csallner and Yannis Smaragdakis. 2005. Check ’n’ Crash: Combining static checking and testing. In Proc. of the 27th International Conference on Software Engineering (ICSE’05). https://doi.org/10.1145/1062455.1062533 Google ScholarDigital Library
- Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Proc. of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08). Google ScholarDigital Library
- Peter Dinges and Gul Agha. 2014. Targeted Test Input Generation Using Symbolic-Concrete Backward Execution. In Proc. of the 29th IEEE International Conference on Automated Software Engineering (ASE’14). https://doi.org/10.1145/2642937.2642951 Google ScholarDigital Library
- Josselin Feist, Laurent Mounier, Sébastien Bardin, Robin David, and Marie-Laure Potet. 2016. Finding the Needle in the Heap: Combining Static Analysis and Dynamic Symbolic Execution to Trigger Use-after-Free. In Proceedings of the 6th Workshop on Software Security, Protection, and Reverse Engineering (SSPREW’16). https://doi.org/10.1145/3015135.3015137 Google ScholarDigital Library
- 2022. https://engineering.fb.com/2017/09/06/android/finding-inter-procedural-bugs-at-scale-with-infer-static-analyzer/ Google Scholar
- 2022. https://www.microfocus.com/en-us/cyberres/application-security/static-code-analyzer Google Scholar
- 2022. https://frama-c.com Google Scholar
- Fengjuan Gao, Yu Wang, Linzhang Wang, Zijiang Yang, and Xuandong Li. 2020. Automatic Buffer Overflow Warning Validation. Journal of Computer Science and Technology, 35, 6 (2020), 1406–1427. https://doi.org/10.1007/s11390-020-0525-z Google ScholarDigital Library
- Xi Ge, Kunal Taneja, Tao Xie, and Nikolai Tillmann. 2011. DyTa: Dynamic Symbolic Execution Guided with Static Verification Results. In Proc. of the 33rd International Conference on Software Engineering (ICSE’11). https://doi.org/10.1145/1985793.1985971 Google ScholarDigital Library
- 2022. https://www.gnu.org/software/coreutils/ Google Scholar
- Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. In Proc. of the Conference on Programing Language Design and Implementation (PLDI’05). Google ScholarDigital Library
- Istvan Haller, Asia Slowinska, Matthias Neugschwandtner, and Herbert Bos. 2013. Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations. In Proc. of the 22st USENIX Security Symposium (USENIX Security’13). Google Scholar
- GrammaTech Inc.. 2022. CodeSonar. https://www.grammatech.com/codesonar-cc Google Scholar
- Wei Jin and Alessandro Orso. 2012. BugRedux: Reproducing Field Failures for In-house Debugging. In Proc. of the 34th International Conference on Software Engineering (ICSE’12). Google ScholarDigital Library
- Ashwin Kallingal Joshy, Xueyuan Chen, Benjamin Steenhoek, and Wei Le. 2021. Validating Static Warnings via Testing Code Fragments. In Proc. of the International Symposium on Software Testing and Analysis (ISSTA’21). https://doi.org/10.1145/3460319.3464832 Google ScholarDigital Library
- 2022. https://doi.org/10.5281/zenodo.6539575 Google ScholarDigital Library
- 2022. https://srg.doc.ic.ac.uk/projects/klee-sa/ Google Scholar
- Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proc. of the 2nd International Symposium on Code Generation and Optimization (CGO’04). Google ScholarDigital Library
- Mengchen Li, Yuanjun Chen, Linzhang Wang, and Guoqing Xu. 2013. Dynamically Validating Static Memory Leak Warnings. In Proc. of the International Symposium on Software Testing and Analysis (ISSTA’13). https://doi.org/10.1145/2483760.2483778 Google ScholarDigital Library
- Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou. 2005. BugBench: Benchmarks for Evaluating Bug Detection Tools. In Workshop on the Evaluation of Software Defect Detection Tools. Google Scholar
- Kin-Keung Ma, Yit Phang Khoo, Jeffrey S. Foster, and Michael Hicks. 2011. Directed Symbolic Execution. In Proc. of the 18th International Static Analysis Symposium (SAS’11). Google ScholarDigital Library
- Paul Dan Marinescu and Cristian Cadar. 2013. KATCH: High-Coverage Testing of Software Patches. In Proc. of the Joint Meeting of the European Software Engineering Conference and the ACM Symposium on the Foundations of Software Engineering (ESEC/FSE’13). https://doi.org/10.1145/2491411.2491438 Google ScholarDigital Library
- T. T. Nguyen, P. Maleehuan, T. Aoki, T. Tomita, and I. Yamada. 2019. Reducing False Positives of Static Analysis for SEI CERT C Coding Standard. In IEEE/ACM Joint 7th International Workshop on Conducting Empirical Studies in Industry (CESI’19) and 6th International Workshop on Software Engineering Research and Industrial Practice (SER IP’19). https://doi.org/10.1109/CESSER-IP.2019.00015 Google ScholarDigital Library
- Martin Nowack and Cristian Cadar. 2020. KLEE Symbolic Execution Engine in 2019. International Journal on Software Tools for Technology Transfer, https://doi.org/10.1007/s10009-020-00570-3 Google ScholarDigital Library
- H. Post, C. Sinz, A. Kaiser, and T. Gorges. 2008. Reducing False Positives by Combining Abstract Interpretation and Bounded Model Checking. In Proc. of the 23rd IEEE International Conference on Automated Software Engineering (ASE’08). https://doi.org/10.1109/ASE.2008.29 Google ScholarDigital Library
- M. R. Gadelha, E. Steffinlongo, L. C. Cordeiro, B. Fischer, and D. Nicole. 2019. SMT-Based Refutation of Spurious Bug Reports in the Clang Static Analyzer. In Proc. of the 41th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion’19). https://doi.org/10.1109/ICSE-Companion.2019.00026 Google ScholarDigital Library
- Daniel Schemmel, Julian Büning, Frank Busse, Martin Nowack, and Cristian Cadar. 2022. A Deterministic Memory Allocator for Dynamic Symbolic Execution. In Proc. of the 36th European Conference on Object-Oriented Programming (ECOOP’22). Google Scholar
- Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A Concolic Unit Testing Engine for C. In Proc. of the Joint Meeting of the European Software Engineering Conference and the ACM Symposium on the Foundations of Software Engineering (ESEC/FSE’05). https://doi.org/10.1145/1081706.1081750 Google ScholarCross Ref
- Kunal Taneja, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. 2011. eXpress: guided path exploration for efficient regression test generation. In Proc. of the International Symposium on Software Testing and Analysis (ISSTA’11). https://doi.org/10.1145/2001420.2001422 Google ScholarDigital Library
- Nikolai Tillmann and Jonathan De Halleux. 2008. Pex: white box test generation for .NET. In Proc. of the 2nd International Conference on Tests and Proofs (TAP’08). https://doi.org/10.1007/978-3-540-79124-9_10 Google ScholarCross Ref
- Zhihong Xu and Gregg Rothermel. 2009. Directed Test Suite Augmentation. In Proc. of the 16th Asia-Pacific Software Engineering Conference (ASPEC’09). Google ScholarDigital Library
- Cristian Zamfir and George Candea. 2010. Execution Synthesis: A Technique for Automated Software Debugging. In Proc. of the 5th European Conference on Computer Systems (EuroSys’10). https://doi.org/10.1145/1755913.1755946 Google ScholarDigital Library
- Bin Zhang, Chao Feng, Bo Wu, and Chaojing Tang. 2016. Detecting integer overflow in Windows binary executables based on symbolic execution. In 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). 385–390. https://doi.org/10.1109/SNPD.2016.7515929 Google ScholarCross Ref
Index Terms
- Combining static analysis error traces with dynamic symbolic execution (experience paper)
Recommendations
Running symbolic execution forever
ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and AnalysisWhen symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a ...
Tuning parallel symbolic execution engine for better performance
Symbolic execution is widely used in many code analysis, testing, and verification tools. As symbolic execution exhaustively explores all feasible paths, it is quite time consuming. To handle the problem, researchers have paralleled existing symbolic ...
Scaling symbolic execution using staged analysis
Recent advances in constraint solving technology and raw computation power have led to a substantial increase in the effectiveness of techniques based on symbolic execution for systematic bug finding. However, scaling symbolic execution remains a ...
Comments