Random testing for C and C++ compilers with YARPGen

Authors:
Vsevolod Livinskii

University of Utah, USA

University of Utah, USA
View Profile

,
Dmitry Babokin

Intel Corporation, USA

Intel Corporation, USA
View Profile

,
John Regehr

University of Utah, USA

University of Utah, USA
View Profile

Proceedings of the ACM on Programming Languages Volume 4 Issue OOPSLAArticle No.: 196pp 1–25https://doi.org/10.1145/3428264

Published:13 November 2020Publication History

Proceedings of the ACM on Programming Languages

Abstract

Compilers should not crash and they should not miscompile applications. Random testing is an effective method for finding compiler bugs that have escaped other kinds of testing. This paper presents Yet Another Random Program Generator (YARPGen), a random test-case generator for C and C++ that we used to find and report more than 220 bugs in GCC, LLVM, and the Intel® C++ Compiler. Our research contributions include a method for generating expressive programs that avoid undefined behavior without using dynamic checks, and generation policies, a mechanism for increasing diversity of generated code and for triggering more optimizations. Generation policies decrease the testing time to find hard-to-trigger compiler bugs and, for the kinds of scalar optimizations YARPGen was designed to stress-test, increase the number of times these optimizations are applied by the compiler by an average of 20% for LLVM and 40% for GCC. We also created tools for automating most of the common tasks related to compiler fuzzing; these tools are also useful for fuzzers other than ours.

Supplemental Material

oopsla20main-p292-p-video.mp4

mp4

129.6 MB

Download

References

Domenico Amalfitano, Nicola Amatucci, Anna Rita Fasolino, Porfirio Tramontana, Emily Kowalczyk, and Atif M. Memon. 2015. Exploiting the Saturation Efect in Automatic Random Testing of Android Applications. In Proceedings of the Second ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft '15). IEEE Press, 33-43.Google Scholar
Gergö Barany. 2018a. Finding Missed Compiler Optimizations by Diferential Testing. In Proceedings of the 27th International Conference on Compiler Construction (CC 2018 ). Association for Computing Machinery, New York, NY, USA, 82-92. https://doi.org/10.1145/3178372.3179521 Google ScholarDigital Library
Gergö Barany. 2018b. Liveness-Driven Random Program Generation. In Logic-Based Program Synthesis and Transformation, Fabio Fioravanti and John P. Gallagher (Eds.). Springer International Publishing, Cham, 112-127.Google Scholar
Osbert Bastani, Rahul Sharma, Alex Aiken, and Percy Liang. 2017. Synthesizing Program Input Grammars. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017 ). Association for Computing Machinery, New York, NY, USA, 95-110. https://doi.org/10.1145/3062341.3062349 Google ScholarDigital Library
Matteo Biagiola, Andrea Stocco, Filippo Ricca, and Paolo Tonella. 2019. Diversity-Based Web Test Generation. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019 ). Association for Computing Machinery, New York, NY, USA, 142-153. https: //doi.org/10.1145/3338906.3338970 Google ScholarDigital Library
Marcel Böhme and Brandon Falk. 2020. Fuzzing: On the Exponential Cost of Vulnerability Discovery. In Proceedings of the 2020 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software. 11.Google ScholarDigital Library
Colin J Burgess and M Saidi. 1996. The automatic generation of test cases for optimizing Fortran compilers. Information and Software Technology 38, 2 ( 1996 ), 111-119.Google Scholar
Junjie Chen, Jibesh Patra, Michael Pradel, Yingfei Xiong, Hongyu Zhang, Dan Hao, and Lu Zhang. 2020. A Survey of Compiler Testing. ACM Comput. Surv. 53, 1, Article 4 ( Feb. 2020 ), 36 pages. https://doi.org/10.1145/3363562 Google ScholarDigital Library
Tsong Yueh Chen, Fei-Ching Kuo, Robert G. Merkel, and T. H. Tse. 2010. Adaptive Random Testing: The ART of Test Case Diversity. J. Syst. Softw. 83, 1 (Jan. 2010 ), 60-66. https://doi.org/10.1016/j.jss. 2009. 02.022 Google ScholarDigital Library
Chris Cummins, Pavlos Petoumenos, Alastair Murray, and Hugh Leather. 2018. Compiler Fuzzing through Deep Learning. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018 ). Association for Computing Machinery, New York, NY, USA, 95-105. https://doi.org/10.1145/3213846.3213848 Google ScholarDigital Library
Pascal Cuoq, Florent Kirchner, Nikolai Kosmatov, Virgile Prevosto, Julien Signoles, and Boris Yakobowski. 2012. Frama-C: A Software Analysis Perspective. In Proceedings of the 10th International Conference on Software Engineering and Formal Methods (SEFM'12). Springer-Verlag, Berlin, Heidelberg, 233-247. https://doi.org/10.1007/978-3-642-33826-7_16 Google ScholarDigital Library
K. Dewey, J. Roesch, and B. Hardekopf. 2015. Fuzzing the Rust Typechecker Using CLP (T). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). 482-493.Google Scholar
Eric Eide and John Regehr. 2008. Volatiles are miscompiled, and what to do about it. In Proceedings of the 8th ACM international conference on Embedded software. 255-264.Google ScholarDigital Library
Alex Groce, Chaoqiang Zhang, Eric Eide, Yang Chen, and John Regehr. 2012. Swarm Testing. In Proceedings of the 2012 International Symposium on Software Testing and Analysis (ISSTA 2012 ). Association for Computing Machinery, New York, NY, USA, 78-88. https://doi.org/10.1145/2338965.2336763 Google ScholarDigital Library
Atsushi Hashimoto and Nagisa Ishiura. 2016. Detecting arithmetic optimization opportunities for C compilers by randomly generated equivalent programs. IPSJ Transactions on System LSI Design Methodology 9 ( 2016 ), 21-29.Google Scholar
R. Huang, W. Sun, Y. Xu, H. Chen, D. Towey, and X. Xia. 2019. A Survey on Adaptive Random Testing. IEEE Transactions on Software Engineering ( 2019 ).Google Scholar
International Organization for Standardization 2011. ISO/IEC 9899: 201x: Programming Languages-C. International Organization for Standardization. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf.Google Scholar
International Organization for Standardization 2012. ISO/IEC N3337: Working Draft, Standard for Programming Language C++. International Organization for Standardization. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf.Google Scholar
Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler Validation via Equivalence modulo Inputs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '14). Association for Computing Machinery, New York, NY, USA, 216-226. https://doi.org/10.1145/2594291.2594334 Google ScholarDigital Library
Vu Le, Chengnian Sun, and Zhendong Su. 2015. Finding Deep Compiler Bugs via Guided Stochastic Program Mutation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015 ). Association for Computing Machinery, New York, NY, USA, 386-399. https://doi.org/10. 1145/2814270.2814319 Google ScholarDigital Library
Christian Lindig. 2005. Random Testing of C Calling Conventions. In Proceedings of the Sixth International Symposium on Automated Analysis-Driven Debugging (AADEBUG'05). Association for Computing Machinery, New York, NY, USA, 3-12. https://doi.org/10.1145/1085130.1085132 Google ScholarDigital Library
Xiao Liu, Xiaoting Li, Rupesh Prajapati, and Dinghao Wu. 2019. Deepfuzz: Automatic generation of syntax valid c programs for fuzz testing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 1044-1051.Google ScholarDigital Library
Michaël Marcozzi, Qiyi Tang, Alastair Donaldson, and Cristian Cadar. 2019. A Systematic Impact Study for Fuzzer-Found Compiler Bugs. arXiv preprint arXiv: 1902. 09334 ( 2019 ).Google Scholar
William M. McKeeman. 1998. Diferential Testing for Software. Digital Technical Journal 10, 1 (Dec. 1998 ), 100-107.Google Scholar
Eriko Nagai, Hironobu Awazu, Nagisa Ishiura, and Naoya Takeda. 2012. Random testing of C compilers targeting arithmetic optimization. In Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2012 ). 48-53.Google Scholar
Eriko Nagai, Atsushi Hashimoto, and Nagisa Ishiura. 2013. Scaling up size and number of expressions in random testing of arithmetic optimization of C compilers. In Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2013 ). 88-93.Google Scholar
Eriko Nagai, Atsushi Hashimoto, and Nagisa Ishiura. 2014. Reinforcing random testing of arithmetic optimization of C compilers by scaling up size and number of expressions. IPSJ Transactions on System LSI Design Methodology 7 ( 2014 ), 91-100.Google Scholar
K. Nakamura and N. Ishiura. 2016. Random testing of C compilers based on test program generation by equivalence transformation. In 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). 676-679.Google Scholar
Georg Ofenbeck, Tiark Rompf, and Markus Püschel. 2016. RandIR: Diferential Testing for Embedded Compilers. In Proceedings of the 2016 7th ACM SIGPLAN Symposium on Scala (SCALA 2016 ). Association for Computing Machinery, New York, NY, USA, 21-30. https://doi.org/10.1145/2998392.2998397 Google ScholarDigital Library
Jibesh Patra and Michael Pradel. 2016. Learning to fuzz: Application-independent fuzz testing with probabilistic, generative models of input data. TU Darmstadt, Department of Computer Science, Tech. Rep. TUD-CS-2016-14664 ( 2016 ).Google Scholar
John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. 2012. Test-Case Reduction for C Compiler Bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). Association for Computing Machinery, New York, NY, USA, 335-346. https://doi.org/10.1145/2254064.2254104 Google ScholarDigital Library
Richard L. Sauder. 1962. A General Test Data Generator for COBOL. In Proceedings of the May 1-3, 1962, Spring Joint Computer Conference (AIEE-IRE ' 62 (Spring)). Association for Computing Machinery, New York, NY, USA, 317-323. https://doi.org/10.1145/1460833.1460869 Google ScholarDigital Library
Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding Compiler Bugs via Live Code Mutation. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016 ). Association for Computing Machinery, New York, NY, USA, 849-863. https://doi.org/10.1145/2983990.2984038 Google ScholarDigital Library
David B. Whalley. 1994. Automatic Isolation of Compiler Errors. ACM Trans. Program. Lang. Syst. 16, 5 (Sept. 1994 ), 1648-1659. https://doi.org/10.1145/186025.186103 Google ScholarDigital Library
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). Association for Computing Machinery, New York, NY, USA, 283-294. https://doi.org/10.1145/1993498.1993532 Google ScholarDigital Library

Index Terms

Random testing for C and C++ compilers with YARPGen
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Finding and understanding bugs in C compilers
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to ...
Read More
Finding and understanding bugs in C compilers
PLDI '11

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to ...
Read More
Volatiles are miscompiled, and what to do about it
EMSOFT '08: Proceedings of the 8th ACM international conference on Embedded software

C's volatile qualifier is intended to provide a reliable link between operations at the source-code level and operations at the memory-system level. We tested thirteen production-quality C compilers and, for each, found situations in which the compiler ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the ACM on Programming Languages Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN:2475-1421
DOI:10.1145/3436718
Issue’s Table of Contents

Copyright © 2020 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 November 2020
Published in pacmpl Volume 4, Issue OOPSLA

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Distinguished Paper
Author Tags
automated testing
compiler defect
compiler testing
random program generation
random testing
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 41
  Total Citations
  View Citations
- 2,939
  Total Downloads
- Downloads (Last 12 months)870
- Downloads (Last 6 weeks)106
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.