Abstract
Validating optimizing compilers is challenging because it is hard to generate valid test programs (i.e., those that do not expose any undefined behavior). Equivalence Modulo Inputs (EMI) is an effective, promising methodology to tackle this problem. Given a test program with some inputs, EMI mutates the program to derive variants that are semantically equivalent w.r.t. these inputs. The state-of-the-art instantiations of EMI are Orion and Athena, both of which rely on deleting code from or inserting code into code regions that are not executed under the inputs. Although both have demonstrated their ability in finding many bugs in GCC and LLVM, they are still limited due to their mutation strategies that operate only on dead code regions.
This paper presents a novel EMI technique that allows mutation in the entire program (i.e., both live and dead regions). By removing the restriction of mutating only the dead regions, our technique significantly increases the EMI variant space. It also helps to more thoroughly stress test compilers as compilers must optimize mutated live code, whereas mutated dead code might be eliminated. Finally, our technique also makes compiler bugs more noticeable as miscompilations on mutated dead code may not be observable.
We have realized the proposed technique in Hermes. The evaluation demonstrates Hermes’s effectiveness. In 13 months, Hermes found 168 confirmed, valid bugs in GCC and LLVM, of which 132 have already been fixed.
- ACE. SuperTest compiler test and validation suite. http:// www.ace.nl/compiler/supertest.html, accessed: 2016- 03-20.Google Scholar
- A. Balestrat. CCG: A random C code generator.Google Scholar
- Y. Chen, T. Su, C. Sun, Z. Su, and J. Zhao. Coverage-directed differential testing of jvm implementations. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’16, pages 85–99, New York, NY, USA, 2016. ACM. Google ScholarDigital Library
- N. Chong, A. Donaldson, A. Lascu, and C. Lidbury. Manycore compiler fuzzing. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2015. Google ScholarDigital Library
- P. Cuoq, B. Monate, A. Pacalet, V. Prevosto, J. Regehr, B. Yakobowski, and X. Yang. Testing static analyzers with randomly generated programs. In A. Goodloe and S. Person, editors, NASA Formal Methods, volume 7226 of Lecture Notes in Computer Science, pages 120–125. Springer Berlin Heidelberg, 2012. Google ScholarDigital Library
- A. F. Donaldson and A. Lascu. Metamorphic testing for (graphics) compilers. In 1st International Workshop on Metamorphic Testing, in concunction with the 38th International Conference on Software Engineering, 5 2016. Google ScholarDigital Library
- GCC Wiki. Predictive Commoning. https://gcc.gnu. org/wiki/PredictiveCommoning, accessed: 2016-03-20.Google Scholar
- T. Hoare. The verifying compiler: A grand challenge for computing research. In Modular Programming Languages, pages 25–35. Springer, 2003.Google Scholar
- V. Le, M. Afshari, and Z. Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014. Google ScholarDigital Library
- V. Le, C. Sun, and Z. Su. Randomized stress-testing of linktime optimizers. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA), 2014. Google ScholarDigital Library
- V. Le, C. Sun, and Z. Su. Finding Deep Compiler Bugs via Guided Stochastic Program Mutation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 386–399, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
- X. Leroy. Coinductive big-step operational semantics. In European Symposium on Programming (ESOP’06), volume 3924 of Lecture Notes in Computer Science, pages 54–68. Springer, 2006. Google ScholarDigital Library
- X. Leroy. Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In Proceedings of the 33rd ACM Symposium on Principles of Programming Languages (POPL), pages 42–54, 2006. Google ScholarDigital Library
- X. Leroy. A formally verified compiler back-end. Journal of Automated Reasoning, 43(4):363–446, 2009. Google ScholarDigital Library
- LLVM’s Analysis and Transform Passes. Loop-Closed SSA Form Pass. http://llvm.org/docs/Passes.html# lcssa-loop-closed-ssa-form-pass, accessed: 2016-03- 20.Google Scholar
- N. P. Lopes, D. Menendez, S. Nagarakatte, and J. Regehr. Provably correct peephole optimizations with alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, pages 22–32, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
- G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 237–248, 2010. Google ScholarDigital Library
- L. Martignoni, R. Paleari, A. Reina, G. F. Roglia, and D. Bruschi. A methodology for testing CPU emulators. ACM Transactions on Software Engineering and Methodology (TOSEM), 22(4):29, 2013. Google ScholarDigital Library
- S. McPeak, D. S. Wilkerson, and S. Goldsmith. Berkeley Delta. http://delta.tigris.org/, accessed: 2016-03-20.Google Scholar
- E. Nagai, H. Awazu, N. Ishiura, and N. Takeda. Random testing of C compilers targeting arithmetic optimization. In Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2012), pages 48–53, 2012.Google Scholar
- E. Nagai, A. Hashimoto, and N. Ishiura. Scaling up size and number of expressions in random testing of arithmetic optimization of C compilers. In Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2013), pages 88–93, 2013.Google Scholar
- G. C. Necula. Translation validation for an optimizing compiler. In Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 83–94, 2000. Google ScholarDigital Library
- Plum Hall, Inc. The Plum Hall Validation Suite for C. http: //www.plumhall.com/stec.html, accessed: 2016-03-20.Google Scholar
- A. Pnueli, M. Siegel, and E. Singerman. Translation validation. In Proceedings of the 4th International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS), pages 151–166, 1998. Google ScholarDigital Library
- J. Regehr, Y. Chen, P. Cuoq, E. Eide, C. Ellison, and X. Yang. Test-case reduction for C compiler bugs. In Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 335–346, 2012. Google ScholarDigital Library
- H. Samet. Automatically proving the correctness of translations involving optimized code. PhD Thesis, Stanford University, May 1975. Google ScholarDigital Library
- C. Sun, V. Le, and Z. Su. Finding and analyzing compiler warning defects. In Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pages 203– 213, 2016. Google ScholarDigital Library
- C. Sun, V. Le, Q. Zhang, and Z. Su. Toward understanding compiler bugs in gcc and llvm. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, pages 294–305, 2016. Google ScholarDigital Library
- R. Tate, M. Stepp, Z. Tatlock, and S. Lerner. Equality saturation: a new approach to optimization. In Proceedings of the ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages (POPL), pages 264–276, 2009. Google ScholarDigital Library
- The Clang Team. Clang 3.4 documentation: LibTooling. http://clang.llvm.org/docs/LibTooling.html, accessed: 2016-03-20.Google Scholar
- J.-B. Tristan, P. Govereau, and G. Morrisett. Evaluating valuegraph translation validation for LLVM. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 295–305, 2011. Google ScholarDigital Library
- Wikipedia. Global Value Numbering. https: //en.wikipedia.org/wiki/Global_value_numbering, accessed: 2016-03-20.Google Scholar
- Wikipedia. Loop unswitching. https://en.wikipedia. org/wiki/Loop_unswitching, accessed: 2016-03-20.Google Scholar
- Wikipedia. Sequence point. https://en.wikipedia.org/ wiki/Sequence_point, accessed: 2016-03-20.Google Scholar
- X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–294, 2011. Google ScholarDigital Library
- C. Zhao, Y. Xue, Q. Tao, L. Guo, and Z. Wang. Automated test program generation for an industrial optimizing compiler. In ICSE Workshop on Automation of Software Test (AST), pages 36–43, 2009.Google Scholar
- J. Zhao, S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. Formal verification of SSA-based optimizations for LLVM. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 175–186, 2013. Google ScholarDigital Library
- Introduction Illustrative Examples GCC Miscompilation Bug 66186 LLVM Miscompilation Bug 26266 Equivalence Modulo Inputs Approach Program Profiling Live Code EMI Mutation Predicate, Expression and TCB Synthesis Predicate Synthesis Bottom-Up Valid Expression Synthesis TCB Synthesis with Speculative Execution Implementation Sparse Profiling Evaluation Testing Setup Quantitative Results Comparison with Athena Effectiveness Line Coverage Comparison of Mutation Strategies Sample Bugs Related Work Compiler Testing Verified Compilers Translation Validation Conclusion AcknowledgmentsGoogle Scholar
Index Terms
- Finding compiler bugs via live code mutation
Recommendations
Finding and understanding bugs in C compilers
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationCompilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to ...
Finding compiler bugs via live code mutation
OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and ApplicationsValidating optimizing compilers is challenging because it is hard to generate valid test programs (i.e., those that do not expose any undefined behavior). Equivalence Modulo Inputs (EMI) is an effective, promising methodology to tackle this problem. ...
Finding deep compiler bugs via guided stochastic program mutation
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and ApplicationsCompiler testing is important and challenging. Equivalence Modulo Inputs (EMI) is a recent promising approach for compiler validation. It is based on mutating the unexecuted statements of an existing program under some inputs to produce new equivalent ...
Comments