Abstract
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, delayed visibility (SDV)</i> semantics, which captures the execution of a GPU kernel by multiple groups of threads. The SDV semantics provides operational definitions for barrier divergence and for both inter- and intra-group data races. We build on the semantics to develop a method for reducing the task of verifying a massively parallel GPU kernel to that of verifying a sequential program. This completely avoids the need to reason about thread interleavings, and allows existing techniques for sequential program verification to be leveraged. We describe an efficient encoding of data race detection and propose a method for automatically inferring the loop invariants that are required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, that can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 162 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.
- Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. 2015. GPU concurrency: Weak behaviours and programming assumptions. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15). ACM, New York, 577--591. Google ScholarDigital Library
- AMD. 2013. OpenCL Programming Guide, Revision 2.7. Retrieved from http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf.Google Scholar
- Ethel Bardsley, Adam Betts, Nathan Chong, Peter Collingbourne, Pantazis Deligiannis, Alastair F. Donaldson, Jeroen Ketema, Daniel Liew, and Shaz Qadeer. 2014a. Engineering a static verification tool for GPU kernels. In Proceedings of the 26th International Conference on Computer Aided Verification (CAV 2014). Lecture Notes in Computer Science, Vol. 8559. Springer, Berlin, 226--242. Google ScholarDigital Library
- Ethel Bardsley and Alastair F. Donaldson. 2014. Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In Proceedings of the 6th NASA Formal Methods Symposium (NFM’14). Lecture Notes in Computer Science, Vol. 8430. Springer, Berlin, 230--245.Google Scholar
- Ethel Bardsley, Alastair F. Donaldson, and John Wickerson. 2014b. KernelInterceptor: Automating GPU kernel verification by intercepting kernels and their parameters. In Proceedings of the 2014 International Workshop on OpenCL (IWOCL’14). ACM, New York, Article 7, 5 pages. Google ScholarDigital Library
- Michael Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. 2005. Boogie: A modular reusable verifier for object-oriented programs. In Revised Lectures of the 4th International Symposium on Formal Methods for Components and Objects (FMCO’05). Lecture Notes in Computer Science, Vol. 4111. Springer, Berlin, 364--387. Google ScholarDigital Library
- Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovic, Tim King, Andrew Reynolds, and Cesare Tinelli. 2011. CVC4. In Proceedings of the 23rd International Conference in Computer Aided Verification (CAV’11). Lecture Notes in Computer Science, Vol. 6806. Springer, Berlin, 171--177. Google ScholarDigital Library
- Al Bessey, Ken Block, Benjamin Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles-Henri Gros, Asya Kamsky, Scott McPeak, and Dawson R. Engler. 2010. A few billion lines of code later: Using static analysis to find bugs in the real world. Commun. ACM 53, 2 (2010), 66--75. Google ScholarDigital Library
- Adam Betts, Nathan Chong, Alastair F. Donaldson, Shaz Qadeer, and Paul Thomson. 2012. GPUVerify: A verifier for GPU kernels. In Proceedings of the 27th ACM International Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA’12). ACM, New York, 113--132. Google ScholarDigital Library
- Dirk Beyer, Thomas A. Henzinger, Rupak Majumdar, and Andrey Rybalchenko. 2007. Path invariants. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’07). ACM, New York, 300--309. Google ScholarDigital Library
- Stefan Blom, Marieke Huisman, and Matej Mihelčić. 2014. Specification and verification of GPGPU programs. Science of Computer Programming 95, 3 (2014), 376--388.Google ScholarDigital Library
- Michael Boyer, Kevin Skadron, and Westley Weimer. 2008. Automated dynamic analysis of CUDA programs. In Proceedings of the 3rd Workshop on Software Tools for MultiCore Systems (STMCS’08). Retrieved from http://people.csail.mit.edu/rabbah/conferences/08/cgo/stmcs/.Google Scholar
- Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, 209--224. Google ScholarDigital Library
- Joshua E. Cates, Aaron E. Lefohn, and Ross T. Whitaker. 2004. GIST: An interactive, GPU-based level set segmentation tool for 3D medical images. Medical Image Analysis 8 (2004), 217--231. Issue 3.Google ScholarCross Ref
- Wei-Fan Chiang, Ganesh Gopalakrishnan, Guodong Li, and Zvonimir Rakamaric. 2013. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In Proceedings of 5th International NASA Formal Methods Symposium (NFM’13). Lecture Notes in Computer Science, Vol. 7871. Springer, Berlin, 213--228.Google ScholarCross Ref
- Nathan Chong, Alastair F. Donaldson, Paul Kelly, Shaz Qadeer, and Jeroen Ketema. 2013. Barrier invariants: A shared state abstraction for the analysis of data-dependent GPU kernels. In Proceedings of the 28th ACM International Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA’13). ACM, New York, 605--622. Google ScholarDigital Library
- Nathan Chong, Alastair F. Donaldson, and Jeroen Ketema. 2014. A sound and complete abstraction for reasoning about parallel prefix sums. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’14). ACM, New York, 397--410. Google ScholarDigital Library
- Ching-Tsun Chou, Phanindra K. Mannava, and Seungjoon Park. 2004. A simple method for parameterized verification of cache coherence protocols. In Proceedings of the 5th International Conference on Formal Methods in Computer-Aided Design (FMCAD’04) Lecture Notes in Computer Science, Vol. 3312. Springer, Berlin, 382--398.Google ScholarCross Ref
- Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. 2012. Symbolic testing of OpenCL code. In Revised Selected Papers of the 7th International Haifa Verification Conference (HVC’11) Lecture Notes in Computer Science, Vol. 7261. Springer, Berlin, 203--218. Google ScholarDigital Library
- Peter Collingbourne, Alastair F. Donaldson, Jeroen Ketema, and Shaz Qadeer. 2013. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In Proceedings of the 22nd European Symposium on Programming (ESOP’13). Lecture Notes in Computer Science, Vol. 7792. Springer, Berlin, 270--289. Google ScholarDigital Library
- Patrick Cousot and Radhia Cousot. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’77). ACM, New York, 238--252. Google ScholarDigital Library
- Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08). Lecture Notes in Computer Science, Vol. 4963. Springer, Berlin, 337--340. Google ScholarDigital Library
- Alastair F. Donaldson. 2014. The GPUVerify method: A tutorial overview. Electronic Communications of the EASST 70, Article 1 (2014), 16 pages. Retrieved from http://journal.ub.tu-berlin.de/eceasst/article/view/986.Google Scholar
- Alastair F. Donaldson, Daniel Kroening, and Philipp Rümmer. 2010. Automatic analysis of scratch-pad memory code for heterogeneous multicore processors. In Proceedings of the 16th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’10). Lecture Notes in Computer Science, Vol. 6015. Springer, Berlin, 280--295. Google ScholarDigital Library
- Alastair F. Donaldson, Daniel Kroening, and Philipp Rümmer. 2011. Automatic analysis of DMA races using model checking and k-induction. Formal Methods in System Design 39, 1 (2011), 83--113. Google ScholarDigital Library
- Michael Emmi, Shaz Qadeer, and Zvonimir Rakamaric. 2011. Delay-bounded scheduling. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2011). ACM, New York, 411--422. Google ScholarDigital Library
- Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. 2007. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming 69, 1--3 (2007), 35--45. Google ScholarDigital Library
- Cormac Flanagan and K. Rustan M. Leino. 2001. Houdini, an annotation assistant for ESC/Java. In Proceedings of the International Symposium of Formal Methods Europe (FME’01). Lecture Notes in Computer Science, Vol. 2021. Springer, Berlin, 500--517. Google ScholarDigital Library
- Susanne Graf and Hassen Saïdi. 1997. Construction of abstract state graphs with PVS. In Proceedings of the 9th International Conference on Computer Aided Verification (CAV’97). Lecture Notes in Computer Science, Vol. 1254. Springer, Berlin, 72--83. Google ScholarDigital Library
- Axel Habermaier and Alexander Knapp. 2012. On the correctness of the SIMT execution model of GPUs. In Proceedings of the 21st European Symposium on Programming (ESOP’12). Lecture Notes in Computer Science, Vol. 7211. Springer, Berlin, 316--335. Google ScholarDigital Library
- Mark J. Harris. 2004. Fast fluid dynamics simulation on the GPU. In GPU Gems. Addison-Wesley, Boston, MA, 637--665.Google Scholar
- Temesghen Kahsai, Yeting Ge, and Cesare Tinelli. 2011. Instantiation-based invariant discovery. In Proceedings of 3rd International NASA Formal Methods Symposium (NFM’11). Lecture Notes in Computer Science, Vol. 6617. Springer, Berlin, 192--206. Google ScholarDigital Library
- Rajesh K. Karmani, P. Madhusudan, and Brandon M. Moore. 2011. Thread contracts for safe parallelism. In Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’11). ACM, New York, 125--134. Google ScholarDigital Library
- Khronos OpenCL Working Group. 2012. The OpenCL Specification, Version 1.2. Document revision 19. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.Google Scholar
- Khronos OpenCL Working Group. 2014a. The OpenCL C Specification, Version 2.0. Document revision 26. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf.Google Scholar
- Khronos OpenCL Working Group. 2014b. The OpenCL Extension Specification, Version 2.0. Document revision 26. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-2.0-extensions.pdf.Google Scholar
- Petr Klus, Simon Lam, Dag Lyberg, Ming Sin Cheung, Graham Pullan, Ian McFarlane, Giles S. H. Yeo, and Brian Y. H. Lam. 2012. BarraCUDA—A fast short read sequence aligner using graphics processing units. BMC Research Notes 5, Article 27 (2012), 7 pages.Google Scholar
- Kensuke Kojima and Atsushi Igarashi. 2013. A Hoare logic for SIMT programs. In Proceeding of the 11th Asian Symposium on Programming Languages and Systems (APLAS’13). Lecture Notes in Computer Science, Vol. 8301. Springer, Berlin, 58--73. Google ScholarDigital Library
- Shuvendu K. Lahiri and Shaz Qadeer. 2009. Complexity and algorithms for monomial and clausal predicate abstraction. In Proceedings of the 22nd International Conference on Automated Deduction (CADE-22). Lecture Notes in Computer Science, Vol. 5663. Springer, Berlin, 214--229. Google ScholarDigital Library
- K. Rustan M. Leino, Greg Nelson, and James B. Saxe. 2000. ESC/Java User’s Manual. Technical Report SRC Technical Note 2000-002. Compaq Systems Research Center.Google Scholar
- Alan Leung, Manish Gupta, Yuvraj Agarwal, Rajesh Gupta, Ranjit Jhala, and Sorin Lerner. 2012. Verifying GPU kernels by test amplification. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, New York, 383--394. Google ScholarDigital Library
- Guodong Li and Ganesh Gopalakrishnan. 2010. Scalable SMT-based verification of GPU kernel functions. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’10). ACM, New York, 187--196. Google ScholarDigital Library
- Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, and Sreeranga P. Rajan. 2012b. GKLEE: Concolic verification and test generation for GPUs. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’12). ACM, New York, 215--224. Google ScholarDigital Library
- Heng Li and Richard Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 14 (2009), 1754--1760. Google ScholarDigital Library
- Peng Li, Guodong Li, and Ganesh Gopalakrishnan. 2012a. Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). IEEE, Los Alamitos, CA. Google ScholarDigital Library
- Anton Lokhmotov. 2011. Mobile and Embedded Computing on Mali GPUs. In Proceedings of the 2nd UK GPU Computing Conference.Google Scholar
- Kenneth McMillan. 1999. Verification of infinite state systems by compositional model checking. In Proceedings of the 10th Conference on Correct Hardware Design and Verification Methods (CHARME’99). Lecture Notes in Computer Science, Vol. 1703. Springer, Berlin, 219--234. Google ScholarDigital Library
- Kenneth L. McMillan. 2006. Lazy abstraction with interpolants. In Proceedings of the 18th International Conference on Computer Aided Verification (CAV’06). Lecture Notes in Computer Science, Vol. 4144. Springer, Berlin, 123--136. Google ScholarDigital Library
- Microsoft Corporation. 2012. C++ AMP: Language and Programming Model, Version 1.0. Retrieved from http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf.Google Scholar
- Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. 2002. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, Vol. 2283. Springer, Berlin. Google ScholarDigital Library
- Nvidia. 2012a. CUDA C Programming Guide, Version 5.0. http://docs.nvidia.com/cuda/cuda-c-programming-guide/.Google Scholar
- Nvidia. 2012b. Parallel Thread Execution ISA, Version 3.1.Google Scholar
- Lars Nyland. 2012. Personal Communication.Google Scholar
- Lars Nyland, Mark Harris, and Jan Prins. 2007. Fast N-Body Simulation with CUDA. Addison-Wesley, Upper Saddle River, NJ, 677--696.Google Scholar
- Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H. J. Kelly, and Andrew J. Davison. 2013. SLAM++: Simultaneous localisation and mapping at the level of objects. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). IEEE, Los Alamitos, CA, 1352--1359. Google ScholarDigital Library
- Saurabh Srivastava and Sumit Gulwani. 2009. Program verification using templates over predicate abstraction. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’09). ACM, New York, 223--234. Google ScholarDigital Library
- Bjarne Steensgaard. 1996. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’96). ACM, New York, 32--41. Google ScholarDigital Library
- Murali Talupur and Mark R. Tuttle. 2008. Going with the flow: Parameterized verification using message flows. In Proceedings of the 8th International Conference on Formal Methods in Computer-Aided Design (FMCAD’08). IEEE, Los Alamitos, CA. Google ScholarDigital Library
- Stavros Tripakis, Christos Stergiou, and Roberto Lublinerman. 2010. Checking equivalence of SPMD programs using non-interference. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar’10). Retrieved from http://static.usenix.org/events/hotpar10/.Google ScholarCross Ref
- Christian Urban and Julien Narboux. 2009. Formal SOS-proofs for the lambda-calculus. Electronic Notes in Theoretical Computer Science 247 (2009), 139--155. Google ScholarDigital Library
- John Wickerson. 2014. Syntax and semantics of a GPU kernel programming language. Archive of Formal Proofs. Retrieved from http://afp.sourceforge.net/entries/GPU_Kernel_PL.shtml.Google Scholar
Index Terms
- The Design and Implementation of a Verification Technique for GPU Kernels
Recommendations
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applicationsWe present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
GPUVerify: a verifier for GPU kernels
OOPSLA '12We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Verification of producer-consumer synchronization in GPU programs
PLDI '15Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized ...
Comments