skip to main content
research-article
Open Access

The Design and Implementation of a Verification Technique for GPU Kernels

Published:22 May 2015Publication History
Skip Abstract Section

Abstract

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, delayed visibility (SDV)</i> semantics, which captures the execution of a GPU kernel by multiple groups of threads. The SDV semantics provides operational definitions for barrier divergence and for both inter- and intra-group data races. We build on the semantics to develop a method for reducing the task of verifying a massively parallel GPU kernel to that of verifying a sequential program. This completely avoids the need to reason about thread interleavings, and allows existing techniques for sequential program verification to be leveraged. We describe an efficient encoding of data race detection and propose a method for automatically inferring the loop invariants that are required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, that can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 162 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.

References

  1. Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. 2015. GPU concurrency: Weak behaviours and programming assumptions. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15). ACM, New York, 577--591. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AMD. 2013. OpenCL Programming Guide, Revision 2.7. Retrieved from http://developer.amd.com/wordpress/media/2013/07/AMD&lowbar;Accelerated&lowbar;Parallel&lowbar;Processing&lowbar;OpenCL&lowbar;Programming&lowbar;Guide-rev-2.7.pdf.Google ScholarGoogle Scholar
  3. Ethel Bardsley, Adam Betts, Nathan Chong, Peter Collingbourne, Pantazis Deligiannis, Alastair F. Donaldson, Jeroen Ketema, Daniel Liew, and Shaz Qadeer. 2014a. Engineering a static verification tool for GPU kernels. In Proceedings of the 26th International Conference on Computer Aided Verification (CAV 2014). Lecture Notes in Computer Science, Vol. 8559. Springer, Berlin, 226--242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ethel Bardsley and Alastair F. Donaldson. 2014. Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In Proceedings of the 6th NASA Formal Methods Symposium (NFM’14). Lecture Notes in Computer Science, Vol. 8430. Springer, Berlin, 230--245.Google ScholarGoogle Scholar
  5. Ethel Bardsley, Alastair F. Donaldson, and John Wickerson. 2014b. KernelInterceptor: Automating GPU kernel verification by intercepting kernels and their parameters. In Proceedings of the 2014 International Workshop on OpenCL (IWOCL’14). ACM, New York, Article 7, 5 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Michael Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. 2005. Boogie: A modular reusable verifier for object-oriented programs. In Revised Lectures of the 4th International Symposium on Formal Methods for Components and Objects (FMCO’05). Lecture Notes in Computer Science, Vol. 4111. Springer, Berlin, 364--387. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovic, Tim King, Andrew Reynolds, and Cesare Tinelli. 2011. CVC4. In Proceedings of the 23rd International Conference in Computer Aided Verification (CAV’11). Lecture Notes in Computer Science, Vol. 6806. Springer, Berlin, 171--177. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Al Bessey, Ken Block, Benjamin Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles-Henri Gros, Asya Kamsky, Scott McPeak, and Dawson R. Engler. 2010. A few billion lines of code later: Using static analysis to find bugs in the real world. Commun. ACM 53, 2 (2010), 66--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Adam Betts, Nathan Chong, Alastair F. Donaldson, Shaz Qadeer, and Paul Thomson. 2012. GPUVerify: A verifier for GPU kernels. In Proceedings of the 27th ACM International Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA’12). ACM, New York, 113--132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Dirk Beyer, Thomas A. Henzinger, Rupak Majumdar, and Andrey Rybalchenko. 2007. Path invariants. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’07). ACM, New York, 300--309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Stefan Blom, Marieke Huisman, and Matej Mihelčić. 2014. Specification and verification of GPGPU programs. Science of Computer Programming 95, 3 (2014), 376--388.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Michael Boyer, Kevin Skadron, and Westley Weimer. 2008. Automated dynamic analysis of CUDA programs. In Proceedings of the 3rd Workshop on Software Tools for MultiCore Systems (STMCS’08). Retrieved from http://people.csail.mit.edu/rabbah/conferences/08/cgo/stmcs/.Google ScholarGoogle Scholar
  13. Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, 209--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Joshua E. Cates, Aaron E. Lefohn, and Ross T. Whitaker. 2004. GIST: An interactive, GPU-based level set segmentation tool for 3D medical images. Medical Image Analysis 8 (2004), 217--231. Issue 3.Google ScholarGoogle ScholarCross RefCross Ref
  15. Wei-Fan Chiang, Ganesh Gopalakrishnan, Guodong Li, and Zvonimir Rakamaric. 2013. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In Proceedings of 5th International NASA Formal Methods Symposium (NFM’13). Lecture Notes in Computer Science, Vol. 7871. Springer, Berlin, 213--228.Google ScholarGoogle ScholarCross RefCross Ref
  16. Nathan Chong, Alastair F. Donaldson, Paul Kelly, Shaz Qadeer, and Jeroen Ketema. 2013. Barrier invariants: A shared state abstraction for the analysis of data-dependent GPU kernels. In Proceedings of the 28th ACM International Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA’13). ACM, New York, 605--622. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Nathan Chong, Alastair F. Donaldson, and Jeroen Ketema. 2014. A sound and complete abstraction for reasoning about parallel prefix sums. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’14). ACM, New York, 397--410. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ching-Tsun Chou, Phanindra K. Mannava, and Seungjoon Park. 2004. A simple method for parameterized verification of cache coherence protocols. In Proceedings of the 5th International Conference on Formal Methods in Computer-Aided Design (FMCAD’04) Lecture Notes in Computer Science, Vol. 3312. Springer, Berlin, 382--398.Google ScholarGoogle ScholarCross RefCross Ref
  19. Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. 2012. Symbolic testing of OpenCL code. In Revised Selected Papers of the 7th International Haifa Verification Conference (HVC’11) Lecture Notes in Computer Science, Vol. 7261. Springer, Berlin, 203--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Peter Collingbourne, Alastair F. Donaldson, Jeroen Ketema, and Shaz Qadeer. 2013. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In Proceedings of the 22nd European Symposium on Programming (ESOP’13). Lecture Notes in Computer Science, Vol. 7792. Springer, Berlin, 270--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Patrick Cousot and Radhia Cousot. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’77). ACM, New York, 238--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08). Lecture Notes in Computer Science, Vol. 4963. Springer, Berlin, 337--340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Alastair F. Donaldson. 2014. The GPUVerify method: A tutorial overview. Electronic Communications of the EASST 70, Article 1 (2014), 16 pages. Retrieved from http://journal.ub.tu-berlin.de/eceasst/article/view/986.Google ScholarGoogle Scholar
  24. Alastair F. Donaldson, Daniel Kroening, and Philipp Rümmer. 2010. Automatic analysis of scratch-pad memory code for heterogeneous multicore processors. In Proceedings of the 16th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’10). Lecture Notes in Computer Science, Vol. 6015. Springer, Berlin, 280--295. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Alastair F. Donaldson, Daniel Kroening, and Philipp Rümmer. 2011. Automatic analysis of DMA races using model checking and k-induction. Formal Methods in System Design 39, 1 (2011), 83--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Michael Emmi, Shaz Qadeer, and Zvonimir Rakamaric. 2011. Delay-bounded scheduling. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2011). ACM, New York, 411--422. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. 2007. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming 69, 1--3 (2007), 35--45. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Cormac Flanagan and K. Rustan M. Leino. 2001. Houdini, an annotation assistant for ESC/Java. In Proceedings of the International Symposium of Formal Methods Europe (FME’01). Lecture Notes in Computer Science, Vol. 2021. Springer, Berlin, 500--517. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Susanne Graf and Hassen Saïdi. 1997. Construction of abstract state graphs with PVS. In Proceedings of the 9th International Conference on Computer Aided Verification (CAV’97). Lecture Notes in Computer Science, Vol. 1254. Springer, Berlin, 72--83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Axel Habermaier and Alexander Knapp. 2012. On the correctness of the SIMT execution model of GPUs. In Proceedings of the 21st European Symposium on Programming (ESOP’12). Lecture Notes in Computer Science, Vol. 7211. Springer, Berlin, 316--335. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Mark J. Harris. 2004. Fast fluid dynamics simulation on the GPU. In GPU Gems. Addison-Wesley, Boston, MA, 637--665.Google ScholarGoogle Scholar
  32. Temesghen Kahsai, Yeting Ge, and Cesare Tinelli. 2011. Instantiation-based invariant discovery. In Proceedings of 3rd International NASA Formal Methods Symposium (NFM’11). Lecture Notes in Computer Science, Vol. 6617. Springer, Berlin, 192--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Rajesh K. Karmani, P. Madhusudan, and Brandon M. Moore. 2011. Thread contracts for safe parallelism. In Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’11). ACM, New York, 125--134. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Khronos OpenCL Working Group. 2012. The OpenCL Specification, Version 1.2. Document revision 19. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.Google ScholarGoogle Scholar
  35. Khronos OpenCL Working Group. 2014a. The OpenCL C Specification, Version 2.0. Document revision 26. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf.Google ScholarGoogle Scholar
  36. Khronos OpenCL Working Group. 2014b. The OpenCL Extension Specification, Version 2.0. Document revision 26. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-2.0-extensions.pdf.Google ScholarGoogle Scholar
  37. Petr Klus, Simon Lam, Dag Lyberg, Ming Sin Cheung, Graham Pullan, Ian McFarlane, Giles S. H. Yeo, and Brian Y. H. Lam. 2012. BarraCUDA—A fast short read sequence aligner using graphics processing units. BMC Research Notes 5, Article 27 (2012), 7 pages.Google ScholarGoogle Scholar
  38. Kensuke Kojima and Atsushi Igarashi. 2013. A Hoare logic for SIMT programs. In Proceeding of the 11th Asian Symposium on Programming Languages and Systems (APLAS’13). Lecture Notes in Computer Science, Vol. 8301. Springer, Berlin, 58--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Shuvendu K. Lahiri and Shaz Qadeer. 2009. Complexity and algorithms for monomial and clausal predicate abstraction. In Proceedings of the 22nd International Conference on Automated Deduction (CADE-22). Lecture Notes in Computer Science, Vol. 5663. Springer, Berlin, 214--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. K. Rustan M. Leino, Greg Nelson, and James B. Saxe. 2000. ESC/Java User’s Manual. Technical Report SRC Technical Note 2000-002. Compaq Systems Research Center.Google ScholarGoogle Scholar
  41. Alan Leung, Manish Gupta, Yuvraj Agarwal, Rajesh Gupta, Ranjit Jhala, and Sorin Lerner. 2012. Verifying GPU kernels by test amplification. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, New York, 383--394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Guodong Li and Ganesh Gopalakrishnan. 2010. Scalable SMT-based verification of GPU kernel functions. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’10). ACM, New York, 187--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, and Sreeranga P. Rajan. 2012b. GKLEE: Concolic verification and test generation for GPUs. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’12). ACM, New York, 215--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Heng Li and Richard Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 14 (2009), 1754--1760. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Peng Li, Guodong Li, and Ganesh Gopalakrishnan. 2012a. Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). IEEE, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Anton Lokhmotov. 2011. Mobile and Embedded Computing on Mali GPUs. In Proceedings of the 2nd UK GPU Computing Conference.Google ScholarGoogle Scholar
  47. Kenneth McMillan. 1999. Verification of infinite state systems by compositional model checking. In Proceedings of the 10th Conference on Correct Hardware Design and Verification Methods (CHARME’99). Lecture Notes in Computer Science, Vol. 1703. Springer, Berlin, 219--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Kenneth L. McMillan. 2006. Lazy abstraction with interpolants. In Proceedings of the 18th International Conference on Computer Aided Verification (CAV’06). Lecture Notes in Computer Science, Vol. 4144. Springer, Berlin, 123--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Microsoft Corporation. 2012. C&plus;&plus; AMP: Language and Programming Model, Version 1.0. Retrieved from http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf.Google ScholarGoogle Scholar
  50. Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. 2002. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, Vol. 2283. Springer, Berlin. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Nvidia. 2012a. CUDA C Programming Guide, Version 5.0. http://docs.nvidia.com/cuda/cuda-c-programming-guide/.Google ScholarGoogle Scholar
  52. Nvidia. 2012b. Parallel Thread Execution ISA, Version 3.1.Google ScholarGoogle Scholar
  53. Lars Nyland. 2012. Personal Communication.Google ScholarGoogle Scholar
  54. Lars Nyland, Mark Harris, and Jan Prins. 2007. Fast N-Body Simulation with CUDA. Addison-Wesley, Upper Saddle River, NJ, 677--696.Google ScholarGoogle Scholar
  55. Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H. J. Kelly, and Andrew J. Davison. 2013. SLAM&plus;&plus;: Simultaneous localisation and mapping at the level of objects. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). IEEE, Los Alamitos, CA, 1352--1359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Saurabh Srivastava and Sumit Gulwani. 2009. Program verification using templates over predicate abstraction. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’09). ACM, New York, 223--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Bjarne Steensgaard. 1996. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’96). ACM, New York, 32--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Murali Talupur and Mark R. Tuttle. 2008. Going with the flow: Parameterized verification using message flows. In Proceedings of the 8th International Conference on Formal Methods in Computer-Aided Design (FMCAD’08). IEEE, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Stavros Tripakis, Christos Stergiou, and Roberto Lublinerman. 2010. Checking equivalence of SPMD programs using non-interference. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar’10). Retrieved from http://static.usenix.org/events/hotpar10/.Google ScholarGoogle ScholarCross RefCross Ref
  60. Christian Urban and Julien Narboux. 2009. Formal SOS-proofs for the lambda-calculus. Electronic Notes in Theoretical Computer Science 247 (2009), 139--155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. John Wickerson. 2014. Syntax and semantics of a GPU kernel programming language. Archive of Formal Proofs. Retrieved from http://afp.sourceforge.net/entries/GPU_Kernel_PL.shtml.Google ScholarGoogle Scholar

Index Terms

  1. The Design and Implementation of a Verification Technique for GPU Kernels

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Programming Languages and Systems
        ACM Transactions on Programming Languages and Systems  Volume 37, Issue 3
        June 2015
        134 pages
        ISSN:0164-0925
        EISSN:1558-4593
        DOI:10.1145/2785583
        Issue’s Table of Contents

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 May 2015
        • Accepted: 1 March 2015
        • Revised: 1 November 2014
        • Received: 1 April 2014
        Published in toplas Volume 37, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader