skip to main content
10.1145/2509136.2509517acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels

Published:29 October 2013Publication History

ABSTRACT

Data-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU kernels achieve soundness and scalability by using a two-thread reduction and making the contents of the shared state nondeterministic each time threads synchronise at a barrier, to account for all possible thread interactions. This coarse abstraction prohibits verification of data-dependent kernels. We present barrier invariants, a novel abstraction technique which allows key properties about the shared state of a kernel to be preserved across barriers during formal reasoning. We have integrated barrier invariants with the GPUVerify tool, and present a detailed case study showing how they can be used to verify three prefix sum algorithms, allowing efficient modular verification of a stream compaction kernel, a key building block for GPU programming. This analysis goes significantly beyond what is possible using existing verification techniques for GPU kernels.

References

  1. M. Barnett et al. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, pages 364--387, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113--132, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Billeter, O. Olsson, and U. Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In HPG, pages 159--166, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. E. Blelloch. Prefix sums and their applications. In J. H. Reif, editor, Synthesis of Parallel Algorithms. Morgan Kaufmann, 1990.Google ScholarGoogle Scholar
  5. R. P. Brent and H.-T. Kung. A regular layout for parallel adders. IEEE Trans. Computers, 31 (3): 260--264, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, pages 209--224, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, pages 44--54, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. W.-F. Chiang, G. Gopalakrishnan, G. Li, and Z. Rakamaric. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In NFM, pages 213--228, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  9. C.-T. Chou, P. K. Mannava, and S. Park. A simple method for parameterized verification of cache coherence protocols. In FMCAD, pages 382--398, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  10. M. Christakis, P. Müller, and V. Wüstholz. Collaborative verification and testing with explicit assumptions. In FM, pages 132--146, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  11. P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic testing of OpenCL code. In HVC, pages 203--218, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Collingbourne, A. F. Donaldson, J. Ketema, and S. Qadeer. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In ESOP, pages 270--289, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238--252, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. L. M. de Moura and N. Bjørner. Efficient E-matching for SMT solvers. In CADE, pages 183--198, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337--340, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Detlefs, G. Nelson, and J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM, 52 (3): 365--473, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. F. Donaldson, L. Haller, and D. Kroening. Strengthening induction-based race checking with lightweight static analysis. In VMCAI, pages 169--183, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Harris, S. Sengupta, and J. D. Owens. Parallel prefix sum (scan) with CUDA. In H. Nguyen, editor, GPU Gems 3. Addison-Wesley, 2007.Google ScholarGoogle Scholar
  19. M. Huisman and M. Mihelčić. Specification and verification of GPGPU programs using permission-based separation logic. In BYTECODE, 2013.Google ScholarGoogle Scholar
  20. C. B. Jones. Tentative steps toward a development method for interfering programs. ACM Trans. Program. Lang. Syst., 5 (4): 596--619, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Karmani, P. Madhusudan, and B. Moore. Thread contracts for safe parallelism. In PPoPP, pages 125--134, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Khronos OpenCL Working Group. The OpenCL specification, version 1.2, 2012.Google ScholarGoogle Scholar
  23. P. M. Kogge and H. S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Trans. Computers, C-22 (8): 786--793, 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Leung, M. Gupta, Y. Agarwal, et al. Verifying GPU kernels by test amplification. In PLDI, pages 383--394, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Li and G. Gopalakrishnan. Scalable SMT-based verification of GPU kernel functions. In FSE, pages 187--196, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: Concolic verification and test generation for GPUs. In PPoPP, pages 215--224, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. Li, G. Li, and G. Gopalakrishnan. Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In SC, pages 29:1--29:10, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. K. McMillan. Verification of infinite state systems by compositional model checking. In CHARME, pages 219--234, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Moskal. Programming with triggers. In SMT, pages 20--29, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. NVIDIA. CUDA C programming guide, version 5.0, 2012.Google ScholarGoogle Scholar
  31. S. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs I. Acta Inf., 6: 319--340, 1976.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. F. Siegel and T. K. Zirkel. Loop invariant symbolic execution for parallel programs. In VMCAI, pages 412--427, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Stratton et al. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Technical Report IMPACT-12-01, UIUC, 2012.Google ScholarGoogle Scholar
  34. M. Talupur and M. R. Tuttle. Going with the flow: Parameterized verification using message flows. In FMCAD, pages 1--8, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
        October 2013
        904 pages
        ISBN:9781450323741
        DOI:10.1145/2509136

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 29 October 2013

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        OOPSLA '13 Paper Acceptance Rate50of189submissions,26%Overall Acceptance Rate268of1,244submissions,22%

        Upcoming Conference

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader