research-article

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels

Authors:
Nathan Chong

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

,
Alastair F. Donaldson

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

,
Paul H.J. Kelly

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

,
Jeroen Ketema

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

,
Shaz Qadeer

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applicationsOctober 2013Pages 605–622https://doi.org/10.1145/2509136.2509517

Published:29 October 2013Publication History

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications

Pages 605–622

ABSTRACT

Data-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU kernels achieve soundness and scalability by using a two-thread reduction and making the contents of the shared state nondeterministic each time threads synchronise at a barrier, to account for all possible thread interactions. This coarse abstraction prohibits verification of data-dependent kernels. We present barrier invariants, a novel abstraction technique which allows key properties about the shared state of a kernel to be preserved across barriers during formal reasoning. We have integrated barrier invariants with the GPUVerify tool, and present a detailed case study showing how they can be used to verify three prefix sum algorithms, allowing efficient modular verification of a stream compaction kernel, a key building block for GPU programming. This analysis goes significantly beyond what is possible using existing verification techniques for GPU kernels.

References

M. Barnett et al. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, pages 364--387, 2005. Google ScholarDigital Library
A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113--132, 2012. Google ScholarDigital Library
M. Billeter, O. Olsson, and U. Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In HPG, pages 159--166, 2009. Google ScholarDigital Library
G. E. Blelloch. Prefix sums and their applications. In J. H. Reif, editor, Synthesis of Parallel Algorithms. Morgan Kaufmann, 1990.Google Scholar
R. P. Brent and H.-T. Kung. A regular layout for parallel adders. IEEE Trans. Computers, 31 (3): 260--264, 1982. Google ScholarDigital Library
C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, pages 209--224, 2008. Google ScholarDigital Library
S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, pages 44--54, 2009. Google ScholarDigital Library
W.-F. Chiang, G. Gopalakrishnan, G. Li, and Z. Rakamaric. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In NFM, pages 213--228, 2013.Google ScholarCross Ref
C.-T. Chou, P. K. Mannava, and S. Park. A simple method for parameterized verification of cache coherence protocols. In FMCAD, pages 382--398, 2004.Google ScholarCross Ref
M. Christakis, P. Müller, and V. Wüstholz. Collaborative verification and testing with explicit assumptions. In FM, pages 132--146, 2012.Google ScholarCross Ref
P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic testing of OpenCL code. In HVC, pages 203--218, 2011. Google ScholarDigital Library
P. Collingbourne, A. F. Donaldson, J. Ketema, and S. Qadeer. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In ESOP, pages 270--289, 2013. Google ScholarDigital Library
P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238--252, 1977. Google ScholarDigital Library
L. M. de Moura and N. Bjørner. Efficient E-matching for SMT solvers. In CADE, pages 183--198, 2007. Google ScholarDigital Library
L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337--340, 2008. Google ScholarDigital Library
D. Detlefs, G. Nelson, and J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM, 52 (3): 365--473, 2005. Google ScholarDigital Library
A. F. Donaldson, L. Haller, and D. Kroening. Strengthening induction-based race checking with lightweight static analysis. In VMCAI, pages 169--183, 2011. Google ScholarDigital Library
M. Harris, S. Sengupta, and J. D. Owens. Parallel prefix sum (scan) with CUDA. In H. Nguyen, editor, GPU Gems 3. Addison-Wesley, 2007.Google Scholar
M. Huisman and M. Mihelčić. Specification and verification of GPGPU programs using permission-based separation logic. In BYTECODE, 2013.Google Scholar
C. B. Jones. Tentative steps toward a development method for interfering programs. ACM Trans. Program. Lang. Syst., 5 (4): 596--619, 1983. Google ScholarDigital Library
R. Karmani, P. Madhusudan, and B. Moore. Thread contracts for safe parallelism. In PPoPP, pages 125--134, 2011. Google ScholarDigital Library
Khronos OpenCL Working Group. The OpenCL specification, version 1.2, 2012.Google Scholar
P. M. Kogge and H. S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Trans. Computers, C-22 (8): 786--793, 1973. Google ScholarDigital Library
A. Leung, M. Gupta, Y. Agarwal, et al. Verifying GPU kernels by test amplification. In PLDI, pages 383--394, 2012. Google ScholarDigital Library
G. Li and G. Gopalakrishnan. Scalable SMT-based verification of GPU kernel functions. In FSE, pages 187--196, 2010. Google ScholarDigital Library
G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: Concolic verification and test generation for GPUs. In PPoPP, pages 215--224, 2012. Google ScholarDigital Library
P. Li, G. Li, and G. Gopalakrishnan. Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In SC, pages 29:1--29:10, 2012. Google ScholarDigital Library
K. McMillan. Verification of infinite state systems by compositional model checking. In CHARME, pages 219--234, 1999. Google ScholarDigital Library
M. Moskal. Programming with triggers. In SMT, pages 20--29, 2009. Google ScholarDigital Library
NVIDIA. CUDA C programming guide, version 5.0, 2012.Google Scholar
S. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs I. Acta Inf., 6: 319--340, 1976.Google ScholarDigital Library
S. F. Siegel and T. K. Zirkel. Loop invariant symbolic execution for parallel programs. In VMCAI, pages 412--427, 2012. Google ScholarDigital Library
J. Stratton et al. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Technical Report IMPACT-12-01, UIUC, 2012.Google Scholar
M. Talupur and M. R. Tuttle. Going with the flow: Parameterized verification using message flows. In FMCAD, pages 1--8, 2008. Google ScholarDigital Library

Index Terms

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
1. Theory of computation
  1. Logic
  2. Semantics and reasoning
    1. Program reasoning

Recommendations

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
OOPSLA '13

Data-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU ...
Read More
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Read More
GPUVerify: a verifier for GPU kernels
OOPSLA '12

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
October 2013
904 pages
ISBN:9781450323741
DOI:10.1145/2509136
Co-chair:
Antony Hosking
Purdue University, USA
,
General Chair:
Patrick Eugster
Purdue University, USA
,
Program Chair:
Cristina V. Lopes
University of California, Irvine, USA
ACM SIGPLAN Notices Volume 48, Issue 10
OOPSLA '13
October 2013
867 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2544173
Editor:
Mark W. Bailey
Hamilton College, Clinton, NY
Issue’s Table of Contents
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 October 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
concurrency
data races
gpus
verification
Qualifiers
- research-article
Conference

Acceptance Rates
OOPSLA '13 Paper Acceptance Rate50of189submissions,26%Overall Acceptance Rate268of1,244submissions,22%
More
Upcoming Conference
SPLASH '24

Sponsor:

sigplan

ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity

October 20 - 25, 2024

Pasadena , CA , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 234
  Total Downloads
- Downloads (Last 12 months)9
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications

ABSTRACT

References

Cited By

Index Terms

Recommendations

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels

GPUVerify: a verifier for GPU kernels

GPUVerify: a verifier for GPU kernels