Verification of producer-consumer synchronization in GPU programs

Authors:
Rahul Sharma

Stanford University, USA

Stanford University, USA
View Profile

,
Michael Bauer

NVIDIA Research, USA

NVIDIA Research, USA
View Profile

,
Alex Aiken

Stanford University, USA

Stanford University, USA
View Profile

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and ImplementationJune 2015Pages 88–98https://doi.org/10.1145/2737924.2737962

Published:03 June 2015Publication History

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 88–98

ABSTRACT

Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. In this work we present the first formal operational semantics for named barriers and define what it means for a warp-specialized kernel to be correct. We give algorithms for verifying the correctness of warp-specialized kernels and prove that they are both sound and complete for the most common class of warp-specialized programs. We also present WEFT, a verification tool for checking warp-specialized code. Using WEFT, we discover several non-trivial bugs in production warp-specialized kernels.

References

Parallel thread execution ISA version 4.1. http://docs.nvidia. com/cuda/parallel-thread-execution/index.html, 2014.Google Scholar
A. Aiken and D. Gay. Barrier inference. In POPL, pages 342–354, 1998. Google ScholarDigital Library
J. Alglave, M. Batty, A. F. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, and J. Wickerson. GPU concurrency: Weak behaviours and programming assumptions. In ASPLOS, pages 577– 591, 2015. Google ScholarDigital Library
E. Bardsley and A. F. Donaldson. Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In NFM, pages 230–245, 2014.Google ScholarDigital Library
E. Bardsley, A. Betts, N. Chong, P. Collingbourne, P. Deligiannis, A. F. Donaldson, J. Ketema, D. Liew, and S. Qadeer. Engineering a static verification tool for GPU kernels. In CAV, pages 226–242, 2014. Google ScholarDigital Library
M. Bauer, H. Cook, and B. Khailany. CudaDMA: optimizing GPU memory bandwidth via warp specialization. SC ’11, 2011. Google ScholarDigital Library
M. Bauer, S. Treichler, and A. Aiken. Singe: Leveraging warp specialization for high performance on GPUs. PPoPP ’14, 2014. Google ScholarDigital Library
A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113–132, 2012. Google ScholarDigital Library
J. H. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. R. Hawkes, S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. S. Yoo. Terascale direct numerical simulations of turbulent combustion using S3D. Computational Science and Discovery, page 015001, 2009.Google Scholar
W. Chiang, G. Gopalakrishnan, G. Li, and Z. Rakamaric. Formal analysis of GPU programs with atomics via conflict-directed delaybounding. In NFM, pages 213–228, 2013.Google Scholar
N. Chong, A. F. Donaldson, P. H. J. Kelly, J. Ketema, and S. Qadeer. Barrier invariants: a shared state abstraction for the analysis of datadependent GPU kernels. In OOPSLA, pages 605–622, 2013. Google ScholarDigital Library
N. Chong, A. F. Donaldson, and J. Ketema. A sound and complete abstraction for reasoning about parallel prefix sums. In POPL, pages 397–410, 2014. Google ScholarDigital Library
P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic crosschecking of floating-point and SIMD code. In EuroSys, pages 315–328, 2011. Google ScholarDigital Library
S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco. Gpus and the future of parallel computing. IEEE Micro, 31(5):7–17, 2011. Google ScholarDigital Library
Khronos. The OpenCL Specification, Version 1.0. The Khronos OpenCL Working Group, December 2008.Google Scholar
L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978. Google ScholarDigital Library
D.-K. Le, W.-N. Chin, and Y. M. Teo. Verification of static and dynamic barrier synchronization using bounded permissions. In ICFEM, pages 231–248, 2013.Google ScholarCross Ref
A. Leung, M. Gupta, Y. Agarwal, R. Gupta, R. Jhala, and S. Lerner. Verifying GPU kernels by test amplification. In PLDI, pages 383–394, 2012. Google ScholarDigital Library
G. Li and G. Gopalakrishnan. Scalable smt-based verification of GPU kernel functions. In FSE, pages 187–196, 2010. Google ScholarDigital Library
G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: concolic verification and test generation for GPUs. In PPOPP, pages 215–224, 2012. Google ScholarDigital Library
P. Li, G. Li, and G. Gopalakrishnan. Parametric flows: automated behavior equivalencing for symbolic analysis of races in CUDA programs. In SC, page 29, 2012. Google ScholarDigital Library
NVIDIA. CUDA programming guide 6.0. http://developer. download.nvidia.com/compute/DevZone/docs/html/C/doc/ CUDA_C_Programming_Guide.pdf, August 2014.Google Scholar
Oracle. Class phaser. https://docs.oracle.com/javase/7/ docs/api/java/util/concurrent/Phaser.html, 2014.Google Scholar
S. G. Parker, J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, A. Robison, and M. Stich. OptiX: A general purpose ray tracing engine. ACM Transactions on Graphics, August 2010. Google ScholarDigital Library
G. Ramalingam. Context-sensitive synchronization-sensitive analysis is undecidable. ACM Trans. Program. Lang. Syst., 22(2):416–430, 2000. Google ScholarDigital Library
Top500. Top 500 supercomputers. http://www.top500.org, 2014.Google Scholar

Index Terms

Verification of producer-consumer synchronization in GPU programs

Recommendations

The Design and Implementation of a Verification Technique for GPU Kernels

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, ...
Read More
Verification of producer-consumer synchronization in GPU programs
PLDI '15

Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized ...
Read More
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2015
630 pages
ISBN:9781450334686
DOI:10.1145/2737924
General Chair:
David Grove
IBM Research, USA
,
Program Chair:
Steve Blackburn
Australian National University, Australia
ACM SIGPLAN Notices Volume 50, Issue 6
PLDI '15
June 2015
630 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2813885
Editor:
Andy Gill
University of Kansas, Lawrence, KS
Issue’s Table of Contents
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 June 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
GPUs
Verification
barrier recycling
data races
deadlock
named barriers
synchronization
warp specialization
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate406of2,067submissions,20%
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 15
  Total Citations
  View Citations
- 979
  Total Downloads
- Downloads (Last 12 months)218
- Downloads (Last 6 weeks)30
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Verification of producer-consumer synchronization in GPU programs

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

The Design and Implementation of a Verification Technique for GPU Kernels

Verification of producer-consumer synchronization in GPU programs

GPUVerify: a verifier for GPU kernels