ABSTRACT
Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. In this work we present the first formal operational semantics for named barriers and define what it means for a warp-specialized kernel to be correct. We give algorithms for verifying the correctness of warp-specialized kernels and prove that they are both sound and complete for the most common class of warp-specialized programs. We also present WEFT, a verification tool for checking warp-specialized code. Using WEFT, we discover several non-trivial bugs in production warp-specialized kernels.
- Parallel thread execution ISA version 4.1. http://docs.nvidia. com/cuda/parallel-thread-execution/index.html, 2014.Google Scholar
- A. Aiken and D. Gay. Barrier inference. In POPL, pages 342–354, 1998. Google ScholarDigital Library
- J. Alglave, M. Batty, A. F. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, and J. Wickerson. GPU concurrency: Weak behaviours and programming assumptions. In ASPLOS, pages 577– 591, 2015. Google ScholarDigital Library
- E. Bardsley and A. F. Donaldson. Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In NFM, pages 230–245, 2014.Google ScholarDigital Library
- E. Bardsley, A. Betts, N. Chong, P. Collingbourne, P. Deligiannis, A. F. Donaldson, J. Ketema, D. Liew, and S. Qadeer. Engineering a static verification tool for GPU kernels. In CAV, pages 226–242, 2014. Google ScholarDigital Library
- M. Bauer, H. Cook, and B. Khailany. CudaDMA: optimizing GPU memory bandwidth via warp specialization. SC ’11, 2011. Google ScholarDigital Library
- M. Bauer, S. Treichler, and A. Aiken. Singe: Leveraging warp specialization for high performance on GPUs. PPoPP ’14, 2014. Google ScholarDigital Library
- A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113–132, 2012. Google ScholarDigital Library
- J. H. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. R. Hawkes, S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. S. Yoo. Terascale direct numerical simulations of turbulent combustion using S3D. Computational Science and Discovery, page 015001, 2009.Google Scholar
- W. Chiang, G. Gopalakrishnan, G. Li, and Z. Rakamaric. Formal analysis of GPU programs with atomics via conflict-directed delaybounding. In NFM, pages 213–228, 2013.Google Scholar
- N. Chong, A. F. Donaldson, P. H. J. Kelly, J. Ketema, and S. Qadeer. Barrier invariants: a shared state abstraction for the analysis of datadependent GPU kernels. In OOPSLA, pages 605–622, 2013. Google ScholarDigital Library
- N. Chong, A. F. Donaldson, and J. Ketema. A sound and complete abstraction for reasoning about parallel prefix sums. In POPL, pages 397–410, 2014. Google ScholarDigital Library
- P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic crosschecking of floating-point and SIMD code. In EuroSys, pages 315–328, 2011. Google ScholarDigital Library
- S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco. Gpus and the future of parallel computing. IEEE Micro, 31(5):7–17, 2011. Google ScholarDigital Library
- Khronos. The OpenCL Specification, Version 1.0. The Khronos OpenCL Working Group, December 2008.Google Scholar
- L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978. Google ScholarDigital Library
- D.-K. Le, W.-N. Chin, and Y. M. Teo. Verification of static and dynamic barrier synchronization using bounded permissions. In ICFEM, pages 231–248, 2013.Google ScholarCross Ref
- A. Leung, M. Gupta, Y. Agarwal, R. Gupta, R. Jhala, and S. Lerner. Verifying GPU kernels by test amplification. In PLDI, pages 383–394, 2012. Google ScholarDigital Library
- G. Li and G. Gopalakrishnan. Scalable smt-based verification of GPU kernel functions. In FSE, pages 187–196, 2010. Google ScholarDigital Library
- G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: concolic verification and test generation for GPUs. In PPOPP, pages 215–224, 2012. Google ScholarDigital Library
- P. Li, G. Li, and G. Gopalakrishnan. Parametric flows: automated behavior equivalencing for symbolic analysis of races in CUDA programs. In SC, page 29, 2012. Google ScholarDigital Library
- NVIDIA. CUDA programming guide 6.0. http://developer. download.nvidia.com/compute/DevZone/docs/html/C/doc/ CUDA_C_Programming_Guide.pdf, August 2014.Google Scholar
- Oracle. Class phaser. https://docs.oracle.com/javase/7/ docs/api/java/util/concurrent/Phaser.html, 2014.Google Scholar
- S. G. Parker, J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, A. Robison, and M. Stich. OptiX: A general purpose ray tracing engine. ACM Transactions on Graphics, August 2010. Google ScholarDigital Library
- G. Ramalingam. Context-sensitive synchronization-sensitive analysis is undecidable. ACM Trans. Program. Lang. Syst., 22(2):416–430, 2000. Google ScholarDigital Library
- Top500. Top 500 supercomputers. http://www.top500.org, 2014.Google Scholar
Index Terms
- Verification of producer-consumer synchronization in GPU programs
Recommendations
The Design and Implementation of a Verification Technique for GPU Kernels
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, ...
Verification of producer-consumer synchronization in GPU programs
PLDI '15Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized ...
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applicationsWe present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Comments