research-article

A sound and complete abstraction for reasoning about parallel prefix sums

Authors:
Nathan Chong

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

,
Alastair F. Donaldson

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

,
Jeroen Ketema

Imperial College London, London, United Kingdom

Imperial College London, London, United Kingdom
View Profile

POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesJanuary 2014Pages 397–409https://doi.org/10.1145/2535838.2535882

Published:08 January 2014Publication History

POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

Pages 397–409

ABSTRACT

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance.

We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n² \lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n²) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction.

We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.

Supplemental Material

d2_left_t9.mp4

mp4

252.7 MB

Download

References

A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In phOOPSLA, pages 113--132, 2012. Google ScholarDigital Library
M. Billeter, O. Olsson, and U. Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In phHPG, pages 159--166, 2009. Google ScholarDigital Library
G. E. Blelloch. Scans as primitive parallel operations. phIEEE Trans. Comput., 38 (11): 1526--1538, 1989. Google ScholarDigital Library
G. E. Blelloch. Prefix sums and their applications. In J. H. Reif, editor, phSynthesis of Parallel Algorithms. Morgan Kaufmann, 1990.Google Scholar
R. P. Brent and H.-T. Kung. A regular layout for parallel adders. phIEEE Trans. Computers, 31 (3): 260--264, 1982. Google ScholarDigital Library
C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In phOSDI, pages 209--224, 2008. Google ScholarDigital Library
S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In phWorkload Characterization, pages 44--54, 2009. Google ScholarDigital Library
N. Chong, A. F. Donaldson, P. H. Kelly, J. Ketema, and S. Qadeer. Barrier invariants: A shared state abstraction for the analysis of data-dependent GPU kernels. In phOOPSLA, pages 605--622, 2013. Google ScholarDigital Library
P. Collingbourne, , C. Cadar, and P. H. J. Kelly. Symbolic testing of OpenCL code. In phHVC'11, pages 203--218, 2012. Google ScholarDigital Library
P. Collingbourne, A. F. Donaldson, J. Ketema, and S. Qadeer. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In phESOP, pages 270--289, 2013. Google ScholarDigital Library
P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In phPOPL, pages 238--252, 1977. Google ScholarDigital Library
A. Danalis et al. The scalable heterogeneous computing (SHOC) benchmark suite. In phGPGPU 2010, pages 63--74, 2010. Google ScholarDigital Library
io\uglu et al.(1990)E\ugecio\uglu, Gallopoulos, and Koç}PolynomialsO. E\ugecio\uglu, E. Gallopoulos, and C. Koç. A parallel method for fast and practical high-order Newton interpolation. phBIT Numerical Mathematics, 30: 268--288, 1990. Google ScholarDigital Library
W. Gropp, E. Lusk, and A. Skjellum. phUsing MPI: Portable Parallel Programming with the Message Passing Interface. MIT Press, 2nd edition, 1999. Google ScholarDigital Library
A. Habermaier and A. Knapp. On the correctness of the SIMT execution model of GPUs. In phESOP, pages 316--335, 2012. Google ScholarDigital Library
M. Harris, S. Sengupta, and J. Owens. Parallel prefix sum (scan) with CUDA. In H. Nguyen, editor, phGPU Gems 3. Addison-Wesley, 2007.Google Scholar
R. Hinze. An algebra of scans. In phMPC, pages 186--210, 2004.Google Scholar
'c(2013)}BYTECODEM. Huisman and M. Mihel\vcić. Specification and verification of GPGPU programs using permission-based separation logic. In phBYTECODE, 2013.Google Scholar
012)}OpenCLKhronos OpenCL Working Group. The OpenCL specification, version 1.2, 2012.Google Scholar
D. E. Knuth. phThe Art of Computer Programming, volume 3. Addison-Wesley, 2nd edition, 1998. Google ScholarDigital Library
P. M. Kogge and H. S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations. phIEEE Trans. Computers, C-22 (8): 786--793, 1973. Google ScholarDigital Library
R. E. Ladner and M. J. Fischer. Parallel prefix computation. phJ. ACM, 27 (4): 831--838, 1980. Google ScholarDigital Library
A. Leung, M. Gupta, Y. Agarwal, et al. Verifying GPU kernels by test amplification. In phPLDI, pages 383--394, 2012. Google ScholarDigital Library
G. Li and G. Gopalakrishnan. Scalable SMT-based verification of GPU kernel functions. In phFSE, pages 187--196, 2010. Google ScholarDigital Library
G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: concolic verification and test generation for GPUs. In phPPoPP, pages 215--224, 2012. Google ScholarDigital Library
D. Merrill and A. Grimshaw. Parallel scan for stream architectures. Technical Report CX2009--14, Department of Computer Science, University of Virginia, 2009.Google Scholar
012)}CUDANVIDIA. CUDA C programming guide, version 5.0, 2012.Google Scholar
B. C. Pierce. phTypes and Programming Languages. MIT Press, 2002. Google ScholarDigital Library
(2006)}DBLP:conf/pvm/SandersT06P. Sanders and J. L. Tr\"aff. Parallel prefix (scan) algorithms for MPI. In phPVM/MPI, pages 22--29, 2006.Google Scholar
N. Satish, M. Harris, and M. Garland. Designing efficient sorting algorithms for manycore GPUs. In phIPDPS, pages 1--10, 2009. Google ScholarDigital Library
S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for GPU computing. In phGH, pages 97--106, 2007. Google ScholarDigital Library
I. Sergeev. On the complexity of parallel prefix circuits. Technical Report TR13-041, Electronic Colloquium on Computational Complexity, 2013.Google Scholar
M. Sheeran. Functional and dynamic programming in the design of parallel prefix networks. phJ. of Funct. Program., 21 (1): 59--114, 2011. Google ScholarDigital Library
J. Sklansky. Conditional-sum addition logic. phIRE Trans. Electronic Computers, EC-9: 226--231, 1960.Google Scholar
B. So, A. M. Ghuloum, and Y. Wu. Optimizing data parallel operations on many-core platforms. In phIn First Workshop on Software Tools for Multi-Core Systems (STMCS), 2006.Google Scholar
H. S. Stone. Parallel processing with the perfect shuffle. phIEEE Trans. Comput., 20 (2): 153--161, 1971. Google ScholarDigital Library
J. Stratton et al. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Technical Report IMPACT-12-01, University of Illinois at Urbana-Champaign, 2012.Google Scholar
er(2008)}Voigtlaender08J. Voigtl\"ander. Much ado about two (pearl): a pearl on parallel prefix computation. In phPOPL, pages 29--35, 2008. Google ScholarDigital Library

Index Terms

A sound and complete abstraction for reasoning about parallel prefix sums
1. Theory of computation
  1. Logic
  2. Semantics and reasoning
    1. Program reasoning

Recommendations

Higher-order and tuple-based massively-parallel prefix sums
PLDI '16

Prefix sums are an important parallel primitive, especially in massively-parallel programs. This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a ...
Read More
A sound and complete abstraction for reasoning about parallel prefix sums
POPL '14

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie ...
Read More
Abstraction and Refinement: Towards Scalable and Exact Verification of Neural Networks
As a new programming paradigm, deep neural networks (DNNs) have been increasingly deployed in practice, but the lack of robustness hinders their applications in safety-critical domains. While there are techniques for verifying DNNs with formal guarantees, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2014
702 pages
ISBN:9781450325448
DOI:10.1145/2535838
General Chair:
Suresh Jagannathan
Purdue University, USA
,
Program Chair:
Peter Sewell
University of Cambridge, UK
ACM SIGPLAN Notices Volume 49, Issue 1
POPL '14
January 2014
661 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2578855
Editor:
Mark W. Bailey
Hamilton College, Clinton, NY
Issue’s Table of Contents
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 January 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
abstraction
formal verification
gpus
parallel prefix sum computation
Qualifiers
- research-article
Conference

Acceptance Rates
POPL '14 Paper Acceptance Rate51of220submissions,23%Overall Acceptance Rate824of4,130submissions,20%
More
Upcoming Conference
POPL '25

Sponsor:

sigplan

The 52nd Annual ACM SIGPLAN Symposium on Principles of Programming Languages

January 19 - 25, 2025

Denver , CO , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 338
  Total Downloads
- Downloads (Last 12 months)14
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A sound and complete abstraction for reasoning about parallel prefix sums

POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Higher-order and tuple-based massively-parallel prefix sums

A sound and complete abstraction for reasoning about parallel prefix sums

Abstraction and Refinement: Towards Scalable and Exact Verification of Neural Networks