Abstract
Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well-optimized CUDA code.
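To make the abstract's claim concrete, the following is a minimal sketch of what a Copperhead program looks like: an ordinary Python function, built from a data parallel primitive (map), that the runtime can compile to CUDA. It assumes the @cu decorator and copperhead module described in the paper; the surrounding NumPy setup is illustrative, not taken from the paper's text.

```python
# A hedged sketch of Copperhead usage; the @cu decorator marks a
# function written in the restricted Python subset so the runtime
# can compile it to CUDA. Array sizes and values are illustrative.
from copperhead import *
import numpy as np

@cu
def axpy(a, x, y):
    # A flat data parallel computation: elementwise a*x + y,
    # expressed by composing map with an elementwise lambda.
    return map(lambda xi, yi: a * xi + yi, x, y)

x = np.arange(100, dtype=np.float64)
y = np.arange(100, dtype=np.float64)
result = axpy(2.0, x, y)  # compiled and executed as data parallel code
```

Because axpy remains a valid Python function, it interoperates with standard Python modules such as NumPy, which is the interoperability property the abstract emphasizes.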