
Copperhead: compiling an embedded data parallel language

Published: 12 February 2011

Abstract

Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well optimized CUDA code.
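The abstract's central idea, that programs are written as compositions of familiar data parallel primitives such as map and reduce over arrays, can be illustrated with a small plain-Python sketch. This is not the Copperhead runtime: the functions below execute on ordinary Python lists, and the `axpy`/`dot` names are illustrative examples of the style, not an API defined by the paper's abstract. In Copperhead, such functions would instead be compiled to efficient CUDA code.

```python
# Illustrative sketch of the data-parallel primitive style the abstract
# describes, written in ordinary (restricted-subset) Python.
from functools import reduce
import operator

def axpy(a, x, y):
    # Flat data parallelism: an elementwise map over two arrays,
    # computing a*x[i] + y[i] for every i.
    return list(map(lambda xi, yi: a * xi + yi, x, y))

def dot(x, y):
    # Composition of primitives: a map producing elementwise products,
    # then a reduction with addition.
    return reduce(operator.add, map(operator.mul, x, y), 0.0)

print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))   # [12.0, 24.0]
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])) # 32.0
```

Because each primitive has well-defined parallel semantics, a compiler can map such compositions onto GPU kernels without the programmer writing low-level CUDA by hand.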




• Published in

  ACM SIGPLAN Notices, Volume 46, Issue 8 (PPoPP '11), August 2011, 300 pages.
  ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/2038037

  PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, February 2011, 326 pages.
  ISBN: 978-1-4503-0119-0. DOI: 10.1145/1941553
  General Chair: Calin Cascaval. Program Chair: Pen-Chung Yew.

              Copyright © 2011 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher: Association for Computing Machinery, New York, NY, United States

              Qualifiers: research-article
