Twin Peaks: A Software Platform for Heterogeneous Computing on General-Purpose and Graphics Processors

ABSTRACT
Modern processors are evolving into hybrid, heterogeneous processors with both CPU and GPU cores used for general-purpose computation. Several languages, such as Brook, CUDA, and more recently OpenCL, are being developed to fully harness the potential of these processors. In these languages, control code typically runs on the CPU while the performance-critical, data-parallel kernel code runs on the GPU.
In this paper, we present Twin Peaks, a software platform for heterogeneous computing that efficiently executes code originally targeted for GPUs on CPUs as well. This permits a more balanced execution between the CPU and GPU, and enables portability of code both between these architectures and to CPU-only environments. We propose several techniques in the runtime system to efficiently utilize the caches and functional units present in CPUs. Using OpenCL as a canonical language for heterogeneous computing, and running several experiments on real hardware, we show that our techniques enable GPGPU-style code to execute efficiently on multicore CPUs with minimal runtime overhead. These results also show that, for maximum performance, it is beneficial for applications to utilize both CPUs and GPUs as accelerator targets.