ABSTRACT
The client computing platform is moving towards a heterogeneous architecture that combines cores focused on scalar performance with a set of throughput-oriented cores. The throughput-oriented cores (e.g., a GPU) may be connected over both coherent and non-coherent interconnects, and may have different ISAs. This paper describes a programming model for such heterogeneous platforms. We discuss the language constructs, the runtime implementation, and the memory model for such a programming environment. We implemented this programming environment in an x86 heterogeneous platform simulator. We ported a number of workloads to our programming environment and present its performance on these workloads.
Programming model for a heterogeneous x86 platform. PLDI '09.