ABSTRACT
Systems with specialized processors, such as those used to accelerate computations (like NVIDIA's graphics processors or IBM's Cell), have proven their utility in terms of higher performance and lower power consumption. They have also been shown to outperform general-purpose processors for graphics-intensive and high-performance applications, and for enterprise applications such as modern financial codes or web hosts that require scalable image processing. These facts are driving tremendous growth in accelerator-based platforms in the high-performance domain, with systems like Keeneland and supercomputers like Tianhe-1 and RoadRunner, and even in data center systems like Amazon's EC2.
The physical hardware in these systems, once purchased and assembled, is not reconfigurable and is expensive to modify or upgrade. This can eventually limit applications' performance and scalability unless they are rewritten to match specific versions of hardware and compositions of components, both for single nodes and for clusters of machines. To address this problem and to support increased flexibility in usage models for CUDA-based GPGPU applications, our research proposes GPGPU assemblies, where each assembly combines a desired number of CPUs and CUDA-supported GPGPUs to form a 'virtual execution platform' for an application. System-level software then creates and manages assemblies, including mapping them seamlessly to the actual cluster- and node-level hardware resources present in the system. Experimental evaluations of the initial implementation of GPGPU assemblies demonstrate their feasibility and the advantages derived from their use.
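To make the idea of an assembly concrete, the following is a minimal Python sketch of how a request for a number of CPUs and GPGPUs might be mapped onto cluster nodes. It is an illustration only, not the implementation evaluated here: the names `Node`, `Assembly`, and `build_assembly` are hypothetical, and the greedy first-fit placement is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A physical machine with some free CPUs and GPGPUs (hypothetical model)."""
    name: str
    cpus: int
    gpus: int


@dataclass
class Assembly:
    """A 'virtual execution platform': requested resources plus their placement."""
    cpus: int
    gpus: int
    # Maps node name -> (cpus taken, gpus taken) on that node.
    placement: dict = field(default_factory=dict)


def build_assembly(nodes, cpus, gpus):
    """Greedy first-fit mapping (an assumed policy, not the paper's).

    Walks the nodes in order and takes as many CPUs and GPUs from each as
    are still needed, so an assembly may span several machines.
    """
    placement = {}
    need_cpus, need_gpus = cpus, gpus
    for node in nodes:
        take_cpus = min(node.cpus, need_cpus)
        take_gpus = min(node.gpus, need_gpus)
        if take_cpus or take_gpus:
            placement[node.name] = (take_cpus, take_gpus)
            need_cpus -= take_cpus
            need_gpus -= take_gpus
        if need_cpus == 0 and need_gpus == 0:
            break
    if need_cpus or need_gpus:
        raise ValueError("cluster cannot satisfy the requested assembly")
    return Assembly(cpus, gpus, placement)


# Two nodes with 2 GPUs each; the application asks for 4 CPUs and 3 GPUs.
cluster = [Node("n0", cpus=8, gpus=2), Node("n1", cpus=8, gpus=2)]
assembly = build_assembly(cluster, cpus=4, gpus=3)
print(assembly.placement)  # {'n0': (4, 2), 'n1': (0, 1)}
```

In this sketch the application sees a single platform with three GPGPUs even though no single node has that many; one of them is served from a remote node, which is the kind of cluster-level mapping the system software is meant to hide.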
Index Terms
- Shadowfax: scaling in heterogeneous cluster systems via GPGPU assemblies