Published in: The Journal of Supercomputing 6/2015

01-06-2015

Developing adaptive multi-device applications with the Heterogeneous Programming Library

Authors: Moisés Viñas, Zeki Bozkus, Basilio B. Fraguela, Diego Andrade, Ramón Doallo

Abstract

The use of heterogeneous devices presents two main problems. The first is their complex programming, which becomes harder when multiple devices are used. The second is that even though code for these devices can be made portable on top of OpenCL, it lacks performance portability, so good performance effectively requires a specialized implementation for each device. In this paper we extend the Heterogeneous Programming Library (HPL), which improves the usability of heterogeneous systems on top of OpenCL, to better handle both issues. First, we provide HPL with mechanisms to support the implementation of any multi-device application that requires arbitrary patterns of communication between several devices and the host memory. Second, we add to HPL an adaptive scheme that optimizes communications between devices depending on the execution environment. An evaluation using benchmarks of very different nature shows that HPL reduces the SLOCs and programming effort of OpenCL applications by 27% and 43%, respectively, while improving the performance of applications that exchange data between devices by 28% on average.

Metadata
Title
Developing adaptive multi-device applications with the Heterogeneous Programming Library
Authors
Moisés Viñas
Zeki Bozkus
Basilio B. Fraguela
Diego Andrade
Ramón Doallo
Publication date
01-06-2015
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1352-1
