Skip to main content
Top
Published in: The Journal of Supercomputing 2/2014

01-11-2014

Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria

Authors: Hector Ortega-Arranz, Yuri Torres, Arturo Gonzalez-Escribano, Diego R. Llanos

Published in: The Journal of Supercomputing | Issue 2/2014

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

During the last years, GPU manycore devices have demonstrated their usefulness to accelerate computationally intensive problems. Although arriving at a parallelization of a highly parallel algorithm is an affordable task, the optimization of GPU codes is a challenging activity. The main reason for this is the number of parameters, programming choices, and tuning techniques available, many of them related with complex and sometimes hidden architecture details. A useful strategy to systematically attack these optimization problems is to characterize the different kernels of the application, and use this knowledge to select appropriate configuration parameters. The All-Pair Shortest-Path (APSP) problem is a well-known problem in graph theory whose objective is to find the shortest paths between any pairs of nodes in a graph. This problem can be solved by highly parallel and computational intensive tasks, being a good candidate to be exploited by manycore devices. In this paper, we use kernel characterization criteria to optimize an APSP algorithm implementation for NVIDIA GPUs. Our experimental results show that the combined use of proper configuration policies, and the concurrent kernels capability of new CUDA architectures, leads to a performance improvement of up to 62 % with respect to one of the possible configurations recommended by CUDA, considered as baseline.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Barceló J, Codina E, Casas J, Ferrer JL, García D (2005) Microscopic traffic simulation: a tool for the design, analysis and evaluation of intelligent transport systems. J Intell Robot Syst 41:173–203CrossRef Barceló J, Codina E, Casas J, Ferrer JL, García D (2005) Microscopic traffic simulation: a tool for the design, analysis and evaluation of intelligent transport systems. J Intell Robot Syst 41:173–203CrossRef
2.
go back to reference Cormen TH, Stein C, Rivest RL, Leiserson CE (2001) Introduction to algorithms, 2nd edn. McGraw-Hill Higher Education, Burr Ridge, Il 60521 Cormen TH, Stein C, Rivest RL, Leiserson CE (2001) Introduction to algorithms, 2nd edn. McGraw-Hill Higher Education, Burr Ridge, Il 60521
3.
go back to reference Crauser A, Mehlhorn K, Meyer U, Sanders P (1998) A parallelization of Dijkstra’s shortest path algorithm. In: Brim L, Gruska J, Zlatuška J (eds) Mathematical foundations of computer science 1998, LNCS, vol 1450. Springer, Berlin, pp 722–731CrossRef Crauser A, Mehlhorn K, Meyer U, Sanders P (1998) A parallelization of Dijkstra’s shortest path algorithm. In: Brim L, Gruska J, Zlatuška J (eds) Mathematical foundations of computer science 1998, LNCS, vol 1450. Springer, Berlin, pp 722–731CrossRef
4.
go back to reference Dasgupta A (2011) CUDA performance analyzer. Ph.D. thesis, School of Electrical and Computer Engineering, Georgia Institute of Technology Dasgupta A (2011) CUDA performance analyzer. Ph.D. thesis, School of Electrical and Computer Engineering, Georgia Institute of Technology
6.
go back to reference Farooqui N, Kerr A, Diamos G, Yalamanchili S, Schwan K (2011) A framework for dynamically instrumenting GPU compute applications within GPU Ocelot. In: Proceedings of 4th workshop on GPGPU, GPGPU-4, x. ACM, New York, NY, pp 9:1–9:9 Farooqui N, Kerr A, Diamos G, Yalamanchili S, Schwan K (2011) A framework for dynamically instrumenting GPU compute applications within GPU Ocelot. In: Proceedings of 4th workshop on GPGPU, GPGPU-4, x. ACM, New York, NY, pp 9:1–9:9
7.
go back to reference Grauer-Gray S, Xu L, Searles R, Ayalasomayajula S, Cavazos J (2012) Auto-tuning a high-level language targeted to GPU codes. InPar 2012:1–10 Grauer-Gray S, Xu L, Searles R, Ayalasomayajula S, Cavazos J (2012) Auto-tuning a high-level language targeted to GPU codes. InPar 2012:1–10
8.
go back to reference Harris M (2008) Optimizing parallel reduction in CUDA. NVIDIA Harris M (2008) Optimizing parallel reduction in CUDA. NVIDIA
9.
go back to reference Kirk DB, Hwu WW (2010) Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, San Francisco, CA, USA, p 258 Kirk DB, Hwu WW (2010) Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, San Francisco, CA, USA, p 258
10.
go back to reference Martín P, Torres R, Gavilanes A (2009) CUDA solutions for the SSSP problem. In: Allen G, Nabrzyski J, Seidel E, van Albada G, Dongarra J, Sloot P (eds) Computational science—ICCS 2009, LNCS, vol 5544. Springer, Berlin, pp 904–913 Martín P, Torres R, Gavilanes A (2009) CUDA solutions for the SSSP problem. In: Allen G, Nabrzyski J, Seidel E, van Albada G, Dongarra J, Sloot P (eds) Computational science—ICCS 2009, LNCS, vol 5544. Springer, Berlin, pp 904–913
11.
go back to reference Nobari S, Lu X, Karras P, Bressan S (2011) Fast random graph generation. In: Proceedings of 14th international Conference on EDBT/ICDT ’11. ACM, NY, pp 331–342 Nobari S, Lu X, Karras P, Bressan S (2011) Fast random graph generation. In: Proceedings of 14th international Conference on EDBT/ICDT ’11. ACM, NY, pp 331–342
12.
go back to reference Ortega-Arranz H, Torres Y, Llanos DR., Gonzalez-Escribano A (2013) A new GPU-based approach to the shortest path problem. In: High performance computing and simulation (HPCS), 2013 international Conference on, pp 505–512 Ortega-Arranz H, Torres Y, Llanos DR., Gonzalez-Escribano A (2013) A new GPU-based approach to the shortest path problem. In: High performance computing and simulation (HPCS), 2013 international Conference on, pp 505–512
13.
go back to reference Rétvári G, Bíró JJ, Cinkler T (2007) On shortest path representation. IEEE ACM Trans Netw 15:1293–1306CrossRef Rétvári G, Bíró JJ, Cinkler T (2007) On shortest path representation. IEEE ACM Trans Netw 15:1293–1306CrossRef
14.
go back to reference Torres Y, González-Escribano A, Llanos DR (2012) uBench: performance impact of CUDA block geometry. In: Techniocal report IT-DI-2012-0001, Universidad de Valladolid Torres Y, González-Escribano A, Llanos DR (2012) uBench: performance impact of CUDA block geometry. In: Techniocal report IT-DI-2012-0001, Universidad de Valladolid
15.
go back to reference Torres Y, Gonzalez-Escribano A, Llanos DR (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65:1–14 Torres Y, Gonzalez-Escribano A, Llanos DR (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65:1–14
16.
go back to reference Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65–76CrossRef Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65–76CrossRef
Metadata
Title
Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria
Authors
Hector Ortega-Arranz
Yuri Torres
Arturo Gonzalez-Escribano
Diego R. Llanos
Publication date
01-11-2014
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1212-z

Other articles of this Issue 2/2014

The Journal of Supercomputing 2/2014 Go to the issue

Premium Partner