Skip to main content
Erschienen in: The Journal of Supercomputing 11/2015

01.11.2015

Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs

verfasst von: Jörg Dümmler, Sebastian Egerland

Erschienen in: The Journal of Supercomputing | Ausgabe 11/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The article proposes a cost model for two implementations of the all-pairs-shortest-path (APSP) problem on graphics processing units (GPUs) with a particular focus on the correct reproduction of the shape of the runtime curve for different input problem sizes and thread block configurations. The model categorizes thread blocks according to the number of active warps and defines four different approaches to estimate the mapping of thread blocks to the hardware execution resources. Each of these approaches leads to a runtime prediction, which together span an interval for the expected execution time. An experimental evaluation shows that most characteristics of the runtime curve are predicted correctly, which is especially important for small graphs where a tiny increase of the input size may result in a significant increase of the execution time. For large graphs, we show that the cost prediction deviates less than 1 % from the measured times in many cases.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Baghsorkhi S, Delahaye M, Patel S, Gropp W, Hwu WmW (2010) An adaptive performance modeling tool for GPU architectures. SIGPLAN Not 45(5):105–114CrossRef Baghsorkhi S, Delahaye M, Patel S, Gropp W, Hwu WmW (2010) An adaptive performance modeling tool for GPU architectures. SIGPLAN Not 45(5):105–114CrossRef
2.
Zurück zum Zitat Brodtkorb A, Hagen T, Sætra M (2013) Graphics processing unit (GPU) programming strategies and trends in GPU computing. J Parallel Distrib Comput 73(1):4–13CrossRef Brodtkorb A, Hagen T, Sætra M (2013) Graphics processing unit (GPU) programming strategies and trends in GPU computing. J Parallel Distrib Comput 73(1):4–13CrossRef
3.
Zurück zum Zitat Buluç A, Gilbert J, Budak C (2010) Solving path problems on the GPU. Parallel Comput 36(5–6):241–253CrossRefMATH Buluç A, Gilbert J, Budak C (2010) Solving path problems on the GPU. Parallel Comput 36(5–6):241–253CrossRefMATH
4.
Zurück zum Zitat Che S, Skadron K (2014) BenchFriend: correlating the performance of GPU benchmarks. Int J High Perform Comput Appl 28(2):238–250CrossRef Che S, Skadron K (2014) BenchFriend: correlating the performance of GPU benchmarks. Int J High Perform Comput Appl 28(2):238–250CrossRef
5.
Zurück zum Zitat Guo P, Wang L (2014) Accurate cross architecture performance modeling for sparse matrix—vector multiplication (SpMV) on GPUs. Concurr Comput Pract Exp 27(13):3281–3294. doi:10.1002/cpe.3217 Guo P, Wang L (2014) Accurate cross architecture performance modeling for sparse matrix—vector multiplication (SpMV) on GPUs. Concurr Comput Pract Exp 27(13):3281–3294. doi:10.​1002/​cpe.​3217
6.
Zurück zum Zitat Harish P, Narayanan P (2007) Accelerating large graph algorithms on the GPU using CUDA. In: Proceedings of the 14th international conference on high performance computing (HiPC ’07). Springer, Berlin pp 197–208 Harish P, Narayanan P (2007) Accelerating large graph algorithms on the GPU using CUDA. In: Proceedings of the 14th international conference on high performance computing (HiPC ’07). Springer, Berlin pp 197–208
7.
Zurück zum Zitat Hasan K, Chatterjee A, Radhakrishnan S, Antonio J (2014) Performance prediction model and analysis for compute-intensive tasks on GPUs. In: Hsu CH, Shi X, Salapura V (eds) Proceedings of the 11th IFIP international conference on network and parallel computing (NPC’14). Lecture notes in computer science, vol 8707. Springer, Berlin, pp 612–617 Hasan K, Chatterjee A, Radhakrishnan S, Antonio J (2014) Performance prediction model and analysis for compute-intensive tasks on GPUs. In: Hsu CH, Shi X, Salapura V (eds) Proceedings of the 11th IFIP international conference on network and parallel computing (NPC’14). Lecture notes in computer science, vol 8707. Springer, Berlin, pp 612–617
8.
Zurück zum Zitat Hong S, Kim H (2009) An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput Archit News 37(3):152–163MathSciNetCrossRef Hong S, Kim H (2009) An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput Archit News 37(3):152–163MathSciNetCrossRef
9.
Zurück zum Zitat Hu Z, Liu G, Dong W (2014) A throughput-aware analytical performance model for GPU applications. In: Proceedings of the 10th annual conference on advanced computer architecture (ACA ’14). Springer, Berlin, pp 98–112 Hu Z, Liu G, Dong W (2014) A throughput-aware analytical performance model for GPU applications. In: Proceedings of the 10th annual conference on advanced computer architecture (ACA ’14). Springer, Berlin, pp 98–112
10.
Zurück zum Zitat Karami A, Mirsoleimani S, Khunjush F (2013) A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs. In: Proceedings of the 17th CSI international symposium on computer architecture and digital systems (CADS ’13). IEEE, pp 15–22 Karami A, Mirsoleimani S, Khunjush F (2013) A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs. In: Proceedings of the 17th CSI international symposium on computer architecture and digital systems (CADS ’13). IEEE, pp 15–22
11.
Zurück zum Zitat Kerr A, Diamos G, Yalamanchili S (2010) Modeling GPU–CPU workloads and systems. In: Proceedings of the 3rd workshop on general-purpose computation on graphics processing units (GPGPU ’10). ACM, New York, pp 31–42 Kerr A, Diamos G, Yalamanchili S (2010) Modeling GPU–CPU workloads and systems. In: Proceedings of the 3rd workshop on general-purpose computation on graphics processing units (GPGPU ’10). ACM, New York, pp 31–42
13.
Zurück zum Zitat Kothapalli K, Mukherjee R, Rehman M, Patidar S, Narayanan P, Srinathan K (2009) A performance prediction model for the CUDA GPGPU platform. In: Proceedings of the 2009 international conference on high performance computing (HiPC ’09). IEEE, pp 463–472 Kothapalli K, Mukherjee R, Rehman M, Patidar S, Narayanan P, Srinathan K (2009) A performance prediction model for the CUDA GPGPU platform. In: Proceedings of the 2009 international conference on high performance computing (HiPC ’09). IEEE, pp 463–472
14.
Zurück zum Zitat Lai J, Seznec A (2013) Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs. In: Proceedings of the 2013 IEEE/ACM international symposium on code generation and optimization (CGO ’13). IEEE Computer Society, Washington, pp 1–10 Lai J, Seznec A (2013) Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs. In: Proceedings of the 2013 IEEE/ACM international symposium on code generation and optimization (CGO ’13). IEEE Computer Society, Washington, pp 1–10
15.
Zurück zum Zitat Lawler E (1976) Combinatorial optimization: networks and matroids. Holt, Rinehart and Winston, New York Lawler E (1976) Combinatorial optimization: networks and matroids. Holt, Rinehart and Winston, New York
16.
Zurück zum Zitat Lopez-Novoa U, Mendiburu A, Miguel-Alonso J (2014) A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans Parallel Distrib Syst PP(99):1–1 Lopez-Novoa U, Mendiburu A, Miguel-Alonso J (2014) A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans Parallel Distrib Syst PP(99):1–1
17.
Zurück zum Zitat Ma L, Chamberlain R, Agrawal K (2014) Performance modeling for highly-threaded many-core GPUs. In: Proceedings of the 25th international conference on application-specific systems, architectures and processors (ASAP ’14). IEEE, pp. 84–91 Ma L, Chamberlain R, Agrawal K (2014) Performance modeling for highly-threaded many-core GPUs. In: Proceedings of the 25th international conference on application-specific systems, architectures and processors (ASAP ’14). IEEE, pp. 84–91
18.
Zurück zum Zitat Meng J, Morozov V, Kumaran K, Vishwanath V, Uram T (2011) GROPHECY: GPU performance projection from CPU code skeletons. In: Proceedings of the 2011 international conference for high performance computing, networking, storage and analysis (SC ’11). ACM, New York, pp 14:1–14:11 Meng J, Morozov V, Kumaran K, Vishwanath V, Uram T (2011) GROPHECY: GPU performance projection from CPU code skeletons. In: Proceedings of the 2011 international conference for high performance computing, networking, storage and analysis (SC ’11). ACM, New York, pp 14:1–14:11
21.
Zurück zum Zitat Sato K, Komatsu K, Takizawa H, Kobayashi H (2011) A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems. In: Proceedings of the 9th international symposium on parallel and distributed processing with applications (ISPA ’11). IEEE Computer Society, Washington, pp 135–142 Sato K, Komatsu K, Takizawa H, Kobayashi H (2011) A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems. In: Proceedings of the 9th international symposium on parallel and distributed processing with applications (ISPA ’11). IEEE Computer Society, Washington, pp 135–142
22.
Zurück zum Zitat Sim J, Dasgupta A, Kim H, Vuduc R (2012) A performance analysis framework for identifying potential benefits in GPGPU applications. SIGPLAN Not 47(8):11–22CrossRef Sim J, Dasgupta A, Kim H, Vuduc R (2012) A performance analysis framework for identifying potential benefits in GPGPU applications. SIGPLAN Not 47(8):11–22CrossRef
23.
Zurück zum Zitat Torres Y, Gonzalez-Escribano A, Llanos D (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65(3):1150–1163CrossRef Torres Y, Gonzalez-Escribano A, Llanos D (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65(3):1150–1163CrossRef
24.
Zurück zum Zitat Tran QN (2010) Designing efficient many-core parallel algorithms for all-pairs shortest-paths using CUDA. In: Proceedings of the 7th international conference on information technology: new generations (ITNG ’10). IEEE Computer Society, Washington, pp 7–12 Tran QN (2010) Designing efficient many-core parallel algorithms for all-pairs shortest-paths using CUDA. In: Proceedings of the 7th international conference on information technology: new generations (ITNG ’10). IEEE Computer Society, Washington, pp 7–12
25.
Zurück zum Zitat Zhang Y, Owens J (2011) A quantitative performance analysis model for GPU architectures. In: Proceedings of the 17th IEEE international symposium on high performance computer architecture (HPCA ’11). IEEE Computer Society, Washington, pp 382–393 Zhang Y, Owens J (2011) A quantitative performance analysis model for GPU architectures. In: Proceedings of the 17th IEEE international symposium on high performance computer architecture (HPCA ’11). IEEE Computer Society, Washington, pp 382–393
Metadaten
Titel
Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs
verfasst von
Jörg Dümmler
Sebastian Egerland
Publikationsdatum
01.11.2015
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 11/2015
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1514-9

Weitere Artikel der Ausgabe 11/2015

The Journal of Supercomputing 11/2015 Zur Ausgabe

Premium Partner