nach oben

The Journal of Supercomputing

Erschienen in:

01.11.2015

Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs

verfasst von: Jörg Dümmler, Sebastian Egerland

Erschienen in: The Journal of Supercomputing | Ausgabe 11/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

The article proposes a cost model for two implementations of the all-pairs-shortest-path (APSP) problem on graphics processing units (GPUs) with a particular focus on the correct reproduction of the shape of the runtime curve for different input problem sizes and thread block configurations. The model categorizes thread blocks according to the number of active warps and defines four different approaches to estimate the mapping of thread blocks to the hardware execution resources. Each of these approaches leads to a runtime prediction, which together span an interval for the expected execution time. An experimental evaluation shows that most characteristics of the runtime curve are predicted correctly, which is especially important for small graphs where a tiny increase of the input size may result in a significant increase of the execution time. For large graphs, we show that the cost prediction deviates less than 1 % from the measured times in many cases.

Vorheriger Artikel Fault-tolerant broadcast in anonymous systems

Nächster Artikel Scheduling array redistribution with virtual channel support

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Baghsorkhi S, Delahaye M, Patel S, Gropp W, Hwu WmW (2010) An adaptive performance modeling tool for GPU architectures. SIGPLAN Not 45(5):105–114CrossRef

Brodtkorb A, Hagen T, Sætra M (2013) Graphics processing unit (GPU) programming strategies and trends in GPU computing. J Parallel Distrib Comput 73(1):4–13CrossRef

Buluç A, Gilbert J, Budak C (2010) Solving path problems on the GPU. Parallel Comput 36(5–6):241–253CrossRefMATH

Che S, Skadron K (2014) BenchFriend: correlating the performance of GPU benchmarks. Int J High Perform Comput Appl 28(2):238–250CrossRef

Guo P, Wang L (2014) Accurate cross architecture performance modeling for sparse matrix—vector multiplication (SpMV) on GPUs. Concurr Comput Pract Exp 27(13):3281–3294. doi:10.1002/cpe.3217

Harish P, Narayanan P (2007) Accelerating large graph algorithms on the GPU using CUDA. In: Proceedings of the 14th international conference on high performance computing (HiPC ’07). Springer, Berlin pp 197–208

Hasan K, Chatterjee A, Radhakrishnan S, Antonio J (2014) Performance prediction model and analysis for compute-intensive tasks on GPUs. In: Hsu CH, Shi X, Salapura V (eds) Proceedings of the 11th IFIP international conference on network and parallel computing (NPC’14). Lecture notes in computer science, vol 8707. Springer, Berlin, pp 612–617

Hong S, Kim H (2009) An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput Archit News 37(3):152–163MathSciNetCrossRef

Hu Z, Liu G, Dong W (2014) A throughput-aware analytical performance model for GPU applications. In: Proceedings of the 10th annual conference on advanced computer architecture (ACA ’14). Springer, Berlin, pp 98–112

10.

Karami A, Mirsoleimani S, Khunjush F (2013) A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs. In: Proceedings of the 17th CSI international symposium on computer architecture and digital systems (CADS ’13). IEEE, pp 15–22

11.

Kerr A, Diamos G, Yalamanchili S (2010) Modeling GPU–CPU workloads and systems. In: Proceedings of the 3rd workshop on general-purpose computation on graphics processing units (GPGPU ’10). ACM, New York, pp 31–42

12.

Khronos OpenCL Working Group (2013) The OpenCL specification, Version 2.0. http://www.khronos.org/opencl

13.

Kothapalli K, Mukherjee R, Rehman M, Patidar S, Narayanan P, Srinathan K (2009) A performance prediction model for the CUDA GPGPU platform. In: Proceedings of the 2009 international conference on high performance computing (HiPC ’09). IEEE, pp 463–472

14.

Lai J, Seznec A (2013) Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs. In: Proceedings of the 2013 IEEE/ACM international symposium on code generation and optimization (CGO ’13). IEEE Computer Society, Washington, pp 1–10

15.

Lawler E (1976) Combinatorial optimization: networks and matroids. Holt, Rinehart and Winston, New York

16.

Lopez-Novoa U, Mendiburu A, Miguel-Alonso J (2014) A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans Parallel Distrib Syst PP(99):1–1

17.

Ma L, Chamberlain R, Agrawal K (2014) Performance modeling for highly-threaded many-core GPUs. In: Proceedings of the 25th international conference on application-specific systems, architectures and processors (ASAP ’14). IEEE, pp. 84–91

18.

Meng J, Morozov V, Kumaran K, Vishwanath V, Uram T (2011) GROPHECY: GPU performance projection from CPU code skeletons. In: Proceedings of the 2011 international conference for high performance computing, networking, storage and analysis (SC ’11). ACM, New York, pp 14:1–14:11

19.

Nvidia: CUDA occupancy calculator. http://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls

20.

Nvidia (2015) CUDA toolkit documentation, Version 7.0. http://docs.nvidia.com/cuda

21.

Sato K, Komatsu K, Takizawa H, Kobayashi H (2011) A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems. In: Proceedings of the 9th international symposium on parallel and distributed processing with applications (ISPA ’11). IEEE Computer Society, Washington, pp 135–142

22.

Sim J, Dasgupta A, Kim H, Vuduc R (2012) A performance analysis framework for identifying potential benefits in GPGPU applications. SIGPLAN Not 47(8):11–22CrossRef

23.

Torres Y, Gonzalez-Escribano A, Llanos D (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65(3):1150–1163CrossRef

24.

Tran QN (2010) Designing efficient many-core parallel algorithms for all-pairs shortest-paths using CUDA. In: Proceedings of the 7th international conference on information technology: new generations (ITNG ’10). IEEE Computer Society, Washington, pp 7–12

25.

Zhang Y, Owens J (2011) A quantitative performance analysis model for GPU architectures. In: Proceedings of the 17th IEEE international symposium on high performance computer architecture (HPCA ’11). IEEE Computer Society, Washington, pp 382–393

Titel: Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs
verfasst von: Jörg Dümmler
Sebastian Egerland
Publikationsdatum: 01.11.2015
Verlag: Springer US
Erschienen in: The Journal of Supercomputing / Ausgabe 11/2015
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-015-1514-9

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 11/2015

Fault-tolerant broadcast in anonymous systems

A hybrid heuristic-based tuned support vector regression model for cloud load prediction

A scalability prediction approach for multi-threaded applications on manycore processors

Master-based routing algorithm and communication-based cluster topology for 2D NoC

Comments on two schemes of identity-based user authentication and key agreement for mobile client–server networks

Dynamic replication strategies in data grid systems: a survey

Premium Partner