
Formal Methods in System Design

Published: 5 March 2021

Static detection of uncoalesced accesses in GPU programs

Authors: Rajeev Alur, Joseph Devietti, Omar S. Navarro Leija, Nimit Singhania

Published in: Formal Methods in System Design, Issue 1/2022


Abstract

GPU programming has become popular due to the high computational capabilities of GPUs. However, obtaining significant performance gains on a GPU is challenging, and the programmer needs to be aware of various subtleties of the GPU architecture. One such subtlety lies in accessing GPU memory, where certain access patterns can lead to poor performance. Such access patterns are referred to as uncoalesced global memory accesses. This work presents a lightweight compile-time static analysis to identify such accesses in GPU programs. The analysis relies on a novel abstraction that tracks the access pattern across multiple threads. The abstraction enables quick prediction while providing correctness guarantees. We have implemented the analysis in LLVM and compared it against a dynamic analysis implementation. The static analysis identifies 95 pre-existing uncoalesced accesses in Rodinia, a popular benchmark suite of GPU programs, and finishes within seconds for most programs; in comparison, the dynamic analysis finds 69 accesses and takes orders of magnitude longer to finish.
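To make the idea of an uncoalesced access concrete, the sketch below contrasts a concrete view (how many memory segments one warp touches) with a tiny abstract view (how an array index depends on the thread index). It is a toy illustration under stated assumptions, not the abstraction from the paper: it assumes 32-thread warps, 128-byte aligned segments as the transaction unit, and 4-byte elements, and it models an index as `coeff * tid` plus some thread-independent value. The names `transactions_per_warp`, `Affine`, `tid`, `uniform`, `add`, `scale`, `mul` and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed transaction granularity (simplified model)


def transactions_per_warp(index: Callable[[int], int], elem_bytes: int = 4) -> int:
    """Concrete view: count the aligned segments one warp touches.

    index(tid) returns the array element read by thread tid of the warp.
    One segment means the access is fully coalesced in this model.
    """
    return len({(index(t) * elem_bytes) // SEGMENT_BYTES for t in range(WARP_SIZE)})


@dataclass(frozen=True)
class Affine:
    """Abstract view of an index: coeff * tid + (some thread-independent value).

    coeff is None (top) when the dependence on tid is not a known linear stride.
    """
    coeff: Optional[int]


def tid() -> Affine:
    return Affine(1)            # threadIdx.x itself: stride 1


def uniform() -> Affine:
    return Affine(0)            # same value in every thread (e.g. blockIdx.x, a kernel parameter)


def add(a: Affine, b: Affine) -> Affine:
    if a.coeff is None or b.coeff is None:
        return Affine(None)
    return Affine(a.coeff + b.coeff)


def scale(a: Affine, k: int) -> Affine:
    """Multiply by a compile-time constant k."""
    return Affine(None) if a.coeff is None else Affine(a.coeff * k)


def mul(a: Affine, b: Affine) -> Affine:
    """Multiply two abstract values whose concrete values are unknown."""
    if a.coeff == 0 and b.coeff == 0:
        return Affine(0)        # product of uniform values is uniform
    return Affine(None)         # stride depends on an unknown value: give up (top)


def classify(a: Affine) -> str:
    if a.coeff == 0:
        return "uniform"
    if a.coeff in (1, -1):
        return "contiguous"
    return "possibly uncoalesced"   # large, unknown, or nonlinear stride


if __name__ == "__main__":
    # a[threadIdx.x] touches 1 segment; a[32 * threadIdx.x] touches 32
    print(transactions_per_warp(lambda t: t), transactions_per_warp(lambda t: 32 * t))  # 1 32
    # a[blockIdx.x * blockDim.x + threadIdx.x]
    print(classify(add(mul(uniform(), uniform()), tid())))  # contiguous
    # a[threadIdx.x * width], with width a runtime parameter
    print(classify(mul(tid(), uniform())))  # possibly uncoalesced
```

Mapping the top value to "possibly uncoalesced" keeps this toy analysis conservative: it may flag an access that turns out to be fine at run time, but it only reports "contiguous" when the stride in the thread index is known to be exactly one element.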



Publisher: Springer US
Print ISSN: 0925-9856
Electronic ISSN: 1572-8102
DOI: https://doi.org/10.1007/s10703-021-00362-8
