Published in: Formal Methods in System Design 1/2022

5 March 2021

Static detection of uncoalesced accesses in GPU programs

Authors: Rajeev Alur, Joseph Devietti, Omar S. Navarro Leija, Nimit Singhania

Abstract

GPU programming has become popular because of the high computational capabilities of GPUs. Obtaining significant performance gains with a GPU is challenging, however, and the programmer needs to be aware of various subtleties of the GPU architecture. One such subtlety lies in accessing GPU memory, where certain access patterns can lead to poor performance. Such access patterns are referred to as uncoalesced global memory accesses. This work presents a lightweight compile-time static analysis that identifies such accesses in GPU programs. The analysis relies on a novel abstraction that tracks the access pattern across multiple threads. The abstraction enables quick prediction while providing correctness guarantees. We have implemented the analysis in LLVM and compared it against a dynamic analysis implementation. The static analysis identifies 95 pre-existing uncoalesced accesses in Rodinia, a popular benchmark suite of GPU programs, and finishes within seconds for most programs. The dynamic analysis, by comparison, finds 69 accesses and takes orders of magnitude longer to finish.
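
To make the notion of an uncoalesced access concrete, the following is a minimal sketch of a toy cost model, not the analysis described in the abstract. It assumes a warp of 32 threads, 4-byte elements, and 128-byte memory segments served by one transaction each; these constants and the helper name `transactions_per_warp` are illustrative assumptions rather than details taken from the paper. Thread `t` reads element `base + t * stride`, and the function counts how many distinct segments the warp touches.

```python
# Toy model of global-memory coalescing (illustrative assumptions only).
WARP_SIZE = 32        # threads per warp (assumed)
SEGMENT_BYTES = 128   # bytes served by one memory transaction (assumed)
ELEM_BYTES = 4        # size of one array element, e.g. a 32-bit float


def transactions_per_warp(stride, base=0):
    """Count the distinct memory segments touched when thread t of a warp
    reads element base + t * stride of an array of 4-byte elements."""
    segments = set()
    for tid in range(WARP_SIZE):
        byte_addr = (base + tid * stride) * ELEM_BYTES
        segments.add(byte_addr // SEGMENT_BYTES)
    return len(segments)


if __name__ == "__main__":
    # Unit stride stays within one segment; larger strides spread the
    # warp's reads over more segments.
    for stride in (1, 2, 32):
        print(f"stride {stride}: {transactions_per_warp(stride)} transaction(s)")
```

Under these assumptions, a unit-stride access is served by a single transaction, while a stride of 32 elements puts every thread in its own segment. The second pattern is the kind an uncoalesced-access detector would aim to flag.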

Metadata
Title
Static detection of uncoalesced accesses in GPU programs
Authors
Rajeev Alur
Joseph Devietti
Omar S. Navarro Leija
Nimit Singhania
Publication date
5 March 2021
Publisher
Springer US
Published in
Formal Methods in System Design / Issue 1/2022
Print ISSN: 0925-9856
Electronic ISSN: 1572-8102
DOI
https://doi.org/10.1007/s10703-021-00362-8
