Skip to main content

2019 | OriginalPaper | Buchkapitel

Fast Heuristic-Based GPU Compiler Sequence Specialization

verfasst von : Ricardo Nobre, Luís Reis, João M. P. Cardoso

Erschienen in: Euro-Par 2018: Parallel Processing Workshops

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Iterative compilation focused on specialized phase orders (i.e., custom selections of compiler passes and orderings for each program or function) can significantly improve the performance of compiled code. However, phase ordering specialization typically needs to deal with large solution space. A previous approach, evaluated by targeting an x86 CPU, mitigates this issue by first using a training phase on reference codes to produce a small set of high-quality reusable phase orders. This approach then uses these phase orders to compile new codes, without any code analysis. In this paper, we evaluate the viability of using this approach to optimize the GPU execution performance of OpenCL kernels. In addition, we propose and evaluate the use of a heuristic to further reduce the number of evaluated phase orders, by comparing the speedups of the resulting binaries with those of the training phase for each phase order. This information is used to predict which untested phase order is most likely to produce good results (e.g., highest speedup). We performed our measurements using the PolyBench/GPU OpenCL benchmark suite on an NVIDIA Pascal GPU. Without heuristics, we can achieve a geomean execution speedup of 1.64\(\times \), using cross-validation, with 5 non-standard phase orders. With the heuristic, we can achieve the same speedup with only 3 non-standard phase orders. This is close to the geomean speedup achieved in our iterative compilation experiments exploring thousands of phase orders. Given the significant reduction in exploration time and other advantages of this approach, we believe that it is suitable for a wide range of compiler users concerned with performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agakov, F., et al.: Using machine learning to focus iterative optimization. In: CGO 2006, pp. 295–305. IEEE Computer Society, Washington, DC (2006) Agakov, F., et al.: Using machine learning to focus iterative optimization. In: CGO 2006, pp. 295–305. IEEE Computer Society, Washington, DC (2006)
2.
Zurück zum Zitat Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools, 2nd edn. Addison-Wesley Longman Publishing Co., Inc., Boston (2006)MATH Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools, 2nd edn. Addison-Wesley Longman Publishing Co., Inc., Boston (2006)MATH
3.
Zurück zum Zitat Almagor, L., et al.: Finding effective compilation sequences. In: LCTES 2004, pp. 231–239. ACM, New York (2004)CrossRef Almagor, L., et al.: Finding effective compilation sequences. In: LCTES 2004, pp. 231–239. ACM, New York (2004)CrossRef
4.
Zurück zum Zitat Ashouri, A.H., Bignoli, A., Palermo, G., Silvano, C., Kulkarni, S., Cavazos, J.: Micomp: mitigating the compiler phase-ordering problem using optimization sub-sequences and machine learning. ACM TACO 14(3), 29 (2017) Ashouri, A.H., Bignoli, A., Palermo, G., Silvano, C., Kulkarni, S., Cavazos, J.: Micomp: mitigating the compiler phase-ordering problem using optimization sub-sequences and machine learning. ACM TACO 14(3), 29 (2017)
5.
Zurück zum Zitat Ashouri, A.H., Bignoli, A., Palermo, G., Silvano, C.: Predictive modeling methodology for compiler phase-ordering. In: PARMA-DITAM 2016, pp. 7–12. ACM, New York (2016) Ashouri, A.H., Bignoli, A., Palermo, G., Silvano, C.: Predictive modeling methodology for compiler phase-ordering. In: PARMA-DITAM 2016, pp. 7–12. ACM, New York (2016)
6.
Zurück zum Zitat Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE IISWC, October 2009 Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE IISWC, October 2009
7.
Zurück zum Zitat Cooper, K.D., et al.: Exploring the structure of the space of compilation sequences using randomized search algorithms. J. Supercomput. 36(2), 135–151 (2006)MathSciNetCrossRef Cooper, K.D., et al.: Exploring the structure of the space of compilation sequences using randomized search algorithms. J. Supercomput. 36(2), 135–151 (2006)MathSciNetCrossRef
8.
Zurück zum Zitat Cooper, K.D., Schielke, P.J., Subramanian, D.: Optimizing for reduced code space using genetic algorithms. In: LCTES 1999, pp. 1–9. ACM, New York (1999) Cooper, K.D., Schielke, P.J., Subramanian, D.: Optimizing for reduced code space using genetic algorithms. In: LCTES 1999, pp. 1–9. ACM, New York (1999)
9.
Zurück zum Zitat Eide, E., Regehr, J.: Volatiles are miscompiled, and what to do about it. In: Proceedings of the 8th ACM International Conference on Embedded Software, EMSOFT 2008, pp. 255–264. ACM, New York (2008) Eide, E., Regehr, J.: Volatiles are miscompiled, and what to do about it. In: Proceedings of the 8th ACM International Conference on Embedded Software, EMSOFT 2008, pp. 255–264. ACM, New York (2008)
10.
Zurück zum Zitat Huang, Q., et al.: The effect of compiler optimizations on high-level synthesis-generated hardware. ACM TRETS 8(3), 14:1–14:26 (2015) Huang, Q., et al.: The effect of compiler optimizations on high-level synthesis-generated hardware. ACM TRETS 8(3), 14:1–14:26 (2015)
11.
Zurück zum Zitat Kulkarni, S., Cavazos, J.: Mitigating the compiler optimization phase-ordering problem using machine learning. In: OOPSLA 2012, pp. 147–162. ACM, New York (2012)CrossRef Kulkarni, S., Cavazos, J.: Mitigating the compiler optimization phase-ordering problem using machine learning. In: OOPSLA 2012, pp. 147–162. ACM, New York (2012)CrossRef
12.
Zurück zum Zitat Martins, L.G.A., Nobre, R., Cardoso, J.M.P., Delbem, A.C.B., Marques, E.: Clustering-based selection for the exploration of compiler optimization sequences. ACM TACO 13(1), 8:1–8:28 (2016) Martins, L.G.A., Nobre, R., Cardoso, J.M.P., Delbem, A.C.B., Marques, E.: Clustering-based selection for the exploration of compiler optimization sequences. ACM TACO 13(1), 8:1–8:28 (2016)
13.
Zurück zum Zitat Nobre, R.: Identifying sequences of optimizations for HW/SW compilation. In: FPL 2013, pp. 1–2, September 2013 Nobre, R.: Identifying sequences of optimizations for HW/SW compilation. In: FPL 2013, pp. 1–2, September 2013
14.
Zurück zum Zitat Nobre, R., Martins, L.G.A., Cardoso, J.a.M.P.: A graph-based iterative compiler pass selection and phase ordering approach. In: LCTES 2016, pp. 21–30. ACM, New York (2016) Nobre, R., Martins, L.G.A., Cardoso, J.a.M.P.: A graph-based iterative compiler pass selection and phase ordering approach. In: LCTES 2016, pp. 21–30. ACM, New York (2016)
16.
Zurück zum Zitat Purini, S., Jain, L.: Finding good optimization sequences covering program space. ACM TACO 9(4), 56:1–56:23 (2013) Purini, S., Jain, L.: Finding good optimization sequences covering program space. ACM TACO 9(4), 56:1–56:23 (2013)
18.
Zurück zum Zitat Seo, S., Jo, G., Lee, J.: Performance characterization of the NAS parallel benchmarks in OpenCL. In: IISWC 2011, pp. 137–148. IEEE Computer Society, Washington, DC (2011) Seo, S., Jo, G., Lee, J.: Performance characterization of the NAS parallel benchmarks in OpenCL. In: IISWC 2011, pp. 137–148. IEEE Computer Society, Washington, DC (2011)
19.
Zurück zum Zitat Sher, G., Martin, K., Dechev, D.: Preliminary results for neuroevolutionary optimization phase order generation for static compilation. In: ODES 2014, pp. 33–40. ACM, New York (2014) Sher, G., Martin, K., Dechev, D.: Preliminary results for neuroevolutionary optimization phase order generation for static compilation. In: ODES 2014, pp. 33–40. ACM, New York (2014)
20.
Zurück zum Zitat Zhao, J., Nagarakatte, S., Martin, M.M., Zdancewic, S.: Formal verification of SSA-based optimizations for LLVM. SIGPLAN Not. 48(6), 175–186 (2013)CrossRef Zhao, J., Nagarakatte, S., Martin, M.M., Zdancewic, S.: Formal verification of SSA-based optimizations for LLVM. SIGPLAN Not. 48(6), 175–186 (2013)CrossRef
Metadaten
Titel
Fast Heuristic-Based GPU Compiler Sequence Specialization
verfasst von
Ricardo Nobre
Luís Reis
João M. P. Cardoso
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-10549-5_39

Premium Partner