Published in: International Journal of Parallel Programming 2/2017

13-05-2016

Automatic CPU/GPU Generation of Multi-versioned OpenCL Kernels for C++ Scientific Applications

Authors: Rafael Sotomayor, Luis Miguel Sanchez, Javier Garcia Blas, Javier Fernandez, J. Daniel Garcia


Abstract

Parallelism has become one of the most widespread paradigms for improving performance. However, it forces software developers to adapt applications and coding mechanisms to exploit the available computing devices. Legacy source code needs to be rewritten to take advantage of multi-core and many-core computing devices. Writing parallel applications in the traditional way is hard, expensive, and time-consuming. Furthermore, there is often more than one possible transformation or optimization that can be applied to a single piece of legacy code. Therefore, many parallel versions of the same original sequential code need to be considered. In this paper, we describe an automatic parallel source code generation workflow (REWORK) for parallel heterogeneous platforms. REWORK automatically identifies promising kernels in legacy C++ source code and generates multiple specific versions of kernels for improving C++ applications, selecting the most adequate version based on both static source code and target platform characteristics.
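
To make the idea of choosing among several generated kernel versions concrete, the following is a minimal C++ sketch. It is not REWORK's actual selection method, which this abstract does not detail. The types `KernelVersion` and `Platform`, the functions `estimated_seconds` and `select_version`, and the simple cost model (compute time plus host-to-device transfer time) are all hypothetical and introduced only for illustration.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Hypothetical device classes a generated kernel version may target.
enum class Device { CPU, GPU };

// Hypothetical static description of one generated version of a kernel.
struct KernelVersion {
    std::string name;       // illustrative label only
    Device target;          // device the version was generated for
    double operations;      // statically estimated arithmetic operations
    double bytes_to_device; // data that must be copied to the device
};

// Hypothetical summary of the target platform.
struct Platform {
    bool has_gpu;
    double cpu_gflops;     // assumed sustained CPU throughput
    double gpu_gflops;     // assumed sustained GPU throughput
    double transfer_gbps;  // assumed host-to-device bandwidth in GB/s
};

// Assumed cost model: compute time, plus transfer time for GPU versions.
double estimated_seconds(const KernelVersion& v, const Platform& p) {
    assert(p.cpu_gflops > 0.0);
    if (v.target == Device::CPU) {
        return v.operations / (p.cpu_gflops * 1e9);
    }
    assert(p.gpu_gflops > 0.0 && p.transfer_gbps > 0.0);
    return v.operations / (p.gpu_gflops * 1e9) +
           v.bytes_to_device / (p.transfer_gbps * 1e9);
}

// Returns the runnable version with the lowest estimated cost,
// or nullptr if no version can run on the given platform.
const KernelVersion* select_version(const std::vector<KernelVersion>& versions,
                                    const Platform& p) {
    const KernelVersion* best = nullptr;
    double best_cost = std::numeric_limits<double>::infinity();
    for (const auto& v : versions) {
        if (v.target == Device::GPU && !p.has_gpu) {
            continue;  // GPU versions cannot run without a GPU
        }
        const double cost = estimated_seconds(v, p);
        if (cost < best_cost) {
            best_cost = cost;
            best = &v;
        }
    }
    return best;
}
```

Under this assumed model, a GPU version is chosen only when its faster compute outweighs the cost of moving data to the device; on a platform without a GPU, the selection falls back to a CPU version.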

Metadata

Title: Automatic CPU/GPU Generation of Multi-versioned OpenCL Kernels for C++ Scientific Applications
Authors: Rafael Sotomayor, Luis Miguel Sanchez, Javier Garcia Blas, Javier Fernandez, J. Daniel Garcia
Publication date: 13-05-2016
Publisher: Springer US
Published in: International Journal of Parallel Programming, Issue 2/2017
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-016-0425-6
