ABSTRACT
Developers can now implement applications in OpenCL for a wide range of architectures, including CPUs, GPUs, and FPGAs. In this work, we propose a source-to-source compiler that transforms C/C++ source code into optimized OpenCL kernel and host code. This eases the programming of heterogeneous architectures, makes it possible to apply different optimization techniques, and gives the programmer access to the code during the optimization process. We analyze the code using polyhedral modeling based on the Polly tool from the LLVM framework. Because Polly is designed for CPUs, the resulting information is passed to the Polyhedral Parallel Code Generator (PPCG), a source-to-source compiler that generates OpenCL or CUDA code. PPCG operates on Clang's abstract syntax tree and can therefore only process C code as input. Because our approach works on the intermediate representation of the LLVM framework (LLVM-IR), it opens up new possibilities, such as processing other input languages or applying different optimizations. A main part of the work was the generation of valid C code from LLVM-IR basic blocks and regions. In addition, a module for recognizing and transforming local variables was created.
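As a rough illustration of the kind of transformation described above, the following sketch contrasts a sequential affine loop with a kernel-style body that is executed once per work-item. It is written in plain C so that it runs without an OpenCL runtime. The names `scale_add_seq`, `scale_add_item`, `launch`, and `check_equivalence` are hypothetical and do not come from the paper, the `gid` parameter merely stands in for what OpenCL's `get_global_id(0)` would return, and the code is not the output of the proposed compiler, Polly, or PPCG.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Sequential input: a loop with affine bounds and array accesses,
   the kind of code a polyhedral analysis can model. */
static void scale_add_seq(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i)
        c[i] = 2.0f * a[i] + b[i];
}

/* Kernel-style body: one call per work-item. The loop is gone;
   gid plays the role of OpenCL's get_global_id(0) (assumption: 1-D range). */
static void scale_add_item(const float *a, const float *b, float *c,
                           size_t n, size_t gid) {
    if (gid < n)  /* guard, because the global size may be rounded up */
        c[gid] = 2.0f * a[gid] + b[gid];
}

/* Host-side stand-in for enqueueing the kernel over a 1-D range.
   A real host program would call the OpenCL API here instead. */
static void launch(const float *a, const float *b, float *c,
                   size_t n, size_t global_size) {
    assert(global_size >= n);
    for (size_t gid = 0; gid < global_size; ++gid)
        scale_add_item(a, b, c, n, gid);
}

/* Returns 1 if the sequential and the kernel-style version agree. */
static int check_equivalence(void) {
    float a[N], b[N], c_seq[N], c_par[N];
    for (size_t i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 1.0f;
        c_seq[i] = 0.0f;
        c_par[i] = 0.0f;
    }
    scale_add_seq(a, b, c_seq, N);
    launch(a, b, c_par, N, 16);  /* global size rounded up past N */
    for (size_t i = 0; i < N; ++i)
        if (c_seq[i] != c_par[i])
            return 0;
    return 1;
}
```

Calling `check_equivalence()` from a test or a `main` function confirms that both versions compute the same result for this small input. The loop iterations are independent, which is the property that lets each iteration be mapped to its own work-item.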
Index Terms
- Automatic OpenCL Code Generation from LLVM-IR using Polyhedral Optimization