2018 | OriginalPaper | Chapter

Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms

Authors : Konrad Moren, Diana Göhringer

Published in: Computational Science – ICCS 2018

Publisher: Springer International Publishing

Abstract

Heterogeneous computing systems with multiple CPUs and GPUs are increasingly popular. Today, heterogeneous platforms are deployed in many setups, ranging from low-power mobile systems to high-performance computing systems. Such platforms are usually programmed using OpenCL, which allows the same program to be executed on different types of devices. Nevertheless, programming such platforms is challenging for most non-expert programmers. To run applications efficiently on heterogeneous platforms, programmers need an efficient distribution of the workload across the available compute devices. Deciding how an application should be mapped is non-trivial. In this paper, we present a new approach to building accurate predictive models for OpenCL programs. We use a machine-learning-based predictive model to estimate which device yields the best application speed-up. Using the LLVM compiler framework, we develop a tool for dynamic code-feature extraction. We demonstrate the effectiveness of our novel approach by applying it to different prediction schemes. Using our dynamic feature extraction techniques, we are able to build accurate predictive models, with accuracies varying between 77% and 90%, depending on the prediction mechanism and the scenario. We evaluated our method on an extensive set of parallel applications. One of our findings is that dynamically extracted code features improve the accuracy of the predictive models by 6.1% on average (maximum 9.5%) compared to the state of the art.
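
The mapping decision described in the abstract can be pictured as a classification problem: given a feature vector for a kernel, predict the device on which it runs faster. The following is a minimal sketch of that idea only, not the authors' tool. The abstract does not list the features or name the learning algorithm, so the three features, the toy training rows, and the use of a k-nearest-neighbour vote are all assumptions made for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical training set: (feature vector, device that ran the kernel faster).
# The three features are assumed for illustration and are not taken from the paper:
#   arithmetic operations per memory access,
#   log10 of the work-item count,
#   fraction of divergent branches.
# The numbers are made-up toy values, not measurements.
TRAINING = [
    ((8.0, 6.0, 0.05), "GPU"),
    ((6.5, 5.5, 0.10), "GPU"),
    ((7.2, 6.3, 0.02), "GPU"),
    ((1.0, 3.0, 0.40), "CPU"),
    ((1.5, 2.5, 0.35), "CPU"),
    ((0.8, 3.2, 0.50), "CPU"),
]


def predict_device(features, training=TRAINING, k=3):
    """Return the majority label among the k nearest training kernels."""
    nearest = sorted(training, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(samples, training=TRAINING, k=3):
    """Fraction of (features, label) samples whose label is predicted correctly."""
    hits = sum(predict_device(f, training, k) == label for f, label in samples)
    return hits / len(samples)
```

Calling `predict_device((7.0, 6.0, 0.05))` on this toy data selects the GPU, because the three closest training rows are all GPU-labelled. In a real setting the feature vectors would come from a feature extractor such as the LLVM-based tool the abstract mentions, and accuracy would be measured on held-out kernels rather than on the training rows.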

Title: Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms
Authors: Konrad Moren, Diana Göhringer
Publisher: Springer International Publishing
Book: Computational Science – ICCS 2018
Print ISBN: 978-3-319-93700-7
Electronic ISBN: 978-3-319-93701-4
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-93701-4_23
