Skip to main content
Top

2018 | OriginalPaper | Chapter

Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms

Authors : Konrad Moren, Diana Göhringer

Published in: Computational Science – ICCS 2018

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Heterogeneous computing systems with multiple CPUs and GPUs are increasingly popular. Today, heterogeneous platforms are deployed in many setups, ranging from low-power mobile systems to high performance computing systems. Such platforms are usually programmed using OpenCL which allows to execute the same program on different types of device. Nevertheless, programming such platforms is a challenging job for most non-expert programmers. To enable an efficient application runtime on heterogeneous platforms, programmers require an efficient workload distribution to the available compute devices. The decision how the application should be mapped is non-trivial. In this paper, we present a new approach to build accurate predictive-models for OpenCL programs. We use a machine learning-based predictive model to estimate which device allows best application speed-up. With the LLVM compiler framework we develop a tool for dynamic code-feature extraction. We demonstrate the effectiveness of our novel approach by applying it to different prediction schemes. Using our dynamic feature extraction techniques, we are able to build accurate predictive models, with accuracies varying between 77% and 90%, depending on the prediction mechanism and the scenario. We evaluated our method on an extensive set of parallel applications. One of our findings is that dynamically extracted code features improve the accuracy of the predictive-models by 6.1% on average (maximum 9.5%) as compared to the state of the art.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Gropp, W., Matsuoka, S., (eds.) International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, SC 2013, 17–21 November 2013, pp. 45:1–45:12. ACM, New York (2013) Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Gropp, W., Matsuoka, S., (eds.) International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, SC 2013, 17–21 November 2013, pp. 45:1–45:12. ACM, New York (2013)
2.
go back to reference Hoefler, T., Gropp, W., Kramer, W., Snir, M.: Performance modeling for systematic performance tuning. In: State of the Practice Reports, SC 2011, pp. 6:1–6:12. ACM,New York (2011) Hoefler, T., Gropp, W., Kramer, W., Snir, M.: Performance modeling for systematic performance tuning. In: State of the Practice Reports, SC 2011, pp. 6:1–6:12. ACM,New York (2011)
3.
go back to reference Lopez-Novoa, U., Mendiburu, A., Miguel-Alonso, J.: A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans. Parallel Distrib. Syst. 26(1), 272–281 (2015)CrossRef Lopez-Novoa, U., Mendiburu, A., Miguel-Alonso, J.: A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans. Parallel Distrib. Syst. 26(1), 272–281 (2015)CrossRef
5.
go back to reference Nagasaka, H., Maruyama, N., Nukada, A., Endo, T., Matsuoka, S.: Statistical power modeling of GPU kernels using performance counters. In: Green Computing Conference, pp. 115–122. IEEE Computer Society (2010) Nagasaka, H., Maruyama, N., Nukada, A., Endo, T., Matsuoka, S.: Statistical power modeling of GPU kernels using performance counters. In: Green Computing Conference, pp. 115–122. IEEE Computer Society (2010)
6.
go back to reference Kerr, A., Diamos, G.F., Yalamanchili, S.: Modeling GPU-CPU workloads and systems. In: Kaeli, D.R., Leeser, M., (eds.) Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU 2010, Pittsburgh, Pennsylvania, USA, 14 March 2010. ACM International Conference Proceeding Series, vol. 425, pp. 31–42. ACM (2010) Kerr, A., Diamos, G.F., Yalamanchili, S.: Modeling GPU-CPU workloads and systems. In: Kaeli, D.R., Leeser, M., (eds.) Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU 2010, Pittsburgh, Pennsylvania, USA, 14 March 2010. ACM International Conference Proceeding Series, vol. 425, pp. 31–42. ACM (2010)
7.
go back to reference Dao, T.T., Kim, J., Seo, S., Egger, B., Lee, J.: A performance model for GPUs with caches. IEEE Trans. Parallel Distrib. Syst. 26(7), 1800–1813 (2015)CrossRef Dao, T.T., Kim, J., Seo, S., Egger, B., Lee, J.: A performance model for GPUs with caches. IEEE Trans. Parallel Distrib. Syst. 26(7), 1800–1813 (2015)CrossRef
8.
go back to reference Baldini, I., Fink, S.J., Altman, E.R.: Predicting GPU performance from CPU runs using machine learning. In: SBAC-PAD, Washington, DC, USA, pp. 254–261. IEEE Computer Society (2014) Baldini, I., Fink, S.J., Altman, E.R.: Predicting GPU performance from CPU runs using machine learning. In: SBAC-PAD, Washington, DC, USA, pp. 254–261. IEEE Computer Society (2014)
9.
go back to reference Tripathy, B., Dash, S., Padhy, S.K.: Multiprocessor scheduling and neural network training methods using shuffled frog-leaping algorithm. Comput. Ind. Eng. 80, 154–158 (2015)CrossRef Tripathy, B., Dash, S., Padhy, S.K.: Multiprocessor scheduling and neural network training methods using shuffled frog-leaping algorithm. Comput. Ind. Eng. 80, 154–158 (2015)CrossRef
11.
go back to reference Magni, A., Dubach, C., O’Boyle, M.F.P.: Automatic optimization of thread-coarsening for graphics processors. In: Amaral, J.N., Torrellas, J., (eds.) PACT, pp. 455–466. ACM (2014) Magni, A., Dubach, C., O’Boyle, M.F.P.: Automatic optimization of thread-coarsening for graphics processors. In: Amaral, J.N., Torrellas, J., (eds.) PACT, pp. 455–466. ACM (2014)
12.
go back to reference Kofler, K., Grasso, I., Cosenza, B., Fahringer, T.: An automatic input-sensitive approach for heterogeneous task partitioning. In: Malony, A.D., Nemirovsky, M., Midkiff, S.P., (eds.) ICS, pp. 149–160. ACM (2013) Kofler, K., Grasso, I., Cosenza, B., Fahringer, T.: An automatic input-sensitive approach for heterogeneous task partitioning. In: Malony, A.D., Nemirovsky, M., Midkiff, S.P., (eds.) ICS, pp. 149–160. ACM (2013)
13.
go back to reference Wen, Y., Wang, Z., O’Boyle, M.F.P.: Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. In: 21st International Conference on High Performance Computing, HiPC 2014, Goa, India, 17–20 December 2014, pp. 1–10 (2014) Wen, Y., Wang, Z., O’Boyle, M.F.P.: Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. In: 21st International Conference on High Performance Computing, HiPC 2014, Goa, India, 17–20 December 2014, pp. 1–10 (2014)
15.
go back to reference Lee, J., Kim, J., Seo, S., Kim, S., Park, J., Kim, H., Dao, T.T., Cho, Y., Seo, S.J., Lee, S.H., Cho, S.M., Song, H.J., Suh, S., Choi, J.: An OpenCL framework for heterogeneous multicores with local memory. In: Salapura, V., Gschwind, M., Knoop, J. (eds.) 19th International Conference on Parallel Architecture and Compilation Techniques (PACT 2010), Vienna, Austria, 11–15 September 2010, pp. 193–204. ACM (2010) Lee, J., Kim, J., Seo, S., Kim, S., Park, J., Kim, H., Dao, T.T., Cho, Y., Seo, S.J., Lee, S.H., Cho, S.M., Song, H.J., Suh, S., Choi, J.: An OpenCL framework for heterogeneous multicores with local memory. In: Salapura, V., Gschwind, M., Knoop, J. (eds.) 19th International Conference on Parallel Architecture and Compilation Techniques (PACT 2010), Vienna, Austria, 11–15 September 2010, pp. 193–204. ACM (2010)
16.
go back to reference Kim, H.S., Hajj, I.E., Stratton, J.A., Lumetta, S.S., Hwu, W.M.: Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures. In: Olukotun, K., Smith, A., Hundt, R., Mars, J. (eds.) Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015, San Francisco, CA, USA, 07–11 February 2015, pp. 257–268. IEEE Computer Society (2015) Kim, H.S., Hajj, I.E., Stratton, J.A., Lumetta, S.S., Hwu, W.M.: Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures. In: Olukotun, K., Smith, A., Hundt, R., Mars, J. (eds.) Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015, San Francisco, CA, USA, 07–11 February 2015, pp. 257–268. IEEE Computer Society (2015)
17.
go back to reference Jo, G., Jeon, W.J., Jung, W., Taft, G., Lee, J.: OpenCL framework for arm processors with neon support. In: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing. WPMVP 2014, pp. 33–40. ACM, New York (2014) Jo, G., Jeon, W.J., Jung, W., Taft, G., Lee, J.: OpenCL framework for arm processors with neon support. In: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing. WPMVP 2014, pp. 33–40. ACM, New York (2014)
18.
go back to reference Zima, E.V.: On computational properties of chains of recurrences. In: Proceedings of the 2001 International Symposium on Symbolic and Algebraic Computation. ISSAC 2001, p. 345. ACM, New York (2001) Zima, E.V.: On computational properties of chains of recurrences. In: Proceedings of the 2001 International Symposium on Symbolic and Algebraic Computation. ISSAC 2001, p. 345. ACM, New York (2001)
20.
go back to reference Grosser, T., Größlinger, A., Lengauer, C.: Polly - performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4) (2012)MathSciNetCrossRef Grosser, T., Größlinger, A., Lengauer, C.: Polly - performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4) (2012)MathSciNetCrossRef
21.
go back to reference Nvidia: NVIDIA OpenCL SDK code samples (2014) Nvidia: NVIDIA OpenCL SDK code samples (2014)
22.
go back to reference Grauer-Gray, S., Xu, L., Searles, R., Ayalasomayajula, S., Cavazos, J.: Auto-tuning a high-level language targeted to GPU codes. In: Innovative Parallel Computing (InPar), pp. 1–10, May 2012 Grauer-Gray, S., Xu, L., Searles, R., Ayalasomayajula, S., Cavazos, J.: Auto-tuning a high-level language targeted to GPU codes. In: Innovative Parallel Computing (InPar), pp. 1–10, May 2012
Metadata
Title
Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms
Authors
Konrad Moren
Diana Göhringer
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-93701-4_23

Premium Partner