Skip to main content
Top

2018 | OriginalPaper | Chapter

Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs

Authors : Estefania Serrano, Aleksandar Ilic, Leonel Sousa, Javier Garcia-Blas, Jesus Carretero

Published in: High Performance Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

When optimizing or porting applications to new architectures, a preliminary characterization is necessary to exploit the maximum computing power of the employed devices. Profiling tools are available for numerous architectures and programming models, making it easier to spot possible bottlenecks. However, for a better interpretation of the collected results, current profilers rely on insightful performance models. In this paper, we describe the Cache Aware Roofline Model (CARM) and tools for its generation to enable the performance characterization of GPU architectures and workloads. We use CARM to characterize two kernels that are part of a 3D iterative reconstruction application for Computed Tomography (CT). These two kernels take most of the execution time of the whole method, being therefore suitable for a deeper analysis. By exploring the model and the methodology proposed, the overall performance of the kernels has been improved up to two times compared to the previous implementations.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
Using the compilation flag –maxrregcount=32.
 
Literature
1.
go back to reference Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)CrossRef Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)CrossRef
2.
go back to reference Ilic, A., Pratas, F., Sousa, L.: Cache-aware Roofline model: upgrading the loft. IEEE Comput. Archit. Lett. 13(1), 21–24 (2014)CrossRef Ilic, A., Pratas, F., Sousa, L.: Cache-aware Roofline model: upgrading the loft. IEEE Comput. Archit. Lett. 13(1), 21–24 (2014)CrossRef
3.
go back to reference Ilic, A., Pratas, F., Sousa, L.: Beyond the roofline: cache-aware power and energy-efficiency modeling for multi-cores. IEEE Trans. Comput. 66(1), 52–58 (2017)MathSciNetCrossRef Ilic, A., Pratas, F., Sousa, L.: Beyond the roofline: cache-aware power and energy-efficiency modeling for multi-cores. IEEE Trans. Comput. 66(1), 52–58 (2017)MathSciNetCrossRef
5.
go back to reference Marques, D., et al.: Performance analysis with cache-aware roofline model in intel advisor. In: 2017 International Conference on High Performance Computing and Simulation (HPCS), pp. 898–907. IEEE (2017) Marques, D., et al.: Performance analysis with cache-aware roofline model in intel advisor. In: 2017 International Conference on High Performance Computing and Simulation (HPCS), pp. 898–907. IEEE (2017)
6.
go back to reference Lopes, A., Pratas, F., Sousa, L., Ilic, A.: Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling. In: 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 259–268. IEEE (2017) Lopes, A., Pratas, F., Sousa, L., Ilic, A.: Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling. In: 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 259–268. IEEE (2017)
7.
go back to reference Feldkamp, L., Davis, L., Kress, J.: Practical cone-beam algorithm. JOSA A 1(6), 612–619 (1984)CrossRef Feldkamp, L., Davis, L., Kress, J.: Practical cone-beam algorithm. JOSA A 1(6), 612–619 (1984)CrossRef
8.
go back to reference de Molina, C., Serrano, E., Garcia-Blas, J., Carretero, J., Desco, M., Abella, M.: Gpu-accelerated iterative reconstruction for limited-data tomography in CBCT systems. BMC Bioinform. 19(1), 171 (2018)CrossRef de Molina, C., Serrano, E., Garcia-Blas, J., Carretero, J., Desco, M., Abella, M.: Gpu-accelerated iterative reconstruction for limited-data tomography in CBCT systems. BMC Bioinform. 19(1), 171 (2018)CrossRef
9.
go back to reference Abella, M., et al.: FUX-Sim: implementation of a fast universal simulation/reconstruction framework for X-ray systems. PLOS ONE 12(7), 1–22 (2017)CrossRef Abella, M., et al.: FUX-Sim: implementation of a fast universal simulation/reconstruction framework for X-ray systems. PLOS ONE 12(7), 1–22 (2017)CrossRef
10.
go back to reference Weaver, V.M.: Linux perf\_event features and overhead. In: The 2nd International Workshop on Performance Analysis of Workload Optimized Systems, FastPath, vol. 13 (2013) Weaver, V.M.: Linux perf\_event features and overhead. In: The 2nd International Workshop on Performance Analysis of Workload Optimized Systems, FastPath, vol. 13 (2013)
11.
go back to reference Dongarra, J., et al.: Performance application programming interface Dongarra, J., et al.: Performance application programming interface
13.
go back to reference Kim, K.-H., Kim, K., Park, Q.-H.: Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model. Comput. Phys. Commun. 182(6), 1201–1207 (2011)CrossRef Kim, K.-H., Kim, K., Park, Q.-H.: Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model. Comput. Phys. Commun. 182(6), 1201–1207 (2011)CrossRef
15.
go back to reference Ryoo, J.H., Quirem, S.J., Lebeane, M., Panda, R., Song, S., John, L.K.: GPGPU benchmark suites: how well do they sample the performance spectrum? In: 2015 44th International Conference on Parallel Processing (ICPP), pp. 320–329. IEEE (2015) Ryoo, J.H., Quirem, S.J., Lebeane, M., Panda, R., Song, S., John, L.K.: GPGPU benchmark suites: how well do they sample the performance spectrum? In: 2015 44th International Conference on Parallel Processing (ICPP), pp. 320–329. IEEE (2015)
16.
go back to reference Che, S., Skadron, K.: BenchFriend: correlating the performance of GPU benchmarks. Int. J. High Perform. Comput. Appl. 28(2), 238–250 (2014)CrossRef Che, S., Skadron, K.: BenchFriend: correlating the performance of GPU benchmarks. Int. J. High Perform. Comput. Appl. 28(2), 238–250 (2014)CrossRef
17.
go back to reference Lopez-Novoa, U., Mendiburu, A., Miguel-Alonso, J.: A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans. Parallel Distrib. Syst. 26(1), 272–281 (2015)CrossRef Lopez-Novoa, U., Mendiburu, A., Miguel-Alonso, J.: A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans. Parallel Distrib. Syst. 26(1), 272–281 (2015)CrossRef
19.
go back to reference Konstantinidis, E., Cotronis, Y.: A practical performance model for compute and memory bound GPU kernels. In: 2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 651–658. IEEE (2015) Konstantinidis, E., Cotronis, Y.: A practical performance model for compute and memory bound GPU kernels. In: 2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 651–658. IEEE (2015)
20.
go back to reference Nugteren, C., van den Braak, G.-J., Corporaal, H.: Roofline-aware DVFS for GPUs. In: Proceedings of International Workshop on Adaptive Self-tuning Computing Systems, p. 8. ACM (2014) Nugteren, C., van den Braak, G.-J., Corporaal, H.: Roofline-aware DVFS for GPUs. In: Proceedings of International Workshop on Adaptive Self-tuning Computing Systems, p. 8. ACM (2014)
21.
go back to reference Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., Moshovos, A.: Demystifying GPU microarchitecture through microbenchmarking. In: 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 235–246. IEEE (2010) Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., Moshovos, A.: Demystifying GPU microarchitecture through microbenchmarking. In: 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 235–246. IEEE (2010)
22.
go back to reference Mei, X., Chu, X.: Dissecting GPU memory hierarchy through microbenchmarking. IEEE Trans. Parallel Distrib. Syst. 28(1), 72–86 (2017)CrossRef Mei, X., Chu, X.: Dissecting GPU memory hierarchy through microbenchmarking. IEEE Trans. Parallel Distrib. Syst. 28(1), 72–86 (2017)CrossRef
Metadata
Title
Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs
Authors
Estefania Serrano
Aleksandar Ilic
Leonel Sousa
Javier Garcia-Blas
Jesus Carretero
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-02465-9_36

Premium Partner