2018 | Original Paper | Book Chapter

NUMA Optimizations for Algorithmic Skeletons

Authors: Paul Metzger, Murray Cole, Christian Fensch

Published in: Euro-Par 2018: Parallel Processing

Publisher: Springer International Publishing

Abstract

To address NUMA performance anomalies, programmers often resort to application-specific optimizations that are not transferable to other programs, or to generic optimizations that do not perform well in all cases. Skeleton-based programming models allow NUMA optimizations to be abstracted on a pattern-by-pattern basis, freeing programmers from this complexity. As a case study, we investigate computations that can be implemented with stencil skeletons. We present an analysis of the behavior of a range of simple and complex stencil programs from the NAS and Rodinia benchmark suites under state-of-the-art NUMA-aware page placement (PP) schemes. We show that even when an application (or skeleton) implements the correct, intuitive scheduling of data and work to threads, the resulting performance can be disrupted by an inappropriate PP scheme. In contrast, we show that a NUMA PP-aware stencil implementation scheme can achieve speedups of up to 2x over a similar scheme that uses the Linux default PP, and that this holds across a set of complex stencil applications. Furthermore, we show that a supposed PP performance optimization in the Linux kernel never improves stencil performance and in some cases degrades it by up to 0.27x, and should therefore be deactivated by stencil skeleton implementations. Finally, we show that further speedups of up to 1.1x can be achieved by addressing a work imbalance issue caused by a poor conventional understanding of NUMA PP.
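To make the interaction between work scheduling and page placement concrete, the following is a minimal toy model, not the authors' implementation. It assumes a first-touch policy (a page is placed on the node of the thread that first writes to it, which is the Linux default behavior), a 1D three-point stencil, and a static block partition of elements across threads. All function names and parameters are hypothetical and chosen for illustration only.

```python
def block_bounds(n, parts, i):
    """Half-open range [lo, hi) of block i when n items are split
    into `parts` contiguous, nearly equal blocks."""
    base, extra = divmod(n, parts)
    lo = i * base + min(i, extra)
    hi = lo + base + (1 if i < extra else 0)
    return lo, hi


def first_touch_nodes(n_elems, elems_per_page, n_threads,
                      threads_per_node, parallel_init):
    """Return the NUMA node owning each page under first-touch (toy model).

    Assumption: threads are pinned so that thread t runs on node
    t // threads_per_node. With serial initialization, thread 0 touches
    every page. With parallel initialization, each thread touches its own
    block; a page shared by two threads goes to the lower-numbered thread
    here, whereas on real hardware that would be a race.
    """
    n_pages = -(-n_elems // elems_per_page)  # ceiling division
    if not parallel_init:
        return [0] * n_pages
    owner = [None] * n_pages
    for t in range(n_threads):
        lo, hi = block_bounds(n_elems, n_threads, t)
        if lo == hi:
            continue  # empty block touches nothing
        for page in range(lo // elems_per_page, (hi - 1) // elems_per_page + 1):
            if owner[page] is None:
                owner[page] = t // threads_per_node
    return owner


def remote_reads(n_elems, elems_per_page, n_threads, threads_per_node, owner):
    """Count reads of a three-point stencil that hit a page on another node."""
    remote = 0
    for t in range(n_threads):
        node = t // threads_per_node
        lo, hi = block_bounds(n_elems, n_threads, t)
        for i in range(lo, hi):
            for j in (i - 1, i, i + 1):
                if 0 <= j < n_elems and owner[j // elems_per_page] != node:
                    remote += 1
    return remote


if __name__ == "__main__":
    # 4096 doubles, 4 KiB pages (512 doubles each), 8 threads on 2 nodes.
    cfg = dict(n_elems=4096, elems_per_page=512, n_threads=8, threads_per_node=4)
    for parallel in (False, True):
        owner = first_touch_nodes(parallel_init=parallel, **cfg)
        print("parallel init" if parallel else "serial init",
              owner, remote_reads(owner=owner, **cfg))
```

In this configuration the compute schedule is identical in both runs; only the initialization differs. With serial initialization every page lands on node 0 and the threads on node 1 perform 6143 remote reads, whereas initializing each block from the thread that later computes on it leaves only the 2 reads across the node boundary remote. Real systems add effects this model ignores, such as page migration by the kernel and blocks that do not align with page boundaries.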

Metadata
Title
NUMA Optimizations for Algorithmic Skeletons
Authors
Paul Metzger
Murray Cole
Christian Fensch
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-96983-1_42