Skip to main content

2018 | OriginalPaper | Buchkapitel

Data Partitioning Strategies for Stencil Computations on NUMA Systems

verfasst von : Frank Feinbube, Max Plauth, Marius Knaust, Andreas Polze

Erschienen in: Euro-Par 2017: Parallel Processing Workshops

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Many scientific problems rely on the efficient execution of stencil computations, which are usually memory-bound. In this paper, stencils on two-dimensional data are executed on NUMA architectures. Each node of a NUMA system processes a distinct partition of the input data independent from other nodes. However, processors may need access to the memory of other nodes at the edges of the partitions. This paper demonstrates two techniques based on machine learning for identifying partitioning strategies that reduce the occurrence of remote memory access. One approach is generally applicable and is based on an uninformed search. The second approach caps the search space by employing geometric decomposition. The partitioning strategies obtained with these techniques are analyzed theoretically. Finally, an evaluation on a real NUMA machine is conducted, which demonstrates that the expected reduction of the remote memory accesses can be achieved.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Abraham, S.G., Hudak, D.E.: Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic. IEEE Trans. Parallel Distrib. Syst. 2(3), 318–328 (1991)CrossRef Abraham, S.G., Hudak, D.E.: Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic. IEEE Trans. Parallel Distrib. Syst. 2(3), 318–328 (1991)CrossRef
2.
Zurück zum Zitat Datta, K.: Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. Ph.D. thesis, University of California, Berkeley (2009) Datta, K.: Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. Ph.D. thesis, University of California, Berkeley (2009)
3.
Zurück zum Zitat DeFlumere, A.: Optimal partitioning for parallel matrix computation on a small number of abstract heterogeneous processors. Ph.D. thesis, University College Dublin (2014) DeFlumere, A.: Optimal partitioning for parallel matrix computation on a small number of abstract heterogeneous processors. Ph.D. thesis, University College Dublin (2014)
4.
Zurück zum Zitat Dursun, H., Nomura, K.I., Wang, W., Kunaseth, M., Peng, L., Seymour, R., Kalia, R.K., Nakano, A., Vashishta, P.: In-core optimization of high-order stencil computations. In: PDPTA, pp. 533–538 (2009) Dursun, H., Nomura, K.I., Wang, W., Kunaseth, M., Peng, L., Seymour, R., Kalia, R.K., Nakano, A., Vashishta, P.: In-core optimization of high-order stencil computations. In: PDPTA, pp. 533–538 (2009)
5.
Zurück zum Zitat Hagen, W., Plauth, M., Eberhardt, F., Polze, A.: PGASUS: a framework for C++ application development on NUMA architectures. In: 2016 Fourth International Symposium on Computing and Networking (CANDAR), pp. 368–374. IEEE, Hiroshima, November 2016 Hagen, W., Plauth, M., Eberhardt, F., Polze, A.: PGASUS: a framework for C++ application development on NUMA architectures. In: 2016 Fourth International Symposium on Computing and Networking (CANDAR), pp. 368–374. IEEE, Hiroshima, November 2016
6.
Zurück zum Zitat Henretty, T., Veras, R., Franchetti, F., Pouchet, L.N., Ramanujam, J., Sadayappan, P.: A stencil compiler for short-vector SIMD architectures. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, pp. 13–24. ACM (2013) Henretty, T., Veras, R., Franchetti, F., Pouchet, L.N., Ramanujam, J., Sadayappan, P.: A stencil compiler for short-vector SIMD architectures. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, pp. 13–24. ACM (2013)
7.
Zurück zum Zitat Hewlett-Packard Development Company: Red Hat Linux NUMA Support for HP ProLiant Servers. Technical report. (2013). Accessed 1 Feb 2017 Hewlett-Packard Development Company: Red Hat Linux NUMA Support for HP ProLiant Servers. Technical report. (2013). Accessed 1 Feb 2017
8.
Zurück zum Zitat Jacobi, C.G.J.: Über ein leichtes Verfahren die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen. Journal für die reine und angewandte Mathematik 30, 51–94 (1846)MathSciNetCrossRef Jacobi, C.G.J.: Über ein leichtes Verfahren die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen. Journal für die reine und angewandte Mathematik 30, 51–94 (1846)MathSciNetCrossRef
10.
Zurück zum Zitat Knaust, M.: Partitioning 2D Data for Stencil Computations on NUMA Systems. Master’s thesis, Hasso Plattner Institute, University of Potsdam (2016) Knaust, M.: Partitioning 2D Data for Stencil Computations on NUMA Systems. Master’s thesis, Hasso Plattner Institute, University of Potsdam (2016)
11.
Zurück zum Zitat Nguyen, A., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13. IEEE Computer Society (2010) Nguyen, A., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13. IEEE Computer Society (2010)
13.
Zurück zum Zitat Plauth, M., Hagen, W., Feinbube, F., Eberhardt, F., Feinbube, L., Polze, A.: Parallel implementation strategies for hierarchical non-uniform memory access systems by example of the scale-invariant feature transform algorithm. In: IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 1351–1359. IEEE, Chicago, May 2016 Plauth, M., Hagen, W., Feinbube, F., Eberhardt, F., Feinbube, L., Polze, A.: Parallel implementation strategies for hierarchical non-uniform memory access systems by example of the scale-invariant feature transform algorithm. In: IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 1351–1359. IEEE, Chicago, May 2016
14.
Zurück zum Zitat Reed, D.A., Adams, L.M., Patrick, M.L.: Stencils and problem partitionings: their influence on the performance of multiple processor systems. IEEE Trans. Comput. 100(7), 845–858 (1987)CrossRef Reed, D.A., Adams, L.M., Patrick, M.L.: Stencils and problem partitionings: their influence on the performance of multiple processor systems. IEEE Trans. Comput. 100(7), 845–858 (1987)CrossRef
15.
Zurück zum Zitat Roth, G., Mellor-crummey, J., Kennedy, K., Brickner, R.G.: Compiling stencils in high performance Fortran. In: Supercomputing 1997: Proceedings of the 1997 ACM/IEEE conference on Supercomputing, pp. 1–20. ACM Press (1997) Roth, G., Mellor-crummey, J., Kennedy, K., Brickner, R.G.: Compiling stencils in high performance Fortran. In: Supercomputing 1997: Proceedings of the 1997 ACM/IEEE conference on Supercomputing, pp. 1–20. ACM Press (1997)
16.
Zurück zum Zitat Shaheen, M., Strzodka, R.: NUMA aware iterative stencil computations on many-core systems. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pp. 461–473. IEEE (2012) Shaheen, M., Strzodka, R.: NUMA aware iterative stencil computations on many-core systems. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pp. 461–473. IEEE (2012)
17.
Zurück zum Zitat Silicon Graphics International Corp: SGI UV 300H for SAP HANA (2015) Silicon Graphics International Corp: SGI UV 300H for SAP HANA (2015)
18.
Zurück zum Zitat Strzodka, R., Shaheen, M., Pajak, D., Seidel, H.P.: Cache oblivious parallelograms in iterative stencil computations. In: Proceedings of the 24th ACM International Conference on Supercomputing, pp. 49–59. ACM (2010) Strzodka, R., Shaheen, M., Pajak, D., Seidel, H.P.: Cache oblivious parallelograms in iterative stencil computations. In: Proceedings of the 24th ACM International Conference on Supercomputing, pp. 49–59. ACM (2010)
19.
Zurück zum Zitat Wellein, G., Hager, G., Zeiser, T., Wittmann, M., Fehske, H.: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In: 33rd Annual IEEE International Computer Software and Applications Conference, COMPSAC 2009, vol. 1, pp. 579-586. IEEE (2009) Wellein, G., Hager, G., Zeiser, T., Wittmann, M., Fehske, H.: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In: 33rd Annual IEEE International Computer Software and Applications Conference, COMPSAC 2009, vol. 1, pp. 579-586. IEEE (2009)
Metadaten
Titel
Data Partitioning Strategies for Stencil Computations on NUMA Systems
verfasst von
Frank Feinbube
Max Plauth
Marius Knaust
Andreas Polze
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-75178-8_48

Neuer Inhalt