Skip to main content
Erschienen in: The Journal of Supercomputing 5/2015

01.05.2015

Hierarchical task mapping for parallel applications on supercomputers

verfasst von: Jingjin Wu, Xuanxing Xiong, Zhiling Lan

Erschienen in: The Journal of Supercomputing | Ausgabe 5/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

As the scale of supercomputers grows, so does the size of the interconnect network. Topology-aware task mapping, which maps parallel application processes onto processors to reduce communication cost, becomes increasingly important. Previous works mainly focus on the task mapping between compute nodes (i.e., inter-node mapping), while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy, which performs both inter-node and intra-node mapping. We consider supercomputers with popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node mapping and intra-node mapping; (2) a recursive bipartitioning mapping algorithm for torus topology, which efficiently partitions the compute nodes according to their coordinates. Moreover, a hierarchical task mapping library is developed. Experimental results show that the proposed approach significantly improves the communication performance by up to 77 % with low runtime overhead.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
1
We use “task mapping” and “topology mapping” interchangeably.
 
2
Note that other graph partitioning algorithms may have different time complexities.
 
3
On Blue Gene/P systems, if the number of allocated nodes is less than \(512\), then their network topology is a mesh.
 
4
LibTopMap failed to derive mapping solutions when the number of processes is larger than 1024.
 
Literatur
9.
Zurück zum Zitat Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470–477. Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470–477.
10.
Zurück zum Zitat Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS) Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS)
12.
Zurück zum Zitat Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3–12CrossRef Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3–12CrossRef
13.
Zurück zum Zitat Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349–357. Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349–357.
14.
Zurück zum Zitat Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75–82 Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75–82
15.
Zurück zum Zitat Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439–458CrossRef Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439–458CrossRef
16.
Zurück zum Zitat Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana. Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana.
17.
Zurück zum Zitat Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1–8 Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1–8
19.
Zurück zum Zitat Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180–186. doi:10.1109/PDP.2010.67 Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180–186. doi:10.​1109/​PDP.​2010.​67
20.
Zurück zum Zitat Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157–1165CrossRefMATH Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157–1165CrossRefMATH
21.
Zurück zum Zitat Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815–1823 Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815–1823
22.
Zurück zum Zitat Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157–172. Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157–172.
23.
Zurück zum Zitat Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25MathSciNet Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25MathSciNet
24.
Zurück zum Zitat Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27–36 Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27–36
25.
Zurück zum Zitat Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210–221 Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210–221
26.
Zurück zum Zitat Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75–84 Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75–84
27.
Zurück zum Zitat Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199–210 Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199–210
28.
Zurück zum Zitat Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96–129CrossRefMathSciNet Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96–129CrossRefMathSciNet
29.
Zurück zum Zitat Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73–94CrossRef Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73–94CrossRef
30.
Zurück zum Zitat Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix conference on computers and communications, pp 40–44. doi:10.1109/PCCC.1989.37357 Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix conference on computers and communications, pp 40–44. doi:10.​1109/​PCCC.​1989.​37357
31.
Zurück zum Zitat Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput 34(10):892–901CrossRef Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput 34(10):892–901CrossRef
32.
Zurück zum Zitat Mercier G, Jeannot E (2011) Improving MPI applications performance on multicore clusters with rank reordering. In: Proceedings of the 18th European MPI users’ group conference on recent advances in the message passing interface, pp 39–49 Mercier G, Jeannot E (2011) Improving MPI applications performance on multicore clusters with rank reordering. In: Proceedings of the 18th European MPI users’ group conference on recent advances in the message passing interface, pp 39–49
33.
Zurück zum Zitat Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proceedings of the scalable high-performance computing conference, pp 486–493 Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proceedings of the scalable high-performance computing conference, pp 486–493
34.
Zurück zum Zitat pellegrini F, Roman J (1996) Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. High-performance computing and networking. Lecture notes in computer science, vol 1067, pp 493–498 pellegrini F, Roman J (1996) Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. High-performance computing and networking. Lecture notes in computer science, vol 1067, pp 493–498
35.
Zurück zum Zitat Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement: theory and applications. Springer, BerlinCrossRef Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement: theory and applications. Springer, BerlinCrossRef
36.
Zurück zum Zitat Rashti MJ, Green J, Balaji P, Afsahi A, Gropp W (2011) Multi-core and network aware MPI topology functions. In: Proceedings of the 18th European MPI users’ group conference on recent advances in the message passing interface, pp 50–60 Rashti MJ, Green J, Balaji P, Afsahi A, Gropp W (2011) Multi-core and network aware MPI topology functions. In: Proceedings of the 18th European MPI users’ group conference on recent advances in the message passing interface, pp 50–60
37.
Zurück zum Zitat Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363–371CrossRef Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363–371CrossRef
38.
Zurück zum Zitat Subramoni H, Potluri S, Kandalla K, Barth B, Vienne J, Keasler J, Tomko K, Schulz K, Moody A, Panda D (2012) Design of a scalable infiniband topology service to enable network-topology-aware placement of processes. In: Proceedings of international conference on high performance computing, networking, storage and analysis, pp 1–12. doi:10.1109/SC.2012.47 Subramoni H, Potluri S, Kandalla K, Barth B, Vienne J, Keasler J, Tomko K, Schulz K, Moody A, Panda D (2012) Design of a scalable infiniband topology service to enable network-topology-aware placement of processes. In: Proceedings of international conference on high performance computing, networking, storage and analysis, pp 1–12. doi:10.​1109/​SC.​2012.​47
39.
Zurück zum Zitat Tang W, Lan Z, Desai N, Buettner D, Yu Y (2011) Reducing fragmentation on torus-connected supercomputers. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 828–839 Tang W, Lan Z, Desai N, Buettner D, Yu Y (2011) Reducing fragmentation on torus-connected supercomputers. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 828–839
40.
Zurück zum Zitat Tr\(\ddot{a}\)ff JL (2002) Implementing the MPI process topology mechanism. In: Proceedings of ACM/IEEE conference on supercomputing, pp 28:1–28:14 Tr\(\ddot{a}\)ff JL (2002) Implementing the MPI process topology mechanism. In: Proceedings of ACM/IEEE conference on supercomputing, pp 28:1–28:14
41.
Zurück zum Zitat Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proceedings of IEEE International conference on cluster computing (CLUSTER), pp 8–16 Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proceedings of IEEE International conference on cluster computing (CLUSTER), pp 8–16
42.
Zurück zum Zitat Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proceedings of international conference on high performance computing, networking, storage and analysis, SC ’12, pp 75:1–75:10 Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proceedings of international conference on high performance computing, networking, storage and analysis, SC ’12, pp 75:1–75:10
43.
Zurück zum Zitat Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proceedings of ACM/IEEE conference on supercomputing, p 52 Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proceedings of ACM/IEEE conference on supercomputing, p 52
44.
Zurück zum Zitat Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 933–944 Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 933–944
45.
Zurück zum Zitat Zou H, Yu Y, Tang W, Chen HWM (2014) Flexanalytics: a flexible data analytics framework for big data applications with I/O performance improvement. Big Data Res 1(0): 4–13 (Special Issue on Scalable Computing for Big Data) Zou H, Yu Y, Tang W, Chen HWM (2014) Flexanalytics: a flexible data analytics framework for big data applications with I/O performance improvement. Big Data Res 1(0): 4–13 (Special Issue on Scalable Computing for Big Data)
Metadaten
Titel
Hierarchical task mapping for parallel applications on supercomputers
verfasst von
Jingjin Wu
Xuanxing Xiong
Zhiling Lan
Publikationsdatum
01.05.2015
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 5/2015
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1324-5

Weitere Artikel der Ausgabe 5/2015

The Journal of Supercomputing 5/2015 Zur Ausgabe