Skip to main content
Top
Published in: The Journal of Supercomputing 4/2017

26-10-2016

Topology mapping of irregular parallel applications on torus-connected supercomputers

Authors: Jingjin Wu, Xuanxing Xiong, Eduardo Berrocal, Jia Wang, Zhiling Lan

Published in: The Journal of Supercomputing | Issue 4/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Supercomputers with ever increasing computing power are being built for scientific applications. As the system size scales up, so does the size of interconnect network. As a result, communication in supercomputers becomes increasingly expensive due to the long distance between nodes and network contention. Topology mapping, which maps parallel application processes onto compute nodes by considering network topology and application communication pattern, is an essential technique for communication optimization. In this paper, we study the topology mapping problem for torus-connected supercomputers, and present an analytical topology mapping algorithm for parallel applications with irregular communication patterns. We consider our problem as a discrete optimization problem in the geometric domain of a torus topology, and design an analytical mapping algorithm, which uses numerical solvers to compute the mapping. Experimental results show that our algorithm provides high-quality mappings on 3-dimensional torus, which significantly reduce the communication time by up to 72%.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Footnotes
1
The physical meaning of Eq. (2) is introduced below. The communication graph of the application is modeled as a spring system, where each edge \((i,j)\in E_c\) is represented as a spring with corresponding spring constant being c(ij). The total energy of the springs is a quadratic function of their lengths. A mapping solution is obtained by minimizing the total energy to find a force equilibrium state.
 
Literature
1.
go back to reference Abdel-Gawad AH, Thottethodi M, Bhatele A (2014) RAHTM: routing algorithm aware hierarchical task mapping. In: Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p 325–335 Abdel-Gawad AH, Thottethodi M, Bhatele A (2014) RAHTM: routing algorithm aware hierarchical task mapping. In: Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p 325–335
2.
go back to reference Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encyclopedia of Parallel Computing, p 470–477 Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encyclopedia of Parallel Computing, p 470–477
3.
go back to reference Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS) Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS)
5.
go back to reference Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269CrossRefMATH Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269CrossRefMATH
6.
go back to reference Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proc. the 7th Annual International High Performance Computing Conference. The 1993 High Performance Computing: New Horizons Supercomputing Symposium, p 349–357 Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proc. the 7th Annual International High Performance Computing Conference. The 1993 High Performance Computing: New Horizons Supercomputing Symposium, p 349–357
7.
go back to reference Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439–458CrossRef Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439–458CrossRef
8.
go back to reference Bhandarkar SM, Arabnia HR (1995) The hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114CrossRef Bhandarkar SM, Arabnia HR (1995) The hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114CrossRef
9.
go back to reference Bhatele A (2010) Automating topology aware mapping for supercomputers. Ph.D. thesis, University of Illinois at Urbana-Champaign, Urbana Bhatele A (2010) Automating topology aware mapping for supercomputers. Ph.D. thesis, University of Illinois at Urbana-Champaign, Urbana
10.
go back to reference Bhatele A, Gamblin T, Langer SH, Bremer PT, Draeger EW, Hamann B, Isaacs KE, Landge AG, Levine JA, Pascucci V, Schulz M, Still CH (2012) Mapping applications with collectives over sub-communicators on torus networks. In: Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p 97:1–97:11 Bhatele A, Gamblin T, Langer SH, Bremer PT, Draeger EW, Hamann B, Isaacs KE, Landge AG, Levine JA, Pascucci V, Schulz M, Still CH (2012) Mapping applications with collectives over sub-communicators on torus networks. In: Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p 97:1–97:11
12.
go back to reference Boyd S, Vandenberghe L (2009) Convex optimization. Cambridge University Press, CambridgeMATH Boyd S, Vandenberghe L (2009) Convex optimization. Cambridge University Press, CambridgeMATH
13.
go back to reference Butz AR (1971) Alternative algorithm for Hilbert’s space-filling curve. IEEE Trans Comput C–20(4):424–426CrossRefMATH Butz AR (1971) Alternative algorithm for Hilbert’s space-filling curve. IEEE Trans Comput C–20(4):424–426CrossRefMATH
14.
go back to reference Chen Y, Davis TA, Hager WW, Rajamanickam S (2008) Algorithm 887: CHOLMOD, supernodal sparse cholesky factorization and update/downdate. ACM Trans Math Softw 35(3):22:1–22:14CrossRefMathSciNet Chen Y, Davis TA, Hager WW, Rajamanickam S (2008) Algorithm 887: CHOLMOD, supernodal sparse cholesky factorization and update/downdate. ACM Trans Math Softw 35(3):22:1–22:14CrossRefMathSciNet
15.
go back to reference Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157–1165CrossRefMATH Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157–1165CrossRefMATH
16.
go back to reference Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proc. IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), p 1815–1823 Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proc. IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), p 1815–1823
17.
go back to reference Davis TA, Hu Y (2011) The university of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25MathSciNet Davis TA, Hu Y (2011) The university of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25MathSciNet
18.
go back to reference Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS), p 27–36 Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS), p 27–36
19.
go back to reference Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proc. the Third Conference on Hypercube Concurrent Computers and Applications: Architecture, Software, Computer Systems, and General Issues, vol 1, C3P, p 210–221 Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proc. the Third Conference on Hypercube Concurrent Computers and Applications: Architecture, Software, Computer Systems, and General Issues, vol 1, C3P, p 210–221
20.
go back to reference Golub GH, Loan CFV (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore, LondonMATH Golub GH, Loan CFV (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore, LondonMATH
21.
go back to reference Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proc. the International Conference on Supercomputing (ICS), p 75–84 Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proc. the International Conference on Supercomputing (ICS), p 75–84
22.
go back to reference Hu YF, Blake RJ, Emerson DR (1998) An optimal migration algorithm for dynamic load balancing. Concurr Pract Exp 10(6):467–483CrossRefMATH Hu YF, Blake RJ, Emerson DR (1998) An optimal migration algorithm for dynamic load balancing. Concurr Pract Exp 10(6):467–483CrossRefMATH
24.
go back to reference Jeannot E, Mercier G, Tessier F (2014) Process placement in multicore clusters: algorithmic issues and practical techniques. IEEE Trans Parallel Distrib Syst 25(4):993–1002CrossRef Jeannot E, Mercier G, Tessier F (2014) Process placement in multicore clusters: algorithmic issues and practical techniques. IEEE Trans Parallel Distrib Syst 25(4):993–1002CrossRef
25.
go back to reference Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73–94CrossRef Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73–94CrossRef
26.
go back to reference Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proc. International Phoenix Conference on Computers and Communications, p 40–44. doi:10.1109/PCCC.1989.37357 Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proc. International Phoenix Conference on Computers and Communications, p 40–44. doi:10.​1109/​PCCC.​1989.​37357
29.
go back to reference Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proc. the Scalable High-Performance Computing Conference, p 486–493 Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proc. the Scalable High-Performance Computing Conference, p 486–493
30.
go back to reference Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement-theory and applications. Springer, BerlinCrossRefMATH Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement-theory and applications. Springer, BerlinCrossRefMATH
31.
go back to reference Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363–371CrossRef Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363–371CrossRef
32.
go back to reference Spielman D, Teng SH (2003) Solving sparse, symmetric, diagonally-dominant linear systems in time o(m1.31). In: Proc. IEEE Symposium on Foundations of Computer Science, p 416–427 Spielman D, Teng SH (2003) Solving sparse, symmetric, diagonally-dominant linear systems in time o(m1.31). In: Proc. IEEE Symposium on Foundations of Computer Science, p 416–427
33.
go back to reference Träff JL (2002) Implementing the MPI process topology mechanism. In: Proc. ACM/IEEE Conference on Supercomputing, p 28:1–28:14 Träff JL (2002) Implementing the MPI process topology mechanism. In: Proc. ACM/IEEE Conference on Supercomputing, p 28:1–28:14
36.
go back to reference Viswanathan N, Chu CCN (2004) FastPlace: Efficient analytical placement using cell shifting, iterative local refinement and a hybrid net model. In: Proc. International Symposium on Physical Design, p 26–33 Viswanathan N, Chu CCN (2004) FastPlace: Efficient analytical placement using cell shifting, iterative local refinement and a hybrid net model. In: Proc. International Symposium on Physical Design, p 26–33
37.
go back to reference Viswanathan N, Chu CCN (2005) FastPlace: efficient analytical placement using cell shifting, iterative local refinement, and a hybrid net model. IEEE Trans Comput Aided Design 24(5):722–733CrossRef Viswanathan N, Chu CCN (2005) FastPlace: efficient analytical placement using cell shifting, iterative local refinement, and a hybrid net model. IEEE Trans Comput Aided Design 24(5):722–733CrossRef
38.
go back to reference Wallace S, Vishwanath V, Coghlan S, Tramm J, Lan Z, Papkay M (2013) Application power profiling on IBM Blue Gene/Q. In: Proc. IEEE International Conference on Cluster Computing (CLUSTER), p 1–8 Wallace S, Vishwanath V, Coghlan S, Tramm J, Lan Z, Papkay M (2013) Application power profiling on IBM Blue Gene/Q. In: Proc. IEEE International Conference on Cluster Computing (CLUSTER), p 1–8
39.
go back to reference Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proc. IEEE International Conference on Cluster Computing (CLUSTER), p 8–16 Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proc. IEEE International Conference on Cluster Computing (CLUSTER), p 8–16
40.
go back to reference Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), SC ’12, p 75:1–75:10 Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), SC ’12, p 75:1–75:10
41.
go back to reference Wu J, Xiong X, Lan Z (2015) Hierarchical task mapping for parallel applications on supercomputers. J Supercomput 71(5):1776–1802CrossRef Wu J, Xiong X, Lan Z (2015) Hierarchical task mapping for parallel applications on supercomputers. J Supercomput 71(5):1776–1802CrossRef
42.
go back to reference Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proc. ACM/IEEE Conference on Supercomputing, p 52. doi:10.1109/SC.2006.63 Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proc. ACM/IEEE Conference on Supercomputing, p 52. doi:10.​1109/​SC.​2006.​63
43.
go back to reference Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS), p 933–944 Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS), p 933–944
Metadata
Title
Topology mapping of irregular parallel applications on torus-connected supercomputers
Authors
Jingjin Wu
Xuanxing Xiong
Eduardo Berrocal
Jia Wang
Zhiling Lan
Publication date
26-10-2016
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 4/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1876-7

Other articles of this Issue 4/2017

The Journal of Supercomputing 4/2017 Go to the issue

Premium Partner