Skip to main content

2019 | OriginalPaper | Buchkapitel

Scalable Work-Stealing Load-Balancer for HPC Distributed Memory Systems

verfasst von : Clement Fontenaille, Eric Petit, Pablo de Oliveira Castro, Seijilo Uemura, Devan Sohier, Piotr Lesnicki, Ghislain Lartigue, Vincent Moureau

Erschienen in: Euro-Par 2018: Parallel Processing Workshops

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Work-stealing schedulers are common in shared memory environments. However, large scale distributed memory usage has been limited to specific ad-hoc implementations preventing a broader adoption. In this paper we introduce a new scalable work-stealing algorithm for distributed memory systems as well as our implementation as the TITUS_DLB library. It is based on Kleinberg’s small-world graph. It allows to control the communication patterns and associated runtime overheads while providing efficient heuristics for victim selection and results routing. To validate our approach, we present the DLB_Bench benchmark which emulates arbitrary workload distribution and imbalance characteristics. Finally, we compare TITUS_DLB to the ad-hoc solution developed for the YALES2 computational fluid dynamics and combustion solver. We achieve up to 54% performance gain over thousands of cores.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Acun, B., et al.: Parallel programming with migratable objects: Charm++ in practice. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 647–658. IEEE Press (2014) Acun, B., et al.: Parallel programming with migratable objects: Charm++ in practice. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 647–658. IEEE Press (2014)
2.
Zurück zum Zitat Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011)CrossRef Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011)CrossRef
3.
Zurück zum Zitat Bader, D.: Designing scalable synthetic compact applications for benchmarking high productivity computing systems. Cyberinfrastructure Technol. Watch. 2, 1–10 (2006) Bader, D.: Designing scalable synthetic compact applications for benchmarking high productivity computing systems. Cyberinfrastructure Technol. Watch. 2, 1–10 (2006)
4.
Zurück zum Zitat Berenbrink, P., Friedetzky, T., Goldberg, L.A.: The natural work-stealing algorithm is stable. SIAM J. Comput. 32(5), 1260–1279 (2003)MathSciNetCrossRef Berenbrink, P., Friedetzky, T., Goldberg, L.A.: The natural work-stealing algorithm is stable. SIAM J. Comput. 32(5), 1260–1279 (2003)MathSciNetCrossRef
5.
Zurück zum Zitat Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM (JACM) 46(5), 720–748 (1999)MathSciNetCrossRef Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM (JACM) 46(5), 720–748 (1999)MathSciNetCrossRef
6.
Zurück zum Zitat Broquedis, F., et al.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: 2010 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 180–186. IEEE (2010) Broquedis, F., et al.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: 2010 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 180–186. IEEE (2010)
7.
Zurück zum Zitat Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, p. 53. ACM (2009) Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, p. 53. ACM (2009)
8.
Zurück zum Zitat Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: ACM Sigplan Notices, vol. 33, pp. 212–223. ACM (1998) Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: ACM Sigplan Notices, vol. 33, pp. 212–223. ACM (1998)
9.
Zurück zum Zitat Gautier, T., Lima, J.V., Maillard, N., Raffin, B.: Xkaapi: a runtime system for data-flow task programming on heterogeneous architectures. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1299–1308. IEEE (2013) Gautier, T., Lima, J.V., Maillard, N., Raffin, B.: Xkaapi: a runtime system for data-flow task programming on heterogeneous architectures. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1299–1308. IEEE (2013)
10.
Zurück zum Zitat Grünewald, D., Simmendinger, C.: The GASPI API specification and its implementation GPI 2.0. In: 7th International Conference on PGAS Programming Models, vol. 243 (2013) Grünewald, D., Simmendinger, C.: The GASPI API specification and its implementation GPI 2.0. In: 7th International Conference on PGAS Programming Models, vol. 243 (2013)
11.
Zurück zum Zitat Kleinberg, J.: The small-world phenomenon: an algorithmic perspective. In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pp. 163–170 (2000) Kleinberg, J.: The small-world phenomenon: an algorithmic perspective. In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pp. 163–170 (2000)
12.
Zurück zum Zitat Kleinberg, J., Rubinfeld, R.: Short paths in expander graphs. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 86–95. IEEE (1996) Kleinberg, J., Rubinfeld, R.: Short paths in expander graphs. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 86–95. IEEE (1996)
13.
Zurück zum Zitat Kukanov, A., Voss, M.J.: The foundations for scalable multi-core software in Intel Threading Building Blocks. Intel Technol. J. 11(4), 309–322 (2007)CrossRef Kukanov, A., Voss, M.J.: The foundations for scalable multi-core software in Intel Threading Building Blocks. Intel Technol. J. 11(4), 309–322 (2007)CrossRef
14.
Zurück zum Zitat Lusk, E.L., Pieper, S.C., Butler, R.M., et al.: More scalability, less pain: a simple programming model and its implementation for extreme computing. SciDAC Rev. 17(1), 30–37 (2010) Lusk, E.L., Pieper, S.C., Butler, R.M., et al.: More scalability, less pain: a simple programming model and its implementation for extreme computing. SciDAC Rev. 17(1), 30–37 (2010)
15.
Zurück zum Zitat Machado, R., Lojewski, C., Abreu, S., Pfreundt, F.J.: Unbalanced tree search on a manycore system using the GPI programming model. Comput. Sci. Res. Dev. 26, 229–236 (2011)CrossRef Machado, R., Lojewski, C., Abreu, S., Pfreundt, F.J.: Unbalanced tree search on a manycore system using the GPI programming model. Comput. Sci. Res. Dev. 26, 229–236 (2011)CrossRef
16.
Zurück zum Zitat Michael, M.M.: Scalable lock-free dynamic memory allocation. ACM Sigplan Not. 39(6), 35–46 (2004)CrossRef Michael, M.M.: Scalable lock-free dynamic memory allocation. ACM Sigplan Not. 39(6), 35–46 (2004)CrossRef
17.
Zurück zum Zitat Min, S.J., Iancu, C., Yelick, K.: Hierarchical work stealing on manycore clusters. In: 5th Conference on Partitioned Global Address Space programming Models (2011) Min, S.J., Iancu, C., Yelick, K.: Hierarchical work stealing on manycore clusters. In: 5th Conference on Partitioned Global Address Space programming Models (2011)
19.
Zurück zum Zitat Perarnau, S., Sato, M.: Victim selection and distributed work stealing performance: a case study. In: Parallel and Distributed Processing Symposium, vol. 28. IEEE (2014) Perarnau, S., Sato, M.: Victim selection and distributed work stealing performance: a case study. In: Parallel and Distributed Processing Symposium, vol. 28. IEEE (2014)
21.
Zurück zum Zitat Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J.L., Bernard, J.: A Tighter Analysis of Work Stealing. Angorithms and Computation, pp. 291–302 (2010)CrossRef Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J.L., Bernard, J.: A Tighter Analysis of Work Stealing. Angorithms and Computation, pp. 291–302 (2010)CrossRef
22.
Zurück zum Zitat Woodall, T.S., Shipman, G.M., Bosilca, G., Graham, R.L., Maccabe, A.B.: High performance RDMA protocols in HPC. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) EuroPVM/MPI 2006. LNCS, vol. 4192, pp. 76–85. Springer, Heidelberg (2006). https://doi.org/10.1007/11846802_18CrossRef Woodall, T.S., Shipman, G.M., Bosilca, G., Graham, R.L., Maccabe, A.B.: High performance RDMA protocols in HPC. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) EuroPVM/MPI 2006. LNCS, vol. 4192, pp. 76–85. Springer, Heidelberg (2006). https://​doi.​org/​10.​1007/​11846802_​18CrossRef
Metadaten
Titel
Scalable Work-Stealing Load-Balancer for HPC Distributed Memory Systems
verfasst von
Clement Fontenaille
Eric Petit
Pablo de Oliveira Castro
Seijilo Uemura
Devan Sohier
Piotr Lesnicki
Ghislain Lartigue
Vincent Moureau
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-10549-5_12

Premium Partner