2016 | OriginalPaper | Chapter

FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing

Authors: Carsten Weinhold, Adam Lackorzynski, Jan Bierbaum, Martin Küttler, Maksym Planeta, Hermann Härtig, Amnon Shiloh, Ely Levy, Tal Ben-Nun, Amnon Barak, Thomas Steinke, Thorsten Schütt, Jan Fajerski, Alexander Reinefeld, Matthias Lieber, Wolfgang E. Nagel

Published in: Software for Exascale Computing - SPPEXA 2013-2015

Publisher: Springer International Publishing

Abstract

In this paper we describe the hardware and application-inherent challenges that future exascale systems pose to high-performance computing (HPC) and propose a system architecture that addresses them. This architecture is based on proven building blocks and a few principles: (1) a fast light-weight kernel that is supported by a virtualized Linux for tasks that are not performance critical, (2) decentralized load and health management using fault-tolerant gossip-based information dissemination, (3) a maximally-parallel checkpoint store for cheap checkpoint/restart in the presence of frequent component failures, and (4) a runtime that enables applications to interact with the underlying system platform through new interfaces. The paper discusses the vision behind FFMK and the current state of a prototype implementation of the system, which is based on a microkernel and an adapted MPI runtime.
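
As a rough illustration of principle (2), the following minimal Python sketch simulates push-style gossip in which every node keeps a timestamped view of the load of all nodes and forwards that view to one random peer per round. It is not the FFMK implementation: the function names (`gossip_round`, `simulate`, `suspects`), the parameters, the sequential round model, and the age-based failure heuristic are all assumptions made for this example.

```python
import random


def gossip_round(views, loads, alive, round_no, rng):
    """Run one push-gossip round, mutating `views` in place.

    Nodes are processed sequentially for simplicity (an assumption of this
    sketch); a real system would send messages concurrently.
    """
    n = len(views)
    for node in range(n):
        if not alive[node]:
            continue  # a failed node neither refreshes its entry nor sends
        # Refresh this node's own entry, stamped with the current round.
        views[node][node] = (round_no, loads[node])
        # Push the whole view to one uniformly chosen other node.
        peer = rng.choice([p for p in range(n) if p != node])
        if not alive[peer]:
            continue  # a message to a failed node is simply lost
        for origin, entry in views[node].items():
            known = views[peer].get(origin)
            if known is None or entry[0] > known[0]:
                views[peer][origin] = entry  # keep the freshest entry


def simulate(n=16, rounds=30, fail_node=3, fail_at=10, seed=0):
    """Simulate `rounds` rounds; `fail_node` stops working in round `fail_at`."""
    rng = random.Random(seed)
    loads = [rng.random() for _ in range(n)]  # hypothetical per-node load
    alive = [True] * n
    views = [{} for _ in range(n)]
    for round_no in range(1, rounds + 1):
        if round_no == fail_at:
            alive[fail_node] = False
        gossip_round(views, loads, alive, round_no, rng)
    return views


def suspects(view, now, max_age):
    """Return the nodes whose entry in `view` is older than `max_age` rounds."""
    return sorted(o for o, (stamp, _) in view.items() if now - stamp > max_age)


if __name__ == "__main__":
    views = simulate()
    print("node 0 suspects:", suspects(views[0], now=30, max_age=20))
```

Because a failed node stops refreshing its own entry, its timestamp stops advancing in every other node's view, so entry age can serve as a decentralized health signal without any central monitor. The choice of `max_age` trades detection latency against false suspicions of nodes whose updates merely spread slowly.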

COSMO-SPECS+FD4 has an internal load balancer, which we disabled in the experiments described here.

Print ISBN: 978-3-319-40526-1

Electronic ISBN: 978-3-319-40528-5

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-40528-5_18
