Published in: The Journal of Supercomputing 2/2021

23-05-2020

ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters

Authors: Fang Lin, Yi Liu, Yayu Guo, Depei Qian

Abstract

The continuous scaling-up of high-performance computing systems has brought challenges to the debugging and tuning of large-scale parallel programs. First, to locate bugs in a program or tune its performance, programmers often need to execute the program repeatedly at a specified scale, which consumes massive resources; second, because job scheduling systems are so widely used, programmers can only submit their programs as jobs and cannot interact with them, which restricts debugging efficiency and flexibility. To address these challenges, this paper proposes an emulation system that supports debugging and tuning of large-scale parallel programs by executing them at the desired scale on a small cluster. The program is first executed at the desired scale on the target HPC system to record the necessary information; programmers can then choose a subset of the program's processes and re-execute it repeatedly on a small cluster, during which the emulation system controls the execution of the processes, and programmers can debug their programs by attaching tools to the selected processes. Moreover, the system supports the popular CPU+GPU heterogeneous architecture. The system is evaluated on a small cluster, with a 1000-node system used as the target HPC system; experimental results demonstrate the accuracy and efficiency of emulation-execution.
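
The record-then-re-execute workflow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say what information is recorded, so the sketch assumes it is the messages each process receives, and the names `record_run` and `replay_rank` and the ring-shaped toy program are hypothetical.

```python
from collections import defaultdict, deque


def record_run(size):
    """Full-scale run (stand-in for the target HPC system).

    Each rank sends rank**2 to its right neighbour in a ring and adds
    whatever arrives from its left neighbour. Every delivered message
    is logged per (source, destination) pair. Assumption: the recorded
    information is the stream of received messages.
    """
    log = defaultdict(deque)
    for rank in range(size):
        log[(rank, (rank + 1) % size)].append(rank * rank)
    results = {
        rank: rank * rank + log[((rank - 1) % size, rank)][0]
        for rank in range(size)
    }
    return log, results


def replay_rank(rank, size, log):
    """Re-execute one selected rank (stand-in for the small cluster).

    The left neighbour is not running; its message is served from the
    recorded log instead, so the selected rank sees the same input it
    saw in the full-scale run.
    """
    inbox = deque(log[((rank - 1) % size, rank)])  # copy, so replay is repeatable
    own_value = rank * rank                        # recomputed locally
    return own_value + inbox.popleft()


if __name__ == "__main__":
    log, full_results = record_run(size=8)
    # Replay only rank 3 and compare against the full-scale result.
    print(replay_rank(3, 8, log), full_results[3])
```

Because the replayed rank reads from a copy of the log, it can be re-run as often as needed, which is the property that makes repeated debugging sessions cheap in a scheme of this kind.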

Metadata
Title
ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters
Authors
Fang Lin
Yi Liu
Yayu Guo
Depei Qian
Publication date
23-05-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03319-6
