The Journal of Supercomputing

Published: 23-05-2020

ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters

Authors: Fang Lin, Yi Liu, Yayu Guo, Depei Qian

Published in: The Journal of Supercomputing | Issue 2/2021

Abstract

Continuous scaling-up of high-performance computing (HPC) systems has made large-scale parallel programs harder to debug and tune. First, to locate a bug or tune performance, programmers often need to execute the program repeatedly at a specified scale, which consumes massive resources. Second, because job scheduling systems are used so widely, programmers can only submit their programs as jobs and cannot interact with them, which restricts debugging efficiency and flexibility. To address these challenges, this paper proposes an emulation system that supports debugging and tuning of large-scale parallel programs by executing them at the desired scale on a small cluster. The program is first executed at the desired scale on the target HPC system to record the necessary information. Programmers can then choose a subset of the program's processes and re-execute it repeatedly on a small cluster; during re-execution the emulation system controls the execution of the processes, and programmers can debug by attaching tools to the selected processes. The system also supports the popular CPU+GPU heterogeneous architecture. The system is evaluated on a small cluster, with a 1000-node system as the target HPC system; experimental results demonstrate the accuracy and efficiency of emulation-execution.
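The two-phase idea in the abstract (record once at full scale, then re-execute only a chosen subset of processes) can be illustrated with a toy sketch. This is not the ELS implementation: the ring-exchange workload, the lock-step scheduling, and the names `update`, `record_run`, and `replay_subset` are all assumptions made for illustration, and real MPI processes are replaced by plain Python state.

```python
# Toy record-and-replay sketch (hypothetical; not the ELS implementation).
# Each "process" is just an integer state; communication is a ring exchange.

def update(state, incoming):
    """Per-rank program logic: fold a received message into local state."""
    return state + incoming


def record_run(size, steps):
    """Phase 1: run all ranks in lock-step and log what each rank receives."""
    states = list(range(size))               # rank r starts with state r
    log = {rank: [] for rank in range(size)}
    for _ in range(steps):
        outgoing = list(states)              # every rank sends its state
        for rank in range(size):
            msg = outgoing[(rank - 1) % size]    # receive from left neighbour
            log[rank].append(msg)                # record the incoming message
            states[rank] = update(states[rank], msg)
    return states, log


def replay_subset(ranks, log, steps):
    """Phase 2: re-execute only the chosen ranks, feeding receives from the log."""
    results = {}
    for rank in ranks:
        state = rank
        for step in range(steps):
            state = update(state, log[rank][step])   # no peers needed
        results[rank] = state
    return results


if __name__ == "__main__":
    full_states, message_log = record_run(size=8, steps=3)
    replayed = replay_subset([2, 5], message_log, steps=3)
    for rank, state in replayed.items():
        print(rank, state, state == full_states[rank])
```

Because the replayed ranks read their inputs from the log rather than from live peers, they can be rerun as often as needed without the other ranks; in this toy, each replayed state matches the state from the full run.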

Title: ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters
Authors: Fang Lin, Yi Liu, Yayu Guo, Depei Qian
Publication date: 23-05-2020
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 2/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-020-03319-6