2019 | Original Paper | Book Chapter

ParLoT: Efficient Whole-Program Call Tracing for HPC Applications

Authors: Saeed Taheri, Sindhu Devale, Ganesh Gopalakrishnan, Martin Burtscher

Published in: Programming and Performance Visualization Tools

Publisher: Springer International Publishing

Abstract

The complexity of HPC software and hardware is quickly increasing. As a consequence, the need for efficient execution tracing to gain insight into HPC application behavior is steadily growing. Unfortunately, available tools either do not produce traces with enough detail or incur large overheads. An efficient tracing method that overcomes the tradeoff between maximum information and minimum overhead is therefore urgently needed. This paper presents such a method and tool, called ParLoT, with the following key features. (1) It describes a technique that makes low-overhead on-the-fly compression of whole-program call traces feasible. (2) It presents a new, efficient, incremental trace-compression approach that reduces the trace volume dynamically, which lowers not only the needed bandwidth but also the tracing overhead. (3) It collects all caller/callee relations, call frequencies, call stacks, as well as the full trace of all calls and returns executed by each thread, including in library code. (4) It works on top of existing dynamic binary instrumentation tools, thus requiring neither source-code modifications nor recompilation. (5) It supports program analysis and debugging at the thread, thread-group, and program level. This paper establishes that comparable capabilities are currently unavailable. Our experiments with the NAS parallel benchmarks running on the Comet supercomputer with up to 1,024 cores show that ParLoT can collect whole-program function-call traces at an average tracing bandwidth of just 56 kB/s per core.
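
The incremental compression idea in feature (2) can be illustrated with a small sketch. It is not ParLoT's implementation: the abstract does not specify the compression algorithm, so Python's standard `zlib` streaming compressor stands in for it, and the 16-bit function IDs, the `RETURN_MARKER` value, the `IncrementalTraceWriter` class, and the synthetic trace are all hypothetical names chosen for illustration. The sketch only shows how a per-thread stream of call and return events might be compressed chunk by chunk while the program runs, so that the full uncompressed trace never has to be stored.

```python
import struct
import zlib

RETURN_MARKER = 0  # hypothetical encoding: 0 marks a return, IDs >= 1 mark calls


class IncrementalTraceWriter:
    """Compresses one thread's stream of 16-bit function IDs chunk by chunk."""

    def __init__(self, chunk_events=1024):
        # zlib is a stand-in here, not the compressor used by ParLoT.
        self._compressor = zlib.compressobj(level=6)
        self._pending = []
        self._chunk_events = chunk_events
        self.compressed = bytearray()
        self.raw_bytes = 0

    def record(self, function_id):
        """Buffer one event and compress the buffer once a chunk is full."""
        self._pending.append(function_id)
        if len(self._pending) >= self._chunk_events:
            self._compress_pending(final=False)

    def _compress_pending(self, final):
        # Pack the buffered IDs as little-endian unsigned 16-bit integers.
        raw = struct.pack(f"<{len(self._pending)}H", *self._pending)
        self.raw_bytes += len(raw)
        self._pending.clear()
        self.compressed += self._compressor.compress(raw)
        if final:
            self.compressed += self._compressor.flush()

    def close(self):
        """Compress any remaining events and return the finished stream."""
        self._compress_pending(final=True)
        return bytes(self.compressed)


def decode_trace(blob):
    """Recover the original list of function IDs from a compressed stream."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def synthetic_trace(iterations):
    """A loop in which function 1 calls 2, which calls 3, then all return."""
    events = []
    for _ in range(iterations):
        events += [1, 2, 3, RETURN_MARKER, RETURN_MARKER, RETURN_MARKER]
    return events


events = synthetic_trace(5000)
writer = IncrementalTraceWriter()
for function_id in events:
    writer.record(function_id)
blob = writer.close()
restored = decode_trace(blob)
ratio = writer.raw_bytes / len(blob)  # compression ratio of this toy trace
```

Call traces of loop-heavy programs are highly repetitive, which is why streams like this toy one compress well; the ratio obtained here says nothing about ParLoT's measured results. For scale, if every core traced at the reported average of 56 kB/s, 1,024 cores would produce 56 × 1,024 = 57,344 kB/s, or roughly 57 MB/s in aggregate.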

Given the absence of tools similar to ParLoT, we use Callgrind as a “close-enough” tool in the comparisons described in Sect. 4.3. In this capacity, Callgrind is similar to ParLoT(m), a variant of ParLoT that collects traces only from the main image. The comparison gives a sense of how ParLoT fares relative to at least one other tool. Sect. 5 separately presents a self-assessment of ParLoT.

Title: ParLoT: Efficient Whole-Program Call Tracing for HPC Applications
Authors: Saeed Taheri, Sindhu Devale, Ganesh Gopalakrishnan, Martin Burtscher
Publisher: Springer International Publishing
Book: Programming and Performance Visualization Tools
Print ISBN: 978-3-030-17871-0
Electronic ISBN: 978-3-030-17872-7
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-17872-7_10
