
2019 | Original Paper | Book Chapter

ParLoT: Efficient Whole-Program Call Tracing for HPC Applications

Written by: Saeed Taheri, Sindhu Devale, Ganesh Gopalakrishnan, Martin Burtscher

Published in: Programming and Performance Visualization Tools

Publisher: Springer International Publishing


Abstract

The complexity of HPC software and hardware is quickly increasing. As a consequence, the need for efficient execution tracing to gain insight into HPC application behavior is steadily growing. Unfortunately, available tools either do not produce traces with enough detail or incur large overheads. An efficient tracing method that overcomes the tradeoff between maximum information and minimum overhead is therefore urgently needed. This paper presents such a method and tool, called ParLoT, with the following key features. (1) It describes a technique that makes low-overhead on-the-fly compression of whole-program call traces feasible. (2) It presents a new, efficient, incremental trace-compression approach that reduces the trace volume dynamically, which lowers not only the needed bandwidth but also the tracing overhead. (3) It collects all caller/callee relations, call frequencies, call stacks, as well as the full trace of all calls and returns executed by each thread, including in library code. (4) It works on top of existing dynamic binary instrumentation tools, thus requiring neither source-code modifications nor recompilation. (5) It supports program analysis and debugging at the thread, thread-group, and program level. This paper establishes that comparable capabilities are currently unavailable. Our experiments with the NAS parallel benchmarks running on the Comet supercomputer with up to 1,024 cores show that ParLoT can collect whole-program function-call traces at an average tracing bandwidth of just 56 kB/s per core.
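To make the idea of incremental, on-the-fly trace compression in feature (2) concrete, here is a minimal sketch. It is not ParLoT's algorithm, which the abstract does not specify: zlib stands in for the compressor, and the class name, the 16-bit function IDs, the use of ID 0 for a return, and the chunk size are all assumptions made for illustration. The point it shows is that events are compressed in small chunks as they are produced, so the uncompressed trace never has to be held or written in full.

```python
import struct
import zlib


class IncrementalTraceCompressor:
    """Illustrative only: compress a stream of 16-bit function IDs chunk by chunk."""

    def __init__(self, chunk_events=4096):
        # One streaming compressor per thread; zlib is a stand-in compressor.
        self._compressor = zlib.compressobj(6)
        self._pending = []              # events not yet handed to the compressor
        self._chunk_events = chunk_events
        self.raw_bytes = 0              # uncompressed bytes seen so far
        self.compressed = bytearray()   # compressed output produced so far

    def record(self, function_id):
        """Record one call (ID > 0) or return (ID 0, an assumed convention)."""
        self._pending.append(function_id)
        if len(self._pending) >= self._chunk_events:
            self._compress_pending()

    def _compress_pending(self):
        if not self._pending:
            return
        raw = struct.pack(f"<{len(self._pending)}H", *self._pending)
        self.raw_bytes += len(raw)
        self.compressed += self._compressor.compress(raw)
        self._pending.clear()

    def finish(self):
        """Compress the remaining events and close the stream."""
        self._compress_pending()
        self.compressed += self._compressor.flush()
        return bytes(self.compressed)


def decode(blob):
    """Recover the list of function IDs from a finished compressed trace."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


# Usage: a repetitive call pattern, as loops in HPC codes often produce.
trace = [1, 2, 3, 0, 2, 3, 0] * 10000
tracer = IncrementalTraceCompressor()
for function_id in trace:
    tracer.record(function_id)
blob = tracer.finish()
print(tracer.raw_bytes, len(blob))
```

Because call traces from loop-heavy programs are highly repetitive, even a general-purpose compressor shrinks them considerably in this sketch; a trace-specific scheme such as the one the paper presents would be expected to do better.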


Footnotes
1
Given the absence of tools similar to ParLoT, we employ Callgrind as a “close-enough” tool in the comparisons elaborated in Sect. 4.3. In this capacity, Callgrind is similar to ParLoT(m), a variant of ParLoT that only collects traces from the main image. We perform this comparison to get an idea of how ParLoT fares with respect to one other tool. In Sect. 5, we also present a separate self-assessment of ParLoT.
 
Metadata
Title
ParLoT: Efficient Whole-Program Call Tracing for HPC Applications
Written by
Saeed Taheri
Sindhu Devale
Ganesh Gopalakrishnan
Martin Burtscher
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-17872-7_10