Skip to main content
Erschienen in: Formal Methods in System Design 3/2019

09.10.2019

Annotation guided collection of context-sensitive parallel execution profiles

verfasst von: Zachary Benavides, Keval Vora, Rajiv Gupta, Xiangyu Zhang

Erschienen in: Formal Methods in System Design | Ausgabe 3/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Studying the relative behavior of an application’s threads is critical to identifying performance bottlenecks and understanding their root causes. We present context-sensitive parallel (CSP) execution profiles, that capture the relative behavior of threads in terms of the user selected code regions they execute. CSPs can be analyzed to compute execution times spent by the application in interesting behavior states. To capture execution context, code regions of interest can be given static and dynamic names using a versatile set of annotations. The CSP divides the execution time of a multithreaded application into a sequence of time intervals called frames, during which no thread transitions between code regions. By appropriate selection and naming of code regions, the user can obtain a CSP that captures all occurrences of desired behavior states. We provide the user with a powerful query language to facilitate the analysis of CSPs. Our implementation for collection of CSPs of C++ programs has low overhead and high accuracy. Collection of CSPs of full executions of 12 Parsec programs incurred overhead of at most 7% in execution time. The accuracy of CSPs was validated in the context of common performance problems such as load imbalance in pipeline stages and the presence of straggler threads.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Adhianto L, Banerjee S, Fagan M, Krentel M, Marin G, Mellor-Crummey J, Tallent NR (2010) HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurr Comput Pract Exp 22(6):685–701 Adhianto L, Banerjee S, Fagan M, Krentel M, Marin G, Mellor-Crummey J, Tallent NR (2010) HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurr Comput Pract Exp 22(6):685–701
2.
Zurück zum Zitat Anderson TE, Lazowska ED (1990) Quartz: a tool for tuning parallel program performance. In: Proceedings of the ACM SIGMETRICS conference on measurement and modeling of computer systems, pp 115–125 Anderson TE, Lazowska ED (1990) Quartz: a tool for tuning parallel program performance. In: Proceedings of the ACM SIGMETRICS conference on measurement and modeling of computer systems, pp 115–125
3.
Zurück zum Zitat Benavides Z, Vora K, Gupta R, Zhang X (2017) annotation guided collection of context-sensitive parallel execution profiles. In: International conference on runtime verification, LNCS 10548. Springer, Berlin, pp 103–120CrossRef Benavides Z, Vora K, Gupta R, Zhang X (2017) annotation guided collection of context-sensitive parallel execution profiles. In: International conference on runtime verification, LNCS 10548. Springer, Berlin, pp 103–120CrossRef
4.
Zurück zum Zitat Böhme D, Wolf F, de Supinski BR, Schulz M, Geimer M (2012) Scalable critical-path based performance analysis. In: IEEE International symposium on parallel and distributed processing symposium, pp 1330–1340 Böhme D, Wolf F, de Supinski BR, Schulz M, Geimer M (2012) Scalable critical-path based performance analysis. In: IEEE International symposium on parallel and distributed processing symposium, pp 1330–1340
5.
Zurück zum Zitat Curtsinger C, Berger ED (2015) Coz: finding code that counts with causal profiling. In: Proceedings of the symposium on operating systems principles, pp 184–197 Curtsinger C, Berger ED (2015) Coz: finding code that counts with causal profiling. In: Proceedings of the symposium on operating systems principles, pp 184–197
6.
Zurück zum Zitat David F, Thomas G, Lawall J, Muller G (2014) Continuously measuring critical section pressure with the free-lunch profiler. In: Proceedings of the ACM SIGPLAN international conference on object oriented programming systems languages applications, pp 291–307 David F, Thomas G, Lawall J, Muller G (2014) Continuously measuring critical section pressure with the free-lunch profiler. In: Proceedings of the ACM SIGPLAN international conference on object oriented programming systems languages applications, pp 291–307
7.
Zurück zum Zitat Ding R, Zhou H, Lou JG, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: a cost-aware logging mechanism for performance diagnosis. In: USENIX Annual technical conference, pp 139–150 Ding R, Zhou H, Lou JG, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: a cost-aware logging mechanism for performance diagnosis. In: USENIX Annual technical conference, pp 139–150
8.
Zurück zum Zitat Du Bois K, Sartor JB, Eyerman S, Eeckhout L (2013) Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications. In: Proceedings of the ACM SIGPLAN international conference on object oriented programming systems languages applications, pp 355–372 Du Bois K, Sartor JB, Eyerman S, Eeckhout L (2013) Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications. In: Proceedings of the ACM SIGPLAN international conference on object oriented programming systems languages applications, pp 355–372
9.
Zurück zum Zitat Geimer M, Wolf F, Wylie BJ, Ábrahám E, Becker D, Mohr B (2010) The scalasca performance toolset architecture. Concurr Comput Pract Exp 22(6):702–719 Geimer M, Wolf F, Wylie BJ, Ábrahám E, Becker D, Mohr B (2010) The scalasca performance toolset architecture. Concurr Comput Pract Exp 22(6):702–719
10.
Zurück zum Zitat Graham SL, Kessler PB, Mckusick MK (1982) Gprof: a call graph execution profiler. In: Proceedings of the 1982 SIGPLAN symposium on compiler construction, pp 120–126CrossRef Graham SL, Kessler PB, Mckusick MK (1982) Gprof: a call graph execution profiler. In: Proceedings of the 1982 SIGPLAN symposium on compiler construction, pp 120–126CrossRef
11.
Zurück zum Zitat Hollingsworth JK (1996) An online computation of critical path profiling. In: Proceedings of the SIGMETRICS symposium on parallel and distributed tools, pp 11–20 Hollingsworth JK (1996) An online computation of critical path profiling. In: Proceedings of the SIGMETRICS symposium on parallel and distributed tools, pp 11–20
12.
Zurück zum Zitat Hollingsworth JK, Miller BP (1992) Parallel program performance metrics: a comparison and validation. In: Proceedings of the ACM/IEEE conference on supercomputing, pp 4–13 Hollingsworth JK, Miller BP (1992) Parallel program performance metrics: a comparison and validation. In: Proceedings of the ACM/IEEE conference on supercomputing, pp 4–13
13.
Zurück zum Zitat Hollingsworth JK, Miller BP (1992) Slack: a new performance metric for parallel programs. In: Computer sciences technical report Hollingsworth JK, Miller BP (1992) Slack: a new performance metric for parallel programs. In: Computer sciences technical report
15.
Zurück zum Zitat Jeon D, Garcia S, Louie C, Taylor MB (2011) Kismet: parallel speedup estimates for serial programs. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications, pp 519–536 Jeon D, Garcia S, Louie C, Taylor MB (2011) Kismet: parallel speedup estimates for serial programs. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications, pp 519–536
16.
Zurück zum Zitat Kambadur M, Tang K, Kim MA (2014) Parashares: finding the important basic blocks in multithreaded programs. In: European conference on parallel processing. Springer, Berlin, pp. 75–86 Kambadur M, Tang K, Kim MA (2014) Parashares: finding the important basic blocks in multithreaded programs. In: European conference on parallel processing. Springer, Berlin, pp. 75–86
18.
Zurück zum Zitat Miller BP, Clark M, Hollingsworth J, Kierstead S, Lim SS, Torzewski T (1990) IPS-2: the second generation of a parallel program measurement system. IEEE Trans Parallel Distrib Syst 1(2):206–217CrossRef Miller BP, Clark M, Hollingsworth J, Kierstead S, Lim SS, Torzewski T (1990) IPS-2: the second generation of a parallel program measurement system. IEEE Trans Parallel Distrib Syst 1(2):206–217CrossRef
19.
Zurück zum Zitat Nesheiwat J, Szymanski BK (1998) Instrumentation database for performance analysis of parallel scientific applications. In: International workshop on languages, compilers, and run-time systems for scalable computers. Springer, Berlin, pp 229–242CrossRef Nesheiwat J, Szymanski BK (1998) Instrumentation database for performance analysis of parallel scientific applications. In: International workshop on languages, compilers, and run-time systems for scalable computers. Springer, Berlin, pp 229–242CrossRef
20.
Zurück zum Zitat Nguyen D, Lenharth A, Pingali K (2013) A lightweight infrastructure for graph analytics. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 456–471 Nguyen D, Lenharth A, Pingali K (2013) A lightweight infrastructure for graph analytics. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 456–471
21.
Zurück zum Zitat Oyama Y, Taura K, Yonezawa A (2000) Online computation of critical paths for multithreaded languages. In: International parallel and distributed processing symposium, pp 301–313 Oyama Y, Taura K, Yonezawa A (2000) Online computation of critical paths for multithreaded languages. In: International parallel and distributed processing symposium, pp 301–313
22.
Zurück zum Zitat Shende S, Malony AD, Cuny J, Beckman P, Karmesin S, Lindlan K (1998) Portable profiling and tracing for parallel, scientific applications using C++. In: Proceedings of the SIGMETRICS symposium on parallel and distributed tools, pp 134–145 Shende S, Malony AD, Cuny J, Beckman P, Karmesin S, Lindlan K (1998) Portable profiling and tracing for parallel, scientific applications using C++. In: Proceedings of the SIGMETRICS symposium on parallel and distributed tools, pp 134–145
23.
Zurück zum Zitat Tallent NR, Mellor-Crummey JM, Porterfield A (2010) Analyzing lock contention in multithreaded applications. In: Proceedings of the 15th ACM SIGPLAN symposium on principles and practice of parallel programming, pp 269–280 Tallent NR, Mellor-Crummey JM, Porterfield A (2010) Analyzing lock contention in multithreaded applications. In: Proceedings of the 15th ACM SIGPLAN symposium on principles and practice of parallel programming, pp 269–280
24.
Zurück zum Zitat Truong HL, Fahringer T (2003) SCALEA: a performance analysis tool for parallel programs. Concurr Comput Pract Exp 15(11–12):1001–1025CrossRef Truong HL, Fahringer T (2003) SCALEA: a performance analysis tool for parallel programs. Concurr Comput Pract Exp 15(11–12):1001–1025CrossRef
25.
Zurück zum Zitat Vora K, Gupta R, Xu G (2017) Kickstarter: fast and accurate computations on streaming graphs via trimmed approximations. In: Proceedings of the twenty-second international conference on architectural support for programming languages and operating systems, pp 237–251 Vora K, Gupta R, Xu G (2017) Kickstarter: fast and accurate computations on streaming graphs via trimmed approximations. In: Proceedings of the twenty-second international conference on architectural support for programming languages and operating systems, pp 237–251
26.
Zurück zum Zitat Vora K, Koduru S-C, Gupta R (2014) ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications, pp 861–878 Vora K, Koduru S-C, Gupta R (2014) ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications, pp 861–878
27.
Zurück zum Zitat Yang CQ, Miller BP (1988) Critical path analysis for the execution of parallel and distributed programs. In: Proceedings of the 8th international conference on distributed computing systems, pp 366–373 Yang CQ, Miller BP (1988) Critical path analysis for the execution of parallel and distributed programs. In: Proceedings of the 8th international conference on distributed computing systems, pp 366–373
28.
Zurück zum Zitat Yu X, Han S, Zhang D, Xie T (2014) Comprehending performance from real-world execution traces: a device-driver case. In: Proceedings of the 19th international conference on architectural support for programming languages and operating systems, pp 193–206 Yu X, Han S, Zhang D, Xie T (2014) Comprehending performance from real-world execution traces: a device-driver case. In: Proceedings of the 19th international conference on architectural support for programming languages and operating systems, pp 193–206
29.
Zurück zum Zitat Yuan X, Wu C, Wang Z, Li J, Yew PC, Huang J, Feng X, Lan Y, Chen Y, Guan Y (2015) ReCBuLC: reproducing concurrency bugs using local clocks. In: Proceedings of the 37th international conference on software engineering-volume 1, pp 824–834 Yuan X, Wu C, Wang Z, Li J, Yew PC, Huang J, Feng X, Lan Y, Chen Y, Guan Y (2015) ReCBuLC: reproducing concurrency bugs using local clocks. In: Proceedings of the 37th international conference on software engineering-volume 1, pp 824–834
Metadaten
Titel
Annotation guided collection of context-sensitive parallel execution profiles
verfasst von
Zachary Benavides
Keval Vora
Rajiv Gupta
Xiangyu Zhang
Publikationsdatum
09.10.2019
Verlag
Springer US
Erschienen in
Formal Methods in System Design / Ausgabe 3/2019
Print ISSN: 0925-9856
Elektronische ISSN: 1572-8102
DOI
https://doi.org/10.1007/s10703-019-00341-0

Weitere Artikel der Ausgabe 3/2019

Formal Methods in System Design 3/2019 Zur Ausgabe