ABSTRACT
The number of hardware threads is growing with each new generation of multicore chips; thus, one must effectively use threads to fully exploit emerging processors. OpenMP is a popular directive-based programming model that helps programmers exploit thread-level parallelism. In this paper, we describe the design and implementation of a novel performance tool for OpenMP. Our tool distinguishes itself from existing OpenMP performance tools in two principal ways. First, we develop a measurement methodology that attributes blame for work and inefficiency back to program contexts. We show how to integrate prior work on measurement methodologies that employ directed and undirected blame shifting and extend the approach to support dynamic thread-level parallelism in both time-shared and dedicated environments. Second, we develop a novel deferred context resolution method that supports online attribution of performance metrics to full calling contexts within an OpenMP program execution. This approach enables us to collect compact call path profiles for OpenMP program executions without the need for traces. Support for our approach is an integral part of an emerging standard performance tool application programming interface for OpenMP. We demonstrate the effectiveness of our approach by applying our tool to analyze four well-known application benchmarks that cover the spectrum of OpenMP features. In case studies with these benchmarks, insights from our tool helped us significantly improve the performance of these codes.
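The blame-shifting idea described above can be illustrated with a small arithmetic sketch: when a profiler sample lands in an idle thread, rather than charging the idleness to that thread, its cost is apportioned among the threads doing useful work, so inefficiency is attributed to the contexts responsible for it. The function below is a hypothetical illustration of that apportioning, not the tool's actual API; all names are invented for this sketch.

```c
/* Sketch of undirected blame shifting (illustrative only):
   at a sample event with `total_threads` threads, of which
   `working_threads` are doing useful work, each idle thread's
   sample is split evenly across the working threads, so every
   working context absorbs `idle / working` extra units of blame. */
double blame_per_worker(int total_threads, int working_threads)
{
    int idle = total_threads - working_threads;
    if (working_threads == 0)
        return 0.0;   /* no one to blame: all threads are idle */
    return (double)idle / (double)working_threads;
}
```

For example, with 8 threads of which only 2 are working, the 6 idle samples shift 3 units of blame onto each working thread's calling context; when all threads are working, no blame is shifted.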