DOI: 10.1145/2464996.2465433
research-article

A new approach for performance analysis of OpenMP programs

Published: 10 June 2013

ABSTRACT

The number of hardware threads is growing with each new generation of multicore chips; thus, one must effectively use threads to fully exploit emerging processors. OpenMP is a popular directive-based programming model that helps programmers exploit thread-level parallelism. In this paper, we describe the design and implementation of a novel performance tool for OpenMP. Our tool distinguishes itself from existing OpenMP performance tools in two principal ways. First, we develop a measurement methodology that attributes blame for work and inefficiency back to program contexts. We show how to integrate prior work on measurement methodologies that employ directed and undirected blame shifting and extend the approach to support dynamic thread-level parallelism in both time-shared and dedicated environments. Second, we develop a novel deferred context resolution method that supports online attribution of performance metrics to full calling contexts within an OpenMP program execution. This approach enables us to collect compact call path profiles for OpenMP program executions without the need for traces. Support for our approach is an integral part of an emerging standard performance tool application programming interface for OpenMP. We demonstrate the effectiveness of our approach by applying our tool to analyze four well-known application benchmarks that cover the spectrum of OpenMP features. In case studies with these benchmarks, insights from our tool helped us significantly improve the performance of these codes.
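The abstract does not spell out the arithmetic behind blame shifting. As a rough illustration only, the sketch below assumes one common formulation of undirected blame shifting: at each sample, every working thread's calling context is charged one unit of work plus an equal share of the threads that are idle at that moment. The function name `attribute_samples`, the snapshot layout, and the frame names are hypothetical and are not taken from the paper or its tool.

```python
from collections import defaultdict


def attribute_samples(samples):
    """Attribute work and idleness to calling contexts (illustrative only).

    samples: list of snapshots. Each snapshot maps a thread id to the
    calling context (a tuple of frame names) of a working thread, or to
    None if that thread is idle. This layout is an assumption made for
    the sketch, not the paper's data format.
    """
    work = defaultdict(float)
    idleness = defaultdict(float)
    for snapshot in samples:
        working = [ctx for ctx in snapshot.values() if ctx is not None]
        idle_count = len(snapshot) - len(working)
        if not working:
            continue  # no working thread to blame in this snapshot
        share = idle_count / len(working)  # idleness split evenly
        for ctx in working:
            work[ctx] += 1.0
            idleness[ctx] += share
    return dict(work), dict(idleness)


# Hypothetical run: one serial sample (3 of 4 threads idle) followed by
# one fully parallel sample.
serial = ("main", "setup")
parallel = ("main", "solve")
samples = [
    {0: serial, 1: None, 2: None, 3: None},
    {0: parallel, 1: parallel, 2: parallel, 3: parallel},
]
work, idleness = attribute_samples(samples)
```

In this hypothetical run the serial context `("main", "setup")` accumulates all of the idleness, which is the kind of signal that points at code keeping other threads waiting rather than at the waiting threads themselves.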


Published in

ICS '13: Proceedings of the 27th ACM International Conference on Supercomputing
June 2013
512 pages
ISBN: 9781450321303
DOI: 10.1145/2464996

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICS '13 paper acceptance rate: 43 of 202 submissions, 21%
Overall acceptance rate: 584 of 2,055 submissions, 28%
