ABSTRACT
Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this new approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be jettisoned immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications. Experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework. The coverage of the sample space provided by purely random sampling is superior to counter- and timer-based sampling. Also, PHOTON's design reveals that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.
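The abstract's claim that purely random sampling covers the sample space better than counter-based sampling can be illustrated with a toy simulation. This is not PHOTON's implementation; the synthetic message trace and helper functions below are illustrative assumptions only:

```python
import random

def random_sample(events, rate, seed=0):
    """Bernoulli sampling: record each message independently with probability `rate`."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < rate]

def counter_sample(events, period):
    """Counter-based sampling: record every `period`-th message."""
    return events[period - 1::period]

# Illustrative synthetic trace: every 10th message is large (1 MB), the rest small (1 KB).
events = [1 << 20 if i % 10 == 0 else 1024 for i in range(1000)]

# Counter-based sampling with period 10 aliases with the trace's periodicity:
# it lands only on small messages, so its size statistics are badly biased.
periodic = counter_sample(events, 10)

# Random sampling at the same 10% rate covers both message classes.
randomized = random_sample(events, rate=0.1)

true_mean = sum(events) / len(events)
```

In this sketch, `counter_sample` never observes a large message because its period coincides with the trace's own periodicity; communication in iterative scientific codes is often periodic in exactly this way, which is the kind of coverage gap that favors purely random sampling.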
Dynamic statistical profiling of communication activity in distributed applications