ABSTRACT
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In contrast, conventional compilers use models of programs and machines to choose these parameters. It is widely believed that model-driven optimization does not compete with empirical optimization, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the empirical optimization engine in ATLAS (a system for generating a dense numerical linear algebra library called the BLAS) with a model-driven optimization engine that used detailed models to estimate values for optimization parameters, and then measured the relative performance of the two systems on three different hardware platforms. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code whose performance is comparable to that of code generated by empirical optimizers for the BLAS.
- R. Allan and K. Kennedy, Optimizing Compilers for Modern Architectures, Morgan Kaufman Publishing, 2002 Google ScholarDigital Library
- P. Boulet, A. Darte, T. Risset, and Y. Robert, Pen-ultimate tiling?, INTEGRATION, the VLSI Journal, Volume 17, pages 33--51, 1994 Google ScholarDigital Library
- G. Cascaval and D. Padua, Estimating Cache Misses and Locality using Stack Distances. To appear in the International Conference on Supercomputing (ICS), 2003 Google ScholarDigital Library
- P. Clauss, Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs, In proceedings of the International Conference on Supercomputing (ICS), 1996 Google ScholarDigital Library
- S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. In proceedings of Programming Languages Design and Implementation (PLDI), 1995 Google ScholarDigital Library
- J. Dongarra and R. Schreiber, Automatic blocking of nested loops, Technical Report UT-CS-90-108, Department of Computer Science, University of Tennessee, May 1990 Google Scholar
- B. B. Fraguela, R. Doallo, and E. Zapata: Automatic Analytical Modeling for the Estimation of Cache Misses. In proceedings of Parallel Architectures and Compilation Techniques (PACT), pages 221--231, 1999 Google ScholarDigital Library
- M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3, pages 1381--1384, 1998Google ScholarCross Ref
- D. A. Patterson and J. L. Hennessy, Computer Architecture: A Quantitative Approach, 2 Edition, Morgan Kaufman, 1996 Google ScholarDigital Library
- T. Kisuki, P. M. W. Knijnenburg, M. F. P. O'Boyle, and H. A. G. Wijshoff . Iterative compilation in program optimization. In proceedings of Compilers for Parallel Computers, pages (CPC), pages 35--44, 2000Google Scholar
- T. Kisuki, P. M. W. Knijnenburg, M. F. P. O'Boyle, F. Bodin, and H. A. G. Wijshoff. A feasibility study in iterative compilation. In proceedings of the International Symposium on High Performance Computing (ISHPC), 1999 Google ScholarDigital Library
- M. S. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 63--74, 1999 Google ScholarDigital Library
- R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal 9, 2, 78--117, 1970Google ScholarDigital Library
- A. C. McKellar and E. G. Coffman. Organizing matrices and matrix operations for paged memory systems. Communications of the ACM 12, 3, pages 153--165 Google ScholarDigital Library
- K. S. McKinley, S. Carr, and C. Tseng, Improving data locality with loop transformations, In ACM Transactions on Programming Languages and Systems, 18(4):424--453, 1996 Google ScholarDigital Library
- J. J. Navarro, T. Juan, and T. Lang, MOB forms: a class of multilevel block algorithms for dense linear algebra operations, In proceedings of the International Conference on Supercomputing (ICS), pages 354--362, 1994 Google ScholarDigital Library
- J. Ramanujam and P. Sadayappan, Tiling multidimensional iteration spaces for multicomputers, Journal of Parallel and Distributed Computing, 16(2):108--120, 1992Google ScholarCross Ref
- R. Whaley and J. Dongarra. Automatically Tuned Linear Algebra Software. Technical Report UT CS-97-366, LAPACK Working Note No.131, University of Tennessee, 1997 Google Scholar
- M. Wolfe, Iteration space tiling for memory hierarchies, In SIAM Conference on Parallel Processing for Scientific Computing, 1987 Google ScholarDigital Library
- M. E. Wolf, D. E. Maydan, and D. Chen, Combining loop transformations considering caches and scheduling, In proceeding of the International Symposium on Microarchitecture (MICRO 29), pages 274--286, 1996 Google ScholarDigital Library
- J. Xiong, J. Johnson, R. Johnson, and D. Padua, SPL: A Language and its Compiler for DSP Algorithms, In proceedings of Programming Languages Design and Implementation (PLDI), 2001 Google ScholarDigital Library
Index Terms
- A comparison of empirical and model-driven optimization
Recommendations
A comparison of empirical and model-driven optimization
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In contrast, conventional compilers use ...
Velocity-Driven Particle Swarm Optimization
ICCPR '19: Proceedings of the 2019 8th International Conference on Computing and Pattern RecognitionParticle swarm optimization (PSO) is an efficient nature-inspired optimization algorithm, which has been widely applied in many engineering fields. The performance of particle swarm optimization (PSO) has been significantly influenced by velocity update ...
Empirical study of the Bee Colony Optimization (BCO) algorithm
The Bee Colony Optimization (BCO) meta-heuristic deals with combinatorial optimization problems. It is biologically inspired method that explores collective intelligence applied by the honey bees during nectar collecting process. In this paper we ...
Comments