ABSTRACT
Developing an optimizing compiler for a newly proposed architecture is extremely difficult when only a simulator of the machine is available. Designing such a compiler requires running many experiments in order to understand how different optimizations interact. Because simulators are orders of magnitude slower than real processors, such experiments are severely restricted. This paper develops a technique to automatically build a performance model for predicting the impact of program transformations on any architecture, based on a limited number of automatically selected runs. As a result, the time needed to evaluate the impact of any compiler optimization in the early design stages can be drastically reduced, so that all selected potential compiler optimizations can be evaluated. This is achieved by first evaluating a small set of sample compiler optimizations on a prior set of benchmarks in order to train a model, followed by a very small number of evaluations, or probes, of the target program. We show that by training on less than 0.7% of all possible transformations (640 samples collected from 10 benchmarks, out of 880,000 possible samples, 88,000 per training benchmark) and probing the new program on only 4 transformations, we can predict the performance of all program transformations with an average error of just 7.3%. As each prediction takes almost no time to generate, this scheme provides an accurate method of evaluating compiler performance, which is several orders of magnitude faster than current approaches.
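The probe-based scheme the abstract describes can be illustrated with a minimal sketch. This is not the paper's actual model; it is a hypothetical nearest-neighbour variant under assumed inputs: each training benchmark is represented by its full vector of measured speedups over all transformations, a new program is run on only a few probe transformations, and its remaining speedups are predicted as a distance-weighted blend of the training benchmarks' vectors.

```python
import numpy as np

def predict_speedups(train_matrix, probe_ids, probe_speedups):
    """Sketch of probe-based performance prediction (hypothetical helper).

    train_matrix:   (n_benchmarks, n_transformations) speedups measured
                    offline on the training benchmarks.
    probe_ids:      indices of the few transformations actually run on
                    the new program.
    probe_speedups: measured speedups of the new program on those probes.
    Returns a predicted speedup for every transformation.
    """
    # Each training benchmark's "reaction" to the same probe transformations.
    probes = train_matrix[:, probe_ids]
    # Benchmarks whose probe reactions resemble the new program's count more.
    dist = np.linalg.norm(probes - probe_speedups, axis=1)
    weights = 1.0 / (dist + 1e-9)
    weights /= weights.sum()
    # Blend the full reaction vectors of the training benchmarks.
    pred = weights @ train_matrix
    # Keep the directly measured probe values exact.
    pred[probe_ids] = probe_speedups
    return pred
```

With 4 probes, as in the paper's setup, only 4 simulator runs of the new program are needed per prediction; everything else comes from the offline training data.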
Index Terms
- Automatic performance model construction for the fast software exploration of new hardware designs