ABSTRACT
Developing an optimizing compiler for a newly proposed architecture is extremely difficult when only a simulator of the machine is available. Designing such a compiler requires running many experiments in order to understand how different optimizations interact. Because simulators are orders of magnitude slower than real processors, such experiments are severely restricted. This paper develops a technique to automatically build a performance model for predicting the impact of program transformations on any architecture, based on a limited number of automatically selected runs. As a result, the time needed to evaluate the impact of any compiler optimization in the early design stages can be drastically reduced, so that all selected potential compiler optimizations can be evaluated. This is achieved by first evaluating a small set of sample compiler optimizations on a prior set of benchmarks in order to train a model, followed by a very small number of evaluations, or probes, of the target program. We show that by training on less than 0.7% of all possible transformations (640 samples collected from 10 benchmarks, out of 880,000 possible samples, 88,000 per training benchmark) and probing the new program on only 4 transformations, we can predict the performance of all program transformations with an average error of just 7.3%. As each prediction takes almost no time to generate, this scheme provides an accurate method of evaluating compiler performance, which is several orders of magnitude faster than current approaches.
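The probe-based scheme the abstract describes can be illustrated with a minimal sketch. This is not the paper's actual model; it is a hypothetical nearest-neighbour variant under assumed inputs: each training benchmark is represented by its full vector of measured speedups over all transformations, a new program is run on only a few probe transformations, and its remaining speedups are predicted as a distance-weighted blend of the training benchmarks' vectors.

```python
import numpy as np

def predict_speedups(train_matrix, probe_ids, probe_speedups):
    """Sketch of probe-based performance prediction (hypothetical helper).

    train_matrix:   (n_benchmarks, n_transformations) speedups measured
                    offline on the training benchmarks.
    probe_ids:      indices of the few transformations actually run on
                    the new program.
    probe_speedups: measured speedups of the new program on those probes.
    Returns a predicted speedup for every transformation.
    """
    # Each training benchmark's "reaction" to the same probe transformations.
    probes = train_matrix[:, probe_ids]
    # Benchmarks whose probe reactions resemble the new program's count more.
    dist = np.linalg.norm(probes - probe_speedups, axis=1)
    weights = 1.0 / (dist + 1e-9)
    weights /= weights.sum()
    # Blend the full reaction vectors of the training benchmarks.
    pred = weights @ train_matrix
    # Keep the directly measured probe values exact.
    pred[probe_ids] = probe_speedups
    return pred
```

With 4 probes, as in the paper's setup, only 4 simulator runs of the new program are needed per prediction; everything else comes from the offline training data.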
Index Terms
- Automatic performance model construction for the fast software exploration of new hardware designs