ABSTRACT
A key step in program optimization is the determination of optimal values for code optimization parameters such as cache tile sizes and loop unrolling factors. One approach, implemented in most compilers, is to use analytical models to determine these values. The other approach, used in library generators like ATLAS, is to perform a global empirical search over the space of parameter values.

Neither approach is completely suitable for use in general-purpose compilers that must generate high-quality code for large programs running on complex architectures. Model-driven optimization may incur a performance penalty of 10-20% even for a relatively simple code like matrix multiplication. On the other hand, global search is not tractable for optimizing large programs for complex architectures because the optimization space is too large.

In this paper, we advocate a methodology for generating high-performance code without increasing search time dramatically. Our methodology has three components: (i) modeling, (ii) local search, and (iii) model refinement. We demonstrate this methodology by using it to eliminate the performance gap between code produced by a model-driven version of ATLAS described by us in prior work, and code produced by the original ATLAS system using global search.
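The "local search" component above can be illustrated with a small sketch: start from model-predicted parameter values (rather than searching the whole space globally) and hill-climb in a small neighborhood. This is an illustrative hypothetical, not the paper's actual implementation; `simulated_runtime` is a synthetic stand-in for timing the generated code, and the step sizes and parameter names are assumptions.

```python
def simulated_runtime(tile, unroll):
    # Synthetic cost surface with its minimum at tile=64, unroll=4
    # (illustration only; a real system would time generated code).
    return (tile - 64) ** 2 + 10 * (unroll - 4) ** 2

def local_search(seed, measure, max_steps=50):
    """Coordinate descent around a model-predicted seed point."""
    best = seed
    best_cost = measure(*best)
    for _ in range(max_steps):
        improved = False
        # Examine the 4-neighborhood: +/- one step in each parameter.
        for dt, du in ((8, 0), (-8, 0), (0, 1), (0, -1)):
            cand = (max(8, best[0] + dt), max(1, best[1] + du))
            cost = measure(*cand)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
        if not improved:  # local minimum reached
            break
    return best, best_cost

# Suppose the analytical model predicts tile=48, unroll=2;
# local search refines this to the empirically best nearby point.
params, cost = local_search((48, 2), simulated_runtime)
```

Because the search only probes a handful of neighbors per step, its cost grows with the distance from the model's prediction to the optimum rather than with the size of the full parameter space, which is the point of seeding it with a model.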
- Automatically Tuned Linear Algebra Software (ATLAS). http://math-atlas.sourceforge.net/.
- R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers, 2002.
- Gianfranco Bilardi, Paolo D'Alberto, and Alex Nicolau. Fractal matrix multiplication: A case study on portability of cache performance. In Algorithm Engineering: 5th International Workshop, WAE, 2001.
- Stephanie Coleman and Kathryn S. McKinley. Tile size selection using cache organization and data layout. In SIGPLAN Conference on Programming Language Design and Implementation, pages 279--290, 1995.
- Paolo D'Alberto and Alex Nicolau. JuliusC: A practical approach for the analysis of divide-and-conquer algorithms. In LCPC, 2004.
- Jack Dongarra. Personal communication.
- Evelyn Duesterwald, Rajiv Gupta, and Mary Lou Soffa. Register pipelining: An integrated approach to register allocation for scalar and subscripted variables. In Proceedings of the 4th International Conference on Compiler Construction, pages 192--206. Springer-Verlag, 1992.
- Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2), 2005. Special issue on "Program Generation, Optimization, and Adaptation".
- Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285. IEEE Computer Society, 1999.
- Daniel M. Lavery, Pohua P. Chang, Scott A. Mahlke, William Y. Chen, and Wen-mei W. Hwu. The importance of prepass code scheduling for superscalar and superpipelined processors. IEEE Trans. Comput., 44(3):353--370, 1995.
- William Press, Saul Teukolsky, William Vetterling, and Brian Flannery. Numerical Recipes in C. Cambridge University Press, 2002.
- Markus Püschel, José M. F. Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan W. Singer, Jianxin Xiong, Franz Franchetti, Aca Gaĉić, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2), 2005. Special issue on "Program Generation, Optimization, and Adaptation".
- Rafael H. Saavedra and Alan Jay Smith. Measuring cache and TLB performance and their effect on benchmark run times. Technical Report CSD-93-767, February 1993.
- Robert Schreiber and Jack Dongarra. Automatic blocking of nested loops. Technical Report CS-90-108, Knoxville, TN 37996, USA, 1990.
- R. Clint Whaley. http://sourceforge.net/mailarchive/forum.php?thread_id=1569256&forum_id=426.
- R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1-2):3--35, 2001.
- Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibulskis, Gerald DeJong, Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill, and Peng Wu. A comparison of empirical and model-driven optimization. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 63--76. ACM Press, 2003.
- Kamen Yotov, Xiaoming Li, Gang Ren, Maria Garzaran, David Padua, Keshav Pingali, and Paul Stodghill. Is search really necessary to generate high-performance BLAS? Proceedings of the IEEE, 93(2), 2005. Special issue on "Program Generation, Optimization, and Adaptation".
- Kamen Yotov, Keshav Pingali, and Paul Stodghill. Automatic measurement of memory hierarchy parameters. In Proceedings of the 2005 International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '05).