Compiler optimizations for improving data locality

Authors:
Steve Carr, Department of Computer Science, Michigan Technological University
Kathryn S. McKinley, Department of Computer Science, University of Massachusetts
Chau-Wen Tseng, Computer Systems Laboratory, Stanford University

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, November 1994, Pages 252–262
https://doi.org/10.1145/195473.195557

Published: 01 November 1994

ABSTRACT

In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this paper, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. We demonstrate that these program transformations are useful for optimizing many programs.
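As a rough illustration of how such a cost model can rank loop organizations, the sketch below estimates the number of cache lines a loop nest touches for each candidate innermost loop. It is a simplified approximation, not the authors' implementation: the function names (`ref_cost`, `loop_cost`), the reuse labels, the matrix size, and the cache line size of 4 elements are all assumptions chosen for the example, which is a column-major matrix multiply `C(i,j) = C(i,j) + A(i,k) * B(k,j)`.

```python
from math import ceil

def ref_cost(reuse, trip, line_elems):
    """Estimated cache lines touched by one array reference while the
    candidate innermost loop runs `trip` iterations (assumed model)."""
    if reuse == "invariant":      # same element every iteration: temporal reuse
        return 1
    if reuse == "consecutive":    # unit stride: spatial reuse within a line
        return ceil(trip / line_elems)
    return trip                   # no reuse: a new line on every iteration

def loop_cost(reuses, trip, outer_iters, line_elems):
    """Total estimated cache lines when the candidate loop is innermost."""
    return outer_iters * sum(ref_cost(r, trip, line_elems) for r in reuses)

N = 64          # hypothetical matrix dimension
LINE_ELEMS = 4  # hypothetical cache line size, in array elements

# Reuse of C(i,j), A(i,k), B(k,j) with respect to each candidate innermost
# loop, assuming column-major (Fortran) storage.
candidates = {
    "i": ["consecutive", "consecutive", "invariant"],
    "k": ["invariant", "none", "consecutive"],
    "j": ["none", "invariant", "none"],
}

costs = {loop: loop_cost(r, N, N * N, LINE_ELEMS) for loop, r in candidates.items()}
best_innermost = min(costs, key=costs.get)
print(costs, best_innermost)
```

Under these assumptions the loop with the lowest estimate is the one a permutation step would try to place innermost, subject to data dependences, which this sketch does not check.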

To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments with kernels illustrate that our model and algorithm can select and achieve the best performance. For over thirty complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches; however, our optimizations significantly improved several programs.
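The effect of loop order on a simulated hit rate can be sketched with a toy direct-mapped cache. This is only an illustration under stated assumptions, not the simulator or cache configuration used in the paper: the cache geometry (64 lines of 4 elements), the array size, and the helper names `hit_rate` and `column_major_trace` are hypothetical.

```python
def hit_rate(addresses, num_lines=64, line_elems=4):
    """Hit rate of a direct-mapped cache over a trace of element addresses."""
    tags = [None] * num_lines
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // line_elems   # memory block holding this element
        index = block % num_lines    # cache line that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block      # miss: load the block, evicting the old one
        total += 1
    return hits / total

def column_major_trace(n, inner):
    """Addresses touched when sweeping an n x n column-major array A(i,j),
    with either i or j as the innermost loop (address = i + j*n)."""
    if inner == "i":
        return [i + j * n for j in range(n) for i in range(n)]
    return [i + j * n for i in range(n) for j in range(n)]

n = 128  # hypothetical array dimension
unit_stride = hit_rate(column_major_trace(n, "i"))
large_stride = hit_rate(column_major_trace(n, "j"))
print(unit_stride, large_stride)
```

With the unit-stride order, three of every four accesses hit the line loaded by the first; with the strided order, successive accesses keep evicting each other in this particular geometry. Real programs mix many references and loop nests, so measured differences are typically far less extreme.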

Published in
ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems. November 1994. 341 pages. ISBN: 0897916603. DOI: 10.1145/195473. Chairmen: Forest Baskett (Silicon Graphics), Douglas Clark (Princeton Univ.)
ACM SIGOPS Operating Systems Review, Volume 28, Issue 5. Dec. 1994. 323 pages. ISSN: 0163-5980. DOI: 10.1145/381792. Chairman: Henry M. Levy (Univ. of Washington, Seattle)
ACM SIGPLAN Notices, Volume 29, Issue 11. Nov. 1994. 323 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/195470. Editor: Richard L. Wexelblat (Washington D.C.)

Copyright © 1994 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%