Tile size selection using cache organization and data layout

Authors:
Stephanie Coleman

Intermetrics, Inc., 733 Concord Ave., Cambridge, MA

Intermetrics, Inc., 733 Concord Ave., Cambridge, MA
View Profile

,
Kathryn S. McKinley

Computer Science, LGRC, University of Massachusetts, Amherst, MA

Computer Science, LGRC, University of Massachusetts, Amherst, MA
View Profile

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementationJune 1995Pages 279–290https://doi.org/10.1145/207110.207162

Published:01 June 1995Publication History

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Pages 279–290

ABSTRACT

When dense matrix computations are too large to fit in cache, previous research proposes tiling to reduce or eliminate capacity misses. This paper presents a new algorithm for choosing problem-size dependent tile sizes based on the cache size and cache line size for a direct-mapped cache. The algorithm eliminates both capacity and self-interference misses and reduces cross-interference misses. We measured simulated miss rates and execution times for our algorithm and two others on a variety of problem sizes and cache organizations. At higher set associativity, our algorithm does not always achieve the best performance. However on direct-mapped caches, our algorithm improves simulated miss rates and measured execution times when compared with previous work.

References

BJWE92.F. Bodin, W. Jalby, D. Windheiser, and C. Eisenbeis. A quantitative algorithm for data locality optimzation. In Code Generation-Concepts, Tools, Techniques. Springer-Verlag, 1992.Google Scholar
CK92.S. Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Proceedings of Supercomputing '92, Minneapolis, MN, November 1992. Google ScholarDigital Library
CL95.S. Carr and R. B. Lehoucq, A compiler-blockable algorithm for QR decomposition. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, CA, February 1995.Google Scholar
CMT94.S. Carr, K. S. MCKinley, and C, Tseng, Compiler optimizations for improving data locahty. In Proceedings of the Sixth hzternationaI Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October 1994. Google ScholarDigital Library
Col94.S. Coleman. Selecting tile sizes based on cache and data organization. Master's thes~s, Dept. of Computer Science, University of Massachusetts, Amherst, September 1994.Google Scholar
Ess93.K. Esseghir. Improving data locality for caches. Master's thesis, Dept. of Computer Science, Rice University, September 1993.Google Scholar
GJG88.D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Dis* tributed Computing, 5(5):587-616, October 1988. Google ScholarDigital Library
IT88.F. Irigoin and R. Triolet. Supernode partitioning. In Proceedings of the Fifteenth Annual A CM Symposium on the Principles of Programming Languages, San Diego, CA, January 1988. Google ScholarDigital Library
Kob87.Neal Koblitz. Graduate Texts in Mathematics. Springer-Verlag, New York, 1987.Google Scholar
LRW91.M. Lam, E. Rothberg, and M. E. Wolf. The cache performance and opt~mizations of blocked algorithms. In Proceedings of the Fourth bzternational Conference on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, April 1991. Google ScholarDigital Library
NJL94.J.J. Navarro, T Juan, and T Lang. Mob forms: A class of multilevel block algorithms for dense linear algebra operations. In Proceedings of the !994 A CM International Conference on Supercomputing, pages 354-363, Manchester, England, June 1994. Google ScholarDigital Library
Smi82.A.J. Smith. Cache memories. Computing Surveys, 14(3):473-530, September 1982. Google ScholarDigital Library
TGJ93.O. Temam, E. Granston, and W. Jalby. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing '93, Portland, OR, November 1993. Google ScholarDigital Library
WL91.M.E. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and implementation, Toronto, Canada, June 1991. Google ScholarDigital Library
Wol89.M.J. Wolfe. More iteration space tiling. In Proceedings of Supercomputing '89, pages 655-664, Reno, NV, November 1989. Google ScholarDigital Library

Index Terms

Tile size selection using cache organization and data layout
1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Tile size selection using cache organization and data layout

When dense matrix computations are too large to fit in cache, previous research proposes tiling to reduce or eliminate capacity misses. This paper presents a new algorithm for choosing problem-size dependent tile sizes based on the cache size and cache ...
Read More
Heterogeneous way-size cache
ICS '06: Proceedings of the 20th annual international conference on Supercomputing

Set-associative cache architectures are commonly used. These caches consist of a number of ways, each of the same size. We have observed that the different ways have very different utilization, which motivates the design of caches with heterogeneous way ...
Read More
A split data cache organization based on run-time data locality estimation
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
June 1995
335 pages
ISBN:0897916972
DOI:10.1145/207110
Chairman:
David W. Wall
Digital—Western Research Laboratory
ACM SIGPLAN Notices Volume 30, Issue 6
June 1995
327 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/223428
Editors:
Brent T. Hailpern
IBM T. J. Watson Research Center, Yorktown Heoghts, NY
,
Richard L. Wexelblat
IRS IS:AO, Washington, DC
Issue’s Table of Contents
Copyright © 1995 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1995
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
PLDI '95 Paper Acceptance Rate28of105submissions,27%Overall Acceptance Rate406of2,067submissions,20%
More
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 370
  Total Citations
  View Citations
- 1,213
  Total Downloads
- Downloads (Last 12 months)162
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Tile size selection using cache organization and data layout

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Tile size selection using cache organization and data layout

Heterogeneous way-size cache

A split data cache organization based on run-time data locality estimation