ABSTRACT
When dense matrix computations are too large to fit in cache, previous research proposes tiling to reduce or eliminate capacity misses. This paper presents a new algorithm for choosing problem-size dependent tile sizes based on the cache size and cache line size for a direct-mapped cache. The algorithm eliminates both capacity and self-interference misses and reduces cross-interference misses. We measured simulated miss rates and execution times for our algorithm and two others on a variety of problem sizes and cache organizations. At higher set associativity, our algorithm does not always achieve the best performance. However on direct-mapped caches, our algorithm improves simulated miss rates and measured execution times when compared with previous work.
- BJWE92.F. Bodin, W. Jalby, D. Windheiser, and C. Eisenbeis. A quantitative algorithm for data locality optimzation. In Code Generation-Concepts, Tools, Techniques. Springer-Verlag, 1992.Google Scholar
- CK92.S. Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Proceedings of Supercomputing '92, Minneapolis, MN, November 1992. Google ScholarDigital Library
- CL95.S. Carr and R. B. Lehoucq, A compiler-blockable algorithm for QR decomposition. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, CA, February 1995.Google Scholar
- CMT94.S. Carr, K. S. MCKinley, and C, Tseng, Compiler optimizations for improving data locahty. In Proceedings of the Sixth hzternationaI Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October 1994. Google ScholarDigital Library
- Col94.S. Coleman. Selecting tile sizes based on cache and data organization. Master's thes~s, Dept. of Computer Science, University of Massachusetts, Amherst, September 1994.Google Scholar
- Ess93.K. Esseghir. Improving data locality for caches. Master's thesis, Dept. of Computer Science, Rice University, September 1993.Google Scholar
- GJG88.D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Dis* tributed Computing, 5(5):587-616, October 1988. Google ScholarDigital Library
- IT88.F. Irigoin and R. Triolet. Supernode partitioning. In Proceedings of the Fifteenth Annual A CM Symposium on the Principles of Programming Languages, San Diego, CA, January 1988. Google ScholarDigital Library
- Kob87.Neal Koblitz. Graduate Texts in Mathematics. Springer-Verlag, New York, 1987.Google Scholar
- LRW91.M. Lam, E. Rothberg, and M. E. Wolf. The cache performance and opt~mizations of blocked algorithms. In Proceedings of the Fourth bzternational Conference on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, April 1991. Google ScholarDigital Library
- NJL94.J.J. Navarro, T Juan, and T Lang. Mob forms: A class of multilevel block algorithms for dense linear algebra operations. In Proceedings of the !994 A CM International Conference on Supercomputing, pages 354-363, Manchester, England, June 1994. Google ScholarDigital Library
- Smi82.A.J. Smith. Cache memories. Computing Surveys, 14(3):473-530, September 1982. Google ScholarDigital Library
- TGJ93.O. Temam, E. Granston, and W. Jalby. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing '93, Portland, OR, November 1993. Google ScholarDigital Library
- WL91.M.E. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and implementation, Toronto, Canada, June 1991. Google ScholarDigital Library
- Wol89.M.J. Wolfe. More iteration space tiling. In Proceedings of Supercomputing '89, pages 655-664, Reno, NV, November 1989. Google ScholarDigital Library
Index Terms
- Tile size selection using cache organization and data layout
Recommendations
Tile size selection using cache organization and data layout
When dense matrix computations are too large to fit in cache, previous research proposes tiling to reduce or eliminate capacity misses. This paper presents a new algorithm for choosing problem-size dependent tile sizes based on the cache size and cache ...
Heterogeneous way-size cache
ICS '06: Proceedings of the 20th annual international conference on SupercomputingSet-associative cache architectures are commonly used. These caches consist of a number of ways, each of the same size. We have observed that the different ways have very different utilization, which motivates the design of caches with heterogeneous way ...
Comments