- 1. S.G. Abraham et al., Predictability of Load/Store Instruction Latencies, in Proc. of MICRO-26, pp. 139-152, 1993.
- 2. A. Agarwal and S.D. Pudar, Column-Associative Caches: A Technique for Reducing the Miss Rate of Direct-Mapped Caches, in Proc. of the 20th Int. Symp. Comp. Architecture, pp. 179-190, 1993.
- 3. J.-L. Baer and T.-F. Chen, An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty, in Proc. of the Supercomputing '91 Conference, pp. 176-186, 1991.
- 4. S. Carr and K. Kennedy, Compiler Blockability of Numerical Algorithms, in Proc. of the Supercomputing '92 Conference, pp. 114-124, 1992.
- 5. A. Chang, Application of Sparse Matrix Methods in Electric Power System Analysis, in Proc. of the Symp. on Sparse Matrices and their Applications, pp. 113-122, 1969.
- 6. T.-F. Chen and J.-L. Baer, A Performance Study of Software and Hardware Data Prefetching Schemes, in Proc. of the 21st Int. Symp. Comp. Architecture, pp. 223-232, 1994.
- 7. J.W. Fu, J.H. Patel and B.L. Janssens, Stride Directed Prefetching in Scalar Processors, in Proc. of the 25th Int. Symp. on Microarchitecture (MICRO-25), pp. 102-110, 1992.
- 8. D.T. Harper III and D.A. Linebarger, A Dynamic Storage Scheme for Conflict Free Vector Access, in Proc. of the 14th Int. Symp. Comp. Architecture, pp. 72-77, 1987.
- 9. Y. Jegou and O. Temam, Speculative Prefetching, in Proc. of the 1993 Int. Conf. on Supercomputing, pp. 57-66, 1993.
- 10. N.P. Jouppi, Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers, in Proc. of the 17th Int. Symp. Comp. Architecture, pp. 364-373, 1990.
- 11. G. Kurpanek et al., PA7200: A PA-RISC Processor with Integrated High Performance MP Bus Interface, in Proc. of CompCon '94, pp. 375-382, 1994.
- 12. M.S. Lam, E.E. Rothberg and M.E. Wolf, The Cache Performance and Optimization of Blocked Algorithms, in Proc. of ASPLOS 1991, pp. 67-74, 1991.
- 13. S. McFarling, Cache Replacement with Dynamic Exclusion, in Proc. of the 19th Int. Symp. Comp. Architecture, pp. 191-200, 1992.
- 14. A. Seznec, A Case for Two-Way Skewed-Associative Caches, in Proc. of the 20th Int. Symp. Comp. Architecture, pp. 169-178, 1993.
- 15. O. Temam and N. Drach, Software Assistance for Data Caches, in Proc. of the 1st Int. Symp. on High-Performance Computer Architecture, pp. 154-163, 1995.
- 16. O. Temam, E.D. Granston and W. Jalby, To Copy or Not to Copy: A Compile-Time Technique for Assessing When Data Copying Should Be Used to Eliminate Cache Conflicts, in Proc. of the Supercomputing '93 Conference, pp. 410-419, 1993.
- 17. K.B. Theobald, H.J. Hum and G.R. Gao, A Design Framework for Hybrid-Access Caches, in Proc. of the 1st Int. Symp. on High-Performance Computer Architecture, pp. 144-153, 1995.
- 18. M. Valero et al., Increasing the Number of Strides for Conflict-Free Vector Access, in Proc. of the 19th Int. Symp. Comp. Architecture, pp. 372-381, 1992.
- 19. M. Wolfe, Iteration Space Tiling for Memory Hierarchies, in Proc. of the Third SIAM Conference on Parallel Processing for Scientific Computing, Dec. 1987.
- 20. Q. Yang and L.W. Yang, A Novel Cache Design for Vector Processing, in Proc. of the 19th Int. Symp. Comp. Architecture, pp. 362-371, 1992.