Index Terms
- A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality