Article

A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Authors:
Zhao Zhang, Department of Computer Science, College of William and Mary, Williamsburg, VA
Zhichun Zhu
Xiaodong Zhang

MICRO 33: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, December 2000, pages 32–41
https://doi.org/10.1145/360128.360134

Published: 01 December 2000

Supplemental Material: p32-zhang.ps (836.1 KB)

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory
  2. Robustness
2. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
      1. Permutations and combinations

Published in: MICRO 33: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
December 2000
357 pages
ISBN: 1581131968
DOI: 10.1145/360128
Chairmen: Andrew Wolfe (S3 Incorporated), Michael Schlansker (Hewlett-Packard)
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers: Article
Acceptance Rates
MICRO 33 paper acceptance rate: 31 of 110 submissions, 28%
Overall acceptance rate: 484 of 2,242 submissions, 22%

Article Metrics
- Total citations: 170
- Total downloads: 1,615
- Downloads (last 12 months): 132
- Downloads (last 6 weeks): 11