
Coordinated control of multiple prefetchers in multi-core systems

Authors:
Eiman Ebrahimi, The University of Texas at Austin
Onur Mutlu, Carnegie Mellon University
Chang Joo Lee, The University of Texas at Austin
Yale N. Patt, The University of Texas at Austin

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, December 2009, Pages 316–326, https://doi.org/10.1145/1669112.1669154

Published: 12 December 2009

ABSTRACT

Aggressive prefetching is very beneficial for memory latency tolerance of many applications. However, it faces significant challenges in multi-core systems. Prefetchers of different cores on a chip multiprocessor (CMP) can cause significant interference with prefetch and demand accesses of other cores. Because existing prefetcher throttling techniques do not address this prefetcher-caused inter-core interference, aggressive prefetching in multi-core systems can lead to significant performance degradation and wasted bandwidth consumption.

To make prefetching effective in CMPs, this paper proposes a low-cost mechanism to control prefetcher-caused inter-core interference by dynamically adjusting the aggressiveness of multiple cores' prefetchers in a coordinated fashion. Our solution consists of a hierarchy of prefetcher aggressiveness control structures that combine per-core (local) and prefetcher-caused inter-core (global) interference feedback to maximize the benefits of prefetching on each core while optimizing overall system performance. These structures improve system performance by 23% while reducing bus traffic by 17% compared to employing aggressive prefetching and improve system performance by 14% compared to a state-of-the-art prefetcher aggressiveness control technique on an eight-core system.
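
The two-level idea described above can be illustrated with a minimal sketch. It is not the paper's mechanism: the abstract does not specify the feedback metrics, thresholds, or aggressiveness levels, so the names `CoreFeedback`, `local_decision`, `global_decision`, and `coordinate`, the five-level scale, and all threshold values below are assumptions chosen only for illustration. A per-core (local) step proposes a level from that core's own prefetch accuracy, and a global step overrides the proposal when the core's prefetches interfere too much with other cores.

```python
from dataclasses import dataclass

# Hypothetical aggressiveness scale: 1 = most conservative, 5 = most aggressive.
MIN_LEVEL, MAX_LEVEL = 1, 5


@dataclass
class CoreFeedback:
    accuracy: float      # local: fraction of this core's prefetches that were useful
    interference: float  # global: assumed measure of harm done to other cores


def local_decision(fb, level, hi_acc=0.75, lo_acc=0.40):
    """Per-core control: propose a level from the core's own accuracy.
    The thresholds are illustrative assumptions."""
    if fb.accuracy >= hi_acc:
        return min(level + 1, MAX_LEVEL)
    if fb.accuracy < lo_acc:
        return max(level - 1, MIN_LEVEL)
    return level


def global_decision(fb, level, proposed, max_interference=0.20):
    """Global control: accept the local proposal unless this core's
    prefetcher interferes too much, in which case throttle it down."""
    if fb.interference > max_interference:
        return max(level - 1, MIN_LEVEL)
    return proposed


def coordinate(levels, feedback):
    """Apply the local step, then the global override, to every core."""
    return [
        global_decision(fb, lvl, local_decision(fb, lvl))
        for lvl, fb in zip(levels, feedback)
    ]


if __name__ == "__main__":
    feedback = [
        CoreFeedback(accuracy=0.90, interference=0.05),  # accurate, harmless
        CoreFeedback(accuracy=0.90, interference=0.50),  # accurate, but interfering
        CoreFeedback(accuracy=0.20, interference=0.05),  # inaccurate
    ]
    print(coordinate([3, 3, 3], feedback))  # [4, 2, 2]
```

In this sketch the second core is throttled down even though its own prefetches are accurate, which is the kind of case that purely per-core throttling would not catch.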


Index Terms

1. Applied computing
  1. Computers in other domains
    1. Personal computers and PC applications
      1. Microcomputers
2. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data


Published in
MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
December 2009
601 pages
ISBN: 9781605587981
DOI: 10.1145/1669112
General Chairs: David Albonesi (Cornell), Margaret Martonosi (Princeton)
Program Chairs: David August (Princeton/Parakinetics), José Martínez (Cornell)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
feedback control
memory systems
multi-core
prefetching
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 484 of 2,242 submissions, 22%
