research-article

Distributed cooperative caching

Authors:
Enric Herrero, Universitat Politècnica de Catalunya, Barcelona, Spain
José González, Intel Barcelona Research Center, Barcelona, Spain
Ramon Canal, Universitat Politècnica de Catalunya, Barcelona, Spain

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 2008, Pages 134–143, https://doi.org/10.1145/1454115.1454136

Published: 25 October 2008

ABSTRACT

This paper presents Distributed Cooperative Caching, a scalable and energy-efficient scheme for managing chip multiprocessor (CMP) cache resources. The proposed configuration is based on the Cooperative Caching framework [3] but is intended for large-scale CMPs. Both the centralized and the distributed configurations combine the benefits of private and shared caches. In our proposal, the Coherence Engine has been redesigned so that it can be partitioned, which eliminates the size constraints imposed by duplicating all tags. At the same time, a global replacement mechanism has been added to improve the usage of cache space. Our framework uses several Distributed Coherence Engines spread across all the nodes to improve scalability. The distribution balances network traffic better over the entire chip, avoiding bottlenecks and increasing performance for a 32-core CMP by 21% over a traditional shared memory configuration and by 57% over the Cooperative Caching scheme.
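
To illustrate how partitioning a coherence engine can spread requests over the chip, here is a minimal sketch. It assumes each cache block has one home engine chosen by interleaving block numbers across engines; the abstract does not state the actual mapping, so the function names, the modulo mapping, and the 64-byte block size are all hypothetical.

```python
def home_engine(address: int, num_engines: int, block_bytes: int = 64) -> int:
    """Return the index of the engine assumed to own `address`.

    Assumption (not from the paper): blocks are interleaved across
    engines by block number, so consecutive blocks map to consecutive engines.
    """
    if num_engines <= 0 or block_bytes <= 0:
        raise ValueError("num_engines and block_bytes must be positive")
    block_number = address // block_bytes
    return block_number % num_engines


def requests_per_engine(addresses, num_engines: int, block_bytes: int = 64):
    """Count how many of the given requests land on each engine."""
    counts = [0] * num_engines
    for address in addresses:
        counts[home_engine(address, num_engines, block_bytes)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical trace: 128 consecutive 64-byte blocks.
    trace = [block * 64 for block in range(128)]
    print("centralized:", requests_per_engine(trace, 1))
    print("distributed:", requests_per_engine(trace, 32))
```

With a single engine every request converges on one node, while with 32 engines the same trace is divided evenly among them. Under this assumed mapping, each engine also only needs to hold tags for its own share of the address space.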

Furthermore, we have reduced the power consumption of the entire system by using a different tag allocation method and by reducing the number of tags compared on each request. For a 32-core CMP, the Distributed Cooperative Caching framework improves the power/performance metric (MIPS³/W) by an average of 3.66x over a traditional shared memory configuration and by 4.30x over Cooperative Caching.
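
The MIPS³/W metric cubes performance before dividing by power, so it rewards speedups much more strongly than power savings. The sketch below shows the arithmetic; the helper names and the input numbers are hypothetical and are not measurements from the paper.

```python
def mips3_per_watt(mips: float, watts: float) -> float:
    """Compute the MIPS^3/W power/performance metric."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return mips ** 3 / watts


def relative_improvement(speedup: float, power_ratio: float) -> float:
    """Ratio of new to baseline MIPS^3/W.

    speedup     = new MIPS / baseline MIPS
    power_ratio = new watts / baseline watts
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return speedup ** 3 / power_ratio


if __name__ == "__main__":
    # Hypothetical systems: the new one is 20% faster and uses 10% less power.
    baseline = mips3_per_watt(mips=1000.0, watts=10.0)
    improved = mips3_per_watt(mips=1200.0, watts=9.0)
    print(improved / baseline)
    print(relative_improvement(speedup=1.2, power_ratio=0.9))
```

Both print statements compute the same ratio, once from absolute values and once from relative ones. Because the abstract reports averages, its per-configuration figures should not be expected to combine exactly through this formula.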

References

[1] M. Acacio, J. Gonzalez, J. Garcia, and J. Duato. A new scalable directory architecture for large-scale multiprocessors. In HPCA '01: 7th International Symposium on High-Performance Computer Architecture, pages 97--106, January 2001.
[2] B. Beckmann, M. Marty, and D. Wood. ASR: Adaptive selective replication for CMP caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 2006.
[3] J. Chang and G. S. Sohi. Cooperative caching for chip multiprocessors. In ISCA '06: 33rd Annual International Symposium on Computer Architecture, pages 264--276, June 2006.
[4] J. Chang and G. S. Sohi. Cooperative cache partitioning for chip multiprocessors. In ICS '07: 21st Annual International Conference on Supercomputing, pages 242--252, June 2007.
[5] Z. Chishti, M. Powell, and T. Vijaykumar. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In MICRO-36: 36th Annual IEEE/ACM International Symposium on Microarchitecture, pages 55--66, December 2003.
[6] J. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05: 14th International Conference on Parallel Architectures and Compilation Techniques, pages 51--62, September 2005.
[7] J. Dorsey, S. Searles, M. Ciraula, S. Johnson, N. Bujanos, D. Wu, M. Braganza, S. Meyers, E. Fang, and R. Kumar. An integrated quad-core Opteron processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 102--103, February 2007.
[8] P. Dubey. A platform 2015 workload model: Recognition, mining and synthesis moves computers to the era of tera. Intel White Paper, Intel Corporation, 2005.
[9] H. Dybdahl and P. Stenstrom. An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. In HPCA '07: 13th International Symposium on High Performance Computer Architecture, pages 2--12, February 2007.
[10] J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. Keckler. A NUCA substrate for flexible CMP cache sharing. In ICS '05: 19th Annual International Conference on Supercomputing, pages 31--40, June 2005.
[11] J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff. Energy characterization of a tiled architecture processor with on-chip networks. In ISLPED '03: International Symposium on Low Power Electronics and Design, pages 424--427, August 2003.
[12] D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In ISCA '90: 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.
[13] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50--58, 2002.
[14] M. Martin, M. Hill, and D. Wood. Token coherence: decoupling performance and correctness. In ISCA '03: 30th Annual International Symposium on Computer Architecture, pages 182--193, June 2003.
[15] M. Martin, D. J. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News, 33(4):92--99, 2005.
[16] M. Monchiero, R. Canal, and A. Gonzalez. Power/performance/thermal design space exploration for multicore architectures. IEEE Transactions on Parallel and Distributed Systems, 19(5):666--681, May 2008.
[17] R. Mullins. Minimising dynamic power consumption in on-chip networks. International Symposium on System-on-Chip, pages 1--4, November 2006.
[18] M. Qureshi and Y. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423--432, December 2006.
[19] N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, and A. Kovacs. The implementation of the 65nm dual-core 64b Merom processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 106--590, February 2007.
[20] K. Strauss, X. Shen, and J. Torrellas. Uncorq: Unconstrained snoop request delivery in embedded-ring multiprocessors. In MICRO-40: 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 2007.
[21] D. Tarjan, S. Thoziyoor, and N. Jouppi. CACTI 4.0. Technical report, HP Labs Palo Alto, June 2006.
[22] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar. An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS. In ISSCC '07: IEEE International Solid-State Circuits Conference, February 2007.
[23] H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik. Orion: a power-performance simulator for interconnection networks. In MICRO-35: 35th Annual IEEE/ACM International Symposium on Microarchitecture, pages 294--305, November 2002.
[24] M. Zhang and K. Asanovic. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA '05: 32nd Annual International Symposium on Computer Architecture, pages 336--345, June 2005.

Index Terms

Distributed cooperative caching
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
October 2008
328 pages
ISBN: 9781605582825
DOI: 10.1145/1454115
General Chair: Andreas Moshovos, University of Toronto, Canada
Program Chairs: David Tarditi, Microsoft, USA; Kunle Olukotun, Stanford University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chip multiprocessors
distributed cooperative caching
energy efficiency
memory hierarchy
Acceptance Rates
Overall Acceptance Rate: 121 of 471 submissions, 26%