DOI: 10.1145/1454115.1454136
research-article

Distributed cooperative caching

Published: 25 October 2008

ABSTRACT

This paper presents Distributed Cooperative Caching, a scalable and energy-efficient scheme for managing chip multiprocessor (CMP) cache resources. The proposed configuration is based on the Cooperative Caching framework [3] but is intended for large-scale CMPs. Both the centralized and the distributed configurations combine the benefits of private and shared caches. In our proposal, the Coherence Engine has been redesigned so that it can be partitioned, which eliminates the size constraints imposed by duplicating all tags. At the same time, a global replacement mechanism has been added to improve the usage of cache space. Our framework uses several Distributed Coherence Engines spread across all the nodes to improve scalability. The distribution balances network traffic better over the entire chip, avoiding bottlenecks and increasing performance for a 32-core CMP by 21% over a traditional shared memory configuration and by 57% over the Cooperative Caching scheme.
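The abstract does not say how requests are assigned to the Distributed Coherence Engines. As a rough illustration only, the sketch below assumes a simple block-address interleaving, which is one common way to partition a tag directory across nodes. The function name `home_dce`, the 64-byte block size, and the choice of one engine per node are all assumptions made for this example, not details from the paper.

```python
# Hypothetical sketch: address-interleaved mapping of cache blocks to
# coherence engines. None of these constants come from the abstract.
BLOCK_BYTES = 64   # assumed cache-block size
NUM_DCES = 32      # assumed: one engine per node of a 32-core CMP


def home_dce(address: int, num_dces: int = NUM_DCES,
             block_bytes: int = BLOCK_BYTES) -> int:
    """Return the index of the engine assumed to track this address.

    All bytes of one block map to the same engine, and consecutive
    blocks map to consecutive engines (modulo the engine count).
    """
    block_number = address // block_bytes
    return block_number % num_dces


if __name__ == "__main__":
    # Consecutive blocks are spread over different engines.
    for addr in (0, 64, 128, 64 * 32):
        print(hex(addr), "->", home_dce(addr))
```

Under this assumed mapping, a run of consecutive blocks touches every engine once before wrapping around, which is the kind of traffic spreading the abstract attributes to the distributed design.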

Furthermore, we have reduced the power consumption of the entire system by using a different tag allocation method and by reducing the number of tags compared on each request. For a 32-core CMP, the Distributed Cooperative Caching framework improves the power/performance metric (MIPS^3/W) by an average of 3.66x over a traditional shared memory configuration and 4.30x over Cooperative Caching.
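For readers unfamiliar with the metric, MIPS^3/W can be related to the energy-delay-squared product. The identity below is a general one and is not taken from the paper; the symbols I (instructions executed), T (execution time), and E (energy consumed) are introduced here only for illustration.

```latex
\[
\frac{\mathrm{MIPS}^3}{\mathrm{W}}
\;\propto\;
\frac{(I/T)^3}{E/T}
\;=\;
\frac{I^3}{E\,T^2}
\]
```

Assuming both configurations execute the same number of instructions I, a higher MIPS^3/W is equivalent to a lower E·T^2, so the metric weights execution time more heavily than energy.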

References

  1. M. Acacio, J. Gonzalez, J. Garcia, and J. Duato. A new scalable directory architecture for large-scale multiprocessors. In HPCA '01: 7th International Symposium on High-Performance Computer Architecture, pages 97--106, January 2001.
  2. B. Beckmann, M. Marty, and D. Wood. ASR: Adaptive selective replication for CMP caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 2006.
  3. J. Chang and G. S. Sohi. Cooperative caching for chip multiprocessors. In ISCA '06: 33rd Annual International Symposium on Computer Architecture, pages 264--276, June 2006.
  4. J. Chang and G. S. Sohi. Cooperative cache partitioning for chip multiprocessors. In ICS '07: 21st Annual International Conference on Supercomputing, pages 242--252, June 2007.
  5. Z. Chishti, M. Powell, and T. Vijaykumar. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In MICRO-36: 36th Annual IEEE/ACM International Symposium on Microarchitecture, pages 55--66, December 2003.
  6. J. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05: 14th International Conference on Parallel Architectures and Compilation Techniques, pages 51--62, September 2005.
  7. J. Dorsey, S. Searles, M. Ciraula, S. Johnson, N. Bujanos, D. Wu, M. Braganza, S. Meyers, E. Fang, and R. Kumar. An integrated quad-core Opteron processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 102--103, February 2007.
  8. P. Dubey. A platform 2015 workload model: Recognition, mining and synthesis moves computers to the era of tera. Intel White Paper, Intel Corporation, 2005.
  9. H. Dybdahl and P. Stenstrom. An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. In HPCA '07: 13th International Symposium on High Performance Computer Architecture, pages 2--12, February 2007.
  10. J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. Keckler. A NUCA substrate for flexible CMP cache sharing. In ICS '05: 19th Annual International Conference on Supercomputing, pages 31--40, June 2005.
  11. J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff. Energy characterization of a tiled architecture processor with on-chip networks. In ISLPED '03: International Symposium on Low Power Electronics and Design, pages 424--427, August 2003.
  12. D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In ISCA '90: 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.
  13. P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50--58, 2002.
  14. M. Martin, M. Hill, and D. Wood. Token coherence: decoupling performance and correctness. In ISCA '03: 30th Annual International Symposium on Computer Architecture, pages 182--193, June 2003.
  15. M. Martin, D. J. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News, 33(4):92--99, 2005.
  16. M. Monchiero, R. Canal, and A. Gonzalez. Power/performance/thermal design space exploration for multicore architectures. IEEE Transactions on Parallel and Distributed Systems, 19(5):666--681, May 2008.
  17. R. Mullins. Minimising dynamic power consumption in on-chip networks. International Symposium on System-on-Chip, pages 1--4, November 2006.
  18. M. Qureshi and Y. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423--432, December 2006.
  19. N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, and A. Kovacs. The implementation of the 65nm dual-core 64b Merom processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 106--590, February 2007.
  20. K. Strauss, X. Shen, and J. Torrellas. Uncorq: Unconstrained snoop request delivery in embedded-ring multiprocessors. In MICRO-40: 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 2007.
  21. D. Tarjan, S. Thoziyoor, and N. Jouppi. CACTI 4.0. Technical report, HP Labs Palo Alto, June 2006.
  22. S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar. An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS. In ISSCC '07: IEEE International Solid-State Circuits Conference, February 2007.
  23. H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik. Orion: a power-performance simulator for interconnection networks. In MICRO-35: 35th Annual IEEE/ACM International Symposium on Microarchitecture, pages 294--305, November 2002.
  24. M. Zhang and K. Asanovic. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA '05: 32nd Annual International Symposium on Computer Architecture, pages 336--345, June 2005.

Published in

PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
October 2008
328 pages
ISBN: 9781605582825
DOI: 10.1145/1454115

Copyright © 2008 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 121 of 471 submissions, 26%
