DOI: 10.1145/1248377.1248398
Article

Proximity-aware directory-based coherence for multi-core processor architectures

Published: 09 June 2007

ABSTRACT

As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance. This is exacerbated by the fact that interconnection speeds are not scaling well with technology. This paper describes mechanisms to accelerate coherence for a multi-core architecture that has multiple private L2 caches and a scalable point-to-point interconnect between cores. These techniques exploit the differences in geometry between chip multiprocessors and traditional multiprocessor architectures.

Directory-based protocols have been proposed as a scalable alternative to snoop-based protocols. In this paper, we discuss implementations of coherence for chip multiprocessors (CMPs), then propose and evaluate a novel directory-based coherence scheme that improves the performance of parallel programs on such processors. Proximity-aware coherence accelerates read and write misses by initiating cache-to-cache transfers from the spatially closest sharer. This has the dual benefit of eliminating unnecessary accesses to off-chip memory and minimizing the distance over which communicated data moves across the network. The proposed schemes yield speedups of up to 74.9% for our workloads.
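The core idea of serving a miss from the spatially closest sharer can be illustrated with a minimal sketch. The abstract does not specify the interconnect topology or the distance metric, so the code below assumes a 2D mesh of tiles and uses Manhattan hop count as the proximity measure. The names `Tile`, `hop_distance`, and `closest_sharer` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Tile:
    """A core with its private L2 cache, placed at (x, y) on an assumed 2D mesh."""
    core_id: int
    x: int
    y: int


def hop_distance(a: Tile, b: Tile) -> int:
    """Manhattan distance between two tiles (assumed proxy for network distance)."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def closest_sharer(requester: Tile, sharers: Iterable[Tile]) -> Optional[Tile]:
    """Pick the sharer nearest to the requester for a cache-to-cache transfer.

    Returns None when no other core holds the line, in which case a real
    protocol would have to fetch the data from memory instead.
    Ties are broken by the lowest core_id (an arbitrary illustrative choice).
    """
    candidates = [s for s in sharers if s.core_id != requester.core_id]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (hop_distance(requester, s), s.core_id))


if __name__ == "__main__":
    # Assumed 4x4 mesh: core i sits at column i % 4, row i // 4.
    tiles = [Tile(i, i % 4, i // 4) for i in range(16)]
    # Core 0 misses on a line shared by cores 15, 1, and 8.
    supplier = closest_sharer(tiles[0], [tiles[15], tiles[1], tiles[8]])
    print(supplier)
```

In this sketch the directory is represented only by the list of sharers passed in; how the directory tracks sharers and forwards the request is left out, since the abstract does not describe it.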


        Published in

        SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
        June 2007
        376 pages
        ISBN: 9781595936677
        DOI: 10.1145/1248377

        Copyright © 2007 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 447 of 1,461 submissions, 31%
