Why on-chip cache coherence is here to stay

Published: 01 July 2012
Abstract

On-chip hardware coherence can scale gracefully as the number of cores increases.

      • Published in

        Communications of the ACM, Volume 55, Issue 7
        July 2012
        120 pages
        ISSN: 0001-0782
        EISSN: 1557-7317
        DOI: 10.1145/2209249

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States