Why on-chip cache coherence is here to stay

Authors:
Milo M. K. Martin, University of Pennsylvania, Philadelphia, PA
Mark D. Hill, University of Wisconsin-Madison
Daniel J. Sorin, Duke University, Durham, NC

Communications of the ACM, Volume 55, Issue 7, July 2012, pp. 78–89
https://doi.org/10.1145/2209249.2209269

Published: 1 July 2012

Abstract

On-chip hardware coherence can scale gracefully as the number of cores increases.

Index Terms

1. Computer systems organization
2. Hardware
  1. Integrated circuits

Published in
Communications of the ACM, Volume 55, Issue 7, July 2012 (120 pages)
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2209249

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Qualifiers
- research-article
- Popular
- Refereed

Article Metrics
- Total citations: 190
- Total downloads: 4,662
- Downloads (last 12 months): 204
- Downloads (last 6 weeks): 61