Article

Proximity-aware directory-based coherence for multi-core processor architectures

Authors:
Jeffery A. Brown

University of California, San Diego, La Jolla, CA

University of California, San Diego, La Jolla, CA
View Profile

,
Rakesh Kumar

University of Illinois at Urbana-Champaign, Urbana, IL

University of Illinois at Urbana-Champaign, Urbana, IL
View Profile

,
Dean Tullsen

University of California, San Diego, La Jolla, CA

University of California, San Diego, La Jolla, CA
View Profile

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architecturesJune 2007Pages 126–134https://doi.org/10.1145/1248377.1248398

Published:09 June 2007Publication History

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

Pages 126–134

ABSTRACT

As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance. This is exacerbated by the fact that interconnection speeds are not scaling well with technology. This paper describes mechanisms to accelerate coherence for a multi-core architecture that has multiple private L2 caches and a scalable point-to-point interconnect between cores. These techniques exploit the differences in geometry between chip multiprocessors and traditional multiprocessor architectures.

Directory-based protocols have been proposed as a scalable alternative to snoop-based protocols. In this paper, we discuss implementations of coherence for CMPs and propose and evaluate a novel directory-based coherence scheme to improve the performance of parallel programs on such processors. Proximity-aware coherence accelerates read and write misses by initiating cache-to-cache transfers from the spatially closest sharer. This has the dual benefit of eliminating unnecessary accesses to off-chip memory, and minimizing the distance over which communicated data moves across the network. The proposed schemes result in speedups up to 74.9% for our workloads.

References

M. E. Acacio, J. Gonzalez, J. M. Garcia, and J. Duato. A novel approach to reduce l2 miss latency in shared-memory multiprocessors. In IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium, page 25, Washington, DC, USA, 2002. IEEE Computer Society. Google ScholarDigital Library
AMD. http://www.amd.com/usen/processors/productinformation/0 30 118 9484%,00.html.Google Scholar
L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In ISCA-27, 2000. Google ScholarDigital Library
J. Chang and G. S. Sohi. Cooperative caching for chip multiprocessors. In Proceedings of the 33rd International Symposium on Computer Architecture, pages 264--276, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarDigital Library
F. Dahlgren and J. Torrellas. Cache-only memory architectures. Computer, 32(6):72--79, 1999. Google ScholarDigital Library
Device Group. Predictive technology model. In UC Berkeley Technical Report, 2001.Google Scholar
N. Eisley, L.-S. Peh, and L. Shang. In-network cache coherence. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 321{332, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarDigital Library
A. Gupta, W.-D. Weber, and T. C. Mowry. Reducing memory and traffic requirements for scalable directory-based cache coherence schemes. In ICPP (1), pages 312--321, 1990.Google Scholar
A. Hartstein and T. R. Puzak. The optimum pipeline depth considering both power and performance. ACM Trans. Archit. Code Optim., 1(4):369--388, 2004. Google ScholarDigital Library
R. Ho, K. Mai, and M. Horowitz. The future of wires. Proceedings of the IEEE, 89(4):490--504, 2001.Google ScholarCross Ref
J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. W. Keckler. A nuca substrate for exible cmp cache sharing. In Proceedings of the 19th ACM International Conference on Supercomputing (ICS 05), June 2005. Google ScholarDigital Library
IBM. Power5: Presentation at microprocessor forum. 2003.Google Scholar
Intel. http://www.intel.com/products/processor/coreduo/.Google Scholar
P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-way multithreaded sparc processor. In IEEE MICRO Magazine, Mar. 2005. Google ScholarDigital Library
R. Kumar, V. Zyuban, and D. M. Tullsen. Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling. In Proceedings of International Symposium on Computer Architecture, 2005. Google ScholarDigital Library
R. Kumar, V. Zyuban, and D. M. Tullsen. Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling. In Proceedings of International Symposium on Computer Architecture, June 2005. Google ScholarDigital Library
J. Laudon and D. Lenoski. The SGI Origin: a ccNUMA highly scalable server. In ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture, pages 241--251, New York, NY, USA, 1997. ACM Press. Google ScholarDigital Library
D. Lenoski, J. Laudon, K. Gharachorloo, W. Weber, A. Gupta, J. Henessy, M. Horowitz, and M. Lam. The stanford DASH multiprocessor. In IEEE Computer, 1992. Google ScholarDigital Library
M. M. K. Martin, M. D. Hill, and D. A. Wood. Token coherence: decoupling performance and correctness. In Proceedings of the 30th annual international symposium on Computer architecture, pages 182--193, New York, NY, USA, 2003. ACM Press. Google ScholarDigital Library
M. M. Michael and A. K. Nanda. Design and performance of directory caches for scalable shared memory multiprocessors. In HPCA '99: Proceedings of the 5th International Symposium on High Performance Computer Architecture, page 142, Washington, DC, USA, 1999. IEEE Computer Society. Google ScholarDigital Library
B. W. O'Krafka and A. R. Newton. An empirical evaluation of two memory-efficient directory methods. In ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pages 138--147, New York, NY, USA, 1990. ACM Press. Google ScholarDigital Library
V. S. Pai, P. Ranganathan, and S. V. Adve. RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors. In Proceedings of the Third Workshop on Computer Architecture Education, February 1997. Also appears in IEEE TCCA Newsletter, October 1997. Google ScholarDigital Library
Sun. UltrasparcIV: http://siliconvalley.internet.com/news/print.php/3090801.Google Scholar
M. Zhang and K. Asanovic. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pages 336--345, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarDigital Library
Z. Zhang and J. Torrellas. Reducing remote conict misses: Numa with remote cache versus coma. In HPCA '97: Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture, page 272, Washington, DC, USA, 1997. IEEE Computer Society. Google ScholarDigital Library

Index Terms

Proximity-aware directory-based coherence for multi-core processor architectures
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. General and reference
  1. Cross-computing tools and techniques
    1. Performance

Recommendations

Directory based cache coherence verification logic in CMPs cache system
MES '13: Proceedings of the First International Workshop on Many-core Embedded Systems

This work reports a high speed protocol verificaion logic for Chip Multiprocessors (CMPs) realizing directory based cache coherence system. A special class of cellular automata (CA) referred to as single length cycle 2-attractor CA (TACA), has been ...
Read More
SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

The SARC project seeks to improve power scalability of shared-memory chip multiprocessors (CMPs) by making directory coherence more efficient in both power and performance. The authors describe how they eliminate two major sources of inefficiency for ...
Read More
PS directory: a scalable multilevel directory cache for CMPs

As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
June 2007
376 pages
ISBN:9781595936677
DOI:10.1145/1248377
General Chair:
Phillip B. Gibbons
Intel Research, USA
,
Program Chair:
Christian Scheideler
Technische Universität München, Germany
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
chip multiprocessors
coherence
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate447of1,461submissions,31%
Upcoming Conference
SPAA '24

Sponsor:

sigact

sigact

36th ACM Symposium on Parallelism in Algorithms and Architectures

June 17 - 21, 2024

Nantes , France
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 36
  Total Citations
  View Citations
- 844
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Proximity-aware directory-based coherence for multi-core processor architectures

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

ABSTRACT

References

Cited By

Index Terms

Recommendations

Directory based cache coherence verification logic in CMPs cache system

SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

PS directory: a scalable multilevel directory cache for CMPs