The Journal of Supercomputing

Published: 31-03-2017

To be silent or not: on the impact of evictions of clean data in cache-coherent multicores

Authors: Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio

Published in: The Journal of Supercomputing | Issue 10/2017

Abstract

Maintaining coherence across hundreds or even thousands of cores is not an easy task. Among the solutions proposed so far, directory-based cache coherence has been advocated as the most feasible way of overcoming the scalability hurdles that arise at such a large scale. Thanks to the knowledge accumulated over the last four decades, there is general consensus on how most design aspects of directory coherence affect performance, energy consumption and cost. However, there is one subtle design point on which we have observed divergences among contemporary research works on cache-coherent multicores. Specifically, some recent works assume a silent replacement policy for evictions of clean data from the last-level private caches, in which the directory is not notified of the eviction; others implement the opposite, which we call a noisy replacement policy, in which the evicting cache sends a replacement message to the directory; and still others do not mention how these evictions are managed. In this work, we put this important aspect in the spotlight and demonstrate that the way evictions of clean data are managed can have an important influence on the performance and energy consumption of a directory-based cache coherence protocol. We show that the noisy replacement policy significantly increases total traffic compared with the silent policy (by around 20% in several cases and by 9.6% on average). Given the large fraction of the total power budget that the on-chip interconnection network of future manycores is expected to consume, assuming the silent replacement policy for clean data will lead to non-negligible energy savings. Moreover, and more importantly, we have observed that, depending on the particular directory structure used, assuming silent replacements may or may not affect performance. This means that noisy replacements are not justified in all cases, since they would unnecessarily increase network traffic without providing any performance advantage.
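To make the difference between the two policies concrete, the following Python sketch counts the messages that a simple full-map directory protocol would inject into the network under each policy. It is a toy model for illustration, not the authors' simulation infrastructure: the names `PrivateCache` and `simulate`, the fully associative LRU private caches, and the per-event message costs (two messages per miss, one per writeback or noisy replacement hint, two per invalidation/acknowledgment pair, two extra when a dirty owner is downgraded) are all assumptions.

```python
from collections import OrderedDict


class PrivateCache:
    """Hypothetical fully associative LRU private cache: addr -> dirty flag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def touch(self, addr):
        """Return True on a hit and refresh the block's LRU position."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return True
        return False

    def fill(self, addr, dirty):
        """Allocate (or update) a block; return the evicted (addr, dirty) or None."""
        if addr in self.blocks:
            self.blocks[addr] = self.blocks[addr] or dirty
            self.blocks.move_to_end(addr)
            return None
        victim = None
        if len(self.blocks) >= self.capacity:
            victim = self.blocks.popitem(last=False)  # least recently used block
        self.blocks[addr] = dirty
        return victim


def simulate(trace, num_cores, capacity, noisy):
    """Count messages for a trace of (core, op, addr) tuples, op in {'R', 'W'}."""
    caches = [PrivateCache(capacity) for _ in range(num_cores)]
    sharers = {}  # full-map directory: addr -> cores it *believes* hold the block
    msgs = 0

    def evict(core, victim):
        nonlocal msgs
        if victim is None:
            return
        addr, dirty = victim
        if dirty or noisy:
            msgs += 1  # writeback (dirty) or replacement hint (clean, noisy policy)
            sharers[addr].discard(core)
        # Silent policy + clean block: no message, the directory keeps a stale sharer.

    for core, op, addr in trace:
        cache = caches[core]
        entry = sharers.setdefault(addr, set())
        if op == 'R':
            if cache.touch(addr):
                continue
            msgs += 2  # request to the directory + data reply
            for other in entry - {core}:
                if caches[other].blocks.get(addr):  # a dirty owner must downgrade
                    caches[other].blocks[addr] = False
                    msgs += 2  # forwarded request + owner's writeback
            entry.add(core)
            evict(core, cache.fill(addr, dirty=False))
        else:  # 'W'
            if cache.blocks.get(addr):  # already held dirty: write hit
                cache.touch(addr)
                continue
            msgs += 2  # write/upgrade request + reply
            for other in entry - {core}:
                msgs += 2  # invalidation + ack, sent even to stale sharers
                caches[other].blocks.pop(addr, None)
            entry.clear()
            entry.add(core)
            evict(core, cache.fill(addr, dirty=True))
    return msgs
```

In this toy model the trade-off appears in both directions: the read-only trace `[(0, 'R', 'A'), (0, 'R', 'B'), (0, 'R', 'C')]` with two cores and `capacity=1` costs 8 messages under the noisy policy and 6 under the silent one, whereas a later write to a silently evicted block pays an invalidation/acknowledgment pair to a stale sharer. Because the sketch uses a full-map directory, it says nothing about how the directory structure, which the abstract identifies as decisive for performance, interacts with either policy.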

Throughout this work, we use the terms “replacement” and “eviction” interchangeably.

We reviewed most of the papers on cache coherence that appeared in the last five editions of the ISCA, HPCA, PACT and MICRO proceedings and found that, out of 36 papers, 14 assume noisy replacements, 8 assume silent replacements, and 14 do not mention the policy used.

Once the cache hierarchy is warmed up, almost every miss must evict a block to make room for the incoming one; consequently, replacements are almost as frequent as cache misses.
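The relationship between misses and replacements can be checked with a few lines of Python. The sketch below is an illustration under simplifying assumptions, not part of the paper's methodology: a single fully associative LRU cache with no coherence invalidations, replayed over a synthetic uniform-random trace; the function name `count_misses_and_evictions` and the trace parameters are hypothetical. Under these assumptions, every miss after the cache fills evicts exactly one block, so misses and evictions differ only by the cache capacity. In a real coherent system, invalidations free cache frames without an eviction, which is one reason replacements are only almost as frequent as misses.

```python
import random
from collections import OrderedDict


def count_misses_and_evictions(trace, capacity):
    """Replay addresses through a fully associative LRU cache (no invalidations)."""
    cache = OrderedDict()
    misses = evictions = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # hit: refresh LRU position
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # every miss in a full cache evicts a block
            evictions += 1
        cache[addr] = True
    return misses, evictions


rng = random.Random(42)  # fixed seed for a reproducible synthetic trace
trace = [rng.randrange(4096) for _ in range(100_000)]
misses, evictions = count_misses_and_evictions(trace, capacity=512)
print(misses, evictions, misses - evictions)  # the difference equals the capacity
```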

Title: To be silent or not: on the impact of evictions of clean data in cache-coherent multicores
Authors: Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio
Publication date: 31-03-2017
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 10/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-017-2026-6
