Published in: The Journal of Supercomputing 10/2017

31-03-2017

To be silent or not: on the impact of evictions of clean data in cache-coherent multicores

Authors: Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio


Abstract

Maintaining coherence across hundreds or even thousands of cores is not an easy task. Of the solutions proposed so far, directory-based cache coherence has been advocated as the most feasible way of overcoming the scalability hurdles that arise at such a large scale. Thanks to the knowledge accumulated over the last four decades, there is general consensus on how most design aspects of directory coherence affect performance, energy consumption and cost. However, there is one subtle design point on which we have observed divergence in contemporary research on cache-coherent multicores. Specifically, while some recent works assume a silent replacement policy for evictions of clean data in the last-level private caches, others implement the opposite, which we call a noisy replacement policy, and still others do not mention how these evictions are managed. In this work, we put this aspect in the spotlight, demonstrating that the way in which evictions of clean data are managed can have an important influence on the performance and energy consumption of a directory-based cache coherence protocol. We show that the noisy replacement policy leads to a significant increase in total traffic (around 20% in several cases, 9.6% on average) compared with the silent policy. Given the large fraction of the total power budget that the on-chip interconnection network of future manycores is expected to consume, assuming the silent replacement policy for clean data will lead to non-negligible energy savings. Moreover, and more importantly, we have observed that whether silent replacements affect performance depends on the particular directory structure used. This means that the use of noisy replacements is not justified in all cases, since in some of them it would needlessly increase network traffic without any performance advantage.
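The traffic trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the authors' protocol or simulator. It assumes a full-map directory, tiny fully associative private caches with LRU replacement, and only two line states (`S` clean, `M` dirty), and it counts one message per request, invalidation, downgrade, writeback or eviction notice, ignoring acknowledgements and data replies. It takes a silent eviction to mean that a clean line is dropped without telling the directory, and a noisy eviction to mean that the directory is notified. The function name `simulate`, the `capacity` parameter and the message categories are hypothetical and chosen only for illustration.

```python
from collections import Counter

def simulate(trace, policy, capacity=2):
    """Count messages in a toy full-map directory (illustrative only).

    trace:    iterable of (core, op, block), op is "R" or "W"
    policy:   "silent" or "noisy" handling of clean evictions
    capacity: lines per private cache (fully associative, LRU)
    """
    caches = {}    # core -> {block: "S" | "M"}, oldest entry first
    sharers = {}   # directory: block -> cores it believes hold a copy
    msgs = Counter()
    for core, op, block in trace:
        cache = caches.setdefault(core, {})
        state = cache.pop(block, None)       # pop so re-insertion refreshes LRU
        if state == "M" or (state == "S" and op == "R"):
            cache[block] = state             # hit: no traffic
            continue
        msgs["request"] += 1                 # miss or S->M upgrade
        entry = sharers.setdefault(block, set())
        if op == "W":
            for other in entry - {core}:
                if caches[other].pop(block, None) is None:
                    # directory info was stale: the copy was silently evicted
                    msgs["stale_invalidation"] += 1
                else:
                    msgs["invalidation"] += 1
            entry.clear()
        else:
            for other in entry - {core}:
                if caches[other].get(block) == "M":
                    caches[other][block] = "S"
                    msgs["downgrade"] += 1
        entry.add(core)
        if state is None and len(cache) >= capacity:
            victim = next(iter(cache))       # least recently used line
            victim_state = cache.pop(victim)
            if victim_state == "M":
                msgs["writeback"] += 1       # dirty data must always be written back
                sharers[victim].discard(core)
            elif policy == "noisy":
                msgs["clean_eviction_notice"] += 1
                sharers[victim].discard(core)
            # policy == "silent": clean line dropped, directory left stale
        cache[block] = "M" if op == "W" else "S"
    return msgs

# Private read-only data: noisy pays for every clean eviction.
private = [(0, "R", "A"), (0, "R", "B"), (0, "R", "A")]
# Data later written by another core: silent pays with a stale invalidation.
shared = [(0, "R", "A"), (0, "R", "B"), (1, "W", "A")]

for name, trace in (("private", private), ("shared", shared)):
    for policy in ("noisy", "silent"):
        counts = simulate(trace, policy, capacity=1)
        print(name, policy, sum(counts.values()), dict(counts))
```

In this toy setting the silent policy saves the eviction notices on the private trace, while on the shared trace it trades the notice for an invalidation sent to a core that no longer holds the block. Whether that trade affects performance in a real system depends, as the abstract notes, on the directory structure used.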

Footnotes
1
Throughout this work, we use the terms “replacement” and “eviction” interchangeably.
 
2
We reviewed most of the papers on cache coherence that appeared in the last five editions of the proceedings of the ISCA, HPCA, PACT and MICRO conferences and found that, out of 36 papers, 14 assume noisy replacements, 8 assume silent replacements, and 14 do not mention the policy used.
 
3
Consequently, replacements are almost as frequent as cache misses once the cache hierarchy is warmed up.
 
Metadata
Title
To be silent or not: on the impact of evictions of clean data in cache-coherent multicores
Authors
Ricardo Fernández-Pascual
Alberto Ros
Manuel E. Acacio
Publication date
31-03-2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2026-6
