Skip to main content
Erschienen in: The Journal of Supercomputing 8/2015

01.08.2015

PS directory: a scalable multilevel directory cache for CMPs

verfasst von: Joan J. Valls, Alberto Ros, Julio Sahuquillo, María E. Gómez

Erschienen in: The Journal of Supercomputing | Ausgabe 8/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for coherence purposes are currently the most scalable alternative. This paper studies the important differences between the directory behavior of private and shared blocks, which claim for a separate management of both types of blocks at the directory. We propose the PS directory, a two-level directory cache that keeps the reduced number of frequently accessed shared entries in a small and fast first-level cache, namely Shared cache, and uses a larger and slower second-level Private cache to track the large amount of private blocks. Entries in the Private cache do not implement the sharer vector, which allows important silicon area savings. Speed and area reasons suggest the use of eDRAM technology, much denser but slower than SRAM technology, for the Private cache, which in turn brings energy savings. Experimental results for a 16-core CMP show that, compared to a conventional directory, the PS directory improves performance by 14 \(\%\) while reducing silicon area and energy consumption by 34 and 27 \(\%\), respectively. Also, compared to the state-of-the-art Multi-Grain Directory, the PS directory apart from increasing performance, it reduces power by 18.7 \(\%\), and provides more scalability in terms of area.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
1
Experimental conditions are defined in Sect. 6.
 
2
Results for 0.125\(\times \) are not shown because CACTI is not able to provide results for so small caches.
 
Literatur
1.
Zurück zum Zitat Acacio ME, González J, García JM, Duato J (2001) A new scalable directory architecture for large-scale multiprocessors. In: Proceedings of 7th international symposium on high-performance computer architecture (HPCA), pp 97–106 Acacio ME, González J, García JM, Duato J (2001) A new scalable directory architecture for large-scale multiprocessors. In: Proceedings of 7th international symposium on high-performance computer architecture (HPCA), pp 97–106
2.
Zurück zum Zitat Acacio ME, González J, García JM, Duato J (2005) A two-level directory architecture for highly scalable cc-NUMA multiprocessors. IEEE Trans Parallel Distrib Syst (TPDS) 16(1):67–79CrossRef Acacio ME, González J, García JM, Duato J (2005) A two-level directory architecture for highly scalable cc-NUMA multiprocessors. IEEE Trans Parallel Distrib Syst (TPDS) 16(1):67–79CrossRef
3.
Zurück zum Zitat Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: Proceedings of IEEE international symposium on performance analysis of systems and software (ISPASS), pp 33–42 Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: Proceedings of IEEE international symposium on performance analysis of systems and software (ISPASS), pp 33–42
4.
Zurück zum Zitat Barroso LA, Gharachorloo K, McNamara R et al (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: Proceedings of 27th international symposium on computer architecture (ISCA), pp 12–14 Barroso LA, Gharachorloo K, McNamara R et al (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: Proceedings of 27th international symposium on computer architecture (ISCA), pp 12–14
5.
Zurück zum Zitat Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of 17th international conference on parallel architectures and compilation techniques (PACT), pp 72–81 Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of 17th international conference on parallel architectures and compilation techniques (PACT), pp 72–81
6.
Zurück zum Zitat Chaiken D, Kubiatowicz J, Agarwal A (1991) LimitLESS directories: a scalable cache coherence scheme. In: 4th international conference on architectural support for programming language and operating systems (ASPLOS), pp 224–234 Chaiken D, Kubiatowicz J, Agarwal A (1991) LimitLESS directories: a scalable cache coherence scheme. In: 4th international conference on architectural support for programming language and operating systems (ASPLOS), pp 224–234
7.
Zurück zum Zitat Chen G (1993) Slid: a cost-effective and scalable limited-directory scheme for cache coherence. In: 5th international conference on parallel architectures and languages Europe (PARLE), pp 341–352 Chen G (1993) Slid: a cost-effective and scalable limited-directory scheme for cache coherence. In: 5th international conference on parallel architectures and languages Europe (PARLE), pp 341–352
8.
Zurück zum Zitat Conway P, Kalyanasundharam N, Donley G, Lepak K, Hughes B (2010) Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2):16–29CrossRef Conway P, Kalyanasundharam N, Donley G, Lepak K, Hughes B (2010) Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2):16–29CrossRef
9.
Zurück zum Zitat Cuesta B, Ros A, Gómez ME, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: Proceedings of 38th international symposium on computer architecture (ISCA), pp 93–103 Cuesta B, Ros A, Gómez ME, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: Proceedings of 38th international symposium on computer architecture (ISCA), pp 93–103
10.
Zurück zum Zitat Ferdman M, Lotfi-Kamran P, Balet K, Falsafi B (2011) Cuckoo directory: a scalable directory for many-core systems. In: 17th international symposium on high-performance computer architecture (HPCA), pp 169–180 Ferdman M, Lotfi-Kamran P, Balet K, Falsafi B (2011) Cuckoo directory: a scalable directory for many-core systems. In: 17th international symposium on high-performance computer architecture (HPCA), pp 169–180
11.
Zurück zum Zitat Guo S-L, Wang H-X, Xue Y-B, Li C-M, Wang D-S (2010) Hierarchical cache directory for cmp. J Comput Sci Technol 25(2):246–256CrossRef Guo S-L, Wang H-X, Xue Y-B, Li C-M, Wang D-S (2010) Hierarchical cache directory for cmp. J Comput Sci Technol 25(2):246–256CrossRef
12.
Zurück zum Zitat Gupta A, Weber W-D, Mowry TC (1990) Reducing memory traffic requirements for scalable directory-based cache coherence schemes. In: Proceedings of international conference on parallel processing (ICPP), pp 312–321 Gupta A, Weber W-D, Mowry TC (1990) Reducing memory traffic requirements for scalable directory-based cache coherence schemes. In: Proceedings of international conference on parallel processing (ICPP), pp 312–321
13.
Zurück zum Zitat Kalla R, Sinharoy B, Starke WJ, Floyd M (2010) POWER7: IBMs next-generation server processor. IEEE Micro 30(2):7–15CrossRef Kalla R, Sinharoy B, Starke WJ, Floyd M (2010) POWER7: IBMs next-generation server processor. IEEE Micro 30(2):7–15CrossRef
14.
Zurück zum Zitat Kim C, Burger D, Keckler SW (2002) An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In: Proceedings of 10th international conference on architectural support for programming language and operating systems (ASPLOS), pp 211–222 Kim C, Burger D, Keckler SW (2002) An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In: Proceedings of 10th international conference on architectural support for programming language and operating systems (ASPLOS), pp 211–222
15.
Zurück zum Zitat Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of ACM SIGPLAN conference on programming language design and implementation (PLDI), June 2005, pp 190–200 Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of ACM SIGPLAN conference on programming language design and implementation (PLDI), June 2005, pp 190–200
16.
Zurück zum Zitat Magnusson PS, Christensson M, Eskilson J et al (2002) Simics: a full system simulation platform. IEEE Comput 35(2):50–58CrossRef Magnusson PS, Christensson M, Eskilson J et al (2002) Simics: a full system simulation platform. IEEE Comput 35(2):50–58CrossRef
17.
Zurück zum Zitat Martin MM, Sorin DJ, Beckmann BM et al (2005) Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput Archit News 33(4):92–99CrossRef Martin MM, Sorin DJ, Beckmann BM et al (2005) Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput Archit News 33(4):92–99CrossRef
18.
Zurück zum Zitat Marty MR, Hill MD (2007) Virtual hierarchies to support server consolidation. In: Proceedings of 34th international symposium on computer architecture (ISCA), pp 46–56 Marty MR, Hill MD (2007) Virtual hierarchies to support server consolidation. In: Proceedings of 34th international symposium on computer architecture (ISCA), pp 46–56
19.
Zurück zum Zitat Marty MR, Hill MD (2008) Virtual hierarchies. IEEE Micro 28(1):99–109CrossRef Marty MR, Hill MD (2008) Virtual hierarchies. IEEE Micro 28(1):99–109CrossRef
20.
Zurück zum Zitat Matick RE, Schuster SE (2005) Logic-based eDRAM: origins and rationale for use. IBM J Res Dev 49(1):145–165CrossRef Matick RE, Schuster SE (2005) Logic-based eDRAM: origins and rationale for use. IBM J Res Dev 49(1):145–165CrossRef
21.
Zurück zum Zitat Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0, HP Labs, technical report HPL-2009-85 Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0, HP Labs, technical report HPL-2009-85
22.
Zurück zum Zitat O’Krafka BW, Newton AR (1990) An empirical evaluation of two memory-efficient directory methods. In: Proceedings of 17th international symposium on computer architecture (ISCA), pp 138–147 O’Krafka BW, Newton AR (1990) An empirical evaluation of two memory-efficient directory methods. In: Proceedings of 17th international symposium on computer architecture (ISCA), pp 138–147
23.
Zurück zum Zitat Ros A, Acacio ME, García JM (2010) A scalable organization for distributed directories. J Syst Archit (JSA) 56(2–3):77–87CrossRef Ros A, Acacio ME, García JM (2010) A scalable organization for distributed directories. J Syst Archit (JSA) 56(2–3):77–87CrossRef
24.
Zurück zum Zitat Ros A, Cuesta B, Fernández-Pascual R, Gómez ME, Acacio ME, Robles A, García JM, Duato J (2012) Extending magny-cours cache coherence. IEEE Trans Comput (TC) 61(5):593–606CrossRef Ros A, Cuesta B, Fernández-Pascual R, Gómez ME, Acacio ME, Robles A, García JM, Duato J (2012) Extending magny-cours cache coherence. IEEE Trans Comput (TC) 61(5):593–606CrossRef
25.
Zurück zum Zitat Sanchez D, Kozyrakis C (2012) SCD: a scalable coherence directory with flexible sharer set encoding. In: Proceedings of 18th international sympoium on high-performance computer architecture (HPCA), pp 129–140 Sanchez D, Kozyrakis C (2012) SCD: a scalable coherence directory with flexible sharer set encoding. In: Proceedings of 18th international sympoium on high-performance computer architecture (HPCA), pp 129–140
26.
Zurück zum Zitat Shah M, Barreh J, Brooks J et al (2007) UltraSPARC T2: a highly-threaded, power-efficient, SPARC SoC. In: Proceedings of IEEE Asian solid-state circuits conference, pp 22–25 Shah M, Barreh J, Brooks J et al (2007) UltraSPARC T2: a highly-threaded, power-efficient, SPARC SoC. In: Proceedings of IEEE Asian solid-state circuits conference, pp 22–25
27.
Zurück zum Zitat Sinharoy B, Kalla RN, Tendler JM, Eickemeyer RJ, Joyner JB (2005) Power5 system microarchitecture. IBM J Res Dev 49(4/5):505–521CrossRef Sinharoy B, Kalla RN, Tendler JM, Eickemeyer RJ, Joyner JB (2005) Power5 system microarchitecture. IBM J Res Dev 49(4/5):505–521CrossRef
28.
Zurück zum Zitat Tendler JM, Dodson JS, Fields JS, Le H, Sinharoy B (2002) POWER4 system microarchitecture. IBM J Res Dev 46(1):5–25CrossRef Tendler JM, Dodson JS, Fields JS, Le H, Sinharoy B (2002) POWER4 system microarchitecture. IBM J Res Dev 46(1):5–25CrossRef
29.
Zurück zum Zitat Valero A, Sahuquillo J, Petit S, Lorente V, Canal R, López P, Duato J (2009) An hybrid eDRAM/SRAM macrocell to implement first-level data caches. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 213–221 Valero A, Sahuquillo J, Petit S, Lorente V, Canal R, López P, Duato J (2009) An hybrid eDRAM/SRAM macrocell to implement first-level data caches. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 213–221
30.
Zurück zum Zitat Valls JJ, Ros A, Sahuquillo J, Gómez ME, Duato J (2012) PS-Dir: a scalable two-level directory cache. In: Proceedings of 21st international conference on parallel architectures and compilation techniques (PACT), pp 451–452 Valls JJ, Ros A, Sahuquillo J, Gómez ME, Duato J (2012) PS-Dir: a scalable two-level directory cache. In: Proceedings of 21st international conference on parallel architectures and compilation techniques (PACT), pp 451–452
31.
Zurück zum Zitat Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: characterization and methodological considerations. In: Proceedings of 22nd international symposium on computer architecture (ISCA), pp 24–36 Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: characterization and methodological considerations. In: Proceedings of 22nd international symposium on computer architecture (ISCA), pp 24–36
32.
Zurück zum Zitat Wu X, Li J, Zhang L, Speight E, Rajamony R, Xie Y (2009) Hybrid cache architecture with disparate memory technologies. In: Proceedings of 36th international symposium on computer architecture (ISCA), pp 34–45 Wu X, Li J, Zhang L, Speight E, Rajamony R, Xie Y (2009) Hybrid cache architecture with disparate memory technologies. In: Proceedings of 36th international symposium on computer architecture (ISCA), pp 34–45
33.
Zurück zum Zitat Zebchuk J, Falsafi B, Moshovos A (2013) Multi-grain coherence directories. In: Proceedings of 46th IEEE/ACM international symposium on microarchitecture (MICRO), pp 359–370 Zebchuk J, Falsafi B, Moshovos A (2013) Multi-grain coherence directories. In: Proceedings of 46th IEEE/ACM international symposium on microarchitecture (MICRO), pp 359–370
34.
Zurück zum Zitat Zebchuk J, Srinivasan V, Qureshi MK, Moshovos A (2009) A tagless coherence directory. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 423–434 Zebchuk J, Srinivasan V, Qureshi MK, Moshovos A (2009) A tagless coherence directory. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 423–434
35.
Zurück zum Zitat Zhao H, Shriraman A, Dwarkadas S, Srinivasan V (2011) SPATL: Honey, I shrunk the coherence directory. In: Proceedings of 20th international conference on parallel architectures and compilation techniques (PACT), pp 148–157 Zhao H, Shriraman A, Dwarkadas S, Srinivasan V (2011) SPATL: Honey, I shrunk the coherence directory. In: Proceedings of 20th international conference on parallel architectures and compilation techniques (PACT), pp 148–157
Metadaten
Titel
PS directory: a scalable multilevel directory cache for CMPs
verfasst von
Joan J. Valls
Alberto Ros
Julio Sahuquillo
María E. Gómez
Publikationsdatum
01.08.2015
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 8/2015
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1332-5

Weitere Artikel der Ausgabe 8/2015

The Journal of Supercomputing 8/2015 Zur Ausgabe

Premium Partner