Skip to main content
Erschienen in: The Journal of Supercomputing 1/2013

01.10.2013

DP&TB: a coherence filtering protocol for many-core chip multiprocessors

verfasst von: Fengkai Yuan, Zhenzhou Ji

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Future many-core chip multiprocessors (CMPs) will integrate hundreds of processor cores on chip. Two cache coherence protocols are the mainstream applied to current CMPs. The token-based protocol (Token) provides high performance, but it generates a prohibitive amount of network traffic, which translates into excessive power consumption. The directory-based protocol (Directory) reduces network traffic, yet trades off with the storage overhead of the directory as well as entails comparatively low performance caused by indirection limiting its applicability for many-core CMPs.
In this work, we present DP&TB, a novel cache coherence protocol particularly suited to future many-core CMPs. In DP&TB, cache coherence is maintained at the granularity of a page, facilitating to filter out either unnecessary coherence inspections for blocks inside private pages or network traffic for blocks inside shared pages. We employ Directory to detect private and shared pages and Token to maintain the coherence of the blocks inside shared pages. DP&TB inherits the merit of Directory and Token and overcome their problems. Experimental results show that DP&TB comprehensively beyond Directory and Token with improvement by 9.1 % in performance over Token and by 13.8 % in network traffic over Directory. In addition, the storage overhead of DP&TB is less than half of that of Directory. Our proposal can fulfill the requirement of many-core CMPs to achieve high performance, power and area efficiency.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: IEEE intl symp on performance analysis of systems and software (ISPASS), pp 33–42 Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: IEEE intl symp on performance analysis of systems and software (ISPASS), pp 33–42
2.
Zurück zum Zitat Barroso LA, Gharachorloo K, McNamara R, Nowatzyk A, Qadeer S, Sano B, Smith S, Stets R, Verghese B (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: 27th intl symp on computer architecture (ISCA), pp 12–14 Barroso LA, Gharachorloo K, McNamara R, Nowatzyk A, Qadeer S, Sano B, Smith S, Stets R, Verghese B (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: 27th intl symp on computer architecture (ISCA), pp 12–14
3.
Zurück zum Zitat Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: 17th intl conference on parallel architectures and compilation techniques (PACT), pp 72–81 CrossRef Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: 17th intl conference on parallel architectures and compilation techniques (PACT), pp 72–81 CrossRef
4.
Zurück zum Zitat Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. ACM SIGARCH Comput Archit News 39(2):1–7 CrossRef Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. ACM SIGARCH Comput Archit News 39(2):1–7 CrossRef
5.
Zurück zum Zitat Cantin JF, Lipasti MH, Smith JE (2005) Improving multiprocessor performance with coarse-grain coherence tracking. In: 32th intl symp on computer architecture (ISCA), pp 246–257 CrossRef Cantin JF, Lipasti MH, Smith JE (2005) Improving multiprocessor performance with coarse-grain coherence tracking. In: 32th intl symp on computer architecture (ISCA), pp 246–257 CrossRef
6.
Zurück zum Zitat Cuesta B, Ros A, Gmez EM, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: 38th intl symp on computer architecture (ISCA), pp 93–104 Cuesta B, Ros A, Gmez EM, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: 38th intl symp on computer architecture (ISCA), pp 93–104
7.
Zurück zum Zitat Hardavellas N, Ferdman M, Falsafi B, Ailamaki A (2009) Reactive NUCA: near-optimal block placement and replication in distributed caches. In: 36th intl symp on computer architecture (ISCA), pp 184–195 Hardavellas N, Ferdman M, Falsafi B, Ailamaki A (2009) Reactive NUCA: near-optimal block placement and replication in distributed caches. In: 36th intl symp on computer architecture (ISCA), pp 184–195
9.
Zurück zum Zitat Kim D, Ahn J, Kim J, Huh J (2010) Subspace snooping: filtering snoops with operating system support. In: 19th intl conference on parallel architectures and compilation techniques (PACT), pp 111–122 CrossRef Kim D, Ahn J, Kim J, Huh J (2010) Subspace snooping: filtering snoops with operating system support. In: 19th intl conference on parallel architectures and compilation techniques (PACT), pp 111–122 CrossRef
10.
Zurück zum Zitat Magen N, Kolodny A, Weiser U, Shamir N (2004) Interconnect power dissipation in a microprocessor. In: Intl workshop on system level interconnect prediction (SLIP), pp 7–13 Magen N, Kolodny A, Weiser U, Shamir N (2004) Interconnect power dissipation in a microprocessor. In: Intl workshop on system level interconnect prediction (SLIP), pp 7–13
11.
Zurück zum Zitat Martin MMK (2003) Token coherence. PhD dissertation, University of Wisconsin Martin MMK (2003) Token coherence. PhD dissertation, University of Wisconsin
12.
Zurück zum Zitat Marty MR, Bingham J, Hill MD, Hu A, Martin MM, Wood DA (2005) Improving multiple CMP systems using token coherence. In: 11th intl symp on high-performance computer architecture (HPCA), pp 328–339 CrossRef Marty MR, Bingham J, Hill MD, Hu A, Martin MM, Wood DA (2005) Improving multiple CMP systems using token coherence. In: 11th intl symp on high-performance computer architecture (HPCA), pp 328–339 CrossRef
13.
Zurück zum Zitat Moshovos A (2005) RegionScout: exploiting coarse grain sharing in snoop-based coherence. In: 32nd intl symp on computer architecture (ISCA), pp 234–245 CrossRef Moshovos A (2005) RegionScout: exploiting coarse grain sharing in snoop-based coherence. In: 32nd intl symp on computer architecture (ISCA), pp 234–245 CrossRef
14.
Zurück zum Zitat Raghavan A, Blundell C, Martin MMK (2008) Token tenure: PATCHing token counting using directory-based cache coherence. In: 41st IEEE/ACM intl symp on microarchitecture (MICRO), pp 47–58 Raghavan A, Blundell C, Martin MMK (2008) Token tenure: PATCHing token counting using directory-based cache coherence. In: 41st IEEE/ACM intl symp on microarchitecture (MICRO), pp 47–58
15.
Zurück zum Zitat Ros A, Acacio ME, Garca JM (2010) A direct coherence protocol for many-core chip multiprocessors. IEEE Trans Parallel Distrib Syst 21(12):1779–1792 CrossRef Ros A, Acacio ME, Garca JM (2010) A direct coherence protocol for many-core chip multiprocessors. IEEE Trans Parallel Distrib Syst 21(12):1779–1792 CrossRef
16.
Zurück zum Zitat Taylor MB, Kim J, Miller J, Wentzlaff D, Ghodrat F, Greenwald B, Hoffman H, Lee JW, Johnson P, Lee W, Ma A, Saraf A, Seneski M, Shnidman N, Strumpen V, Frank M, Amarasinghe S, Agarwal A (2002) The raw microprocessor: a computational fabric for software circuits and general purpose programs. IEEE MICRO 22(2):25–35 CrossRef Taylor MB, Kim J, Miller J, Wentzlaff D, Ghodrat F, Greenwald B, Hoffman H, Lee JW, Johnson P, Lee W, Ma A, Saraf A, Seneski M, Shnidman N, Strumpen V, Frank M, Amarasinghe S, Agarwal A (2002) The raw microprocessor: a computational fabric for software circuits and general purpose programs. IEEE MICRO 22(2):25–35 CrossRef
18.
Zurück zum Zitat Wang J, Wang D, Wang H, Xue Y (2012) Dynamic reusability-based replication with network address mapping in CMPs. In: 17th Asia and South Pacific design automation conference (ASP-DAC), pp 487–492 CrossRef Wang J, Wang D, Wang H, Xue Y (2012) Dynamic reusability-based replication with network address mapping in CMPs. In: 17th Asia and South Pacific design automation conference (ASP-DAC), pp 487–492 CrossRef
19.
Zurück zum Zitat Zebchuk J, Safi E, Moshovos A (2007) A framework for coarse-grain optimizations in the on-chip memory hierarchy. In: 40th IEEE/ACM intl symp on microarchitecture (MICRO), pp 314–327 CrossRef Zebchuk J, Safi E, Moshovos A (2007) A framework for coarse-grain optimizations in the on-chip memory hierarchy. In: 40th IEEE/ACM intl symp on microarchitecture (MICRO), pp 314–327 CrossRef
Metadaten
Titel
DP&TB: a coherence filtering protocol for many-core chip multiprocessors
verfasst von
Fengkai Yuan
Zhenzhou Ji
Publikationsdatum
01.10.2013
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 1/2013
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-0900-4

Weitere Artikel der Ausgabe 1/2013

The Journal of Supercomputing 1/2013 Zur Ausgabe

Premium Partner