nach oben

The Journal of Supercomputing

Erschienen in:

01.10.2013

DP&TB: a coherence filtering protocol for many-core chip multiprocessors

verfasst von: Fengkai Yuan, Zhenzhou Ji

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Future many-core chip multiprocessors (CMPs) will integrate hundreds of processor cores on chip. Two cache coherence protocols are the mainstream applied to current CMPs. The token-based protocol (Token) provides high performance, but it generates a prohibitive amount of network traffic, which translates into excessive power consumption. The directory-based protocol (Directory) reduces network traffic, yet trades off with the storage overhead of the directory as well as entails comparatively low performance caused by indirection limiting its applicability for many-core CMPs.

In this work, we present DP&TB, a novel cache coherence protocol particularly suited to future many-core CMPs. In DP&TB, cache coherence is maintained at the granularity of a page, facilitating to filter out either unnecessary coherence inspections for blocks inside private pages or network traffic for blocks inside shared pages. We employ Directory to detect private and shared pages and Token to maintain the coherence of the blocks inside shared pages. DP&TB inherits the merit of Directory and Token and overcome their problems. Experimental results show that DP&TB comprehensively beyond Directory and Token with improvement by 9.1 % in performance over Token and by 13.8 % in network traffic over Directory. In addition, the storage overhead of DP&TB is less than half of that of Directory. Our proposal can fulfill the requirement of many-core CMPs to achieve high performance, power and area efficiency.

Vorheriger Artikel Direction-aware resource discovery in large-scale distributed computing environments

Nächster Artikel QoS based resource provisioning and scheduling in grids

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: IEEE intl symp on performance analysis of systems and software (ISPASS), pp 33–42

Barroso LA, Gharachorloo K, McNamara R, Nowatzyk A, Qadeer S, Sano B, Smith S, Stets R, Verghese B (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: 27th intl symp on computer architecture (ISCA), pp 12–14

Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: 17th intl conference on parallel architectures and compilation techniques (PACT), pp 72–81 CrossRef

Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. ACM SIGARCH Comput Archit News 39(2):1–7 CrossRef

Cantin JF, Lipasti MH, Smith JE (2005) Improving multiprocessor performance with coarse-grain coherence tracking. In: 32th intl symp on computer architecture (ISCA), pp 246–257 CrossRef

Cuesta B, Ros A, Gmez EM, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: 38th intl symp on computer architecture (ISCA), pp 93–104

Hardavellas N, Ferdman M, Falsafi B, Ailamaki A (2009) Reactive NUCA: near-optimal block placement and replication in distributed caches. In: 36th intl symp on computer architecture (ISCA), pp 184–195

Kalray (2012) First MPPA MANYCORE chip (MPPA256) integrates 256 cores. http://www.kalray.eu/products/mppa-manycore. Accessed 22 May 2012

Kim D, Ahn J, Kim J, Huh J (2010) Subspace snooping: filtering snoops with operating system support. In: 19th intl conference on parallel architectures and compilation techniques (PACT), pp 111–122 CrossRef

10.

Magen N, Kolodny A, Weiser U, Shamir N (2004) Interconnect power dissipation in a microprocessor. In: Intl workshop on system level interconnect prediction (SLIP), pp 7–13

11.

Martin MMK (2003) Token coherence. PhD dissertation, University of Wisconsin

12.

Marty MR, Bingham J, Hill MD, Hu A, Martin MM, Wood DA (2005) Improving multiple CMP systems using token coherence. In: 11th intl symp on high-performance computer architecture (HPCA), pp 328–339 CrossRef

13.

Moshovos A (2005) RegionScout: exploiting coarse grain sharing in snoop-based coherence. In: 32nd intl symp on computer architecture (ISCA), pp 234–245 CrossRef

14.

Raghavan A, Blundell C, Martin MMK (2008) Token tenure: PATCHing token counting using directory-based cache coherence. In: 41st IEEE/ACM intl symp on microarchitecture (MICRO), pp 47–58

15.

Ros A, Acacio ME, Garca JM (2010) A direct coherence protocol for many-core chip multiprocessors. IEEE Trans Parallel Distrib Syst 21(12):1779–1792 CrossRef

16.

Taylor MB, Kim J, Miller J, Wentzlaff D, Ghodrat F, Greenwald B, Hoffman H, Lee JW, Johnson P, Lee W, Ma A, Saraf A, Seneski M, Shnidman N, Strumpen V, Frank M, Amarasinghe S, Agarwal A (2002) The raw microprocessor: a computational fabric for software circuits and general purpose programs. IEEE MICRO 22(2):25–35 CrossRef

17.

Tilera (2012) Tilera announces latest tile-gx family processors with up to 100 cores. http://www.tilera.com/products/processors/TILEGx_Family. Accessed 20 May 2012

18.

Wang J, Wang D, Wang H, Xue Y (2012) Dynamic reusability-based replication with network address mapping in CMPs. In: 17th Asia and South Pacific design automation conference (ASP-DAC), pp 487–492 CrossRef

19.

Zebchuk J, Safi E, Moshovos A (2007) A framework for coarse-grain optimizations in the on-chip memory hierarchy. In: 40th IEEE/ACM intl symp on microarchitecture (MICRO), pp 314–327 CrossRef

Titel: DP&TB: a coherence filtering protocol for many-core chip multiprocessors
verfasst von: Fengkai Yuan
Zhenzhou Ji
Publikationsdatum: 01.10.2013
Verlag: Springer US
Erschienen in: The Journal of Supercomputing / Ausgabe 1/2013
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-013-0900-4

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 1/2013

Advanced technologies and applications for Highly-Reliable Cyber Physical System (HRCPS)

Normalized particle swarm optimization for complex chooser option pricing on graphics processing unit

QoS based resource provisioning and scheduling in grids

An efficient strategy of nonuniform sensor deployment in cyber physical systems

Random data-aware flash translation layer for NAND flash-based smart devices

High-performance optimizations on tiled many-core embedded systems: a matrix multiplication case study

Premium Partner