Published in: The Journal of Supercomputing, issue 4/2016

1 April 2016

Facing prefetching challenges in distributed shared memories for CMPs

Authors: Martí Torrents, Raul Martínez, Carlos Molina

Abstract

In distributed shared-memory systems, each prefetch engine operates independently, analyzing only the memory accesses addressed to the cache slice attached to it. Depending on the computed prefetch address, an engine may generate requests targeting any other tile in the system. This distributed behavior introduces several challenges that do not arise when the cache is unified. In this paper, we identify, analyze, and quantify these challenges and suggest how to address their effects, paving the way for future research on implementing prefetching mechanisms at all levels of the cache hierarchy in systems with shared distributed caches.
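To make the mechanism concrete, here is a minimal sketch in Python. It is not the authors' method: it swaps in a simple sequential (next-N-line) prefetcher and assumes 64-byte blocks, 16 tiles, a home tile chosen by the low-order block-address bits, and a prefetch degree of 2. The names `home_tile`, `SlicePrefetcher`, and `route` are hypothetical. Each engine only sees demand accesses homed at its own slice, yet the blocks it prefetches are generally homed at other tiles.

```python
# Sketch: independent per-tile sequential (next-N-line) prefetchers in a tiled
# CMP whose shared cache is split into one slice per tile.
# ASSUMPTIONS (illustrative only, not from the paper): 64-byte blocks,
# 16 tiles, home tile chosen by low-order block-address bits, degree 2.

BLOCK_SIZE = 64   # bytes per cache block (assumed)
NUM_TILES = 16    # tiles / cache slices (assumed)
DEGREE = 2        # blocks prefetched ahead per trigger (assumed)


def home_tile(addr: int) -> int:
    """Return the tile whose cache slice holds the block containing addr."""
    return (addr // BLOCK_SIZE) % NUM_TILES


class SlicePrefetcher:
    """Prefetch engine attached to one slice; it only sees accesses homed there."""

    def __init__(self, tile_id: int):
        self.tile_id = tile_id

    def on_access(self, addr: int) -> list[tuple[int, int]]:
        """Return (prefetch_block_addr, target_tile) pairs for one demand access."""
        if home_tile(addr) != self.tile_id:
            raise ValueError("access is not homed at this slice")
        block = addr // BLOCK_SIZE
        requests = []
        for i in range(1, DEGREE + 1):
            target = (block + i) * BLOCK_SIZE
            # The target block is usually homed at another tile, so the
            # request has to cross the on-chip network to reach it.
            requests.append((target, home_tile(target)))
        return requests


def route(addr: int, engines: list[SlicePrefetcher]) -> list[tuple[int, int]]:
    """Deliver a demand access to the engine of its home slice."""
    return engines[home_tile(addr)].on_access(addr)


if __name__ == "__main__":
    engines = [SlicePrefetcher(t) for t in range(NUM_TILES)]
    # A demand access to block 15 is handled by tile 15; its prefetches for
    # blocks 16 and 17 are homed at tiles 0 and 1.
    for target, tile in route(15 * BLOCK_SIZE, engines):
        print(f"prefetch block {target // BLOCK_SIZE} -> tile {tile}")
```

Running the script prints `prefetch block 16 -> tile 0` and `prefetch block 17 -> tile 1`: an access handled by the engine on tile 15 produces requests that must be sent to tiles 0 and 1.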

Metadata
Publisher: Springer US
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-016-1675-1
