Published in: The Journal of Supercomputing 5/2018

16.12.2017

REPLICA MBTAC: multithreaded dual-mode processor

Written by: Martti Forsell, Jussi Roivainen, Ville Leppänen

Abstract

The prevailing trend in the design of chip multiprocessors (CMPs) has been to replicate single-core processors. As a result, CMPs typically define an asynchronous computational model, require heavily locality-aware memory allocation, and incur high overheads in intercommunication. These properties make parallel programming very challenging and error-prone. We introduce our new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP. It provides a modern, sophisticated way of writing general-purpose parallel programs, backed by native execution capabilities that realize key concepts. These include support for cost-efficient machine-instruction-level synchronization and a uniform shared global memory that enables easy-to-program memory allocation of data structures and data movement. MBTAC uses a low-overhead thread-context switching solution, and it has a parallel-computing-savvy functional unit organization to exploit inter-thread instruction-level parallelism and highly efficient multioperations. To evaluate our proposal, we implemented three MBTAC constellations featuring up to 2048 parallel threads on FPGA and compared them with DLX and Intel’s Core i7 processors. The results point toward high performance in communication-intensive problems, simplified parallel programmability, and a regular, implementation-friendly structure.
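
The abstract does not define what a multioperation computes. As an illustration only, and assuming it behaves like the multiprefix operations known from PRAM-style models, the following Python sketch sequentially emulates one lock-step multiprefix-add. The function name `multiprefix_add`, the request format, and the thread ordering are hypothetical choices made for this sketch; they are not taken from the MBTAC instruction set.

```python
from typing import Dict, List, Tuple


def multiprefix_add(memory: Dict[int, int],
                    requests: List[Tuple[int, int]]) -> List[int]:
    """Sequentially emulate one lock-step multiprefix-add (illustrative only).

    requests[i] is the (address, value) pair issued by thread i in the same
    step. Each thread receives the content of its address plus the values of
    all lower-numbered threads that target the same address. Afterwards the
    address holds its old content plus the sum of all contributions.
    """
    results = []
    for address, value in requests:  # assumed ordering: by thread index
        current = memory.get(address, 0)
        results.append(current)      # exclusive prefix seen by this thread
        memory[address] = current + value
    return results


# Hypothetical usage: threads 0, 1, and 3 update address 0; thread 2 updates 1.
shared = {0: 10}
prefixes = multiprefix_add(shared, [(0, 1), (0, 2), (1, 5), (0, 3)])
print(prefixes)  # [10, 11, 0, 13]
print(shared)    # {0: 16, 1: 5}
```

The loop here is purely sequential; it models only the result that concurrent threads would observe, not how a processor would achieve it in hardware.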

Footnotes
1
This paper is an extended version of the paper [9], with a more detailed description of the MBTAC processor, the inclusion of MBTAC for 16-core constellations in the FPGA prototype description and evaluation section, and measurements and results. It also extends the theoretical work [7], which used a weakly implementable development version of the ECLIPSE architecture [5] for the tests.
 
References
1. Ahmad M, Hijaz F, Shi Q, Khan O (2015) Crono: a benchmark suite for multithreaded graph algorithms executing on futuristic multicores. In: Workload Characterization (IISWC), 2015 IEEE International Symposium on, pp 44–55
2. Dietzfelbinger M, Karlin A, Mehlhorn K, Meyer auf der Heide F, Rohnert H, Tarjan RE (1994) Dynamic perfect hashing: upper and lower bounds. SIAM J Comput 23(4):738–761
3. Engelmann C (1992) Simulationen von PRAM’s, Master’s thesis. Universitat des Saarlandes, FB Informatik
4. Forsell M (1994) Are multiport memories physically feasible? SIGARCH Comput Archit News 22(4):47–54
5. Forsell M (2002) A scalable high-performance computing solution for networks on chips. IEEE Micro 22(5):46–55
6. Forsell M (2004) E—a language for thread-level parallel programming on synchronous shared memory NOCs. WSEAS Trans Comput 3(3):807–812
7. Forsell M (2011) A PRAM-NUMA model of computation for addressing low-TLP workloads. Int J Netw Comput 1(1):21–35
8. Forsell M (2011) Performance comparison of some shared memory organizations for 2D mesh-like NOCs. Microprocess Microsyst 35(2):274–284
9. Forsell M, Roivainen J, Leppänen V (2014) Prototyping the MBTAC processor for the REPLICA CMP. In: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW ’14. IEEE Computer Society, Washington, pp 709–716
10. Hennessy J, Patterson D (1990) Computer architecture: a quantitative approach. Morgan Kaufmann Publishers Inc., Palo Alto
12. Intel (2006) Research at Intel From a Few Cores to Many: A Tera-scale Computing Research Overview. White Paper
13. Jaja J (1992) Introduction to parallel algorithms. Addison-Wesley, Reading
14. Keller J, Kessler C, Traff J (2001) Practical PRAM programming. Wiley, New York
15. Krommydas K, Scogland TRW, Feng W-C (2013) On the programmability and performance of heterogeneous platforms. In: Proceedings of the 2013 International Conference on Parallel and Distributed Systems, ICPADS ’13. IEEE Computer Society, Washington, pp 224–231
16. Lenoski D, Laudon J, Gharachorloo K, Weber W-D, Gupta A, Hennessy J, Horowitz M, Lam MS (1992) The Stanford Dash multiprocessor. Computer 25(3):63–79
17. Leppänen V (1996) Studies on the realization of PRAM. Turku Centre for Computer Science, University of Turku, Turku, Finland
18. Merritt R (2011) Panel: Wall ahead in multicore programming (Multicore Expo). EE Times
19. Park JJK, Park Y, Mahlke S (2015) Chimera: collaborative preemption for multitasking on a shared GPU. In: Proceedings of ASPLOS
20. Patterson D (2010) The trouble with multi-core. IEEE Spectr 47(7):28–32
23. Sun Microsystems (2005) Throughput computing: changing the economics and ecology of the data center with innovative SPARC Technology. White paper
24. Vishkin U (2008) Toward realizing a PRAM-on-a-chip vision. In: Proceedings of the 2007 Conference on Parallel Processing, Euro-Par’07. Springer, Berlin, pp 5–6
25. Vishkin U (2011) Using simple abstraction to reinvent computing for parallelism. Commun ACM 54(1):75–85
Metadata
Title
REPLICA MBTAC: multithreaded dual-mode processor
Written by
Martti Forsell
Jussi Roivainen
Ville Leppänen
Publication date
16.12.2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2199-z
