Skip to main content
Erschienen in: The Journal of Supercomputing 9/2015

01.09.2015

Design space exploration of hardware task superscalar architecture

verfasst von: Fahimeh Yazdanpanah, Mohammad Alaei

Erschienen in: The Journal of Supercomputing | Ausgabe 9/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

For current high performance computing systems, exploiting concurrency is a serious and important challenge. Recently, several dynamic software task management mechanisms have been proposed. In particular, task-based dataflow programming models which benefit from dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. However, these programming models rely on software-based dependency analysis, which are performed inherently slowly; and this limits their scalability specially when there is fine-grained task granularity and a large amount of tasks. Moreover, task scheduling in software introduces overheads, and so becomes increasingly inefficient with the number of cores. In contrast, a hardware scheduling solution, like Task SuperScalar (TSS), can achieve greater values of speed-up because a hardware task scheduler requires fewer cycles than the software version to dispatch a task. TSS combines the effectiveness of Out-of-Order processors together with the task abstraction. It has been implemented in software with limited parallelism and high memory consumption due to the nature of the software implementation. Hardware Task Superscalar (HTSS) is proposed to solve these drawbacks. HTSS is designed to be integrated in a future high performance computer with the ability to exploit fine-grained task parallelism. In this article, a deep latency and design space exploration of HTSS is described. For design space exploration, we have designed a full cycle-accurate simulator of HTSS, called SimTSS. The simulator has been tuned based on latency exploration of HTSS components resulted from VHDL description of each component. As the result of this exploration, we have found the number of components and memory capacity of HTSS for HPC systems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Al-Kadi G, Terechko AS (2009) A hardware task scheduler for embedded video processing. In: Proceedings of the international conference on high performance and embedded architectures and compilers (HiPEAC), pp 140–152 Al-Kadi G, Terechko AS (2009) A hardware task scheduler for embedded video processing. In: Proceedings of the international conference on high performance and embedded architectures and compilers (HiPEAC), pp 140–152
2.
Zurück zum Zitat Badia RM (2011) Top down programming methodology and tools with StarSs, enabling scalable programming paradigms: extended abstract. In: Proceedings of the workshop on scalable algorithms for large-scale systems (ScalA), pp 19–20 Badia RM (2011) Top down programming methodology and tools with StarSs, enabling scalable programming paradigms: extended abstract. In: Proceedings of the workshop on scalable algorithms for large-scale systems (ScalA), pp 19–20
3.
Zurück zum Zitat Bellens P, Perez JM, Cabarcas F, Ramirez A, Badia RM, Labarta J (2009) CellSs: scheduling techniques to better exploit memory hierarchy. Sci Program 17(1–2):77–95 Bellens P, Perez JM, Cabarcas F, Ramirez A, Badia RM, Labarta J (2009) CellSs: scheduling techniques to better exploit memory hierarchy. Sci Program 17(1–2):77–95
4.
Zurück zum Zitat Bellens P, Perez J, Badia R, Labarta J (2006) CellSs: a programming model for the cell BE architecture. In: Proceedings of the supercomputing (SC). ACM, New York Bellens P, Perez J, Badia R, Labarta J (2006) CellSs: a programming model for the cell BE architecture. In: Proceedings of the supercomputing (SC). ACM, New York
6.
Zurück zum Zitat Bueno J, Martinell L, Duran A, Farreras M, Martorell X, Badia RM, Ayguade E, Labarta J (2011) Productive cluster programming with OmpSs. In: Proceedings of the International conference on parallel processing (Euro-Par), pp 555–566 Bueno J, Martinell L, Duran A, Farreras M, Martorell X, Badia RM, Ayguade E, Labarta J (2011) Productive cluster programming with OmpSs. In: Proceedings of the International conference on parallel processing (Euro-Par), pp 555–566
7.
Zurück zum Zitat Castrillon J, Zhang D, Kempf T, Vanthournout B, Leupers R, Ascheid G (2009) Task management in MPSoCs: an ASIP approach. In: Proceedings of the international conference on computer-aided design (ICCAD), pp 587–594 Castrillon J, Zhang D, Kempf T, Vanthournout B, Leupers R, Ascheid G (2009) Task management in MPSoCs: an ASIP approach. In: Proceedings of the international conference on computer-aided design (ICCAD), pp 587–594
8.
Zurück zum Zitat Duran A, Ayguade E, Badia RM, Labarta J, Martinell L, Martorell X, Planas J (2011) Ompss: a proposal for programming heterogeneous multi-core architectures. Parallel Process Lett 21(2):173–193MathSciNetCrossRef Duran A, Ayguade E, Badia RM, Labarta J, Martinell L, Martorell X, Planas J (2011) Ompss: a proposal for programming heterogeneous multi-core architectures. Parallel Process Lett 21(2):173–193MathSciNetCrossRef
9.
Zurück zum Zitat Etsion Y, Cabarcas F, Rico A, Ramirez A, Badia RM, Ayguade E, Labarta J, Valero M (2010) Task superscalar: an out-of-order task pipeline. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 89–100 Etsion Y, Cabarcas F, Rico A, Ramirez A, Badia RM, Ayguade E, Labarta J, Valero M (2010) Task superscalar: an out-of-order task pipeline. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 89–100
10.
Zurück zum Zitat Etsion Y, Ramirez A, Badia RM, Ayguade E, Labarta J, Valero M (2010) Task superscalar: using processors as functional units. In: Proceedings of the hot topics in parallelism (HOTPAR) Etsion Y, Ramirez A, Badia RM, Ayguade E, Labarta J, Valero M (2010) Task superscalar: using processors as functional units. In: Proceedings of the hot topics in parallelism (HOTPAR)
11.
Zurück zum Zitat Hoogerbrugge J, Terechko A (2011) A multithreaded multicore system for embedded media processing. Trans High-Perform Embedded Archit Compil (THEA) 3(2):154–173 (2011) Hoogerbrugge J, Terechko A (2011) A multithreaded multicore system for embedded media processing. Trans High-Perform Embedded Archit Compil (THEA) 3(2):154–173 (2011)
12.
Zurück zum Zitat Jenista JC, Eom YH, Demsky B (2010) OoOJava: an out-of-order approach to parallel programming. In: Proceedings of the USENIX conference on hot topic in parallelism (HotPar), pp 11–11 Jenista JC, Eom YH, Demsky B (2010) OoOJava: an out-of-order approach to parallel programming. In: Proceedings of the USENIX conference on hot topic in parallelism (HotPar), pp 11–11
13.
Zurück zum Zitat Jenista JC, Eom YH, Demsky BC (2011) OoOJava: software out-of-order execution. In: Proceedings of the ACM symposium on principles and practice of parallel programming (PPoPP), pp 57–68 Jenista JC, Eom YH, Demsky BC (2011) OoOJava: software out-of-order execution. In: Proceedings of the ACM symposium on principles and practice of parallel programming (PPoPP), pp 57–68
14.
Zurück zum Zitat Kalra R, Lysecky R (2010) Configuration locking and schedulability estimation for reduced reconfiguration overheads of reconfigurable systems. IEEE Trans Very Large Scale Integr Sys 18(4):671–674CrossRef Kalra R, Lysecky R (2010) Configuration locking and schedulability estimation for reduced reconfiguration overheads of reconfigurable systems. IEEE Trans Very Large Scale Integr Sys 18(4):671–674CrossRef
15.
Zurück zum Zitat Kish LB (2002) End of Moore’s law: thermal (noise) death of integration in micro and nano electronics. Phys Lett A 305:144–149CrossRef Kish LB (2002) End of Moore’s law: thermal (noise) death of integration in micro and nano electronics. Phys Lett A 305:144–149CrossRef
16.
Zurück zum Zitat Kish LB (2004) Moore’s law and the energy requirement of computing versus performance. IEE Proc Circuits Dev Syst 151(2):190–194MathSciNetCrossRef Kish LB (2004) Moore’s law and the energy requirement of computing versus performance. IEE Proc Circuits Dev Syst 151(2):190–194MathSciNetCrossRef
17.
Zurück zum Zitat Kumar S, Hughes CJ, Nguyen A (2007) Carbon: Architectural support for fine-grained parallelism on chip multiprocessors. In: Proceedings of the international symposium on computer architecture (ISCA), pp 162–173 Kumar S, Hughes CJ, Nguyen A (2007) Carbon: Architectural support for fine-grained parallelism on chip multiprocessors. In: Proceedings of the international symposium on computer architecture (ISCA), pp 162–173
18.
Zurück zum Zitat Lam MS, Rinard MC (1991) Coarse-grain parallel programming in Jade. In: Proceedings of the ACM symposium on principles and practice of parallel programming (PPoPP). ACM, New York, pp 94–105 Lam MS, Rinard MC (1991) Coarse-grain parallel programming in Jade. In: Proceedings of the ACM symposium on principles and practice of parallel programming (PPoPP). ACM, New York, pp 94–105
19.
Zurück zum Zitat Lindholm E, Nickolls J, Oberman S, Montrym J (2008) NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro 28(2):39–55CrossRef Lindholm E, Nickolls J, Oberman S, Montrym J (2008) NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro 28(2):39–55CrossRef
20.
Zurück zum Zitat Meenderinck C, Juurlink B (2010) A case for hardware task management support for the StarSs programming model. In: Proceedings of the conference on digital system design (DSD), pp 347–354 Meenderinck C, Juurlink B (2010) A case for hardware task management support for the StarSs programming model. In: Proceedings of the conference on digital system design (DSD), pp 347–354
21.
Zurück zum Zitat Meenderinck C, Juurlink B (2011) Nexus: hardware support for task-based programming. In: Proceedings of the conference on digital system design (DSD), pp 442–445 Meenderinck C, Juurlink B (2011) Nexus: hardware support for task-based programming. In: Proceedings of the conference on digital system design (DSD), pp 442–445
22.
Zurück zum Zitat Nacul AC, Regazzoni F, Lajolo M (2007) Hardware scheduling support in SMP architectures. In: Proceedings of the conference on design, automation and test in Europe (DATE), pp 642–647 Nacul AC, Regazzoni F, Lajolo M (2007) Hardware scheduling support in SMP architectures. In: Proceedings of the conference on design, automation and test in Europe (DATE), pp 642–647
23.
Zurück zum Zitat Noguera J, Badia RM (2003) System-level power-performance trade-offs in task scheduling for dynamically reconfigurable architectures. In: Proceedings of the international conference on compilers, architectures and synthesis for embedded systems (CASES), pp 73–83 Noguera J, Badia RM (2003) System-level power-performance trade-offs in task scheduling for dynamically reconfigurable architectures. In: Proceedings of the international conference on compilers, architectures and synthesis for embedded systems (CASES), pp 73–83
24.
Zurück zum Zitat Noguera J, Badia RM (2004) Multitasking on reconfigurable architectures: microarchitecture support and dynamic scheduling. ACM Trans Embedded Comput Syst 3(2):385–406CrossRef Noguera J, Badia RM (2004) Multitasking on reconfigurable architectures: microarchitecture support and dynamic scheduling. ACM Trans Embedded Comput Syst 3(2):385–406CrossRef
26.
Zurück zum Zitat Park S (2008) A hardware operating system kernel for multi processors. IEICE Electron Express 5(9):296–302CrossRef Park S (2008) A hardware operating system kernel for multi processors. IEICE Electron Express 5(9):296–302CrossRef
27.
Zurück zum Zitat Pearson PK (1990) Fast hashing of variable-length text strings. Commun ACM 33(6):677–680CrossRef Pearson PK (1990) Fast hashing of variable-length text strings. Commun ACM 33(6):677–680CrossRef
28.
Zurück zum Zitat Perez, Badia RM, Labarta J (2008) A dependency-aware task-based programming environment for multi-core architectures. In: Proceedings of the international conference on cluster computing (CC), pp 142–151 Perez, Badia RM, Labarta J (2008) A dependency-aware task-based programming environment for multi-core architectures. In: Proceedings of the international conference on cluster computing (CC), pp 142–151
29.
Zurück zum Zitat Rinard MC, Lam MS (1998) The design, implementation, and evaluation of Jade. ACM Trans Program Lang Syst (TPLS) 20(3):483–545CrossRef Rinard MC, Lam MS (1998) The design, implementation, and evaluation of Jade. ACM Trans Program Lang Syst (TPLS) 20(3):483–545CrossRef
30.
Zurück zum Zitat Rinard MC, Scales DJ, Lam MS (1992) Heterogeneous parallel programming in Jade. In: Proceedings of the conference on supercomputing, pp 245–256 Rinard MC, Scales DJ, Lam MS (1992) Heterogeneous parallel programming in Jade. In: Proceedings of the conference on supercomputing, pp 245–256
31.
Zurück zum Zitat Rinard MC, Scales DJ, Lam MS (1993) Jade: a high-level, machine-independent language for parallel programming. Computer 26(6):28–38CrossRef Rinard MC, Scales DJ, Lam MS (1993) Jade: a high-level, machine-independent language for parallel programming. Computer 26(6):28–38CrossRef
32.
Zurück zum Zitat Saez S, Vila J, Crespo A, Garcia A (1999) A hardware scheduler for complex real time system. In: Proceedings of the IEEE international symposium industrial electronics (ISIE). IEEE, pp 43–48 Saez S, Vila J, Crespo A, Garcia A (1999) A hardware scheduler for complex real time system. In: Proceedings of the IEEE international symposium industrial electronics (ISIE). IEEE, pp 43–48
33.
Zurück zum Zitat Sjalander M, Terechko A, Duranton M (2008) A look-ahead task management unit for embedded multi-core architectures. In: Proceedings of the conference on digital system design (DSD), pp 149–157 Sjalander M, Terechko A, Duranton M (2008) A look-ahead task management unit for embedded multi-core architectures. In: Proceedings of the conference on digital system design (DSD), pp 149–157
34.
Zurück zum Zitat Yazdanpanah F, Alvarez C, Jimenez-Gonalez D, Badia RM, Valero M (2015) Picos: a hardware runtime architecture support for ompss. Future Gener Comput Syst Yazdanpanah F, Alvarez C, Jimenez-Gonalez D, Badia RM, Valero M (2015) Picos: a hardware runtime architecture support for ompss. Future Gener Comput Syst
35.
Zurück zum Zitat Yazdanpanah F, Jimenez-Gonzalez D, Alvarez-Martinez C, Etsion Y (2013) Hybrid dataflow/von-Neumann architectures. IEEE Trans Parallel Distrib Syst (TPDS) 25(6):1489–1509 Yazdanpanah F, Jimenez-Gonzalez D, Alvarez-Martinez C, Etsion Y (2013) Hybrid dataflow/von-Neumann architectures. IEEE Trans Parallel Distrib Syst (TPDS) 25(6):1489–1509
36.
Zurück zum Zitat Yazdanpanah F, Jimenez-Gonzalez D, Alvarez-Martinez C, Etsion Y, Badia RM (2013) Analysis of the task superscalar architecture hardware design. In: Proceedings of the international conference on computational science (ICCS) Yazdanpanah F, Jimenez-Gonzalez D, Alvarez-Martinez C, Etsion Y, Badia RM (2013) Analysis of the task superscalar architecture hardware design. In: Proceedings of the international conference on computational science (ICCS)
37.
Zurück zum Zitat Yazdanpanah F, Jimenez-Gonzalez D, Alvarez-Martinez C, Etsion Y, Badia RM (2013) FPGA-based prototype of the task superscalar architecture. In: Proceedings of the 7th HiPEAC workshop of reconfigurable computing (WRC) Yazdanpanah F, Jimenez-Gonzalez D, Alvarez-Martinez C, Etsion Y, Badia RM (2013) FPGA-based prototype of the task superscalar architecture. In: Proceedings of the 7th HiPEAC workshop of reconfigurable computing (WRC)
Metadaten
Titel
Design space exploration of hardware task superscalar architecture
verfasst von
Fahimeh Yazdanpanah
Mohammad Alaei
Publikationsdatum
01.09.2015
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 9/2015
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1449-1

Weitere Artikel der Ausgabe 9/2015

The Journal of Supercomputing 9/2015 Zur Ausgabe