
The Journal of Supercomputing

1 September 2015

Design space exploration of hardware task superscalar architecture

Authors: Fahimeh Yazdanpanah, Mohammad Alaei

Published in: The Journal of Supercomputing | Issue 9/2015


Abstract

Exploiting concurrency is a major challenge for current high performance computing systems. Several dynamic software task management mechanisms have recently been proposed, in particular task-based dataflow programming models, which draw on dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. However, these programming models rely on software-based dependency analysis, which is inherently slow, and this limits their scalability, especially when task granularity is fine and the number of tasks is large. Moreover, task scheduling in software introduces overheads and so becomes increasingly inefficient as the number of cores grows. In contrast, a hardware scheduling solution such as Task SuperScalar (TSS) can achieve greater speed-ups because a hardware task scheduler requires fewer cycles than its software counterpart to dispatch a task. TSS combines the effectiveness of out-of-order processors with the task abstraction. It has been implemented in software, with limited parallelism and high memory consumption due to the nature of the software implementation. Hardware Task Superscalar (HTSS) is proposed to solve these drawbacks. HTSS is designed to be integrated into a future high performance computer with the ability to exploit fine-grained task parallelism. This article describes a detailed latency and design space exploration of HTSS. For the design space exploration, we have designed a full cycle-accurate simulator of HTSS, called SimTSS. The simulator has been tuned using the latencies of the HTSS components, obtained from a VHDL description of each component. As a result of this exploration, we have determined the number of components and the memory capacity of HTSS for HPC systems.
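To make the software-based dependency analysis mentioned in the abstract concrete, the following is a minimal Python sketch of how a task-based dataflow runtime might derive dependencies from the inputs and outputs each task declares. It is an illustration under stated assumptions, not the TSS or HTSS design: the class `TaskGraph`, its methods `submit` and `ready_order`, and the operand names are all hypothetical.

```python
from collections import defaultdict


class TaskGraph:
    """Toy dependency tracker (hypothetical; not the TSS/HTSS design)."""

    def __init__(self):
        self.last_writer = {}             # operand -> last task that wrote it
        self.readers = defaultdict(list)  # operand -> readers since last write
        self.deps = {}                    # task id -> tasks it must wait for

    def submit(self, task_id, inputs=(), outputs=()):
        """Register a task and return the set of tasks it depends on."""
        waits = set()
        for name in inputs:
            if name in self.last_writer:          # read-after-write
                waits.add(self.last_writer[name])
        for name in outputs:
            if name in self.last_writer:          # write-after-write
                waits.add(self.last_writer[name])
            waits.update(self.readers[name])      # write-after-read
        # Record this task's accesses only after all lookups are done.
        for name in inputs:
            self.readers[name].append(task_id)
        for name in outputs:
            self.last_writer[name] = task_id
            self.readers[name] = []
        self.deps[task_id] = waits
        return waits

    def ready_order(self):
        """Group tasks into waves whose members could run in parallel."""
        remaining = {t: set(d) for t, d in self.deps.items()}
        waves = []
        while remaining:
            wave = sorted(t for t, d in remaining.items() if not d)
            waves.append(wave)
            for t in wave:
                del remaining[t]
            for d in remaining.values():
                d.difference_update(wave)
        return waves


g = TaskGraph()
g.submit("t1", outputs=["A"])
g.submit("t2", outputs=["B"])
g.submit("t3", inputs=["A", "B"], outputs=["C"])
g.submit("t4", inputs=["A"], outputs=["A"])
print(g.ready_order())
```

Every `submit` call performs table lookups and updates for each operand. This per-task bookkeeping is the kind of work that a software runtime executes instruction by instruction and that a hardware scheduler aims to complete in fewer cycles.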



Title: Design space exploration of hardware task superscalar architecture
Authors: Fahimeh Yazdanpanah, Mohammad Alaei
Publication date: 1 September 2015
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 9/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-015-1449-1
