2018 | OriginalPaper | Chapter

TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism

Authors: Kallia Chronaki, Marc Casas, Miquel Moreto, Jaume Bosch, Rosa M. Badia

Published in: High Performance Computing

Publisher: Springer International Publishing

Abstract

As chip multiprocessors (CMPs) grow more complex, software solutions such as parallel programming models are attracting considerable attention. Task-based parallel programming models offer an appealing way to exploit complex CMPs. However, the increasing number of cores on modern CMPs is pushing research towards fine-grained parallelism, and task-based programming models must handle such workloads while still delivering performance and scalability. Using specialized hardware to boost the performance of task-based programming models is common practice in the research community.

Our paper observes that task creation becomes a bottleneck when fine-grained parallel applications are executed with many task-based programming models. As the number of cores increases, the time spent generating the application's tasks becomes more critical to the overall execution. To overcome this issue, we propose TaskGenX. TaskGenX minimizes task creation overheads and relies on both the runtime system and dedicated hardware. On the runtime system side, TaskGenX decouples task creation from the other runtime activities and then transfers this part of the runtime to specialized hardware. We derive the requirements for this hardware in order to boost the execution of highly parallel applications. In our evaluation using 11 parallel workloads on both symmetric and asymmetric multicore systems, we obtain performance improvements of up to 15×, averaging 3.1× over the baseline.
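The decoupling idea in the abstract can be illustrated with a small software-only sketch. This is not the TaskGenX implementation or the Nanos++ API: the names `Task`, `run_decoupled`, `creator` and `worker` are hypothetical, a plain Python thread stands in for the dedicated hardware, and task dependencies are left out entirely. It only shows the structure in which the main thread posts lightweight creation requests while a separate unit builds the task descriptors and feeds the workers.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical task descriptor; real runtimes also track dependencies."""
    task_id: int
    fn: Callable[[int], int]
    arg: int


def run_decoupled(fn: Callable[[int], int], args: List[int],
                  num_workers: int = 2) -> List[Optional[int]]:
    requests: queue.Queue = queue.Queue()  # creation requests from the main thread
    ready: queue.Queue = queue.Queue()     # fully created tasks, ready to run
    results: List[Optional[int]] = [None] * len(args)

    def creator() -> None:
        # Stand-in for the dedicated task-creation unit (an assumption:
        # here it is just another thread, not specialized hardware).
        while True:
            req = requests.get()
            if req is None:
                # No more requests: tell every worker to stop.
                for _ in range(num_workers):
                    ready.put(None)
                return
            task_id, arg = req
            ready.put(Task(task_id, fn, arg))

    def worker() -> None:
        # Workers only execute tasks; they never build descriptors.
        while True:
            task = ready.get()
            if task is None:
                return
            results[task.task_id] = task.fn(task.arg)

    threads = [threading.Thread(target=creator)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The main thread only enqueues small requests and moves on.
    for i, a in enumerate(args):
        requests.put((i, a))
    requests.put(None)  # end-of-stream marker

    for t in threads:
        t.join()
    return results
```

For example, `run_decoupled(lambda x: x * x, [1, 2, 3, 4])` returns the squares in input order. Because this sketch uses ordinary Python threads, it says nothing about the speedups reported in the chapter; it only mirrors the separation of roles.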

Details about the benchmarks used are in Sect. 4.

The experimental set-up is explained in Sect. 4.

Nanos++ also supports nested parallelism, so any of the worker threads can potentially create tasks. However, the majority of existing parallel applications are not implemented using nested parallelism.

Section 6 further describes these proposals.

Title: TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism
Authors: Kallia Chronaki, Marc Casas, Miquel Moreto, Jaume Bosch, Rosa M. Badia
Publisher: Springer International Publishing
Book: High Performance Computing
Print ISBN: 978-3-319-92039-9

Electronic ISBN: 978-3-319-92040-5

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-92040-5_20
