Skip to main content
Top
Published in: Computing 6/2018

01-11-2017

Locality-aware task scheduling for homogeneous parallel computing systems

Authors: Muhammad Khurram Bhatti, Isil Oz, Sarah Amin, Maria Mushtaq, Umer Farooq, Konstantin Popov, Mats Brorsson

Published in: Computing | Issue 6/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimised for locality. However, this would all be useless if the software scheduling does not cast the execution in a manner that promotes locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for the performance in multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes into account both locality and load balancing in order to reduce the execution time of target applications. The working principle of LeTS is based on two distinctive phases, namely; working task group formation phase (WTG-FP) and working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks while the WTG-OP determines an optimal order of execution for task groups that minimizes the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs by varying three major performance parameters, namely: (1) communication to computation ratio (CCR) between 0.1 and 1.0, (2) application size, i.e., task graphs comprising of 50-, 100-, and 300-tasks per graph, and (3) number of cores with 2-, 4-, 8-, and 16-cores execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Wolf W, Jerraya AA, Martin G (2008) Multiprocessor system-on-chip (MPSoC) technology. IEEE Trans CAD ICs Syst 27(10):1701–1713CrossRef Wolf W, Jerraya AA, Martin G (2008) Multiprocessor system-on-chip (MPSoC) technology. IEEE Trans CAD ICs Syst 27(10):1701–1713CrossRef
3.
go back to reference Yoo RM, Hughes CJ, Kim C, Chen Y-K, Kozyrakis C (2013) Locality-aware task management for unstructured parallelism: a quantitative limit study. In: Proceedings of the twenty-fifth annual ACM symposium on parallelism in algorithms and architectures, ser. SPAA ’13. ACM, New York, NY, pp 315–325. https://doi.org/10.1145/2486159.2486175 Yoo RM, Hughes CJ, Kim C, Chen Y-K, Kozyrakis C (2013) Locality-aware task management for unstructured parallelism: a quantitative limit study. In: Proceedings of the twenty-fifth annual ACM symposium on parallelism in algorithms and architectures, ser. SPAA ’13. ACM, New York, NY, pp 315–325. https://​doi.​org/​10.​1145/​2486159.​2486175
4.
go back to reference Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Pearson A. Wesley, ReadingMATH Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Pearson A. Wesley, ReadingMATH
5.
go back to reference Sinnen O, Sousa L (2004) List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures. Parallel Comput 30(1):81–101CrossRef Sinnen O, Sousa L (2004) List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures. Parallel Comput 30(1):81–101CrossRef
7.
go back to reference Bhatti MK, Belleudy C, Auguin M (2011) Hybrid power management in real time embedded systems: an interplay of DVFs and DPM techniques. Real-Time Syst 47(2):143–162CrossRef Bhatti MK, Belleudy C, Auguin M (2011) Hybrid power management in real time embedded systems: an interplay of DVFs and DPM techniques. Real-Time Syst 47(2):143–162CrossRef
8.
go back to reference Shahul AS, Sinnen O (2010) Scheduling task graphs optimally with a*. J Supercomput 51(3):310–332CrossRef Shahul AS, Sinnen O (2010) Scheduling task graphs optimally with a*. J Supercomput 51(3):310–332CrossRef
9.
go back to reference Sinnen O, Sousa LA (2005) Communication contention in task scheduling. IEEE Trans Parallel Distrib Syst 16(6):503–515CrossRef Sinnen O, Sousa LA (2005) Communication contention in task scheduling. IEEE Trans Parallel Distrib Syst 16(6):503–515CrossRef
10.
go back to reference Dally W (2009) The future of GPU computing. In: The 22nd annual supercomputing conference Dally W (2009) The future of GPU computing. In: The 22nd annual supercomputing conference
11.
go back to reference Hill M, Kozyrakis C (2012) Advancing computer systems without technology progress. In: DARPA/ISAT workshop Hill M, Kozyrakis C (2012) Advancing computer systems without technology progress. In: DARPA/ISAT workshop
12.
go back to reference Consortium CC (2012) 21st century computer architecture. A community white paper Consortium CC (2012) 21st century computer architecture. A community white paper
14.
go back to reference Sinnen O (2007) Task scheduling for parallel systems. Wiley, New York. ISBN 978-0-471-73576-2CrossRef Sinnen O (2007) Task scheduling for parallel systems. Wiley, New York. ISBN 978-0-471-73576-2CrossRef
15.
go back to reference Yang T, Gerasoulis A (1994) Dsc: scheduling parallel tasks on an unbounded number of processors. IEEE Trans Parallel Distrib Syst 5(9):951–967CrossRef Yang T, Gerasoulis A (1994) Dsc: scheduling parallel tasks on an unbounded number of processors. IEEE Trans Parallel Distrib Syst 5(9):951–967CrossRef
16.
go back to reference Kasahara H, Narita S (1984) Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Trans Comput C–33(11):1023–1029CrossRef Kasahara H, Narita S (1984) Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Trans Comput C–33(11):1023–1029CrossRef
17.
go back to reference Khan MA (2012) Scheduling for heterogeneous systems using constrained critical paths. Parallel Comput 38:175–193CrossRef Khan MA (2012) Scheduling for heterogeneous systems using constrained critical paths. Parallel Comput 38:175–193CrossRef
18.
go back to reference Topcuouglu H, Hariri S, you Wu M (2002) Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans Parallel Distrib Syst 13(3):260–274CrossRef Topcuouglu H, Hariri S, you Wu M (2002) Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans Parallel Distrib Syst 13(3):260–274CrossRef
19.
go back to reference Kwok Y-K, Ahmad I (2000) Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors. Cluster Comput 3(2):113–124CrossRef Kwok Y-K, Ahmad I (2000) Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors. Cluster Comput 3(2):113–124CrossRef
20.
go back to reference Ahmad I, Kwok Y-K (1998) On exploiting task duplication in parallel program scheduling. IEEE Trans Parallel Distrib Syst 9(9):872–892CrossRef Ahmad I, Kwok Y-K (1998) On exploiting task duplication in parallel program scheduling. IEEE Trans Parallel Distrib Syst 9(9):872–892CrossRef
21.
go back to reference Kwok Y-K, Ahmad I (1996) Dynamic critical-path scheduling: an effective technique for allocating task graphs to multiprocessors. IEEE Trans Parallel Distrib Syst 7(5):506–521CrossRef Kwok Y-K, Ahmad I (1996) Dynamic critical-path scheduling: an effective technique for allocating task graphs to multiprocessors. IEEE Trans Parallel Distrib Syst 7(5):506–521CrossRef
22.
go back to reference Wu M-Y, Gajski D (1990) Hypertool: a programming aid for message-passing systems. IEEE Trans Parallel Distrib Syst 1(3):330–343CrossRef Wu M-Y, Gajski D (1990) Hypertool: a programming aid for message-passing systems. IEEE Trans Parallel Distrib Syst 1(3):330–343CrossRef
23.
go back to reference Fard HM, Prodan R, Barrionuevo JJD, Fahringer T (2012) A multi-objective approach for workflow scheduling in heterogeneous environments. In: 2012 12th IEEE/ACM international symposium on cluster, cloud and grid computing (ccgrid 2012), pp 300–309 Fard HM, Prodan R, Barrionuevo JJD, Fahringer T (2012) A multi-objective approach for workflow scheduling in heterogeneous environments. In: 2012 12th IEEE/ACM international symposium on cluster, cloud and grid computing (ccgrid 2012), pp 300–309
24.
go back to reference Arabnejad H, Barbosa J (2014) List scheduling algorithm for heterogeneous systems by an optimistic cost table. IEEE Trans Parallel Distrib Syst 25(3):682–694CrossRef Arabnejad H, Barbosa J (2014) List scheduling algorithm for heterogeneous systems by an optimistic cost table. IEEE Trans Parallel Distrib Syst 25(3):682–694CrossRef
25.
go back to reference Iverson MA, Ozguner F, Follen GJ (1995) Parallelizing existing applications in a distributed heterogeneous environment. In: HCW ’95, pp 93–100 Iverson MA, Ozguner F, Follen GJ (1995) Parallelizing existing applications in a distributed heterogeneous environment. In: HCW ’95, pp 93–100
27.
go back to reference Kim S, Browne J (1988) General approach to mapping of parallel computations upon multiprocessor architectures. Unknown J 3:1–8 Kim S, Browne J (1988) General approach to mapping of parallel computations upon multiprocessor architectures. Unknown J 3:1–8
28.
go back to reference Sarkar V (1989) Partitioning and scheduling parallel programs for multiprocessors. MIT Press, Cambridge, MAMATH Sarkar V (1989) Partitioning and scheduling parallel programs for multiprocessors. MIT Press, Cambridge, MAMATH
29.
go back to reference Kanemitsu H, Hanada M, Nakazato H (2016) Clustering-based task scheduling in a large number of heterogeneous processors. IEEE Trans Parallel Distrib Syst 27(11):3144–3157CrossRef Kanemitsu H, Hanada M, Nakazato H (2016) Clustering-based task scheduling in a large number of heterogeneous processors. IEEE Trans Parallel Distrib Syst 27(11):3144–3157CrossRef
30.
go back to reference Shahul AZ, Sinnen O (2010) Scheduling task graphs optimally with a*. J Supercomput 51(3):310–332CrossRef Shahul AZ, Sinnen O (2010) Scheduling task graphs optimally with a*. J Supercomput 51(3):310–332CrossRef
31.
go back to reference Deelman E, Singh G, Su M-H, Blythe J, Gil Y, Kesselman C, Mehta G, Vahi K, Berriman GB, Good J, Laity A, Jacob JC, Katz DS (2005) Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci Program 13(3):219–237 Deelman E, Singh G, Su M-H, Blythe J, Gil Y, Kesselman C, Mehta G, Vahi K, Berriman GB, Good J, Laity A, Jacob JC, Katz DS (2005) Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci Program 13(3):219–237
32.
go back to reference Darte A, Robert Y, Vivien F (2002) Scheduling and automatic parallelization. BirkhŁuser, New York. ISBN 0-8176-4149-1MATH Darte A, Robert Y, Vivien F (2002) Scheduling and automatic parallelization. BirkhŁuser, New York. ISBN 0-8176-4149-1MATH
33.
go back to reference Suter F, Desprez F, Casanova H (2004) From heterogeneous task scheduling to heterogeneous mixed parallel scheduling. In: Euro-Par 2004 parallel processing, pp 230–237 Suter F, Desprez F, Casanova H (2004) From heterogeneous task scheduling to heterogeneous mixed parallel scheduling. In: Euro-Par 2004 parallel processing, pp 230–237
34.
go back to reference Orsila H, Kangas T, Salminen E, Hamalainen TD, Hannikainen M (2007) Automated memory-aware application distribution for multi-processor system-on-chips. JSA 53(11):795–815 Orsila H, Kangas T, Salminen E, Hamalainen TD, Hannikainen M (2007) Automated memory-aware application distribution for multi-processor system-on-chips. JSA 53(11):795–815
35.
go back to reference de Langen P, Juurlink B (2009) Leakage-aware multiprocessor scheduling. J Signal Process Syst 57(1):73–88CrossRef de Langen P, Juurlink B (2009) Leakage-aware multiprocessor scheduling. J Signal Process Syst 57(1):73–88CrossRef
36.
go back to reference Bhatti MK, Oz I, Popov K, Muddukrishna A, Brorsson M (2014) Noodle: a heuristic algorithm for task scheduling in MPSoC architectures. In: 2014 17th Euromicro conference on digital system design (DSD). IEEE, pp 667–670 Bhatti MK, Oz I, Popov K, Muddukrishna A, Brorsson M (2014) Noodle: a heuristic algorithm for task scheduling in MPSoC architectures. In: 2014 17th Euromicro conference on digital system design (DSD). IEEE, pp 667–670
Metadata
Title
Locality-aware task scheduling for homogeneous parallel computing systems
Authors
Muhammad Khurram Bhatti
Isil Oz
Sarah Amin
Maria Mushtaq
Umer Farooq
Konstantin Popov
Mats Brorsson
Publication date
01-11-2017
Publisher
Springer Vienna
Published in
Computing / Issue 6/2018
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-017-0581-6

Premium Partner