The Journal of Supercomputing

Publication date: 17-08-2018

Fine-grained scheduling in multi-resource clusters

Authors: Mosong Zhou, Xiaoshe Dong, Heng Chen, Xingjun Zhang

Published in: The Journal of Supercomputing | Issue 3/2020

Abstract

In multi-resource clusters, many schedulers allocate resources in fixed quantities. Fixed allocations, however, can easily lead to resource fragmentation and over-commitment, which may lower resource utilization and degrade performance. This paper proposes a fine-grained method (FGM) that refines the granularity of resource allocation. The method divides each task into execution stages according to resource requirements estimated at runtime from similar tasks. It then matches task resource requirements against the available server resources stage by stage, refining two aspects of allocation granularity: allocation duration and allocation quantity. In addition, the FGM may deliberately over-allocate resources to further improve resource utilization and performance. The FGM was tested in three environments using both online and offline workloads. The results show that the FGM can resolve resource fragmentation and over-commitment problems, significantly improving resource utilization and performance while keeping fairness and scheduling response times acceptable.
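The contrast between fixed-quantity and stage-wise allocation can be illustrated with a small sketch. This is not the FGM algorithm from the paper, whose details are not given in the abstract. It is a toy model under stated assumptions: a single server, two resources (CPU cores and memory in GB), discrete time slots, all tasks starting at slot 0, and per-stage requirements that are already known. The names `Stage`, `stage_profile`, `peak_profile`, and `count_placed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Demand = Tuple[int, int]  # (CPU cores, memory in GB) needed in one time slot


@dataclass(frozen=True)
class Stage:
    duration: int  # number of time slots the stage lasts
    cpu: int       # CPU cores needed during the stage
    mem: int       # memory (GB) needed during the stage


def stage_profile(stages: List[Stage]) -> List[Demand]:
    """Stage-wise demand: each slot reserves only what its stage needs."""
    return [(s.cpu, s.mem) for s in stages for _ in range(s.duration)]


def peak_profile(stages: List[Stage]) -> List[Demand]:
    """Fixed-quantity demand: reserve the per-resource peak for the whole task."""
    slots = sum(s.duration for s in stages)
    peak = (max(s.cpu for s in stages), max(s.mem for s in stages))
    return [peak] * slots


def count_placed(
    tasks: List[List[Stage]],
    capacity: Demand,
    horizon: int,
    profile_fn: Callable[[List[Stage]], List[Demand]],
) -> int:
    """Greedily place tasks (all starting at slot 0) on one server.

    Returns how many tasks fit. Assumption: no queuing or delayed start.
    """
    free = [capacity] * horizon  # free (cpu, mem) per time slot
    placed = 0
    for stages in tasks:
        profile = profile_fn(stages)
        if len(profile) > horizon:
            continue  # task does not finish within the horizon
        if all(f[0] >= d[0] and f[1] >= d[1] for f, d in zip(free, profile)):
            for t, d in enumerate(profile):
                free[t] = (free[t][0] - d[0], free[t][1] - d[1])
            placed += 1
    return placed


# Two tasks whose CPU-heavy and memory-heavy stages are complementary.
task_a = [Stage(2, 3, 2), Stage(2, 1, 6)]
task_b = [Stage(2, 1, 6), Stage(2, 3, 2)]

fixed = count_placed([task_a, task_b], (4, 8), 4, peak_profile)
staged = count_placed([task_a, task_b], (4, 8), 4, stage_profile)
print(fixed, staged)  # prints "1 2"
```

With peak reservations, each task claims 3 cores and 6 GB for all four slots, so only one fits on a 4-core, 8 GB server. With stage-wise reservations, the two tasks interleave their CPU-heavy and memory-heavy stages and both fit, which is the kind of fragmentation the abstract describes.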

Title: Fine-grained scheduling in multi-resource clusters
Authors: Mosong Zhou, Xiaoshe Dong, Heng Chen, Xingjun Zhang
Publication date: 17-08-2018
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 3/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-018-2505-4
