research-article

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling

Authors:
Matei Zaharia

University of California, Berkeley, Berkeley, CA, USA

University of California, Berkeley, Berkeley, CA, USA
View Profile

,
Dhruba Borthakur

Facebook Inc, Palo Alto, CA, USA

Facebook Inc, Palo Alto, CA, USA
View Profile

,
Joydeep Sen Sarma

Facebook Inc, Palo Alto, CA, USA

Facebook Inc, Palo Alto, CA, USA
View Profile

,
Khaled Elmeleegy

Yahoo! Research, Sunnyvale, CA, USA

Yahoo! Research, Sunnyvale, CA, USA
View Profile

,
Scott Shenker

University of California, Berkeley, Berkeley, CA, USA

University of California, Berkeley, Berkeley, CA, USA
View Profile

,
Ion Stoica

University of California, Berkeley, Berkeley, CA, USA

University of California, Berkeley, Berkeley, CA, USA
View Profile

EuroSys '10: Proceedings of the 5th European conference on Computer systemsApril 2010Pages 265–278https://doi.org/10.1145/1755913.1755940

Published:13 April 2010Publication History

EuroSys '10: Proceedings of the 5th European conference on Computer systems

Pages 265–278

ABSTRACT

As organizations start to use data-intensive cluster computing systems like Hadoop and Dryad for more applications, there is a growing need to share clusters between users. However, there is a conflict between fairness in scheduling and data locality (placing tasks on nodes that contain their input data). We illustrate this problem through our experience designing a fair scheduler for a 600-node Hadoop cluster at Facebook. To address the conflict between locality and fairness, we propose a simple algorithm called delay scheduling: when the job that should be scheduled next according to fairness cannot launch a local task, it waits for a small amount of time, letting other jobs launch tasks instead. We find that delay scheduling achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness. In addition, the simplicity of delay scheduling makes it applicable under a wide variety of scheduling policies beyond fair sharing.

References

Amazon EC2. http://aws.amazon.com/ec2/.Google Scholar
Apache Hadoop. http://hadoop.apache.org.Google Scholar
Apache Hive. http://hadoop.apache.org/hive/.Google Scholar
Hadoop Map/Reduce tutorial. http://hadoop.apache.org/common/docs/current/mapred_tutorial.html.Google Scholar
Hive performance benchmarks. http://issues.apache.org/jira/browse/HIVE-396.Google Scholar
HP Neoview Workload Management Services Guide. \smallurlhttp://www.docs.hp.com/en/544806-001/Neoview_WMS_Guide_R2.3.pdf.Google Scholar
Max-Min Fairness (Wikipedia). http://en.wikipedia.org/wiki/Max-min_fairness.Google Scholar
NSF Cluster Exploratory (CluE) Program Solicitation. http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm.Google Scholar
Official Google Blog: Sorting 1PB with MapReduce. \smallurlhttp://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.%html.Google Scholar
Open Cirrus. http://opencirrus.org/.Google Scholar
Personal communication with Hans Zeller of HP.Google Scholar
Personal communication with Owen O'Malley of the Yahoo! Hadoop team.Google Scholar
TORQUE Resource Manager. http://www.clusterresources.com/pages/products/torque-resource-manager.php.Google Scholar
Yahoo! Launches New Program to Advance Open-Source Software for Internet Computing. http://research.yahoo.com/node/1879.Google Scholar
J. Bennett and H. Zhang. WF²Q): Worst-case fair weighted fair queueing. In IEEE INFOCOM'96, pages 120--128, 1996. Google ScholarDigital Library
J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny. Explicit control in a batch--aware distributed file system. In NSDI'04, 2004. Google ScholarDigital Library
A. Chervenak, E. Deelman, M. Livny, M.-H. Su, R. Schuler, S. Bharathi, G. Mehta, and K. Vahi. Data Placement for Scientific Applications in Distributed Environments. In Proc. 8th IEEE/ACM International Conference on Grid Computing (Grid 2007), September 2007. Google ScholarDigital Library
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarDigital Library
A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queueing algorithm. In Journal of Internetworking Research and Experience, pages 3--26, Oct. 1990.Google Scholar
S. Floyd and V. Jacobson. Link-sharing and resource management models for packet networks. IEEE/ACM Transactions on Networking, 3(4):365--386, 1995. Google ScholarDigital Library
S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proc. SOSP 2003, pages 29--43, 2003. Google ScholarDigital Library
B. Hindman, A. Konwinski, M. Zaharia, and I. Stoica. A common substrate for cluster computing. In Workshop on Hot Topics in Cloud Computing (HotCloud) 2009, 2009. Google ScholarDigital Library
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys 2007, pages 59--72, 2007. Google ScholarDigital Library
M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In SOSP 2009, 2009. Google ScholarDigital Library
J. Nieh and M. S. Lam. A SMART scheduler for multimedia applications. ACM TOCS, 21(2):117--163, 2003. Google ScholarDigital Library
A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD'09, 2009. Google ScholarDigital Library
I. Stoica, H. Zhang, and T. Ng. A hierarchical fair service curve algorithm for link-sharing, real-time and priority service. In SIGCOMM'97, pages 162--173, Sept. 1997. Google ScholarDigital Library
D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor experience. Concurrency and Computation Practice and Experience, 17(2-4):323--356, 2005. Google ScholarDigital Library
C. A. Waldspurger. Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management. PhD thesis, MIT, Laboratory of Computer Science, 1995. MIT/LCS/TR-667. Google ScholarDigital Library
C. A. Waldspurger and W. E. Weihl. Lottery scheduling: Flexible proportional-share resource management. In OSDI 94, 1994. Google ScholarDigital Library

Index Terms

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Recommendations

Preemptive Hadoop Jobs Scheduling under a Deadline
SKG '12: Proceedings of the 2012 Eighth International Conference on Semantics, Knowledge and Grids

MapReduce has become the dominant programming model in a cloud-based data processing environment, such as Hadoop. First In First Out (FIFO) is the default job scheduling policy of Hadoop, but it cannot guarantee that the job will be completed by a ...
Read More
Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

In this paper, we consider the problem of scheduling n deteriorating jobs with release dates on a single (batching) machine. Each job's processing time is a simple linear function of its starting time. The objective is to minimize the maximum lateness. ...
Read More
Delay tails in MapReduce scheduling
SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems

MapReduce/Hadoop production clusters exhibit heavy-tailed characteristics for job processing times. These phenomena are resultant of the workload features and the adopted scheduling algorithms. Analytically understanding the delays under different ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EuroSys '10: Proceedings of the 5th European conference on Computer systems
April 2010
388 pages
ISBN:9781605585772
DOI:10.1145/1755913
General Chair:
Christine Morin
INRIA Rennes, France
,
Program Chair:
Gilles Muller
INRIA/LIP6, France
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 April 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cluster computing
fair sharing
mapreduce
scheduling
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate241of1,308submissions,18%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1,014
  Total Citations
  View Citations
- 4,881
  Total Downloads
- Downloads (Last 12 months)173
- Downloads (Last 6 weeks)21
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling

EuroSys '10: Proceedings of the 5th European conference on Computer systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Preemptive Hadoop Jobs Scheduling under a Deadline

Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

Delay tails in MapReduce scheduling