ABSTRACT
As organizations start to use data-intensive cluster computing systems like Hadoop and Dryad for more applications, there is a growing need to share clusters between users. However, there is a conflict between fairness in scheduling and data locality (placing tasks on nodes that contain their input data). We illustrate this problem through our experience designing a fair scheduler for a 600-node Hadoop cluster at Facebook. To address the conflict between locality and fairness, we propose a simple algorithm called delay scheduling: when the job that should be scheduled next according to fairness cannot launch a local task, it waits for a small amount of time, letting other jobs launch tasks instead. We find that delay scheduling achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness. In addition, the simplicity of delay scheduling makes it applicable under a wide variety of scheduling policies beyond fair sharing.
- Amazon EC2. http://aws.amazon.com/ec2/.Google Scholar
- Apache Hadoop. http://hadoop.apache.org.Google Scholar
- Apache Hive. http://hadoop.apache.org/hive/.Google Scholar
- Hadoop Map/Reduce tutorial. http://hadoop.apache.org/common/docs/current/mapred_tutorial.html.Google Scholar
- Hive performance benchmarks. http://issues.apache.org/jira/browse/HIVE-396.Google Scholar
- HP Neoview Workload Management Services Guide. \smallurlhttp://www.docs.hp.com/en/544806-001/Neoview_WMS_Guide_R2.3.pdf.Google Scholar
- Max-Min Fairness (Wikipedia). http://en.wikipedia.org/wiki/Max-min_fairness.Google Scholar
- NSF Cluster Exploratory (CluE) Program Solicitation. http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm.Google Scholar
- Official Google Blog: Sorting 1PB with MapReduce. \smallurlhttp://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.%html.Google Scholar
- Open Cirrus. http://opencirrus.org/.Google Scholar
- Personal communication with Hans Zeller of HP.Google Scholar
- Personal communication with Owen O'Malley of the Yahoo! Hadoop team.Google Scholar
- TORQUE Resource Manager. http://www.clusterresources.com/pages/products/torque-resource-manager.php.Google Scholar
- Yahoo! Launches New Program to Advance Open-Source Software for Internet Computing. http://research.yahoo.com/node/1879.Google Scholar
- J. Bennett and H. Zhang. WF2Q): Worst-case fair weighted fair queueing. In IEEE INFOCOM'96, pages 120--128, 1996. Google ScholarDigital Library
- J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny. Explicit control in a batch--aware distributed file system. In NSDI'04, 2004. Google ScholarDigital Library
- A. Chervenak, E. Deelman, M. Livny, M.-H. Su, R. Schuler, S. Bharathi, G. Mehta, and K. Vahi. Data Placement for Scientific Applications in Distributed Environments. In Proc. 8th IEEE/ACM International Conference on Grid Computing (Grid 2007), September 2007. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarDigital Library
- A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queueing algorithm. In Journal of Internetworking Research and Experience, pages 3--26, Oct. 1990.Google Scholar
- S. Floyd and V. Jacobson. Link-sharing and resource management models for packet networks. IEEE/ACM Transactions on Networking, 3(4):365--386, 1995. Google ScholarDigital Library
- S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proc. SOSP 2003, pages 29--43, 2003. Google ScholarDigital Library
- B. Hindman, A. Konwinski, M. Zaharia, and I. Stoica. A common substrate for cluster computing. In Workshop on Hot Topics in Cloud Computing (HotCloud) 2009, 2009. Google ScholarDigital Library
- M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys 2007, pages 59--72, 2007. Google ScholarDigital Library
- M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In SOSP 2009, 2009. Google ScholarDigital Library
- J. Nieh and M. S. Lam. A SMART scheduler for multimedia applications. ACM TOCS, 21(2):117--163, 2003. Google ScholarDigital Library
- A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD'09, 2009. Google ScholarDigital Library
- I. Stoica, H. Zhang, and T. Ng. A hierarchical fair service curve algorithm for link-sharing, real-time and priority service. In SIGCOMM'97, pages 162--173, Sept. 1997. Google ScholarDigital Library
- D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor experience. Concurrency and Computation Practice and Experience, 17(2-4):323--356, 2005. Google ScholarDigital Library
- C. A. Waldspurger. Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management. PhD thesis, MIT, Laboratory of Computer Science, 1995. MIT/LCS/TR-667. Google ScholarDigital Library
- C. A. Waldspurger and W. E. Weihl. Lottery scheduling: Flexible proportional-share resource management. In OSDI 94, 1994. Google ScholarDigital Library
Index Terms
- Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Recommendations
Preemptive Hadoop Jobs Scheduling under a Deadline
SKG '12: Proceedings of the 2012 Eighth International Conference on Semantics, Knowledge and GridsMapReduce has become the dominant programming model in a cloud-based data processing environment, such as Hadoop. First In First Out (FIFO) is the default job scheduling policy of Hadoop, but it cannot guarantee that the job will be completed by a ...
Scheduling of deteriorating jobs with release dates to minimize the maximum lateness
In this paper, we consider the problem of scheduling n deteriorating jobs with release dates on a single (batching) machine. Each job's processing time is a simple linear function of its starting time. The objective is to minimize the maximum lateness. ...
Delay tails in MapReduce scheduling
SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer SystemsMapReduce/Hadoop production clusters exhibit heavy-tailed characteristics for job processing times. These phenomena are resultant of the workload features and the adopted scheduling algorithms. Analytically understanding the delays under different ...
Comments