
Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can

Published: 17 August 2015

Abstract

To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers consider the job input data fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs allows for significantly improving their network locality and freeing up bandwidth, which can be used by other jobs running on the cluster. With this intuition, we develop Corral, a scheduling framework that uses characteristics of future workloads to determine an offline schedule which (i) jointly places data and compute to achieve better data locality, and (ii) isolates jobs both spatially (by scheduling them in different parts of the cluster) and temporally, improving their performance. We implement Corral on Apache Yarn, and evaluate it on a 210-machine cluster using production workloads. Compared to Yarn's capacity scheduler, Corral reduces the makespan of these workloads by up to 33% and the median completion time by up to 56%, with a 20-90% reduction in data transferred across racks.
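To make the planning idea above concrete, the Python fragment below is a minimal illustrative sketch, not Corral's actual algorithm: it assumes hypothetical Job records carrying a predicted_work estimate (e.g., derived from past runs of a recurring job) and splits the cluster's racks among jobs in proportion to that estimate, so that each job's input data and tasks can be confined to its own set of racks.

```python
# Illustrative sketch only (not Corral's implementation): divide the
# cluster's racks among recurring jobs in proportion to their predicted
# work, so each job's input data can be placed, and its tasks scheduled,
# on a dedicated set of racks. Job and predicted_work are hypothetical.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    predicted_work: float  # e.g., estimated task-hours from prior runs

def plan_rack_assignment(jobs, num_racks):
    """Return a map from job name to the racks reserved for it."""
    total = sum(j.predicted_work for j in jobs)
    assignment, next_rack = {}, 0
    # Place the largest jobs first so their rack shares stay contiguous.
    for job in sorted(jobs, key=lambda j: j.predicted_work, reverse=True):
        share = max(1, round(num_racks * job.predicted_work / total))
        assignment[job.name] = [(next_rack + i) % num_racks
                                for i in range(share)]
        next_rack = (next_rack + share) % num_racks
    return assignment

if __name__ == "__main__":
    jobs = [Job("hourly-etl", 40.0), Job("daily-report", 10.0)]
    # -> {'hourly-etl': [0, 1, 2, 3, 4], 'daily-report': [5]}
    print(plan_rack_assignment(jobs, num_racks=6))
```

This sketch covers only the spatial split; the offline schedule described in the abstract also isolates jobs temporally, i.e., it decides when each recurring job runs, not just where.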


Supplemental Material

p407-jalaparti.webm (webm video, 129.6 MB)

