ABSTRACT
Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16X faster on average and up to 2X more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5X.
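The abstract's core idea is that scheduling should operate on whole coflows rather than individual flows: a coflow finishes only when its slowest flow does, so its completion time is governed by its most-loaded ingress or egress port. Below is a minimal illustrative sketch of that bottleneck calculation and a greedy smallest-bottleneck-first ordering over an idealized non-blocking fabric. The `Coflow` structure, port names, and capacities are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (assumptions, not the paper's code): on an idealized
# non-blocking fabric, a coflow's minimum completion time is set by its
# most-loaded ingress or egress port.
from dataclasses import dataclass, field

@dataclass
class Coflow:
    name: str
    # flows: (ingress_port, egress_port) -> bytes to transfer
    flows: dict = field(default_factory=dict)

def bottleneck_time(coflow, port_gbps):
    """Lower bound on completion time: total bytes at the busiest
    ingress or egress port divided by that port's capacity."""
    ingress, egress = {}, {}
    for (src, dst), size in coflow.flows.items():
        ingress[src] = ingress.get(src, 0) + size
        egress[dst] = egress.get(dst, 0) + size
    to_gbits = lambda b: 8 * b / 1e9  # bytes -> gigabits
    loads = list(ingress.values()) + list(egress.values())
    return max(to_gbits(v) / port_gbps for v in loads)

def schedule_order(coflows, port_gbps=10):
    """Greedy inter-coflow order: smallest bottleneck first,
    a shortest-job-first analogue at coflow granularity."""
    return sorted(coflows, key=lambda c: bottleneck_time(c, port_gbps))
```

For example, a coflow moving 1 GB through a 10 Gbps port has a 0.8 s bottleneck and would be ordered ahead of one moving 4 GB through the same port, shrinking the average coflow completion time relative to fair sharing.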