ABSTRACT
Datacenter networks provide high path diversity for traffic between machines. Load balancing traffic across these paths is important for both latency- and throughput-sensitive applications. The standard load balancing techniques used today obliviously hash each flow to a random path. When long flows collide on the same path, the result can be long-lasting congestion on that path while other paths remain underutilized, which degrades the performance of other flows as well. Recent proposals to address this shortcoming incur significant implementation complexity at the host that would actually slow down short flows (MPTCP), depend on relatively slow centralized controllers for rerouting large congesting flows (Hedera), or require custom switch hardware, hindering near-term deployment (DeTail).
We propose FlowBender, a novel technique that: (1) load balances in a distributed manner at the granularity of flows instead of packets, avoiding excessive packet reordering; (2) uses end-host-driven rehashing to trigger dynamic flow-to-path assignment; (3) recovers from link failures within a Retransmit Timeout (RTO); (4) amounts to fewer than 50 lines of critical kernel code and is readily deployable in commodity data centers today; and (5) is very robust and simple to tune. We evaluate FlowBender using both simulations and a real testbed implementation, and show that it improves average and tail latencies significantly compared to state-of-the-art techniques, without incurring the significant overhead and complexity of other load balancing schemes.
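The idea of end-host-driven rehashing can be illustrated with a minimal sketch. This is not the authors' kernel code: the abstract does not specify the congestion signal, the trigger threshold, or which packet field the switches hash on, so the names `ecmp_path`, `FlowSender`, `salt`, and `threshold`, and the use of a marked-ACK fraction as the signal, are all assumptions made for illustration. The sketch only shows the general pattern: switches hash a flow identifier plus a sender-controlled value, and the sender changes that value when it observes congestion or a retransmit timeout.

```python
import zlib
from dataclasses import dataclass


def ecmp_path(five_tuple: tuple, salt: int, n_paths: int) -> int:
    """Static hash-based path choice over the flow identifier plus a salt.

    The salt stands in for some header field the sender can change and that
    switches include in their hash (an assumption of this sketch).
    """
    key = repr((five_tuple, salt)).encode()
    return zlib.crc32(key) % n_paths


@dataclass
class FlowSender:
    five_tuple: tuple
    n_paths: int
    threshold: float = 0.05  # hypothetical trigger: fraction of congestion-marked ACKs
    salt: int = 0

    def path(self) -> int:
        """Path currently selected for this flow."""
        return ecmp_path(self.five_tuple, self.salt, self.n_paths)

    def on_rtt_end(self, marked_acks: int, total_acks: int) -> None:
        """Rehash if the last RTT looked congested (assumed signal)."""
        if total_acks and marked_acks / total_acks > self.threshold:
            self.salt += 1

    def on_rto(self) -> None:
        """Rehash after a retransmit timeout, e.g. when the path has failed."""
        self.salt += 1


sender = FlowSender(("10.0.0.1", "10.0.0.2", 40000, 80, "tcp"), n_paths=4)
print("initial path:", sender.path())
sender.on_rtt_end(marked_acks=50, total_acks=100)  # congested RTT -> new salt
print("path after rehash:", sender.path())
```

Because every packet of a flow carries the same salt until the sender changes it, the flow stays on one path between rehashes, which is what keeps reordering low compared with per-packet spraying. A new salt is not guaranteed to map to a different path; it only gives the flow another random draw.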
Index Terms
- FlowBender: Flow-level Adaptive Routing for Improved Latency and Throughput in Datacenter Networks