
27.03.2019

RecFlow: SDN-based receiver-driven flow scheduling in datacenters

Authors: Aadil Zia Khan, Ihsan Ayyub Qazi

Published in: Cluster Computing


Abstract

Datacenter applications (e.g., web search, recommendation systems, and social networking) are designed with high fanout to achieve scalable performance. A corollary of this design is frequent fabric congestion (e.g., due to incast or imperfect hashing), even when network utilization is low. Such congestion varies both temporally and spatially (intra-rack and inter-rack). Two basic design paradigms address this issue, and current solutions lie somewhere between them. At one end are arbiter-based approaches, in which senders poll a centralized arbiter and collectively obey global scheduling decisions. At the other end are self-adjusting, end-point-based approaches, in which senders independently adjust their transmission rates in response to network congestion. The former incurs greater overhead than the latter, which in turn trades optimality for simplicity. Our work seeks a middle ground: the optimality of arbiter-based approaches with the simplicity of self-adjusting, end-point-based approaches. Our key design principle is that, because the receiver has complete information about the flows destined for it, the receiver itself can orchestrate those flows, rather than relying on a centralized arbiter or on senders making independent scheduling decisions. Since multiple receivers may share a bottleneck link, datapath visibility should be used to ensure that the bottleneck capacity is shared fairly among receivers with minimal overhead. We propose RecFlow, a receiver-based proactive congestion control scheme. RecFlow uses the path visibility provided by OpenFlow to track changing bottlenecks on the fly. It spaces TCP acknowledgements to prevent traffic bursts and to ensure that no receiver exceeds its fair share of the bottleneck capacity. The goal is to reduce buffer overflows while maintaining fairness among flows and high link utilization.
Using extensive simulations and a real testbed evaluation, we show that, compared to the state of the art, RecFlow achieves up to a 6× improvement in the inter-rack scenario and up to 1.5× in the intra-rack scenario, while sharing link capacity fairly among all flows.
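The core mechanism described in the abstract, capping each receiver at its fair share of a path's bottleneck and enforcing that cap by spacing ACKs, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Link` record, the equal per-receiver split of each link, and the assumption that each ACK releases a fixed number of MSS-sized segments are all hypothetical simplifications introduced here, and the link view that RecFlow would obtain through OpenFlow is simply passed in as data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    """Hypothetical snapshot of one hop; in RecFlow this view would come from OpenFlow."""
    name: str
    capacity_bps: float    # link capacity in bits per second
    active_receivers: int  # receivers currently sharing this link (assumed known)


def bottleneck_share_bps(path: Sequence[Link]) -> float:
    """Per-receiver share on a path: the smallest capacity/receivers ratio among its hops.

    Uses a simple equal split per link (not max-min fairness); a receiver
    count of zero is treated as one.
    """
    if not path:
        raise ValueError("path must contain at least one link")
    return min(link.capacity_bps / max(link.active_receivers, 1) for link in path)


def ack_interval_s(share_bps: float, mss_bytes: int = 1460, segments_per_ack: int = 1) -> float:
    """Gap between ACKs so that the data they clock out does not exceed share_bps.

    Assumes each ACK lets the sender release `segments_per_ack` MSS-sized segments.
    """
    if share_bps <= 0:
        raise ValueError("share_bps must be positive")
    return segments_per_ack * mss_bytes * 8 / share_bps


if __name__ == "__main__":
    path = [
        Link("host-tor", 10e9, 1),
        Link("tor-agg", 40e9, 8),  # 40 Gbps shared by 8 receivers -> 5 Gbps each
        Link("agg-tor", 40e9, 2),
    ]
    share = bottleneck_share_bps(path)
    print(f"fair share: {share / 1e9:.1f} Gbps")                 # fair share: 5.0 Gbps
    print(f"ACK spacing: {ack_interval_s(share) * 1e6:.3f} us")  # ACK spacing: 2.336 us
```

In this example the 40 Gbps hop shared by eight receivers is the tightest, so the share is 5 Gbps and, at one 1460-byte segment per ACK, ACKs are spaced about 2.3 µs apart. A max-min fair allocation, which hands share left unused by lighter receivers to heavier ones, would be a natural refinement of the equal split assumed here.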


Footnotes
1
These controllers run in slave mode to ensure that they are read-only.
 
2
Note that 10 Gbps and 40 Gbps links are common in DCs [2, 5].
 
References
5.
Alizadeh, M., Greenberg, A., Maltz, D.A., Padhye, J., Patel, P., Prabhakar, B., Sengupta, S., Sridharan, M.: Data center TCP (DCTCP). In: Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM ’10, pp. 63–74. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1851182.1851192
6.
Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using Mantri. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pp. 265–278. USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1924943.1924962
7.
Bai, W., Chen, K., Wu, H., Lan, W., Zhao, Y.: PAC: Taming TCP incast congestion using proactive ACK control. In: Proceedings of the 2014 IEEE 22nd International Conference on Network Protocols, ICNP ’14, pp. 385–396. IEEE Computer Society, Washington, DC, USA (2014). https://doi.org/10.1109/ICNP.2014.62
9.
Cheng, P., Ren, F., Shu, R., Lin, C.: Catch the whole lot in an action: rapid precise packet loss notification in data centers. In: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI’14, pp. 17–28. USENIX Association, Berkeley, CA, USA (2014). http://dl.acm.org/citation.cfm?id=2616448.2616451
10.
Dalton, M., Schultz, D., Adriaens, J., Arefin, A., Gupta, A., Fahs, B., Rubinstein, D., Zermeno, E.C., Rubow, E., Docauer, J.A., Alpert, J., Ai, J., Olson, J., DeCabooter, K., de Kruijf, M., Hua, N., Lewis, N., Kasinadhuni, N., Crepaldi, R., Krishnan, S., Venkata, S., Richter, Y., Naik, U., Vahdat, A.: Andromeda: performance, isolation, and velocity at scale in cloud network virtualization. In: 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), pp. 373–387. USENIX Association, Renton, WA (2018). https://www.usenix.org/conference/nsdi18/presentation/dalton
11.
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
14.
Ghobadi, M., Yeganeh, S.H., Ganjali, Y.: Rethinking end-to-end congestion control in software-defined networks. In: Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pp. 61–66. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2390231.2390242
16.
He, K., Rozner, E., Agarwal, K., Felter, W., Carter, J., Akella, A.: Presto: Edge-based load balancing for fast datacenter networks. In: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pp. 465–478. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2785956.2787507
22.
Krevat, E., Vasudevan, V., Phanishayee, A., Andersen, D.G., Ganger, G.R., Gibson, G.A., Seshan, S.: On application-level approaches to avoiding TCP throughput collapse in cluster-based storage systems. In: Proceedings of the 2nd International Workshop on Petascale Data Storage: Held in Conjunction with Supercomputing ’07, PDSW ’07, pp. 1–4. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1374596.1374598
24.
Lu, Y., Zhu, S.: SDN-based TCP congestion control in data center networks. In: Proceedings of the 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), IPCCC ’15, pp. 1–7. IEEE Computer Society, Washington, DC, USA (2015). https://doi.org/10.1109/PCCC.2015.7410275
27.
Nishtala, R., Fugal, H., Grimm, S., Kwiatkowski, M., Lee, H., Li, H.C., McElroy, R., Paleczny, M., Peek, D., Saab, P., Stafford, D., Tung, T., Venkataramani, V.: Scaling Memcache at Facebook. In: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, nsdi’13, pp. 385–398. USENIX Association, Berkeley, CA, USA (2013). http://dl.acm.org/citation.cfm?id=2482626.2482663
32.
Perry, J., Ousterhout, A., Balakrishnan, H., Shah, D., Fugal, H.: Fastpass: A centralized “zero-queue” datacenter network. In: Proceedings of the 2014 ACM Conference on SIGCOMM, SIGCOMM ’14, pp. 307–318. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2619239.2626309
33.
Pfaff, B., Pettit, J., Koponen, T., Jackson, E.J., Zhou, A., Rajahalme, J., Gross, J., Wang, A., Stringer, J., Shelar, P., Amidon, K., Casado, M.: The design and implementation of Open vSwitch. In: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15, pp. 117–130. USENIX Association, Berkeley, CA, USA (2015). http://dl.acm.org/citation.cfm?id=2789770.2789779
34.
Phanishayee, A., Krevat, E., Vasudevan, V., Andersen, D.G., Ganger, G.R., Gibson, G.A., Seshan, S.: Measurement and analysis of TCP throughput collapse in cluster-based storage systems. In: Proceedings of the 6th USENIX Conference on File and Storage Technologies, FAST’08, pp. 12:1–12:14. USENIX Association, Berkeley, CA, USA (2008). http://dl.acm.org/citation.cfm?id=1364813.1364825
35.
Pirzada, H.A., Mahboob, M.R., Qazi, I.A.: eSDN: Rethinking datacenter transports using end-host SDN controllers. In: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pp. 605–606. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2785956.2790022
37.
Rotsos, C., Sarrar, N., Uhlig, S., Sherwood, R., Moore, A.W.: OFLOPS: An open framework for OpenFlow switch evaluation. In: Proceedings of the 13th International Conference on Passive and Active Measurement, PAM’12, pp. 85–95. Springer, Berlin (2012)
39.
Singh, A., Ong, J., Agarwal, A., Anderson, G., Armistead, A., Bannon, R., Boving, S., Desai, G., Felderman, B., Germano, P., Kanagala, A., Provost, J., Simmons, J., Tanda, E., Wanderer, J., Hölzle, U., Stuart, S., Vahdat, A.: Jupiter rising: a decade of Clos topologies and centralized control in Google’s datacenter network, pp. 183–197. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2829988.2787508
42.
The Open Networking Foundation: OpenFlow Switch Specification (2012)
43.
The Open Networking Foundation: OpenFlow and SDN State of the Union (2016)
44.
Tootoonchian, A., Gorbunov, S., Ganjali, Y., Casado, M., Sherwood, R.: On controller performance in software-defined networks. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, Hot-ICE’12, pp. 10–10. USENIX Association, Berkeley, CA, USA (2012). http://dl.acm.org/citation.cfm?id=2228283.2228297
45.
Vasudevan, V., Phanishayee, A., Shah, H., Krevat, E., Andersen, D.G., Ganger, G.R., Gibson, G.A., Mueller, B.: Safe and effective fine-grained TCP retransmissions for datacenter communication. In: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM ’09, pp. 303–314. ACM, New York, NY, USA (2009). https://doi.org/10.1145/1592568.1592604
Metadata
Title
RecFlow: SDN-based receiver-driven flow scheduling in datacenters
Authors
Aadil Zia Khan
Ihsan Ayyub Qazi
Publication date
27.03.2019
Publisher
Springer US
Published in
Cluster Computing
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-019-02922-4