DOI: 10.1145/3230543.3230557
research-article

Revisiting network support for RDMA

Published: 07 August 2018

ABSTRACT

The advent of RoCE (RDMA over Converged Ethernet) has led to a significant increase in the use of RDMA in datacenter networks. To achieve good performance, RoCE requires a lossless network, which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, PFC brings with it a host of problems such as head-of-the-line blocking, congestion spreading, and occasional deadlocks. Rather than seek to fix these issues, we instead ask: is PFC fundamentally required to support RDMA over Ethernet?

We show that the need for PFC is an artifact of current RoCE NIC designs rather than a fundamental requirement. We propose an improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses. We show that IRN (without PFC) outperforms RoCE (with PFC) by 6-83% for typical network scenarios. Thus not only does IRN eliminate the need for PFC, it improves performance in the process! We further show that the changes that IRN introduces can be implemented with modest overheads of about 3-10% to NIC resources. Based on our results, we argue that research and industry should rethink the current trajectory of network support for RDMA.


Published in

SIGCOMM '18: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
August 2018, 604 pages
ISBN: 9781450355674
DOI: 10.1145/3230543

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 554 of 3,547 submissions, 16%
