Abstract
RDMA-capable interconnects, which provide ultra-low latency and high bandwidth, are increasingly used in distributed storage and data processing systems. However, the deployment of such systems in virtualized data centers is currently inhibited by the lack of a flexible, high-performance virtualization solution for RDMA network interfaces.
In this work, we present a hybrid virtualization architecture that builds on the separation of control and data paths available in RDMA. With hybrid virtualization, RDMA control operations are virtualized with hypervisor involvement, while data operations are set up to bypass the hypervisor completely. We describe HyV (Hybrid Virtualization), a virtualization framework for RDMA devices that implements this hybrid architecture. We provide a detailed evaluation of HyV for different RDMA technologies and operations, and further demonstrate its advantages in a real distributed system by running RAMCloud on a set of HyV-enabled virtual machines deployed across a 6-node RDMA cluster. All of our performance results show that hybrid virtualization enables bare-metal RDMA performance inside virtual machines while retaining the flexibility typically associated with paravirtualization.
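The control/data split described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not HyV's implementation and not the libibverbs API: the names `Hypervisor`, `QueuePair`, `register_memory`, `create_queue_pair`, `post_send` and `run_guest` are hypothetical, and the guest-to-host page translation is a placeholder. It only shows the structural idea: setup operations (memory registration, queue pair creation) pass through a hypervisor object, while per-message data operations write directly to a device-visible queue.

```python
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Toy stand-in for a device queue pair mapped into the guest (hypothetical)."""
    owner_vm: int
    send_queue: list = field(default_factory=list)

    def post_send(self, lkey: int, length: int) -> None:
        # Data path: the guest appends the work request directly to the
        # device-visible queue. No hypervisor object is touched here.
        self.send_queue.append((lkey, length))


class Hypervisor:
    """Hypothetical backend that mediates control operations only."""

    def __init__(self) -> None:
        self.control_calls = 0
        self._next_key = 1
        self.translations: dict[int, list[tuple[int, int]]] = {}

    def register_memory(self, vm_id: int, guest_pages: list[int]) -> int:
        # Control path: "translate" guest pages (toy placeholder mapping)
        # and install the mapping on behalf of the guest.
        self.control_calls += 1
        key = self._next_key
        self._next_key += 1
        self.translations[key] = [(vm_id, page) for page in guest_pages]
        return key

    def create_queue_pair(self, vm_id: int) -> QueuePair:
        # Control path: resource creation is mediated by the hypervisor.
        self.control_calls += 1
        return QueuePair(owner_vm=vm_id)


def run_guest(hv: Hypervisor, vm_id: int, messages: int) -> QueuePair:
    # One-time setup goes through the hypervisor ...
    lkey = hv.register_memory(vm_id, guest_pages=[0, 1, 2, 3])
    qp = hv.create_queue_pair(vm_id)
    # ... while every data operation bypasses it.
    for _ in range(messages):
        qp.post_send(lkey, length=4096)
    return qp


if __name__ == "__main__":
    hv = Hypervisor()
    qp = run_guest(hv, vm_id=7, messages=1000)
    print(hv.control_calls, len(qp.send_queue))  # 2 1000
```

In this model the number of hypervisor-mediated calls depends only on how many resources are set up, not on how many messages are sent, which is the property that lets data-path performance approach bare metal while setup remains under hypervisor control.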
A Hybrid I/O Virtualization Framework for RDMA-capable Network Interfaces
VEE '15: Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments