DOI: 10.1145/3514221.3526056
research-article

KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks

Published: 11 June 2022

ABSTRACT

Apache Kafka is an open-source distributed publish-subscribe system that is widely used in data centers for messaging between applications, log aggregation, and stream processing. The existing Kafka implementation uses TCP/IP for communication, which has various inefficiencies, such as a high message dispatch cost due to OS involvement and excessive memory copies. Recently, the availability of cost-effective RDMA-capable network controllers within data centers and cloud infrastructures has encouraged many modern applications to adopt RDMA networking, which offers the potential to outperform classical TCP/IP. We introduce KafkaDirect, an extension to Apache Kafka that uses RDMA to accelerate the three most network-intensive datapaths: record production, record replication, and record consumption. In this work, we explore the design choices, including which RDMA operations to use to take full advantage of offloaded communication. Our RDMA design relies on one-sided RDMA requests to attain true zero-copy communication, completely avoiding the need for intermediate buffers in Kafka servers and thereby ensuring low-latency, high-throughput communication. KafkaDirect can offer up to a 9x increase in throughput for both Kafka producers and Kafka consumers, and can provide 4x and 50x reductions in latency for Kafka producers and Kafka consumers, respectively.

Supplemental Material

sigmod-industry-4-taranov.mp4 (mp4, 231.3 MB)

Published in

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
