Published in: International Journal of Parallel Programming, Issue 5/2021

18 March 2021

RDMA-Based Apache Storm for High-Performance Stream Data Processing

Authors: Ziyu Zhang, Zitan Liu, Qingcai Jiang, Junshi Chen, Hong An

Abstract

Apache Storm is a scalable, fault-tolerant, distributed real-time stream-processing framework widely used in big data applications. For distributed data-sensitive applications, low-latency, high-throughput communication modules have a critical impact on overall system performance. Apache Storm currently uses Netty, an asynchronous server/client framework built on the TCP/IP protocol stack, as its communication component. The TCP/IP protocol stack has inherent performance flaws due to frequent memory copying and context switching. The Netty component not only limits the performance of Storm but also increases the CPU load in the IPoIB (IP over InfiniBand) communication mode. In this paper, we introduce two new implementations of the Apache Storm communication component based on RDMA technology. A performance evaluation on Mellanox QDR cards (40 Gbps) shows that our implementations achieve speedups of up to 5\(\times\) over IPoIB and 10\(\times\) over Gigabit Ethernet. Our implementations also significantly reduce the CPU load and increase the throughput of the system.

Metadata
Title
RDMA-Based Apache Storm for High-Performance Stream Data Processing
Authors
Ziyu Zhang
Zitan Liu
Qingcai Jiang
Junshi Chen
Hong An
Publication date
18 March 2021
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 5/2021
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-021-00696-0
