Skip to main content
Top

2019 | OriginalPaper | Chapter

Adaptive Partitioning and Order-Preserved Merging of Data Streams

Authors : Constantin Pohl, Kai-Uwe Sattler

Published in: Advances in Databases and Information Systems

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Partitioning is a key concept for utilizing modern hardware, especially to exploit parallelism opportunities from many-core CPUs. In data streaming scenarios where parameters like tuple arrival rates can vary, adaptive strategies for partitioning solve the problem of overestimating or underestimating query workloads. While there are many possibilities to partition the data flow, threads running partitions independently from each other lead to unordered output inevitably. This is a considerable difficulty for applications where tuple order matters, like in stream reasoning or complex event processing scenarios.
In this paper, we address this problem by combining an adaptive partitioning approach with an order-preserving merge algorithm. Since reordering output tuples can only worsen latency, we mainly focus on the throughput of queries while keeping the delay on individual tuples minimal. We run micro-benchmarks as well as the Linear Road benchmark, demonstrating correctness and effectiveness of our approach while scaling out on a single Xeon Phi many-core CPU up to 256 partitions.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Arasu, A., et al.: Linear road: a stream data management benchmark. In: (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, pp. 480–491, 31 August–3 September 2004 Arasu, A., et al.: Linear road: a stream data management benchmark. In: (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, pp. 480–491, 31 August–3 September 2004
2.
go back to reference Carbone, P., et al.: Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng. Bull. 38(4), 28–38 (2015) Carbone, P., et al.: Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng. Bull. 38(4), 28–38 (2015)
9.
go back to reference Matteis, T.D., et al.: Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, pp. 13:1–13:12, 12–16 March 2016. https://doi.org/10.1145/2851141.2851148 Matteis, T.D., et al.: Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, pp. 13:1–13:12, 12–16 March 2016. https://​doi.​org/​10.​1145/​2851141.​2851148
10.
12.
go back to reference Pacaci, A., et al.: Distribution-aware stream partitioning for distributed stream processing systems. In: Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR@SIGMOD 2018, Houston, TX, USA, pp. 6:1–6:10, 15 June 2018. https://doi.org/10.1145/3206333.3206338 Pacaci, A., et al.: Distribution-aware stream partitioning for distributed stream processing systems. In: Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR@SIGMOD 2018, Houston, TX, USA, pp. 6:1–6:10, 15 June 2018. https://​doi.​org/​10.​1145/​3206333.​3206338
13.
go back to reference Rivetti, N., et al.: Efficient key grouping for near-optimal load balancing in stream processing systems. In: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015, Oslo, Norway, pp. 80–91, 29 June–3 July 2015. https://doi.org/10.1145/2675743.2771827 Rivetti, N., et al.: Efficient key grouping for near-optimal load balancing in stream processing systems. In: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015, Oslo, Norway, pp. 80–91, 29 June–3 July 2015. https://​doi.​org/​10.​1145/​2675743.​2771827
17.
go back to reference Zeitler, E., et al.: Massive scale-out of expensive continuous queries. PVLDB 4(11), 1181–1188 (2011) Zeitler, E., et al.: Massive scale-out of expensive continuous queries. PVLDB 4(11), 1181–1188 (2011)
Metadata
Title
Adaptive Partitioning and Order-Preserved Merging of Data Streams
Authors
Constantin Pohl
Kai-Uwe Sattler
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-28730-6_17

Premium Partner