ABSTRACT
Evaluating modern stream processing systems in a reproducible manner requires data streams with different data distributions, data rates, and real-world characteristics such as delayed and out-of-order tuples. In this paper, we present an open-source stream generator that produces reproducible, deterministic out-of-order streams from real data files, simulating arbitrary fractions of out-of-order tuples and their respective delays.
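As an illustration of the idea only, the following minimal Python sketch shows one way a seeded generator could turn an in-order input into a deterministic out-of-order stream. It is not the generator presented in the paper: the function name `out_of_order_stream`, the parameters `fraction`, `max_delay`, and `seed`, and the uniform integer delay distribution are all assumptions made for this example.

```python
import random
from typing import Any, Iterable, List, Tuple


def out_of_order_stream(
    events: Iterable[Tuple[int, Any]],
    fraction: float,
    max_delay: int,
    seed: int = 42,
) -> List[Tuple[int, int, Any]]:
    """Hypothetical sketch: delay a fraction of tuples and reorder by arrival.

    `events` holds (event_time, payload) pairs in event-time order.
    Returns (arrival_time, event_time, payload) triples sorted by arrival time.
    """
    # A private, seeded RNG makes every run with the same inputs identical.
    rng = random.Random(seed)
    delayed = []
    for seq, (event_time, payload) in enumerate(events):
        # Assumption: each tuple is delayed independently with probability
        # `fraction`; the delay is uniform in [1, max_delay].
        delay = rng.randint(1, max_delay) if rng.random() < fraction else 0
        # `seq` breaks ties deterministically and keeps payloads out of the sort.
        delayed.append((event_time + delay, seq, event_time, payload))
    delayed.sort()
    return [(arrival, event_time, payload)
            for arrival, _, event_time, payload in delayed]


if __name__ == "__main__":
    source = [(t, f"tuple-{t}") for t in range(10)]
    for arrival, event_time, payload in out_of_order_stream(source, 0.3, 5, seed=1):
        print(arrival, event_time, payload)
```

In this sketch, a delayed tuple only shows up out of order if a tuple with a later event time arrives before it, so the observed share of out-of-order tuples can be lower than `fraction`. Reading the input from a real data file would replace the in-memory `source` list.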
Index Terms
- Generating Reproducible Out-of-Order Data Streams