2019 | OriginalPaper | Book Chapter

4. Data Stream Management

Authors: Wolfram Wingerath, Norbert Ritter, Felix Gessert

Published in: Real-Time & Stream Data Management

Publisher: Springer International Publishing

Abstract

In some domains, data arrives so fast and in such great quantity that storing it in a database collection is simply infeasible. When the incoming data relates to ongoing (real-world) events that require immediate action, persistence may not even be useful: data in electronic trading, network monitoring, or real-time fraud detection, for example, is valuable only for a short time and therefore has to be used immediately. To adapt to these circumstances, data stream management systems (DSMSs) introduce the data stream as an abstraction for an infinite sequence of database records that arrive over time. The raw data streams arriving at the system are usually referred to as base streams, whereas those resulting from data transformations (e.g., queries) are called derived streams. Since a data stream is unbounded and thus impossible to store in its entirety, DSMSs drop the database requirement of eternal data persistence: they retain incoming records for a limited time only and eventually discard them.
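
The distinction between base and derived streams, and the idea of limited retention, can be illustrated with a minimal Python sketch. It is not taken from the chapter or from any particular DSMS: the names `Record`, `derived_stream`, and `RetentionBuffer`, the threshold filter, and the time-based eviction rule are all assumptions chosen for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """A stream record (hypothetical layout, for illustration only)."""
    timestamp: float  # arrival time in seconds
    value: float


def derived_stream(base: Iterable[Record], threshold: float) -> Iterator[Record]:
    """Derive a stream from a base stream by filtering it record by record."""
    for record in base:
        if record.value > threshold:
            yield record


class RetentionBuffer:
    """Retains only records that arrived within the last `retention` seconds."""

    def __init__(self, retention: float) -> None:
        self.retention = retention
        self.records = deque()  # oldest record on the left

    def append(self, record: Record) -> None:
        self.records.append(record)
        # Discard everything that has fallen out of the retention period.
        cutoff = record.timestamp - self.retention
        while self.records and self.records[0].timestamp <= cutoff:
            self.records.popleft()


# A finite list stands in for the (conceptually unbounded) base stream.
base = [Record(0.0, 1.0), Record(1.0, 5.0), Record(2.0, 7.0), Record(12.0, 9.0)]
buffer = RetentionBuffer(retention=10.0)
for rec in derived_stream(base, threshold=2.0):
    buffer.append(rec)
```

Because the derived stream is a generator, records are processed one at a time as they arrive, and the buffer never holds more than the records of the last `retention` seconds.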

Footnotes
1
An attribute is monotonic if all its values are either decreasing or increasing, such as arrival timestamps in a centralized DSMS. Similarly, an attribute is quasi-monotonic if it is correlated with a monotonic attribute. For example, the time at which an event is registered according to a sensor’s local clock (event time) is quasi-monotonic because it typically corresponds to the time at which the event is received according to the server’s clock (arrival time) within a certain error margin (cf. Sect. 4.2).
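
As a rough illustration of these two notions, the following Python sketch checks a finite sample of attribute values. It rests on assumptions that go beyond the footnote: monotonicity is tested non-strictly (equal neighbours are allowed), and "correlated within a certain error margin" is read as a fixed bound on the deviation between event time and arrival time. The function names are hypothetical.

```python
from typing import Sequence


def is_monotonic(values: Sequence[float]) -> bool:
    """True if the values never decrease or never increase (non-strict)."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def within_margin(event_times: Sequence[float],
                  arrival_times: Sequence[float],
                  margin: float) -> bool:
    """True if each event time deviates from its arrival time by at most `margin`."""
    return all(abs(e - a) <= margin for e, a in zip(event_times, arrival_times))


# Arrival times (server clock) are monotonic; event times (sensor clocks)
# are slightly out of order but stay close to the arrival times.
arrival = [1.0, 2.0, 2.2, 4.0]
event = [0.9, 1.9, 1.8, 3.8]

arrival_is_monotonic = is_monotonic(arrival)          # True
event_is_monotonic = is_monotonic(event)              # False
event_tracks_arrival = within_margin(event, arrival, margin=0.5)  # True
```

In this sample the event time is not monotonic itself, yet it follows the monotonic arrival time closely enough to count as quasi-monotonic under the assumed margin.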
 
2
All write operations in PipelineDB are coordinated synchronously via two-phase commit between all nodes [Pipb], so that highly distributed setups are likely to experience increased latency as well as reduced throughput and availability [Pan15, Sec. 3.1].
 
3
As an example, consider the graphical user interface of Aurora/Borealis, which is based on arrows and boxes rather than SQL-style declarative statements [Çet+16].
 
4
Specifically, providing undo information requires buffering the original output [Aki+15, Sec. 2.3]. Likewise, reprocessing huge amounts of data to generate updated records can lead to CPU contention and can thus significantly impair overall system performance [Kre14c].
 
References
[Aba+05]
Daniel J Abadi et al. “The Design of the Borealis Stream Processing Engine”. In: Second Biennial Conference on Innovative Data Systems Research (CIDR 2005). Asilomar, CA, 2005.
[ABB+13]
Tyler Akidau, Alex Balikov, Kaya Bekiroglu, et al. “MillWheel: Fault-Tolerant Stream Processing at Internet Scale”. In: Very Large Data Bases. 2013, pp. 734–746.
[Aki+15]
Tyler Akidau et al. “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing”. In: Proceedings of the VLDB Endowment 8 (2015), pp. 1792–1803.
[Bab+02]
Brian Babcock et al. “Models and Issues in Data Stream Systems”. In: Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. PODS’02. Madison, Wisconsin: ACM, 2002, pp. 1–16. isbn: 1-58113-507-6. url: https://doi.org/10.1145/543613.543615.
[BD15]
Ralf Bruns and Jürgen Dunkel. Complex Event Processing: Komplexe Analyse von massiven Datenströmen mit CEP. Springer Vieweg, 2015.
[BDM07]
Brian Babcock, Mayur Datar, and Rajeev Motwani. “Load Shedding in Data Stream Systems”. In: Data Streams – Models and Algorithms. Vol. 31. Advances in Database Systems. Springer, 2007, pp. 127–147.
[Car+17]
Paris Carbone et al. “Large-Scale Data Stream Processing Systems”. In: Handbook of Big Data Technologies. Springer, 2017, pp. 219–260.
[CVZ13]
P. Carbone, K. Vandikas, and F Zaloshnja. “Towards Highly Available Complex Event Processing Deployments in the Cloud”. In: 2013 Seventh International Conference on Next Generation Mobile Apps, Services and Technologies. 2013, pp. 153–158. url: https://doi.org/10.1109/NGMAST.2013.35.
[Dat+02]
Mayur Datar et al. “Maintaining Stream Statistics over Sliding Windows”. In: SIAM Journal on Computing 31.6 (2002), pp. 1794–1813.
[DLOM02]
Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. “Frequency Estimation of Internet Packet Streams with Limited Space”. In: Proceedings of the 10th Annual European Symposium on Algorithms. ESA’02. London, UK: Springer-Verlag, 2002, pp. 348–360. isbn: 3-540-44180-8. url: http://dl.acm.org/citation.cfm?id=647912.740658.
[ENL11]
Opher Etzion, Peter Niblett, and David C Luckham. Event processing in action. Ed. by Sebastian Stirling. Manning Greenwich, 2011.
[FR11]
M. Ficco and L. Romano. “A Generic Intrusion Detection and Diagnoser System Based on Complex Event Processing”. In: 2011 First International Conference on Data Compression, Communications and Processing. 2011, pp. 275–284. url: https://doi.org/10.1109/CCP.2011.43.
[GAE06]
Thanaa M. Ghanem, Walid G. Aref, and Ahmed K. Elmagarmid. “Exploiting Predicate-window Semantics over Data Streams”. In: SIGMOD Rec. 35.1 (Mar. 2006), pp. 3–8. issn: 0163-5808. url: https://doi.org/10.1145/1121995.1121996.
[Gia12]
Piero Giacomelli. Hornetq messaging developer’s guide. Ed. by Ankita Shashi. Packt Publishing Ltd., 2012.
[Gib01]
Phillip B. Gibbons. “Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports”. In: Proceedings of the 27th International Conference on Very Large Data Bases. VLDB ’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001, pp. 541–550. isbn: 1-55860-804-4. url: http://dl.acm.org/citation.cfm?id=645927.672351.
[Gol06]
Lukasz Golab. “Sliding Window Query Processing over Data Streams”. PhD thesis. University of Waterloo, Aug. 2006.
[GZ10]
Lukasz Golab and M. Tamer Özsu. Data Stream Management. Morgan & Claypool Publishers, 2010. isbn: 1608452727, 9781608452729.
[Joh+05]
[KKM13]
Konstantinos Karanasos, Asterios Katsifodimos, and Ioana Manolescu. “Delta: Scalable Data Dissemination Under Capacity Constraints”. In: Proc. VLDB Endow. 7.4 (Dec. 2013), pp. 217–228. issn: 2150-8097. url: https://doi.org/10.14778/2732240.2732241.
[KNR11]
Jay Kreps, Neha Narkhede, and Jun Rao. “Kafka: a Distributed Messaging System for Log Processing”. In: NetDB’11. 2011.
[Kre14c]
Jay Kreps. “Why local state is a fundamental primitive in stream processing”. In: O’Reilly Media (July 2014). Accessed: 2017-11-30. url: https://www.oreilly.com/ideas/why-local-state-is-a-fundamental-primitive-in-stream-processing.
[Lam+12]
Valerie Lampkin et al. Building smarter planet solutions with MQTT and IBM WebSphere MQ Telemetry. IBM Redbooks, 2012.
[MAEA05b]
Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. “Efficient Computation of Frequent and Top-k Elements in Data Streams”. In: Proceedings of the 10th International Conference on Database Theory. ICDT’05. Edinburgh, UK: Springer-Verlag, 2005, pp. 398–412. isbn: 3-540-24288-0, 978-3-540-24288-8. url: https://doi.org/10.1007/978-3-540-30570-5_27.
[Mot+03]
Rajeev Motwani et al. “Query Processing, Approximation, and Resource Management in a Data Stream Management System”. In: CIDR. 2003.
[VRR10]
Krešimir Vidačković, Thomas Renner, and Sascha Rex. Marktübersicht Real-Time Monitoring Software: Event Processing Tools im Überblick. Tech. rep. Fraunhofer Verlag, Fraunhofer-Informationszentrum Raum und Bau IRB, 2010.
[IBM14]
IBM Corporation. Of Streams and Storms. Tech. rep. IBM Software Group, 2014.
[Çet+16]
Uğur Çetintemel et al. “The Aurora and Borealis Stream Processing Engines”. In: Data Stream Management: Processing High-Speed Data Streams. Ed. by Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016, pp. 337–359. isbn: 978-3-540-28608-0.
Metadata
Title
Data Stream Management
Authors
Wolfram Wingerath
Norbert Ritter
Felix Gessert
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-10555-6_4
