Skip to main content
Top
Published in: Knowledge and Information Systems 8/2020

21-02-2020 | Regular Paper

L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes

Authors: Savong Bou, Hiroyuki Kitagawa, Toshiyuki Amagasa

Published in: Knowledge and Information Systems | Issue 8/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The number of real-time information sources, or so-called streams, has rapidly increased, leading to a greater demand for complex analyses over streams. Although many stream analysis methods exist, aggregation is fundamental to ascertain higher levels of knowledge from raw data. In particular, sliding-window aggregation, where aggregations over sliding windows are repeatedly computed, is useful in many real-life applications. Two stacks is the state-of-the-art method to compute sliding-window aggregations incrementally with a O(1) time complexity. However, its performance seriously degrades as the window size increases due to the high overhead to maintain its index. To address this problem, this paper proposes a linear bidirectional index (L-BiX) that exploits two kinds of partial aggregations. Specifically, forward (old-to-new) and backward (new-to-old) aggregations allow efficient computations in an incremental manner. The proposed algorithm requires the same time complexity as two stacks (O(1)). Our experimental evaluation shows that the throughput of L-BiX can be faster by up to 1.71 times than that of two stacks with a 50% smaller memory footprint.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Abadi DJ et al (2003) Aurora: a new model and architecture for data stream management. VLDB J 2(12):120–139CrossRef Abadi DJ et al (2003) Aurora: a new model and architecture for data stream management. VLDB J 2(12):120–139CrossRef
2.
go back to reference Abadi DJ et al (2005) The design of the borealis stream processing engine. In: Proceedings of the CIDR, pp 277–289 Abadi DJ et al (2005) The design of the borealis stream processing engine. In: Proceedings of the CIDR, pp 277–289
3.
go back to reference Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. Proc VLDB Endow 11(6):1033–1044CrossRef Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. Proc VLDB Endow 11(6):1033–1044CrossRef
4.
go back to reference Motwani R et al (2003) Query processing, approximation, and resource management in a data stream management system. In: Proceedings of the CIDR Motwani R et al (2003) Query processing, approximation, and resource management in a data stream management system. In: Proceedings of the CIDR
5.
go back to reference Toshniwal A et al (2014) Storm@twitter. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 147–156 Toshniwal A et al (2014) Storm@twitter. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 147–156
6.
go back to reference Murray DG et al (2013) Naiad: a timely dataflow system. In: Proceedings of the ACM symposium on operating systems principles, pp 439–455 Murray DG et al (2013) Naiad: a timely dataflow system. In: Proceedings of the ACM symposium on operating systems principles, pp 439–455
7.
go back to reference Ruxton GD (2006) The unequal variance t test is an underused alternative to student’s t test and the Mann Whitney U test. Behav Ecol 17(4):688–690 CrossRef Ruxton GD (2006) The unequal variance t test is an underused alternative to student’s t test and the Mann Whitney U test. Behav Ecol 17(4):688–690 CrossRef
8.
go back to reference Shein AU, Chrysanthis PK, Labrinidis A (2015) Processing of aggregate continuous queries in a distributed environment. Real-time business intelligence and analytics - international workshops, BIRTE 2015, Kohala Coast, HI, USA, August 31, 2015, BIRTE 2016, New Delhi, India, September 5, 2016, BIRTE 2017, Munich, Germany, August 28, 2017, Revised Selected Papers, pp 45–62. https://doi.org/10.1007/978-3-030-24124-7_4 Shein AU, Chrysanthis PK, Labrinidis A (2015) Processing of aggregate continuous queries in a distributed environment. Real-time business intelligence and analytics - international workshops, BIRTE 2015, Kohala Coast, HI, USA, August 31, 2015, BIRTE 2016, New Delhi, India, September 5, 2016, BIRTE 2017, Munich, Germany, August 28, 2017, Revised Selected Papers, pp 45–62. https://​doi.​org/​10.​1007/​978-3-030-24124-7_​4
9.
go back to reference Gulisano V et al (2012) Streamcloud: an elastic and scalable data streaming system. IEEE Trans Parallel Distrib Syst 12(23):2351–2351CrossRef Gulisano V et al (2012) Streamcloud: an elastic and scalable data streaming system. IEEE Trans Parallel Distrib Syst 12(23):2351–2351CrossRef
10.
go back to reference Arasu A et al (2006) The CQL continuous query language: semantic foundations and query execution. VLDB J 2(15):121–142CrossRef Arasu A et al (2006) The CQL continuous query language: semantic foundations and query execution. VLDB J 2(15):121–142CrossRef
11.
go back to reference Li J et al (2005) Semantics and evaluation techniques for window aggregates in data streams. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 311–322 Li J et al (2005) Semantics and evaluation techniques for window aggregates in data streams. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 311–322
12.
go back to reference Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proc VLDB Endow 12(8):1792–1803CrossRef Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proc VLDB Endow 12(8):1792–1803CrossRef
13.
go back to reference Gedik B (2014) Generic windowing support for extensible stream processing systems. Softw Pract Exp 9(44):1105–1128CrossRef Gedik B (2014) Generic windowing support for extensible stream processing systems. Softw Pract Exp 9(44):1105–1128CrossRef
14.
go back to reference Ali MH et al (2010) Spatio-temporal stream processing in microsoft streaminsight. Softw Pract Exp 2(33):69–74 Ali MH et al (2010) Spatio-temporal stream processing in microsoft streaminsight. Softw Pract Exp 2(33):69–74
15.
go back to reference Yang J, Widom J (2003) Incremental computation and maintenance of temporal aggregates. VLDB J 3(12):262–283CrossRef Yang J, Widom J (2003) Incremental computation and maintenance of temporal aggregates. VLDB J 3(12):262–283CrossRef
16.
go back to reference Moon B et al (2000) Scalable algorithms for large temporal aggregation. In: International conference on data engineering (ICDE), pp 145–154 Moon B et al (2000) Scalable algorithms for large temporal aggregation. In: International conference on data engineering (ICDE), pp 145–154
17.
go back to reference Soisalon-Soininen E, Widmayer P (2003) Single and bulk updates in stratified trees: an amortized and worst-case analysis. In: Goos G, Hartmanis J, van Leeuwen J (eds) Computer science in perspective: essays dedicated to Thomas Ottmann, LNCS (Lecture Notes in Computer Science), vol 2598. Springer, Berlin, pp 278–292 CrossRef Soisalon-Soininen E, Widmayer P (2003) Single and bulk updates in stratified trees: an amortized and worst-case analysis. In: Goos G, Hartmanis J, van Leeuwen J (eds) Computer science in perspective: essays dedicated to Thomas Ottmann, LNCS (Lecture Notes in Computer Science), vol 2598. Springer, Berlin, pp 278–292 CrossRef
18.
go back to reference Bou S et al (2019) Streamingcube-based analytical framework for environmental data analysis. In: BigComp 2019, pp 1–8 Bou S et al (2019) Streamingcube-based analytical framework for environmental data analysis. In: BigComp 2019, pp 1–8
19.
go back to reference Bou S et al (2019) Scalable keyword search over relational data streams by aggressive candidate network consolidation. Inf Syst 81:117–135CrossRef Bou S et al (2019) Scalable keyword search over relational data streams by aggressive candidate network consolidation. Inf Syst 81:117–135CrossRef
20.
go back to reference Bou S et al (2014) Keyword search with path-based filtering over XML streams. In: The 33rd IEEE symposium on reliable distributed systems (SRDS 2014), pp 337–338 Bou S et al (2014) Keyword search with path-based filtering over XML streams. In: The 33rd IEEE symposium on reliable distributed systems (SRDS 2014), pp 337–338
21.
go back to reference Bou S et al (2014) Filtering XML streams by XPath and Keywords. In: Proceedings of the 16th international conference on information integration and web-based applications and services (iiWAS 2014), pp 410–419 Bou S et al (2014) Filtering XML streams by XPath and Keywords. In: Proceedings of the 16th international conference on information integration and web-based applications and services (iiWAS 2014), pp 410–419
22.
go back to reference Bou S et al (2015) Path-based keyword search over XML streams. Int J Web Inf Syst (IJWIS) 11(3):347–369CrossRef Bou S et al (2015) Path-based keyword search over XML streams. Int J Web Inf Syst (IJWIS) 11(3):347–369CrossRef
23.
go back to reference Bou S et al (2016) An improved method of keyword search over relational data streams by aggressive candidate network consolidation. In: Database and expert systems applications, pp 336–351 Bou S et al (2016) An improved method of keyword search over relational data streams by aggressive candidate network consolidation. In: Database and expert systems applications, pp 336–351
28.
go back to reference Blount M et al (2010) Real-time analysis for intensive care: development and deployment of the artemis analytic system. IEEE Eng Med Biol Mag 29:110–118CrossRef Blount M et al (2010) Real-time analysis for intensive care: development and deployment of the artemis analytic system. IEEE Eng Med Biol Mag 29:110–118CrossRef
30.
go back to reference Gray J et al (1997) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min Knowl Discov 1(1):29–53CrossRef Gray J et al (1997) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min Knowl Discov 1(1):29–53CrossRef
31.
go back to reference Wesley R, Xu F (2016) Incremental computation of common windowed holistic aggregates. Proc VLDB Endow 9(12):1221–1232CrossRef Wesley R, Xu F (2016) Incremental computation of common windowed holistic aggregates. Proc VLDB Endow 9(12):1221–1232CrossRef
32.
go back to reference Li J (2005) No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Rec 34(1):39–44CrossRef Li J (2005) No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Rec 34(1):39–44CrossRef
33.
go back to reference Krishnamurthy S et al (2006) On-the-fly sharing for streamed aggregation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 623–634 Krishnamurthy S et al (2006) On-the-fly sharing for streamed aggregation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 623–634
34.
go back to reference Carbone P et al (2016) Cutty: aggregate sharing for user-defined windows. In: Proceedings of the ACM international on conference on information and knowledge management, pp 1201–1210 Carbone P et al (2016) Cutty: aggregate sharing for user-defined windows. In: Proceedings of the ACM international on conference on information and knowledge management, pp 1201–1210
35.
go back to reference Bhatotia P et al (2014) Slider: incremental sliding window analytics. In: Proceedings of the 15th international middleware conference, pp 61–72 Bhatotia P et al (2014) Slider: incremental sliding window analytics. In: Proceedings of the 15th international middleware conference, pp 61–72
36.
go back to reference Shein AU et al (2017) Flatfit: accelerated incremental sliding-window aggregation for real-time analytics. In: Proceedings of international conference on scientific and statistical database management, vol 5, pp 1–12 Shein AU et al (2017) Flatfit: accelerated incremental sliding-window aggregation for real-time analytics. In: Proceedings of international conference on scientific and statistical database management, vol 5, pp 1–12
37.
go back to reference Tangwongsan K et al (2015) General incremental sliding-window aggregation. Proc VLDB Endow 8(7):702–713CrossRef Tangwongsan K et al (2015) General incremental sliding-window aggregation. Proc VLDB Endow 8(7):702–713CrossRef
38.
go back to reference Tangwongsan K et al (2017) Low-latency sliding-window aggregation in worst-case constant time. In: Proceedings of the 11th ACM international conference on distributed and event-based systems, pp 66–77 Tangwongsan K et al (2017) Low-latency sliding-window aggregation in worst-case constant time. In: Proceedings of the 11th ACM international conference on distributed and event-based systems, pp 66–77
39.
go back to reference Arasu A, Widom J (2004) Resource sharing in continuous sliding-window aggregates. In: Proceedings of the VLDB conference, pp 336–347 Arasu A, Widom J (2004) Resource sharing in continuous sliding-window aggregates. In: Proceedings of the VLDB conference, pp 336–347
40.
go back to reference Arasu A, Manku GS (2004) Approximate counts and quantiles over sliding windows. In: Proceedings of the ACM SIGMOD–SIGACT–SIGART symposium on principles of database systems, pp 286–296 Arasu A, Manku GS (2004) Approximate counts and quantiles over sliding windows. In: Proceedings of the ACM SIGMOD–SIGACT–SIGART symposium on principles of database systems, pp 286–296
41.
go back to reference Bulut A, Singh AK (2003) SWAT: hierarchical stream summarization in large networks. In: International conference on data engineering (ICDE), pp 303–314 Bulut A, Singh AK (2003) SWAT: hierarchical stream summarization in large networks. In: International conference on data engineering (ICDE), pp 303–314
42.
go back to reference Jiao Y (2006) Maintaining stream statistics over multiscale sliding windows. ACM Trans Database Syst (TODS) 31(4):1305–1334MathSciNetCrossRef Jiao Y (2006) Maintaining stream statistics over multiscale sliding windows. ACM Trans Database Syst (TODS) 31(4):1305–1334MathSciNetCrossRef
43.
go back to reference Gibbons PB, Tirthapura S et al (2002) Distributed streams algorithms for sliding windows. In: Proceedings of the ACM symposium on parallel algorithms and architectures, pp 63–72 Gibbons PB, Tirthapura S et al (2002) Distributed streams algorithms for sliding windows. In: Proceedings of the ACM symposium on parallel algorithms and architectures, pp 63–72
44.
go back to reference Guirguis S et al (2011) Optimized processing of multiple aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1515–1524 Guirguis S et al (2011) Optimized processing of multiple aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1515–1524
45.
go back to reference Shein AU et al (2015) F1: accelerating the optimization of aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1151–1160 Shein AU et al (2015) F1: accelerating the optimization of aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1151–1160
46.
go back to reference Jerzak Z et al (2012) The DEBS 2012 grand challenge. In: Proceedings of the international conference on distributed event-based systems, pp 393–398 Jerzak Z et al (2012) The DEBS 2012 grand challenge. In: Proceedings of the international conference on distributed event-based systems, pp 393–398
Metadata
Title
L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes
Authors
Savong Bou
Hiroyuki Kitagawa
Toshiyuki Amagasa
Publication date
21-02-2020
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 8/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-020-01444-5

Other articles of this Issue 8/2020

Knowledge and Information Systems 8/2020 Go to the issue

Premium Partner