Skip to main content
Top
Published in: Knowledge and Information Systems 1/2015

01-10-2015 | Regular Paper

Sliding windows over uncertain data streams

Authors: Michele Dallachiesa, Gabriela Jacques-Silva, Buğra Gedik, Kun-Lung Wu, Themis Palpanas

Published in: Knowledge and Information Systems | Issue 1/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Uncertain data streams can have tuples with both value and existential uncertainty. A tuple has value uncertainty when it can assume multiple possible values. A tuple is existentially uncertain when the sum of the probabilities of its possible values is \(<\)1. A situation where existential uncertainty can arise is when applying relational operators to streams with value uncertainty. Several prior works have focused on querying and mining data streams with both value and existential uncertainty. However, none of them have studied, in depth, the implications of existential uncertainty on sliding window processing, even though it naturally arises when processing uncertain data. In this work, we study the challenges arising from existential uncertainty, more specifically the management of count-based sliding windows, which are a basic building block of stream processing applications. We extend the semantics of sliding window to define the novel concept of uncertain sliding windows and provide both exact and approximate algorithms for managing windows under existential uncertainty. We also show how current state-of-the-art techniques for answering similarity join queries can be easily adapted to be used with uncertain sliding windows. We evaluate our proposed techniques under a variety of configurations using real data. The results show that the algorithms used to maintain uncertain sliding windows can efficiently operate while providing a high-quality approximation in query answering. In addition, we show that sort-based similarity join algorithms can perform better than index-based techniques (on 17 real datasets) when the number of possible values per tuple is low, as in many real-world applications.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
Each dimension can be considered as an attribute.
 
2
The uniform distribution over \([0, x]\) has a fixed standard deviation that is only dependent on \(x\). To vary the standard deviation, we adapt the value of \(x\) (for \(\sigma =0.25\), \(x\approx 0.87\)).
 
Literature
1.
go back to reference Abadi D, Ahmad Y, Balazinska M, Çetintemel U, Cherniack M, Hwang JH, Lindner W, Maskey A, Rasin A, Ryvkina E, Tatbul N, Xing Y, Zdonik S (2005) The design of the Borealis stream processing engine. In: CIDR Abadi D, Ahmad Y, Balazinska M, Çetintemel U, Cherniack M, Hwang JH, Lindner W, Maskey A, Rasin A, Ryvkina E, Tatbul N, Xing Y, Zdonik S (2005) The design of the Borealis stream processing engine. In: CIDR
3.
go back to reference Aggarwal CC, Yu PS (2008) A framework for clustering uncertain data streams. In: IEEE ICDE Aggarwal CC, Yu PS (2008) A framework for clustering uncertain data streams. In: IEEE ICDE
4.
go back to reference Aßfalg J, Kriegel H-P, Kröger P, Renz M (2009) Probabilistic similarity search for uncertain time series. In: SSDBM Aßfalg J, Kriegel H-P, Kröger P, Renz M (2009) Probabilistic similarity search for uncertain time series. In: SSDBM
5.
go back to reference Aßfalg J, Kriegel HP, Kröger P, Renz M (2009) Probabilistic similarity search for uncertain time series. In: SSDBM, pp 435–443 Aßfalg J, Kriegel HP, Kröger P, Renz M (2009) Probabilistic similarity search for uncertain time series. In: SSDBM, pp 435–443
6.
go back to reference Benjelloun O, Sarma A, Halevy A, Widom J (2006) Uldbs: databases with uncertainty and lineage. In: VLDB Benjelloun O, Sarma A, Halevy A, Widom J (2006) Uldbs: databases with uncertainty and lineage. In: VLDB
7.
go back to reference Bernecker T, Kriegel HP, Renz M, Verhein F, Züfle A (2009) Probabilistic frequent itemset mining in uncertain databases. In: KDD, pp 119–128 Bernecker T, Kriegel HP, Renz M, Verhein F, Züfle A (2009) Probabilistic frequent itemset mining in uncertain databases. In: KDD, pp 119–128
8.
go back to reference Biem A, Bouillet E, Feng H, Ranganathan A, Riabov A, Verscheure O, Koutsopoulos H, Moran C (2010) IBM infosphere streams for scalable, real-time, intelligent transportation services. In: ACM SIGMOD Biem A, Bouillet E, Feng H, Ranganathan A, Riabov A, Verscheure O, Koutsopoulos H, Moran C (2010) IBM infosphere streams for scalable, real-time, intelligent transportation services. In: ACM SIGMOD
9.
go back to reference Calders T, Garboni C, Goethals B (2010) Approximation of frequentness probability of itemsets in uncertain data. In: Data mining (ICDM), 2010 IEEE 10th international conference on IEEE, pp 749–754 Calders T, Garboni C, Goethals B (2010) Approximation of frequentness probability of itemsets in uncertain data. In: Data mining (ICDM), 2010 IEEE 10th international conference on IEEE, pp 749–754
10.
go back to reference Cheng R, Kalashnikov D, Prabhakar S (2004) Querying imprecise data in moving object environments. IEEE Trans Knowl Data Eng 16(9):1112–1127CrossRef Cheng R, Kalashnikov D, Prabhakar S (2004) Querying imprecise data in moving object environments. IEEE Trans Knowl Data Eng 16(9):1112–1127CrossRef
11.
go back to reference Dai X, Yiu M, Mamoulis N, Tao Y, Vaitis M (2005) Probabilistic spatial queries on existentially uncertain data. In: SSTD Dai X, Yiu M, Mamoulis N, Tao Y, Vaitis M (2005) Probabilistic spatial queries on existentially uncertain data. In: SSTD
12.
go back to reference Dallachiesa M, Aggarwal C, Palpanas T (2014) Node classification in uncertain graphs. In: SSDBM 32 Dallachiesa M, Aggarwal C, Palpanas T (2014) Node classification in uncertain graphs. In: SSDBM 32
13.
go back to reference Dallachiesa M, Nushi B, Mirylenka K, Palpanas T (2012) Uncertain time-series similarity: return to the basics. PVLDB 5(11):1662–1673 Dallachiesa M, Nushi B, Mirylenka K, Palpanas T (2012) Uncertain time-series similarity: return to the basics. PVLDB 5(11):1662–1673
14.
go back to reference Dallachiesa M, Palpanas T (2013) Identifying streaming frequent items in ad hoc time windows. Data Knowl Eng 87:66–90CrossRef Dallachiesa M, Palpanas T (2013) Identifying streaming frequent items in ad hoc time windows. Data Knowl Eng 87:66–90CrossRef
15.
go back to reference Dallachiesa M, Palpanas T, Ilyas FI (2014) Top-k nearest neighbor search in uncertain data series. Proc VLDB Endowment Dallachiesa M, Palpanas T, Ilyas FI (2014) Top-k nearest neighbor search in uncertain data series. Proc VLDB Endowment
16.
go back to reference Daskalakis C, Diakonikolas I, Servedio RA (2012) Learning poisson binomial distributions. In: Proceedings of the 44th symposium on theory of computing. ACM, pp 709–728 Daskalakis C, Diakonikolas I, Servedio RA (2012) Learning poisson binomial distributions. In: Proceedings of the 44th symposium on theory of computing. ACM, pp 709–728
17.
go back to reference Diao Y, Li B, Liu A, Peng L, Sutton C, Tran TTL, Zink M (2009) Capturing data uncertainty in high-volume stream processing. In: CIDR Diao Y, Li B, Liu A, Peng L, Sutton C, Tran TTL, Zink M (2009) Capturing data uncertainty in high-volume stream processing. In: CIDR
18.
go back to reference Fernandez M, Williams S (2010) Closed-form expression for the poisson-binomial probability density function. IEEE Trans Aerosp Electron Syst 46(2):803–817 Fernandez M, Williams S (2010) Closed-form expression for the poisson-binomial probability density function. IEEE Trans Aerosp Electron Syst 46(2):803–817
19.
go back to reference Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: A survey of recent developments. ACM Comput Surv 42(4):14 Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: A survey of recent developments. ACM Comput Surv 42(4):14
20.
go back to reference Gedik B (2013) Generic windowing support for extensible stream processing systems. Softw Pract Exp 44(9):1105–1128 Gedik B (2013) Generic windowing support for extensible stream processing systems. Softw Pract Exp 44(9):1105–1128
21.
go back to reference Gedik B, Andrade H (2012) A model-based framework for building extensible, high performance stream processing middleware and programming language for IBM infosphere streams. Softw Pract Exp 42(11):1363–1391 Gedik B, Andrade H (2012) A model-based framework for building extensible, high performance stream processing middleware and programming language for IBM infosphere streams. Softw Pract Exp 42(11):1363–1391
22.
go back to reference Getoor L, Friedman N, Koller D, Taskar B (2003) Learning probabilistic models of link structure. J Mach Learn Res 3:679–707MATHMathSciNet Getoor L, Friedman N, Koller D, Taskar B (2003) Learning probabilistic models of link structure. J Mach Learn Res 3:679–707MATHMathSciNet
23.
go back to reference Halpern J (2003) Reasoning about uncertainty. MIT Press, CambridgeMATH Halpern J (2003) Reasoning about uncertainty. MIT Press, CambridgeMATH
24.
go back to reference Hirzel M, Andrade H, Gedik B, Kumar V, Losa G, Mendell M, Nasgaard H, Soulé R, Wu KL (2009) SPL language specification. Technical report RC24897. IBM Research Hirzel M, Andrade H, Gedik B, Kumar V, Losa G, Mendell M, Nasgaard H, Soulé R, Wu KL (2009) SPL language specification. Technical report RC24897. IBM Research
25.
go back to reference Hong Y (2011) On computing the distribution function for the sum of independent and non-identical random indicators. Technical report, Department of Statistics, Virginia Tech Hong Y (2011) On computing the distribution function for the sum of independent and non-identical random indicators. Technical report, Department of Statistics, Virginia Tech
26.
go back to reference Jayram TS, McGregor A, Muthukrishnan S, Vee E (2007) Estimating statistical aggregates on probabilistic data streams. In: ACM PODS Jayram TS, McGregor A, Muthukrishnan S, Vee E (2007) Estimating statistical aggregates on probabilistic data streams. In: ACM PODS
27.
go back to reference Jin C, Yi K, Chen L, Yu JX, Lin X (2008) Sliding-window top-k queries on uncertain streams. Proc VLDB Endowment 1(1):301–312CrossRef Jin C, Yi K, Chen L, Yu JX, Lin X (2008) Sliding-window top-k queries on uncertain streams. Proc VLDB Endowment 1(1):301–312CrossRef
28.
go back to reference Kanagal B, Deshpande A (2008) Online filtering, smoothing and probabilistic modeling of streaming data. In IEEE ICDE Kanagal B, Deshpande A (2008) Online filtering, smoothing and probabilistic modeling of streaming data. In IEEE ICDE
30.
go back to reference Kriegel H, Kunath P, Pfeifle M, Renz M (2006) Probabilistic similarity join on uncertain data. In: DASFAA Kriegel H, Kunath P, Pfeifle M, Renz M (2006) Probabilistic similarity join on uncertain data. In: DASFAA
31.
go back to reference Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
32.
go back to reference Leung CKS, Hao B (2009) Mining of frequent itemsets from streams of uncertain data. In: IEEE ICDE Leung CKS, Hao B (2009) Mining of frequent itemsets from streams of uncertain data. In: IEEE ICDE
33.
go back to reference Lian X, Chen L (2011) Similarity join processing on uncertain data streams. IEEE TKDE 23(11) Lian X, Chen L (2011) Similarity join processing on uncertain data streams. IEEE TKDE 23(11)
34.
go back to reference Liao L, Fox D, Kautz H (2007) Extracting places and activities from gps traces using hierarchical conditional random fields. Int J Rob Res 26(1):119–134CrossRef Liao L, Fox D, Kautz H (2007) Extracting places and activities from gps traces using hierarchical conditional random fields. Int J Rob Res 26(1):119–134CrossRef
35.
36.
go back to reference Moon B, Jagadish HV, Faloutsos C, Saltz JH (2001) Analysis of the clustering properties of the hilbert space-filling curve. IEEE TKDE 13(1) Moon B, Jagadish HV, Faloutsos C, Saltz JH (2001) Analysis of the clustering properties of the hilbert space-filling curve. IEEE TKDE 13(1)
37.
go back to reference Neumeyer L, Robbins B, Nair A, Kesari A (2010) S4: distributed stream computing platform. In: KDCloud Neumeyer L, Robbins B, Nair A, Kesari A (2010) S4: distributed stream computing platform. In: KDCloud
38.
go back to reference Nybø R (2008) Time series opportunities in the petroleum industry. In: ESTSP 08, European symposium on time series prediction, Porvoo, Finland Nybø R (2008) Time series opportunities in the petroleum industry. In: ESTSP 08, European symposium on time series prediction, Porvoo, Finland
39.
go back to reference Raza U, Camerra A, Murphy AL, Palpanas T, Picco GP (2012) What does model-driven data acquisition really achieve in wireless sensor networks? In: PERCOM Raza U, Camerra A, Murphy AL, Palpanas T, Picco GP (2012) What does model-driven data acquisition really achieve in wireless sensor networks? In: PERCOM
40.
go back to reference Ré C, Letchner J, Balazinska M, Suciu D (2008) Event queries on correlated probabilistic streams. In: ACM SIGMOD Ré C, Letchner J, Balazinska M, Suciu D (2008) Event queries on correlated probabilistic streams. In: ACM SIGMOD
41.
go back to reference Sarangi S, Murthy K (2010) DUST: a generalized notion of similarity between uncertain time series. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 383–392 Sarangi S, Murthy K (2010) DUST: a generalized notion of similarity between uncertain time series. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 383–392
42.
go back to reference Singh S, Mayfield C, Shah R, Prabhakar S, Hambrusch SE, Neville J, Cheng R (2008) Database support for probabilistic attributes and tuples. In: IEEE ICDE Singh S, Mayfield C, Shah R, Prabhakar S, Hambrusch SE, Neville J, Cheng R (2008) Database support for probabilistic attributes and tuples. In: IEEE ICDE
43.
go back to reference Sow D, Biem A, Blount M, Ebling M, Verscheure O (2010) Body sensor data processing using stream computing. In: MIR Sow D, Biem A, Blount M, Ebling M, Verscheure O (2010) Body sensor data processing using stream computing. In: MIR
44.
go back to reference Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 273–282 Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 273–282
45.
go back to reference Tran TT, Peng L, Diao Y, McGregor A, Liu A (2012) Claro: modeling and processing uncertain data streams. VLDB J Int J Very Large Data Bases 21(5):651–676CrossRef Tran TT, Peng L, Diao Y, McGregor A, Liu A (2012) Claro: modeling and processing uncertain data streams. VLDB J Int J Very Large Data Bases 21(5):651–676CrossRef
46.
go back to reference Tran TT, Peng L, Li B, Diao Y, Liu A (2010) Pods: a new model and processing algorithms for uncertain data streams. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 159–170 Tran TT, Peng L, Li B, Diao Y, Liu A (2010) Pods: a new model and processing algorithms for uncertain data streams. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 159–170
47.
go back to reference Wang L, Cheung D, Cheng R, Lee S, Yang X (2012) Efficient mining of frequent itemsets on large uncertain databases. IEEE Trans Knowl Data Eng 24(12):2170–2183 Wang L, Cheung D, Cheng R, Lee S, Yang X (2012) Efficient mining of frequent itemsets on large uncertain databases. IEEE Trans Knowl Data Eng 24(12):2170–2183
48.
go back to reference Wu KL, Yu PS, Gedik B, Hildrum K, Aggarwal CC, Bouillet E, Fan W, George D, Gu X, Luo G, Wang H (2007) Challenges and experience in prototyping a multi-modal stream analytic and monitoring application on system. In: VLDB Wu KL, Yu PS, Gedik B, Hildrum K, Aggarwal CC, Bouillet E, Fan W, George D, Gu X, Luo G, Wang H (2007) Challenges and experience in prototyping a multi-modal stream analytic and monitoring application on system. In: VLDB
49.
go back to reference Yeh M, Wu K, Yu P, Chen M (2009) PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 684–695 Yeh M, Wu K, Yu P, Chen M (2009) PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 684–695
50.
go back to reference Youssef M, Mah M, Agrawala A (2007) Challenges: device-free passive localization for wireless environments. In: ACM MOBICOM Youssef M, Mah M, Agrawala A (2007) Challenges: device-free passive localization for wireless environments. In: ACM MOBICOM
51.
go back to reference Zhang Q, Li F, Yi K (2008) Finding frequent items in probabilistic data. In: ACM SIGMOD Zhang Q, Li F, Yi K (2008) Finding frequent items in probabilistic data. In: ACM SIGMOD
52.
go back to reference Zhang W, Lin X, Zhang Y, Wang W, Yu JX (2009) Probabilistic skyline operator over sliding windows. In: IEEE ICDE Zhang W, Lin X, Zhang Y, Wang W, Yu JX (2009) Probabilistic skyline operator over sliding windows. In: IEEE ICDE
53.
go back to reference Zhou Z, Gupta H, Das SR, Zhu X (2007) Slotted scheduled tag access in multi-reader rfid systems. In: IEEE ICNP Zhou Z, Gupta H, Das SR, Zhu X (2007) Slotted scheduled tag access in multi-reader rfid systems. In: IEEE ICNP
Metadata
Title
Sliding windows over uncertain data streams
Authors
Michele Dallachiesa
Gabriela Jacques-Silva
Buğra Gedik
Kun-Lung Wu
Themis Palpanas
Publication date
01-10-2015
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0804-5

Other articles of this Issue 1/2015

Knowledge and Information Systems 1/2015 Go to the issue

Premium Partner