Abstract
Uncertain data streams are increasingly common in real-world deployments and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider complex queries involving conditioning (e.g., selections and group by's) and aggregation operations on uncertain data streams. To characterize the uncertainty of answers to these queries, one generally has to compute the full probability distribution of each operation used in the query. Computing distributions of aggregates given conditioned tuple distributions is a hard, unsolved problem. Our work employs a new evaluation framework that includes a general data model, approximation metrics, and approximate representations. Within this framework we design fast data-stream algorithms, both deterministic and randomized, for returning approximate distributions with bounded errors as answers to those complex queries. Our experimental results demonstrate the accuracy and efficiency of our approximation techniques and offer insights into the strengths and limitations of deterministic and randomized algorithms.
- P. Agrawal and J. Widom. Continuous uncertainty in trio. In MUD Workshop, 2009.Google Scholar
- A. Arasu et al. CQL: A language for continuous queries over streams and relations. In DBPL, pages 1--19, 2003.Google Scholar
- O. Benjelloun et al. ULDBs: Databases with uncertainty and lineage. In VLDB, pages 953--964, 2006. Google ScholarDigital Library
- G. Cormode and M. Garofalakis. Sketching probabilistic data streams. In SIGMOD, pages 281--292, 2007. Google ScholarDigital Library
- N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDB J., 16(4):523--544, 2007. Google ScholarDigital Library
- A. DasGupta. Asymptotic theory of statistics and probability. Springer Verlag, 2008.Google Scholar
- A. Deshpande et al. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004. Google ScholarDigital Library
- T. Ge and S. B. Zdonik. Handling uncertain data in array database systems. In ICDE, pages 1140--1149, 2008. Google ScholarDigital Library
- R. Jampani et al. MCDB: a monte carlo approach to managing uncertain data. In SIGMOD, pages 687--700, 2008. Google ScholarDigital Library
- T. S. Jayram et al. Efficient aggregation algorithms for probabilistic data. In SODA, pages 346--355, 2007. Google ScholarDigital Library
- T. S. Jayram et al. Estimating statistical aggregates on probabilistic data streams. ACM Trans. Database Syst., 33(4), 2008. Google ScholarDigital Library
- B. Kanagal and A. Deshpande. Online filtering, smoothing and probabilistic modeling of streaming data. In ICDE, 1160--1169, 2008. Google ScholarDigital Library
- J. Kurose et al. An end-user-responsive sensor network architecture for hazardous weather detection. In AINTEC, pages 1--15, 2006. Google ScholarDigital Library
- R. H. Lopes et al. The two-dimensional kolmogorov-smirnov test. In Proceedings of the XI International Workshop on Advanced Computing and Analysis Techniques in Physics Research, 2007.Google Scholar
- C. Ré and D. Suciu. The trichotomy of HAVING queries on a probabilistic database. The VLDB Journal, 18(5), 1091--1116, 2009. Google ScholarDigital Library
- S. Singh et al. Database support for probabilistic attributes and tuples. In ICDE, pages 1053--1061, 2008. Google ScholarDigital Library
- D. Suciu et al. Embracing uncertainty in large-scale computational astrophysics. In MUD Workshop, 2009.Google Scholar
- T. Tran et al. Conditioning and aggregating uncertain data streams: Going beyond expectations. Technical report 2010--026, UMass Amherst, 2010.Google Scholar
- T. Tran et al. PODS: A new model and processing algorithms for uncertain data streams. In SIGMOD, pages 159--170, 2010. Google ScholarDigital Library
- T. Tran et al. Probabilistic inference over RFID streams in mobile environments. In ICDE, pages 1096--1107, 2009. Google ScholarDigital Library
Index Terms
- Conditioning and aggregating uncertain data streams: going beyond expectations
Recommendations
Continuously monitoring top-k uncertain data streams: a probabilistic threshold method
Recently, uncertain data processing has become more and more important. Although a significant amount of previous research explores various continuous queries on data streams, continuous queries on uncertain data streams have seldom been investigated. ...
Sliding-window top-k queries on uncertain streams
Recently, due to the imprecise nature of the data generated from a variety of streaming applications, such as sensor networks, query processing on uncertain data streams has become an important problem. However, all the existing works on uncertain data ...
Similarity Join Processing on Uncertain Data Streams
Similarity join processing in the streaming environment has many practical applications such as sensor networks, object tracking and monitoring, and so on. Previous works usually assume that stream processing is conducted over precise data. In this ...
Comments