ABSTRACT
Uncertain data streams, where data is incomplete, imprecise, and even misleading, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the PODS system, which supports stream processing for uncertain data naturally captured using continuous random variables. PODS employs a unique data model that is flexible and allows efficient computation. Building on this model, we develop evaluation techniques for complex relational operators, i.e., aggregates and joins, by drawing on advanced statistical theory and approximation. Evaluation results show that our techniques achieve high performance while satisfying accuracy requirements, and significantly outperform a state-of-the-art sampling method. A case study further shows that our techniques enable a tornado detection system, for the first time, to produce detection results at stream speed and with much improved quality.
Index Terms
- PODS: a new model and processing algorithms for uncertain data streams