skip to main content
research-article

Conditioning and aggregating uncertain data streams: going beyond expectations

Published:01 September 2010Publication History
Skip Abstract Section

Abstract

Uncertain data streams are increasingly common in real-world deployments and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider complex queries involving conditioning (e.g., selections and group by's) and aggregation operations on uncertain data streams. To characterize the uncertainty of answers to these queries, one generally has to compute the full probability distribution of each operation used in the query. Computing distributions of aggregates given conditioned tuple distributions is a hard, unsolved problem. Our work employs a new evaluation framework that includes a general data model, approximation metrics, and approximate representations. Within this framework we design fast data-stream algorithms, both deterministic and randomized, for returning approximate distributions with bounded errors as answers to those complex queries. Our experimental results demonstrate the accuracy and efficiency of our approximation techniques and offer insights into the strengths and limitations of deterministic and randomized algorithms.

References

  1. P. Agrawal and J. Widom. Continuous uncertainty in trio. In MUD Workshop, 2009.Google ScholarGoogle Scholar
  2. A. Arasu et al. CQL: A language for continuous queries over streams and relations. In DBPL, pages 1--19, 2003.Google ScholarGoogle Scholar
  3. O. Benjelloun et al. ULDBs: Databases with uncertainty and lineage. In VLDB, pages 953--964, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Cormode and M. Garofalakis. Sketching probabilistic data streams. In SIGMOD, pages 281--292, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDB J., 16(4):523--544, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. DasGupta. Asymptotic theory of statistics and probability. Springer Verlag, 2008.Google ScholarGoogle Scholar
  7. A. Deshpande et al. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. Ge and S. B. Zdonik. Handling uncertain data in array database systems. In ICDE, pages 1140--1149, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Jampani et al. MCDB: a monte carlo approach to managing uncertain data. In SIGMOD, pages 687--700, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. T. S. Jayram et al. Efficient aggregation algorithms for probabilistic data. In SODA, pages 346--355, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. S. Jayram et al. Estimating statistical aggregates on probabilistic data streams. ACM Trans. Database Syst., 33(4), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. Kanagal and A. Deshpande. Online filtering, smoothing and probabilistic modeling of streaming data. In ICDE, 1160--1169, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Kurose et al. An end-user-responsive sensor network architecture for hazardous weather detection. In AINTEC, pages 1--15, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. H. Lopes et al. The two-dimensional kolmogorov-smirnov test. In Proceedings of the XI International Workshop on Advanced Computing and Analysis Techniques in Physics Research, 2007.Google ScholarGoogle Scholar
  15. C. Ré and D. Suciu. The trichotomy of HAVING queries on a probabilistic database. The VLDB Journal, 18(5), 1091--1116, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Singh et al. Database support for probabilistic attributes and tuples. In ICDE, pages 1053--1061, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Suciu et al. Embracing uncertainty in large-scale computational astrophysics. In MUD Workshop, 2009.Google ScholarGoogle Scholar
  18. T. Tran et al. Conditioning and aggregating uncertain data streams: Going beyond expectations. Technical report 2010--026, UMass Amherst, 2010.Google ScholarGoogle Scholar
  19. T. Tran et al. PODS: A new model and processing algorithms for uncertain data streams. In SIGMOD, pages 159--170, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Tran et al. Probabilistic inference over RFID streams in mobile environments. In ICDE, pages 1096--1107, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Conditioning and aggregating uncertain data streams: going beyond expectations
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Proceedings of the VLDB Endowment
            Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
            September 2010
            1658 pages

            Publisher

            VLDB Endowment

            Publication History

            • Published: 1 September 2010
            Published in pvldb Volume 3, Issue 1-2

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader