DOI: 10.1145/1807167.1807187
research-article

PODS: a new model and processing algorithms for uncertain data streams

Published: 06 June 2010

ABSTRACT

Uncertain data streams, where data is incomplete, imprecise, and even misleading, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the PODS system that supports stream processing for uncertain data naturally captured using continuous random variables. PODS employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for complex relational operators, i.e., aggregates and joins, by exploring advanced statistical theory and approximation. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements, and significantly outperform a state-of-the-art sampling method. A case study further shows that our techniques can enable a tornado detection system (for the first time) to produce detection results at stream speed and with much improved quality.
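
To make the contrast between statistical evaluation and sampling concrete, here is a minimal sketch that is not taken from the paper. It assumes each uncertain tuple is an independent Gaussian random variable (the class `GaussianValue` and both function names are hypothetical) and computes a windowed SUM in two ways: in closed form, using the fact that means and variances of independent Gaussians add, and with a simple Monte Carlo baseline. The actual PODS data model and algorithms may differ substantially.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianValue:
    """Hypothetical uncertain attribute: a value modeled as N(mean, var)."""
    mean: float
    var: float


def sum_closed_form(window):
    # For independent Gaussians, the sum is Gaussian:
    # the means add and the variances add.
    return GaussianValue(
        mean=sum(t.mean for t in window),
        var=sum(t.var for t in window),
    )


def sum_by_sampling(window, n_samples=10_000, seed=0):
    # Monte Carlo baseline: draw one value per tuple, add them up,
    # repeat, then summarize the sampled totals.
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(t.mean, math.sqrt(t.var)) for t in window)
        for _ in range(n_samples)
    ]
    mean = sum(totals) / n_samples
    var = sum((x - mean) ** 2 for x in totals) / (n_samples - 1)
    return GaussianValue(mean, var)


# Illustrative window of three uncertain readings (made-up numbers).
window = [
    GaussianValue(10.0, 1.0),
    GaussianValue(12.5, 0.25),
    GaussianValue(9.0, 4.0),
]
exact = sum_closed_form(window)
approx = sum_by_sampling(window)
print(exact)
print(approx)
```

The closed-form path touches each tuple once, while the sampling path costs `n_samples` draws per tuple and only approximates the same distribution. That is the kind of gap an analytical approach can exploit, under the independence and Gaussian assumptions stated above.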

      Published in
      SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
      June 2010
      1286 pages
      ISBN: 9781450300322
      DOI: 10.1145/1807167

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
