skip to main content
column

Probabilistic databases

Published:01 June 2008Publication History
Skip Abstract Section

Abstract

Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as probabilities. We review the basic definitions of the probabilistic data model and present some fundamental theoretical results for query evaluation on probabilistic databases.

References

  1. S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In EDBT, pages 1059--1068, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. E. Adams. A Primer of Probability Logic. CSLI Publications, Stanford, California, 1998.Google ScholarGoogle Scholar
  3. P. Andritsos, A. Fuxman, and R. J. Miller. Clean answers over dirty databases. In ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Antova, C. Koch, and D. Olteanu. 10^(10^6) worlds and beyond: Efficient representation and processing of incomplete information. In ICDE, 2007.Google ScholarGoogle Scholar
  5. L. Antova, C. Koch, and D. Olteanu. World-set decompositions: Expressiveness and efficient algorithms. In ICDT, pages 194--208, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. F. Bacchus, A. Grove, J. Halpern, and D. Koller. From statistical knowledge bases to degrees of belief. Artificial Intelligence, 87(1--2):75--143, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, pages 953--964, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Borriello and F. Zhao. World-Wide Sensor Web: 2006 UWMSR Summer Institute Semiahmoo Resort, Blaine, WA, 2006. www.cs.washington.edu/mssi/2006/schedule.html.Google ScholarGoogle Scholar
  9. T. Choudhury, M. Philipose, D. Wyatt, and J. Lester. Towards activity databases: Using sensors and statistical models to summarize people's lives. IEEE Data Eng. Bull, 29(1):49--58, March 2006.Google ScholarGoogle Scholar
  10. E. F. Codd. A relational model for large shared databank. Communications of the ACM, 13(6):377--387, June 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Cooper. Computational complexity of probabilistic inference using bayesian belief networks (research note). Artificial Intelligence, 42:393--405, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Cowell, P. Dawid, S. Lauritzen, and D. Spiegelhalter, editors. Probabilistic Networks and Expert Systems. Springer, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Dagum and M. Luby. Approximating probabilistic inference in bayesian belief networks is NP-hard. Artificial Intelligence, 60:141--153, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. N. Dalvi, C. Re, and D. Suciu. Query evaluation on probabilistic databases. IEEE Data Engineering Bulletin, 29(1):25--31, 2006.Google ScholarGoogle Scholar
  15. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, Toronto, Canada, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. N. Dalvi and D. Suciu. Management of probabilistic data: Foundations and challenges. In PODS, pages 1--12, Beijing, China, 2007. (invited talk). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Das Sarma, O. Benjelloun, A. Halevy, and J. Widom. Working models for uncertain data. In ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. de Rougemont. The reliability of queries. In PODS, pages 286--291, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Using probabilistic models for data management in acquisitional environments. In CIDR, pages 317--328, 2005.Google ScholarGoogle Scholar
  21. O. Deux. The story of O2. IEEE Transactions on Knowledge and Data Engineering, 2(1):91--108, March 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R. McCann, M. Sayyadian, and W. Shen. Community information management. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Data Management, 29(1):64--72, March 2006.Google ScholarGoogle Scholar
  23. M. B. et al. Data management in the world-wide sensor web. IEEE Pervasive Computing, 2007. To appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. O. Etzioni, M. Cafarella, D. Downey, S. Kok, A. Popescu, T. Shaked, S. Soderland, D. Weld, and A. Yates. Web-scale information extraction in KnowItAll: (preliminary results). In WWW, pages 100--110, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. Grädel, Y. Gurevich, and C. Hirsch. The complexity of query reliability. In PODS, pages 227--234, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. T. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31--40, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. Green and V. Tannen. Models for incomplete and probabilistic information. IEEE Data Engineering Bulletin, 29(1):17--24, March 2006.Google ScholarGoogle Scholar
  28. R. Gupta and S. Sarawagi. Creating probabilistic databases from information extraction models. In VLDB, pages 965--976, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. Halevy, A. Rajaraman, and J. Ordille. Data integration: The teenage years. In VLDB, pages 9--16, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. D. Heckerman, Tutorial on graphical models, June 2002.Google ScholarGoogle Scholar
  31. E. Hung, L. Getoor, and V. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In ICDE, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  32. T. Imielinski and W. Lipski. Incomplete information in relational databases. Journal of the ACM, 31:761--791, October 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. T. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Engineering Bulletin, 29(1):40--48, 2006.Google ScholarGoogle Scholar
  34. R. Karp and M. Luby. Monte-Carlo algorithms for enumeration and reliability problems. In Proceedings of the annual ACM symposium on Theory of computing, 1983.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. N. Khoussainova, M. Balazinska, and D. Suciu. Towards correcting input data errors probabilistically using integrity constraints. In MobiDB, pages 43--50, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. G. Kuper, L. Libkin, and J. P. (Eds.). Constraint Databases. Springer, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Lester, T. Choudhury, N. Kern, G. Borriello, and B. Hannaford. A hybrid discriminative/generative approach for modeling human activities. In IJCAI, pages 766--772, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. C. Papadimitriou. Computational Complexity. Addison Wesley Publishing Company, 1994.Google ScholarGoogle Scholar
  39. S. Philippi and J. Kohler. Addressing the problems with life-science databases for traditional uses and systems biology. Nature Reviews Genetics, 7:481--488, June 2006.Google ScholarGoogle ScholarCross RefCross Ref
  40. J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777--788, 1983.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. C. Re, N. Dalvi, and D. Suciu. Efficient Top-k query evaluation on probabilistic data. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  42. C. Re and D. Suciu. Materialized views in probabilistic databases for information exchange and query optimization. In Proceedings of VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. C. Re and D. Suciu. Approximate lineage for probabilistic databases. Technical Report 2008-03-02, University of Washington, Seattle, WA, 2008.Google ScholarGoogle Scholar
  44. H.-J. Schek and M. H. Scholl. The relational model with relation-valued attributes. Information Systems, 11 (2):137--147, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. R. Snodgrass. The temporal query language TQUEL. ACM Transactions on Database Systems, 12(2):247--298, June 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. D. Suciu. An overview of semistructured data. SIGACT News, 29(4):28--38, December 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8:410--421, 1979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In ICDE, pages 459--470, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. M. Y. Vardi. The complexity of relational query languages. In Proceedings of 14th ACM SIGACT Symposium on the Theory of Computing, pages 137--146, San Francisco, California, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Probabilistic databases

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGACT News
            ACM SIGACT News  Volume 39, Issue 2
            June 2008
            117 pages
            ISSN:0163-5700
            DOI:10.1145/1388240
            Issue’s Table of Contents

            Copyright © 2008 Author

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 June 2008

            Check for updates

            Qualifiers

            • column

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader