Abstract
Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as probabilities. We review the basic definitions of the probabilistic data model and present some fundamental theoretical results for query evaluation on probabilistic databases.
- S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In EDBT, pages 1059--1068, 2006. Google ScholarDigital Library
- E. Adams. A Primer of Probability Logic. CSLI Publications, Stanford, California, 1998.Google Scholar
- P. Andritsos, A. Fuxman, and R. J. Miller. Clean answers over dirty databases. In ICDE, 2006. Google ScholarDigital Library
- L. Antova, C. Koch, and D. Olteanu. 10^(10^6) worlds and beyond: Efficient representation and processing of incomplete information. In ICDE, 2007.Google Scholar
- L. Antova, C. Koch, and D. Olteanu. World-set decompositions: Expressiveness and efficient algorithms. In ICDT, pages 194--208, 2007. Google ScholarDigital Library
- F. Bacchus, A. Grove, J. Halpern, and D. Koller. From statistical knowledge bases to degrees of belief. Artificial Intelligence, 87(1--2):75--143, 1996. Google ScholarDigital Library
- O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, pages 953--964, 2006. Google ScholarDigital Library
- G. Borriello and F. Zhao. World-Wide Sensor Web: 2006 UWMSR Summer Institute Semiahmoo Resort, Blaine, WA, 2006. www.cs.washington.edu/mssi/2006/schedule.html.Google Scholar
- T. Choudhury, M. Philipose, D. Wyatt, and J. Lester. Towards activity databases: Using sensors and statistical models to summarize people's lives. IEEE Data Eng. Bull, 29(1):49--58, March 2006.Google Scholar
- E. F. Codd. A relational model for large shared databank. Communications of the ACM, 13(6):377--387, June 1970. Google ScholarDigital Library
- G. Cooper. Computational complexity of probabilistic inference using bayesian belief networks (research note). Artificial Intelligence, 42:393--405, 1990. Google ScholarDigital Library
- R. Cowell, P. Dawid, S. Lauritzen, and D. Spiegelhalter, editors. Probabilistic Networks and Expert Systems. Springer, 1999. Google ScholarDigital Library
- P. Dagum and M. Luby. Approximating probabilistic inference in bayesian belief networks is NP-hard. Artificial Intelligence, 60:141--153, 1993. Google ScholarDigital Library
- N. Dalvi, C. Re, and D. Suciu. Query evaluation on probabilistic databases. IEEE Data Engineering Bulletin, 29(1):25--31, 2006.Google Scholar
- N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, Toronto, Canada, 2004. Google ScholarDigital Library
- N. Dalvi and D. Suciu. Management of probabilistic data: Foundations and challenges. In PODS, pages 1--12, Beijing, China, 2007. (invited talk). Google ScholarDigital Library
- A. Das Sarma, O. Benjelloun, A. Halevy, and J. Widom. Working models for uncertain data. In ICDE, 2006. Google ScholarDigital Library
- M. de Rougemont. The reliability of queries. In PODS, pages 286--291, 1995. Google ScholarDigital Library
- A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004. Google ScholarDigital Library
- A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Using probabilistic models for data management in acquisitional environments. In CIDR, pages 317--328, 2005.Google Scholar
- O. Deux. The story of O2. IEEE Transactions on Knowledge and Data Engineering, 2(1):91--108, March 1990. Google ScholarDigital Library
- A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R. McCann, M. Sayyadian, and W. Shen. Community information management. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Data Management, 29(1):64--72, March 2006.Google Scholar
- M. B. et al. Data management in the world-wide sensor web. IEEE Pervasive Computing, 2007. To appear. Google ScholarDigital Library
- O. Etzioni, M. Cafarella, D. Downey, S. Kok, A. Popescu, T. Shaked, S. Soderland, D. Weld, and A. Yates. Web-scale information extraction in KnowItAll: (preliminary results). In WWW, pages 100--110, 2004. Google ScholarDigital Library
- E. Grädel, Y. Gurevich, and C. Hirsch. The complexity of query reliability. In PODS, pages 227--234, 1998. Google ScholarDigital Library
- T. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31--40, 2007. Google ScholarDigital Library
- T. Green and V. Tannen. Models for incomplete and probabilistic information. IEEE Data Engineering Bulletin, 29(1):17--24, March 2006.Google Scholar
- R. Gupta and S. Sarawagi. Creating probabilistic databases from information extraction models. In VLDB, pages 965--976, 2006. Google ScholarDigital Library
- A. Halevy, A. Rajaraman, and J. Ordille. Data integration: The teenage years. In VLDB, pages 9--16, 2006. Google ScholarDigital Library
- D. Heckerman, Tutorial on graphical models, June 2002.Google Scholar
- E. Hung, L. Getoor, and V. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In ICDE, 2003.Google ScholarCross Ref
- T. Imielinski and W. Lipski. Incomplete information in relational databases. Journal of the ACM, 31:761--791, October 1984. Google ScholarDigital Library
- T. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Engineering Bulletin, 29(1):40--48, 2006.Google Scholar
- R. Karp and M. Luby. Monte-Carlo algorithms for enumeration and reliability problems. In Proceedings of the annual ACM symposium on Theory of computing, 1983.Google ScholarDigital Library
- N. Khoussainova, M. Balazinska, and D. Suciu. Towards correcting input data errors probabilistically using integrity constraints. In MobiDB, pages 43--50, 2006. Google ScholarDigital Library
- G. Kuper, L. Libkin, and J. P. (Eds.). Constraint Databases. Springer, 2000. Google ScholarDigital Library
- J. Lester, T. Choudhury, N. Kern, G. Borriello, and B. Hannaford. A hybrid discriminative/generative approach for modeling human activities. In IJCAI, pages 766--772, 2005. Google ScholarDigital Library
- C. Papadimitriou. Computational Complexity. Addison Wesley Publishing Company, 1994.Google Scholar
- S. Philippi and J. Kohler. Addressing the problems with life-science databases for traditional uses and systems biology. Nature Reviews Genetics, 7:481--488, June 2006.Google ScholarCross Ref
- J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777--788, 1983.Google ScholarDigital Library
- C. Re, N. Dalvi, and D. Suciu. Efficient Top-k query evaluation on probabilistic data. In ICDE, 2007.Google ScholarCross Ref
- C. Re and D. Suciu. Materialized views in probabilistic databases for information exchange and query optimization. In Proceedings of VLDB, 2007. Google ScholarDigital Library
- C. Re and D. Suciu. Approximate lineage for probabilistic databases. Technical Report 2008-03-02, University of Washington, Seattle, WA, 2008.Google Scholar
- H.-J. Schek and M. H. Scholl. The relational model with relation-valued attributes. Information Systems, 11 (2):137--147, 1986. Google ScholarDigital Library
- R. Snodgrass. The temporal query language TQUEL. ACM Transactions on Database Systems, 12(2):247--298, June 1987. Google ScholarDigital Library
- D. Suciu. An overview of semistructured data. SIGACT News, 29(4):28--38, December 1998. Google ScholarDigital Library
- L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8:410--421, 1979.Google ScholarDigital Library
- M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In ICDE, pages 459--470, 2005. Google ScholarDigital Library
- M. Y. Vardi. The complexity of relational query languages. In Proceedings of 14th ACM SIGACT Symposium on the Theory of Computing, pages 137--146, San Francisco, California, 1982. Google ScholarDigital Library
Index Terms
- Probabilistic databases
Recommendations
Probabilistic Databases for All
PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsIn probabilistic databases the data is uncertain and is modeled by a probability distribution. The central problem in probabilistic databases is query evaluation, which requires performing not only traditional data processing such as joins, projections, ...
Comments