research-article

Exploiting shared correlations in probabilistic databases

Authors:
Prithviraj Sen

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

,
Amol Deshpande

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

,
Lise Getoor

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

Proceedings of the VLDB Endowment Volume 1 Issue 1pp 809–820https://doi.org/10.14778/1453856.1453944

Published:01 August 2008Publication History

Proceedings of the VLDB Endowment

Abstract

There has been a recent surge in work in probabilistic databases, propelled in large part by the huge increase in noisy data sources --- from sensor data, experimental data, data from uncurated sources, and many others. There is a growing need for database management systems that can efficiently represent and query such data. In this work, we show how data characteristics can be leveraged to make the query evaluation process more efficient. In particular, we exploit what we refer to as shared correlations where the same uncertainties and correlations occur repeatedly in the data. Shared correlations occur mainly due to two reasons: (1) Uncertainty and correlations usually come from general statistics and rarely vary on a tuple-to-tuple basis; (2) The query evaluation procedure itself tends to re-introduce the same correlations. Prior work has shown that the query evaluation problem on probabilistic databases is equivalent to a probabilistic inference problem on an appropriately constructed probabilistic graphical model (PGM). We leverage this by introducing a new data structure, called the random variable elimination graph (rv-elim graph) that can be built from the PGM obtained from query evaluation. We develop techniques based on bisimulation that can be used to compress the rv-elim graph exploiting the presence of shared correlations in the PGM, the compressed rv-elim graph can then be used to run inference. We validate our methods by evaluating them empirically and show that even with a few shared correlations significant speed-ups are possible.

References

P. Andritsos, A. Fuxman, and R. J. Miller. Clean answers over dirty databases. In ICDE, 2006. Google ScholarDigital Library
S. Arnborg. Efficient algorithms for combinatorial problems on graphs with bounded decomposability - a survey. In BIT, 1985. Google ScholarDigital Library
O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, 2006. Google ScholarDigital Library
H. C. Bravo and R. Ramakrishnan. Optimizing mpf queries: decision support and prob. inference. In SIGMOD, 2007. Google ScholarDigital Library
R. Cheng, D. Kalashnikov, and S. Prabhakar. Evaluating prob. queries over imprecise data. In SIGMOD, 2003. Google ScholarDigital Library
N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004. Google ScholarDigital Library
A. Das Sarma, M. Theobald, and J. Widom. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In ICDE, 2008. Google ScholarDigital Library
R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In International Joint Conference on Artificial Inteligence, 2005. Google ScholarDigital Library
A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, 2004. Google ScholarDigital Library
A. Dovier, C. Piazza, and A. Policriti. A fast bisimulation algorithm. In International Conference on Computer Aided Verification, 2001. Google ScholarDigital Library
N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In International Joint Conference on Artificial Inteligence, 1999. Google ScholarDigital Library
N. Fuhr and T. Rolleke. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997. Google ScholarDigital Library
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models with link uncertainty. Journal of Machine Learning Research, 2002. Google ScholarDigital Library
L. Getoor and B. Taskar, editors. Introduction to Statistical Relational Learning. MIT Press, 2007. Google ScholarDigital Library
R. Gupta and S. Sarawagi. Creating probabilistic databases from information extraction models. In VLDB, 2006. Google ScholarDigital Library
C. Huang and A. Darwiche. Inference in belief networks: A procedural guide. International Journal of Approximate Reasoning, 1994.Google Scholar
P. C. Kanellakis and S. A. Smolka. CCS expressions, finite state processes, and three problems of equivalence. In ACM Symposium on Principles of Distributed Computing, 1983. Google ScholarDigital Library
U. Kjaerulff. Triangulation of graphs - algorithms giving small total state space. Technical report, University of Aalborg, Denmark, 1990.Google Scholar
R. Paige and R. E. Tarjan. Three partition refinement algorithms. SIAM Journal of Computing, 1987. Google ScholarDigital Library
D. Poole. First-order probabilistic inference. In International Joint Conference on Artificial Inteligence, 2003. Google ScholarDigital Library
C. Re, N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.Google ScholarCross Ref
C. Re and D. Suciu. Materialized views in probabilistic databases for information exchange and query optimization. In VLDB, 2007. Google ScholarDigital Library
I. Rish. Efficient Reasoning in Graphical Models. PhD thesis, University of California, Irvine, 1999. Google ScholarDigital Library
P. Sen and A. Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.Google ScholarCross Ref
S. Singh, C. Mayfield, S. Prabhakar, S. Hambrusch, and R. Shah. Indexing uncertain categorical data. In ICDE, 2007.Google ScholarCross Ref
S. Singh, C. Mayfield, R. Shah, S. Prabhakar, S. Hambrusch, J. Neville, and R. Cheng. Database support for probabilistic attributes and tuples. In ICDE, 2008. Google ScholarDigital Library
N. L. Zhang and D. Poole. A simple approach to bayesian network computations. In Canadian Conference on AI, 1994.Google Scholar

Index Terms

Recommendations

Probabilistic Databases
Read More
Probabilistic ranking for relational databases based on correlations
PIKM '10: Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management

This paper proposes a ranking method to exploit statistical correlations among pairs of attribute values in relational databases. For a given query, the correlations of the query are aggregated with each of the attribute values in a tuple to estimate ...
Read More
Probabilistic inverse ranking queries in uncertain databases

Query processing in the uncertain database has become increasingly important due to the wide existence of uncertain data in many real applications. Different from handling precise data, the uncertain query processing needs to consider the data ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 1, Issue 1
August 2008
1216 pages
ISSN:2150-8097
Editors:
Peter Buneman,
Beng Chin Ooi,
Kenneth Ross,
Gerald Weber
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2008
Published in pvldb Volume 1, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 25
  Total Citations
  View Citations
- 200
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Exploiting shared correlations in probabilistic databases

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Probabilistic Databases

Probabilistic ranking for relational databases based on correlations

Probabilistic inverse ranking queries in uncertain databases

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Exploiting shared correlations in probabilistic databases

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Probabilistic Databases

Probabilistic ranking for relational databases based on correlations

Probabilistic inverse ranking queries in uncertain databases

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media