Article

Maximally joining probabilistic data

Authors:
Benny Kimelfeld

The Hebrew University of Jerusalem

The Hebrew University of Jerusalem
View Profile

,
Yehoshua Sagiv

The Hebrew University of Jerusalem

The Hebrew University of Jerusalem
View Profile

PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsJune 2007Pages 303–312https://doi.org/10.1145/1265530.1265572

Published:11 June 2007Publication History

PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Pages 303–312

ABSTRACT

Conceptually, the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result. This approach ignores the possibility that the probabilities of complete answers are too low and, hence, partial answers (with sufficiently high probabilities) become important. Therefore, we consider the semantics in which answers are maximal (i.e., have the smallest degree of incompleteness), subject tothe constraint that the probability is still above a given threshold.

We investigate the complexity of joining relations under the above semantics. In contrast to the deterministic case, this approach gives rise to two different enumeration problems. The first is finding all maximal sets of tuples that are join consistent, connected and have a joint probability above the threshold. The second is computing all maximal tuples that are answers of partial joins and have a probability above the threshold. Both problems are tractable under data complexity. We also consider query-and-data complexity, which rules out as efficient the following naive algorithm: compute all partial answers and then choose the maximal ones among those with probabilities above the threshold. We give efficient algorithms for several, important special cases. We also show that, in general, the first problem is NP-hard whereas the secondis #P-hard.

References

D. Barbará, H. Garcia-Molina, and D. Porter. The management of probabilistic data. IEEE Trans. Knowl. Data Eng., 4(5), 1992. Google ScholarDigital Library
C. Beeri, R. Fagin, D. Maier, and M. Yannakakis. On the desirability of acyclic database schemes. J. ACM, 30(3), 1983. Google ScholarDigital Library
P. A. Bernstein and N. Goodman. Power of natural semijoins. SIAM J. Comput., 10(4), 1981.Google Scholar
R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In P. M. Stocker, W. Kent, and P. Hammersley, editors, VLDB, 1987. Google ScholarDigital Library
S. Cohen, I. Fadida, Y. Kanza, B. Kimelfeld, and Y. Sagiv. Full disjunctions: Polynomial-delay iterators in action. In VLDB, 2006. Google ScholarDigital Library
S. Cohen and Y. Sagiv. An abstract framework for generating maximal answers to queries. In ICDT, 2005. Google ScholarDigital Library
S. Cohen and Y. Sagiv. An incremental algorithm for computing ranked full disjunctions. In PODS, 2005. Google ScholarDigital Library
S. Cohen and Y. Sagiv. An incremental algorithm for computing ranked full disjunctions. To appear in J. Comput. Syst. Sci. (an extended version of {7}), 2006. Google ScholarDigital Library
N. N. Dalvi, G. Miklau, and D. Suciu. Asymptotic conditional probabilities for conjunctive queries. In ICDT, 2005. Google ScholarDigital Library
N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.Google ScholarDigital Library
D. Dey and S. Sarkar. A probabilistic relational model and algebra. ACM Trans. Database Syst., 21(3), 1996. Google ScholarDigital Library
R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3), 1983. Google ScholarDigital Library
N. Fuhr and T. Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst., 15(1), 1997. Google ScholarDigital Library
C. A. Galindo-Legaria. Outerjoins as disjunctions. In SIGMOD, 1994. Google ScholarDigital Library
D. Johnson, M. Yannakakis, and C. Papadimitriou. On generating all maximal independent sets. Information Processing Letters, 27, 1988. Google ScholarDigital Library
Y. Kanza, W. Nutt, and Y. Sagiv. Querying incomplete information in semistructured data. J. Comput. Syst. Sci., 64(3), 2002.Google Scholar
Y. Kanza and Y. Sagiv. Computing full disjunctions. In PODS, 2003. Google ScholarDigital Library
L. V. S. Lakshmanan, N. Leone, R. Ross, and V. S. Subrahmanian. ProbView: A flexible probabilistic database system. ACM Trans. Database Syst., 22(3), 1997. Google ScholarDigital Library
Y. E. Lien. On the equivalence of database models. J. ACM, 29(2), 1982. Google ScholarDigital Library
A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In VLDB, 2002. Google ScholarDigital Library
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4), 1983.Google ScholarDigital Library
A. Rajaraman and J. D. Ullman. Integrating information by outerjoins and full disjunctions. In PODS, 1996. Google ScholarDigital Library
S. Toda and M. Ogiwara. Counting classes are at least as hard as the polynomial-time hierarchy. SIAM J. Comput., 21(2), 1992. Google ScholarDigital Library
L. G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8, 1979.Google Scholar
M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, 1981.Google ScholarDigital Library

Index Terms

Maximally joining probabilistic data
1. Information systems
  1. Data management systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Sensitivity analysis and explanations for robust query evaluation in probabilistic databases
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly ...
Read More
Consensus answers for queries over probabilistic databases
PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

We address the problem of finding a "best" deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes ...
Read More
The dichotomy of conjunctive queries on probabilistic structures
PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

We show that for every conjunctive query, the complexity of evaluating it on a probabilistic database is either PTIME or P-complete, and we give an algorithm for deciding whether a given conjunctive query is PTIME or P-complete. The dichotomy property ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2007
328 pages
ISBN:9781595936851
DOI:10.1145/1265530
General Chair:
Phokion Kolaitis
IBM Almaden
,
Program Chair:
Leonid Libkin
University of Edinburgh
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 11 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
maximal answers
natural join
probabilistic databases
query-and-data complexity
Qualifiers
- Article
Conference

Acceptance Rates
PODS '07 Paper Acceptance Rate28of187submissions,15%Overall Acceptance Rate642of2,707submissions,24%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 25
  Total Citations
  View Citations
- 340
  Total Downloads
- Downloads (Last 12 months)4
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Maximally joining probabilistic data

PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Sensitivity analysis and explanations for robust query evaluation in probabilistic databases

Consensus answers for queries over probabilistic databases

The dichotomy of conjunctive queries on probabilistic structures