skip to main content
10.1145/1265530.1265572acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Maximally joining probabilistic data

Published:11 June 2007Publication History

ABSTRACT

Conceptually, the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result. This approach ignores the possibility that the probabilities of complete answers are too low and, hence, partial answers (with sufficiently high probabilities) become important. Therefore, we consider the semantics in which answers are maximal (i.e., have the smallest degree of incompleteness), subject tothe constraint that the probability is still above a given threshold.

We investigate the complexity of joining relations under the above semantics. In contrast to the deterministic case, this approach gives rise to two different enumeration problems. The first is finding all maximal sets of tuples that are join consistent, connected and have a joint probability above the threshold. The second is computing all maximal tuples that are answers of partial joins and have a probability above the threshold. Both problems are tractable under data complexity. We also consider query-and-data complexity, which rules out as efficient the following naive algorithm: compute all partial answers and then choose the maximal ones among those with probabilities above the threshold. We give efficient algorithms for several, important special cases. We also show that, in general, the first problem is NP-hard whereas the secondis #P-hard.

References

  1. D. Barbará, H. Garcia-Molina, and D. Porter. The management of probabilistic data. IEEE Trans. Knowl. Data Eng., 4(5), 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. Beeri, R. Fagin, D. Maier, and M. Yannakakis. On the desirability of acyclic database schemes. J. ACM, 30(3), 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. A. Bernstein and N. Goodman. Power of natural semijoins. SIAM J. Comput., 10(4), 1981.Google ScholarGoogle Scholar
  4. R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In P. M. Stocker, W. Kent, and P. Hammersley, editors, VLDB, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Cohen, I. Fadida, Y. Kanza, B. Kimelfeld, and Y. Sagiv. Full disjunctions: Polynomial-delay iterators in action. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Cohen and Y. Sagiv. An abstract framework for generating maximal answers to queries. In ICDT, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Cohen and Y. Sagiv. An incremental algorithm for computing ranked full disjunctions. In PODS, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Cohen and Y. Sagiv. An incremental algorithm for computing ranked full disjunctions. To appear in J. Comput. Syst. Sci. (an extended version of {7}), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. N. Dalvi, G. Miklau, and D. Suciu. Asymptotic conditional probabilities for conjunctive queries. In ICDT, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Dey and S. Sarkar. A probabilistic relational model and algebra. ACM Trans. Database Syst., 21(3), 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3), 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Fuhr and T. Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst., 15(1), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. A. Galindo-Legaria. Outerjoins as disjunctions. In SIGMOD, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Johnson, M. Yannakakis, and C. Papadimitriou. On generating all maximal independent sets. Information Processing Letters, 27, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Kanza, W. Nutt, and Y. Sagiv. Querying incomplete information in semistructured data. J. Comput. Syst. Sci., 64(3), 2002.Google ScholarGoogle Scholar
  17. Y. Kanza and Y. Sagiv. Computing full disjunctions. In PODS, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. V. S. Lakshmanan, N. Leone, R. Ross, and V. S. Subrahmanian. ProbView: A flexible probabilistic database system. ACM Trans. Database Syst., 22(3), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. E. Lien. On the equivalence of database models. J. ACM, 29(2), 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4), 1983.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Rajaraman and J. D. Ullman. Integrating information by outerjoins and full disjunctions. In PODS, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Toda and M. Ogiwara. Counting classes are at least as hard as the polynomial-time hierarchy. SIAM J. Comput., 21(2), 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. L. G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8, 1979.Google ScholarGoogle Scholar
  25. M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, 1981.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Maximally joining probabilistic data

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
            June 2007
            328 pages
            ISBN:9781595936851
            DOI:10.1145/1265530

            Copyright © 2007 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 11 June 2007

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            PODS '07 Paper Acceptance Rate28of187submissions,15%Overall Acceptance Rate642of2,707submissions,24%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader