skip to main content
10.1145/1516360.1516472acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

Provenance for nested subqueries

Published:24 March 2009Publication History

ABSTRACT

Data provenance is essential in applications such as scientific computing, curated databases, and data warehouses. Several systems have been developed that provide provenance functionality for the relational data model. These systems support only a subset of SQL, a severe limitation in practice since most of the application domains that benefit from provenance information use complex queries. Such queries typically involve nested subqueries, aggregation and/or user defined functions. Without support for these constructs, a provenance management system is of limited use.

In this paper we address this limitation by exploring the problem of provenance derivation when complex queries are involved. More precisely, we demonstrate that the widely used definition of Why-provenance fails in the presence of nested subqueries, and show how the definition can be modified to produce meaningful results for nested subqueries. We further present query rewrite rules to transform an SQL query into a query propagating provenance. The solution introduced in this paper allows us to track provenance information for a far wider subset of SQL than any of the existing approaches. We have incorporated these ideas into the Perm provenance management system engine and used it to evaluate the feasibility and performance of our approach.

References

  1. M. O. Akinde et al. Efficient computation of subqueries in complex OLAP. ICDE '03, pages 163--174, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  2. P. Buneman et al. Why and where: A characterization of data provenance. In ICDT '01, pages 316--330, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Buneman et al. On the expressiveness of implicit provenance in query and update languages. TODS, 33(4), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Cao et al. SQL query optimization through nested relational algebra. TODS, 32(3):18, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Chaudhuri. An overview of query optimization in relational systems. PODS' 98, pages 34--43, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Chiticariu et al. Dbnotes: a post-it system for relational databases based on provenance. In SIGMOD '05, pages 942--944, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Y. Cui et al. Tracing the lineage of view data in a warehousing environment. TODS, 25(2):179--227, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. U. Dayal. Processing queries with quantifiers a horticultural approach. PODS '83, pages 125--136, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Elhemali et al. Execution strategies for SQL subqueries. SIGMOD '07, pages 993--1004, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. Geerts et al. MONDRIAN: Annotating and querying databases through colors and blocks. Technical Report EDIINFRR0243, The University of Edinburgh, 2005.Google ScholarGoogle Scholar
  11. B. Glavic et al. Data provenance: A categorization of existing approaches. In BTW '07, pages 227--241, 2007.Google ScholarGoogle Scholar
  12. B. Glavic et al. Perm: Processing provenance and data on the same data model through query rewriting. In ICDE '09, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Green et al. Provenance semirings. PODS '07, pages 31--40, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. W. Kim. On Optimizing an SQL-like Nested Query. TODS, 7(3):443--469, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. Momjian. PostgreSQL: introduction and concepts. Boston, MA: Addison-Wesley, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Muralikrishna et al. Improved Unnesting Algorithms for Join Aggregate SQL Queries. VLDB '92, pages 91--102, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Mutsuzaki et al. Trio-One: Layering uncertainty and lineage on a conventional DBMS. CIDR '07, pages 269--274, 2007.Google ScholarGoogle Scholar
  18. Y. L. Simmhan et al. A survey of data provenance in e-science. SIGMOD Rec., 34(3):31--36, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. W. Tan et al. Provenance in Databases: Past, Current, and Future. IEEE Data Eng. Bull., 30(4):3--12, 2007.Google ScholarGoogle Scholar
  20. Transaction Processing Performance Council. TPC-H Benchmark Specification. http://www.tpc.org/hspec.html, 2008.Google ScholarGoogle Scholar
  1. Provenance for nested subqueries

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Other conferences
        EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
        March 2009
        1180 pages
        ISBN:9781605584225
        DOI:10.1145/1516360

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 March 2009

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate7of10submissions,70%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader