skip to main content
10.1145/2463676.2465320acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Reverse engineering complex join queries

Published:22 June 2013Publication History

ABSTRACT

We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates OUT from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both by itself, and as a building block for other problems. Related prior work imposes conditions on the structure of Q which are not always consistent with the application, but simplify computation. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work.

In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.

References

  1. S. Agrawal, S. Chaudhuri, and G. Das. Dbxplorer: A system for keyword-based search over relational databases. In ICDE, pages 5--16, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE, pages 431--440, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Blunschi, C. Jossen, D. Kossmann, M. Mori, and K. Stockinger. Soda: Generating sql for business users. PVLDB, 5(10):932--943, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. H. He, H. Wang, J. Yang, and P. S. Yu. Blinks: Ranked keyword searches on graphs. In SIGMOD, pages 305--316, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Qian, M. J. Cafarella, and H. V. Jagadish. Sample-driven schema mapping. In SIGMOD, pages 73--84, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Qin, J. X. Yu, and L. Chang. Keyword search in databases: The power of rdbms. In SIGMOD, pages 681--694, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L. Qin, J. X. Yu, L. Chang, and Y. Tao. Querying communities in relational databases. In ICDE, pages 724--735, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, pages 89--103, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. TPC. TPC benchmarks. http://www.tpc.org/.Google ScholarGoogle Scholar
  10. Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. In SIGMOD, pages 535--548, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Reverse engineering complex join queries

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
      June 2013
      1322 pages
      ISBN:9781450320375
      DOI:10.1145/2463676

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 June 2013

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '13 Paper Acceptance Rate76of372submissions,20%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader