ABSTRACT
We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates OUT from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both by itself, and as a building block for other problems. Related prior work imposes conditions on the structure of Q which are not always consistent with the application, but simplify computation. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work.
In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.
- S. Agrawal, S. Chaudhuri, and G. Das. Dbxplorer: A system for keyword-based search over relational databases. In ICDE, pages 5--16, 2002. Google ScholarDigital Library
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE, pages 431--440, 2002. Google ScholarDigital Library
- L. Blunschi, C. Jossen, D. Kossmann, M. Mori, and K. Stockinger. Soda: Generating sql for business users. PVLDB, 5(10):932--943, 2012. Google ScholarDigital Library
- H. He, H. Wang, J. Yang, and P. S. Yu. Blinks: Ranked keyword searches on graphs. In SIGMOD, pages 305--316, 2007. Google ScholarDigital Library
- L. Qian, M. J. Cafarella, and H. V. Jagadish. Sample-driven schema mapping. In SIGMOD, pages 73--84, 2012. Google ScholarDigital Library
- L. Qin, J. X. Yu, and L. Chang. Keyword search in databases: The power of rdbms. In SIGMOD, pages 681--694, 2009. Google ScholarDigital Library
- L. Qin, J. X. Yu, L. Chang, and Y. Tao. Querying communities in relational databases. In ICDE, pages 724--735, 2009. Google ScholarDigital Library
- A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, pages 89--103, 2010. Google ScholarDigital Library
- TPC. TPC benchmarks. http://www.tpc.org/.Google Scholar
- Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. In SIGMOD, pages 535--548, 2009. Google ScholarDigital Library
Index Terms
- Reverse engineering complex join queries
Recommendations
Discovering queries based on example tuples
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataAn enterprise information worker is often aware of a few example tuples (but not the entire result) that should be present in the output of the query. We study the problem of discovering the minimal project join query that contains the given example ...
Learning Join Queries from User Examples
Special Issue: Invited 2014 PODS and EDBT Revised ArticlesWe investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples as part of ...
Faster Join Enumeration for Complex Queries
ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data EngineeringMost existing join ordering algorithms concentrate on join queries with simple join predicates and inner joins only, where simple predicates are those that involve exactly two relations. However, real queries may contain complex join predicates, i.e. ...
Comments