research-article

Reverse engineering complex join queries

Authors:
Meihui Zhang

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Hazem Elmeleegy

Turn, Inc., Redwood City, CA, USA

Turn, Inc., Redwood City, CA, USA
View Profile

,
Cecilia M. Procopiuc

AT&T Labs - Research, Florham Park, NJ, USA

AT&T Labs - Research, Florham Park, NJ, USA
View Profile

,
Divesh Srivastava

AT&T Labs - Research, Florham Park, NJ, USA

AT&T Labs - Research, Florham Park, NJ, USA
View Profile

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of DataJune 2013Pages 809–820https://doi.org/10.1145/2463676.2465320

Published:22 June 2013Publication History

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Pages 809–820

ABSTRACT

We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates OUT from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both by itself, and as a building block for other problems. Related prior work imposes conditions on the structure of Q which are not always consistent with the application, but simplify computation. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work.

In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.

References

S. Agrawal, S. Chaudhuri, and G. Das. Dbxplorer: A system for keyword-based search over relational databases. In ICDE, pages 5--16, 2002. Google ScholarDigital Library
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE, pages 431--440, 2002. Google ScholarDigital Library
L. Blunschi, C. Jossen, D. Kossmann, M. Mori, and K. Stockinger. Soda: Generating sql for business users. PVLDB, 5(10):932--943, 2012. Google ScholarDigital Library
H. He, H. Wang, J. Yang, and P. S. Yu. Blinks: Ranked keyword searches on graphs. In SIGMOD, pages 305--316, 2007. Google ScholarDigital Library
L. Qian, M. J. Cafarella, and H. V. Jagadish. Sample-driven schema mapping. In SIGMOD, pages 73--84, 2012. Google ScholarDigital Library
L. Qin, J. X. Yu, and L. Chang. Keyword search in databases: The power of rdbms. In SIGMOD, pages 681--694, 2009. Google ScholarDigital Library
L. Qin, J. X. Yu, L. Chang, and Y. Tao. Querying communities in relational databases. In ICDE, pages 724--735, 2009. Google ScholarDigital Library
A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, pages 89--103, 2010. Google ScholarDigital Library
TPC. TPC benchmarks. http://www.tpc.org/.Google Scholar
Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. In SIGMOD, pages 535--548, 2009. Google ScholarDigital Library

Index Terms

Reverse engineering complex join queries
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Discovering queries based on example tuples
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

An enterprise information worker is often aware of a few example tuples (but not the entire result) that should be present in the output of the query. We study the problem of discovering the minimal project join query that contains the given example ...
Read More
Learning Join Queries from User Examples
Special Issue: Invited 2014 PODS and EDBT Revised Articles

We investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples as part of ...
Read More
Faster Join Enumeration for Complex Queries
ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

Most existing join ordering algorithms concentrate on join queries with simple join predicates and inner joins only, where simple predicates are those that involve exactly two relations. However, real queries may contain complex join predicates, i.e. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
June 2013
1322 pages
ISBN:9781450320375
DOI:10.1145/2463676
General Chairs:
Kenneth Ross
Columbia University
,
Divesh Srivastava
AT&T Research
,
Program Chair:
Dimitris Papadias
HKUST
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 June 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
query join graph
query lattice
sql query discovery
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '13 Paper Acceptance Rate76of372submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 40
  Total Citations
  View Citations
- 702
  Total Downloads
- Downloads (Last 12 months)28
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Reverse engineering complex join queries

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Discovering queries based on example tuples

Learning Join Queries from User Examples

Faster Join Enumeration for Complex Queries