Skip to main content
Erschienen in: The VLDB Journal 5/2014

01.10.2014 | Regular Paper

Query reverse engineering

verfasst von: Quoc Trung Tran, Chee-Yong Chan, Srinivasan Parthasarathy

Erschienen in: The VLDB Journal | Ausgabe 5/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we introduce a new problem termed query reverse engineering (QRE). Given a database \(D\) and a result table \(T\)—the output of some known or unknown query \(Q\) on \(D\)—the goal of QRE is to reverse-engineer a query \(Q'\) such that the output of query \(Q'\) on database \(D\) (denoted by \(Q'(D)\)) is equal to \(T\) (i.e., \(Q(D)\)). The QRE problem has useful applications in database usability, data analysis, and data security. In this work, we propose a data-driven approach, TALOS for Tree-based classifier with At Least One Semantics, that is based on a novel dynamic data classification formulation and extend the approach to efficiently support the three key dimensions of the QRE problem: whether the input query is known/unknown, supporting different query fragments, and supporting multiple database versions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
3
Note that even though the definition of a weak IEQ \(Q'\) of \(Q\) does not require the queries to share a set of core relations, we find this restriction to be a reasonable and effective way to obtain “good” IEQs.
 
4
If the search is for strong IEQs, then the discussion remains the same except that \(L\) is the ordered listing of the key attributes of a set of core relations \(S\) of \(Q\), and we replace \(Q(D)\) by \({Q}_{S}(D)\).
 
5
We also experimented with a scheme that randomly labels only one free tuple for each subset as positive, but its performance is worse than NI and RD.
 
6
To generate only precise IEQs, \(\tau = 0\). In our experiments, we set \(\tau = 1\) to derive a reasonable number of approximate IEQs.
 
7
Note that in the SQL implementation, null count values are first converted to zero values before being added.
 
10
We use \(gain,\,ms,\,edu,\,loss,\,nc,\,hpw\), and \(rs\), respectively, as abbreviations for capital gain, marital status, number of years of education, capital loss, native country, hours per week, and relationship.
 
11
As all the class-labeling schemes are based on the same framework to derive SPJU-IEQs which differ only in how they label free tuples to derive SPJ-IEQs that form the subqueries for SPJU-IEQs, we do not consider SPJU queries here. For SPJA queries, since the free tuples are assigned fixed class labels to satisfy the aggregation conditions before being classified, all the three schemes would have returned the same results for SPJA-IEQs; therefore, SPJA queries are not considered. For the TPC-H queries, it turns out that all the free tuples are bound, and therefore, all the three schemes return the same set of IEQs; therefore, we do not report results for TPC-H data sets.
 
12
We remove the \(comment\) attributes from each relation in TPC-H database.
 
13
These four relations are chosen because they appeared in the test queries \(T_1\) to \(T_4\).
 
14
Two queries \(Q\) and \(Q'\) are semantically equivalent if for every valid database \(D,\,Q(D) = Q'(D)\).
 
Literatur
1.
Zurück zum Zitat Andritsos, P., Miller, R.J., Tsaparas, P.: Information-theoretic tools for mining database structure from large data sets. In: SIGMOD (2004) Andritsos, P., Miller, R.J., Tsaparas, P.: Information-theoretic tools for mining database structure from large data sets. In: SIGMOD (2004)
2.
Zurück zum Zitat Arasu, A., Kaushik, R., Li, J.: Data generation using declarative constraints. In: SIGMOD Conference, pp. 685–696 (2011) Arasu, A., Kaushik, R., Li, J.: Data generation using declarative constraints. In: SIGMOD Conference, pp. 685–696 (2011)
3.
Zurück zum Zitat Binnig, C., Kossmann, D., Lo, E.: Reverse query processing. In: ICDE, pp. 506–515 (2007) Binnig, C., Kossmann, D., Lo, E.: Reverse query processing. In: ICDE, pp. 506–515 (2007)
4.
Zurück zum Zitat Binnig, C., Kossmann, D., Lo, E., Özsu, M.T.: QAGen: Generating query-aware test databases. In: SIGMOD Conference, pp. 341–352 (2007) Binnig, C., Kossmann, D., Lo, E., Özsu, M.T.: QAGen: Generating query-aware test databases. In: SIGMOD Conference, pp. 341–352 (2007)
5.
Zurück zum Zitat Blakeley, J.A., Larson, P.A., Tompa, F.W.: Efficiently updating materialized views. SIGMOD Rec. 15(2), 61–71 (1986)CrossRef Blakeley, J.A., Larson, P.A., Tompa, F.W.: Efficiently updating materialized views. SIGMOD Rec. 15(2), 61–71 (1986)CrossRef
6.
Zurück zum Zitat Brown, P.G., Haas, P.J.: BHUNT: Automatic discovery of fuzzy algebraic constraints in relational data. In: VLDB (2003) Brown, P.G., Haas, P.J.: BHUNT: Automatic discovery of fuzzy algebraic constraints in relational data. In: VLDB (2003)
7.
Zurück zum Zitat Bruno, N., Chaudhuri, S., Thomas, D.: Generating queries with cardinality constraints for dbms testing. IEEE TKDE 18(12), 1721–1725 (2006) Bruno, N., Chaudhuri, S., Thomas, D.: Generating queries with cardinality constraints for dbms testing. IEEE TKDE 18(12), 1721–1725 (2006)
8.
Zurück zum Zitat Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill Science, Boston (2001) Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill Science, Boston (2001)
9.
Zurück zum Zitat Gaasterland, T., Godfrey, P., Minker, J.: An overview of cooperative answering. J. Intell. Inf. Syst. 1(2), 123–157 (1992) Gaasterland, T., Godfrey, P., Minker, J.: An overview of cooperative answering. J. Intell. Inf. Syst. 1(2), 123–157 (1992)
10.
Zurück zum Zitat Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. In: SIGMOD Conference, pp. 461–472 (2001) Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. In: SIGMOD Conference, pp. 461–472 (2001)
11.
Zurück zum Zitat Godfrey, P., Gryz, J., Zuzarte, C.: Exploiting constraint-like data characterizations in query optimization. In: SIGMOD Conference, pp. 582–592 (2001) Godfrey, P., Gryz, J., Zuzarte, C.: Exploiting constraint-like data characterizations in query optimization. In: SIGMOD Conference, pp. 582–592 (2001)
12.
Zurück zum Zitat Johnson, T., Marathe, A., Dasu, T.: Database exploration and bellman. IEEE Data Eng. Bull. 26(3), 34–39 (2003) Johnson, T., Marathe, A., Dasu, T.: Database exploration and bellman. IEEE Data Eng. Bull. 26(3), 34–39 (2003)
13.
Zurück zum Zitat Lenzerini, M.: Data integration: A theoretical perspective. In: PODS, pp. 233–246 (2002) Lenzerini, M.: Data integration: A theoretical perspective. In: PODS, pp. 233–246 (2002)
14.
Zurück zum Zitat Lo, E., Cheng, N., Hon, W.K.: Generating databases for query workloads. Proc. VLDB Endow. 3(1), 848–859 (2010) Lo, E., Cheng, N., Hon, W.K.: Generating databases for query workloads. Proc. VLDB Endow. 3(1), 848–859 (2010)
15.
Zurück zum Zitat Malpani, A., Bernstein, P., Melnik, S., Terwilliger, J.: Reverse engineering models from databases to bootstrap application development. In: ICDE (2010) Malpani, A., Bernstein, P., Melnik, S., Terwilliger, J.: Reverse engineering models from databases to bootstrap application development. In: ICDE (2010)
16.
Zurück zum Zitat Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A fast scalable classifier for data mining. In: EDBT, pp. 18–32 (1996) Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A fast scalable classifier for data mining. In: EDBT, pp. 18–32 (1996)
17.
Zurück zum Zitat Mishra, C., Koudas, N., Zuzarte, C.: Generating targeted queries for database testing. In: SIGMOD Conference, pp. 499–510 (2008) Mishra, C., Koudas, N., Zuzarte, C.: Generating targeted queries for database testing. In: SIGMOD Conference, pp. 499–510 (2008)
18.
Zurück zum Zitat Motro, A.: Intensional answers to database queries. IEEE TKDE 6(3), 444–454 (1994) Motro, A.: Intensional answers to database queries. IEEE TKDE 6(3), 444–454 (1994)
19.
Zurück zum Zitat Mumick, I.S., Quass, D., Mumick, B.S.: Maintenance of data cubes and summary tables in a warehouse. SIGMOD Rec. 26(2), 100–111 (1997) Mumick, I.S., Quass, D., Mumick, B.S.: Maintenance of data cubes and summary tables in a warehouse. SIGMOD Rec. 26(2), 100–111 (1997)
20.
Zurück zum Zitat Petit, J.M., Toumani, F., Boulicaut, J.F., Kouloumdjian, J.: Towards the reverse engineering of denormalized relational databases. In: ICDE (1996) Petit, J.M., Toumani, F., Boulicaut, J.F., Kouloumdjian, J.: Towards the reverse engineering of denormalized relational databases. In: ICDE (1996)
21.
Zurück zum Zitat Tan, P.N., Kumar, M.V.: Introduction to Data Mining. Addison-Wesley, Reading, MA (2006) Tan, P.N., Kumar, M.V.: Introduction to Data Mining. Addison-Wesley, Reading, MA (2006)
22.
Zurück zum Zitat Ramakrishnan, N., Kumar, D., Mishra, B., Potts, M., Helm, R.F.: Turning cartwheels: An alternating algorithm for mining redescriptions. In: SIGKDD Conference, pp. 266–275 (2004) Ramakrishnan, N., Kumar, D., Mishra, B., Potts, M., Helm, R.F.: Turning cartwheels: An alternating algorithm for mining redescriptions. In: SIGKDD Conference, pp. 266–275 (2004)
23.
Zurück zum Zitat van Rijsbergen, C.J.: Information Retrieval. Butterworth, London (1979) van Rijsbergen, C.J.: Information Retrieval. Butterworth, London (1979)
24.
25.
Zurück zum Zitat Sarma, A.D., Parameswaran, A., Garcia-Molina, H., Widom, J.: Synthesizing view definitions from data. In: ICDT, pp. 89–103 (2010) Sarma, A.D., Parameswaran, A., Garcia-Molina, H., Widom, J.: Synthesizing view definitions from data. In: ICDT, pp. 89–103 (2010)
26.
Zurück zum Zitat Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB (2006) Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB (2006)
27.
Zurück zum Zitat Stonebraker, M.: The design of the postgres storage system. In: VLDB, pp. 289–300 (1987) Stonebraker, M.: The design of the postgres storage system. In: VLDB, pp. 289–300 (1987)
28.
Zurück zum Zitat Tran, Q.T., Chan, C.Y.: How to ConQueR why-not questions. In: SIGMOD Conference, pp. 15–26 (2010) Tran, Q.T., Chan, C.Y.: How to ConQueR why-not questions. In: SIGMOD Conference, pp. 15–26 (2010)
29.
Zurück zum Zitat Tran, Q.T., Chan, C.Y., Parthasarathy, S.: Query by output. In: SIGMOD Conference, pp. 535–548 (2009) Tran, Q.T., Chan, C.Y., Parthasarathy, S.: Query by output. In: SIGMOD Conference, pp. 535–548 (2009)
30.
Zurück zum Zitat Valduriez, P.: Join indices. ACM Trans. Database Syst. 12(2), 218–246 (1987)CrossRef Valduriez, P.: Join indices. ACM Trans. Database Syst. 12(2), 218–246 (1987)CrossRef
31.
Zurück zum Zitat Wu, W., Reinwald, B., Sismanis, Y., Manjrekar, R.: Discovering topical structures of databases. In: SIGMOD (2008) Wu, W., Reinwald, B., Sismanis, Y., Manjrekar, R.: Discovering topical structures of databases. In: SIGMOD (2008)
32.
Zurück zum Zitat Xiao, X., Tao, Y.: Output perturbation with query relaxation. Proc. VLDB Endow. 1(1), 857–869 (2008)CrossRef Xiao, X., Tao, Y.: Output perturbation with query relaxation. Proc. VLDB Endow. 1(1), 857–869 (2008)CrossRef
33.
Zurück zum Zitat Yin, X., Han, J., Yang, J., Yu, P.S.: Efficient classification across multiple database relations: A crossmine approach. TKDE 18, 770–783 (2006) Yin, X., Han, J., Yang, J., Yu, P.S.: Efficient classification across multiple database relations: A crossmine approach. TKDE 18, 770–783 (2006)
Metadaten
Titel
Query reverse engineering
verfasst von
Quoc Trung Tran
Chee-Yong Chan
Srinivasan Parthasarathy
Publikationsdatum
01.10.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 5/2014
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-013-0349-3

Weitere Artikel der Ausgabe 5/2014

The VLDB Journal 5/2014 Zur Ausgabe

Premium Partner