Skip to main content
Erschienen in: The VLDB Journal 4/2016

01.08.2016 | Regular Paper

Possible and certain keys for SQL

verfasst von: Henning Köhler, Uwe Leck, Sebastian Link, Xiaofang Zhou

Erschienen in: The VLDB Journal | Ausgabe 4/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Driven by the dominance of the relational model and the requirements of modern applications, we revisit the fundamental notion of a key in relational databases with NULL. In SQL, primary key columns are NOT NULL, and UNIQUE constraints guarantee uniqueness only for tuples without NULL. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that originate from an SQL table, respectively. Possible keys coincide with UNIQUE, thus providing a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns and can uniquely identify entities whenever feasible, while primary keys may not. In addition to basic characterization, axiomatization, discovery, and extremal combinatorics problems, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs occur in real-world data, and related computational problems can be solved efficiently. Certain keys are therefore semantically well founded and able to meet Codd’s entity integrity rule while handling high volumes of incomplete data from different formats.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
The similar journal values of the first and second rows are not different by mistake: Our keys deal with the integrity dimension of data and not the accuracy dimension.
 
2
A partition of cardinality two.
 
3
w.r.t. the \(\wedge \)-support ordering \(\mathcal {W} \lesssim \mathcal {W}' :\Leftrightarrow \forall Y\in \mathcal {W} . \exists Y'\in \mathcal {W}' . Y\subseteq Y'\).
 
Literatur
1.
Zurück zum Zitat Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015)CrossRef Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015)CrossRef
2.
Zurück zum Zitat Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)MATH Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)MATH
3.
Zurück zum Zitat Aleksic, S., Celikovic, M., Link, S., Lukovic, I., Moginm, P.: Faceoff: Surrogate vs. natural keys. In: ADBIS, pp. 543–546 (2010) Aleksic, S., Celikovic, M., Link, S., Lukovic, I., Moginm, P.: Faceoff: Surrogate vs. natural keys. In: ADBIS, pp. 543–546 (2010)
4.
Zurück zum Zitat Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1), 30–46 (1984)MathSciNetCrossRefMATH Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1), 30–46 (1984)MathSciNetCrossRefMATH
5.
Zurück zum Zitat Biskup, J.: Security in Computing Systems - Challenges, Approaches and Solutions. Springer, New York (2009)MATH Biskup, J.: Security in Computing Systems - Challenges, Approaches and Solutions. Springer, New York (2009)MATH
6.
Zurück zum Zitat Brown, P., Link, S.: Probabilistic keys for data quality management. In: CAiSE, pp. 118–132 (2015) Brown, P., Link, S.: Probabilistic keys for data quality management. In: CAiSE, pp. 118–132 (2015)
7.
Zurück zum Zitat Calì, A., Calvanese, D., Lenzerini, M.: Data integration under integrity constraints. In: Seminal Contributions to Information, Systems Engineering, pp. 335–352 (2013) Calì, A., Calvanese, D., Lenzerini, M.: Data integration under integrity constraints. In: Seminal Contributions to Information, Systems Engineering, pp. 335–352 (2013)
8.
Zurück zum Zitat Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., Simkus, M.: Capturing relational schemas and functional dependencies in RDFS. In: AAAI, pp. 1003–1011 (2014) Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., Simkus, M.: Capturing relational schemas and functional dependencies in RDFS. In: AAAI, pp. 1003–1011 (2014)
9.
Zurück zum Zitat Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397–434 (1979)CrossRef Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397–434 (1979)CrossRef
10.
Zurück zum Zitat Codd, E.F.: The Relational Model for Database Management, Version 2. Addison-Wesley, Boston (1990)MATH Codd, E.F.: The Relational Model for Database Management, Version 2. Addison-Wesley, Boston (1990)MATH
11.
Zurück zum Zitat David, J., Lhote, L., Mary, A., Rioult, F.: An average study of hypergraphs and their minimal transversals. Theor. Comput. Sci. 596, 124–141 (2015)MathSciNetCrossRefMATH David, J., Lhote, L., Mary, A., Rioult, F.: An average study of hypergraphs and their minimal transversals. Theor. Comput. Sci. 596, 124–141 (2015)MathSciNetCrossRefMATH
13.
Zurück zum Zitat Eiter, T., Gottlob, G.: Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput. 24(6), 1278–1304 (1995)MathSciNetCrossRefMATH Eiter, T., Gottlob, G.: Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput. 24(6), 1278–1304 (1995)MathSciNetCrossRefMATH
15.
Zurück zum Zitat Fagin, R.: A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst. 6(3), 387–415 (1981)CrossRefMATH Fagin, R.: A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst. 6(3), 387–415 (1981)CrossRefMATH
17.
Zurück zum Zitat Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRefMATH Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRefMATH
18.
Zurück zum Zitat Fan, W., Geerts, F., Jia, X.: A revival of constraints for data cleaning. PVLDB 1(2), 1522–1523 (2008) Fan, W., Geerts, F., Jia, X.: A revival of constraints for data cleaning. PVLDB 1(2), 1522–1523 (2008)
19.
Zurück zum Zitat Hannula, M., Kontinen, J., Link, S.: On independence atoms and keys. In: CIKM, pp. 1229–1238 (2014) Hannula, M., Kontinen, J., Link, S.: On independence atoms and keys. In: CIKM, pp. 1229–1238 (2014)
20.
Zurück zum Zitat Hannula, M., Kontinen, J., Link, S.: On the finite and general implication problems of independence atoms and keys. J. Comput. Syst. Sci. 82(5), 856–877 (2016)MathSciNetCrossRefMATH Hannula, M., Kontinen, J., Link, S.: On the finite and general implication problems of independence atoms and keys. J. Comput. Syst. Sci. 82(5), 856–877 (2016)MathSciNetCrossRefMATH
21.
Zurück zum Zitat Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. VLDB J. 21(1), 121–144 (2012)CrossRef Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. VLDB J. 21(1), 121–144 (2012)CrossRef
22.
Zurück zum Zitat Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. Comput. J. 54(7), 1166–1180 (2011)CrossRef Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. Comput. J. 54(7), 1166–1180 (2011)CrossRef
23.
Zurück zum Zitat Hartmann, S., Link, S.: Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst. 34(2), 10 (2009)CrossRef Hartmann, S., Link, S.: Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst. 34(2), 10 (2009)CrossRef
24.
Zurück zum Zitat Hartmann, S., Link, S.: The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst. 37(2), 13 (2012)CrossRef Hartmann, S., Link, S.: The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst. 37(2), 13 (2012)CrossRef
25.
Zurück zum Zitat Heise, A., Quiane-Ruiz, J.A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. PVLDB 7(4), 301–312 (2013) Heise, A., Quiane-Ruiz, J.A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. PVLDB 7(4), 301–312 (2013)
26.
Zurück zum Zitat Ileana, I., Cautis, B., Deutsch, A., Katsis, Y.; Complete yet practical search for minimal query reformulations under constraints. In: SIGMOD, pp. 1015–1026 (2014) Ileana, I., Cautis, B., Deutsch, A., Katsis, Y.; Complete yet practical search for minimal query reformulations under constraints. In: SIGMOD, pp. 1015–1026 (2014)
28.
Zurück zum Zitat Jha, A.K., Rastogi, V., Suciu, D.: Query evaluation with soft keys. In: PODS, pp. 119–128 (2008) Jha, A.K., Rastogi, V., Suciu, D.: Query evaluation with soft keys. In: PODS, pp. 119–128 (2008)
29.
Zurück zum Zitat Köhler, H., Leck, U., Link, S., Prade, H.: Logical foundations of possibilistic keys. In: JELIA, pp. 181–195 (2014) Köhler, H., Leck, U., Link, S., Prade, H.: Logical foundations of possibilistic keys. In: JELIA, pp. 181–195 (2014)
30.
Zurück zum Zitat Köhler, H., Link, S.: Inclusion dependencies reloaded. In: CIKM, pp. 1361–1370 (2015) Köhler, H., Link, S.: Inclusion dependencies reloaded. In: CIKM, pp. 1361–1370 (2015)
32.
Zurück zum Zitat Köhler, H., Link, S., Zhou, X.: Possible and certain SQL keys. PVLDB 8(11), 1118–1129 (2015) Köhler, H., Link, S., Zhou, X.: Possible and certain SQL keys. PVLDB 8(11), 1118–1129 (2015)
33.
Zurück zum Zitat Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS, pp. 17–29 (2015) Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS, pp. 17–29 (2015)
34.
Zurück zum Zitat Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1–2), 283–300 (1998)MathSciNetCrossRefMATH Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1–2), 283–300 (1998)MathSciNetCrossRefMATH
35.
Zurück zum Zitat Levene, M., Loizou, G.: A generalisation of entity and referential integrity. ITA 35(2), 113–127 (2001)MathSciNetMATH Levene, M., Loizou, G.: A generalisation of entity and referential integrity. ITA 35(2), 113–127 (2001)MathSciNetMATH
37.
Zurück zum Zitat Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data—a review. IEEE TKDE 24(2), 251–264 (2012) Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data—a review. IEEE TKDE 24(2), 251–264 (2012)
38.
Zurück zum Zitat Mannila, H., Räihä, K.-J.: Design of Relational Databases. Addison-Wesley, Boston (1992)MATH Mannila, H., Räihä, K.-J.: Design of Relational Databases. Addison-Wesley, Boston (1992)MATH
39.
Zurück zum Zitat Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative armstrong databases. Inf. Syst. 32(3), 446–457 (2007)CrossRef Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative armstrong databases. Inf. Syst. 32(3), 446–457 (2007)CrossRef
40.
Zurück zum Zitat Melton, J.: ISO/IEC 9075-2: 2003 (SQL/foundation). ISO standard (2003) Melton, J.: ISO/IEC 9075-2: 2003 (SQL/foundation). ISO standard (2003)
41.
Zurück zum Zitat Memari, M., Link, S.: Index design for enforcing partial referential integrity efficiently. In: EDBT, pp. 217–228 (2015) Memari, M., Link, S.: Index design for enforcing partial referential integrity efficiently. In: EDBT, pp. 217–228 (2015)
42.
Zurück zum Zitat Memari, M., Link, S., Dobbie, G.: SQL data profiling of foreign keys. In: ER, pp. 229–243 (2015) Memari, M., Link, S., Dobbie, G.: SQL data profiling of foreign keys. In: ER, pp. 229–243 (2015)
43.
Zurück zum Zitat Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)CrossRef Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)CrossRef
45.
Zurück zum Zitat Pochampally, R., Sarma, A.D., Dong, X.L.: A. Meliou, and D. Srivastava. Fusing data with correlations. In: SIGMOD, pp. 433–444 (2014) Pochampally, R., Sarma, A.D., Dong, X.L.: A. Meliou, and D. Srivastava. Fusing data with correlations. In: SIGMOD, pp. 433–444 (2014)
46.
Zurück zum Zitat Ross, K. A., Srivastava, D., Sudarshan, S.: Materialized view maintenance and integrity constraint checking: Trading space for time. In: SIGMOD, pp. 447–458 (1996) Ross, K. A., Srivastava, D., Sudarshan, S.: Materialized view maintenance and integrity constraint checking: Trading space for time. In: SIGMOD, pp. 447–458 (1996)
47.
Zurück zum Zitat Saha, B., Srivastava, D.: Data quality: the other face of big data. In: ICDE, pp. 1294–1297 (2014) Saha, B., Srivastava, D.: Data quality: the other face of big data. In: ICDE, pp. 1294–1297 (2014)
48.
Zurück zum Zitat Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp. 216–226 (1978) Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp. 216–226 (1978)
49.
Zurück zum Zitat Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006) Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006)
51.
Zurück zum Zitat Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektr. Inform. Kybern. 25(1/2), 11–20 (1989)MathSciNet Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektr. Inform. Kybern. 25(1/2), 11–20 (1989)MathSciNet
Metadaten
Titel
Possible and certain keys for SQL
verfasst von
Henning Köhler
Uwe Leck
Sebastian Link
Xiaofang Zhou
Publikationsdatum
01.08.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2016
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-016-0430-9

Weitere Artikel der Ausgabe 4/2016

The VLDB Journal 4/2016 Zur Ausgabe