Skip to main content
Erschienen in: The VLDB Journal 6/2021

28.07.2021 | Regular Paper

Algorithms for the discovery of embedded functional dependencies

verfasst von: Ziheng Wei, Sven Hartmann, Sebastian Link

Erschienen in: The VLDB Journal | Ausgabe 6/2021

Einloggen

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Embedded functional dependencies (eFDs) advance data management applications by data completeness and integrity requirements. We show that the discovery problem of eFDs is \({\mathsf {NP}}\)-complete, \(\mathsf {W}[2]\)-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Nevertheless, we use novel data structures and search strategies to develop row-efficient, column-efficient, and hybrid algorithms for eFD discovery. Our experiments demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. We further demonstrate the benefits of introducing completeness requirements and ranking by the number of redundant data values for other variants of functional dependencies. Finally, we show how to compute informative Armstrong samples and illustrate the performance of our algorithms on the benchmark data. The informative Armstrong samples can be used to find eFDs that are meaningful for the application domain but violated by a given data set due to inconsistencies.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Abedjan, Z., Golab, L., Naumann, F., Papenbrock, T.: Data Profiling. Synthesis Lectures on Data Management. Morgan & Claypool, New York (2018) Abedjan, Z., Golab, L., Naumann, F., Papenbrock, T.: Data Profiling. Synthesis Lectures on Data Management. Morgan & Claypool, New York (2018)
2.
Zurück zum Zitat Abedjan, Z., Schulze, P., Naumann, F.: DFD: efficient functional dependency discovery. In: CIKM, pp. 949–958 (2014) Abedjan, Z., Schulze, P., Naumann, F.: DFD: efficient functional dependency discovery. In: CIKM, pp. 949–958 (2014)
3.
Zurück zum Zitat Berti-Équille, L., Harmouch, H., Naumann, F., Novelli, N., Thirumuruganathan, S.: Discovery of genuine functional dependencies from relational data with missing values. PVLDB 11(8), 880–892 (2018) Berti-Équille, L., Harmouch, H., Naumann, F., Novelli, N., Thirumuruganathan, S.: Discovery of genuine functional dependencies from relational data with missing values. PVLDB 11(8), 880–892 (2018)
4.
Zurück zum Zitat Bläsius, T., Friedrich, T., Schirneck, M.: The parameterized complexity of dependency detection in relational databases. In: LIPIcs-Leibniz International Proceedings in Informatics, volume 63. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2017) Bläsius, T., Friedrich, T., Schirneck, M.: The parameterized complexity of dependency detection in relational databases. In: LIPIcs-Leibniz International Proceedings in Informatics, volume 63. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2017)
5.
Zurück zum Zitat Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: ICDE, pp. 746–755 (2007) Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: ICDE, pp. 746–755 (2007)
6.
Zurück zum Zitat Bravo, L., Fan, W., Geerts, F., Ma, S.: Increasing the expressivity of conditional functional dependencies without extra complexity. In: ICDE, pp. 516–525 (2008) Bravo, L., Fan, W., Geerts, F., Ma, S.: Increasing the expressivity of conditional functional dependencies without extra complexity. In: ICDE, pp. 516–525 (2008)
7.
Zurück zum Zitat Caruccio, L., Deufemia, V., Polese, G.: Relaxed functional dependencies—a survey of approaches. IEEE Trans. Knowl. Data Eng. 28(1), 147–165 (2016)CrossRef Caruccio, L., Deufemia, V., Polese, G.: Relaxed functional dependencies—a survey of approaches. IEEE Trans. Knowl. Data Eng. 28(1), 147–165 (2016)CrossRef
8.
Zurück zum Zitat Demetrovics, J., Katona, G.O.H., Miklós, D., Thalheim, B.: On the number of independent functional dependencies. In: FoIKS, pp. 83–91 (2006) Demetrovics, J., Katona, G.O.H., Miklós, D., Thalheim, B.: On the number of independent functional dependencies. In: FoIKS, pp. 83–91 (2006)
9.
Zurück zum Zitat Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 33(2), 6:1–6:48 (2008)CrossRef Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 33(2), 6:1–6:48 (2008)CrossRef
10.
Zurück zum Zitat Fan, W., Geerts, F., Lakshmanan, L.V.S., Xiong, M.: Discovering conditional functional dependencies. In: ICDE, pp. 1231–1234 (2009) Fan, W., Geerts, F., Lakshmanan, L.V.S., Xiong, M.: Discovering conditional functional dependencies. In: ICDE, pp. 1231–1234 (2009)
11.
Zurück zum Zitat Fan, W., Geerts, F., Li, J., Xiong, M.: Discovering conditional functional dependencies. IEEE Trans. Knowl. Data Eng. 23(5), 683–698 (2011)CrossRef Fan, W., Geerts, F., Li, J., Xiong, M.: Discovering conditional functional dependencies. IEEE Trans. Knowl. Data Eng. 23(5), 683–698 (2011)CrossRef
12.
Zurück zum Zitat Flach, P.A., Savnik, I.: Database dependency discovery. AI Commun. 12(3), 139–160 (1999)MathSciNet Flach, P.A., Savnik, I.: Database dependency discovery. AI Commun. 12(3), 139–160 (1999)MathSciNet
13.
Zurück zum Zitat Gallier, J.: Discrete Mathematics. Universitext. Springer, New York (2011)CrossRef Gallier, J.: Discrete Mathematics. Universitext. Springer, New York (2011)CrossRef
15.
Zurück zum Zitat Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)CrossRef Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)CrossRef
16.
Zurück zum Zitat Kruse, S., Naumann, F.: Efficient discovery of approximate dependencies. PVLDB 11(7), 759–772 (2018) Kruse, S., Naumann, F.: Efficient discovery of approximate dependencies. PVLDB 11(7), 759–772 (2018)
17.
Zurück zum Zitat Link, S., Wei, Z.: Logical schema design that quantifies update inefficiency and join efficiency. In: SIGMOD, pp. 1169–1181 (2021) Link, S., Wei, Z.: Logical schema design that quantifies update inefficiency and join efficiency. In: SIGMOD, pp. 1169–1181 (2021)
18.
Zurück zum Zitat Lopes, S., Petit, J., Lakhal, L.: Efficient discovery of functional dependencies and Armstrong relations. In: EDBT, pp. 350–364 (2000) Lopes, S., Petit, J., Lakhal, L.: Efficient discovery of functional dependencies and Armstrong relations. In: EDBT, pp. 350–364 (2000)
19.
Zurück zum Zitat Mannila, H., Räihä, K.: Design by example: an application of Armstrong relations. J. Comput. Syst. Sci. 33(2), 126–141 (1986)MathSciNetCrossRef Mannila, H., Räihä, K.: Design by example: an application of Armstrong relations. J. Comput. Syst. Sci. 33(2), 126–141 (1986)MathSciNetCrossRef
20.
Zurück zum Zitat Mannila, H., Räihä, K.: Dependency inference. In: VLDB, pp. 155–158 (1987) Mannila, H., Räihä, K.: Dependency inference. In: VLDB, pp. 155–158 (1987)
21.
Zurück zum Zitat Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative Armstrong databases. Inf. Syst. 32(3), 446–457 (2007)CrossRef Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative Armstrong databases. Inf. Syst. 32(3), 446–457 (2007)CrossRef
22.
Zurück zum Zitat Novelli, N., Cicchetti, R.: Functional and embedded dependency inference: a data mining point of view. Inf. Syst. 26(7), 477–506 (2001) Novelli, N., Cicchetti, R.: Functional and embedded dependency inference: a data mining point of view. Inf. Syst. 26(7), 477–506 (2001)
23.
Zurück zum Zitat Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J., Schönberg, M., Zwiener, J., Naumann, F.: Functional dependency discovery: an experimental evaluation of seven algorithms. PVLDB 8(10), 1082–1093 (2015) Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J., Schönberg, M., Zwiener, J., Naumann, F.: Functional dependency discovery: an experimental evaluation of seven algorithms. PVLDB 8(10), 1082–1093 (2015)
24.
Zurück zum Zitat Papenbrock, T., Naumann, F.: A hybrid approach to functional dependency discovery. In: SIGMOD, pp. 821–833 (2016) Papenbrock, T., Naumann, F.: A hybrid approach to functional dependency discovery. In: SIGMOD, pp. 821–833 (2016)
25.
Zurück zum Zitat Papenbrock, T., Naumann, F.: Data-driven schema normalization. In: EDBT, pp. 342–353 (2017) Papenbrock, T., Naumann, F.: Data-driven schema normalization. In: EDBT, pp. 342–353 (2017)
26.
Zurück zum Zitat Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006) Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006)
27.
Zurück zum Zitat Stănică, P.: Good lower and upper bounds on binomial coefficients. JIPAM. J. Inequal. Pure Appl. Math. 2(3), Article 30,5 (2001)MathSciNet Stănică, P.: Good lower and upper bounds on binomial coefficients. JIPAM. J. Inequal. Pure Appl. Math. 2(3), Article 30,5 (2001)MathSciNet
28.
Zurück zum Zitat Visengeriyeva, L., Abedjan, Z.: Anatomy of metadata for data curation. ACM J. Data Inf. Qual. 12(3), 16:1–16:30 (2020) Visengeriyeva, L., Abedjan, Z.: Anatomy of metadata for data curation. ACM J. Data Inf. Qual. 12(3), 16:1–16:30 (2020)
29.
Zurück zum Zitat Wei, Z., Hartmann, S., Link, S.: Discovery algorithms for embedded functional dependencies. In: SIGMOD, pp. 833–843 (2020) Wei, Z., Hartmann, S., Link, S.: Discovery algorithms for embedded functional dependencies. In: SIGMOD, pp. 833–843 (2020)
30.
Zurück zum Zitat Wei, Z., Leck, U., Link, S.: Discovery and ranking of embedded uniqueness constraints. PVLDB 12(13), 2339–2352 (2019) Wei, Z., Leck, U., Link, S.: Discovery and ranking of embedded uniqueness constraints. PVLDB 12(13), 2339–2352 (2019)
31.
Zurück zum Zitat Wei, Z., Link, S.: Embedded cardinality constraints. In: CAiSE, pp. 523–538 (2018) Wei, Z., Link, S.: Embedded cardinality constraints. In: CAiSE, pp. 523–538 (2018)
32.
Zurück zum Zitat Wei, Z., Link, S.: DataProf: Semantic profiling for iterative data cleansing and business rule acquisition. In: SIGMOD, pp. 1793–1796 (2018) Wei, Z., Link, S.: DataProf: Semantic profiling for iterative data cleansing and business rule acquisition. In: SIGMOD, pp. 1793–1796 (2018)
33.
Zurück zum Zitat Wei, Z., Link, S.: Discovery and ranking of functional dependencies. In: ICDE, pp. 1526–1537 (2019) Wei, Z., Link, S.: Discovery and ranking of functional dependencies. In: ICDE, pp. 1526–1537 (2019)
34.
Zurück zum Zitat Wei, Z., Link, S.: Embedded functional dependencies and data-completeness tailored database design. PVLDB 12(11), 1458–1470 (2019) Wei, Z., Link, S.: Embedded functional dependencies and data-completeness tailored database design. PVLDB 12(11), 1458–1470 (2019)
35.
Zurück zum Zitat Wei, Z., Link, S.: Embedded functional dependencies and data-completeness tailored database design. ACM Trans. Database Syst. 46(2), 7:1–7:46 (2021)MathSciNetCrossRef Wei, Z., Link, S.: Embedded functional dependencies and data-completeness tailored database design. ACM Trans. Database Syst. 46(2), 7:1–7:46 (2021)MathSciNetCrossRef
36.
Zurück zum Zitat Wyss, C.M., Giannella, C., Robertson, E.L.: FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances. In: DaWaK, pp. 101–110 (2001) Wyss, C.M., Giannella, C., Robertson, E.L.: FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances. In: DaWaK, pp. 101–110 (2001)
37.
Zurück zum Zitat Yao, H., Hamilton, H.J., Butz, C.J.: Fd\(\_\)mine: Discovering functional dependencies in a database using equivalences. In: ICDM, pp. 729–732 (2002) Yao, H., Hamilton, H.J., Butz, C.J.: Fd\(\_\)mine: Discovering functional dependencies in a database using equivalences. In: ICDM, pp. 729–732 (2002)
Metadaten
Titel
Algorithms for the discovery of embedded functional dependencies
verfasst von
Ziheng Wei
Sven Hartmann
Sebastian Link
Publikationsdatum
28.07.2021
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 6/2021
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-021-00684-3

Weitere Artikel der Ausgabe 6/2021

The VLDB Journal 6/2021 Zur Ausgabe