skip to main content
research-article

Embedded Functional Dependencies and Data-completeness Tailored Database Design

Published:29 May 2021Publication History
Skip Abstract Section

Abstract

We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.

References

  1. Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi. 2018. Research directions for principles of data management. Dagstuhl Manifestos 7, 1 (2018), 1–29.Google ScholarGoogle Scholar
  2. Marcelo Arenas. 2006. Normalization theory for XML. SIGMOD Rec. 35, 4 (2006), 57–64.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. William Armstrong. 1974. Dependency structures of data base relationships. In Proceedings of the 6th IFIP Congress on Information Processing, Jack L. Rosenfeld (Ed.). North-Holland, 580–583.Google ScholarGoogle Scholar
  4. Paolo Atzeni and Nicola M. Morfuni. 1986. Functional dependencies and constraints on null values in database relations. Info. Control 70, 1 (1986), 1–31.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. 2009. Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41, 3 (2009), 16:1–16:52.Google ScholarGoogle Scholar
  6. Carlo Batini and Andrea Maurino. 2018. Design for data quality. In Encyclopedia of Database Systems, 2nd ed. Springer.Google ScholarGoogle Scholar
  7. Catriel Beeri and Philip A. Bernstein. 1979. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst. 4, 1 (1979), 30–59.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Philip A. Bernstein. 1976. Synthesizing third normal form relations from functional dependencies. ACM Trans. Database Syst. 1, 4 (1976), 277–298.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Joachim Biskup. 1998. Achievements of relational database schema design theory revisited. In Semantics in Databases (Lecture Notes in Computer Science), Leonid Libkin and Bernhard Thalheim (Eds.), Vol. 1358. Springer, 29–54.Google ScholarGoogle Scholar
  10. Joachim Biskup, Umeshwar Dayal, and Philip A. Bernstein. 1979. Synthesizing independent database schemas. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Philip A. Bernstein (Ed.). 143–151.Google ScholarGoogle Scholar
  11. Loreto Bravo, Wenfei Fan, Floris Geerts, and Shuai Ma. 2008. Increasing the expressivity of conditional functional dependencies without extra complexity. In Proceedings of the 24th International Conference on Data Engineering (ICDE’08). 516–525.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Peter P. Chen. 1976. The entity-relationship model—Toward a unified view of data. ACM Trans. Database Syst. 1, 1 (1976), 9–36.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Edgar F. Codd. 1972. Further normalization of the database relational model. In Proceedings of the Courant Computer Science Symposia 6: Data Base Systems. 33–64.Google ScholarGoogle Scholar
  14. C. J. Date and Ronald Fagin. 1992. Simple conditions for guaranteeing higher normal forms in relational databases. ACM Trans. Database Syst. 17, 3 (1992), 465–476.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. William F. Dowling and Jean H. Gallier. 1984. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. J. Log. Program. 1, 3 (1984), 267–284.Google ScholarGoogle ScholarCross RefCross Ref
  16. N. A. Emran. 2015. Data completeness measures. In Pattern Analysis, Intelligent Security and the Internet of Things (Advances in Intelligent Systems and Computing), Vol. 355. Springer, 117–130.Google ScholarGoogle Scholar
  17. Ronald Fagin. 1977. The decomposition versus synthetic approach to relational database design. In Proceedings of the Third International Conference on Very Large Data Bases. IEEE, 441–446.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ronald Fagin. 1977. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst. 2, 3 (1977), 262–278.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. 2008. Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 33, 2 (2008).Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Patrick C. Fischer, Lawrence V. Saxton, Stan J. Thomas, and Dirk Van Gucht. 1985. Interactions between dependencies and nested relational structures. J. Comput. Syst. Sci. 31, 3 (1985), 343–354.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Sergio Greco, Cristian Molinaro, and Francesca Spezzano. 2012. Incomplete Data and Data Dependencies in Relational Databases. Morgan & Claypool Publishers.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Paolo Guagliardo and Leonid Libkin. 2017. Correctness of SQL queries on databases with nulls. SIGMOD Rec. 46, 3 (2017), 5–16.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Sven Hartmann and Sebastian Link. 2012. The implication problem of data dependencies over SQL table definitions: Axiomatic, algorithmic and logical characterizations. ACM Trans. Database Syst. 37, 2 (2012), 13:1–13:40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. I. J. Heath. 1971. Unacceptable file operations in a relational data base. In Proceedings of the ACM-SIGFIDET Workshop on Data Description, Access and Control, E. F. Codd and A. L. Dean (Eds.). ACM, 19–33.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Tomasz Imielinski and Witold Lipski Jr.1983. Incomplete information and dependencies in relational databases. In Proceedings of Annual Meeting of the Special Interest Group on Management of Data (SIGMOD’83). 178–184.Google ScholarGoogle Scholar
  26. Christian S. Jensen, Richard T. Snodgrass, and Michael D. Soo. 1996. Extending existing dependency theory to temporal databases. IEEE Trans. Knowl. Data Eng. 8, 4 (1996), 563–582.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Neil D. Jones and William T. Laaser. 1976. Complete problems for deterministic polynomial time. Theor. Comput. Sci. 3, 1 (1976), 105–117.Google ScholarGoogle ScholarCross RefCross Ref
  28. Henning Köhler and Sebastian Link. 2016. SQL schema design: Foundations, normal forms, and normalization. In Proceedings of the International Conference on Management of Data (SIGMOD’16), Fatma Özcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 267–279.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Henning Köhler and Sebastian Link. 2018. SQL schema design: Foundations, normal forms, and normalization. Info. Syst. 76 (2018), 88–113.Google ScholarGoogle Scholar
  30. S. Kolahi. 2007. Dependency-preserving normalization of relational and XML data. J. Comput. Syst. Sci. 73, 4 (2007), 636–647.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Solmaz Kolahi and Leonid Libkin. 2010. An information-theoretic analysis of worst-case redundancy in database design. ACM Trans. Database Syst. 35, 1 (2010), 5:1–5:32.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Mark Levene and George Loizou. 1998. Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206, 1–2 (1998), 283–300.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Mark Levene and George Loizou. 1999. A Guided Tour of Relational Databases and Beyond. Springer.Google ScholarGoogle Scholar
  34. Mark Levene and Millist W. Vincent. 2000. Justification for inclusion dependency normal form. IEEE Trans. Knowl. Data Eng. 12, 2 (2000), 281–291.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Y. Edmund Lien. 1979. Multivalued dependencies with null values in relational data bases. In Proceedings of the 5th International Conference on Very Large Data Bases. 61–66.Google ScholarGoogle ScholarCross RefCross Ref
  36. Y. Edmund Lien. 1982. On the equivalence of database models. J. ACM 29, 2 (1982), 333–362.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Sebastian Link and Henri Prade. 2019. Relational database schema design for uncertain data. Info. Syst. 84 (2019), 88–110.Google ScholarGoogle Scholar
  38. David Maier. 1983. The Theory of Relational Databases. Computer Science Press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Wai Yin Mok. 2016. Utilizing nested normal form to design redundancy free JSON schemas. Int. J. Recent Contributions Eng. Sci. IT 4, 4 (2016), 21–25.Google ScholarGoogle ScholarCross RefCross Ref
  40. Wai Yin Mok, Yiu-Kai Ng, and David W. Embley. 1996. A normal form for precisely characterizing redundancy in nested relations. ACM Trans. Database Syst. 21, 1 (1996), 77–106.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Thorsten Papenbrock and Felix Naumann. 2016. A hybrid approach to functional dependency discovery. In Proceedings of the International Conference on Management of Data (SIGMOD’16). 821–833.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Thorsten Papenbrock and Felix Naumann. 2017. Data-driven schema normalization. In Proceedings of the 20th International Conference on Extending Database Technology (EDBT’17), V. Markl, S. Orlando, B. Mitschang, P. Andritsos, K.-U. Sattler, and S. Breß (Eds.). OpenProceedings.org, 342–353.Google ScholarGoogle Scholar
  43. Raghu Ramakrishnan and Johannes Gehrke. 2003. Database Management Systems. McGraw-Hill.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Jorma Rissanen. 1977. Independent components of relations. ACM Trans. Database Syst. 2, 4 (1977), 317–325.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Mark A. Roth and Henry F. Korth. 1987. The design of 1NF relational databases into nested normal form. In Proceedings of the International Conference on Management of Data (SIGMOD’87). 143–159.Google ScholarGoogle Scholar
  46. Shazia W. Sadiq (Ed.). 2013. Handbook of Data Quality, Research and Practice. Springer.Google ScholarGoogle Scholar
  47. Zahir Tari, John Stokes, and Stefano Spaccapietra. 1997. Object normal forms and dependency constraints for object-oriented schemata. ACM Trans. Database Syst. 22, 4 (1997), 513–569.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Millist W. Vincent. 1997. A corrected 5NF definition for relational database design. Theor. Comput. Sci. 185, 2 (1997), 379–391.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Millist W. Vincent. 1999. Semantic foundations of 4NF in relational database design. Acta Inf. 36, 3 (1999), 173–213.Google ScholarGoogle ScholarCross RefCross Ref
  50. Millist W. Vincent and Mark Levene. 2000. Restructuring partitioned normal form relations without information loss. SIAM J. Comput. 29, 5 (2000), 1550–1567.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Ziheng Wei, Sven Hartmann, and Sebastian Link. 2020. Discovery algorithms for embedded functional dependencies. In Proceedings of the International Conference on Management of Data (SIGMOD’20). 833–843.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Ziheng Wei, Uwe Leck, and Sebastian Link. 2019. Discovery and ranking of embedded uniqueness constraints. Proc. Very Large Data Base 12, 13 (2019), 2339–2352.Google ScholarGoogle Scholar
  53. Ziheng Wei, Uwe Leck, and Sebastian Link. 2019. Entity integrity, referential integrity, and query optimization with embedded uniqueness constraints. In Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE’19),. 1694–1697.Google ScholarGoogle ScholarCross RefCross Ref
  54. Ziheng Wei and Sebastian Link. 2018. Embedded cardinality constraints. In Proceedings of the 30th International Conference on Advanced Information Systems Engineering (CAiSE’18) (Lecture Notes in Computer Science), John Krogstie and Hajo A. Reijers (Eds.), Vol. 10816. Springer, 523–538.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Ziheng Wei and Sebastian Link. 2019. Embedded functional dependencies and data-completeness tailored database design. Proc. Very Large Data Base 12, 11 (2019), 1458–1470.Google ScholarGoogle Scholar
  56. Ziheng Wei, Sebastian Link, and Jiamou Liu. 2017. Contextual keys. In Proceedings of the 36th International Conference on Conceptual Modeling (ER’17) (Lecture Notes in Computer Science), Heinrich C. Mayr, Giancarlo Guizzardi, Hui Ma, and Oscar Pastor (Eds.), Vol. 10650. Springer, 266–279.Google ScholarGoogle ScholarCross RefCross Ref
  57. C. Zaniolo. 1984. Database relations with null values. J. Comput. Syst. Sci. 28, 1 (1984), 142–166.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Embedded Functional Dependencies and Data-completeness Tailored Database Design

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Database Systems
          ACM Transactions on Database Systems  Volume 46, Issue 2
          Best of PODS 2019 and Regular Papers
          June 2021
          182 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/3468529
          Issue’s Table of Contents

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 29 May 2021
          • Accepted: 1 February 2021
          • Revised: 1 November 2020
          • Received: 1 May 2020
          Published in tods Volume 46, Issue 2

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format .

        View HTML Format