skip to main content
10.1145/1142351.1142369acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF

Published:26 June 2006Publication History

ABSTRACT

A recently introduced information-theoretic approach to analyzing redundancies in database design was used to justify normal forms like BCNF that completely eliminate redundancies. The main notion is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. In practice, however, one usually settles for 3NF which, unlike BCNF, may not eliminate all redundancies but always guarantees dependency preservation.In this paper we use the information-theoretic approach to prove that 3NF is the best normal form if one needs to achieve dependency preservation. For each dependency-preserving normal form, we define the price of dependency preservation as an information-theoretic measure of redundancy that gets introduced to compensate for dependency preservation. This is a number in the [0,1] range: the smaller it is, the less redundancy a normal form guarantees. We prove that for every dependency-preserving normal form, the price of dependency preservation is at least 1/2, and it is precisely 1/2 for 3NF. Hence, 3NF has the least amount of redundancy among all dependency-preserving normal forms. We also show that, information-theoretically, unnormalized schemas have at least twice the amount of redundancy than schemas in 3NF.

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J. Albert, Y. Ioannidis, and R. Ramakrishnan. Equivalence of keyed relational schemas by conjunctive queries. JCSS, 58(3):512--534, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Arenas and L. Libkin. A normal form for XML documents. ACM TODS 29 (2004), 195--232. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Arenas and L. Libkin. An information-theoretic approach to normal forms for relational and XML data. Journal of the ACM, 52 (2005), 246--283. Extended abstract in PODS'03. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Beeri, P. Bernstein, and N. Goodman. A sophisticate's introduction to database normalization theory. In VLDB'78, pages 113--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Bernstein. Synthesizing third normal form relations from functional dependencies. ACM TODS, 1 (1976), 277--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Biskup. Achievements of relational database schema design theory revisited. In Semantics in Databases, LNCS 1358, pages 29--54. Springer-Verlag, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In VLDB'87, pages 71--81. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. T. Cover and J. Thomas. Elements of Information Theory. Wiley-Interscience, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Dalkilic and E. Robertson. Information dependencies. In PODS'00, pages 245--253. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. W. Embley and W. Y. Mok. Developing XML documents with guaranteed "good" properties. In ER'01, pages 426--441. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Fagin. Normal forms and relational database operators. In SIGMOD'79, pages 153--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Fagin. A normal form for relational databases that is based on domains and keys. ACM TODS 6 (1981), 387--415. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. Hull. Relative information capacity of simple relational database schemata. SIAM J. Comput., 15(3):856--886, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Kanellakis. Elements of Relational Database Theory, In Handbook of TCS, vol. B, 1990, pages 1075--1144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Kifer, A. Bernstein, P. Lewis. Database Systems: An Application-Oriented Approach. Addison-Wesley, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Kolahi. Dependency-preserving normalization of relational and XML data. In DBPL'05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. T. Lee. An information-theoretic analysis of relational databases - Part I: Data dependencies and information metric. IEEE Trans. on Software Engineering, 13(10):1049--1061, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Levene and G. Loizou. A Guided Tour of Relational Databases and Beyond. Springer, 1999. Google ScholarGoogle ScholarCross RefCross Ref
  20. M. Levene and G. Loizou. Why is the snowflake schema a good data warehouse design? Information Systems, 28 (2003), 225--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Levene and M. W. Vincent. Justification for inclusion dependency normal form. IEEE TKDE, 12(2):281--291, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T.W. Ling, F. Tompa, T. Kameda. An improved third normal form for relational databases. ACM TODS 6 (1981), 329--346. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Oracle's General Database Design FAQ. http://www.orafaq.com/faqdesgn.htm.Google ScholarGoogle Scholar
  24. V. Vianu. A Web Odyssey: from Codd to XML. In PODS'01, pages 1--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Zaniolo. A new normal form for the design of relational database schemata. ACM TODS 7 (1982), 489--499. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
          June 2006
          382 pages
          ISBN:1595933182
          DOI:10.1145/1142351

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 June 2006

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          PODS '06 Paper Acceptance Rate35of185submissions,19%Overall Acceptance Rate642of2,707submissions,24%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader