ABSTRACT
A recently introduced information-theoretic approach to analyzing redundancies in database design was used to justify normal forms like BCNF that completely eliminate redundancies. The main notion is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. In practice, however, one usually settles for 3NF which, unlike BCNF, may not eliminate all redundancies but always guarantees dependency preservation.In this paper we use the information-theoretic approach to prove that 3NF is the best normal form if one needs to achieve dependency preservation. For each dependency-preserving normal form, we define the price of dependency preservation as an information-theoretic measure of redundancy that gets introduced to compensate for dependency preservation. This is a number in the [0,1] range: the smaller it is, the less redundancy a normal form guarantees. We prove that for every dependency-preserving normal form, the price of dependency preservation is at least 1/2, and it is precisely 1/2 for 3NF. Hence, 3NF has the least amount of redundancy among all dependency-preserving normal forms. We also show that, information-theoretically, unnormalized schemas have at least twice the amount of redundancy than schemas in 3NF.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- J. Albert, Y. Ioannidis, and R. Ramakrishnan. Equivalence of keyed relational schemas by conjunctive queries. JCSS, 58(3):512--534, 1999. Google ScholarDigital Library
- M. Arenas and L. Libkin. A normal form for XML documents. ACM TODS 29 (2004), 195--232. Google ScholarDigital Library
- M. Arenas and L. Libkin. An information-theoretic approach to normal forms for relational and XML data. Journal of the ACM, 52 (2005), 246--283. Extended abstract in PODS'03. Google ScholarDigital Library
- C. Beeri, P. Bernstein, and N. Goodman. A sophisticate's introduction to database normalization theory. In VLDB'78, pages 113--124. Google ScholarDigital Library
- P. Bernstein. Synthesizing third normal form relations from functional dependencies. ACM TODS, 1 (1976), 277--298. Google ScholarDigital Library
- J. Biskup. Achievements of relational database schema design theory revisited. In Semantics in Databases, LNCS 1358, pages 29--54. Springer-Verlag, 1995. Google ScholarDigital Library
- R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In VLDB'87, pages 71--81. Google ScholarDigital Library
- T. Cover and J. Thomas. Elements of Information Theory. Wiley-Interscience, 1991. Google ScholarDigital Library
- M. Dalkilic and E. Robertson. Information dependencies. In PODS'00, pages 245--253. Google ScholarDigital Library
- D. W. Embley and W. Y. Mok. Developing XML documents with guaranteed "good" properties. In ER'01, pages 426--441. Google ScholarDigital Library
- R. Fagin. Normal forms and relational database operators. In SIGMOD'79, pages 153--160. Google ScholarDigital Library
- R. Fagin. A normal form for relational databases that is based on domains and keys. ACM TODS 6 (1981), 387--415. Google ScholarDigital Library
- R. Hull. Relative information capacity of simple relational database schemata. SIAM J. Comput., 15(3):856--886, 1986. Google ScholarDigital Library
- P. Kanellakis. Elements of Relational Database Theory, In Handbook of TCS, vol. B, 1990, pages 1075--1144. Google ScholarDigital Library
- M. Kifer, A. Bernstein, P. Lewis. Database Systems: An Application-Oriented Approach. Addison-Wesley, 2005. Google ScholarDigital Library
- S. Kolahi. Dependency-preserving normalization of relational and XML data. In DBPL'05. Google ScholarDigital Library
- T. T. Lee. An information-theoretic analysis of relational databases - Part I: Data dependencies and information metric. IEEE Trans. on Software Engineering, 13(10):1049--1061, 1987. Google ScholarDigital Library
- M. Levene and G. Loizou. A Guided Tour of Relational Databases and Beyond. Springer, 1999. Google ScholarCross Ref
- M. Levene and G. Loizou. Why is the snowflake schema a good data warehouse design? Information Systems, 28 (2003), 225--240. Google ScholarDigital Library
- M. Levene and M. W. Vincent. Justification for inclusion dependency normal form. IEEE TKDE, 12(2):281--291, 2000. Google ScholarDigital Library
- T.W. Ling, F. Tompa, T. Kameda. An improved third normal form for relational databases. ACM TODS 6 (1981), 329--346. Google ScholarDigital Library
- Oracle's General Database Design FAQ. http://www.orafaq.com/faqdesgn.htm.Google Scholar
- V. Vianu. A Web Odyssey: from Codd to XML. In PODS'01, pages 1--15. Google ScholarDigital Library
- C. Zaniolo. A new normal form for the design of relational database schemata. ACM TODS 7 (1982), 489--499. Google ScholarDigital Library
Index Terms
- On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF
Recommendations
Dependency-preserving normalization of relational and XML data
DBPL'05: Proceedings of the 10th international conference on Database Programming LanguagesHaving a database design that avoids redundant information and update anomalies is the main goal of normalization techniques. Ideally, data as well as constraints should be preserved. However, this is not always achievable: while BCNF eliminates all ...
Dependency-preserving normalization of relational and XML data
Having a database design that avoids redundant information and update anomalies is the main goal of normalization techniques. Ideally, data as well as constraints should be preserved. However, this is not always achievable: while BCNF eliminates all ...
XML schema refinement through redundancy detection and normalization
As XML becomes increasingly popular, XML schema design has become an increasingly important issue. One of the central objectives of good schema design is to avoid data redundancies: redundantly stored information can lead not just only to a higher data ...
Comments