skip to main content
research-article

On generating near-optimal tableaux for conditional functional dependencies

Published:01 August 2008Publication History
Skip Abstract Section

Abstract

Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs.

This paper is the first to formally characterize a "good" pattern tableau, based on naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an "on-demand" algorithm providing the same approximation bound, that outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.

References

  1. D. Agarwal, D. Barman, D. Gunopulos, N. Young, F. Korn, and D. Srivastava. Efficient and effective explanation of change in hierarchical summaries. In KDD, pp 6--15, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pp 207--216, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Berman and M. Karpinski. On Some Tighter Inapproximability Results (Extended Abstract). In ICALP, pp. 200--209, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  5. L. Bravo, W. Fan, and S. Ma. Extending dependencies with conditions. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Brown, P. Haas. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. Calders, R. Ng, and J. Wijsen. Searching for dependencies at multiple abstraction levels. TODS, 27(3):229--260, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Ceri, F. D. Guinta, and P. Lanzi. Mining constraint violations. TODS, 32(1), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Dalkilic and E. Robertson. Information dependencies. In PODS, pp 245--253, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Gandhi, S. Khuller, and A. Srinivasan. Approximation algorithms for partial covering problems. J. Algorithms, 53(1):55--84, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP---Completeness. Freeman, San Francisco, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. Giannella and E. Robertson. On approximation measures for functional dependencies. Information Systems, 29(6):483--507, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen. Tane: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100--111, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  15. I. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. Cords: Automatic discovery of correlations and soft functional dependencies. In SIGMOD, pp 647--658, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT Press, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. King and J. Legendre. Discovery of functional and approximate functional dependencies in relational databases. Journal of Applied Mathematics and Decision Sciences, 7(1):49--59, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  18. J. Kivinen and H. Mannila. Approximate inference of functional dependencies from relations. Theor. Comput. Sci., 149(1):129--149, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. U. Nambiar and S. Kambhampati. Mining approximate functional dependencies and concept similarities to answer imprecise queries. In WebDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Sarawagi. Explaining differences in multidimensional aggregates. In VLDB, pp 42--53, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD Rec., 25(2):1--12, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Xu and R. Motwani. Random sampling based algorithms for efficient semi-key discovery. Stanford University Technical Report.Google ScholarGoogle Scholar

Index Terms

  1. On generating near-optimal tableaux for conditional functional dependencies

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader