Abstract
Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs.
This paper is the first to formally characterize a "good" pattern tableau, based on naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an "on-demand" algorithm providing the same approximation bound, that outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.
- D. Agarwal, D. Barman, D. Gunopulos, N. Young, F. Korn, and D. Srivastava. Efficient and effective explanation of change in hierarchical summaries. In KDD, pp 6--15, 2007. Google ScholarDigital Library
- R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pp 207--216, 1993. Google ScholarDigital Library
- P. Berman and M. Karpinski. On Some Tighter Inapproximability Results (Extended Abstract). In ICALP, pp. 200--209, 1999. Google ScholarDigital Library
- P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, 2007.Google ScholarCross Ref
- L. Bravo, W. Fan, and S. Ma. Extending dependencies with conditions. In VLDB, 2007. Google ScholarDigital Library
- P. Brown, P. Haas. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In VLDB, 2003. Google ScholarDigital Library
- T. Calders, R. Ng, and J. Wijsen. Searching for dependencies at multiple abstraction levels. TODS, 27(3):229--260, 2002. Google ScholarDigital Library
- S. Ceri, F. D. Guinta, and P. Lanzi. Mining constraint violations. TODS, 32(1), 2007. Google ScholarDigital Library
- G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. In VLDB, 2007. Google ScholarDigital Library
- M. Dalkilic and E. Robertson. Information dependencies. In PODS, pp 245--253, 2000. Google ScholarDigital Library
- R. Gandhi, S. Khuller, and A. Srinivasan. Approximation algorithms for partial covering problems. J. Algorithms, 53(1):55--84, 2004. Google ScholarDigital Library
- M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP---Completeness. Freeman, San Francisco, 1979. Google ScholarDigital Library
- C. Giannella and E. Robertson. On approximation measures for functional dependencies. Information Systems, 29(6):483--507, 2004. Google ScholarDigital Library
- Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen. Tane: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100--111, 1999.Google ScholarCross Ref
- I. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. Cords: Automatic discovery of correlations and soft functional dependencies. In SIGMOD, pp 647--658, 2004. Google ScholarDigital Library
- M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT Press, 1994. Google ScholarDigital Library
- R. King and J. Legendre. Discovery of functional and approximate functional dependencies in relational databases. Journal of Applied Mathematics and Decision Sciences, 7(1):49--59, 2003.Google ScholarCross Ref
- J. Kivinen and H. Mannila. Approximate inference of functional dependencies from relations. Theor. Comput. Sci., 149(1):129--149, 1995. Google ScholarDigital Library
- U. Nambiar and S. Kambhampati. Mining approximate functional dependencies and concept similarities to answer imprecise queries. In WebDB, 2004. Google ScholarDigital Library
- S. Sarawagi. Explaining differences in multidimensional aggregates. In VLDB, pp 42--53, 1999. Google ScholarDigital Library
- R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD Rec., 25(2):1--12, 1996. Google ScholarDigital Library
- Y. Xu and R. Motwani. Random sampling based algorithms for efficient semi-key discovery. Stanford University Technical Report.Google Scholar
Index Terms
- On generating near-optimal tableaux for conditional functional dependencies
Recommendations
HyperS tableaux - heuristic hyper tableaux
Several syntactic methods have been constructed to automate theorem proving in first-order logic. The positive (negative) hyper-resolution and the clause tableaux were combined in a single calculus called hyper tableaux in [1]. In this paper we propose ...
Optimal Tableaux for Conditional Logics with Cautious Monotonicity
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial IntelligenceConditional logics capture default entailment in a modal framework in which non-monotonic implication is a first-class citizen, and in particular can be negated and nested. There is a wide range of axiomatizations of conditionals in the literature, from ...
Sectional and Conditional Functional Dependencies
WASA 2014: Proceedings of the 9th International Conference on Wireless Algorithms, Systems, and Applications - Volume 8491This paper proposes the concept and applications of the sectional and conditional functional dependency (SCFD), which is an important extension of the conditional functional dependency (CFD) and the functional dependency (FD). SCFDs describe ...
Comments