ABSTRACT
We study the problem of repairing an inconsistent database that violates a set of functional dependencies by making the smallest possible value modifications. For an inconsistent database, we define an optimum repair as a database that satisfies the functional dependencies, and minimizes, among all repairs, a distance measure that depends on the number of corrections made in the database and the weights of tuples modified. We show that like other versions of the repair problem, checking the existence of a repair within a certain distance of a database is NP-complete. We also show that finding a constant-factor approximation for the optimum repair for any set of functional dependencies is NP-hard. Furthermore, there is a small constant and a set of functional dependencies, for which finding an approximate solution for the optimum repair within the factor of that constant is also NP-hard. Then we present an approximation algorithm that for a fixed set of functional dependencies and an arbitrary input inconsistent database, produces a repair whose distance to the database is within a constant factor of the optimum repair distance. We finally show how the approximation algorithm can be used in data cleaning using a recent extension to functional dependencies, called conditional functional dependencies.
- Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. On the representation and querying of sets of possible worlds. Theor. Comput. Sci., 78(1):158--187, 1991. Google ScholarDigital Library
- Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68--79, 1999. Google ScholarDigital Library
- Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Answer sets for consistent query answering in inconsistent databases. TPLP, 3(4--5):393--424, 2003. Google ScholarDigital Library
- Marcelo Arenas, Leopoldo E. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, and Jeremy Spinrad. Scalar aggregation in inconsistent databases. Theor. Comput. Sci., 3(296):405--434, 2003. Google ScholarDigital Library
- Mihir Bellare, Shafi Goldwasser, Carsten Lund, and A. Russeli. Efficient probabilistically checkable proofs and applications to approximations. In STOC, pages 294--304, 1993. Google ScholarDigital Library
- Leopoldo E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68--76, 2006. Google ScholarDigital Library
- Leopoldo E. Bertossi, Loreto Bravo, Enrico Franconi, and Andrei Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf. Syst., 33(4--5):407--434, 2008. Google ScholarDigital Library
- Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.Google ScholarCross Ref
- Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD Conference, pages 143--154, 2005. Google ScholarDigital Library
- Jan Chomicki. Consistent query answering: Five easy pieces. In ICDT, pages 1--17, 2007. Google ScholarDigital Library
- Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1--2):90--121, 2005.Google Scholar
- Gao Cong, Wenfei Fan, Floris Geerts, Xibei Jia, and Shuai Ma. Improving data quality: Consistency and accuracy. In VLDB, pages 315--326, 2007. Google ScholarDigital Library
- Dorit S. Hochbaum (Editor). Approximation Algorithms for NP-Hard Problems. PWS, 1997. Google ScholarDigital Library
- Wenfei Fan. Dependencies revisited for improving data quality. In PODS, pages 159--170, 2008. Google ScholarDigital Library
- Sergio Flesca, Filippo Furfaro, and Francesco Parisi. Consistent query answers on numerical databases under aggregate constraints. In DBPL, pages 279--294, 2005. Google ScholarDigital Library
- Enrico Franconi, Antonio Laureti Palma, Nicola Leone, Simona Perri, and Francesco Scarcello. Census data repair: a challenging application of disjunctive logic programming. In LPAR, pages 561--578, 2001. Google ScholarDigital Library
- Filippo Furfaro, Sergio Greco, and Cristian Molinaro. A three-valued semantics for querying and repairing inconsistent databases. Ann. Math. Artif. Intell., 51(2--4):167--193, 2007. Google ScholarDigital Library
- Ariel Fuxman, Elham Fazli, and Renée J. Miller. Conquer: Efficient management of inconsistent databases. In SIGMOD Conference, pages 155--166, 2005. Google ScholarDigital Library
- Ariel Fuxman and Renée J. Miller. First-order query rewriting for inconsistent databases. J. Comput. Syst. Sci., 73(4):610--635, 2007. Google ScholarDigital Library
- Solmaz Kolahi and Leonid Libkin. On redundancy vs dependency preservation in normalization: an information-theoretic study of 3nf. In PODS, pages 114--123, 2006. Google ScholarDigital Library
- Gabriel M. Kuper, Leonid Libkin, and Jan Paredaens, editors. Constraint Databases. Springer, 2000.Google Scholar
- Andrei Lopatenko and Loreto Bravo. Efficient approximation algorithms for repairing inconsistent databases. In ICDE, pages 216--225, 2007.Google ScholarCross Ref
- Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425--440, 1991.Google ScholarCross Ref
- Vijay V. Vazirani. Approximation Algorithms. Springer, 2003. Google ScholarDigital Library
- Jef Wijsen. Condensed representation of database repairs for consistent query answering. In ICDT, pages 375--390, 2003. Google ScholarDigital Library
- Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722--768, 2005. Google ScholarDigital Library
- Jef Wijsen. Project-join-repair: An approach to consistent query answering under functional dependencies. In FQAS, pages 1--12, 2006. Google ScholarDigital Library
Index Terms
- On approximating optimum repairs for functional dependency violations
Recommendations
Computing Optimal Repairs for Functional Dependencies
Best of SIGMOD 2018 and Best of PODS 2018We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is ...
Dichotomies in the Complexity of Preferred Repairs
PODS '15: Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsThe framework of database repairs provides a principled approach to managing inconsistencies in databases. Informally, a repair of an inconsistence database is a consistent database that differs from the inconsistent one in a "minimal way." A ...
Computing Optimal Repairs for Functional Dependencies
PODS '18: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsWe investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair) that is ...
Comments