Abstract
We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is obtained by a minimum number of tuple deletions, and an optimal update repair (optimal U-repair), which is obtained by a minimum number of value (cell) updates. For computing an optimal S-repair, we present a polynomial-time algorithm that succeeds on certain sets of FDs and fails on others. We prove the following about the algorithm. When it succeeds, it can also incorporate weighted tuples and duplicate tuples. When it fails, the problem is NP-hard and, in fact, APX-complete (hence, cannot be approximated better than some constant). Thus, we establish a dichotomy in the complexity of computing an optimal S-repair. We present general analysis techniques for the complexity of computing an optimal U-repair, some based on the dichotomy for S-repairs. We also draw a connection to a past dichotomy in the complexity of finding a “most probable database” that satisfies a set of FDs with a single attribute on the left-hand side; the case of general FDs was left open, and we show how our dichotomy provides the missing generalization and thereby settles the open problem.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Computing Optimal Repairs for Functional Dependencies
- Foto N. Afrati and Phokion G. Kolaitis. 2009. Repair checking in inconsistent databases: Algorithms and complexity. In Proceedings of the ICDT. ACM, 31--41.Google Scholar
- Paola Alimonti and Viggo Kann. 2000. Some APX-completeness results for cubic graphs. Theor. Comput. Sci. 237, 1--2 (2000), 123--134.Google ScholarDigital Library
- Omid Amini, Stéphane Pérennes, and Ignasi Sau. 2009. Hardness and approximation of traffic grooming. Theor. Comput. Sci. 410, 38--40 (2009), 3751--3760.Google ScholarDigital Library
- Periklis Andritsos, Ariel Fuxman, and Renée J. Miller. 2006. Clean answers over dirty databases: A probabilistic approach. In Proceedings of the ICDE. IEEE Computer Society, 30.Google Scholar
- Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. 1999. Consistent query answers in inconsistent databases. In Proceedings of the PODS. ACM, 68--79.Google ScholarDigital Library
- Ahmad Assadi, Tova Milo, and Slava Novgorodov. 2017. DANCE: Data cleaning with constraints and experts. In Proceedings of the ICDE. IEEE Computer Society, 1409--1410.Google ScholarCross Ref
- Giorgio Ausiello, M. Protasi, A. Marchetti-Spaccamela, G. Gambosi, P. Crescenzi, and V. Kann. 1999. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties (1st ed.). Springer-Verlag, Berlin.Google Scholar
- Reuven Bar-Yehuda and Shimon Even. 1981. A linear-time approximation algorithm for the weighted vertex cover problem. J. Algor. 2, 2 (1981), 198--203.Google ScholarCross Ref
- Catriel Beeri and Moshe Y. Vardi. 1984. Formal systems for tuple and equality generating dependencies. SIAM J. Comput. 13, 1 (1984), 76--98.Google ScholarCross Ref
- Moria Bergman, Tova Milo, Slava Novgorodov, and Wang-Chiew Tan. 2015. QOCO: A query oriented data cleaning system with oracles. Proc. Very Large Data Base 8, 12 (2015), 1900--1903.Google ScholarDigital Library
- Leopoldo E. Bertossi. 2018. Measuring and computing database inconsistency via repairs. In Proceedings of the SUM (Lecture Notes in Computer Science), Vol. 11142. Springer, 368--372.Google ScholarCross Ref
- Leopoldo E. Bertossi. 2018. Repair-based degrees of database inconsistency: Computation and complexity. CoRR abs/1809.10286 (2018).Google Scholar
- Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. 2007. Conditional functional dependencies for data cleaning. In Proceedings of the ICDE. IEEE, 746--755.Google ScholarCross Ref
- Douglas Burdick, Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. 2017. Expressive power of entity-linking frameworks. In Proceedings of the ICDT (LIPIcs), Vol. 68. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 10:1--10:18.Google Scholar
- Marco A. Casanova, Ronald Fagin, and Christos H. Papadimitriou. 1984. Inclusion dependencies and their interaction with functional dependencies. J. Comput. Syst. Sci. 28, 1 (1984), 29--59.Google ScholarCross Ref
- Jan Chomicki and Jerzy Marcinkowski. 2005. Minimal-change integrity maintenance using tuple deletions. Info. Comput. 197, 1--2 (2005), 90--121.Google Scholar
- E. F. Codd. 1975. Recent investigations in relational data base systems. In Proceedings of the Conference on Data: Its Use, Organization, and Management (ACM Pacific’75). 15--20.Google Scholar
- P. Crescenzi. 1997. A short guide to approximation preserving reductions. In Proceedings of the IEEECCC. IEEE Computer Society, Washington, DC, 262.Google ScholarCross Ref
- Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, and Nan Tang. 2013. NADEEF: A commodity data cleaning system. In Proceedings of the SIGMOD. ACM, 541--552.Google ScholarDigital Library
- Nilesh N. Dalvi and Dan Suciu. 2004. Efficient query evaluation on probabilistic databases. In Proceedings of the VLDB. Morgan Kaufmann, 864--875.Google ScholarDigital Library
- C. J. Date. 1981. Referential integrity. In Proceedings of the VLDB. VLDB Endowment, 2--12.Google Scholar
- Jianfeng Du, Guilin Qi, and Yi-Dong Shen. 2013. Weight-based consistent query answering over inconsistent SHIQ knowledge bases. Knowl. Info. Syst. 34, 2 (2013), 335--371.Google ScholarDigital Library
- Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis. 2015. Dichotomies in the complexity of preferred repairs. In Proceedings of the PODS. ACM, 3--15.Google Scholar
- Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan 8 Claypool Publishers.Google Scholar
- Terry Gaasterland, Parke Godfrey, and Jack Minker. 1992. An overview of cooperative answering. J. Intell. Info. Syst. 1, 2 (1992), 123--157.Google ScholarCross Ref
- Floris Geerts, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. 2013. The LLUNATIC data-cleaning framework. Proc. Very Large Data Base 6, 9 (2013), 625--636.Google ScholarDigital Library
- John Grant and Anthony Hunter. 2017. Analysing inconsistent information using distance-based measures. Int. J. Approx. Reason. 89 (2017), 3--26.Google ScholarDigital Library
- Eric Gribkoff, Guy Van den Broeck, and Dan Suciu. 2014. The most probable database problem. In Proceedings of the BUDA.Google Scholar
- Venkatesan Guruswami. 2004. Inapproximability results for set splitting and SatisfiabilityProblems with no mixed clauses. Algorithmica 38, 3 (1 Mar 2004), 451--469.Google Scholar
- S. Khanna, M. Sudan, and L. Trevisan. 1997. Constraint satisfaction: The approximability of minimization problems. In Proceedings of the 12th Annual IEEE Conference on Computational Complexity. 282--296.Google Scholar
- Benny Kimelfeld. 2012. A dichotomy in the complexity of deletion propagation with functional dependencies. In Proceedings of the PODS. 191--202.Google ScholarDigital Library
- Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. 2017. Detecting ambiguity in prioritized database repairing. In Proceedings of the ICDT. 17:1--17:20.Google Scholar
- Solmaz Kolahi and Laks V. S. Lakshmanan. 2009. On approximating optimum repairs for functional dependency violations. In Proceedings of the ICDT, Vol. 361. ACM, 53--62.Google Scholar
- Harold W. Kuhn. 1955. The hungarian method for the assignment problem. Naval Res. Logist. Quart. 2, 1--2 (Mar. 1955), 83--97.Google ScholarCross Ref
- Ester Livshits, Ihab F. Ilyas, Benny Kimelfeld, and Sudeepa Roy. 2019. Principles of progress indicators for database repairing. CoRR abs/1904.06492 (2019).Google Scholar
- Ester Livshits and Benny Kimelfeld. 2017. Counting and enumerating (preferred) database repairs. In Proceedings of the PODS. 289--301.Google ScholarDigital Library
- Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. 2018. Computing optimal repairs for functional dependencies. In Proceedings of the PODS. 225--237.Google ScholarDigital Library
- Andrei Lopatenko and Leopoldo E. Bertossi. 2007. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In Proceedings of the ICDT. 179--193.Google Scholar
- Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, and Christopher Ré. 2017. HoloClean: Holistic data repairs with probabilistic inference. Proc. Very Large Data Base 10, 11 (2017), 1190--1201.Google ScholarDigital Library
- Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. 2019. A formal framework for probabilistic unclean databases. In Proceedings of the ICDT. 6:1--6:18.Google Scholar
- Slawek Staworko, Jan Chomicki, and Jerzy Marcinkowski. 2012. Prioritized repairing and consistent query answering in relational databases. Ann. Math. Artif. Intell. 64, 2--3 (2012), 209--246.Google ScholarDigital Library
- Dan Suciu, Dan Olteanu, R. Christopher, and Christoph Koch. 2011. Probabilistic Databases (1st ed.). Morgan 8 Claypool Publishers.Google Scholar
- Ingo Wegener and R. Pruim. 2005. Complexity Theory: Exploring the Limits of Efficient Algorithms. Springer-Verlag, Berlin.Google ScholarDigital Library
Index Terms
- Computing Optimal Repairs for Functional Dependencies
Recommendations
Computing Optimal Repairs for Functional Dependencies
PODS '18: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsWe investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair) that is ...
On approximating optimum repairs for functional dependency violations
ICDT '09: Proceedings of the 12th International Conference on Database TheoryWe study the problem of repairing an inconsistent database that violates a set of functional dependencies by making the smallest possible value modifications. For an inconsistent database, we define an optimum repair as a database that satisfies the ...
Tree-width and functional dependencies in databases
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsConjunctive query (CQ) evaluation on relational databases is NP-complete in general. Several restrictions, like bounded tree-width and bounded hypertree-width, allow polynomial time evaluations.We extend the framework in the presence of functional ...
Comments