skip to main content
research-article
Public Access

Computing Optimal Repairs for Functional Dependencies

Authors Info & Claims
Published:17 February 2020Publication History
Skip Abstract Section

Abstract

We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is obtained by a minimum number of tuple deletions, and an optimal update repair (optimal U-repair), which is obtained by a minimum number of value (cell) updates. For computing an optimal S-repair, we present a polynomial-time algorithm that succeeds on certain sets of FDs and fails on others. We prove the following about the algorithm. When it succeeds, it can also incorporate weighted tuples and duplicate tuples. When it fails, the problem is NP-hard and, in fact, APX-complete (hence, cannot be approximated better than some constant). Thus, we establish a dichotomy in the complexity of computing an optimal S-repair. We present general analysis techniques for the complexity of computing an optimal U-repair, some based on the dichotomy for S-repairs. We also draw a connection to a past dichotomy in the complexity of finding a “most probable database” that satisfies a set of FDs with a single attribute on the left-hand side; the case of general FDs was left open, and we show how our dichotomy provides the missing generalization and thereby settles the open problem.

Skip Supplemental Material Section

Supplemental Material

References

  1. Foto N. Afrati and Phokion G. Kolaitis. 2009. Repair checking in inconsistent databases: Algorithms and complexity. In Proceedings of the ICDT. ACM, 31--41.Google ScholarGoogle Scholar
  2. Paola Alimonti and Viggo Kann. 2000. Some APX-completeness results for cubic graphs. Theor. Comput. Sci. 237, 1--2 (2000), 123--134.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Omid Amini, Stéphane Pérennes, and Ignasi Sau. 2009. Hardness and approximation of traffic grooming. Theor. Comput. Sci. 410, 38--40 (2009), 3751--3760.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Periklis Andritsos, Ariel Fuxman, and Renée J. Miller. 2006. Clean answers over dirty databases: A probabilistic approach. In Proceedings of the ICDE. IEEE Computer Society, 30.Google ScholarGoogle Scholar
  5. Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. 1999. Consistent query answers in inconsistent databases. In Proceedings of the PODS. ACM, 68--79.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Ahmad Assadi, Tova Milo, and Slava Novgorodov. 2017. DANCE: Data cleaning with constraints and experts. In Proceedings of the ICDE. IEEE Computer Society, 1409--1410.Google ScholarGoogle ScholarCross RefCross Ref
  7. Giorgio Ausiello, M. Protasi, A. Marchetti-Spaccamela, G. Gambosi, P. Crescenzi, and V. Kann. 1999. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties (1st ed.). Springer-Verlag, Berlin.Google ScholarGoogle Scholar
  8. Reuven Bar-Yehuda and Shimon Even. 1981. A linear-time approximation algorithm for the weighted vertex cover problem. J. Algor. 2, 2 (1981), 198--203.Google ScholarGoogle ScholarCross RefCross Ref
  9. Catriel Beeri and Moshe Y. Vardi. 1984. Formal systems for tuple and equality generating dependencies. SIAM J. Comput. 13, 1 (1984), 76--98.Google ScholarGoogle ScholarCross RefCross Ref
  10. Moria Bergman, Tova Milo, Slava Novgorodov, and Wang-Chiew Tan. 2015. QOCO: A query oriented data cleaning system with oracles. Proc. Very Large Data Base 8, 12 (2015), 1900--1903.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Leopoldo E. Bertossi. 2018. Measuring and computing database inconsistency via repairs. In Proceedings of the SUM (Lecture Notes in Computer Science), Vol. 11142. Springer, 368--372.Google ScholarGoogle ScholarCross RefCross Ref
  12. Leopoldo E. Bertossi. 2018. Repair-based degrees of database inconsistency: Computation and complexity. CoRR abs/1809.10286 (2018).Google ScholarGoogle Scholar
  13. Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. 2007. Conditional functional dependencies for data cleaning. In Proceedings of the ICDE. IEEE, 746--755.Google ScholarGoogle ScholarCross RefCross Ref
  14. Douglas Burdick, Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. 2017. Expressive power of entity-linking frameworks. In Proceedings of the ICDT (LIPIcs), Vol. 68. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 10:1--10:18.Google ScholarGoogle Scholar
  15. Marco A. Casanova, Ronald Fagin, and Christos H. Papadimitriou. 1984. Inclusion dependencies and their interaction with functional dependencies. J. Comput. Syst. Sci. 28, 1 (1984), 29--59.Google ScholarGoogle ScholarCross RefCross Ref
  16. Jan Chomicki and Jerzy Marcinkowski. 2005. Minimal-change integrity maintenance using tuple deletions. Info. Comput. 197, 1--2 (2005), 90--121.Google ScholarGoogle Scholar
  17. E. F. Codd. 1975. Recent investigations in relational data base systems. In Proceedings of the Conference on Data: Its Use, Organization, and Management (ACM Pacific’75). 15--20.Google ScholarGoogle Scholar
  18. P. Crescenzi. 1997. A short guide to approximation preserving reductions. In Proceedings of the IEEECCC. IEEE Computer Society, Washington, DC, 262.Google ScholarGoogle ScholarCross RefCross Ref
  19. Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, and Nan Tang. 2013. NADEEF: A commodity data cleaning system. In Proceedings of the SIGMOD. ACM, 541--552.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Nilesh N. Dalvi and Dan Suciu. 2004. Efficient query evaluation on probabilistic databases. In Proceedings of the VLDB. Morgan Kaufmann, 864--875.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. J. Date. 1981. Referential integrity. In Proceedings of the VLDB. VLDB Endowment, 2--12.Google ScholarGoogle Scholar
  22. Jianfeng Du, Guilin Qi, and Yi-Dong Shen. 2013. Weight-based consistent query answering over inconsistent SHIQ knowledge bases. Knowl. Info. Syst. 34, 2 (2013), 335--371.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis. 2015. Dichotomies in the complexity of preferred repairs. In Proceedings of the PODS. ACM, 3--15.Google ScholarGoogle Scholar
  24. Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan 8 Claypool Publishers.Google ScholarGoogle Scholar
  25. Terry Gaasterland, Parke Godfrey, and Jack Minker. 1992. An overview of cooperative answering. J. Intell. Info. Syst. 1, 2 (1992), 123--157.Google ScholarGoogle ScholarCross RefCross Ref
  26. Floris Geerts, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. 2013. The LLUNATIC data-cleaning framework. Proc. Very Large Data Base 6, 9 (2013), 625--636.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. John Grant and Anthony Hunter. 2017. Analysing inconsistent information using distance-based measures. Int. J. Approx. Reason. 89 (2017), 3--26.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Eric Gribkoff, Guy Van den Broeck, and Dan Suciu. 2014. The most probable database problem. In Proceedings of the BUDA.Google ScholarGoogle Scholar
  29. Venkatesan Guruswami. 2004. Inapproximability results for set splitting and SatisfiabilityProblems with no mixed clauses. Algorithmica 38, 3 (1 Mar 2004), 451--469.Google ScholarGoogle Scholar
  30. S. Khanna, M. Sudan, and L. Trevisan. 1997. Constraint satisfaction: The approximability of minimization problems. In Proceedings of the 12th Annual IEEE Conference on Computational Complexity. 282--296.Google ScholarGoogle Scholar
  31. Benny Kimelfeld. 2012. A dichotomy in the complexity of deletion propagation with functional dependencies. In Proceedings of the PODS. 191--202.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. 2017. Detecting ambiguity in prioritized database repairing. In Proceedings of the ICDT. 17:1--17:20.Google ScholarGoogle Scholar
  33. Solmaz Kolahi and Laks V. S. Lakshmanan. 2009. On approximating optimum repairs for functional dependency violations. In Proceedings of the ICDT, Vol. 361. ACM, 53--62.Google ScholarGoogle Scholar
  34. Harold W. Kuhn. 1955. The hungarian method for the assignment problem. Naval Res. Logist. Quart. 2, 1--2 (Mar. 1955), 83--97.Google ScholarGoogle ScholarCross RefCross Ref
  35. Ester Livshits, Ihab F. Ilyas, Benny Kimelfeld, and Sudeepa Roy. 2019. Principles of progress indicators for database repairing. CoRR abs/1904.06492 (2019).Google ScholarGoogle Scholar
  36. Ester Livshits and Benny Kimelfeld. 2017. Counting and enumerating (preferred) database repairs. In Proceedings of the PODS. 289--301.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. 2018. Computing optimal repairs for functional dependencies. In Proceedings of the PODS. 225--237.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Andrei Lopatenko and Leopoldo E. Bertossi. 2007. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In Proceedings of the ICDT. 179--193.Google ScholarGoogle Scholar
  39. Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, and Christopher Ré. 2017. HoloClean: Holistic data repairs with probabilistic inference. Proc. Very Large Data Base 10, 11 (2017), 1190--1201.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. 2019. A formal framework for probabilistic unclean databases. In Proceedings of the ICDT. 6:1--6:18.Google ScholarGoogle Scholar
  41. Slawek Staworko, Jan Chomicki, and Jerzy Marcinkowski. 2012. Prioritized repairing and consistent query answering in relational databases. Ann. Math. Artif. Intell. 64, 2--3 (2012), 209--246.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Dan Suciu, Dan Olteanu, R. Christopher, and Christoph Koch. 2011. Probabilistic Databases (1st ed.). Morgan 8 Claypool Publishers.Google ScholarGoogle Scholar
  43. Ingo Wegener and R. Pruim. 2005. Complexity Theory: Exploring the Limits of Efficient Algorithms. Springer-Verlag, Berlin.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Computing Optimal Repairs for Functional Dependencies

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Database Systems
          ACM Transactions on Database Systems  Volume 45, Issue 1
          Best of SIGMOD 2018 and Best of PODS 2018
          March 2020
          177 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/3382758
          Issue’s Table of Contents

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 February 2020
          • Accepted: 1 August 2019
          • Revised: 1 May 2019
          • Received: 1 November 2018
          Published in tods Volume 45, Issue 1

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format .

        View HTML Format