skip to main content
research-article

The computation of optimal subset repairs

Published:01 July 2020Publication History
Skip Abstract Section

Abstract

Computing an optimal subset repair of an inconsistent database is becoming a standalone research problem and has a wide range of applications. However, it has not been well-studied yet. A tight inapproximability bound of the problem computing optimal subset repairs is still unknown, and there is still no existing algorithm with a constant approximation factor better than two. In this paper, we prove a new tighter inapproximability bound of the problem computing optimal subset repairs. We show that it is usually NP-hard to approximate it within a factor better than 17/16. An algorithm with an approximation ratio (2 - 1/2σ-1)is developed, where σ is the number of functional dependencies. It is the current best algorithm in terms of approximation ratio. The ratio can be further improved if there are a large amount of quasi-Turán clusters in the input database. Plenty of experiments are conducted on real data to examine the performance and the effectiveness of the proposed approximation algorithms in real-world applications.

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases: The Logical Level. Addison-Wesley,1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. F. N. Afrati and P. G. Kolaitis. Repair checking in inconsistent databases: algorithms and complexity. In ICDT, pages 31--41, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. O. Amini, S. Pérennes, and I. Sau. Hardness and approximation of traffic grooming. Theor. Comput. Sci., 410(38--40):3751--3760, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Arenas, L. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68--79, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Arenas, L. Bertossi, and J. Chomicki. Answer sets for consistent query answering in inconsistent databases. Theor. Pract. Log. Prog., 3(4):393--424, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Arenas, L. Bertossi, J. Chomicki, X. He, V. Raghavan, and J. Spinrad. Scalar aggregation in inconsistent databases. Theor. Comput. Sci., 296(3):405--434, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Assadi, T. Milo, and S. Novgorodov. DANCE: data cleaning with constraints and experts. In ICDE, pages 1409--1410, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  8. R. Bar-Yehuda and S. Even. A linear-time approximation algorithm for the weighted vertex cover problem. J. Algorithms, 2(2):198--203, 1981.Google ScholarGoogle ScholarCross RefCross Ref
  9. M. Bergman, T. Milo, S. Novgorodov, and W.-C. Tan. QOCO: a query oriented data cleaning system with oracles. PVLDB, 8(12):1900--1903, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. Bertossi. Database repairs and consistent query answering: origins and further developments. In PODS, pages 48--58, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. L. Bertossi. Repair-based degrees of database inconsistency. In LPNMR, pages 195--209, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  12. L. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inform. Syst., 33(4):407--434, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Bohannon, W. Fan, M. Flaster, and R. Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD, pages 143--154, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  15. N. Boria, F. D. Croce, and V. T. Paschos. On the max min vertex cover problem. Discrete Appl. Math., 196:62--71, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Caniupán and L. Bertossi. The consistency extractor system: answer set programs for consistent query answering in databases. Data Knowl. Eng., 69(6):545--572, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Chen, I. A. Kanj, and G. Xia. Improved upper bounds for vertex cover. Theor. Comput. Sci., 411(40):3736--3756, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. F. Chiang and R. J. Miller. A unified model for data and constraint repair. In ICDE, pages 446--457, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1--2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. X. Chu, I. F. Ilyas, S. Krishnan, and J. Wang. Data cleaning: overview and emerging challenges. In SIGMOD, pages 2201--2206, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. X. Chu, I. F. Ilyas, and P. Papotti. Holistic data cleaning: putting violations into context. In ICDE, pages 458--469, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. PVLDB, 7(6):315--325, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Crescenzi. A short guide to approximation preserving reductions. In CCC, pages 262--273, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. M. Dallachiesa, A. Ebaid, A. Eldawy, A. Elmagarmid, I. F. Ilyas, M. Ouzzani, and N. Tang. NADEEF: a commodity data cleaning system. In SIGMOD, pages 541--552, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. De Sa, I. F. Ilyas, B. Kimelfeld, C. Ré, and T. Rekatsinas. A formal framework for probabilistic unclean databases. In ICDT, pages 26--28, 2019.Google ScholarGoogle Scholar
  26. A. A. Dixit. CAvSAT: a system for query answering over inconsistent databases. In SIGMOD, pages 1823--1825, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. A. Dixit and P. G. Kolaitis. A SAT-based system for consistent query answering. In SAT, pages 117--135, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  28. S. Flesca, F. Furfaro, and F. Parisi. Consistent query answers on numerical databases under aggregate constraints. In DBPL, pages 279--294, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Gartner. Vendor Rating Service, accessed May 15, 2020.Google ScholarGoogle Scholar
  30. F. Geerts, G. Mecca, P. Papotti, and D. Santoro. The llunatic data-cleaning framework. PVLDB, 6(9):625--636, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. L. Golab, I. F. Ilyas, G. Beskales, and A. Galiullin. On the relative trust between inconsistent data and inaccurate constraints. In ICDE, pages 541--552, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. V. Guruswami and S. Khot. Hardness of Max 3SAT with no mixed clauses. In CCC, pages 154--162, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. V. Kann. Maximum bounded 3-dimensional matching is MAX SNP-complete. Inform. Process. Lett., 37(1):27--35, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. G. Karakostas. A better approximation ratio for the vertex cover problem. ACM Trans. Algorithms, 5(4):41:1--41:8, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. S. Khot. On the unique games conjecture. In FOCS, page 3, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. S. Khot and O. Regev. Vertex cover might be hard to approximate to within 2-ε. J. Comput. Syst. Sci., 74(3):335--349, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. Kolahi and L. V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, pages 53--62, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. P. G. Kolaitis, E. Pema, and W.-C. Tan. Efficient querying of inconsistent databases with binary integer programming. PVLDB, 6(6):397--408, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. P. Koutris and J. Wijsen. Consistent query answering for self-join-free conjunctive queries under primary key constraints. ACM Trans. Database Syst., 42(2):1--45, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. E. Livshits, B. Kimelfeld, and S. Roy. Computing optimal repairs for functional dependencies. ACM Trans. Database Syst., 45(1):1--46, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. A. Lopatenko and L. Bertossi. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In ICDT, pages 179--193, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. A. Lopatenko and L. Bravo. Efficient approximation algorithms for repairing inconsistent databases. In ICDE, pages 216--225, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  43. G. L. Nemhauser and L. E. Trotter. Vertex packings: structural properties and algorithms. Math. Program., 8(4):232--248, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. T. Rekatsinas, X. Chu, I. F. Ilyas, and C. Ré. HoloClean: holistic data repairs with probabilistic inference. PVLDB, 10(11):1190--1201, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. B. Salimi, L. Rodriguez, B. Howe, and D. Suciu. Interventional fairness: causal database repair for algorithmic fairness. In SIGMOD, pages 793--810, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. J. Wijsen. On the consistent rewriting of conjunctive queries under primary key constraints. Inform. Syst., 34(7):578--601, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. J. Wijsen. Certain conjunctive query answering in first-order logic. ACM Trans. Database Syst., 37(2):1--35, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. J. Wijsen. Foundations of query answering on inconsistent databases. SIGMOD Rec., 48(3):6--16, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. M. Zehavi. Maximum minimal vertex cover parameterized by vertex cover. SIAM J. Discrete Math., 31(4):2440--2456, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 13, Issue 12
    August 2020
    1710 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 July 2020
    Published in pvldb Volume 13, Issue 12

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader