Abstract
Computing an optimal subset repair of an inconsistent database is becoming a standalone research problem and has a wide range of applications. However, it has not been well-studied yet. A tight inapproximability bound of the problem computing optimal subset repairs is still unknown, and there is still no existing algorithm with a constant approximation factor better than two. In this paper, we prove a new tighter inapproximability bound of the problem computing optimal subset repairs. We show that it is usually NP-hard to approximate it within a factor better than 17/16. An algorithm with an approximation ratio (2 - 1/2σ-1)is developed, where σ is the number of functional dependencies. It is the current best algorithm in terms of approximation ratio. The ratio can be further improved if there are a large amount of quasi-Turán clusters in the input database. Plenty of experiments are conducted on real data to examine the performance and the effectiveness of the proposed approximation algorithms in real-world applications.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases: The Logical Level. Addison-Wesley,1995. Google ScholarDigital Library
- F. N. Afrati and P. G. Kolaitis. Repair checking in inconsistent databases: algorithms and complexity. In ICDT, pages 31--41, 2009. Google ScholarDigital Library
- O. Amini, S. Pérennes, and I. Sau. Hardness and approximation of traffic grooming. Theor. Comput. Sci., 410(38--40):3751--3760, 2009. Google ScholarDigital Library
- M. Arenas, L. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68--79, 1999. Google ScholarDigital Library
- M. Arenas, L. Bertossi, and J. Chomicki. Answer sets for consistent query answering in inconsistent databases. Theor. Pract. Log. Prog., 3(4):393--424, 2003. Google ScholarDigital Library
- M. Arenas, L. Bertossi, J. Chomicki, X. He, V. Raghavan, and J. Spinrad. Scalar aggregation in inconsistent databases. Theor. Comput. Sci., 296(3):405--434, 2003. Google ScholarDigital Library
- A. Assadi, T. Milo, and S. Novgorodov. DANCE: data cleaning with constraints and experts. In ICDE, pages 1409--1410, 2017.Google ScholarCross Ref
- R. Bar-Yehuda and S. Even. A linear-time approximation algorithm for the weighted vertex cover problem. J. Algorithms, 2(2):198--203, 1981.Google ScholarCross Ref
- M. Bergman, T. Milo, S. Novgorodov, and W.-C. Tan. QOCO: a query oriented data cleaning system with oracles. PVLDB, 8(12):1900--1903, 2015. Google ScholarDigital Library
- L. Bertossi. Database repairs and consistent query answering: origins and further developments. In PODS, pages 48--58, 2019. Google ScholarDigital Library
- L. Bertossi. Repair-based degrees of database inconsistency. In LPNMR, pages 195--209, 2019.Google ScholarCross Ref
- L. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inform. Syst., 33(4):407--434, 2008. Google ScholarDigital Library
- P. Bohannon, W. Fan, M. Flaster, and R. Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD, pages 143--154, 2005. Google ScholarDigital Library
- P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.Google ScholarCross Ref
- N. Boria, F. D. Croce, and V. T. Paschos. On the max min vertex cover problem. Discrete Appl. Math., 196:62--71, 2015. Google ScholarDigital Library
- M. Caniupán and L. Bertossi. The consistency extractor system: answer set programs for consistent query answering in databases. Data Knowl. Eng., 69(6):545--572, 2010. Google ScholarDigital Library
- J. Chen, I. A. Kanj, and G. Xia. Improved upper bounds for vertex cover. Theor. Comput. Sci., 411(40):3736--3756, 2010. Google ScholarDigital Library
- F. Chiang and R. J. Miller. A unified model for data and constraint repair. In ICDE, pages 446--457, 2011. Google ScholarDigital Library
- J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1--2), 2005. Google ScholarDigital Library
- X. Chu, I. F. Ilyas, S. Krishnan, and J. Wang. Data cleaning: overview and emerging challenges. In SIGMOD, pages 2201--2206, 2016. Google ScholarDigital Library
- X. Chu, I. F. Ilyas, and P. Papotti. Holistic data cleaning: putting violations into context. In ICDE, pages 458--469, 2013. Google ScholarDigital Library
- G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. PVLDB, 7(6):315--325, 2007. Google ScholarDigital Library
- P. Crescenzi. A short guide to approximation preserving reductions. In CCC, pages 262--273, 1997. Google ScholarDigital Library
- M. Dallachiesa, A. Ebaid, A. Eldawy, A. Elmagarmid, I. F. Ilyas, M. Ouzzani, and N. Tang. NADEEF: a commodity data cleaning system. In SIGMOD, pages 541--552, 2013. Google ScholarDigital Library
- C. De Sa, I. F. Ilyas, B. Kimelfeld, C. Ré, and T. Rekatsinas. A formal framework for probabilistic unclean databases. In ICDT, pages 26--28, 2019.Google Scholar
- A. A. Dixit. CAvSAT: a system for query answering over inconsistent databases. In SIGMOD, pages 1823--1825, 2019. Google ScholarDigital Library
- A. A. Dixit and P. G. Kolaitis. A SAT-based system for consistent query answering. In SAT, pages 117--135, 2019.Google ScholarCross Ref
- S. Flesca, F. Furfaro, and F. Parisi. Consistent query answers on numerical databases under aggregate constraints. In DBPL, pages 279--294, 2005. Google ScholarDigital Library
- Gartner. Vendor Rating Service, accessed May 15, 2020.Google Scholar
- F. Geerts, G. Mecca, P. Papotti, and D. Santoro. The llunatic data-cleaning framework. PVLDB, 6(9):625--636, 2013. Google ScholarDigital Library
- L. Golab, I. F. Ilyas, G. Beskales, and A. Galiullin. On the relative trust between inconsistent data and inaccurate constraints. In ICDE, pages 541--552, 2013. Google ScholarDigital Library
- V. Guruswami and S. Khot. Hardness of Max 3SAT with no mixed clauses. In CCC, pages 154--162, 2005. Google ScholarDigital Library
- V. Kann. Maximum bounded 3-dimensional matching is MAX SNP-complete. Inform. Process. Lett., 37(1):27--35, 1991. Google ScholarDigital Library
- G. Karakostas. A better approximation ratio for the vertex cover problem. ACM Trans. Algorithms, 5(4):41:1--41:8, 2009. Google ScholarDigital Library
- S. Khot. On the unique games conjecture. In FOCS, page 3, 2005. Google ScholarDigital Library
- S. Khot and O. Regev. Vertex cover might be hard to approximate to within 2-ε. J. Comput. Syst. Sci., 74(3):335--349, 2008. Google ScholarDigital Library
- S. Kolahi and L. V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, pages 53--62, 2009. Google ScholarDigital Library
- P. G. Kolaitis, E. Pema, and W.-C. Tan. Efficient querying of inconsistent databases with binary integer programming. PVLDB, 6(6):397--408, 2013. Google ScholarDigital Library
- P. Koutris and J. Wijsen. Consistent query answering for self-join-free conjunctive queries under primary key constraints. ACM Trans. Database Syst., 42(2):1--45, 2017. Google ScholarDigital Library
- E. Livshits, B. Kimelfeld, and S. Roy. Computing optimal repairs for functional dependencies. ACM Trans. Database Syst., 45(1):1--46, 2020.Google ScholarDigital Library
- A. Lopatenko and L. Bertossi. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In ICDT, pages 179--193, 2007. Google ScholarDigital Library
- A. Lopatenko and L. Bravo. Efficient approximation algorithms for repairing inconsistent databases. In ICDE, pages 216--225, 2007.Google ScholarCross Ref
- G. L. Nemhauser and L. E. Trotter. Vertex packings: structural properties and algorithms. Math. Program., 8(4):232--248, 1975. Google ScholarDigital Library
- T. Rekatsinas, X. Chu, I. F. Ilyas, and C. Ré. HoloClean: holistic data repairs with probabilistic inference. PVLDB, 10(11):1190--1201, 2017. Google ScholarDigital Library
- B. Salimi, L. Rodriguez, B. Howe, and D. Suciu. Interventional fairness: causal database repair for algorithmic fairness. In SIGMOD, pages 793--810, 2019. Google ScholarDigital Library
- J. Wijsen. On the consistent rewriting of conjunctive queries under primary key constraints. Inform. Syst., 34(7):578--601, 2009. Google ScholarDigital Library
- J. Wijsen. Certain conjunctive query answering in first-order logic. ACM Trans. Database Syst., 37(2):1--35, 2012. Google ScholarDigital Library
- J. Wijsen. Foundations of query answering on inconsistent databases. SIGMOD Rec., 48(3):6--16, 2019. Google ScholarDigital Library
- M. Zehavi. Maximum minimal vertex cover parameterized by vertex cover. SIAM J. Discrete Math., 31(4):2440--2456, 2017.Google ScholarDigital Library
Recommendations
Computing Optimal Repairs for Functional Dependencies
Best of SIGMOD 2018 and Best of PODS 2018We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is ...
Approximation and inapproximability results on computing optimal repairs
AbstractComputing optimal subset repairs and optimal update repairs of an inconsistent database has a wide range of applications and is becoming standalone research problems. However, these problems have not been well studied in terms of both ...
On the General Position Subset Selection Problem
Let $f(n,\ell)$ be the maximum integer such that every set of $n$ points in the plane with at most $\ell$ collinear contains a subset of $f(n,\ell)$ points with no three collinear. First we prove that if $\ell\leqslant O(\sqrt{n})$, then $f(n,\ell)\geqslant\...
Comments