skip to main content
10.1145/1514894.1514901acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

On approximating optimum repairs for functional dependency violations

Published:23 March 2009Publication History

ABSTRACT

We study the problem of repairing an inconsistent database that violates a set of functional dependencies by making the smallest possible value modifications. For an inconsistent database, we define an optimum repair as a database that satisfies the functional dependencies, and minimizes, among all repairs, a distance measure that depends on the number of corrections made in the database and the weights of tuples modified. We show that like other versions of the repair problem, checking the existence of a repair within a certain distance of a database is NP-complete. We also show that finding a constant-factor approximation for the optimum repair for any set of functional dependencies is NP-hard. Furthermore, there is a small constant and a set of functional dependencies, for which finding an approximate solution for the optimum repair within the factor of that constant is also NP-hard. Then we present an approximation algorithm that for a fixed set of functional dependencies and an arbitrary input inconsistent database, produces a repair whose distance to the database is within a constant factor of the optimum repair distance. We finally show how the approximation algorithm can be used in data cleaning using a recent extension to functional dependencies, called conditional functional dependencies.

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. On the representation and querying of sets of possible worlds. Theor. Comput. Sci., 78(1):158--187, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68--79, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Answer sets for consistent query answering in inconsistent databases. TPLP, 3(4--5):393--424, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Marcelo Arenas, Leopoldo E. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, and Jeremy Spinrad. Scalar aggregation in inconsistent databases. Theor. Comput. Sci., 3(296):405--434, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Mihir Bellare, Shafi Goldwasser, Carsten Lund, and A. Russeli. Efficient probabilistically checkable proofs and applications to approximations. In STOC, pages 294--304, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Leopoldo E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68--76, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Leopoldo E. Bertossi, Loreto Bravo, Enrico Franconi, and Andrei Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf. Syst., 33(4--5):407--434, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  10. Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD Conference, pages 143--154, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Jan Chomicki. Consistent query answering: Five easy pieces. In ICDT, pages 1--17, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1--2):90--121, 2005.Google ScholarGoogle Scholar
  13. Gao Cong, Wenfei Fan, Floris Geerts, Xibei Jia, and Shuai Ma. Improving data quality: Consistency and accuracy. In VLDB, pages 315--326, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Dorit S. Hochbaum (Editor). Approximation Algorithms for NP-Hard Problems. PWS, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Wenfei Fan. Dependencies revisited for improving data quality. In PODS, pages 159--170, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sergio Flesca, Filippo Furfaro, and Francesco Parisi. Consistent query answers on numerical databases under aggregate constraints. In DBPL, pages 279--294, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Enrico Franconi, Antonio Laureti Palma, Nicola Leone, Simona Perri, and Francesco Scarcello. Census data repair: a challenging application of disjunctive logic programming. In LPAR, pages 561--578, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Filippo Furfaro, Sergio Greco, and Cristian Molinaro. A three-valued semantics for querying and repairing inconsistent databases. Ann. Math. Artif. Intell., 51(2--4):167--193, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Ariel Fuxman, Elham Fazli, and Renée J. Miller. Conquer: Efficient management of inconsistent databases. In SIGMOD Conference, pages 155--166, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ariel Fuxman and Renée J. Miller. First-order query rewriting for inconsistent databases. J. Comput. Syst. Sci., 73(4):610--635, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Solmaz Kolahi and Leonid Libkin. On redundancy vs dependency preservation in normalization: an information-theoretic study of 3nf. In PODS, pages 114--123, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Gabriel M. Kuper, Leonid Libkin, and Jan Paredaens, editors. Constraint Databases. Springer, 2000.Google ScholarGoogle Scholar
  23. Andrei Lopatenko and Loreto Bravo. Efficient approximation algorithms for repairing inconsistent databases. In ICDE, pages 216--225, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  24. Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425--440, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  25. Vijay V. Vazirani. Approximation Algorithms. Springer, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Jef Wijsen. Condensed representation of database repairs for consistent query answering. In ICDT, pages 375--390, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722--768, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Jef Wijsen. Project-join-repair: An approach to consistent query answering under functional dependencies. In FQAS, pages 1--12, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. On approximating optimum repairs for functional dependency violations

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      ICDT '09: Proceedings of the 12th International Conference on Database Theory
      March 2009
      334 pages
      ISBN:9781605584232
      DOI:10.1145/1514894

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 March 2009

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader