The LLUNATIC data-cleaning framework

Published: 01 July 2013

Abstract

Data-cleaning (or data-repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a set of given constraints. In recent years, repairing methods have been proposed for several classes of constraints. However, these methods rely on ad hoc decisions and tend to hard-code the strategy to repair conflicting values. As a consequence, there is currently no general algorithm to solve database repairing problems that involve different kinds of constraints and different strategies to select preferred values. In this paper we develop a uniform framework to solve this problem. We propose a new semantics for repairs, and a chase-based algorithm to compute minimal solutions. We implemented the framework in a DBMS-based prototype, and we report experimental results that confirm its good scalability and superior quality in computing repairs.
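To make the problem statement concrete, the following minimal sketch shows what it means for a table to be inconsistent with respect to a single functional dependency, and how the choice of preferred value changes the resulting repair. It is an illustration only, not the chase-based algorithm or the repair semantics proposed in the paper. The function names (`fd_violations`, `repair_fd`, `most_frequent`, `shortest`), the sample data, and both value-selection strategies are assumptions introduced here.

```python
from collections import Counter, defaultdict


def fd_violations(rows, lhs, rhs):
    """Group rows by the lhs attributes; return the groups whose rhs values disagree."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return {key: grp for key, grp in groups.items()
            if len({row[rhs] for row in grp}) > 1}


def most_frequent(values):
    """Hypothetical strategy: prefer the most common value in the group."""
    return Counter(values).most_common(1)[0][0]


def shortest(values):
    """Hypothetical strategy: prefer the shortest string in the group."""
    return min(values, key=len)


def repair_fd(rows, lhs, rhs, prefer=most_frequent):
    """Return a copy of rows in which every violating group agrees on rhs.

    The strategy for picking the surviving value is a parameter,
    rather than being hard-coded into the repair procedure.
    """
    repaired = [dict(row) for row in rows]  # leave the input untouched
    for grp in fd_violations(repaired, lhs, rhs).values():
        chosen = prefer([row[rhs] for row in grp])
        for row in grp:
            row[rhs] = chosen
    return repaired


# Sample data (invented): the dependency zip -> city is violated for zip 10001.
customers = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]

by_frequency = repair_fd(customers, ["zip"], "city")
by_shortest = repair_fd(customers, ["zip"], "city", prefer=shortest)
```

Both outputs satisfy the dependency, yet they differ: the first sets the city for zip 10001 to "New York", the second to "NYC". This sketch handles one kind of constraint in isolation; the abstract's point is that a general framework must handle different kinds of constraints and different selection strategies together.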

Published in: Proceedings of the VLDB Endowment, Volume 6, Issue 9, July 2013 (180 pages)

Publisher: VLDB Endowment
