Abstract
Data-cleaning (or data-repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a set of given constraints. In recent years, repairing methods have been proposed for several classes of constraints. However, these methods rely on ad hoc decisions and tend to hard-code the strategy to repair conflicting values. As a consequence, there is currently no general algorithm to solve database repairing problems that involve different kinds of constraints and different strategies to select preferred values. In this paper we develop a uniform framework to solve this problem. We propose a new semantics for repairs, and a chase-based algorithm to compute minimal solutions. We implemented the framework in a DBMS-based prototype, and we report experimental results that confirm its good scalability and superior quality in computing repairs.
Index Terms
- The LLUNATIC data-cleaning framework