ABSTRACT
Dependency theory is almost as old as relational databases themselves and has traditionally been used to improve the quality of schemas, among other things. Recently there has been renewed interest in dependencies for improving the quality of data. The increasing demand for data quality technology has also motivated revisions of classical dependencies, both to capture more inconsistencies in real-life data and to match, repair and query inconsistent data. This paper aims to provide an overview of recent advances in revising classical dependencies for improving data quality.
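One revision of this kind is the conditional functional dependency (CFD), a functional dependency that is required to hold only on tuples matching a pattern of constants and wildcards. The following minimal Python sketch is not taken from the paper. It checks a single CFD over an in-memory list of rows. The function names `matches` and `cfd_violations`, the wildcard symbol `_`, the attribute names (CC, AC, zip, street, city) and the sample rows are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

WILDCARD = "_"  # hypothetical symbol for an unnamed variable: matches any value

Row = Dict[str, str]


def matches(value: str, pattern_value: str) -> bool:
    """A value matches a pattern entry if the entry is WILDCARD or equal to it."""
    return pattern_value == WILDCARD or value == pattern_value


def cfd_violations(rows: Sequence[Row], lhs: List[str], rhs: List[str],
                   pattern: Row) -> List[int]:
    """Return sorted indices of rows violating the CFD (lhs -> rhs, pattern).

    Semantics assumed here: for rows t1, t2 whose lhs values are equal and
    match the lhs pattern, their rhs values must be equal and match the rhs
    pattern. Taking t1 == t2 gives the single-row check against rhs constants.
    """
    violations = set()
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, t in enumerate(rows):
        if not all(matches(t[a], pattern[a]) for a in lhs):
            continue  # the dependency does not apply to this row
        if not all(matches(t[b], pattern[b]) for b in rhs):
            violations.add(i)  # row conflicts with a constant in the pattern
        groups[tuple(t[a] for a in lhs)].append(i)
    for idxs in groups.values():
        # Rows agreeing on lhs must also agree on rhs, as for a plain FD.
        if len({tuple(rows[i][b] for b in rhs) for i in idxs}) > 1:
            violations.update(idxs)
    return sorted(violations)


# Hypothetical customer rows: CC = country code, AC = area code.
ROWS: List[Row] = [
    {"CC": "44", "AC": "131", "zip": "EH4 1DT", "street": "Mayfield", "city": "EDI"},
    {"CC": "44", "AC": "131", "zip": "EH4 1DT", "street": "Crichton", "city": "EDI"},
    {"CC": "01", "AC": "908", "zip": "07974", "street": "Tree Ave.", "city": "NYC"},
    {"CC": "01", "AC": "212", "zip": "07974", "street": "Mtn Ave.", "city": "NYC"},
]

if __name__ == "__main__":
    # Plain FD zip -> street (all-wildcard pattern) also flags the US rows.
    print(cfd_violations(ROWS, ["zip"], ["street"], {"zip": "_", "street": "_"}))
    # CFD: zip determines street only when CC = 44.
    print(cfd_violations(ROWS, ["CC", "zip"], ["street"],
                         {"CC": "44", "zip": "_", "street": "_"}))
    # CFD with constants: CC = 01 and AC = 908 imply city = MH.
    print(cfd_violations(ROWS, ["CC", "AC"], ["city"],
                         {"CC": "01", "AC": "908", "city": "MH"}))
```

Running the script prints [0, 1, 2, 3], [0, 1] and [2]. A plain FD is the special case with an all-wildcard pattern, so it flags the two US rows even though a US zip code need not determine the street. The conditional versions restrict the check to the intended tuples and catch the wrong city for area code 908, which the plain FD [CC, AC] -> [city] alone does not detect.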
Index Terms
- Dependencies revisited for improving data quality