Abstract
Databases in real life are often neither entirely closed-world nor entirely open-world. Databases in an enterprise are typically partially closed, in which a part of the data is constrained by master data that contains complete information about the enterprise in certain aspects. It has been shown that, despite missing tuples, such a database may turn out to have complete information for answering a query.
This article studies partially closed databases from which both tuples and attribute values may be missing. We specify such a database in terms of conditional tables constrained by master data, referred to as c-instances. We first propose three models to characterize whether a c-instance T is complete for a query Q relative to master data. That is, depending on how missing values in T are instantiated, the answer to Q in T remains unchanged when new tuples are added. We then investigate three problems, to determine (a) whether a given c-instance is complete for a query Q, (b) whether there exists a c-instance that is complete for Q relative to master data available, and (c) whether a c-instance is a minimal-size database that is complete for Q. We establish matching lower and upper bounds on these problems for queries expressed in a variety of languages in each of the three models for specifying relative completeness.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Capturing Missing Tuples and Missing Values
- Serge Abiteboul and Oliver M. Duschka. 1998. Complexity of answering queries using materialized views. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’98). ACM, New York, NY, 254--263. DOI:http://dx.doi.org/10.1145/275487.275516 Google ScholarDigital Library
- Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley, New York, NY. Google ScholarDigital Library
- Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. 1991. On the representation and querying of sets of possible worlds. Theoretical Computer Science 78, 1, 159--187. DOI:http://dx.doi.org/10.1016/0304-3975(51)90007-2 Google ScholarDigital Library
- Marcelo Arenas, Leopoldo Bertossi, and Jan Chomicki. 1999. Consistent query answers in inconsistent databases. In Proceedings of the 18th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’99). ACM, New York, NY, 68--79. DOI:http://dx.doi.org/10.1145/303976.303983 Google ScholarDigital Library
- Marcelo Arenas, Jorge Pérez, Juan L. Reutter, and Cristian Riveros. 2009. Composition and inversion of schema mappings. SIGMOD Record 38, 3, 17--28. DOI:http://dx.doi.org/10.1145/1815933.1815938 Google ScholarDigital Library
- Yang Cao, Ting Deng, Wenfei Fan, and Floris Geerts. 2014. On the data complexity of relative information completeness. Information Systems 45, 18--34. DOI:http://dx.doi.org/10.1016/j.is.2014.04.001Google ScholarCross Ref
- Jan Chomicki. 2007. Consistent query answering: Five easy pieces. In Proceedings of the 11th International Conference on Database Theory (ICDT’07). Springer-Verlag, New York, NY, 1--17. DOI:http://dx.doi.org/10.1007/11965893_1 Google ScholarDigital Library
- Charles Elkan. 1990. Independence of logic database queries and update. In Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, NY, 154--160. DOI:http://dx.doi.org/10.1145/298514.298557 Google ScholarDigital Library
- Wenfei Fan. 2008. Dependencies revisited for improving data quality. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’08). ACM, New York, NY, 159--170. DOI:http://dx.doi.org/10.1145/1376916.1376940 Google ScholarDigital Library
- Wenfei Fan and Floris Geerts. 2009. Relative information completeness. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’09). ACM, New York, NY, 97--106. DOI:http://dx.doi.org/10.1145/1559795.1559811 Google ScholarDigital Library
- Wenfei Fan and Floris Geerts. 2010a. Capturing missing tuples and missing values. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’10). ACM, New York, NY, 169--178. DOI:http://dx.doi.org/10.1145/1807085.1807109 Google ScholarDigital Library
- Wenfei Fan and Floris Geerts. 2010b. Relative information completeness. ACM Transactions on Database Systems 35, 4, Article 27, 44 pages. DOI:http://dx.doi.org/10.1145/1862919.1862924 Google ScholarDigital Library
- Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers, San Francisco, CA. DOI:http://dx.doi.org/10.2200/S00439ED1V01Y201207DTM030 Google ScholarDigital Library
- Georg Gottlob and Roberto Zicari. 1988. Closed world databases opened through null values. In Proceedings of the 14th International Conference on Very Large Data Bases (VLDB’88). Morgan Kaufmann Publishers Inc., San Francisco, CA, 50--61. http://dl.acm.org/citation.cfm?id=645915.671794 Google ScholarDigital Library
- Gösta Grahne. 1991. The Problem of Incomplete Information in Relational Databases. Lecture Notes in Computer Science, Vol. 554. Springer. DOI:http://dx.doi.org/10.1007/3-540-54919-6 Google ScholarDigital Library
- Tomasz Imieliński and Witold Lipski, Jr. 1984. Incomplete information in relational databases. Journal of the ACM 31, 4, 761--791. DOI:http://dx.doi.org/10.1145/1634.1886 Google ScholarDigital Library
- Phokion G. Kolaitis. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’05). ACM, New York, NY, 61--75. DOI:http://dx.doi.org/10.1145/1065167.1065176 Google ScholarDigital Library
- Willis Lang, Rimma V. Nehme, Eric Robinson, and Jeffrey F. Naughton. 2014. Partial results in database systems. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD’14). ACM, New York, NY, 1275--1286. DOI:http://dx.doi.org/10.1145/2588555.2612176 Google ScholarDigital Library
- Maurizio Lenzerini. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’02). ACM, New York, NY, 233--246. DOI:http://dx.doi.org/10.1145/543613.543644 Google ScholarDigital Library
- Alon Levy, Inderpal Singh Mumick, Yehoshua Sagiv, and Oded Shmueli. 1993. Equivalence, query-reachability and satisfiability in datalog extensions. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’93). ACM, New York, NY, 109--122. DOI:http://dx.doi.org/10.1145/153850.153860 Google ScholarDigital Library
- Alon Y. Levy. 1996. Obtaining complete answers from incomplete databases. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB’96). Morgan Kaufmann Publishers Inc., San Francisco, CA, 402--412. http://dl.acm.org/citation.cfm?id=645922.673332 Google ScholarDigital Library
- Alon Y. Levy and Yehoshua Sagiv. 1993. Queries independent of updates. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB’93). Morgan Kaufmann Publishers Inc., San Francisco, CA, 171--181. http://dl.acm.org/citation.cfm?id=645919.672674 Google ScholarDigital Library
- Leonid Libkin. 2014. Incomplete data: What went wrong, and how to fix it. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’14). ACM, New York, NY, 1--13. DOI:http://dx.doi.org/10.1145/2594538.2594561 Google ScholarDigital Library
- David Loshin. 2008. Master Data Management. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google ScholarDigital Library
- Microsoft. 2008. SQL Server 2008 R2 Master Data Services. Retrieved April 3, 2016 from http://www.microsoft.com/sqlserver/2008/en/us/MDS.aspx.Google Scholar
- Donald W. Miller Jr., John D. Yeast, and Robin L. Evans. 2005. Missing prenatal records at a birth center: A communication problem quantified. In AMIA Annual Symposium Proceedings (AMIA’05). 535--539.Google Scholar
- Amihai Motro. 1989. Integrity = validity + completeness. ACM Transactions on Database Systems 14, 4, 480--502. DOI:http://dx.doi.org/10.1145/76902.76904 Google ScholarDigital Library
- Dan Olteanu, Christoph Koch, and Lyublena Antova. 2008. World-set decompositions: Expressiveness and efficient algorithms. Theoretical Computer Science 403, 2--3, 265--284. Google ScholarDigital Library
- Christos H. Papadimitriou. 1994. Computational Complexity. Addison-Wesley, New York, NY.Google Scholar
- John Radcliffe and Andrew White. 2008. Key issues for Master Data Management. Gartner. Retrieved April 3, 2016 from https://www.gartner.com/doc/590507/key-issues-master-data-management.Google Scholar
- Luc Segoufin and Victor Vianu. 2005. Views and queries: Determinacy and rewriting. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’05). ACM, New York, NY, 49--60. DOI:http://dx.doi.org/10.1145/1065167.1065174 Google ScholarDigital Library
- Mark Spielmann. 2000. Abstract State Machines: Verification Problems and Complexity. Ph.D. Dissertation. RWTH Aachen University, Aachen, Germany.Google Scholar
- Boris Trakhtenbrot. 1950. The impossibility of an algorithm for the decidability problem on finite classes. Doklady Akademii Nauk SSSR 70, 4, 569--572.Google Scholar
- Ron van der Meyden. 1998. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems, J. Chomicki and G. Saake (Eds.). Kluwer, 307--356. Google ScholarDigital Library
- Moshe Vardi. 1986. On the integrity of databases with incomplete information. In Proceedings of the 5th ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS’86). ACM, New York, NY, 252--266. DOI:http://dx.doi.org/10.1145/6012.15419 Google ScholarDigital Library
- Michael Wooldridge and Paul E. Dunne. 2004. On the computational complexity of qualitative coalitional games. Artificial Intelligence 158, 1, 27--73. DOI:http://dx.doi.org/10.1016/j.artint.2004.04.002 Google ScholarDigital Library
Index Terms
- Capturing Missing Tuples and Missing Values
Recommendations
Capturing missing tuples and missing values
PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsDatabases in real life are often neither entirely closed-world nor entirely open-world. Indeed, databases in an enterprise are typically partially closed, in which a part of the data is constrained by master data that contains complete information about ...
Relative information completeness
This article investigates the question of whether a partially closed database has complete information to answer a query. In practice an enterprise often maintains master data Dm, a closed-world database. We say that a database D is partially closed if ...
Missing Values: Proposition of a Typology and Characterization with an Association Rule-Based Model
DaWaK '09: Proceedings of the 11th International Conference on Data Warehousing and Knowledge DiscoveryHandling missing values when tackling real-world datasets is a great challenge arousing the interest of many scientific communities. Many works propose completion methods or implement new data mining techniques tolerating the presence of missing values. ...
Comments