skip to main content
research-article
Public Access

Capturing Missing Tuples and Missing Values

Authors Info & Claims
Published:11 May 2016Publication History
Skip Abstract Section

Abstract

Databases in real life are often neither entirely closed-world nor entirely open-world. Databases in an enterprise are typically partially closed, in which a part of the data is constrained by master data that contains complete information about the enterprise in certain aspects. It has been shown that, despite missing tuples, such a database may turn out to have complete information for answering a query.

This article studies partially closed databases from which both tuples and attribute values may be missing. We specify such a database in terms of conditional tables constrained by master data, referred to as c-instances. We first propose three models to characterize whether a c-instance T is complete for a query Q relative to master data. That is, depending on how missing values in T are instantiated, the answer to Q in T remains unchanged when new tuples are added. We then investigate three problems, to determine (a) whether a given c-instance is complete for a query Q, (b) whether there exists a c-instance that is complete for Q relative to master data available, and (c) whether a c-instance is a minimal-size database that is complete for Q. We establish matching lower and upper bounds on these problems for queries expressed in a variety of languages in each of the three models for specifying relative completeness.

Skip Supplemental Material Section

Supplemental Material

References

  1. Serge Abiteboul and Oliver M. Duschka. 1998. Complexity of answering queries using materialized views. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’98). ACM, New York, NY, 254--263. DOI:http://dx.doi.org/10.1145/275487.275516 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. 1991. On the representation and querying of sets of possible worlds. Theoretical Computer Science 78, 1, 159--187. DOI:http://dx.doi.org/10.1016/0304-3975(51)90007-2 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Marcelo Arenas, Leopoldo Bertossi, and Jan Chomicki. 1999. Consistent query answers in inconsistent databases. In Proceedings of the 18th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’99). ACM, New York, NY, 68--79. DOI:http://dx.doi.org/10.1145/303976.303983 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Marcelo Arenas, Jorge Pérez, Juan L. Reutter, and Cristian Riveros. 2009. Composition and inversion of schema mappings. SIGMOD Record 38, 3, 17--28. DOI:http://dx.doi.org/10.1145/1815933.1815938 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Yang Cao, Ting Deng, Wenfei Fan, and Floris Geerts. 2014. On the data complexity of relative information completeness. Information Systems 45, 18--34. DOI:http://dx.doi.org/10.1016/j.is.2014.04.001Google ScholarGoogle ScholarCross RefCross Ref
  7. Jan Chomicki. 2007. Consistent query answering: Five easy pieces. In Proceedings of the 11th International Conference on Database Theory (ICDT’07). Springer-Verlag, New York, NY, 1--17. DOI:http://dx.doi.org/10.1007/11965893_1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Charles Elkan. 1990. Independence of logic database queries and update. In Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, NY, 154--160. DOI:http://dx.doi.org/10.1145/298514.298557 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Wenfei Fan. 2008. Dependencies revisited for improving data quality. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’08). ACM, New York, NY, 159--170. DOI:http://dx.doi.org/10.1145/1376916.1376940 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Wenfei Fan and Floris Geerts. 2009. Relative information completeness. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’09). ACM, New York, NY, 97--106. DOI:http://dx.doi.org/10.1145/1559795.1559811 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Wenfei Fan and Floris Geerts. 2010a. Capturing missing tuples and missing values. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’10). ACM, New York, NY, 169--178. DOI:http://dx.doi.org/10.1145/1807085.1807109 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Wenfei Fan and Floris Geerts. 2010b. Relative information completeness. ACM Transactions on Database Systems 35, 4, Article 27, 44 pages. DOI:http://dx.doi.org/10.1145/1862919.1862924 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers, San Francisco, CA. DOI:http://dx.doi.org/10.2200/S00439ED1V01Y201207DTM030 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Georg Gottlob and Roberto Zicari. 1988. Closed world databases opened through null values. In Proceedings of the 14th International Conference on Very Large Data Bases (VLDB’88). Morgan Kaufmann Publishers Inc., San Francisco, CA, 50--61. http://dl.acm.org/citation.cfm?id=645915.671794 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gösta Grahne. 1991. The Problem of Incomplete Information in Relational Databases. Lecture Notes in Computer Science, Vol. 554. Springer. DOI:http://dx.doi.org/10.1007/3-540-54919-6 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Tomasz Imieliński and Witold Lipski, Jr. 1984. Incomplete information in relational databases. Journal of the ACM 31, 4, 761--791. DOI:http://dx.doi.org/10.1145/1634.1886 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Phokion G. Kolaitis. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’05). ACM, New York, NY, 61--75. DOI:http://dx.doi.org/10.1145/1065167.1065176 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Willis Lang, Rimma V. Nehme, Eric Robinson, and Jeffrey F. Naughton. 2014. Partial results in database systems. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD’14). ACM, New York, NY, 1275--1286. DOI:http://dx.doi.org/10.1145/2588555.2612176 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Maurizio Lenzerini. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’02). ACM, New York, NY, 233--246. DOI:http://dx.doi.org/10.1145/543613.543644 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Alon Levy, Inderpal Singh Mumick, Yehoshua Sagiv, and Oded Shmueli. 1993. Equivalence, query-reachability and satisfiability in datalog extensions. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’93). ACM, New York, NY, 109--122. DOI:http://dx.doi.org/10.1145/153850.153860 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Alon Y. Levy. 1996. Obtaining complete answers from incomplete databases. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB’96). Morgan Kaufmann Publishers Inc., San Francisco, CA, 402--412. http://dl.acm.org/citation.cfm?id=645922.673332 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Alon Y. Levy and Yehoshua Sagiv. 1993. Queries independent of updates. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB’93). Morgan Kaufmann Publishers Inc., San Francisco, CA, 171--181. http://dl.acm.org/citation.cfm?id=645919.672674 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Leonid Libkin. 2014. Incomplete data: What went wrong, and how to fix it. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’14). ACM, New York, NY, 1--13. DOI:http://dx.doi.org/10.1145/2594538.2594561 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. David Loshin. 2008. Master Data Management. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Microsoft. 2008. SQL Server 2008 R2 Master Data Services. Retrieved April 3, 2016 from http://www.microsoft.com/sqlserver/2008/en/us/MDS.aspx.Google ScholarGoogle Scholar
  26. Donald W. Miller Jr., John D. Yeast, and Robin L. Evans. 2005. Missing prenatal records at a birth center: A communication problem quantified. In AMIA Annual Symposium Proceedings (AMIA’05). 535--539.Google ScholarGoogle Scholar
  27. Amihai Motro. 1989. Integrity = validity + completeness. ACM Transactions on Database Systems 14, 4, 480--502. DOI:http://dx.doi.org/10.1145/76902.76904 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Dan Olteanu, Christoph Koch, and Lyublena Antova. 2008. World-set decompositions: Expressiveness and efficient algorithms. Theoretical Computer Science 403, 2--3, 265--284. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Christos H. Papadimitriou. 1994. Computational Complexity. Addison-Wesley, New York, NY.Google ScholarGoogle Scholar
  30. John Radcliffe and Andrew White. 2008. Key issues for Master Data Management. Gartner. Retrieved April 3, 2016 from https://www.gartner.com/doc/590507/key-issues-master-data-management.Google ScholarGoogle Scholar
  31. Luc Segoufin and Victor Vianu. 2005. Views and queries: Determinacy and rewriting. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’05). ACM, New York, NY, 49--60. DOI:http://dx.doi.org/10.1145/1065167.1065174 Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Mark Spielmann. 2000. Abstract State Machines: Verification Problems and Complexity. Ph.D. Dissertation. RWTH Aachen University, Aachen, Germany.Google ScholarGoogle Scholar
  33. Boris Trakhtenbrot. 1950. The impossibility of an algorithm for the decidability problem on finite classes. Doklady Akademii Nauk SSSR 70, 4, 569--572.Google ScholarGoogle Scholar
  34. Ron van der Meyden. 1998. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems, J. Chomicki and G. Saake (Eds.). Kluwer, 307--356. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Moshe Vardi. 1986. On the integrity of databases with incomplete information. In Proceedings of the 5th ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS’86). ACM, New York, NY, 252--266. DOI:http://dx.doi.org/10.1145/6012.15419 Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Michael Wooldridge and Paul E. Dunne. 2004. On the computational complexity of qualitative coalitional games. Artificial Intelligence 158, 1, 27--73. DOI:http://dx.doi.org/10.1016/j.artint.2004.04.002 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Capturing Missing Tuples and Missing Values

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 41, Issue 2
        Invited Paper from SIGMOD 2014 and Regular Papers
        June 2016
        271 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/2936309
        Issue’s Table of Contents

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 May 2016
        • Accepted: 1 December 2015
        • Revised: 1 October 2015
        • Received: 1 March 2015
        Published in tods Volume 41, Issue 2

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader