skip to main content
research-article

XML with incomplete information

Published:21 December 2010Publication History
Skip Abstract Section

Abstract

We study models of incomplete information for XML, their computational properties, and query answering. While our approach is motivated by the study of relational incompleteness, incomplete information in XML documents may appear not only as null values but also as missing structural information. Our goal is to provide a classification of incomplete descriptions of XML documents, and separate features—or groups of features—that lead to hard computational problems from those that admit efficient algorithms. Our classification of incomplete information is based on the combination of null values with partial structural descriptions of documents. The key computational problems we consider are consistency of partial descriptions, representability of complete documents by incomplete ones, and query answering. We show how factors such as schema information, the presence of node ids, and missing structural information affect the complexity of these main computational problems, and find robust classes of incomplete XML descriptions that permit tractable query evaluation.

Skip Supplemental Material Section

Supplemental Material

References

  1. Abiteboul, S., and Duschka, O. 1998. Complexity of answering queries using materialized views. In Proceedings of the 17th ACM Symposium on Principles of Database Systems, PODS'98. 254--263. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Abiteboul, S., Kanellakis, P., and Grahne, G. 1991. On the representation and querying of sets of possible worlds. Theoret. Comput. Sci. 78, 1, 158--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Abiteboul, S., Segoufin, L., and Vianu, V. 2006. Representing and querying XML with incomplete information. ACM Trans. Datab. Syst. 31, 1, 208--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Albert, J., Giammarresi, D., and Wood., D. 2001. Normal form algorithms for extended context-free grammars. Theoreti. Comput. Sci. 267, 1-2, 35--47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Arenas, M., Fan, W., and Libkin, L. 2008. On the complexity of verifying consistency of XML specifications. SIAM J. Comput. 38, 3, 841--880. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Arenas, M., and Libkin, L. 2008. XML data exchange: Consistency and query answering. J. ACM 55, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Barceló, P., Libkin, L., Poggi, A., and Sirangelo, C. 2009. XML with incomplete information: models, properties, and query answering. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 237--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Benedikt, M., Fan, W., and Geerts, F. 2008. XPath satisfiability in the presence of DTDs. J. ACM 55, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Björklund, H., Martens, W., and Schwentick, T. 2007. Conjunctive query containment over trees. In Proceedings of the International Symposium on Database Programming Languages, M. Arenas and M. I. Schwartzbach, Eds. Lecture Notes in Computer Science, vol. 4797. Springer, 66--80. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Björklund, H., Martens, W., and Schwentick, T. 2008. Optimizing conjunctive queries over trees using schema information. In Proceedings of the 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS'08). Lecture Notes in Computer Science, vol. 5162, Springer. 132--143. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Calì, A., Lembo, D., and Rosati, R. 2003. On the decidability and complexity of query answering over inconsistent and incomplete databases. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 260--271. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Calvanese, D., Giacomo, G. D., and Lenzerini, M. 1998. Semi-structured data with constraints and incomplete information. In Proceedings of the Workshop on Description Logics.Google ScholarGoogle Scholar
  14. Calvanese, D., Giacomo, G. D., and Lenzerini, M. 2002. Representing and reasoning on XML documents: a description logic approach. J. Logic Comput. 9, 3, 295--318.Google ScholarGoogle ScholarCross RefCross Ref
  15. Cohen, S., Kimelfeld, B., and Sagiv, Y. 2008. Incorporating constraints in probabilistic XML. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 109--118. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Date, C., and Darwin, H. 1996. A Guide to the SQL Standard. Addison-Wesley.Google ScholarGoogle Scholar
  17. David, C. 2008. Complexity of data tree patterns over XML documents. In Proceedings of the 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS'08). Lecture Notes in Computer Science, vol. 5162, Springer, 278--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Deutsch, A., and Tannen, V. 2003. Reformulation of XML Queries and Constraints. In Proceedings of the International Conference on Database Theory (ICDT). 225--241. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. DOM. April 2004. Document object model (DOM). W3C recommendation. http://www.w3.org/ TR/DOM-Level-3-Core/.Google ScholarGoogle Scholar
  20. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005. Data exchange: Semantics and query answering. Theor. Comput. Sci. 336, 89--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Fan, W., and Libkin, L. 2002. On XML integrity constraints in the presence of DTDs. J. ACM 49, 3, 368--406. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Figueira, D. 2009. Satisfiability of downward XPath with data equality tests. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 197--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Gardner, P., Smith, G., Wheelhouse, M., and Zarfaty, U. 2008. Local Hoare reasoning about DOM. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 261--270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Gottlob, G., Koch, C., and Schulz, K. 2006. Conjunctive queries over trees. J. ACM 53, 2, 238--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Imielinski, T., and Lipski, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Kanza, Y., Nutt, W., and Sagiv, Y. 2002. Querying incomplete information in semistructured data. J. Comput. Syst. Sci. 64, 3, 655--693.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kolaitis, P., and Vardi, M. 2007. A logical approach to constraint satisfaction. In Finite Model Theory and Its Applications. Springer, 339--370.Google ScholarGoogle Scholar
  28. Lenzerini, M. 2002. Data integration: a theoretical perspective. In Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS'02). 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Libkin, L. 2004. Elements of Finite Model Theory 1st Ed. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Martens, W., Neven, F., Schwentick, T., and Bex, G. J. 2006. Expressiveness and complexity of XML Schema. ACM Trans. Datab. Syst. 31, 3, 770--813. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Neven, F., and Schwentick, T. 2006. On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Meth. Comput. Sci. 2, 3.Google ScholarGoogle ScholarCross RefCross Ref
  32. Olteanu, D., Koch, C., and Antova, L. 2008. World-set decompositions: expressiveness and efficient algorithms. Theor. Comput. Sci. 403, 2-3, 265--284. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Schwentick, T. 2008. A little bit infinite? On adding data to finitely labelled structures. In Proceedings of the Symposium on Theoretical Aspects of Computer Science. 17--18.Google ScholarGoogle Scholar
  34. Segoufin, L. 2006. Automata and logics for words and trees over an infinite alphabet. In Comput. Sci. Logic. 41--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Senellart, P. and Abiteboul, S. 2007. On the complexity of managing probabilistic XML data. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 283--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. ten Cate, B., and Lutz, C. 2009. The complexity of query containment in expressive fragments of XPath 2.0. J. ACM 56, 6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Vardi, M. 1986. Querying logical databases. J. Comput. Syst. Sci. 33, 2, 142--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Wood, P. 2003. Containment for XPath fragments under DTD constraints. In Proceedings of the International Conference on Database Theory (ICDT). 300--314. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. XML with incomplete information

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Journal of the ACM
              Journal of the ACM  Volume 58, Issue 1
              December 2010
              158 pages
              ISSN:0004-5411
              EISSN:1557-735X
              DOI:10.1145/1870103
              Issue’s Table of Contents

              Copyright © 2010 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 21 December 2010
              • Accepted: 1 July 2010
              • Revised: 1 April 2010
              • Received: 1 September 2009
              Published in jacm Volume 58, Issue 1

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader