Abstract
We study models of incomplete information for XML, their computational properties, and query answering. While our approach is motivated by the study of relational incompleteness, incomplete information in XML documents may appear not only as null values but also as missing structural information. Our goal is to provide a classification of incomplete descriptions of XML documents, and separate features—or groups of features—that lead to hard computational problems from those that admit efficient algorithms. Our classification of incomplete information is based on the combination of null values with partial structural descriptions of documents. The key computational problems we consider are consistency of partial descriptions, representability of complete documents by incomplete ones, and query answering. We show how factors such as schema information, the presence of node ids, and missing structural information affect the complexity of these main computational problems, and find robust classes of incomplete XML descriptions that permit tractable query evaluation.
Supplemental Material
Available for Download
Online appendix to XML with incomplete information on article 4.
- Abiteboul, S., and Duschka, O. 1998. Complexity of answering queries using materialized views. In Proceedings of the 17th ACM Symposium on Principles of Database Systems, PODS'98. 254--263. Google ScholarDigital Library
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Abiteboul, S., Kanellakis, P., and Grahne, G. 1991. On the representation and querying of sets of possible worlds. Theoret. Comput. Sci. 78, 1, 158--187. Google ScholarDigital Library
- Abiteboul, S., Segoufin, L., and Vianu, V. 2006. Representing and querying XML with incomplete information. ACM Trans. Datab. Syst. 31, 1, 208--254. Google ScholarDigital Library
- Albert, J., Giammarresi, D., and Wood., D. 2001. Normal form algorithms for extended context-free grammars. Theoreti. Comput. Sci. 267, 1-2, 35--47. Google ScholarDigital Library
- Arenas, M., Fan, W., and Libkin, L. 2008. On the complexity of verifying consistency of XML specifications. SIAM J. Comput. 38, 3, 841--880. Google ScholarDigital Library
- Arenas, M., and Libkin, L. 2008. XML data exchange: Consistency and query answering. J. ACM 55, 2. Google ScholarDigital Library
- Barceló, P., Libkin, L., Poggi, A., and Sirangelo, C. 2009. XML with incomplete information: models, properties, and query answering. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 237--246. Google ScholarDigital Library
- Benedikt, M., Fan, W., and Geerts, F. 2008. XPath satisfiability in the presence of DTDs. J. ACM 55, 2. Google ScholarDigital Library
- Björklund, H., Martens, W., and Schwentick, T. 2007. Conjunctive query containment over trees. In Proceedings of the International Symposium on Database Programming Languages, M. Arenas and M. I. Schwartzbach, Eds. Lecture Notes in Computer Science, vol. 4797. Springer, 66--80. Google ScholarDigital Library
- Björklund, H., Martens, W., and Schwentick, T. 2008. Optimizing conjunctive queries over trees using schema information. In Proceedings of the 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS'08). Lecture Notes in Computer Science, vol. 5162, Springer. 132--143. Google ScholarDigital Library
- Calì, A., Lembo, D., and Rosati, R. 2003. On the decidability and complexity of query answering over inconsistent and incomplete databases. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 260--271. Google ScholarDigital Library
- Calvanese, D., Giacomo, G. D., and Lenzerini, M. 1998. Semi-structured data with constraints and incomplete information. In Proceedings of the Workshop on Description Logics.Google Scholar
- Calvanese, D., Giacomo, G. D., and Lenzerini, M. 2002. Representing and reasoning on XML documents: a description logic approach. J. Logic Comput. 9, 3, 295--318.Google ScholarCross Ref
- Cohen, S., Kimelfeld, B., and Sagiv, Y. 2008. Incorporating constraints in probabilistic XML. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 109--118. Google ScholarDigital Library
- Date, C., and Darwin, H. 1996. A Guide to the SQL Standard. Addison-Wesley.Google Scholar
- David, C. 2008. Complexity of data tree patterns over XML documents. In Proceedings of the 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS'08). Lecture Notes in Computer Science, vol. 5162, Springer, 278--289. Google ScholarDigital Library
- Deutsch, A., and Tannen, V. 2003. Reformulation of XML Queries and Constraints. In Proceedings of the International Conference on Database Theory (ICDT). 225--241. Google ScholarDigital Library
- DOM. April 2004. Document object model (DOM). W3C recommendation. http://www.w3.org/ TR/DOM-Level-3-Core/.Google Scholar
- Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005. Data exchange: Semantics and query answering. Theor. Comput. Sci. 336, 89--124. Google ScholarDigital Library
- Fan, W., and Libkin, L. 2002. On XML integrity constraints in the presence of DTDs. J. ACM 49, 3, 368--406. Google ScholarDigital Library
- Figueira, D. 2009. Satisfiability of downward XPath with data equality tests. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 197--206. Google ScholarDigital Library
- Gardner, P., Smith, G., Wheelhouse, M., and Zarfaty, U. 2008. Local Hoare reasoning about DOM. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 261--270. Google ScholarDigital Library
- Gottlob, G., Koch, C., and Schulz, K. 2006. Conjunctive queries over trees. J. ACM 53, 2, 238--272. Google ScholarDigital Library
- Imielinski, T., and Lipski, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarDigital Library
- Kanza, Y., Nutt, W., and Sagiv, Y. 2002. Querying incomplete information in semistructured data. J. Comput. Syst. Sci. 64, 3, 655--693.Google ScholarDigital Library
- Kolaitis, P., and Vardi, M. 2007. A logical approach to constraint satisfaction. In Finite Model Theory and Its Applications. Springer, 339--370.Google Scholar
- Lenzerini, M. 2002. Data integration: a theoretical perspective. In Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS'02). 233--246. Google ScholarDigital Library
- Libkin, L. 2004. Elements of Finite Model Theory 1st Ed. Springer-Verlag. Google ScholarDigital Library
- Martens, W., Neven, F., Schwentick, T., and Bex, G. J. 2006. Expressiveness and complexity of XML Schema. ACM Trans. Datab. Syst. 31, 3, 770--813. Google ScholarDigital Library
- Neven, F., and Schwentick, T. 2006. On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Meth. Comput. Sci. 2, 3.Google ScholarCross Ref
- Olteanu, D., Koch, C., and Antova, L. 2008. World-set decompositions: expressiveness and efficient algorithms. Theor. Comput. Sci. 403, 2-3, 265--284. Google ScholarDigital Library
- Schwentick, T. 2008. A little bit infinite? On adding data to finitely labelled structures. In Proceedings of the Symposium on Theoretical Aspects of Computer Science. 17--18.Google Scholar
- Segoufin, L. 2006. Automata and logics for words and trees over an infinite alphabet. In Comput. Sci. Logic. 41--57. Google ScholarDigital Library
- Senellart, P. and Abiteboul, S. 2007. On the complexity of managing probabilistic XML data. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 283--292. Google ScholarDigital Library
- ten Cate, B., and Lutz, C. 2009. The complexity of query containment in expressive fragments of XPath 2.0. J. ACM 56, 6. Google ScholarDigital Library
- Vardi, M. 1986. Querying logical databases. J. Comput. Syst. Sci. 33, 2, 142--160. Google ScholarDigital Library
- Wood, P. 2003. Containment for XPath fragments under DTD constraints. In Proceedings of the International Conference on Database Theory (ICDT). 300--314. Google ScholarDigital Library
Index Terms
- XML with incomplete information
Recommendations
XML with incomplete information: models, properties, and query answering
PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsWe study models of incomplete information for XML, their computational properties, and query answering. While our approach is motivated by the study of relational incompleteness, incomplete information in XML documents may appear not only as null values ...
XML Schema Mappings: Data Exchange and Metadata Management
Relational schema mappings have been extensively studied in connection with data integration and exchange problems, but mappings between XML schemas have not received the same amount of attention. Our goal is to develop a theory of expressive XML schema ...
On the complexity of query answering over incomplete XML documents
ICDT '12: Proceedings of the 15th International Conference on Database TheoryPrevious studies of incomplete XML documents have identified three main sources of incompleteness -- in structural information, data values, and labeling -- and addressed data complexity of answering analogs of unions of conjunctive queries under the ...
Comments