skip to main content
research-article

XML data exchange: Consistency and query answering

Published:15 May 2008Publication History
Skip Abstract Section

Abstract

Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data.

In this article, we start looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies makes some XML data exchange settings inconsistent. We investigate the consistency problem and determine its exact complexity.

We then move to query answering, and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete, depending on classes of regular expressions used in DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.

References

  1. Abiteboul, S., and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 254--263. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Abiteboul, S., Kanellakis, P. C., and Grahne, G. 1991. On the representation and querying of sets of possible worlds. Theoret. Comput. Sci. 78, 1, 158--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Abiteboul, S., Segoufin, L., and Vianu, V. 2001. Representing and querying XML with incomplete information. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 150--161. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Amer-Yahia, S., Cho, S., Lakshmanan, L. V. S., and Srivastava, D. 2002. Tree pattern query minimization. VLDB J. 11, 4, 315--331. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Amer-Yahia, S., and Kotidis, Y. 2004. Web-services architecture for efficient XML data exchange. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). IEEE Computer Society Press, Los Alamitos, CA, 523--534. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Arenas, M., Barceló, P., Fagin, R., and Libkin, L. 2004. Locally consistent transformations and query answering in data exchange. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 229--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Arenas, M., and Libkin, L. 2005. XML data exchange: consistency and query answering. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 13--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Benedikt, M., Fan, W., and Kuper, G. M. 2003. Structural properties of XPath fragments. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 79--95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., and Tommasi, M. 2007. Tree automata techniques and applications. (Available on: http://www.grappa.univ-lille3.fr/tata. release October, 12th 2007).Google ScholarGoogle Scholar
  10. Deutsch, A., and Tannen, V. 2001. Containment and integrity constraints for XPath. In Proceedings of the International Workshop on Knowledge Representation meets Databases (KRDB). CEUR-WS.org.Google ScholarGoogle Scholar
  11. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005a. Data exchange: Semantics and query answering. Theoret. Comput. Sci. 336, 1, 89--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fagin, R., Kolaitis, P. G., and Popa, L. 2005b. Data exchange: getting to the core. ACM Trans. Datab. Syst. 30, 1, 174--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W. C. 2005c. Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Datab. Syst. 30, 4, 994--1055. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gottlob, G., Koch, C., and Schulz, K. U. 2006. Conjunctive queries over trees. J. ACM 53, 2, 238--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Imielinski, T., and Lipski, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Kozen, D. 2002. On two letters versus three. In Proceedings of the Fixed Points in Computer Science (FICS). University of Aarhus, 44--50.Google ScholarGoogle Scholar
  18. Krishnamurthy, R., Kaushik, R., and Naughton, J. F. 2003. XML-SQL query translation literature: The state of the art and open problems. In Proceedings of the International XML Database Symposium (XSym). Springer-Verlag, Heidelberg, Germany, 1--18.Google ScholarGoogle Scholar
  19. Lakshmanan, L. V. S., Ramesh, G., Wang, H., and Zhao, Z. 2004. On testing satisfiability of tree pattern queries. In Proceedings of the International Conference Very Large Data Bases (VLDB). Morgan Kaufmann, San Mateo, CA, 120--131. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Lenstra, H. W. 1983. Integer programming in a fixed number of variables. Math. Oper. Res. 8, 4, 538--548.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Miller, R. J., Hernández, M. A., Haas, L. M., Yan, L.-L., Ho, C. T. H., Fagin, R., and Popa, L. 2001. The Clio project: Managing heterogeneity. SIGMOD Record 30, 1, 78--83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Neven, F. 2002. Automata, logic, and XML. In Proceedings of the Conference for Computer Science Logic (CSL). Springer-Verlag, Heidelberg, Germany, 2--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Neven, F., and Schwentick, T. 2003. XPath containment in the presence of disjunction, DTDs, and variables. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 312--326. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Popa, L., Velegrakis, Y., Miller, R. J., Hernández, M. A., and Fagin, R. 2002. Translating web data. In Proceedings of the International Conference Very Large Data Bases (VLDB). Morgan-Kaufmann, San Mateo, CA, 598--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Seidl, H. 1990. Deciding equivalence of finite tree automata. SIAM J. Comput. 19, 3, 424--437. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., and Lum, V. Y. 1977. Express: A data extraction, processing, and restructuring system. ACM Trans. Database Syst. 2, 2, 134--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Vianu, V. 2001. A web odyssey: From Codd to XML. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 1--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Wood, P. T. 2003. Containment for XPath fragments under DTD constraints. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 297--311. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Yu, C., and Popa, L. 2004. Constraint-based xml query rewriting for data integration. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, 371--382. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. XML data exchange: Consistency and query answering

            Recommendations

            Reviews

            Anthony Joseph Duben

            Extensible Markup Language (XML) data exchange is the process by which XML documents structured in a source document type definition (DTD) are queried under a different target DTD, given a set of transformation rules between the source DTD and target DTD. This task is to be accomplished in a manner that remains consistent with the structure of the source DTD. The process is unrelated to the use of the same DTD to structure documents passed among different applications that parse them using the same DTD rules. XML data exchange is a new area of research. Most of the important papers on this topic were published within the past eight years. This long mini-monograph was written to initiate and stimulate discussion of the basic theoretical issues underlying XML data exchange. Theoretical work so far has been concentrated on the relational model. The authors begin with these formulations and burrow deeper. Because of the length of this paper, it is better considered to be something of a short book. The paper is divided into seven sections and an appendix. The first section is an introduction, which poses the problem by employing an example of finding a book using two different DTDs. Section 2 states basic definitions, notational conventions, and shorthand representations in describing the contents of a DTD. These are important, since the source DTD is rewritten on the fly when a query is made in the target DTD. The third section is on XML data exchange settings. The authors depart from the relational source-to-target dependencies in the literature to use a formalism based on a tree pattern like the XML languages themselves. The fourth section continues the discussion of data exchange settings to study consistency, namely, whether a solution in the transformation between DTDs exists. Section 5 is on query answering. Certain answers, defined as the intersection of query results over all solutions, are sought. The sixth section occupies nearly half of the paper. It is on computing certain answers, and classifying them according to complexity. The seventh section briefly summarizes the accomplishments of the paper, and poses additional areas for further research. The appendix goes into the details of some of the proofs in the previous sections. This paper is a major contribution in a new research area. However, it is very terse (if that could be an appropriate description of a 72-page paper) because there is so much here. There is enough covered here to fill a book an order of magnitude larger in size. Online Computing Reviews Service

            Access critical reviews of Computing literature here

            Become a reviewer for Computing Reviews.

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Journal of the ACM
              Journal of the ACM  Volume 55, Issue 2
              May 2008
              282 pages
              ISSN:0004-5411
              EISSN:1557-735X
              DOI:10.1145/1346330
              Issue’s Table of Contents

              Copyright © 2008 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 15 May 2008
              • Revised: 1 January 2008
              • Accepted: 1 January 2008
              • Received: 1 December 2005
              Published in jacm Volume 55, Issue 2

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader