Abstract
Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data.
In this article, we start looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies makes some XML data exchange settings inconsistent. We investigate the consistency problem and determine its exact complexity.
We then move to query answering, and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete, depending on classes of regular expressions used in DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.
- Abiteboul, S., and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 254--263. Google ScholarDigital Library
- Abiteboul, S., Kanellakis, P. C., and Grahne, G. 1991. On the representation and querying of sets of possible worlds. Theoret. Comput. Sci. 78, 1, 158--187. Google ScholarDigital Library
- Abiteboul, S., Segoufin, L., and Vianu, V. 2001. Representing and querying XML with incomplete information. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 150--161. Google ScholarDigital Library
- Amer-Yahia, S., Cho, S., Lakshmanan, L. V. S., and Srivastava, D. 2002. Tree pattern query minimization. VLDB J. 11, 4, 315--331. Google ScholarDigital Library
- Amer-Yahia, S., and Kotidis, Y. 2004. Web-services architecture for efficient XML data exchange. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). IEEE Computer Society Press, Los Alamitos, CA, 523--534. Google ScholarDigital Library
- Arenas, M., Barceló, P., Fagin, R., and Libkin, L. 2004. Locally consistent transformations and query answering in data exchange. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 229--240. Google ScholarDigital Library
- Arenas, M., and Libkin, L. 2005. XML data exchange: consistency and query answering. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 13--24. Google ScholarDigital Library
- Benedikt, M., Fan, W., and Kuper, G. M. 2003. Structural properties of XPath fragments. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 79--95. Google ScholarDigital Library
- Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., and Tommasi, M. 2007. Tree automata techniques and applications. (Available on: http://www.grappa.univ-lille3.fr/tata. release October, 12th 2007).Google Scholar
- Deutsch, A., and Tannen, V. 2001. Containment and integrity constraints for XPath. In Proceedings of the International Workshop on Knowledge Representation meets Databases (KRDB). CEUR-WS.org.Google Scholar
- Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005a. Data exchange: Semantics and query answering. Theoret. Comput. Sci. 336, 1, 89--124. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., and Popa, L. 2005b. Data exchange: getting to the core. ACM Trans. Datab. Syst. 30, 1, 174--210. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W. C. 2005c. Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Datab. Syst. 30, 4, 994--1055. Google ScholarDigital Library
- Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman. Google ScholarDigital Library
- Gottlob, G., Koch, C., and Schulz, K. U. 2006. Conjunctive queries over trees. J. ACM 53, 2, 238--272. Google ScholarDigital Library
- Imielinski, T., and Lipski, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarDigital Library
- Kozen, D. 2002. On two letters versus three. In Proceedings of the Fixed Points in Computer Science (FICS). University of Aarhus, 44--50.Google Scholar
- Krishnamurthy, R., Kaushik, R., and Naughton, J. F. 2003. XML-SQL query translation literature: The state of the art and open problems. In Proceedings of the International XML Database Symposium (XSym). Springer-Verlag, Heidelberg, Germany, 1--18.Google Scholar
- Lakshmanan, L. V. S., Ramesh, G., Wang, H., and Zhao, Z. 2004. On testing satisfiability of tree pattern queries. In Proceedings of the International Conference Very Large Data Bases (VLDB). Morgan Kaufmann, San Mateo, CA, 120--131. Google ScholarDigital Library
- Lenstra, H. W. 1983. Integer programming in a fixed number of variables. Math. Oper. Res. 8, 4, 538--548.Google ScholarDigital Library
- Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 233--246. Google ScholarDigital Library
- Miller, R. J., Hernández, M. A., Haas, L. M., Yan, L.-L., Ho, C. T. H., Fagin, R., and Popa, L. 2001. The Clio project: Managing heterogeneity. SIGMOD Record 30, 1, 78--83. Google ScholarDigital Library
- Neven, F. 2002. Automata, logic, and XML. In Proceedings of the Conference for Computer Science Logic (CSL). Springer-Verlag, Heidelberg, Germany, 2--26. Google ScholarDigital Library
- Neven, F., and Schwentick, T. 2003. XPath containment in the presence of disjunction, DTDs, and variables. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 312--326. Google ScholarDigital Library
- Popa, L., Velegrakis, Y., Miller, R. J., Hernández, M. A., and Fagin, R. 2002. Translating web data. In Proceedings of the International Conference Very Large Data Bases (VLDB). Morgan-Kaufmann, San Mateo, CA, 598--609. Google ScholarDigital Library
- Seidl, H. 1990. Deciding equivalence of finite tree automata. SIAM J. Comput. 19, 3, 424--437. Google ScholarDigital Library
- Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., and Lum, V. Y. 1977. Express: A data extraction, processing, and restructuring system. ACM Trans. Database Syst. 2, 2, 134--174. Google ScholarDigital Library
- Vianu, V. 2001. A web odyssey: From Codd to XML. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 1--15. Google ScholarDigital Library
- Wood, P. T. 2003. Containment for XPath fragments under DTD constraints. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 297--311. Google ScholarDigital Library
- Yu, C., and Popa, L. 2004. Constraint-based xml query rewriting for data integration. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, 371--382. Google ScholarDigital Library
Index Terms
- XML data exchange: Consistency and query answering
Recommendations
XML Schema Mappings: Data Exchange and Metadata Management
Relational schema mappings have been extensively studied in connection with data integration and exchange problems, but mappings between XML schemas have not received the same amount of attention. Our goal is to develop a theory of expressive XML schema ...
XML data exchange with target constraints
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema, by following a mapping between the two schemas. There is a rich literature on problems related to data exchange, e.g., the design ...
Exchanging intensional XML data
Special Issue: SIGMOD/PODS 2003XML is becoming the universal format for data exchange between applications. Recently, the emergence of Web services as standard means of publishing and accessing data on the Web introduced a new class of XML documents, which we call intensional ...
Comments