skip to main content
article

Data exchange: getting to the core

Published:01 March 2005Publication History
Skip Abstract Section

Abstract

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier article, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a “most general possible” solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a “best” universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, and has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. We investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. We also analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core of G? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem. Finally, we show that the core is the best among all universal solutions for answering existential queries, and we propose an alternative semantics for answering queries in data exchange settings.

References

  1. Abiteboul, S. and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 254--263. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley, Reading, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Beeri, C. and Vardi, M. Y. 1984. A proof procedure for data dependencies. Journal Assoc. Comput. Mach. 31, 4, 718--741. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Chandra, A. K. and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the ACM Symposium on Theory of Computing (STOC). 77--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cosmadakis, S. 1983. The complexity of evaluating relational queries. Inform. Contr. 58, 101--112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cosmadakis, S. S. and Kanellakis, P. C. 1986. Functional and inclusion dependencies: A graph theoretic approach. In Advances in Computing Research., vol. 3. JAI Press, Greenwich, CT, 163--184.Google ScholarGoogle Scholar
  7. Deutsch, A., Popa, L., and Tannen, V. 1999. Physical data independence, constraints and optimization with universal plans. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 459--470. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Deutsch, A. and Tannen, V. 2003. Reformulation of XML queries and constraints. In Proceedings of the International Conference on Database Theory (ICDT). 225--241. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Fagin, R. 1982. Horn clauses and database dependencies. Journal Assoc. Comput. Mach. 29, 4 (Oct.), 952--985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2003. Data exchange: Semantics and query answering. In Proceedings of the International Conference on Database Theory (ICDT). 207--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Friedman, M., Levy, A. Y., and Millstein, T. D. 1999. Navigational plans for data integration. In Proceedings of the National Conference on Artificial Intelligence (AAAI). 67--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Gottlob, G. 2005. Cores for data exchange: Hard cases and practical solutions. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gottlob, G. and Fermüller, C. 1993. Removing redundancy from a clause. Art. Intell. 61, 2, 263--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Halevy, A. 2001. Answering queries using views: A survey. VLDB J. 10, 4, 270--294. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Hell, P. and Neset&rbreve;il, J. 1992. The core of a graph. Discr. Math. 109, 117--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kanellakis, P. C. 1990. Elements of relational database theory. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Sematics. Elsevier, Amsterdam, The Netherlands, and MIT Press, Cambridge, MA, 1073--1156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Maier, D., Mendelzon, A. O., and Sagiv, Y. 1979. Testing implications of data dependencies. ACM Trans. Database Syst. 4, 4 (Dec.), 455--469. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Miller, R. J., Haas, L. M., and Hernández, M. 2000. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 77--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Papadimitriou, C. and Yannakakis, M. 1982. The complexity of facets and some facets of complexity. In Proceedings of the ACM Symposium on Theory of Computing (STOC). 229--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Papadimitriou, C. H. 1994. Computational Complexity. Addison-Wesley, Reading, MA.Google ScholarGoogle Scholar
  22. Popa, L., Velegrakis, Y., Miller, R. J., Hernandez, M. A., and Fagin, R. 2002. Translating Web data. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 598--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., and Lum, V. Y. 1977. EXPRESS: A data EXtraction, Processing, amd REStructuring System. ACM Trans. Database Syst. 2, 2, 134--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. van der Meyden, R. 1998. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems. Kluwer, Dordrecht, The Netherlands, 307--356. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data exchange: getting to the core

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader