Abstract
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier article, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a “most general possible” solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a “best” universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, and has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. We investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. We also analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core of G? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem. Finally, we show that the core is the best among all universal solutions for answering existential queries, and we propose an alternative semantics for answering queries in data exchange settings.
- Abiteboul, S. and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 254--263. Google ScholarDigital Library
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley, Reading, MA. Google ScholarDigital Library
- Beeri, C. and Vardi, M. Y. 1984. A proof procedure for data dependencies. Journal Assoc. Comput. Mach. 31, 4, 718--741. Google ScholarDigital Library
- Chandra, A. K. and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the ACM Symposium on Theory of Computing (STOC). 77--90. Google ScholarDigital Library
- Cosmadakis, S. 1983. The complexity of evaluating relational queries. Inform. Contr. 58, 101--112. Google ScholarDigital Library
- Cosmadakis, S. S. and Kanellakis, P. C. 1986. Functional and inclusion dependencies: A graph theoretic approach. In Advances in Computing Research., vol. 3. JAI Press, Greenwich, CT, 163--184.Google Scholar
- Deutsch, A., Popa, L., and Tannen, V. 1999. Physical data independence, constraints and optimization with universal plans. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 459--470. Google ScholarDigital Library
- Deutsch, A. and Tannen, V. 2003. Reformulation of XML queries and constraints. In Proceedings of the International Conference on Database Theory (ICDT). 225--241. Google ScholarDigital Library
- Fagin, R. 1982. Horn clauses and database dependencies. Journal Assoc. Comput. Mach. 29, 4 (Oct.), 952--985. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2003. Data exchange: Semantics and query answering. In Proceedings of the International Conference on Database Theory (ICDT). 207--224. Google ScholarDigital Library
- Friedman, M., Levy, A. Y., and Millstein, T. D. 1999. Navigational plans for data integration. In Proceedings of the National Conference on Artificial Intelligence (AAAI). 67--73. Google ScholarDigital Library
- Gottlob, G. 2005. Cores for data exchange: Hard cases and practical solutions. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). Google ScholarDigital Library
- Gottlob, G. and Fermüller, C. 1993. Removing redundancy from a clause. Art. Intell. 61, 2, 263--289. Google ScholarDigital Library
- Halevy, A. 2001. Answering queries using views: A survey. VLDB J. 10, 4, 270--294. Google ScholarDigital Library
- Hell, P. and Neset&rbreve;il, J. 1992. The core of a graph. Discr. Math. 109, 117--126. Google ScholarDigital Library
- Kanellakis, P. C. 1990. Elements of relational database theory. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Sematics. Elsevier, Amsterdam, The Netherlands, and MIT Press, Cambridge, MA, 1073--1156. Google ScholarDigital Library
- Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 233--246. Google ScholarDigital Library
- Maier, D., Mendelzon, A. O., and Sagiv, Y. 1979. Testing implications of data dependencies. ACM Trans. Database Syst. 4, 4 (Dec.), 455--469. Google ScholarDigital Library
- Miller, R. J., Haas, L. M., and Hernández, M. 2000. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 77--88. Google ScholarDigital Library
- Papadimitriou, C. and Yannakakis, M. 1982. The complexity of facets and some facets of complexity. In Proceedings of the ACM Symposium on Theory of Computing (STOC). 229--234. Google ScholarDigital Library
- Papadimitriou, C. H. 1994. Computational Complexity. Addison-Wesley, Reading, MA.Google Scholar
- Popa, L., Velegrakis, Y., Miller, R. J., Hernandez, M. A., and Fagin, R. 2002. Translating Web data. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 598--609. Google ScholarDigital Library
- Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., and Lum, V. Y. 1977. EXPRESS: A data EXtraction, Processing, amd REStructuring System. ACM Trans. Database Syst. 2, 2, 134--174. Google ScholarDigital Library
- van der Meyden, R. 1998. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems. Kluwer, Dordrecht, The Netherlands, 307--356. Google ScholarDigital Library
Index Terms
- Data exchange: getting to the core
Recommendations
Peer data exchange
In this article, we introduce and study a framework, called peer data exchange, for sharing and exchanging data between peers. This framework is a special case of a full-fledged peer data management system and a generalization of data exchange between a ...
Data exchange: computing cores in polynomial time
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsData exchange deals with inserting data from one database into another database having a different schema. We study and solve a central computational problem of data exchange, namely, computing the core of a universal solution to a data exchange ...
Efficient core computation in data exchange
Data exchange deals with inserting data from one database into another database having a different schema. Fagin et al. [2005] have shown that among the universal solutions of a solvable data exchange problem, there exists—up to isomorphism—a unique ...
Comments