ABSTRACT
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier paper, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a "most general possible" solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a "best" universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, but has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. Furthermore, we show that the core is the best among all universal solutions for answering unions of conjunctive queries with inequalities. After this, we investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. Finally, we analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core ofG? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem.
- S. Abiteboul and O. M. Duschka. Complexity of Answering Queries Using Materialized Views. In PODS, pages 254--263, 1998. Google ScholarDigital Library
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- C. Beeri and M. Y. Vardi. A Proof Procedure for Data Dependencies. Journal of the ACM, 31(4):718--741, 1984. Google ScholarDigital Library
- A. K. Chandra and P. M. Merlin. Optimal Implementation of Conjunctive Queries in Relational Data Bases. In STOC, pages 77--90, 1977. Google ScholarDigital Library
- S. Cosmadakis. The Complexity of Evaluating Relational Queries. Information and Control, 58:101--112, 1983. Google ScholarDigital Library
- S. S. Cosmadakis and P. C. Kanellakis. Functional and Inclusion Dependencies: A Graph Theoretic Approach. In Advances in Computing Research, volume 3, pages 163--184. JAI Press, 1986.Google Scholar
- A. Deutsch, L. Popa, and V. Tannen. Physical Data Independence, Constraints and Optimization with Universal Plans. In VLDB, pages 459--470, 1999. Google ScholarDigital Library
- A. Deutsch and V. Tannen. Reformulation of XML Queries and Constraints. In ICDT, pages 225--241, 2003. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. In ICDT, pages 207--224, 2003. Google ScholarDigital Library
- M. Friedman, A. Y. Levy, and T. D. Millstein. Navigational Plans For Data Integration. In AAAI, pages 67--73, 1999. Google ScholarDigital Library
- A. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, pages 270--294, 2001. Google ScholarDigital Library
- P. Hell and J. Neset ril. The Core of a Graph. Discrete Mathematics, 109:117--126, 1992. Google ScholarDigital Library
- P. C. Kanellakis. Elements of Relational Database Theory. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Sematics, pages 1073--1156. Elsevier and MIT Press, 1990. Google ScholarDigital Library
- M. Lenzerini. Data Integration: A Theoretical Perspective. In PODS, pages 233--246, 2002. Google ScholarDigital Library
- D. Maier, A. O. Mendelzon, and Y. Sagiv. Testing Implications of Data Dependencies. ACM TODS, 4(4):455--469, Dec. 1979. Google ScholarDigital Library
- R. J. Miller, L. M. Haas, and M. Hernández. Schema Mapping as Query Discovery. In VLDB, pages 77--88, 2000. Google ScholarDigital Library
- C. Papadimitriou and M. Yannakakis. The Complexity of Facets and Some Facets of Complexity. In STOC, pages 229--234, 1982. Google ScholarDigital Library
- C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.Google Scholar
- L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernandez, and R. Fagin. Translating Web Data. In VLDB, pages 598--609, 2002. Google ScholarDigital Library
- N. C. Shu, B. C. Housel, R. W. Taylor, S. P. Ghosh, and V. Y. Lum. EXPRESS: A Data EXtraction, Processing, amd REStructuring System. ACM TODS, 2(2):134--174, 1977. Google ScholarDigital Library
- R. van der Meyden. Logical Approaches to Incomplete Information: A Survey. In Logics for Databases and Information Systems, pages 307--356. Kluwer, 1998. Google ScholarDigital Library
Index Terms
- Data exchange: getting to the core
Recommendations
Data exchange: getting to the core
Special Issue: SIGMOD/PODS 2003Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange ...
Data exchange: computing cores in polynomial time
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsData exchange deals with inserting data from one database into another database having a different schema. We study and solve a central computational problem of data exchange, namely, computing the core of a universal solution to a data exchange ...
Peer data exchange
PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsIn this paper, we introduce and study a framework, called peer data exchange, for sharing and exchanging data between peers. This framework is a special case of a full-fledged peer data management system and a generalization of data exchange between a ...
Comments