skip to main content
10.1145/773153.773163acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Data exchange: getting to the core

Published:09 June 2003Publication History

ABSTRACT

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier paper, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a "most general possible" solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a "best" universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, but has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. Furthermore, we show that the core is the best among all universal solutions for answering unions of conjunctive queries with inequalities. After this, we investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. Finally, we analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core ofG? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem.

References

  1. S. Abiteboul and O. M. Duschka. Complexity of Answering Queries Using Materialized Views. In PODS, pages 254--263, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Beeri and M. Y. Vardi. A Proof Procedure for Data Dependencies. Journal of the ACM, 31(4):718--741, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. K. Chandra and P. M. Merlin. Optimal Implementation of Conjunctive Queries in Relational Data Bases. In STOC, pages 77--90, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Cosmadakis. The Complexity of Evaluating Relational Queries. Information and Control, 58:101--112, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. S. Cosmadakis and P. C. Kanellakis. Functional and Inclusion Dependencies: A Graph Theoretic Approach. In Advances in Computing Research, volume 3, pages 163--184. JAI Press, 1986.Google ScholarGoogle Scholar
  7. A. Deutsch, L. Popa, and V. Tannen. Physical Data Independence, Constraints and Optimization with Universal Plans. In VLDB, pages 459--470, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Deutsch and V. Tannen. Reformulation of XML Queries and Constraints. In ICDT, pages 225--241, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. In ICDT, pages 207--224, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Friedman, A. Y. Levy, and T. D. Millstein. Navigational Plans For Data Integration. In AAAI, pages 67--73, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, pages 270--294, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Hell and J. Neset ril. The Core of a Graph. Discrete Mathematics, 109:117--126, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. C. Kanellakis. Elements of Relational Database Theory. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Sematics, pages 1073--1156. Elsevier and MIT Press, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Lenzerini. Data Integration: A Theoretical Perspective. In PODS, pages 233--246, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Maier, A. O. Mendelzon, and Y. Sagiv. Testing Implications of Data Dependencies. ACM TODS, 4(4):455--469, Dec. 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. J. Miller, L. M. Haas, and M. Hernández. Schema Mapping as Query Discovery. In VLDB, pages 77--88, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Papadimitriou and M. Yannakakis. The Complexity of Facets and Some Facets of Complexity. In STOC, pages 229--234, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.Google ScholarGoogle Scholar
  19. L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernandez, and R. Fagin. Translating Web Data. In VLDB, pages 598--609, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. N. C. Shu, B. C. Housel, R. W. Taylor, S. P. Ghosh, and V. Y. Lum. EXPRESS: A Data EXtraction, Processing, amd REStructuring System. ACM TODS, 2(2):134--174, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. van der Meyden. Logical Approaches to Incomplete Information: A Survey. In Logics for Databases and Information Systems, pages 307--356. Kluwer, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data exchange: getting to the core

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
              June 2003
              291 pages
              ISBN:1581136706
              DOI:10.1145/773153
              • Conference Chair:
              • Frank Neven,
              • General Chair:
              • Catriel Beeri,
              • Program Chair:
              • Tova Milo

              Copyright © 2003 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 9 June 2003

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              PODS '03 Paper Acceptance Rate27of136submissions,20%Overall Acceptance Rate642of2,707submissions,24%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader