skip to main content
10.1145/1055558.1055572acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Composing schema mappings: second-order dependencies to the rescue

Published:14 June 2004Publication History

ABSTRACT

A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). Schema mappings play a key role in numerous areas of database systems, including database design, information integration, and model management. A fundamental problem in this context is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying successively the two schema mappings.In this paper, we give a rigorous semantics to the composition of schema mappings and investigate the definability and computational complexity of the composition of two schema mappings. We first study the important case of schema mappings in which the specification is given by a finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that the composition of a finite set of full source-to-target tgds with a finite set of tgds is always definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target tgds with a finite set of full source-to-target tgds may not be definable by any set (finite or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least fixed-point logic, and the associated composition query may be NP-complete. After this, we introduce a class of existential second-order formulas with function symbols, which we call second-order tgds, and make a case that they are the "right" language for composing schema mappings. To this effect, we show that the composition of finite sets of source-to-target tgds is always definable by a second-order tgd. Moreover, the composition of second-order tgds is also definable by a second-order tgd. Our second-order tgds allow equalities, even though the "obvious" way to define them does not require equalities. Allowing equalities in second-order tgds turns out to be of the essence, because we show that second-order tgds without equalities are not sufficiently expressive to define even the composition of finite sets of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data exchange. In particular. the chase procedure can be extended to second-order tgds so that it produces polynomial-time computable universal solutions in data exchange settings specified by second-order tgds.

References

  1. S. Abiteboul and O. M. Duschka. Complexity of Answering Queries Using Materialized Views. In ACM Symposium on Principles of Database Systems (PODS), pages 254--263, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Beeri and M. Y. Vardi. A Proof Procedure for Data Dependencies. Journal of the Association for Computing Machinery (JACM), 31(4):718--741, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Beeri and M. Y. Vardi. Formal Systems for Tuple and Equality Generating Dependencies. SIAM J. on Computing, 13(1):76--98, 1984.Google ScholarGoogle Scholar
  5. P. A. Bernstein. Applying Model Management to Classical Meta-Data Problems. In Conference on Innovative Data Systems Research (CIDR), pages 209--220, 2003.Google ScholarGoogle Scholar
  6. A. K. Chandra and D. Harel. Structure and Complexity of Relational Queries. Journal of Computer and System Sciences, 25(1):99--128, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  7. A. Dawar. A Restricted Second Order Logic for Finite Structures. Information and Computation, 143(2):154--174, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Fagin. Generalized First-Order Spectra and Polynomial-Time Recognizable Sets. In R. M. Karp, editor, Complexity of Computation, SIAM-AMS Proceedings, Vol. 7, pages 43--73, 1974.Google ScholarGoogle Scholar
  9. R. Fagin. Horn Clauses and Database Dependencies. Journal of the Association for Computing Machinery (JACM), 29(4):952--985, Oct. 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. In International Conference on Database Theory (ICDT), pages 207--224, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Fagin, P. G. Kolaitis, and L. Popa. Data Exchange: Getting to the Core. In ACM Symposium on Principles of Database Systems (PODS), pages 90--101, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Feder and M. Vardi. The Computational Structure of Monotone Monadic SNP and Constraint Satisfaction: A Study through Datalog and Group Theory. SIAM J. on Computing, 28:57--104, 1998. Preliminary version in Proc. 25th ACM Symp. on Theory of Computing, May 1993, pp. 612--622. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: Data Management Infrastructure for Semantic Web Applications. In International World Wide Web Conference, pages 556--567, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. N. Immerman. Descriptive Complexity. Springer, 1999.Google ScholarGoogle Scholar
  15. M. Lenzerini. Data Integration: A Theoretical Perspective. In ACM Symposium on Principles of Database Systems (PODS), pages 233--246, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. W. Lloyd. Foundations of Logic Programming. Springer, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Madhavan and A. Y. Halevy. Composing Mappings Among Data Sources. In International Conference on Very Large Data Bases (VLDB), pages 572--583, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. J. Miller, L. M. Haas, and M. Hernández. Schema Mapping as Query Discovery. In International Conference on Very Large Data Bases (VLDB), pages 77--88, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernandez, and R. Fagin. Translating Web Data. In International Conference on Very Large Data Bases (VLDB), pages 598--609, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Vassiliadis, A. Simitsis, and S. Skiadopoulos. On the Logical Modeling of ETL Processes. In International Conference on Advanced Information Systems Engineering (CAiSE), pages 782--786, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2004
    350 pages
    ISBN:158113858X
    DOI:10.1145/1055558

    Copyright © 2004 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 14 June 2004

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate642of2,707submissions,24%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader