skip to main content
article

Composing schema mappings: Second-order dependencies to the rescue

Published:01 December 2005Publication History
Skip Abstract Section

Abstract

A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). A fundamental problem is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying successively the two schema mappings.In this article, we give a rigorous semantics to the composition of schema mappings and investigate the definability and computational complexity of the composition of two schema mappings. We first study the important case of schema mappings in which the specification is given by a finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that the composition of a finite set of full source-to-target tgds with a finite set of tgds is always definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target tgds with a finite set of full source-to-target tgds may not be definable by any set (finite or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least fixed-point logic, and the associated composition query may be NP-complete. After this, we introduce a class of existential second-order formulas with function symbols and equalities, which we call second-order tgds, and make a case that they are the “right” language for composing schema mappings. Specifically, we show that second-order tgds form the smallest class (up to logical equivalence) that contains every source-to-target tgd and is closed under conjunction and composition. Allowing equalities in second-order tgds turns out to be of the essence, even though the “obvious” way to define second-order tgds does not require equalities. We show that second-order tgds without equalities are not sufficiently expressive to define the composition of finite sets of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data exchange and query answering: the chase procedure can be extended to second-order tgds so that it produces polynomial-time computable universal solutions in data exchange settings specified by second-order tgds.

References

  1. Abiteboul, S. and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 254--263. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley, Reading, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Beeri, C. and Vardi, M. Y. 1984a. A proof procedure for data dependencies. J. ACM. 31, 4, 718--741. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Beeri, C. and Vardi, M. Y. 1984b. Formal systems for tuple and equality generating dependencies. SIAM J. Comput. 13, 1, 76--98.Google ScholarGoogle ScholarCross RefCross Ref
  5. Bernstein, P. A. 2003. Applying model management to classical meta-data problems. In Proceedings of the Conference on Innovative Data Systems Research (CIDR). 209--220.Google ScholarGoogle Scholar
  6. Chandra, A. K. and Harel, D. 1982. Structure and complexity of relational queries. J. Comput. Syst. Sci. 25, 1, 99--128.Google ScholarGoogle ScholarCross RefCross Ref
  7. Dawar, A. 1998. A restricted second order logic for finite structures. Inf. Comput. 143, 2, 154--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ebbinghaus, H.-D. and Flum, J. 1999. Finite Model Theory, 2nd Edition. Springer-Verlag, New York.Google ScholarGoogle Scholar
  9. Enderton, H. B. 2001. A Mathematical Introduction to Logic, 2nd Edition. Academic Press, Orlando, FL.Google ScholarGoogle Scholar
  10. Fagin, R. 1974. Generalized first-order spectra and polynomial-time recognizable sets. In Complexity of Computation, SIAM-AMS Proceedings, Vol. 7, R. M. Karp, Ed. 43--73.Google ScholarGoogle Scholar
  11. Fagin, R. 1982. Horn clauses and database dependencies. J. ACM 29, 4 (Oct.), 952--985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005a. Data exchange: semantics and query answering. Theoret. Comput. Sci. 336, 89--124. (Preliminary version in Proceedings of the 2003 International Conference on Database Theory, pp. 207--224.) Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Fagin, R., Kolaitis, P. G., and Popa, L. 2005b. Data exchange: Getting to the core. ACM Trans. Datab. Syst. 30, 1 (March), 174--210. (A preliminary version appeared in Proceedings of the Symposium on Principles of Database Systems (PODS 2003). ACM New York, 90--101). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W.-C. 2004. Composing schema mappings: Second-order dependencies to the rescue. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 83--94. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Feder, T. and Vardi, M. 1998. The computational structure of monotone monadic SNP and constraint satisfaction: A study through datalog and group theory. SIAM J. Comput. 28, 57--104. (Preliminary version in Proc. 25th ACM Symp. on Theory of Computing, May 1993, pp. 612--622.) Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Garey, M., Johnson, D. S., and Stockmeyer, L. J. 1976. Some simplified NP-complete graph problems. Theoret. Comput. Sci. 1, 237--267.Google ScholarGoogle ScholarCross RefCross Ref
  17. Halevy, A. Y., Ives, Z. G., Mork, P., and Tatarinov, I. 2003. Piazza: Data management infrastructure for semantic web applications. In Proceedings of the International World Wide Web Conference. 556--567. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Immerman, N. 1999. Descriptive Complexity. Springer-Verlag, New York.Google ScholarGoogle Scholar
  19. Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Madhavan, J. and Halevy, A. Y. 2003. Composing mappings among data sources. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 572--583. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Miller, R. J., Haas, L. M., and Hernández, M. 2000. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 77--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Popa, L., Velegrakis, Y., Miller, R. J., Hernandez, M. A., and Fagin, R. 2002. Translating web data. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 598--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Vassiliadis, P., Simitsis, A., and Skiadopoulos, S. 2002. On the logical modeling of ETL processes. In Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE). 782--786. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Composing schema mappings: Second-order dependencies to the rescue

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader