Abstract
A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). A fundamental problem is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying successively the two schema mappings.In this article, we give a rigorous semantics to the composition of schema mappings and investigate the definability and computational complexity of the composition of two schema mappings. We first study the important case of schema mappings in which the specification is given by a finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that the composition of a finite set of full source-to-target tgds with a finite set of tgds is always definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target tgds with a finite set of full source-to-target tgds may not be definable by any set (finite or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least fixed-point logic, and the associated composition query may be NP-complete. After this, we introduce a class of existential second-order formulas with function symbols and equalities, which we call second-order tgds, and make a case that they are the “right” language for composing schema mappings. Specifically, we show that second-order tgds form the smallest class (up to logical equivalence) that contains every source-to-target tgd and is closed under conjunction and composition. Allowing equalities in second-order tgds turns out to be of the essence, even though the “obvious” way to define second-order tgds does not require equalities. We show that second-order tgds without equalities are not sufficiently expressive to define the composition of finite sets of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data exchange and query answering: the chase procedure can be extended to second-order tgds so that it produces polynomial-time computable universal solutions in data exchange settings specified by second-order tgds.
- Abiteboul, S. and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 254--263. Google ScholarDigital Library
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley, Reading, MA. Google ScholarDigital Library
- Beeri, C. and Vardi, M. Y. 1984a. A proof procedure for data dependencies. J. ACM. 31, 4, 718--741. Google ScholarDigital Library
- Beeri, C. and Vardi, M. Y. 1984b. Formal systems for tuple and equality generating dependencies. SIAM J. Comput. 13, 1, 76--98.Google ScholarCross Ref
- Bernstein, P. A. 2003. Applying model management to classical meta-data problems. In Proceedings of the Conference on Innovative Data Systems Research (CIDR). 209--220.Google Scholar
- Chandra, A. K. and Harel, D. 1982. Structure and complexity of relational queries. J. Comput. Syst. Sci. 25, 1, 99--128.Google ScholarCross Ref
- Dawar, A. 1998. A restricted second order logic for finite structures. Inf. Comput. 143, 2, 154--174. Google ScholarDigital Library
- Ebbinghaus, H.-D. and Flum, J. 1999. Finite Model Theory, 2nd Edition. Springer-Verlag, New York.Google Scholar
- Enderton, H. B. 2001. A Mathematical Introduction to Logic, 2nd Edition. Academic Press, Orlando, FL.Google Scholar
- Fagin, R. 1974. Generalized first-order spectra and polynomial-time recognizable sets. In Complexity of Computation, SIAM-AMS Proceedings, Vol. 7, R. M. Karp, Ed. 43--73.Google Scholar
- Fagin, R. 1982. Horn clauses and database dependencies. J. ACM 29, 4 (Oct.), 952--985. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005a. Data exchange: semantics and query answering. Theoret. Comput. Sci. 336, 89--124. (Preliminary version in Proceedings of the 2003 International Conference on Database Theory, pp. 207--224.) Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., and Popa, L. 2005b. Data exchange: Getting to the core. ACM Trans. Datab. Syst. 30, 1 (March), 174--210. (A preliminary version appeared in Proceedings of the Symposium on Principles of Database Systems (PODS 2003). ACM New York, 90--101). Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W.-C. 2004. Composing schema mappings: Second-order dependencies to the rescue. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 83--94. Google ScholarDigital Library
- Feder, T. and Vardi, M. 1998. The computational structure of monotone monadic SNP and constraint satisfaction: A study through datalog and group theory. SIAM J. Comput. 28, 57--104. (Preliminary version in Proc. 25th ACM Symp. on Theory of Computing, May 1993, pp. 612--622.) Google ScholarDigital Library
- Garey, M., Johnson, D. S., and Stockmeyer, L. J. 1976. Some simplified NP-complete graph problems. Theoret. Comput. Sci. 1, 237--267.Google ScholarCross Ref
- Halevy, A. Y., Ives, Z. G., Mork, P., and Tatarinov, I. 2003. Piazza: Data management infrastructure for semantic web applications. In Proceedings of the International World Wide Web Conference. 556--567. Google ScholarDigital Library
- Immerman, N. 1999. Descriptive Complexity. Springer-Verlag, New York.Google Scholar
- Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 233--246. Google ScholarDigital Library
- Madhavan, J. and Halevy, A. Y. 2003. Composing mappings among data sources. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 572--583. Google ScholarDigital Library
- Miller, R. J., Haas, L. M., and Hernández, M. 2000. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 77--88. Google ScholarDigital Library
- Popa, L., Velegrakis, Y., Miller, R. J., Hernandez, M. A., and Fagin, R. 2002. Translating web data. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 598--609. Google ScholarDigital Library
- Vassiliadis, P., Simitsis, A., and Skiadopoulos, S. 2002. On the logical modeling of ETL processes. In Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE). 782--786. Google ScholarDigital Library
Index Terms
- Composing schema mappings: Second-order dependencies to the rescue
Recommendations
Composing schema mappings: second-order dependencies to the rescue
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsA schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). Schema mappings play a key role in numerous areas of ...
Quasi-inverses of schema mappings
Schema mappings are high-level specifications that describe the relationship between two database schemas. Two operators on schema mappings, namely the composition operator and the inverse operator, are regarded as especially important. Progress on the ...
Inverting schema mappings
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsA schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). Although the notion of an inverse of a schema mapping is ...
Comments