ABSTRACT
The creation of values to represent incomplete information, often referred to as value invention, is central in data exchange. Within schema mappings, Skolem functions have long been used for value invention as they permit a precise representation of missing information. Recent work on a powerful mapping language called second-order tuple generating dependencies (SO tgds), has drawn attention to the fact that the use of arbitrary Skolem functions can have negative computational and programmatic properties in data exchange. In this paper, we present two techniques for understanding when the Skolem functions needed to represent the correct semantics of incomplete information are computationally well-behaved. Specifically, we consider when the Skolem functions in second-order (SO) mappings have a first-order (FO) semantics and are therefore programmatically and computationally more desirable for use in practice. Our first technique, linearization, significantly extends the Nash, Bernstein and Melnik unskolemization algorithm, by understanding when the sets of arguments of the Skolem functions in a mapping are related by set inclusion. We show that such a linear relationship leads to mappings that have FO semantics and are expressible in popular mapping languages including source-to-target tgds and nested tgds. Our second technique uses source semantics, specifically functional dependencies (including keys), to transform SO mappings into equivalent FO mappings. We show that our algorithms are applicable to a strictly larger class of mappings than previous approaches, but more importantly we present an extensive experimental evaluation that quantifies this difference (about 78% improvement) over an extensive schema mapping benchmark and illustrates the applicability of our results on real mappings.
- B. Alexe, M. A. Hernández, L. Popa, and W. C. Tan. MapMerge: Correlating Independent Schema Mappings. VLDB J., 21(2):191--211, 2012. Google ScholarDigital Library
- B. Alexe, W. C. Tan, and Y. Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. PVLDB, 1(1):230--244, 2008. Google ScholarDigital Library
- B. Alexe, B. ten Cate, P. Kolaitis, and W. Tan. Designing and refining schema mappings via data examples. In SIGMOD, pages 133--144, 2011. Google ScholarDigital Library
- Y. An, A. Borgida, R. J. Miller, and J. Mylopoulos. A Semantic Approach to Discovering Schema Mapping Expressions. In ICDE, pages 206--215, 2007.Google ScholarCross Ref
- M. Arenas, R. Fagin, and A. Nash. Composition with Target Constraints. Logical Methods in Comput. Sci., 7(3), 2011.Google Scholar
- M. Arenas, J. Pérez, J. Reutter, and C. Riveros. Inverting Schema Mappings: Bridging the Gap between Theory and Practice. PVLDB, 2(1):1018--1029, 2009. Google ScholarDigital Library
- P. C. Arocena, M. D'Angelo, B. Glavic, and R. J. Miller. STBenchmark 2.0. Technical report, University of Toronto, 2013. http://dblab.cs.toronto.edu/project/STBench2.0.Google Scholar
- P. A. Bernstein, T. J. Green, S. Melnik, and A. Nash. Implementing Mapping Composition. VLDB J., 17(2):333--353, 2008. Google ScholarDigital Library
- P. A. Bernstein, A. Y. Halevy, and R. A. Pottinger. A Vision for Management of Complex Models. SIGMOD Record, 29(4):55--63, Dec. 2000. Google ScholarDigital Library
- H. B. Enderton. A Mathematical Introduction to Logic. Harcout Academic Press, 2nd. Edition, 2001.Google Scholar
- R. Fagin, P. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. Theor. Comput. Sci., 336(1):89--124, 2005. Google ScholarDigital Library
- R. Fagin, P. Kolaitis, L. Popa, and W. C. Tan. Composing Schema Mappings: Second-Order Dependencies to the Rescue. TODS, 30(4):994--1055, 2005. Google ScholarDigital Library
- I. Feinerer, R. Pichler, E. Sallinger, and V. Savenkov. On the Undecidability of the Equivalence of Second-Order Tuple Generating Dependencies. In AMW, 2011.Google Scholar
- A. Fuxman, M. A. Hernandez, H. Ho, R. J. Miller, P. Papotti, and L. Popa. Nested Mappings: Schema Mapping Reloaded. In VLDB, pages 67--78, 2006. Google ScholarDigital Library
- A. Fuxman, P. Kolaitis, R. J. Miller, and W. C. Tan. Peer Data Exchange. TODS, 31(4):1454--1498, 2006. Google ScholarDigital Library
- D. Gabbay, R. Schmidt, and A. Szalas. Second Order Quantifier Elimination: Foundations, Computational Aspects and Applications. College Publications, 2008. Google ScholarDigital Library
- T. J. Green, G. Karvounarakis, N. E. Taylor, O. Biton, Z. G. Ives, and V. Tannen. ORCHESTRA: Facilitating Collaborative Data Sharing. In SIGMOD, pages 1131--1133, 2007. Google ScholarDigital Library
- R. Hull and M. Yoshikawa. ILOG: Declarative Creation and Manipulation of Object Identifiers. In VLDB, pages 455--468, 1990. Google ScholarDigital Library
- A. Klug and R. Price. Determining View Dependencies Using Tableaux. TODS, 7(3):361--380, 1982. Google ScholarDigital Library
- M. K. Lawrence, R. A. Pottinger, and S. Staub-French. Data Coordination: Supporting Contingent Updates. PVLDB, 4(11):831--842, 2011.Google Scholar
- M. Lenzerini. Data Integration: a Theoretical Perspective. In PODS, pages 233--246, 2002. Google ScholarDigital Library
- L. Libkin and C. Sirangelo. Data Exchange and Schema Mappings in Open and Closed Worlds. J. Comput. Syst. Sci., 77(3):542--571, 2011. Google ScholarDigital Library
- B. Marnette, G. Mecca, and P. Papotti. Scalable Data Exchange with Functional Dependencies. PVLDB, 3(1):105--116, 2010. Google ScholarDigital Library
- B. Marnette, G. Mecca, P. Papotti, S. Raunich, and D. Santoro. ++Spicy: an OpenSource Tool for Second-Generation Schema Mapping and Data Exchange. PVLDB, 4(12):1438--1441, 2011.Google ScholarDigital Library
- R. J. Miller, D. Fisla, M. Huang, D. Kymlicka, F. Ku, and V. Lee. The Amalgam Schema and Data Integration Test Suite. www.cs.toronto.edu/~miller/amalgam, 2001.Google Scholar
- A. Nash, P. A. Bernstein, and S. Melnik. Composition of Mappings Given by Embedded Dependencies. TODS, 32(1):4, 2007. Google ScholarDigital Library
- Y. Papakonstantinou, S. Abiteboul, and H. Garcia-Molina. Object Fusion in Mediator Systems. In VLDB, pages 413--424, 1996. Google ScholarDigital Library
- R. Pichler and S. Skritek. The Complexity of Evaluating Tuple Generating Dependencies. In ICDT, pages 244--255, 2011. Google ScholarDigital Library
- L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernandez, and R. Fagin. Translating Web Data. In VLDB, pages 598--609, 2002. Google ScholarDigital Library
- L. Seligman, P. Mork, A. Y. Halevy, K. P. Smith, M. J. Carey, K. Chen, C. Wolf, J. Madhavan, A. Kannan, and D. Burdick. OpenII: an Open Source Information Integration Toolkit. In SIGMOD, pages 1057--1060, 2010. Google ScholarDigital Library
- B. ten Cate and P. Kolaitis. Structural Characterizations of Schema-Mapping Languages. In ICDT, pages 63--72, 2009. Google ScholarDigital Library
- C. Yu and L. Popa. Semantic Adaptation of Schema Mappings when Schemas Evolve. VDLB, pages 1006--1017, 2005. Google ScholarDigital Library
Index Terms
- Value invention in data exchange
Recommendations
Reflections on Schema Mappings, Data Exchange, and Metadata Management
PODS '18: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsA schema mapping is a high-level specification of the relationship between two database schemas. For the past fifteen years, schema mappings have played an essential role in the modeling and analysis of data exchange, data integration, and related data ...
On the Language of Nested Tuple Generating Dependencies
Best of SIGMOD 2018, Best of PODS 2018 and Regular PapersDuring the past 15 years, schema mappings have been extensively used in formalizing and studying such critical data interoperability tasks as data exchange and data integration. Much of the work has focused on GLAV mappings, i.e., schema mappings ...
Nested dependencies: structure and reasoning
PODS '14: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsDuring the past decade, schema mappings have been extensively used in formalizing and studying such critical data interoperability tasks as data exchange and data integration. Much of the work has focused on GLAV mappings, i.e., schema mappings ...
Comments