ABSTRACT
We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.
- P. C. Arocena, R. Ciucanu, B. Glavic, and R. J. Miller. Gain control over your integration evaluations. PVLDB, 8(12):1960--1963, 2015. Google ScholarDigital Library
- M. Benedikt, J. Leblay, and E. Tsamoura. Querying with access patterns and integrity constraints. PVLDB, 8(6):690--701, 2015. Google ScholarDigital Library
- A. Bonifati, I. Ileana, and M. Linardi. Functional dependencies unleashed for scalable data exchange, 2016. http://arxiv.org/abs/1602.00563.Google Scholar
- A. Calì, G. Gottlob, and M. Kifer. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. (JAIR), 48:115--174, 2013. Google ScholarDigital Library
- A. Deutsch, L. Popa, and V. Tannen. Physical data independence, constraints, and optimization with universal plans. In Proceedings of VLDB, p. 459--470, 1999. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89--124, 2005. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, and L. Popa. Data exchange: getting to the core. ACM Trans. Database Syst., 30(1):174--210, 2005. Google ScholarDigital Library
- F. Geerts, G. Mecca, P. Papotti, and D. Santoro. Mapping and cleaning. In IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014, pages 232--243, 2014.Google ScholarCross Ref
- G. Gottlob and A. Nash. Efficient core computation in data exchange. J. ACM, 55(2), 2008. Google ScholarDigital Library
- G. Gottlob, R. Pichler, and V. Savenkov. Normalization and optimization of schema mappings. VLDB J., 20(2):277--302, 2011. Google ScholarDigital Library
- G. Gottlob, S. Rudolph, and M. Simkus. Expressiveness of guarded existential rule languages. In Proceedings of PODS'14, pages 27--38, 2014. Google ScholarDigital Library
- G. Grahne and A. Onet. The data-exchange chase under the microscope. CoRR, abs/1407.2279, 2014.Google Scholar
- T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In Proceedings of VLDB, pages 675--686, 2007. Google ScholarDigital Library
- P. G. Kolaitis, J. Panttaja, and W. C. Tan. The complexity of data exchange. In Proceedings of PODS, p. 30--39, 2006. Google ScholarDigital Library
- G. Konstantinidis and J. L. Ambite. Optimizing the chase: Scalable data integration under constraints. PVLDB, 7(14):1869--1880, 2014. Google ScholarDigital Library
- B. Marnette, G. Mecca, and P. Papotti. Scalable data exchange with functional dependencies. PVLDB, 3(1):105--116, 2010. Google ScholarDigital Library
- B. Marnette, G. Mecca, P. Papotti, S. Raunich, and D. Santoro. ++Spicy: an OpenSource Tool for Second-Generation Schema Mapping and Data Exchange. PVLDB, 4(12):1438--1441, 2011.Google ScholarDigital Library
- R. Pichler and V. Savenkov. Towards practical feasibility of core computation in data exchange. Theor. Comput. Sci., 411(7-9):935--957, 2010. Google ScholarDigital Library
- L. Popa, Y. Velegrakis, M. A. Hernández, R. J. Miller, and R. Fagin. Translating web data. In Proceedings of VLDB, pages 598--609, 2002. Google ScholarDigital Library
- B. ten Cate, L. Chiticariu, P. G. Kolaitis, and W. C. Tan. Laconic schema mappings: Computing the core with SQL queries. PVLDB, 2(1):1006--1017, 2009. Google ScholarDigital Library
- B. ten Cate, R. L. Halpert, and P. G. Kolaitis. Practical query answering in data exchange under inconsistencytolerant semantics. In Proceedings of EDBT, pages 233--244, 2016.Google Scholar
- http://www.db.unibas.it/projects/llunatic/.Google Scholar
- http://www.db.unibas.it/projects/spicy/.Google Scholar
- www.mi.parisdescartes.fr/~mlinardi/FD_DE_paper.html.Google Scholar
- Functional Dependencies Unleashed for Scalable Data Exchange
Recommendations
A Survey Study on XML Functional Dependencies
ISDPE '07: Proceedings of the The First International Symposium on Data, Privacy, and E-CommerceThere are two major kinds of XML functional dependency (FD) definitions. The first kind of XML FD includes Tree-tuple-based XML FD (tFD) and Path-based XML FD (pFD), and the second kind of XML FD includes Extended-path-based XML FD (epFD), Sub-graph-...
Quasi-inverses of schema mappings
Schema mappings are high-level specifications that describe the relationship between two database schemas. Two operators on schema mappings, namely the composition operator and the inverse operator, are regarded as especially important. Progress on the ...
Composing schema mappings: Second-order dependencies to the rescue
Special Issue: SIGMOD/PODS 2004A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). A fundamental problem is composing schema mappings: given ...
Comments