skip to main content
10.1145/2949689.2949698acmotherconferencesArticle/Chapter ViewAbstractPublication PagesssdbmConference Proceedingsconference-collections
research-article

Functional Dependencies Unleashed for Scalable Data Exchange

Published:18 July 2016Publication History

ABSTRACT

We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.

References

  1. P. C. Arocena, R. Ciucanu, B. Glavic, and R. J. Miller. Gain control over your integration evaluations. PVLDB, 8(12):1960--1963, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Benedikt, J. Leblay, and E. Tsamoura. Querying with access patterns and integrity constraints. PVLDB, 8(6):690--701, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Bonifati, I. Ileana, and M. Linardi. Functional dependencies unleashed for scalable data exchange, 2016. http://arxiv.org/abs/1602.00563.Google ScholarGoogle Scholar
  4. A. Calì, G. Gottlob, and M. Kifer. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. (JAIR), 48:115--174, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Deutsch, L. Popa, and V. Tannen. Physical data independence, constraints, and optimization with universal plans. In Proceedings of VLDB, p. 459--470, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89--124, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Fagin, P. G. Kolaitis, and L. Popa. Data exchange: getting to the core. ACM Trans. Database Syst., 30(1):174--210, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Geerts, G. Mecca, P. Papotti, and D. Santoro. Mapping and cleaning. In IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014, pages 232--243, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  9. G. Gottlob and A. Nash. Efficient core computation in data exchange. J. ACM, 55(2), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Gottlob, R. Pichler, and V. Savenkov. Normalization and optimization of schema mappings. VLDB J., 20(2):277--302, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Gottlob, S. Rudolph, and M. Simkus. Expressiveness of guarded existential rule languages. In Proceedings of PODS'14, pages 27--38, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Grahne and A. Onet. The data-exchange chase under the microscope. CoRR, abs/1407.2279, 2014.Google ScholarGoogle Scholar
  13. T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In Proceedings of VLDB, pages 675--686, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. G. Kolaitis, J. Panttaja, and W. C. Tan. The complexity of data exchange. In Proceedings of PODS, p. 30--39, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Konstantinidis and J. L. Ambite. Optimizing the chase: Scalable data integration under constraints. PVLDB, 7(14):1869--1880, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. Marnette, G. Mecca, and P. Papotti. Scalable data exchange with functional dependencies. PVLDB, 3(1):105--116, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Marnette, G. Mecca, P. Papotti, S. Raunich, and D. Santoro. ++Spicy: an OpenSource Tool for Second-Generation Schema Mapping and Data Exchange. PVLDB, 4(12):1438--1441, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Pichler and V. Savenkov. Towards practical feasibility of core computation in data exchange. Theor. Comput. Sci., 411(7-9):935--957, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Popa, Y. Velegrakis, M. A. Hernández, R. J. Miller, and R. Fagin. Translating web data. In Proceedings of VLDB, pages 598--609, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. ten Cate, L. Chiticariu, P. G. Kolaitis, and W. C. Tan. Laconic schema mappings: Computing the core with SQL queries. PVLDB, 2(1):1006--1017, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. ten Cate, R. L. Halpert, and P. G. Kolaitis. Practical query answering in data exchange under inconsistencytolerant semantics. In Proceedings of EDBT, pages 233--244, 2016.Google ScholarGoogle Scholar
  22. http://www.db.unibas.it/projects/llunatic/.Google ScholarGoogle Scholar
  23. http://www.db.unibas.it/projects/spicy/.Google ScholarGoogle Scholar
  24. www.mi.parisdescartes.fr/~mlinardi/FD_DE_paper.html.Google ScholarGoogle Scholar
  1. Functional Dependencies Unleashed for Scalable Data Exchange

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      SSDBM '16: Proceedings of the 28th International Conference on Scientific and Statistical Database Management
      July 2016
      290 pages

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 July 2016

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate56of146submissions,38%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader