DOI: 10.1145/3209900.3209902
research-article

Beaver: Towards a Declarative Schema Mapping

Published: 10 June 2018

ABSTRACT

Schema mapping transforms data from sources with different schemas into a desired target schema. Manually writing a complete schema mapping specification requires a deep understanding of both the source and target schemas, which can be burdensome for the user. Programming By Example (PBE) schema mapping methods let the user describe the mapping with data records instead. However, real data records are still harder to specify than other useful insights the user might have about the desired mapping. In this project, we develop a new schema mapping technique, Beaver, whose interaction model gives the user more flexibility in describing the desired schema mapping. The end user is not limited to providing exact and complete data examples for the target schema, but may also provide incomplete or ambiguous examples. Moreover, the user can provide other kinds of descriptions of the target schema, such as data types or value ranges. We design an explore-and-verify search-based algorithm to efficiently discover all satisfying schema mapping specifications. We implemented a prototype of the technique and experimentally evaluated its efficiency on traditional PBE schema mapping test cases as well as our newly proposed declarative schema mapping test cases. The experimental results show that declarative queries, which we believe are easier for non-expert users to input, often take around zero to five seconds longer than traditional PBE queries. This suggests that Beaver retains an efficiency comparable to that of traditional PBE schema mapping systems.
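To make the idea of declarative descriptions concrete, here is a minimal Python sketch. It is not Beaver's algorithm; it is a toy brute-force illustration assuming a single source table and one-to-one column assignments. The table `SOURCE`, the predicate helpers `one_of`, `of_type`, and `between`, and the function `find_mappings` are all hypothetical names introduced for this example. Each target column is described by a predicate on values. The search enumerates candidate column assignments (explore) and keeps those for which some source row satisfies every predicate (verify).

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

# Hypothetical single source table: column name -> column values.
SOURCE: Dict[str, List[object]] = {
    "name": ["Lake Tahoe", "Crater Lake"],
    "area_km2": [496.0, 53.0],
    "depth_m": [501.0, 594.0],
}

Predicate = Callable[[object], bool]

def one_of(*candidates: object) -> Predicate:
    """Ambiguous example: the value is one of several possibilities."""
    return lambda v: v in candidates

def of_type(t: type) -> Predicate:
    """Data type description of a target column value."""
    return lambda v: isinstance(v, t)

def between(lo: float, hi: float) -> Predicate:
    """Value range description of a target column value."""
    return lambda v: isinstance(v, (int, float)) and lo <= v <= hi

def find_mappings(source: Dict[str, List[object]],
                  target_spec: List[Predicate]) -> List[Tuple[str, ...]]:
    """Return every assignment of source columns to target columns
    for which at least one source row satisfies all predicates."""
    n_rows = len(next(iter(source.values())))
    results = []
    for cols in permutations(source, len(target_spec)):  # explore
        if any(  # verify against each source row
            all(pred(source[c][i]) for c, pred in zip(cols, target_spec))
            for i in range(n_rows)
        ):
            results.append(cols)
    return results

# A loose range leaves two candidate mappings; a tighter one leaves one.
loose = find_mappings(SOURCE, [one_of("Lake Tahoe", "Tahoe"), between(450, 550)])
tight = find_mappings(SOURCE, [one_of("Lake Tahoe", "Tahoe"), between(490, 499)])
```

In this toy setting, `loose` contains both `("name", "area_km2")` and `("name", "depth_m")`, while `tight` contains only `("name", "area_km2")`. A real schema mapping system would also have to search over joins across multiple tables, which this sketch deliberately leaves out.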


  • Published in

    HILDA '18: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
    June 2018
    87 pages
    ISBN: 9781450358279
    DOI: 10.1145/3209900

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 28 of 56 submissions, 50%
