DOI: 10.1145/2213836.2213846
research-article

Sample-driven schema mapping

Published: 20 May 2012

ABSTRACT

End users increasingly need to perform lightweight, customized schema mapping. State-of-the-art tools provide powerful functions to generate schema mappings, but they usually require an in-depth understanding of the semantics of multiple schemas and their correspondences. They are thus unsuitable for technically unsophisticated users and for settings in which a large number of mappings must be performed.

We propose a system for sample-driven schema mapping. It automatically constructs schema mappings, in real time, from user-input sample target instances. Because the user does not have to provide any explicit attribute-level match information, she is isolated from the possibly complex structure and semantics of both the source schemas and the mappings. In addition, the user never has to master any operations specific to schema mappings: she simply types data values into a spreadsheet-style interface. As a result, the user can construct mappings with a much lower cognitive burden.

In this paper we present Mweaver, a prototype sample-driven schema mapping system. It employs novel algorithms that enable the system to obtain the desired mapping results while meeting interactive response-time requirements. We report the results of a user study that compares Mweaver with two state-of-the-art mapping tools across several mapping tasks, both real and synthetic. The results suggest that Mweaver enables users to perform practical mapping tasks in about one-fifth of the time needed by the state-of-the-art tools.
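
As a rough illustration of the sample-driven idea described in the abstract, the following Python sketch takes one sample target row typed by the user and searches a tiny source database for column pairs and a join that reproduce it. This is not Mweaver's algorithm, which the abstract does not detail. The tables, the foreign key, and the function names (columns_containing, joined_rows, candidate_mappings) are all hypothetical, and the brute-force search ignores the interactive performance concerns the paper addresses.

```python
from itertools import product

# Hypothetical toy source database (illustrative only, not from the paper).
movie = [
    {"mid": 1, "title": "Avatar", "did": 10},
    {"mid": 2, "title": "Titanic", "did": 10},
    {"mid": 3, "title": "Alien", "did": 11},
]
director = [
    {"did": 10, "name": "James Cameron"},
    {"did": 11, "name": "Ridley Scott"},
]
tables = {"movie": movie, "director": director}

# Assumed foreign key: movie.did references director.did.
foreign_keys = [(("movie", "did"), ("director", "did"))]


def columns_containing(value):
    """Return every (table, column) whose data contains the sample value."""
    hits = []
    for tname, rows in tables.items():
        for col in rows[0]:
            if any(row[col] == value for row in rows):
                hits.append((tname, col))
    return hits


def joined_rows(t1, t2):
    """Yield row pairs from t1 and t2 that agree on a declared foreign key."""
    if t1 == t2:
        for row in tables[t1]:
            yield row, row
        return
    for (a, acol), (b, bcol) in foreign_keys:
        if (a, b) == (t2, t1):
            # Orient the key so that side a belongs to t1.
            (a, acol), (b, bcol) = (b, bcol), (a, acol)
        if (a, b) != (t1, t2):
            continue
        for r1 in tables[t1]:
            for r2 in tables[t2]:
                if r1[acol] == r2[bcol]:
                    yield r1, r2


def candidate_mappings(sample):
    """Enumerate two-column mappings consistent with one sample target row."""
    v1, v2 = sample
    found = []
    for (t1, c1), (t2, c2) in product(columns_containing(v1),
                                      columns_containing(v2)):
        # Keep the pair only if some joined row reproduces the whole sample.
        if any(r1[c1] == v1 and r2[c2] == v2
               for r1, r2 in joined_rows(t1, t2)):
            found.append(((t1, c1), (t2, c2)))
    return found


# The user types one sample row and never names a source attribute.
mappings = candidate_mappings(("Avatar", "James Cameron"))
print(mappings)  # [(('movie', 'title'), ('director', 'name'))]
```

In this toy setting, a sample row that no joined source row reproduces, such as ("Alien", "James Cameron"), yields no candidate mapping.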

Published in

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

SIGMOD '12 Paper Acceptance Rate: 48 of 289 submissions, 17%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
