ABSTRACT
End users increasingly need to perform lightweight, customized schema mapping. State-of-the-art tools provide powerful functions for generating schema mappings, but they usually require an in-depth understanding of the semantics of multiple schemas and their correspondences. They are therefore unsuitable for technically unsophisticated users, and for settings where a large number of mappings must be performed.
We propose a system for sample-driven schema mapping. It automatically constructs schema mappings, in real time, from sample target instances that the user types in. Because the user does not have to provide any explicit attribute-level match information, she is shielded from the possibly complex structure and semantics of both the source schemas and the mappings. In addition, the user never has to master any operations specific to schema mappings: she simply types data values into a spreadsheet-style interface. As a result, the user can construct mappings with a much lower cognitive burden.
In this paper, we present Mweaver, a prototype sample-driven schema mapping system. It employs novel algorithms that enable the system to obtain the desired mapping results while meeting interactive response-time requirements. We report the results of a user study that compares Mweaver with two state-of-the-art mapping tools across several mapping tasks, both real and synthetic. The results suggest that Mweaver enables users to complete practical mapping tasks in about one fifth of the time needed with the state-of-the-art tools.
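The core idea of sample-driven mapping (find source columns and join paths whose output contains the values the user typed) can be illustrated with a small sketch. This is not Mweaver's algorithm, which the abstract does not describe; it is a naive brute-force illustration that assumes a hypothetical toy movie database (`SOURCE`), a hypothetical join graph (`JOINS`), and hypothetical helpers `join_view` and `candidate_mappings`. It enumerates connected join views, keeps every projection whose output contains all sample rows, and shows how a second sample row prunes an ambiguous candidate.

```python
from itertools import combinations, product

# Hypothetical toy source database: table -> (column names, rows).
SOURCE = {
    "movie": (["mid", "title", "original_title", "year"],
              [(1, "Avatar", "Avatar", 2009),
               (2, "Titanic", "Titanic", 1997),
               (3, "Spirited Away", "Sen to Chihiro no Kamikakushi", 2001)]),
    "director": (["did", "name"], [(10, "James Cameron"), (11, "Hayao Miyazaki")]),
    "directed": (["mid", "did"], [(1, 10), (2, 10), (3, 11)]),
}

# Hypothetical join graph: equi-join edges between (table, column) pairs.
JOINS = [(("movie", "mid"), ("directed", "mid")),
         (("directed", "did"), ("director", "did"))]


def join_view(tables):
    """Equi-join the given tables along JOINS edges.

    Returns (qualified column names, rows), or None if the tables are not
    connected in the join graph.
    """
    first, *rest = sorted(tables)
    cols = [f"{first}.{c}" for c in SOURCE[first][0]]
    rows = list(SOURCE[first][1])
    joined, pending = {first}, set(rest)
    while pending:
        # Find an edge linking an already-joined table to a pending one.
        step = None
        for a, b in JOINS:
            for (ta, ca), (tb, cb) in ((a, b), (b, a)):
                if ta in joined and tb in pending:
                    step = (ta, ca, tb, cb)
        if step is None:
            return None  # the subset is disconnected in the join graph
        ta, ca, tb, cb = step
        left = cols.index(f"{ta}.{ca}")
        right = SOURCE[tb][0].index(cb)
        rows = [lrow + rrow for lrow in rows for rrow in SOURCE[tb][1]
                if lrow[left] == rrow[right]]
        cols += [f"{tb}.{c}" for c in SOURCE[tb][0]]
        joined.add(tb)
        pending.discard(tb)
    return cols, rows


def candidate_mappings(samples):
    """Return projections (one source column per target column) over some
    join view whose output contains every user-typed sample row."""
    results = []
    names = list(SOURCE)
    # Smaller views first, so each projection is reported with its smallest join.
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            view = join_view(subset)
            if view is None:
                continue
            cols, rows = view
            # Columns that could supply each value of the first sample row.
            choices = [[i for i in range(len(cols)) if any(r[i] == v for r in rows)]
                       for v in samples[0]]
            for proj in product(*choices):
                output = {tuple(r[i] for i in proj) for r in rows}
                mapping = tuple(cols[i] for i in proj)
                if all(tuple(s) in output for s in samples) and mapping not in results:
                    results.append(mapping)
    return results


print(candidate_mappings([("Avatar", "James Cameron")]))
# → [('movie.title', 'director.name'), ('movie.original_title', 'director.name')]
print(candidate_mappings([("Avatar", "James Cameron"),
                          ("Spirited Away", "Hayao Miyazaki")]))
# → [('movie.title', 'director.name')]
```

With one sample row, "Avatar" matches both `title` and `original_title`, so two candidates survive; the second row disambiguates them without the user ever naming a source column. Exhaustive enumeration like this grows quickly with the number of tables and columns, so a practical system would need a more efficient search to respond interactively.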
Index Terms
- Sample-driven schema mapping