iMAP: discovering complex semantic matches between database schemas

Authors:
Robin Dhamankar, University of Illinois, Urbana-Champaign, IL
Yoonkyong Lee, University of Illinois, Urbana-Champaign, IL
AnHai Doan, University of Illinois, Urbana-Champaign, IL
Alon Halevy, University of Washington, Seattle, WA
Pedro Domingos, University of Washington, Seattle, WA

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 2004, Pages 383–394
https://doi.org/10.1145/1007568.1007612

Published: 13 June 2004

ABSTRACT

Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate).

We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
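To make the idea of searching a match space concrete, the following is a minimal sketch of a single text "searcher" that looks for matches of the form address = concat(city, state). It is an illustration only, not iMAP's actual algorithm: the function names (`concat_candidates`, `overlap_score`, `best_text_match`), the toy tables, the exhaustive enumeration, and the set-overlap scoring are all assumptions chosen for brevity.

```python
from itertools import permutations


def concat_candidates(columns, max_arity=2, sep=" "):
    """Enumerate candidate matches of the form concat(c1, ..., ck).

    `columns` maps a source attribute name to its list of values.
    Hypothetical helper: brute-force enumeration stands in for a real search.
    """
    names = list(columns)
    for k in range(1, max_arity + 1):
        for combo in permutations(names, k):
            # Join the k attribute values of each row into one string.
            values = [sep.join(parts) for parts in zip(*(columns[c] for c in combo))]
            yield combo, values


def overlap_score(candidate_values, target_values):
    """Fraction of distinct target values reproduced by the candidate.

    An assumed stand-in scorer; it only works when both tables share data.
    """
    target = set(target_values)
    if not target:
        return 0.0
    return len(target & set(candidate_values)) / len(target)


def best_text_match(source_columns, target_values, max_arity=2):
    """Return (score, attribute tuple) of the highest-scoring candidate."""
    scored = [
        (overlap_score(values, target_values), combo)
        for combo, values in concat_candidates(source_columns, max_arity)
    ]
    return max(scored, key=lambda s: s[0])


# Toy source table and target column (made-up data for illustration).
source = {
    "city": ["Seattle", "Urbana", "Chicago"],
    "state": ["WA", "IL", "IL"],
    "zip": ["98195", "61801", "60601"],
}
address = ["Seattle WA", "Urbana IL", "Chicago IL"]

score, combo = best_text_match(source, address)
print(f"address = concat({', '.join(combo)})  score={score:.2f}")
```

Exhaustive enumeration is only feasible on toy schemas like this one. The abstract notes that the match space is often very large or infinite, which is why a practical system needs specialized searchers and domain knowledge to prune candidates rather than trying every combination.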


Index Terms
1. Information systems
  1. Data management systems

Published in
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN: 1581138598
DOI: 10.1145/1007568
Conference Chairs: Arnd Christian König (Microsoft Research), Stefan Dessloch (University of Kaiserslautern, Germany)
General Chair: Patrick Valduriez (INRIA, France)
Program Chair: Gerhard Weikum (University of the Saarland)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%