DOI: 10.1145/1007568.1007612
Article

iMAP: discovering complex semantic matches between database schemas

Published: 13 June 2004

ABSTRACT

Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate).

We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
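To make the idea of searching a match space concrete, the following is a minimal sketch of one text-oriented searcher. It is not iMAP's actual algorithm: the abstract does not specify how candidates are generated or scored, so the function names (`concat_candidates`, `overlap_score`, `best_matches`), the `", "` separator, and the exact-overlap scoring are all assumptions made for illustration. The sketch enumerates ordered concatenations of source columns and ranks them by how many target values they reproduce, which loosely mirrors the use of overlap data mentioned above.

```python
from itertools import permutations


def concat_candidates(source_columns, max_arity=2, sep=", "):
    """Yield (expression, values) for each ordered concatenation of columns.

    Hypothetical helper: the candidate space and separator are assumptions.
    """
    names = list(source_columns)
    for arity in range(1, max_arity + 1):
        for combo in permutations(names, arity):
            # Combine the chosen columns row by row.
            rows = zip(*(source_columns[name] for name in combo))
            values = [sep.join(row) for row in rows]
            expr = combo[0] if arity == 1 else f"concat({', '.join(combo)})"
            yield expr, values


def overlap_score(candidate_values, target_values):
    """Fraction of target values that the candidate reproduces exactly."""
    produced = set(candidate_values)
    hits = sum(1 for value in target_values if value in produced)
    return hits / len(target_values) if target_values else 0.0


def best_matches(source_columns, target_values, k=3):
    """Return the k highest-scoring (score, expression) pairs."""
    scored = [
        (overlap_score(values, target_values), expr)
        for expr, values in concat_candidates(source_columns)
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Toy data: the target column holds the same entities in a different row order.
source = {
    "city": ["Seattle", "Urbana", "Madison"],
    "state": ["WA", "IL", "WI"],
}
target_address = ["Urbana, IL", "Seattle, WA", "Madison, WI"]

top = best_matches(source, target_address)
for score, expr in top:
    print(f"address = {expr}  (overlap {score:.2f})")
```

On this toy data the highest-ranked candidate is `concat(city, state)`. An arithmetic match such as room-price = room-rate * (1 + tax-rate) would need a different searcher over numeric expressions, which is consistent with the abstract's description of a set of searchers, each specialized to one type of complex match.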

Published in

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN: 1581138598
DOI: 10.1145/1007568

Copyright © 2004 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%