research-article

Entity resolution: theory, practice & open challenges

Published: 01 August 2012
Abstract

This tutorial brings together perspectives on entity resolution (ER) from a variety of fields, including databases, machine learning, natural language processing, and information retrieval, to survey a large body of work in one setting. We discuss both the practical aspects and the theoretical underpinnings of ER. We describe existing solutions, current challenges, and open research problems.

Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 12
August 2012
340 pages

Publisher

VLDB Endowment
