A framework for robust discovery of entity synonyms

Authors:
Kaushik Chakrabarti, Microsoft Research, Redmond, WA, USA
Surajit Chaudhuri, Microsoft Research, Redmond, WA, USA
Tao Cheng, Microsoft Research, Redmond, WA, USA
Dong Xin, Google, Kirkland, WA, USA

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2012, Pages 1384–1392. https://doi.org/10.1145/2339530.2339743

Published: 12 August 2012

ABSTRACT

Entity synonyms are critical for many applications such as information retrieval and named entity recognition in documents. The current trend is to discover entity synonyms automatically using statistical techniques on web data. Prior techniques suffer from several limitations, such as click log sparsity and the inability to distinguish between entities of different concept classes. In this paper, we propose a general framework for robustly discovering entity synonyms, with two novel similarity functions that overcome the limitations of prior techniques. We develop efficient and scalable techniques leveraging the MapReduce framework to discover synonyms at large scale. To handle long entity names with extraneous tokens, we propose techniques to effectively map long entity names to short queries in the query log. Our experiments on real data from different entity domains demonstrate the superior quality of our synonyms as well as the efficiency of our algorithms. The entity synonyms produced by our system are in production in Bing Shopping and Video search, with experiments showing the significant improvement they bring to the search experience.
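
To make the click-log setting concrete, the following is a minimal sketch of a generic click-overlap baseline, the kind of statistical signal the abstract attributes to prior techniques. It is not one of the two similarity functions proposed in the paper, whose definitions are not given here. The toy log, the URL identifiers, and the function names `clicked_docs` and `click_overlap` are all hypothetical.

```python
from collections import defaultdict


def clicked_docs(click_log):
    """Group (query, url) click records into a map: query -> set of clicked URLs."""
    docs = defaultdict(set)
    for query, url in click_log:
        docs[query.lower()].add(url)
    return docs


def click_overlap(docs, a, b):
    """Jaccard overlap of the URL sets clicked for strings a and b.

    Returns 0.0 when either string has no clicks at all, which is
    exactly where click log sparsity hurts this kind of baseline.
    """
    da = docs.get(a.lower(), set())
    db = docs.get(b.lower(), set())
    if not da or not db:
        return 0.0
    return len(da & db) / len(da | db)


# Toy, made-up click log (assumption for illustration only).
log = [
    ("canon eos 350d", "u1"),
    ("canon eos 350d", "u2"),
    ("digital rebel xt", "u1"),
    ("digital rebel xt", "u2"),
    ("digital rebel xt", "u3"),
    ("canon printer", "u4"),
]
docs = clicked_docs(log)

same_entity = click_overlap(docs, "canon eos 350d", "digital rebel xt")
different_entity = click_overlap(docs, "canon eos 350d", "canon printer")
unseen = click_overlap(docs, "canon eos 350d", "canon eos 350d digital slr camera kit")
print(same_entity, different_entity, unseen)
```

In this toy data the two names for the same camera share most of their clicked URLs, while the long, never-issued entity name scores zero. That second case is one way to picture the limitations the abstract mentions: sparse clicks and long entity names with extraneous tokens give an overlap measure nothing to work with.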

Index Terms

A framework for robust discovery of entity synonyms
1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining

Published in
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530
General Chair: Qiang Yang, Hong Kong University of Science and Technology
Program Chairs: Deepak Agarwal, LinkedIn; Jian Pei, Simon Fraser University
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity synonym
pseudo document similarity
query context similarity
robust synonym discovery
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%