The SMAPH system for query entity recognition and disambiguation

Authors:
Marco Cornolti, University of Pisa, Pisa, Italy
Paolo Ferragina, University of Pisa, Pisa, Italy
Massimiliano Ciaramita, Google Research, Zurich, Switzerland
Hinrich Schütze, University of Munich, Munich, Germany
Stefan Rüd, University of Munich, Munich, Germany

ERD '14: Proceedings of the first international workshop on Entity recognition & disambiguation, July 2014, Pages 25–30
https://doi.org/10.1145/2633211.2634348

Published: 11 July 2014

ABSTRACT

The SMAPH system implements a pipeline of four main steps: (1) Fetching: it retrieves the results that a search engine returns for the query to be annotated; (2) Spotting: the result snippets are parsed to identify candidate mentions of the entities to be annotated. This is done in a novel way, by detecting keywords in context from the bold parts of the snippets; (3) Candidate generation: candidate entities are drawn from two sources, the Wikipedia pages occurring in the search results and an existing annotator that takes the mentions identified in the spotting step as input; (4) Pruning: a binary SVM classifier decides which entities to keep and which to discard, yielding the final annotation set for the query. The SMAPH system ranked third on the development set and first on the final blind test of the 2014 ERD Challenge short text track.
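
The steps after fetching can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that snippets arrive as HTML strings with query matches wrapped in `<b>` tags, and the function names, the `annotator` callable, and the `keep` predicate (a stand-in for the trained binary SVM) are all hypothetical.

```python
import re
from urllib.parse import urlparse, unquote

# Assumption: the search engine marks query matches in snippets with <b>...</b>.
BOLD_RE = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)


def spot_mentions(snippets):
    """Spotting: collect the bold fragments of snippets as candidate mentions."""
    mentions = []
    for snippet in snippets:
        for fragment in BOLD_RE.findall(snippet):
            # Normalize whitespace and case so repeated mentions collapse.
            mention = " ".join(fragment.split()).lower()
            if mention and mention not in mentions:
                mentions.append(mention)
    return mentions


def wikipedia_candidates(result_urls):
    """Candidate source 1: Wikipedia pages that occur among the search results."""
    candidates = []
    for url in result_urls:
        parsed = urlparse(url)
        if parsed.netloc.endswith("wikipedia.org") and parsed.path.startswith("/wiki/"):
            title = unquote(parsed.path[len("/wiki/"):])
            if title and title not in candidates:
                candidates.append(title)
    return candidates


def generate_candidates(result_urls, mentions, annotator):
    """Candidate generation: merge both sources, preserving first-seen order.

    `annotator` is a hypothetical callable mapping a mention to a list of
    entity titles; it stands in for the existing annotator.
    """
    candidates = wikipedia_candidates(result_urls)
    for mention in mentions:
        for title in annotator(mention):
            if title not in candidates:
                candidates.append(title)
    return candidates


def prune(candidates, keep):
    """Pruning: `keep` is a hypothetical predicate standing in for the SVM."""
    return [candidate for candidate in candidates if keep(candidate)]


# Toy data, invented for illustration only.
snippets = [
    "<b>Armstrong</b> walked on the <b>moon</b> in 1969.",
    "Neil <b>Armstrong</b>, first man on the <B>Moon</B>.",
]
result_urls = [
    "https://en.wikipedia.org/wiki/Neil_Armstrong",
    "https://www.nasa.gov/apollo11",
    "https://en.wikipedia.org/wiki/Moon_landing",
]
toy_annotations = {"armstrong": ["Neil_Armstrong", "Louis_Armstrong"], "moon": ["Moon"]}

mentions = spot_mentions(snippets)
candidates = generate_candidates(
    result_urls, mentions, lambda mention: toy_annotations.get(mention, [])
)
annotations = prune(candidates, lambda title: title in {"Neil_Armstrong", "Moon_landing"})
```

In a real system the `keep` predicate would be replaced by a classifier trained on features of each candidate entity; the sketch fixes its decisions by hand only to keep the example self-contained.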

Index Terms

1. Information systems
  1. World Wide Web
    1. Web applications
    2. Web services

Published in
ERD '14: Proceedings of the first international workshop on Entity recognition & disambiguation
July 2014, 134 pages
ISBN: 9781450330237
DOI: 10.1145/2633211
Conference Chairs: David Carmel (Yahoo! Lab), Ming-Wei Chang (Microsoft Research), Evgeniy Gabrilovich (Google), Bo-June (Paul) Hsu (Microsoft Research), Kuansan Wang (Microsoft Research)
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity linking
erd 2014 challenge
query disambiguation
Qualifiers
- research-article
Acceptance Rates
ERD '14 paper acceptance rate: 18 of 28 submissions (64%)
Overall acceptance rate: 18 of 28 submissions (64%)