ABSTRACT
The SMAPH system implements a pipeline of four main steps: (1) Fetching -- it fetches the results returned by a search engine for the query to be annotated; (2) Spotting -- the search result snippets are parsed to identify candidate mentions of the entities to be annotated. This is done in a novel way, by detecting keywords in context from the bold parts of the search snippets; (3) Candidate generation -- candidate entities are generated in two ways: from the Wikipedia pages occurring in the search results, and from an existing annotator that takes the mentions identified in the spotting step as input; (4) Pruning -- a binary SVM classifier decides which entities to keep or discard, producing the final annotation set for the query. The SMAPH system ranked third on the development set and first on the final blind test of the 2014 ERD Challenge short text track.
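The spotting step can be illustrated with a minimal sketch. It assumes that the search engine marks query keywords in its snippets with HTML `<b>` tags, and it treats each distinct bolded span as a candidate mention. The function name `spot_mentions`, the normalization (lowercasing, whitespace collapsing) and the sample snippets are hypothetical and are not taken from the SMAPH implementation.

```python
import re
from collections import Counter

# Assumption: bolded keywords appear as <b>...</b> in the snippet markup.
BOLD_RE = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)


def spot_mentions(snippets):
    """Count bolded spans across snippets; each distinct span is a candidate mention."""
    counts = Counter()
    for snippet in snippets:
        for span in BOLD_RE.findall(snippet):
            # Collapse whitespace and lowercase so surface variants merge.
            mention = " ".join(span.split()).lower()
            if mention:
                counts[mention] += 1
    return counts


# Hypothetical snippets, as a search engine might return them for one query.
snippets = [
    "The <b>Eiffel Tower</b> is a landmark in <b>Paris</b>.",
    "Visit the <b>eiffel  tower</b> at night.",
]
mentions = spot_mentions(snippets)
```

In a full pipeline of the kind described above, the resulting mentions would then be handed to the candidate generation step, and the per-mention counts could serve as one possible feature for the pruning classifier.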
Index Terms
- The SMAPH system for query entity recognition and disambiguation