DOI: 10.1145/2633211.2634348
research-article

The SMAPH system for query entity recognition and disambiguation

Published: 11 July 2014

ABSTRACT

The SMAPH system implements a pipeline of four main steps:

  1. Fetching: it retrieves the results that a search engine returns for the query to be annotated.
  2. Spotting: it parses the search-result snippets to identify candidate mentions of the entities to be annotated. This is done in a novel way, by detecting keywords-in-context through the bold parts of the snippets.
  3. Candidate generation: candidate entities come from two sources, namely the Wikipedia pages occurring in the search results and an existing annotator that takes the mentions identified in the spotting step as input.
  4. Pruning: a binary SVM classifier decides which entities to keep or discard, producing the final annotation set for the query.

The SMAPH system ranked third on the development set and first on the final blind test of the short-text track of the 2014 ERD Challenge.
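The Spotting step and the first candidate source (Wikipedia pages among the search results) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the SMAPH implementation: it assumes each snippet arrives as HTML with the query keywords wrapped in `<b>` tags and that result URLs are plain strings. The function names `spot_bold_mentions` and `wikipedia_titles` are hypothetical. The sketch omits the second candidate source (the existing annotator) and the SVM pruning step.

```python
import re
from urllib.parse import unquote, urlparse

# Assumption: snippets mark the keywords-in-context with <b>...</b> tags.
BOLD_RE = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)


def spot_bold_mentions(snippets):
    """Return the distinct bold spans (case-insensitive) as candidate mentions."""
    mentions = []
    seen = set()
    for snippet in snippets:
        for span in BOLD_RE.findall(snippet):
            text = " ".join(span.split())  # collapse internal whitespace
            key = text.lower()
            if text and key not in seen:
                seen.add(key)
                mentions.append(text)
    return mentions


def wikipedia_titles(result_urls):
    """Return page titles of results that point to English Wikipedia articles."""
    titles = []
    for url in result_urls:
        parsed = urlparse(url)
        if parsed.netloc == "en.wikipedia.org" and parsed.path.startswith("/wiki/"):
            # Decode percent-escapes and turn underscores back into spaces.
            title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
            if title not in titles:
                titles.append(title)
    return titles


if __name__ == "__main__":
    snippets = [
        "<b>Armstrong</b> was the first person to walk on the <b>Moon</b>.",
        "Neil <b>Armstrong</b> and the Apollo 11 <b>moon</b> landing",
    ]
    urls = ["https://en.wikipedia.org/wiki/Neil_Armstrong", "https://www.nasa.gov/"]
    print(spot_bold_mentions(snippets))  # ['Armstrong', 'Moon']
    print(wikipedia_titles(urls))        # ['Neil Armstrong']
```

Deduplicating mentions case-insensitively keeps the candidate set small when the same keyword is highlighted in several snippets; a real system would also need to map each mention to the query span it covers.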


      Published in

        ERD '14: Proceedings of the first international workshop on Entity recognition & disambiguation
        July 2014
        134 pages
        ISBN: 9781450330237
        DOI: 10.1145/2633211

        Copyright © 2014 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        ERD '14 Paper Acceptance Rate: 18 of 28 submissions, 64%
        Overall Acceptance Rate: 18 of 28 submissions, 64%
