ABSTRACT
Twitter has become an increasingly important source of information, with more than 400 million tweets posted per day. Tweet entity linking is the task of linking the named entity mentions detected in tweets with their corresponding real-world entities in a knowledge base. This task is of practical importance and can facilitate many different tasks, such as personalized recommendation and user interest discovery. Tweet entity linking is challenging because tweets are noisy, short, and informal. Previous methods focus on linking entities in Web documents and rely largely on the context around the entity mention and the topical coherence between entities in the document. However, these methods cannot be applied effectively to tweet entity linking because a single tweet contains insufficient context information. In this paper, we propose KAURI, a graph-based framework that collectively links all the named entity mentions in all tweets posted by a user by modeling the user's topics of interest. Our assumption is that each user has an underlying topic interest distribution over various named entities. KAURI integrates the intra-tweet local information with the inter-tweet user interest information into a unified graph-based framework. We extensively evaluated the performance of KAURI on a manually annotated tweet corpus. The experimental results show that KAURI significantly outperforms the baseline methods in terms of accuracy, is efficient, and scales well to tweet streams.
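The abstract does not spell out how the graph-based framework computes its scores, so the following is only a toy sketch of the general idea of collective linking by score propagation. It is not KAURI's actual algorithm. It assumes that each candidate entity starts with a local score taken from its own tweet, and that scores then spread along edges weighted by entity relatedness across all of a user's tweets, in the style of a personalized random walk. The function name `link_mentions`, the `damping` parameter, and all example data are hypothetical.

```python
from collections import defaultdict


def link_mentions(candidates, prior, relatedness, damping=0.4, iterations=50):
    """Toy collective linker (an illustrative assumption, not KAURI itself).

    candidates:  mention -> list of candidate entity names
    prior:       (mention, entity) -> local score from the tweet alone
    relatedness: frozenset({entity_a, entity_b}) -> relatedness weight
    """
    # Each graph node is one (mention, candidate entity) pair.
    nodes = [(m, e) for m, ents in candidates.items() for e in ents]

    # Normalize the local scores so they form the initial distribution.
    total = sum(prior[n] for n in nodes)
    init = {n: prior[n] / total for n in nodes}

    # Connect candidates that belong to different mentions and are related.
    out = defaultdict(dict)
    for a in nodes:
        for b in nodes:
            if a[0] != b[0]:
                w = relatedness.get(frozenset((a[1], b[1])), 0.0)
                if w > 0:
                    out[a][b] = w

    # Normalize outgoing weights so each node distributes its score.
    for nbrs in out.values():
        z = sum(nbrs.values())
        for b in nbrs:
            nbrs[b] /= z

    # Propagate: keep a share of the local score, receive the rest from neighbors.
    score = dict(init)
    for _ in range(iterations):
        nxt = {n: damping * init[n] for n in nodes}
        for a, nbrs in out.items():
            for b, w in nbrs.items():
                nxt[b] += (1 - damping) * score[a] * w
        score = nxt

    # For every mention, choose the candidate with the highest final score.
    return {m: max(ents, key=lambda e: score[(m, e)])
            for m, ents in candidates.items()}


# Hypothetical mentions gathered from three tweets by one user.
candidates = {
    "Sun": ["Sun Microsystems", "Sun (star)"],
    "Java": ["Java (programming language)", "Java (island)"],
    "Oracle": ["Oracle Corporation"],
}
prior = {
    ("Sun", "Sun Microsystems"): 0.4,
    ("Sun", "Sun (star)"): 0.6,
    ("Java", "Java (programming language)"): 0.5,
    ("Java", "Java (island)"): 0.5,
    ("Oracle", "Oracle Corporation"): 1.0,
}
relatedness = {
    frozenset(("Sun Microsystems", "Java (programming language)")): 0.9,
    frozenset(("Sun Microsystems", "Oracle Corporation")): 0.8,
    frozenset(("Java (programming language)", "Oracle Corporation")): 0.7,
}

result = link_mentions(candidates, prior, relatedness)
print(result)
```

In this made-up example the local score alone favors "Sun (star)" for the mention "Sun", but the mutually related technology entities found in the user's other tweets reinforce each other, so the mention is linked to "Sun Microsystems" instead. This illustrates why pooling information across a user's tweets can compensate for the thin context of a single tweet.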
Index Terms
- Linking named entities in Tweets with knowledge base via user interest modeling