ABSTRACT
Heterogeneous information networks, which consist of multi-type, interconnected objects, are becoming ubiquitous and increasingly popular; examples include social media networks and bibliographic networks. Linking named entity mentions detected in unstructured Web text with their corresponding entities in a heterogeneous information network is of practical importance for information network population and enrichment. The task is challenging because of name ambiguity and the limited knowledge available in the information network. Most existing entity linking methods focus on linking entities with Wikipedia or Wikipedia-derived knowledge bases (e.g., YAGO) and depend largely on features specific to Wikipedia (e.g., Wikipedia articles or Wikipedia-based relatedness measures). Since heterogeneous information networks lack such features, these previous methods cannot be applied to our task. In this paper, we propose SHINE, to the best of our knowledge the first probabilistic model for linking named entities in Web text with a heterogeneous information network. Our model consists of two components: the entity popularity model, which captures the popularity of an entity, and the entity object model, which captures the distribution of multi-type objects appearing in the textual context of an entity and is generated using meta-path constrained random walks over the network. Because different meta-paths express different semantic meanings and lead to different distributions over objects, different paths carry different weights in entity linking. We propose an effective iterative approach, based on the expectation-maximization (EM) algorithm, that automatically learns the weight of each meta-path without requiring any training data. Experimental results on a real-world data set demonstrate the effectiveness and efficiency of our proposed model in comparison with the baselines.
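The abstract combines three ideas: an entity popularity prior, an entity object model built from meta-path constrained random walks, and meta-path weights learned by EM without labeled data. The Python sketch below shows one way these pieces could fit together on a toy bibliographic network. It is not the paper's implementation. The toy graph, the two meta-paths, the degree-based popularity proxy, the uniform background component used for smoothing, the assumption that each context object is drawn independently from a weighted mixture over meta-paths, and all names (`EDGES`, `META_PATHS`, `path_walk`, `link`, `learn_weights`) are illustrative assumptions; the paper's own formulation may differ.

```python
import math
from collections import defaultdict

# Hypothetical toy DBLP-like network; node ids carry their type as a prefix
# ("A:" author, "P:" paper, "V:" venue). wang1 and wang2 share a surface name.
EDGES = [
    ("A:wang1", "P:p1"), ("A:li", "P:p1"), ("P:p1", "V:SIGMOD"),
    ("A:wang1", "P:p2"), ("A:li", "P:p2"), ("P:p2", "V:VLDB"),
    ("A:wang2", "P:p3"), ("A:zhao", "P:p3"), ("P:p3", "V:CVPR"),
]
# Assumed meta-paths: author-paper-author (co-author), author-paper-venue.
META_PATHS = [("A", "P", "A"), ("A", "P", "V")]

def build_graph(edges):
    """Undirected adjacency lists keyed by node id."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    return graph

def path_walk(graph, start, meta_path):
    """Meta-path constrained random walk: distribution over the path's end nodes.
    Each step moves uniformly to a neighbour of the next required type;
    mass at a node with no such neighbour is dropped."""
    dist = {start: 1.0}
    for next_type in meta_path[1:]:
        step = defaultdict(float)
        for node, prob in dist.items():
            nbrs = [n for n in graph[node] if n.split(":")[0] == next_type]
            for n in nbrs:
                step[n] += prob / len(nbrs)
        dist = step
    return dict(dist)

def popularity(graph, entity):
    """Assumed popularity proxy: the entity's degree (number of papers)."""
    return len(graph[entity])

def link(graph, mention, weights):
    """Posterior over a mention's candidates, plus per-object responsibilities.
    `weights` = one weight per meta-path, then one for a uniform background."""
    candidates, context = mention
    total_pop = sum(popularity(graph, e) for e in candidates)
    background = 1.0 / len(graph)
    log_scores, resp = {}, {}
    for e in candidates:
        walks = [path_walk(graph, e, p) for p in META_PATHS]
        log_score = math.log(popularity(graph, e) / total_pop)  # log P(e)
        resp[e] = []
        for o in context:
            comp = [walk.get(o, 0.0) for walk in walks] + [background]
            parts = [wk * pk for wk, pk in zip(weights, comp)]
            mix = sum(parts)  # P(o | e) as a weighted mixture over components
            log_score += math.log(mix)
            resp[e].append([x / mix for x in parts])
        log_scores[e] = log_score
    top = max(log_scores.values())  # max-shifted (numerically stable) softmax
    unnorm = {e: math.exp(s - top) for e, s in log_scores.items()}
    z = sum(unnorm.values())
    return {e: v / z for e, v in unnorm.items()}, resp

def learn_weights(graph, mentions, iters=20):
    """EM over unlabeled mentions; returns mixture weights (paths + background)."""
    k = len(META_PATHS) + 1
    weights = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for mention in mentions:
            posterior, resp = link(graph, mention, weights)  # E-step
            for e, q in posterior.items():
                for r in resp[e]:
                    for j in range(k):
                        counts[j] += q * r[j]
        total = sum(counts)
        if total == 0:  # no context objects at all: nothing to learn from
            break
        weights = [c / total for c in counts]  # M-step
    return weights

graph = build_graph(EDGES)
# Unlabeled mentions: (candidate entities sharing the name, context objects).
mentions = [
    (["A:wang1", "A:wang2"], ["A:li", "V:VLDB"]),
    (["A:wang1", "A:wang2"], ["A:zhao"]),
    (["A:wang1", "A:wang2"], ["V:CVPR"]),
]
weights = learn_weights(graph, mentions)
posterior, _ = link(graph, mentions[0], weights)
print("weights (A-P-A, A-P-V, background):", weights)
print("posterior for mention 0:", posterior)
```

EM here needs only each mention's candidate set and context objects, so no labeled links are used. The background component keeps every object probability positive, so a context object that no meta-path reaches lowers a candidate's score without driving it to zero; as EM runs, weight tends to shift from the background toward the meta-paths that explain the observed contexts.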
Index Terms
- A probabilistic model for linking named entities in web text with heterogeneous information networks