research-article

A probabilistic model for linking named entities in web text with heterogeneous information networks

Authors:
Wei Shen (Tsinghua University, Beijing, China)
Jiawei Han (University of Illinois at Urbana-Champaign, Urbana, IL, USA)
Jianyong Wang (Tsinghua University, Beijing, China)

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, pages 1199–1210. https://doi.org/10.1145/2588555.2593676

Published: 18 June 2014

ABSTRACT

Heterogeneous information networks, which consist of interconnected objects of multiple types, are becoming ubiquitous and increasingly popular; examples include social media networks and bibliographic networks. Linking named entity mentions detected in unstructured Web text with their corresponding entities in a heterogeneous information network is of practical importance for populating and enriching such networks. This task is challenging because of name ambiguity and the limited knowledge available in the information network. Most existing entity linking methods link entities with Wikipedia or Wikipedia-derived knowledge bases (e.g., YAGO) and depend heavily on features specific to Wikipedia (e.g., Wikipedia articles or Wikipedia-based relatedness measures). Since heterogeneous information networks lack such features, these methods cannot be applied to our task. In this paper, we propose SHINE, to the best of our knowledge the first probabilistic model for linking named entities in Web text with a heterogeneous information network. The model consists of two components: the entity popularity model, which captures the popularity of an entity, and the entity object model, which captures the distribution of multi-type objects appearing in the textual context of an entity and is generated using meta-path constrained random walks over the network. Because different meta-paths express different semantic meanings and lead to different distributions over objects, different paths carry different weights in entity linking. We propose an effective iterative approach, based on the expectation-maximization (EM) algorithm, that automatically learns the weight of each meta-path without requiring any training data. Experimental results on a real-world data set demonstrate the effectiveness and efficiency of the proposed model in comparison with the baselines.
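The abstract combines three ingredients: an entity popularity prior, an entity object model built from meta-path constrained random walks, and meta-path weights learned by EM without labels. The Python sketch below shows one way these pieces could fit together on a toy bibliographic network; it is not the authors' implementation. It assumes a simple generative story (pick an entity by popularity, then for each context object pick a meta-path by its weight and draw the object from that path's walk distribution), and every concrete choice is hypothetical: the graph and node names, the two meta-paths, add-one degree as popularity, uniform-background smoothing with `LAM`, and the helper names `path_walk`, `path_probs`, `log_score`, `link`, and `em_path_weights`.

```python
import math
from collections import defaultdict

# Hypothetical toy bibliographic network; a node's type is the prefix before ':'.
EDGES = [
    ("author:shen_tsinghua", "paper:p1"), ("author:shen_tsinghua", "paper:p2"),
    ("author:shen_other", "paper:p3"),
    ("paper:p1", "venue:SIGMOD"), ("paper:p2", "venue:KDD"), ("paper:p3", "venue:CVPR"),
    ("paper:p1", "term:entity"), ("paper:p2", "term:linking"), ("paper:p3", "term:vision"),
]
# Hypothetical meta-paths: Author-Paper-Venue (APV) and Author-Paper-Term (APT).
PATHS = [("author", "paper", "venue"), ("author", "paper", "term")]
LAM = 0.1  # assumed weight of the uniform background used for smoothing


def build_graph(edges):
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)  # plain dict, so later lookups never add nodes


def node_type(node):
    return node.split(":", 1)[0]


def path_walk(graph, start, meta_path):
    """End-node distribution of a walk from `start` that must follow the node-type
    sequence `meta_path`, stepping uniformly among type-matching neighbours."""
    dist = {start: 1.0}
    for next_type in meta_path[1:]:
        step = defaultdict(float)
        for node, prob in dist.items():
            nbrs = [n for n in graph.get(node, ()) if node_type(n) == next_type]
            for n in nbrs:
                step[n] += prob / len(nbrs)
        dist = step
    return dict(dist)


def path_probs(graph, entity, obj, paths, weights):
    """w_p * P_p(obj | entity) for every meta-path p, each P_p smoothed with a
    uniform background so that no object gets zero probability."""
    out = []
    for w, p in zip(weights, paths):
        walk = path_walk(graph, entity, p)
        out.append(w * ((1 - LAM) * walk.get(obj, 0.0) + LAM / len(graph)))
    return out


def log_score(graph, entity, context, paths, weights):
    # Assumed popularity prior: add-one degree (unnormalized; constant per mention).
    popularity = 1 + len(graph.get(entity, ()))
    return math.log(popularity) + sum(
        math.log(sum(path_probs(graph, entity, o, paths, weights))) for o in context)


def link(graph, candidates, context, paths, weights):
    return max(candidates, key=lambda e: log_score(graph, e, context, paths, weights))


def em_path_weights(graph, mentions, paths, iters=20):
    """Learn one weight per meta-path from unlabeled (candidates, context) pairs."""
    weights = [1.0 / len(paths)] * len(paths)
    for _ in range(iters):
        counts = [0.0] * len(paths)
        for candidates, context in mentions:
            # E-step (a): posterior over which candidate entity the mention refers to.
            logs = {e: log_score(graph, e, context, paths, weights) for e in candidates}
            top = max(logs.values())
            post = {e: math.exp(v - top) for e, v in logs.items()}
            norm = sum(post.values())
            # E-step (b): expected number of context objects produced by each path.
            for e in candidates:
                for o in context:
                    comps = path_probs(graph, e, o, paths, weights)
                    total = sum(comps)
                    for k, c in enumerate(comps):
                        counts[k] += (post[e] / norm) * (c / total)
        # M-step: each path's weight is its share of the expected counts.
        s = sum(counts)
        weights = [c / s for c in counts]
    return weights


# Usage on two hypothetical unlabeled mentions of an ambiguous author name.
graph = build_graph(EDGES)
candidates = ["author:shen_tsinghua", "author:shen_other"]
mentions = [(candidates, ["venue:SIGMOD"]), (candidates, ["venue:CVPR"])]
weights = em_path_weights(graph, mentions, PATHS)
print("meta-path weights (APV, APT):", weights)
print(link(graph, candidates, ["venue:SIGMOD", "term:entity"], PATHS, weights))
```

Because the toy contexts contain only venues, EM moves almost all of the weight onto the Author-Paper-Venue path; on real data the learned weights would reflect whichever object types actually co-occur with mentions. The sketch recomputes each walk on demand for clarity; caching walk distributions per (entity, meta-path) pair would be the obvious optimization.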


Index Terms

- Information systems
  - Information retrieval


Published in
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, 1645 pages
ISBN: 9781450323765
DOI: 10.1145/2588555
General Chairs: Curtis Dyreson (Utah State University, USA), Feifei Li (University of Utah, USA)
Program Chair: M. Tamer Özsu (University of Waterloo, Canada)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
- domain-specific entity linking
- entity linking
- heterogeneous information networks

Acceptance Rates
SIGMOD '14 paper acceptance rate: 107 of 421 submissions, 25%
Overall acceptance rate: 785 of 4,003 submissions, 20%

Article Metrics
- 55 total citations
- 760 total downloads
- Downloads (last 12 months): 13
- Downloads (last 6 weeks): 2