DOI: 10.1145/2588555.2593676
research-article

A probabilistic model for linking named entities in web text with heterogeneous information networks

Published: 18 June 2014

ABSTRACT

Heterogeneous information networks, which consist of multi-type, interconnected objects, are becoming ubiquitous; examples include social media networks and bibliographic networks. Linking named entity mentions detected in unstructured Web text with their corresponding entities in a heterogeneous information network is of practical importance for information network population and enrichment. This task is challenging because of name ambiguity and the limited knowledge available in the information network. Most existing entity linking methods focus on linking entities with Wikipedia or Wikipedia-derived knowledge bases (e.g., YAGO), and depend heavily on features specific to Wikipedia (e.g., Wikipedia articles or Wikipedia-based relatedness measures). Since heterogeneous information networks do not have such features, these previous methods cannot be applied to our task. In this paper, we propose SHINE, which is, to the best of our knowledge, the first probabilistic model to link named entities in Web text with a heterogeneous information network. Our model consists of two components: the entity popularity model, which captures the popularity of an entity, and the entity object model, which captures the distribution of multi-type objects appearing in the textual context of an entity and is generated using meta-path constrained random walks over the network. Because different meta-paths express different semantic meanings and lead to different distributions over objects, they carry different weights in entity linking. We propose an effective iterative approach, based on the expectation-maximization (EM) algorithm, that automatically learns the weight of each meta-path without requiring any training data. Experimental results on a real-world data set demonstrate the effectiveness and efficiency of our proposed model in comparison with the baselines.
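
The two components described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the toy bibliographic network, the entity names, the constant smoothing term `eps`, and the function names `path_constrained_walk`, `score`, and `em_weights` are all assumptions made for illustration. The weight update is a standard EM step for mixture weights and is simplified in that it takes the entity for each context object as given, whereas the abstract describes learning without any training data.

```python
from collections import defaultdict

# Hypothetical toy network. Node types: A = author, P = paper, V = venue.
EDGES = {
    ("A", "P"): {"wei_wang_1": ["p1", "p2"], "wei_wang_2": ["p3"]},
    ("P", "V"): {"p1": ["SIGMOD"], "p2": ["VLDB"], "p3": ["CVPR"]},
    ("P", "A"): {"p1": ["wei_wang_1", "jiong_yang"],
                 "p2": ["wei_wang_1"],
                 "p3": ["wei_wang_2", "li_zhang"]},
}
APV, APA = ("A", "P", "V"), ("A", "P", "A")

def path_constrained_walk(start, meta_path):
    """Distribution over objects reached from `start` by a random walk
    that must follow the node-type sequence given by `meta_path`."""
    dist = {start: 1.0}
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        neighbors = EDGES[(src_type, dst_type)]
        step = defaultdict(float)
        for node, prob in dist.items():
            outs = neighbors.get(node, [])
            for nxt in outs:
                step[nxt] += prob / len(outs)  # uniform transition
        dist = dict(step)
    return dist

def score(entity, popularity, context_objects, weights, eps=1e-3):
    """Popularity times the weighted meta-path mixture probability of each
    context object. `eps` is an assumed constant smoothing term."""
    walks = [(w, path_constrained_walk(entity, mp)) for mp, w in weights.items()]
    total = popularity[entity]
    for obj in context_objects:
        total *= sum(w * d.get(obj, 0.0) for w, d in walks) + eps
    return total

def em_weights(pairs, meta_paths, iterations=20):
    """Simplified EM for the mixture weights over meta-paths, given
    (entity, context_object) pairs whose entity is already known."""
    weights = {mp: 1.0 / len(meta_paths) for mp in meta_paths}
    walks = {(e, mp): path_constrained_walk(e, mp)
             for e, _ in pairs for mp in meta_paths}
    for _ in range(iterations):
        totals = {mp: 0.0 for mp in meta_paths}
        for e, obj in pairs:
            # E-step: responsibility of each meta-path for this object.
            joint = {mp: weights[mp] * walks[(e, mp)].get(obj, 0.0)
                     for mp in meta_paths}
            norm = sum(joint.values())
            if norm == 0.0:
                continue  # object not reachable along any meta-path
            for mp in meta_paths:
                totals[mp] += joint[mp] / norm
        z = sum(totals.values())
        if z == 0.0:
            break
        # M-step: normalised responsibilities become the new weights.
        weights = {mp: totals[mp] / z for mp in meta_paths}
    return weights

popularity = {"wei_wang_1": 2 / 3, "wei_wang_2": 1 / 3}
weights = {APV: 0.5, APA: 0.5}
context = ["SIGMOD", "jiong_yang"]
best = max(popularity, key=lambda e: score(e, popularity, context, weights))
```

In this toy setting, a mention whose surrounding text contains the venue SIGMOD and the co-author jiong_yang is scored against both candidate authors, and `best` holds the candidate with the higher score. A learned weight vector from `em_weights` could be passed to `score` in place of the uniform `weights`.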


    • Published in

      SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
      June 2014
      1645 pages
      ISBN:9781450323765
      DOI:10.1145/2588555

      Copyright © 2014 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      SIGMOD '14 Paper Acceptance Rate: 107 of 421 submissions, 25%
      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
