research-article

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

Authors:
Xiang Ren, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Wenqi He, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Meng Qu, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Clare R. Voss, Army Research Laboratory, Adelphi, MD, USA
Heng Ji, Rensselaer Polytechnic Institute, Troy, NY, USA
Jiawei Han, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, Pages 1825–1834, https://doi.org/10.1145/2939672.2939822

Published: 13 August 2016

ABSTRACT

Current fine-grained entity typing systems use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels obtained from knowledge bases in this way are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), as the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space, in which objects whose types are semantically close have similar representations. We then estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average of 25% improvement in accuracy compared to the next best method.
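
The two mechanisms named in the abstract, a margin-based loss over candidate labels and top-down type-path estimation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The dot-product scoring, the hinge form of the loss, the stopping threshold, and all names (`partial_label_margin_loss`, `infer_type_path`, the toy vectors and hierarchy) are assumptions made for illustration only; the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Similarity between two embeddings (assumed here to be a dot product)."""
    return sum(a * b for a, b in zip(u, v))


def partial_label_margin_loss(
    mention: Vector,
    type_vecs: Dict[str, Vector],
    candidates: Set[str],
    margin: float = 1.0,
) -> float:
    """Hypothetical hinge loss for a mention with a noisy candidate type set.

    Only the best-scoring candidate type must beat the best-scoring
    non-candidate type by `margin`, so not every candidate is forced
    to be treated as correct.
    """
    best_pos = max(dot(mention, type_vecs[t]) for t in candidates)
    negatives = [t for t in type_vecs if t not in candidates]
    if not negatives:
        return 0.0
    best_neg = max(dot(mention, type_vecs[t]) for t in negatives)
    return max(0.0, margin - (best_pos - best_neg))


def infer_type_path(
    mention: Vector,
    type_vecs: Dict[str, Vector],
    children: Dict[str, List[str]],
    candidates: Set[str],
    root: str = "ROOT",
    threshold: float = 0.0,
) -> List[str]:
    """Walk the type hierarchy from the root, keeping the most similar
    candidate child at each level; stop at a leaf or when the best
    similarity falls below `threshold` (an assumed stopping rule)."""
    path: List[str] = []
    node = root
    while True:
        options = [c for c in children.get(node, []) if c in candidates]
        if not options:
            break
        best = max(options, key=lambda t: dot(mention, type_vecs[t]))
        if dot(mention, type_vecs[best]) < threshold:
            break
        path.append(best)
        node = best
    return path


# Toy embeddings and hierarchy, invented for this example.
type_vecs = {
    "person": [1.0, 0.0],
    "politician": [0.9, 0.4],
    "athlete": [0.2, -0.9],
    "organization": [-1.0, 0.2],
}
children = {"ROOT": ["person", "organization"], "person": ["politician", "athlete"]}
candidates = {"person", "politician", "athlete"}  # noisy labels from distant supervision
mention = [0.8, 0.5]

path = infer_type_path(mention, type_vecs, children, candidates)
loss = partial_label_margin_loss(mention, type_vecs, candidates)
print(path, loss)
```

In this toy setup the candidate set contains both `politician` and `athlete`, as distant supervision would assign every type the knowledge base lists for the entity; the top-down walk keeps only the branch closest to the mention embedding.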

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning latent representations
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction
  2. Information systems applications
    1. Data mining


Published in
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)
Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)
Copyright © 2016 ACM
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distant supervision
entity typing
fine-grained entity typing
knowledge base
label noise reduction
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)