Assessing confidence of knowledge base content with an experimental study in entity resolution

Authors:
Michael Wick, University of Massachusetts, Amherst, MA, USA
Sameer Singh, University of Massachusetts, Amherst, MA, USA
Ari Kobren, University of Massachusetts, Amherst, MA, USA
Andrew McCallum, University of Massachusetts, Amherst, MA, USA

AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction, October 2013, Pages 13–18, https://doi.org/10.1145/2509558.2509561

Published: 27 October 2013

ABSTRACT

The purpose of this paper is to begin a conversation about the importance and role of confidence estimation in knowledge bases (KBs). KBs are never perfectly accurate, yet without confidence reporting their users are likely to treat them as if they were, possibly with serious real-world consequences. We define a notion of confidence based on the probability of a KB fact being true. For automatically constructed KBs we propose several algorithms for estimating this confidence from pre-existing probabilistic models of data integration and KB construction. In particular, this paper focuses on confidence estimation in entity resolution. A goal of our exposition here is to encourage creators and curators of KBs to include confidence estimates for entities and relations in their KBs.
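
To make the definition of confidence as "the probability of a KB fact being true" concrete, here is a minimal sketch of one way such a probability could be estimated for an entity-resolution fact. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a probabilistic model has already produced a set of sampled clusterings of mentions, and it estimates the confidence that two mentions refer to the same entity as the fraction of samples in which they share a cluster. The function name `coreference_confidence`, the sample format, and the mention strings are all hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Set

# One sample: a partition of mentions into entities (assumed format).
Clustering = Sequence[Set[Hashable]]


def coreference_confidence(
    samples: Iterable[Clustering], a: Hashable, b: Hashable
) -> float:
    """Estimate P(a and b refer to the same entity) as a sample frequency."""
    total = 0
    together = 0
    for clustering in samples:
        total += 1
        # The fact "a and b corefer" holds if some entity contains both.
        if any(a in entity and b in entity for entity in clustering):
            together += 1
    if total == 0:
        raise ValueError("need at least one sampled clustering")
    return together / total


# Hypothetical samples, standing in for draws from a pre-existing
# probabilistic model of entity resolution.
samples = [
    [{"A. McCallum", "Andrew McCallum"}, {"A. Mccallum (UK)"}],
    [{"A. McCallum", "Andrew McCallum", "A. Mccallum (UK)"}],
    [{"A. McCallum"}, {"Andrew McCallum"}, {"A. Mccallum (UK)"}],
    [{"A. McCallum", "Andrew McCallum"}, {"A. Mccallum (UK)"}],
]

confidence = coreference_confidence(samples, "A. McCallum", "Andrew McCallum")
print(f"estimated confidence: {confidence:.2f}")
```

A KB could store such a value next to each entity or relation so that downstream users can filter or weight facts rather than treating every entry as certain.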

Index Terms

1. Information systems
  1. Data management systems

Published in
AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction
October 2013
124 pages
ISBN: 9781450324113
DOI: 10.1145/2509558
Program Chairs:
Fabian M. Suchanek, Max Planck Institute for Informatics, Germany
Sebastian Riedel, University College London, UK
Sameer Singh, University of Massachusetts Amherst, USA
Partha Pratim Talukdar, Carnegie Mellon University, USA
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity resolution
information extraction
uncertain data
Qualifiers
- research-article