ABSTRACT
The purpose of this paper is to begin a conversation about the importance and role of confidence estimation in knowledge bases (KBs). KBs are never perfectly accurate, yet without confidence reporting their users are likely to treat them as if they were, possibly with serious real-world consequences. We define a notion of confidence based on the probability of a KB fact being true. For automatically constructed KBs we propose several algorithms for estimating this confidence from pre-existing probabilistic models of data integration and KB construction. In particular, this paper focuses on confidence estimation in entity resolution. A goal of our exposition here is to encourage creators and curators of KBs to include confidence estimates for entities and relations in their KBs.