DOI: 10.1145/2509558.2509561
research-article

Assessing confidence of knowledge base content with an experimental study in entity resolution

Published: 27 October 2013

ABSTRACT

The purpose of this paper is to begin a conversation about the importance and role of confidence estimation in knowledge bases (KBs). KBs are never perfectly accurate, yet without confidence reporting their users are likely to treat them as if they were, possibly with serious real-world consequences. We define a notion of confidence based on the probability of a KB fact being true. For automatically constructed KBs we propose several algorithms for estimating this confidence from pre-existing probabilistic models of data integration and KB construction. In particular, this paper focuses on confidence estimation in entity resolution. A goal of our exposition here is to encourage creators and curators of KBs to include confidence estimates for entities and relations in their KBs.
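The abstract defines confidence as the probability that a KB fact is true. Suppose the underlying entity-resolution model is sampling-based, for example an MCMC sampler over clusterings of mentions. In that case, one standard way to estimate such a probability is a Monte Carlo average: the fraction of sampled clusterings in which the fact holds.

The sketch below illustrates that generic estimator under stated assumptions. It is not the paper's own algorithm. It assumes:

- each sample is a dictionary mapping mention ids to entity labels;
- samples are drawn (approximately) from the model's posterior;
- every sample covers the same set of mentions.

The function names `pairwise_confidence` and `entity_confidence` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

# Hypothetical representation: one sampled clustering maps mention id -> entity label.
Clustering = Dict[str, Hashable]


def pairwise_confidence(samples: List[Clustering]) -> Dict[Tuple[str, str], float]:
    """Monte Carlo estimate of P(mentions a and b are coreferent) for every pair:
    the fraction of samples in which a and b carry the same entity label."""
    if not samples:
        raise ValueError("need at least one sample")
    mentions = sorted(samples[0])
    counts: Counter = Counter()
    for s in samples:
        if set(s) != set(mentions):
            raise ValueError("all samples must cover the same mentions")
        for a, b in combinations(mentions, 2):
            if s[a] == s[b]:
                counts[(a, b)] += 1
    n = len(samples)
    return {pair: counts[pair] / n for pair in combinations(mentions, 2)}


def entity_confidence(samples: List[Clustering], entity: Iterable[str]) -> float:
    """Monte Carlo estimate of P(this exact set of mentions forms one entity):
    the fraction of samples containing a cluster equal to `entity`."""
    if not samples:
        raise ValueError("need at least one sample")
    target: FrozenSet[str] = frozenset(entity)
    hits = 0
    for s in samples:
        # Group the mentions of this sample into clusters by entity label.
        clusters: Dict[Hashable, set] = {}
        for mention, label in s.items():
            clusters.setdefault(label, set()).add(mention)
        if any(frozenset(c) == target for c in clusters.values()):
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    # Toy samples (made up for illustration): labels are arbitrary per sample.
    samples = [
        {"m1": 0, "m2": 0, "m3": 1},
        {"m1": 0, "m2": 0, "m3": 0},
        {"m1": 0, "m2": 1, "m3": 1},
        {"m1": 5, "m2": 5, "m3": 7},
    ]
    print(pairwise_confidence(samples))
    # {('m1', 'm2'): 0.75, ('m1', 'm3'): 0.25, ('m2', 'm3'): 0.5}
    print(entity_confidence(samples, {"m1", "m2"}))  # 0.5
```

Entity labels are compared only within a single sample, so label identities need not be consistent across samples. The pairwise estimate is the simpler quantity to report per relation. The whole-entity estimate is stricter: it requires the cluster to match exactly.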


Published in
AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction
October 2013, 124 pages
ISBN: 9781450324113
DOI: 10.1145/2509558

      Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States
