DOI: 10.1145/3078081.3078099
research-article

Toponym disambiguation in historical documents using semantic and geographic features

Published: 01 June 2017

ABSTRACT

Historians are often interested in the locations mentioned in digitized collections. However, place names are highly ambiguous and may change over time, which makes it especially hard to automatically ground mentions of places in historical texts to their real-world referents. Toponym disambiguation is a challenging problem in natural language processing that has been approached through two different yet related tasks: toponym resolution and entity linking. In this paper, we propose a weakly-supervised method that combines the strengths of both approaches by exploiting geographic and semantic features. We tested our method against a historical toponym resolution benchmark and improved on the state of the art. We also created five datasets, tested two state-of-the-art out-of-the-box entity linking methods, and improved on their performance when only locations are considered.
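
The general idea of scoring candidate referents with both a semantic and a geographic signal can be illustrated with a small sketch. This is not the method proposed in the paper, whose details the abstract does not give. It is a minimal illustration under stated assumptions: the `Candidate` record, the bag-of-words cosine similarity, the exponential distance decay with `scale_km`, the mixing weight `alpha`, and the example data are all hypothetical choices made for this example.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical gazetteer entry: a name, coordinates, and a short description."""
    name: str
    lat: float
    lon: float
    description: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def bag(text):
    """Lower-cased bag of words (whitespace tokenisation, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def geo_score(cand, anchors, scale_km=500.0):
    """Score in (0, 1] that decays with mean distance to already-grounded places."""
    if not anchors:
        return 0.0
    mean_d = sum(haversine_km(cand.lat, cand.lon, la, lo) for la, lo in anchors) / len(anchors)
    return math.exp(-mean_d / scale_km)


def disambiguate(context, candidates, anchors, alpha=0.5):
    """Return the candidate with the best weighted semantic + geographic score."""
    ctx = bag(context)

    def score(c):
        return alpha * cosine(ctx, bag(c.description)) + (1 - alpha) * geo_score(c, anchors)

    return max(candidates, key=score)


# Illustrative data only: two referents for the mention "Paris".
candidates = [
    Candidate("Paris, France", 48.8566, 2.3522, "capital city of france on the seine"),
    Candidate("Paris, Texas", 33.6609, -95.5555, "city in lamar county texas united states"),
]
anchors = [(49.2583, 4.0317)]  # another place in the same text, assumed already grounded
best = disambiguate("the army marched from paris toward the seine", candidates, anchors)
print(best.name)
```

In this toy setup, the semantic term rewards word overlap between the mention's context and a candidate's description, and the geographic term rewards candidates close to other places in the same text. The weight `alpha` sets the balance between the two signals.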


      • Published in

        DATeCH2017: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage
        June 2017
        179 pages
        ISBN: 9781450352659
        DOI: 10.1145/3078081

        Copyright © 2017 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        DATeCH2017 Paper Acceptance Rate: 29 of 37 submissions, 78%. Overall Acceptance Rate: 60 of 86 submissions, 70%.
