ABSTRACT
Historians are often interested in the locations mentioned in digitized collections. However, place names are highly ambiguous and may change over time, which makes it especially hard to automatically ground mentions of places in historical texts to their real-world referents. Toponym disambiguation is a challenging problem in natural language processing and has been approached through two different yet related tasks: toponym resolution and entity linking. In this paper, we propose a weakly-supervised method that combines the strengths of both approaches by exploiting both geographic and semantic features. We tested our method on a historical toponym resolution benchmark and improved on the state of the art. We also created five datasets, on which we evaluated two state-of-the-art, out-of-the-box entity linking methods; when only locations are considered, our method also improved on their performance.
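The abstract does not say how the geographic and semantic features are combined. Purely as an illustration of the general idea, and not as the authors' weakly-supervised method, the sketch below ranks candidate referents for one toponym by a weighted sum of two hypothetical scores. The semantic score is the word-overlap (Jaccard) similarity between a candidate's description and the mention's context. The geographic score decays exponentially with the great-circle distance to an "anchor" point, for example a location already resolved in the same document. The names `Candidate`, `score`, `resolve`, the weight `alpha`, and the decay scale `scale_km` are all assumptions made for this example. The weight is set by hand here, not learned.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical knowledge-base entry: a label, coordinates, and a short description.
    name: str
    lat: float
    lon: float
    description: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))  # clamp guards against rounding above 1


def jaccard(a: str, b: str) -> float:
    """Word-set overlap; a stand-in for any semantic similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score(cand: Candidate, context: str, anchor: tuple, alpha: float = 0.5,
          scale_km: float = 500.0) -> float:
    """Hypothetical combined score: alpha * semantic + (1 - alpha) * geographic."""
    semantic = jaccard(cand.description, context)
    dist = haversine_km(cand.lat, cand.lon, anchor[0], anchor[1])
    geographic = math.exp(-dist / scale_km)  # 1.0 at the anchor, approaching 0 far away
    return alpha * semantic + (1 - alpha) * geographic


def resolve(candidates: list, context: str, anchor: tuple, alpha: float = 0.5) -> Candidate:
    """Pick the highest-scoring candidate referent for a single toponym mention."""
    return max(candidates, key=lambda c: score(c, context, anchor, alpha))


# Usage: disambiguate "Paris" in a text whose other resolved place is London.
candidates = [
    Candidate("Paris, France", 48.8566, 2.3522, "capital of france on the river seine"),
    Candidate("Paris, Texas", 33.6609, -95.5555, "city in lamar county texas united states"),
]
context = "the river seine flows through the french capital"
london = (51.5074, -0.1278)
best = resolve(candidates, context, london)
print(best.name)  # prints "Paris, France"
```

Both signals favour the French city in this toy case. The linear combination is only one possible design; a learned model could weight these (or many more) features differently.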
Index Terms
- Toponym disambiguation in historical documents using semantic and geographic features