ABSTRACT
In this paper, we attempt to give spatial semantics to web pages by assigning them place names. We divide the assignment task into three sub-problems: place name extraction, place name disambiguation, and place name assignment, and we propose an approach to each. In particular, we have modified GATE, a well-known named entity extraction tool, to perform place name extraction using a US Census gazetteer. We also propose a rule-based place name disambiguation method and a place name assignment method capable of assigning place names to web page segments. We evaluated the disambiguation and assignment methods on a web page collection referenced by the DLESE metadata collection, comparing their output with manually disambiguated place names and manual place name assignments. The results show that the proposed disambiguation method works well for geo/geo ambiguities. Preliminary results for the assignment method are promising, given the existence of geo/non-geo ambiguities among place names.
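The three sub-problems can be illustrated with a minimal sketch. This is not the paper's method: the toy gazetteer, the function names, and the single disambiguation rule (prefer the candidate supported by unambiguous place names in the same segment) are all assumptions made for illustration, and the sketch handles only geo/geo ambiguity, not geo/non-geo ambiguity.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: place name -> candidate US states.
# A real system would load entries from a gazetteer such as the US Census one.
GAZETTEER = {
    "Springfield": ["Illinois", "Massachusetts", "Missouri"],
    "Chicago": ["Illinois"],
    "Boston": ["Massachusetts"],
}


def extract_place_names(text):
    """Extraction: return capitalized tokens that appear in the gazetteer."""
    tokens = re.findall(r"[A-Z][a-z]+", text)
    return [t for t in tokens if t in GAZETTEER]


def disambiguate(names):
    """Disambiguation (assumed rule): use unambiguous names as context.

    Each ambiguous name resolves to the candidate state that the
    unambiguous names in the same segment support most often.
    Ties fall back to the first candidate listed in the gazetteer.
    """
    context = Counter(
        GAZETTEER[n][0] for n in names if len(GAZETTEER[n]) == 1
    )
    return {n: max(GAZETTEER[n], key=lambda c: context[c]) for n in names}


def assign(segments):
    """Assignment: attach resolved place names to each page segment."""
    return [disambiguate(extract_place_names(s)) for s in segments]


if __name__ == "__main__":
    page_segments = [
        "Springfield is a few hours from Chicago.",
        "Boston and Springfield are in the same state.",
    ]
    for segment, places in zip(page_segments, assign(page_segments)):
        print(segment, "->", places)
```

In this sketch the same name, Springfield, resolves differently in the two segments because each segment supplies a different unambiguous neighbor, which is the kind of geo/geo ambiguity the abstract refers to.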
Index Terms
- On assigning place names to geography related web pages