Skip to main content
Top

2016 | OriginalPaper | Chapter

Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling

Authors : Yiting Ju, Benjamin Adams, Krzysztof Janowicz, Yingjie Hu, Bo Yan, Grant McKenzie

Published in: Knowledge Engineering and Knowledge Management

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Place name disambiguation is the task of correctly identifying a place from a set of places sharing a common name. It contributes to tasks such as knowledge extraction, query answering, geographic information retrieval, and automatic tagging. Disambiguation quality relies on the ability to correctly identify and interpret contextual clues, complicating the task for short texts. Here we propose a novel approach to the disambiguation of place names from short texts that integrates two models: entity co-occurrence and topic modeling. The first model uses Linked Data to identify related entities to improve disambiguation quality. The second model uses topic modeling to differentiate places based on the terms used to describe them. We evaluate our approach using a corpus of short texts, determine the suitable weight between models, and demonstrate that a combined model outperforms benchmark systems such as DBpedia Spotlight and Open Calais in terms of F1-score and Mean Reciprocal Rank.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Adams, B., Janowicz, K.: On the geo-indicativeness of non-georeferenced text. In: International AAAI Conference on Web and Social Media (ICWSM), pp. 375–378 (2012) Adams, B., Janowicz, K.: On the geo-indicativeness of non-georeferenced text. In: International AAAI Conference on Web and Social Media (ICWSM), pp. 375–378 (2012)
2.
go back to reference Adams, B., McKenzie, G., Gahegan, M.: Frankenplace: interactive thematic mapping for ad hoc exploratory search. In: Proceedings of the 24th International Conference on World Wide Web, pp. 12–22. ACM (2015) Adams, B., McKenzie, G., Gahegan, M.: Frankenplace: interactive thematic mapping for ad hoc exploratory search. In: Proceedings of the 24th International Conference on World Wide Web, pp. 12–22. ACM (2015)
3.
go back to reference Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P., Fienberg, S.: Adaptive name matching in information integration. IEEE Intell. Syst. 18(5), 16–23 (2003)CrossRef Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P., Fienberg, S.: Adaptive name matching in information integration. IEEE Intell. Syst. 18(5), 16–23 (2003)CrossRef
4.
go back to reference Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)MATH Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)MATH
5.
go back to reference Bunescu, R.C., Pasca, M.: Using encyclopedic knowledge for named entity disambiguation. EACL 6, 9–16 (2006) Bunescu, R.C., Pasca, M.: Using encyclopedic knowledge for named entity disambiguation. EACL 6, 9–16 (2006)
6.
go back to reference Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. EMNLP-CoNLL 7, 708–716 (2007) Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. EMNLP-CoNLL 7, 708–716 (2007)
7.
go back to reference Fader, A., Soderland, S., Etzioni, O., Center, T.: Scaling Wikipedia-based named entity disambiguation to arbitrary web text. In: Proceedings of the IJCAI Workshop on User-contributed Knowledge, Artificial Intelligence: An Evolving Synergy, Pasadena, CA, USA, pp. 21–26, 2009 (2011) Fader, A., Soderland, S., Etzioni, O., Center, T.: Scaling Wikipedia-based named entity disambiguation to arbitrary web text. In: Proceedings of the IJCAI Workshop on User-contributed Knowledge, Artificial Intelligence: An Evolving Synergy, Pasadena, CA, USA, pp. 21–26, 2009 (2011)
8.
go back to reference Goodchild, M.F., Glennon, J.A.: Crowdsourcing geographic information for disaster response: a research frontier. Int. J. Digit. Earth 3(3), 231–241 (2010)CrossRef Goodchild, M.F., Glennon, J.A.: Crowdsourcing geographic information for disaster response: a research frontier. Int. J. Digit. Earth 3(3), 231–241 (2010)CrossRef
9.
go back to reference Gray, R.W.: Exact transformation equations for Fuller’s world map. Cartogr.: Int. J. Geogr. Inf. Geovis. 32(3), 17–25 (1995)CrossRef Gray, R.W.: Exact transformation equations for Fuller’s world map. Cartogr.: Int. J. Geogr. Inf. Geovis. 32(3), 17–25 (1995)CrossRef
10.
go back to reference Han, X., Zhao, J., Structural semantic relatedness: a knowledge-based method to named entity disambiguation. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 50–59. Association for Computational Linguistics (2010) Han, X., Zhao, J., Structural semantic relatedness: a knowledge-based method to named entity disambiguation. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 50–59. Association for Computational Linguistics (2010)
11.
go back to reference Hu, Y., Janowicz, K., Prasad, S.: Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In: Proceedings of the 8th Workshop on Geographic Information Retrieval, p. 8. ACM (2014) Hu, Y., Janowicz, K., Prasad, S.: Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In: Proceedings of the 8th Workshop on Geographic Information Retrieval, p. 8. ACM (2014)
12.
go back to reference Janowicz, K., Hitzler, P.: The digital earth as knowledge engine. Semant. Web 3(3), 213–221 (2012) Janowicz, K., Hitzler, P.: The digital earth as knowledge engine. Semant. Web 3(3), 213–221 (2012)
13.
go back to reference Jones, C.B., Purves, R.S.: Geographical information retrieval. Int. J. Geogr. Inf. Sci. 22(3), 219–228 (2008)CrossRef Jones, C.B., Purves, R.S.: Geographical information retrieval. Int. J. Geogr. Inf. Sci. 22(3), 219–228 (2008)CrossRef
14.
go back to reference Machado, I.M.R., de Alencar, R.O., de Oliveira Campos Jr., R., Davis Jr., C.A.: An ontological gazetteer and its application for place name disambiguation in text. J. Braz. Comput. Soc. 17(4), 267–279 (2011)CrossRef Machado, I.M.R., de Alencar, R.O., de Oliveira Campos Jr., R., Davis Jr., C.A.: An ontological gazetteer and its application for place name disambiguation in text. J. Braz. Comput. Soc. 17(4), 267–279 (2011)CrossRef
15.
go back to reference Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C., Dbpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. ACM (2011) Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C., Dbpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. ACM (2011)
16.
go back to reference Mihalcea, R., Csomai, A.: Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 233–242. ACM (2007) Mihalcea, R., Csomai, A.: Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 233–242. ACM (2007)
17.
go back to reference Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: Proceedings of the 17th ACM conference on Information and knowledge management, pp. 509–518. ACM (2008) Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: Proceedings of the 17th ACM conference on Information and knowledge management, pp. 509–518. ACM (2008)
18.
go back to reference Overell, S., Rüger, S.: Using co-occurrence models for placename disambiguation. Int. J. Geogr. Inf. Sci. 22(3), 265–287 (2008)CrossRef Overell, S., Rüger, S.: Using co-occurrence models for placename disambiguation. Int. J. Geogr. Inf. Sci. 22(3), 265–287 (2008)CrossRef
19.
go back to reference Purves, R., Jones, C.: Geographic information retrieval. SIGSPATIAL Spec. 3(2), 2–4 (2011)CrossRef Purves, R., Jones, C.: Geographic information retrieval. SIGSPATIAL Spec. 3(2), 2–4 (2011)CrossRef
20.
go back to reference Rizzo, G., van Erp, M., Troncy, R.: Benchmarking the extraction and disambiguation of named entities on the semantic web. In: LREC, pp. 4593–4600 (2014) Rizzo, G., van Erp, M., Troncy, R.: Benchmarking the extraction and disambiguation of named entities on the semantic web. In: LREC, pp. 4593–4600 (2014)
21.
go back to reference Sahr, K., White, D., Kimerling, A.J.: Geodesic discrete global grid systems. Cartogr. Geogr. Inf. Sci. 30(2), 121–134 (2003)CrossRef Sahr, K., White, D., Kimerling, A.J.: Geodesic discrete global grid systems. Cartogr. Geogr. Inf. Sci. 30(2), 121–134 (2003)CrossRef
22.
go back to reference Spitz, A., Geiß, J., Gertz, M., So far away, yet so close: augmenting toponym disambiguation and similarity with text-based networks. In: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2016, pp. 2: 1–2: 6. ACM, New York, NY, USA (2016) Spitz, A., Geiß, J., Gertz, M., So far away, yet so close: augmenting toponym disambiguation and similarity with text-based networks. In: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2016, pp. 2: 1–2: 6. ACM, New York, NY, USA (2016)
23.
go back to reference Steyvers, M., Griffiths, T.: Probabilistic topic models. Handb. Latent Semant. Anal. 427(7), 424–440 (2007) Steyvers, M., Griffiths, T.: Probabilistic topic models. Handb. Latent Semant. Anal. 427(7), 424–440 (2007)
24.
go back to reference Zhang, W., Gelernter, J.: Geocoding location expressions in Twitter messages: a preference learning method. J. Spat. Inf. Sci. 2014(9), 37–70 (2014) Zhang, W., Gelernter, J.: Geocoding location expressions in Twitter messages: a preference learning method. J. Spat. Inf. Sci. 2014(9), 37–70 (2014)
Metadata
Title
Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling
Authors
Yiting Ju
Benjamin Adams
Krzysztof Janowicz
Yingjie Hu
Bo Yan
Grant McKenzie
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49004-5_23

Premium Partner