Published in: Discover Computing 1/2013

01.02.2013

Modeling locations with social media

Authors: Neil O’Hare, Vanessa Murdock

Abstract

In this paper we focus on the locations that are explicit and implicit in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models, based on term frequency and user frequency. The geotagged public photos in Flickr serve as a convenient ground truth. Our results show that, using only a photo’s tags, we can predict location within a one kilometer by one kilometer cell with 17 % accuracy, and within a three kilometer radius around such a cell with 40 % accuracy. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on raw term frequency.
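
The idea of a per-cell language model estimated from user frequency can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the estimation formulas, so the sketch assumes a flat grid of 0.01 degree cells (roughly one kilometer in latitude), a maximum-likelihood estimate over distinct users per tag, and simple linear interpolation with a background model. All function names and the parameters `CELL_DEG`, `lam`, and `eps` are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: 0.01 degrees is roughly 1 km in latitude; the paper's grid may differ.
CELL_DEG = 0.01


def cell_of(lat, lon):
    """Map a coordinate to a grid cell id on a flat lat/lon grid (hypothetical)."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def train(photos):
    """Collect, per cell and tag, the set of distinct users who used that tag.

    photos: iterable of (user_id, lat, lon, tags).
    """
    users_per_tag = defaultdict(lambda: defaultdict(set))
    for user, lat, lon, tags in photos:
        cell = cell_of(lat, lon)
        for tag in tags:
            users_per_tag[cell][tag].add(user)
    return users_per_tag


def user_freq_model(users_per_tag):
    """P(tag | cell) proportional to the number of distinct users, not raw counts."""
    model = {}
    for cell, tags in users_per_tag.items():
        total = sum(len(users) for users in tags.values())
        model[cell] = {tag: len(users) / total for tag, users in tags.items()}
    return model


def background(users_per_tag):
    """Collection-wide tag distribution (simplification: sums per-cell user counts)."""
    counts = defaultdict(int)
    for tags in users_per_tag.values():
        for tag, users in tags.items():
            counts[tag] += len(users)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}


def predict(tags, model, bg, lam=0.8, eps=1e-9):
    """Return the cell whose smoothed model gives the tags the highest log-likelihood."""
    best_cell, best_score = None, -math.inf
    for cell, probs in model.items():
        score = sum(
            math.log(lam * probs.get(t, 0.0) + (1 - lam) * bg.get(t, eps))
            for t in tags
        )
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell
```

Because each user is counted once per tag and cell, one user uploading many photos with the same tags cannot dominate a cell's model, which is one plausible reading of why user frequency would be more robust than raw term frequency. Smoothing from neighboring cells, as described in the abstract, could be added by interpolating `probs` with the models of adjacent cell ids; that step is left out of this sketch.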

Footnotes

1. http://www.flickr.com, visited March 2011.
5. http://www.navteq.com/, visited January 2012.
6. http://www.teleatlas.com, visited January 2012.
8. Note that Flickr has a public API that allows members of the research community to download metadata and images from users' public photos: http://www.flickr.com/services/api/, visited January 2012.
9. For brevity, we do not present the complete set of results for the small dataset for all hierarchical smoothing approaches here, but the relative performance of the different approaches is similar to that in Table 4.
10. Personal communication with the creators of the CoPhIR dataset. The removed archive was sapir_id_1_xml_r.tgz.
12. http://www.facebook.com, visited January 2012.
13. http://www.twitter.com, visited January 2012.

Metadata
Title: Modeling locations with social media
Authors: Neil O’Hare, Vanessa Murdock
Publication date: 01.02.2013
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 1/2013
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-012-9195-y
