Published in: Discover Computing 1/2013

01-02-2013

Modeling locations with social media

Authors: Neil O’Hare, Vanessa Murdock


Abstract

In this paper we focus on the locations explicit and implicit in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models, based on term frequency and user frequency. The geotagged public photos in Flickr serve as a convenient ground truth. Our results show that, using only a photo's tags, we can predict its location within a one kilometer by one kilometer cell with 17% accuracy, and within a three kilometer radius around such a cell with 40% accuracy. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on raw term frequency.
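The approach summarized in the abstract can be illustrated with a small sketch. It assumes a fixed grid of 0.01-degree cells (roughly 1 km), builds one tag model per cell from either user-frequency counts (a tag counts at most once per user per cell) or raw term-frequency counts, and ranks cells by the Dirichlet-smoothed likelihood of a photo's tags, optionally smoothing each cell toward its 3x3 neighborhood. The grid, the Dirichlet estimator, the parameter values, and all names (`cell_of`, `build_models`, `predict_cell`, `mu`, `mu_nb`, `CELL_DEG`) are illustrative assumptions, not necessarily the estimators used in the paper.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch only: the grid, smoothing scheme and parameter values
# are assumptions, not the paper's exact setup.
# Hypothetical grid: 0.01-degree cells (about 1.1 km of latitude;
# narrower in longitude away from the equator).
CELL_DEG = 0.01


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to an integer (row, col) grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_models(photos, use_user_freq=True):
    """Count tags per cell from (user_id, lat, lon, tags) records.

    With use_user_freq=True a tag counts at most once per user per cell,
    so one prolific uploader cannot dominate a cell's model; with False
    every occurrence counts (raw term frequency).
    """
    cell_counts = defaultdict(Counter)
    seen = set()  # (user, cell, tag) triples already counted
    for user, lat, lon, tags in photos:
        cell = cell_of(lat, lon)
        for tag in tags:
            if use_user_freq:
                if (user, cell, tag) in seen:
                    continue
                seen.add((user, cell, tag))
            cell_counts[cell][tag] += 1
    collection = Counter()
    for counts in cell_counts.values():
        collection.update(counts)
    return cell_counts, collection


def neighborhood(cell, cell_counts):
    """Pool counts from the cell and its eight adjacent cells.

    Uses .get so the defaultdict is never mutated while it is iterated.
    """
    row, col = cell
    pooled = Counter()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            pooled.update(cell_counts.get((row + dr, col + dc), {}))
    return pooled


def log_likelihood(tags, cell, cell_counts, collection, total,
                   mu=100.0, mu_nb=None):
    """Dirichlet-smoothed log P(tags | cell).

    If mu_nb is set, the cell is smoothed toward its 3x3 neighborhood,
    which is itself smoothed toward the whole collection.
    """
    counts = cell_counts[cell]
    cell_len = sum(counts.values())
    nb = neighborhood(cell, cell_counts) if mu_nb else None
    nb_len = sum(nb.values()) if nb is not None else 0
    log_p = 0.0
    for tag in tags:
        p_bg = collection[tag] / total
        if p_bg == 0:
            continue  # tag unseen everywhere: same effect on every cell
        if nb is not None:
            p_bg = (nb[tag] + mu_nb * p_bg) / (nb_len + mu_nb)
        log_p += math.log((counts[tag] + mu * p_bg) / (cell_len + mu))
    return log_p


def predict_cell(tags, cell_counts, collection, mu=100.0, mu_nb=None):
    """Return the cell whose model gives the tags the highest likelihood."""
    total = sum(collection.values())
    return max(cell_counts, key=lambda c: log_likelihood(
        tags, c, cell_counts, collection, total, mu, mu_nb))


# Toy data (hypothetical): three photos near the Eiffel Tower, one in London.
photos = [
    ("u1", 48.8584, 2.2945, ["eiffel", "tower", "paris"]),
    ("u1", 48.8584, 2.2945, ["eiffel", "tower", "night"]),
    ("u2", 48.8583, 2.2944, ["eiffel", "paris"]),
    ("u3", 51.5007, -0.1246, ["bigben", "london"]),
]
cell_counts, collection = build_models(photos)
print(predict_cell(["eiffel", "paris"], cell_counts, collection))  # → (4885, 229)
```

Setting `use_user_freq=False` gives the raw term-frequency estimate that the abstract compares against: in the toy data, user `u1` tagging "eiffel" twice then counts twice instead of once. The values of `mu` and `mu_nb` are arbitrary illustrations, not tuned settings.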


Footnotes
1 http://www.flickr.com visited March 2011.
5 http://www.navteq.com/ visited January 2012.
6 http://www.teleatlas.com visited January 2012.
8 Note that Flickr has a public API which allows members of the research community to download metadata and images from the public photos of users. http://www.flickr.com/services/api/ visited January 2012.
9 For brevity, we do not present the complete set of results on the small dataset for all hierarchical smoothing approaches, but the relative performance of the different approaches is similar to that in Table 4.
 
10
Personal communication with the creators of the CoPhIR dataset. The removed archive was sapir_id_1_xml_r.tgz.
12 http://www.facebook.com visited January 2012.
13 http://www.twitter.com visited January 2012.
 
Metadata
Title
Modeling locations with social media
Authors
Neil O’Hare
Vanessa Murdock
Publication date
01-02-2013
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 1/2013
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-012-9195-y