Discover Computing

Published in:

01-02-2013

Modeling locations with social media

Authors: Neil O’Hare, Vanessa Murdock

Published in: Discover Computing | Issue 1/2013

Abstract

In this paper we focus on the locations explicit and implicit in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models, based on the term frequency and the user frequency. The geotagged public photos in Flickr serve as a convenient ground truth. Our results show that we can predict location within a one kilometer by one kilometer cell with 17% accuracy, and within a three kilometer radius around such a one kilometer cell with 40% accuracy, using only a photo's tags. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on the raw term frequency.
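
The general idea of a per-cell language model estimated from user frequency can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "user frequency" means the number of distinct users who applied a tag within a cell, and it assumes Dirichlet-style smoothing against a global background model with a hypothetical parameter `mu`. The function names, cell identifiers and toy data are all invented for illustration, and the neighborhood smoothing mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict


def build_cell_models(photos):
    """Count, per cell, how many distinct users applied each tag.

    photos: iterable of (user_id, cell_id, tags) tuples (hypothetical format).
    Assumption: user frequency = number of distinct users per (cell, tag).
    """
    users = defaultdict(lambda: defaultdict(set))
    for user_id, cell_id, tags in photos:
        for tag in set(tags):
            users[cell_id][tag].add(user_id)
    return {
        cell: Counter({tag: len(ids) for tag, ids in tag_users.items()})
        for cell, tag_users in users.items()
    }


def background_model(cell_counts):
    """Global tag distribution, used as the smoothing background."""
    total = Counter()
    for counts in cell_counts.values():
        total.update(counts)
    n = sum(total.values())
    return {tag: count / n for tag, count in total.items()}


def log_likelihood(tags, counts, background, mu=1.0):
    """Smoothed log P(tags | cell); mu is an assumed smoothing parameter."""
    cell_total = sum(counts.values())
    score = 0.0
    for tag in tags:
        p_bg = background.get(tag)
        if p_bg is None:
            continue  # ignore tags never seen in training
        score += math.log((counts.get(tag, 0) + mu * p_bg) / (cell_total + mu))
    return score


def predict_cell(tags, cell_counts, background, mu=1.0):
    """Return the cell whose model assigns the tag set the highest likelihood."""
    return max(
        cell_counts,
        key=lambda cell: log_likelihood(tags, cell_counts[cell], background, mu),
    )


# Toy data: the repeated upload by user "u1" adds nothing to the user frequency.
toy_photos = [
    ("u1", "cell_a", ["eiffel", "tower", "paris"]),
    ("u2", "cell_a", ["eiffel", "paris"]),
    ("u1", "cell_a", ["eiffel"]),
    ("u3", "cell_b", ["bigben", "london", "tower"]),
    ("u4", "cell_b", ["london", "thames"]),
]
toy_counts = build_cell_models(toy_photos)
toy_background = background_model(toy_counts)
best_cell = predict_cell(["eiffel", "tower"], toy_counts, toy_background)
```

Counting distinct users rather than raw tag occurrences keeps a single prolific uploader from dominating a cell's model, which is one plausible reading of why the abstract reports user-frequency estimates as more reliable.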

http://www.flickr.com visited March 2011.

http://www.geonames.org/ visited January 2012.

http://developer.yahoo.com/geo/placemaker/ visited January 2012.

http://www.ordnancesurvey.co.uk/oswebsite/ visited January 2012.

http://www.navteq.com/ visited January 2012.

http://www.teleatlas.com visited January 2012.

http://www.openstreetmap.org/ visited January 2012.

Note that Flickr has a public API which allows members of the research community to download metadata and images from the public photos of users. http://www.flickr.com/services/api/ visited January 2012.

For brevity, we do not present the complete set of results for the small dataset for all hierarchical smoothing approaches here, but the relative performance of the different approaches is similar to that in Table 4.

Personal communication with the creators of the CoPhIR dataset. The removed archive was sapir_id_1_xml_r.tgz.

http://www.gettyimages.com/ visited January 2012.

http://www.facebook.com visited January 2012.

http://www.twitter.com visited January 2012.

http://developer.yahoo.com/geo/placemaker visited January 2012.

Ahern, S., Naaman, M., Nair, R., & Yang, J. H.-I. (2007). World Explorer: Visualizing aggregate data from unstructured text in geo-referenced collections. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’07), pp. 1–10.

Amitay, E., Har’El, N., Sivan, R., & Soffer, A. (2004). Web-a-where: Geotagging web content. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’04), pp. 273–280.

Backstrom, L., Kleinberg, J., Kumar, R., & Novak, J. (2008). Spatial variation in search engine queries. In Proceedings of the 17th International Conference on the World Wide Web (WWW ’08), pp. 357–366.

Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Piccioli, T., & Rabitti, F. (2009). CoPhIR: A test collection for content-based image retrieval. CoRR, abs/0905.4627v2.

Chen, L., Hu, B.-G., Zhang, L., Li, M., & Zhang, H. (2003). Face annotation for family photo album management. International Journal of Image and Graphics, 3(1), 81–94.

Cheng, Z., Caverlee, J., & Lee, K. (2010). You are where you tweet: A content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM ’10), pp. 759–768.

Clements, M., Serdyukov, P., de Vries, A. P., & Reinders, M. J. T. (2010). Finding wormholes with flickr geotags. In Proceedings of the 32nd European Conference on Advances in Information Retrieval (ECIR ’10), pp. 658–661.

Crandall, D. J., Backstrom, L., Huttenlocher, D., & Kleinberg, J. (2009). Mapping the world’s photos. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09), pp. 761–770.

Ding, J., Gravano, L., & Shivakumar, N. (2000). Computing geographical scopes of web resources. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB ’00), pp. 545–556.

Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2010). A latent variable model for geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP ’10), pp. 1277–1287.

Hays, J., & Efros, A. A. (2008). im2gps: Estimating geographic information from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’08).

Hiemstra, D. (1998). A linguistically motivated probabilistic model of information retrieval. In Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries (ECDL ’98) (pp. 569–584). London: Springer-Verlag.

Hollenstein, L., & Purves, R. (2010). Exploring place through user-generated content: Using flickr to describe city cores. Journal of Spatial Information Science, (1).

Jones, C. B., Purves, R. S., Clough, P. D., & Joho, H. (2008a). Modelling vague places with knowledge from the web. International Journal of Geographical Information Science, 22(10), 1045–1065.

Jones, R., Zhang, W., Rey, B., Jhala, P., & Stipp, E. (2008b). Geographic intention and modification in web search. International Journal of Geographical Information Science, 22(3), 229–246.

Kantor, P. B., & Voorhees, E. M. (1996). Report on the trec-5 confusion track. In NIST Special Publication 500-238: The Fifth Text REtrieval Conference (TREC-5), pp. 65–74.

Kennedy, L., Naaman, M., Ahern, S., Nair, R., & Rattenbury, T. (2007). How flickr helps us make sense of the world: Context and content in community-contributed media collections. In Proceedings of the 15th International Conference on Multimedia (MULTIMEDIA ’07), pp. 631–640.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.

Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press.

Mc Donald, K., & Smeaton, A. F. (2005). A comparison of score, rank and probability-based fusion methods for video shot retrieval. In Proceedings of the International Conference on Image and Video Retrieval (CIVR 2005), pp. 61–70.

Mei, Q., Liu, C., Su, H., & Zhai, C. (2006). A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of the 15th International Conference on the World Wide Web (WWW ’06).

Moxley, E., Kleban, J., & Manjunath, B. S. (2008). Spirittagger: A geo-aware tag suggestion tool mined from flickr. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (MIR ’08), pp. 24–30.

Murdock, V. (2006). Aspects of Sentence Retrieval. PhD thesis, University of Massachusetts.

Naaman, M., Paepcke, A., & Garcia-Molina, H. (2003). From where to what: Metadata sharing for digital photographs with geographic coordinates. In Proceedings of the 10th International Conference on Cooperative Information Systems (COOPIS 2003).

Nov, O., Naaman, M., & Ye, C. (2010). Analysis of participation in an online photo-sharing community: A multidimensional perspective. Journal of the American Society for Information Science and Technology, 61(3).

O’Hare, N., & Smeaton, A. F. (2009). Context-aware person identification in personal photo collections. IEEE Transactions on Multimedia, Special Issue on Integration of Context and Content for Multimedia Management, 11(2), 220–228.

Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’98), pp. 275–281.

Rattenbury, T., Good, N., & Naaman, M. (2007). Towards automatic extraction of event and place semantics from flickr tags. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’07).

Serdyukov, P., Murdock, V., & van Zwol, R. (2009). Placing flickr photos on a map. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’09) (pp. 484–491). ACM.

Sigurbjörnsson, B., & van Zwol, R. (2008). Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th International World Wide Web Conference (WWW 2008), Beijing, China.

Smucker, M. D., & Allan, J. (2005). An investigation of dirichlet prior smoothing’s performance advantage. Technical Report CIIR Technical Report IR-548, The Center for Intelligent Information Retrieval, The University of Massachusetts.

Toyama, K., Logan, R., & Roseway, A. (2003). Geographic location tags on digital images. In Proceedings of the Eleventh ACM International Conference on Multimedia (MULTIMEDIA ’03), pp. 156–166.

Vadrevu, S., Zhang, Y., Tseng, B., Sun, G., & Li, X. (2008). Identifying regional sensitive queries in web search. In Proceedings of the 17th International Conference on the World Wide Web (WWW ’08).

van House, N. (2007). Flickr and public image-sharing: Distance closeness and photo exhibition. In Extended Abstracts CHI.

Vincenty, T. (1975). Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey Review, 23(176), 88–93.

Wang, C., Wang, J., Xie, X., & Ma, W.-Y. (2007). Mining geographic knowledge using location aware topic model. In Proceedings of the 4th ACM Workshop On Geographic Information Retrieval (GIR ’07).

Westerveld, T., de Vries, A. P., & van Ballegooij, A. R. (2003). CWI at the TREC-2002 video track. In NIST Special Publication: SP 500-251: The Eleventh Text REtrieval Conference (TREC 2002), pp. 207–216.

Yi, X., Raghavan, H., & Leggetter, C. (2009). Discovering users’ specific geo intention in web search. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09) (pp. 481–490). New York, NY, USA.

Zhuang, Z., Brunk, C., & Giles, C. L. (2008). Modeling and visualizing geosensitive queries based on user clicks. In First International Workshop on Location and the Web (LocWeb ’08).

Zong, W., Wu, D., Sun, A., Lim, E.-P., & Goh, D. H.-L. (2005). On assigning place names to geography related web pages. In Proceedings of the Joint Conference on Digital Libraries (JCDL ’05), pp. 354–362.

Title: Modeling locations with social media
Authors: Neil O’Hare, Vanessa Murdock
Publication date: 01-02-2013
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 1/2013
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-012-9195-y
