ABSTRACT
Social media such as Twitter generate large quantities of data about what a person is thinking and doing in a particular location. We leverage this data to build models of locations to improve our understanding of a user's geographic context. Understanding the user's geographic context can in turn enable a variety of services that allow us to present information, recommend businesses and services, and place advertisements that are relevant at a hyper-local level.
In this paper we create language models of locations using coordinates extracted from geotagged Twitter data. We model locations at varying levels of granularity, from the zip code to the country level. We measure the accuracy of these models by how well they predict the location of an individual tweet, and further by how well they predict the location of a user. We find that we can match the performance of the industry-standard tool for predicting the location of both tweets and users at the country, state, and city levels, and far exceed its performance at the hyper-local level, achieving a three- to ten-fold increase in accuracy at the zip code level.
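The idea of a per-location language model can be illustrated with a minimal sketch. The abstract does not specify the estimation or smoothing method, so the code below assumes a unigram model with additive smoothing and whitespace tokenization; the function names `train_location_models` and `predict_location`, the `alpha` parameter, and the toy data are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_location_models(tagged_tweets):
    """Build one unigram term-count model per location.

    tagged_tweets: iterable of (location_id, text) pairs (hypothetical toy input;
    a location_id could stand for a zip code, city, state, or country).
    """
    counts = defaultdict(Counter)
    for location, text in tagged_tweets:
        counts[location].update(text.lower().split())
    return counts


def predict_location(counts, text, alpha=1.0):
    """Return the location whose smoothed unigram model scores the text highest.

    Additive smoothing with constant `alpha` is an assumption made for this
    sketch, chosen only because it is simple to read.
    """
    vocab = set()
    for term_counts in counts.values():
        vocab.update(term_counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen terms

    tokens = text.lower().split()
    best_location, best_score = None, -math.inf
    for location, term_counts in counts.items():
        total = sum(term_counts.values())
        # Log-likelihood of the tweet under this location's model.
        score = sum(
            math.log((term_counts[t] + alpha) / (total + alpha * vocab_size))
            for t in tokens
        )
        if score > best_score:
            best_location, best_score = location, score
    return best_location
```

Under these assumptions, predicting a user's location could reuse the same scoring by concatenating that user's tweets into a single text before calling `predict_location`.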
Index Terms
- "I'm eating a sandwich in Glasgow": modeling locations with tweets