ABSTRACT
We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average, the location estimates converge quickly (needing only hundreds of tweets), placing 51% of Twitter users within 100 miles of their actual location.
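The idea of ranking k candidate cities from tweet content alone can be illustrated with a minimal sketch. It is not the paper's model: the scoring rule (each local word votes with an estimated p(city | word), weighted by how often the user tweets it), the toy training data, the hand-picked `local_words` set, and the function names `word_city_probs` and `top_k_cities` are all assumptions made for illustration. The classification of local words and the lattice-based smoothing described above are omitted.

```python
from collections import Counter, defaultdict


def word_city_probs(tweets_by_city):
    """Estimate p(city | word) from labeled training tweets using raw counts."""
    counts = defaultdict(Counter)
    for city, tweets in tweets_by_city.items():
        for tweet in tweets:
            for word in tweet.lower().split():
                counts[word][city] += 1
    probs = {}
    for word, per_city in counts.items():
        total = sum(per_city.values())
        probs[word] = {city: n / total for city, n in per_city.items()}
    return probs


def top_k_cities(user_tweets, probs, local_words, k=3):
    """Return up to k (city, score) pairs in descending order of score."""
    # Keep only words assumed to have a local geo-scope and seen in training.
    user_words = Counter(
        w
        for tweet in user_tweets
        for w in tweet.lower().split()
        if w in local_words and w in probs
    )
    total = sum(user_words.values())
    scores = Counter()
    for word, n in user_words.items():
        for city, p in probs[word].items():
            # Weight each word's vote by its share of the user's local words.
            scores[city] += (n / total) * p
    return scores.most_common(k)


# Toy training data (hypothetical, for illustration only).
training = {
    "Houston": ["howdy from the rodeo", "rodeo traffic again"],
    "Seattle": ["rain and coffee", "coffee by the sound"],
}
probs = word_city_probs(training)
local_words = {"howdy", "rodeo", "coffee", "sound"}  # hand-picked assumption
ranking = top_k_cities(["coffee at the rodeo", "coffee time"], probs, local_words, k=2)
print(ranking)
```

In this toy run the user mentions "coffee" twice and "rodeo" once, so Seattle is ranked ahead of Houston. A real system would need the learned local-word classifier and smoothing across neighboring locations to cope with sparse word counts.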
Index Terms
- You are where you tweet: a content-based approach to geo-locating twitter users