DOI: 10.1145/1871437.1871535
research-article

You are where you tweet: a content-based approach to geo-locating twitter users

Published: 26 October 2010

ABSTRACT

We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average, we find that the location estimates converge quickly (needing just hundreds of tweets), placing 51% of Twitter users within 100 miles of their actual location.
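
To make the idea in the abstract concrete, the following is a minimal sketch of content-based city estimation, not the authors' implementation. It assumes a simple word-count model in which each city is scored by summing p(city | word) weighted by how often the user uses each word. It then returns the top k cities and checks the best guess against a 100-mile threshold. The training tweets, coordinates, and all function names are invented for illustration. The local-word classifier and lattice-based smoothing described in the abstract are not modeled here.

```python
import math
from collections import Counter, defaultdict

# Toy training data (invented for illustration): city -> tweets by users known to live there.
TRAINING = {
    "Houston": ["rockets game tonight", "stuck on the katy freeway", "rockets win"],
    "Boston": ["red sox at fenway", "fenway tonight", "traffic on the pike"],
}
# Approximate city-centre coordinates (latitude, longitude) in degrees.
COORDS = {"Houston": (29.76, -95.37), "Boston": (42.36, -71.06)}


def city_given_word(training):
    """Estimate p(city | word) from raw word counts per city."""
    counts = defaultdict(Counter)
    for city, tweets in training.items():
        for tweet in tweets:
            for word in tweet.lower().split():
                counts[word][city] += 1
    return {
        word: {city: n / sum(per_city.values()) for city, n in per_city.items()}
        for word, per_city in counts.items()
    }


def top_k_cities(tweets, p_city_word, k=2):
    """Rank cities by the sum over the user's known words of p(city | word) * p(word)."""
    words = [w for t in tweets for w in t.lower().split() if w in p_city_word]
    scores = Counter()
    for word, n in Counter(words).items():
        for city, p in p_city_word[word].items():
            scores[city] += p * n / len(words)
    return scores.most_common(k)  # descending order of score


def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 = mean Earth radius in miles


model = city_given_word(TRAINING)
ranked = top_k_cities(["heading to fenway", "go red sox"], model)
best_city = ranked[0][0]
error_miles = miles_between(COORDS[best_city], COORDS["Boston"])
print(ranked, error_miles <= 100)
```

In this toy run the user's only known words are "fenway", "red", and "sox", all of which occur solely in the Boston training tweets, so Boston is ranked first. A word such as "tonight", which appears in both cities, would split its weight evenly. That illustrates why filtering for words with a strong local geo-scope matters.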


Published in

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN: 9781450300995
DOI: 10.1145/1871437

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,861 of 8,427 submissions, 22%
