DOI: 10.1145/2380718.2380720
research-article

Topical anomaly detection from Twitter stream

Published: 22 June 2012

ABSTRACT

In this paper, we spot topically anomalous tweets in Twitter streams by analyzing the content of the documents pointed to by the URLs in the tweets, in preference to the tweets' own textual content. Existing approaches to anomaly detection ignore such URLs, thereby missing opportunities to detect off-topic tweets. Specifically, we determine the divergence between the claimed topic of a tweet, as reflected by its hashtags, and its actual topic, as reflected by the content of the referenced document. Our approach avoids the need for labeled samples by selecting documents from reliable sources gleaned from the URLs present in the tweets. These documents are compared against documents associated with unknown URLs in incoming tweets, improving reliability, scalability, and adaptability to rapidly changing topics. We evaluate our approach on three events and show that it can find topical inconsistencies not detectable by existing approaches.
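
The comparison described above can be illustrated with a minimal sketch. The abstract does not say which divergence measure, text representation, or decision threshold the authors use, so everything below is an assumption made for illustration: the sketch uses Jensen-Shannon divergence over bag-of-words term distributions, and the function names and the `threshold` value of 0.8 are hypothetical.

```python
import math
import re
from collections import Counter


def term_distribution(text):
    """Turn text into a probability distribution over lowercased word tokens."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), a value between 0 and 1."""
    if not p or not q:
        # Assumption: an empty document is treated as maximally divergent.
        return 1.0
    vocab = set(p) | set(q)
    # Mixture distribution: the midpoint of p and q over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def is_topically_anomalous(reference_docs, linked_doc, threshold=0.8):
    """Flag a linked document whose content diverges from the reference set.

    reference_docs: texts from sources assumed reliable for the hashtag's topic.
    linked_doc: text of the document behind an unknown URL in an incoming tweet.
    threshold: hypothetical cut-off, not taken from the paper.
    """
    reference = term_distribution(" ".join(reference_docs))
    candidate = term_distribution(linked_doc)
    return js_divergence(reference, candidate) > threshold
```

In this sketch, the reference documents stand in for the claimed topic of a hashtag, and a tweet is flagged when the document behind its URL shares too little vocabulary with them. A real system would need to fetch and clean the linked pages and would likely choose the threshold per event.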

    • Published in

      WebSci '12: Proceedings of the 4th Annual ACM Web Science Conference
      June 2012
      531 pages
      ISBN: 9781450312288
      DOI: 10.1145/2380718

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 218 of 875 submissions, 25%
