ABSTRACT
In this paper, we spot topically anomalous tweets in Twitter streams by analyzing the content of the documents that the URLs in the tweets point to, in preference to the tweets' own textual content. Existing approaches to anomaly detection ignore such URLs, thereby missing opportunities to detect off-topic tweets. Specifically, we determine the divergence between the claimed topic of a tweet, as reflected by its hashtags, and the actual topic, as reflected by the content of the referenced document. Our approach avoids the need for labeled samples by selecting documents from reliable sources gleaned from the URLs present in the tweets. These documents are then compared against the documents associated with unknown URLs in incoming tweets, which improves reliability, scalability, and adaptability to rapidly changing topics. We evaluate our approach on three events and show that it can find topical inconsistencies not detectable by existing approaches.
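The comparison described above can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not name the divergence measure, so the sketch substitutes cosine similarity over bag-of-words term counts. The function names, the tokenizer, and the 0.8 threshold are all hypothetical. It assumes the reference documents for a hashtag have already been fetched from sources considered reliable.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercased bag-of-words term counts (assumed tokenizer, not from the paper)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topical_divergence(linked_doc, reference_docs):
    """One minus the best similarity between the linked document and any
    reference document collected for the tweet's hashtag."""
    linked = term_frequencies(linked_doc)
    best = max(
        (cosine_similarity(linked, term_frequencies(ref)) for ref in reference_docs),
        default=0.0,
    )
    return 1.0 - best


def is_off_topic(linked_doc, reference_docs, threshold=0.8):
    """Flag a tweet whose linked document diverges from the hashtag's
    reference documents by more than an assumed threshold."""
    return topical_divergence(linked_doc, reference_docs) > threshold
```

In use, `reference_docs` would hold the text of documents linked from tweets carrying a given hashtag and hosted on trusted domains, and `linked_doc` the text behind an unknown URL in an incoming tweet with the same hashtag. No labeled examples are needed, which mirrors the property the abstract emphasizes.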
Index Terms
- Topical anomaly detection from Twitter stream