ABSTRACT
A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood that the response is viewed by a larger number of users. Moreover, opportunistic users, called promoters, may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to it, boosting the rank of the responded video so that it appears in the top lists maintained by the system. Content pollution may undermine users' trust in the system, thus compromising its success in promoting social interactions. Despite this, the available literature offers very limited insight into this problem.
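The promoter strategy works because a top list driven by response counts can be inflated by sheer volume. The sketch below illustrates that mechanism under a loud assumption: that the system ranks videos purely by the number of video responses they have received (the abstract does not specify YouTube's actual ranking rule). The function `top_responded` and all video ids are hypothetical.

```python
from collections import Counter


def top_responded(responses, k):
    """Rank target videos by how many video responses they received.

    ASSUMPTION: the top list is ordered only by response count.
    `responses` is a list of (responder_video_id, target_video_id) pairs.
    Ties are broken by video id so the ranking is deterministic.
    """
    counts = Counter(target for _, target in responses)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [video for video, _ in ranked[:k]]


# Hypothetical organic activity: three popular videos and one obscure one.
organic = (
    [(f"r{i}", "popular_a") for i in range(50)]
    + [(f"s{i}", "popular_b") for i in range(40)]
    + [(f"t{i}", "popular_c") for i in range(30)]
    + [("u0", "promoted")]
)
print(top_responded(organic, 3))  # ['popular_a', 'popular_b', 'popular_c']

# A promoter posts 60 (possibly unrelated) responses to the target video.
boosted = organic + [(f"p{i}", "promoted") for i in range(60)]
print(top_responded(boosted, 3))  # ['promoted', 'popular_a', 'popular_b']
```

Under this assumption the obscure video jumps from outside the top three to first place without any change in genuine interest, which is the visibility gain the abstract attributes to promoters.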
In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. To that end, we manually build a test collection of real YouTube users, classifying each one as a spammer, a promoter, or a legitimate user. Using this test collection, we characterize social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness on our test collection. We found that our approach correctly identifies the majority of promoters while misclassifying only a small percentage of legitimate users. In contrast, although we detect a significant fraction of spammers, they proved much harder to distinguish from legitimate users.
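The results above are stated per class: the fraction of promoters correctly identified, the fraction of spammers detected, and the fraction of legitimate users misclassified. A minimal sketch of how such per-class rates are read off a three-class confusion matrix follows. The classifier itself is not shown, and the labels are hypothetical toy values, not data or results from the paper.

```python
from collections import Counter

CLASSES = ("promoter", "spammer", "legitimate")


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs; rows are true classes."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return {t: {p: pairs[(t, p)] for p in CLASSES} for t in CLASSES}


def recall(matrix, cls):
    """Fraction of users of class `cls` that the classifier labeled as `cls`."""
    total = sum(matrix[cls].values())
    return matrix[cls][cls] / total if total else 0.0


# Hypothetical labels for eight users (illustrative only).
truth = ["promoter", "promoter", "spammer", "spammer",
         "legitimate", "legitimate", "legitimate", "legitimate"]
pred = ["promoter", "promoter", "spammer", "legitimate",
        "legitimate", "legitimate", "spammer", "legitimate"]

m = confusion_matrix(truth, pred)
print(recall(m, "promoter"))        # 1.0
print(recall(m, "spammer"))         # 0.5
print(1 - recall(m, "legitimate"))  # 0.25 (share of legitimate users misclassified)
```

Reading rates per row rather than a single overall accuracy matters here because the classes are imbalanced and the costs differ: wrongly flagging a legitimate user is a different error from letting a spammer through.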
Index Terms
- Detecting spammers and content promoters in online video social networks
Recommendations
Identifying video spammers in online social networks
AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web
In many video social networks, including YouTube, users are permitted to post video responses to other users' videos. Such a response can be legitimate or can be a video response spam, which is a video response whose content is not related to the topic ...
Detecting spammers on social networks
ACSAC '10: Proceedings of the 26th Annual Computer Security Applications Conference
Social networking has become a popular way for users to meet and interact online. Users spend a significant amount of time on popular social network platforms (such as Facebook, MySpace, or Twitter), storing and sharing a wealth of personal information. ...
Detecting spammers and content promoters in online video social networks
INFOCOM'09: Proceedings of the 28th IEEE international conference on Computer Communications Workshops
Online video social networks provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content into the system. For instance, spammers may post an unrelated ...