research-article

Detecting spammers and content promoters in online video social networks

Authors:
Fabrício Benevenuto, Federal University of Minas Gerais, Belo Horizonte, Brazil
Tiago Rodrigues, Federal University of Minas Gerais, Belo Horizonte, Brazil
Virgílio Almeida, Federal University of Minas Gerais, Belo Horizonte, Brazil
Jussara Almeida, Federal University of Minas Gerais, Belo Horizonte, Brazil
Marcos Gonçalves, Federal University of Minas Gerais, Belo Horizonte, Brazil

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 2009, Pages 620–627
https://doi.org/10.1145/1571941.1572047

Published: 19 July 2009

ABSTRACT

A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood that the response is viewed by a larger number of users. Moreover, opportunistic users (promoters) may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize the trust of users in the system, thus compromising its success in promoting social interactions. In spite of that, the available literature provides only a limited understanding of this problem.

In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. To that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, or legitimate users. Using our test collection, we provide a characterization of social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness on our test collection. We found that our approach correctly identifies the majority of the promoters while misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they proved much harder to distinguish from legitimate users.
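
The per-class outcomes described above (most promoters identified, few legitimate users misclassified, spammers harder to separate) are the kind of result a confusion matrix makes visible. The following is a minimal sketch of that evaluation step in Python, not the authors' code: the function names `confusion_matrix` and `per_class_recall` are hypothetical, and the label lists are toy values invented for illustration rather than results from the paper.

```python
from collections import Counter

# The three user classes named in the abstract.
CLASSES = ("spammer", "promoter", "legitimate")


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs for every combination of classes."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    counts = Counter(zip(true_labels, predicted_labels))
    return {t: {p: counts[(t, p)] for p in CLASSES} for t in CLASSES}


def per_class_recall(matrix):
    """Fraction of each true class that the classifier labeled correctly."""
    recall = {}
    for t in CLASSES:
        total = sum(matrix[t].values())
        recall[t] = matrix[t][t] / total if total else 0.0
    return recall


# Toy labels invented for illustration; they are NOT results from the paper.
true_labels = ["promoter"] * 4 + ["spammer"] * 4 + ["legitimate"] * 8
predicted_labels = (
    ["promoter"] * 4                                        # all promoters found
    + ["spammer", "spammer", "legitimate", "legitimate"]    # half the spammers missed
    + ["legitimate"] * 7 + ["spammer"]                      # one legitimate user flagged
)

matrix = confusion_matrix(true_labels, predicted_labels)
recall = per_class_recall(matrix)

for cls in CLASSES:
    print(f"{cls:>10}: recall = {recall[cls]:.3f}, row = {matrix[cls]}")
```

Reading the matrix row by row shows where each true class ends up: the off-diagonal entry `matrix["spammer"]["legitimate"]` counts spammers mistaken for legitimate users, which is the confusion the abstract reports as hardest to avoid.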

Index Terms

1. Information systems
  1. World Wide Web
    1. Web applications
    2. Web services

Published in
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
July 2009, 896 pages
ISBN: 9781605584836
DOI: 10.1145/1571941
General Chairs: James Allan (University of Massachusetts Amherst, USA), Javed Aslam (Northeastern University, USA)
Program Chairs: Mark Sanderson (University of Sheffield, UK), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA), Justin Zobel (University of Melbourne, Australia)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
promoter
social media
social networks
spammer
video promotion
video response
video spam
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%