DOI: 10.1145/1571941.1572047
research-article

Detecting spammers and content promoters in online video social networks

Published: 19 July 2009

ABSTRACT

A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood that the response is viewed by a larger number of users. Moreover, opportunistic users (promoters) may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize users' trust in the system, thus compromising its success in promoting social interactions. In spite of that, the available literature is very limited in providing a deep understanding of this problem.

In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. To that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, or legitimate users. Using our test collection, we characterize social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness on our test collection. We found that our approach correctly identifies the majority of the promoters while misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they proved much harder to distinguish from legitimate users.
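
The abstract reports effectiveness per user class (most promoters identified, few legitimate users misclassified, spammers harder to separate). As a rough illustration of how such per-class figures can be read off a three-class confusion matrix, here is a minimal Python sketch. The function names and the toy labels are assumptions made up for this example; they are not the paper's data, results, or evaluation code.

```python
from collections import Counter

# The three user classes named in the abstract.
CLASSES = ("legitimate", "promoter", "spammer")


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true class, predicted class) pair occurs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def per_class_recall(matrix, classes=CLASSES):
    """For each class, the fraction of its true members predicted correctly."""
    recall = {}
    for cls in classes:
        total = sum(matrix[(cls, pred)] for pred in classes)
        recall[cls] = matrix[(cls, cls)] / total if total else 0.0
    return recall


# Toy labels invented for illustration only (hypothetical, not from the paper).
y_true = ["legitimate"] * 4 + ["promoter"] * 3 + ["spammer"] * 3
y_pred = [
    "legitimate", "legitimate", "legitimate", "spammer",
    "promoter", "promoter", "promoter",
    "spammer", "legitimate", "legitimate",
]

matrix = confusion_matrix(y_true, y_pred)
recall = per_class_recall(matrix)

for cls in CLASSES:
    print(f"{cls}: recall = {recall[cls]:.2f}")
```

Reporting recall per class rather than a single accuracy figure keeps a large legitimate class from hiding poor detection of a small spammer class, which is the kind of contrast the abstract describes.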


      • Published in

        SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
        July 2009
        896 pages
        ISBN: 9781605584836
        DOI: 10.1145/1571941

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
