research-article

Campaign extraction from social media

Authors:
Kyumin Lee, Texas A&M University, College Station, TX
James Caverlee, Texas A&M University, College Station, TX
Zhiyuan Cheng, Texas A&M University, College Station, TX
Daniel Z. Sui, Ohio State University, Columbus, OH

ACM Transactions on Intelligent Systems and Technology, Volume 5, Issue 1, Article No. 9, pp. 1–28. https://doi.org/10.1145/2542182.2542191

Published: 3 January 2014


Abstract

In this manuscript, we study the problem of detecting coordinated free-text campaigns in large-scale social media. These campaigns, ranging from coordinated spam messages to promotional and advertising campaigns to political astroturfing, are growing in significance and reach with the commensurate rise of massive-scale social systems. Specifically, we propose and evaluate a content-driven framework for effectively linking free-text posts that share common "talking points" and for extracting campaigns from large-scale social media. Three salient features of the campaign extraction framework are: (i) we investigate graph mining techniques for isolating coherent campaigns from large message-based graphs; (ii) we conduct a comprehensive comparative study of text-based message correlation at the message and user levels; and (iii) we analyze the temporal behaviors of various campaign types. Through an experimental study over millions of Twitter messages, we identify five major types of campaigns, namely Spam, Promotion, Template, News, and Celebrity campaigns, and we show how these campaigns may be extracted with high precision and recall.
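The abstract does not spell out the extraction pipeline, so what follows is only a minimal sketch, under stated assumptions, of one simple way to link free-text posts that share "talking points" and isolate candidate campaigns from the resulting message graph. It assumes Jaccard similarity over word-bigram shingles (in the spirit of Broder et al.'s 1997 syntactic clustering) as the message correlation measure, and connected components as the graph mining step. The function names `shingles`, `jaccard`, and `extract_campaigns`, the similarity threshold of 0.5, and the minimum campaign size of 2 are hypothetical illustrations, not the authors' method or settings; user-level correlation and temporal analysis are not shown.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a post (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short posts: fall back to a single shingle of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| (0.0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def extract_campaigns(posts: list[str], threshold: float = 0.5,
                      min_size: int = 2, k: int = 2) -> list[list[int]]:
    """Hypothetical sketch: add an edge between every pair of posts whose
    shingle similarity is >= threshold, then return the connected components
    with at least min_size posts as candidate campaigns (lists of post indices).
    threshold, min_size and k are illustrative values, not the paper's."""
    sigs = [shingles(p, k) for p in posts]
    parent = list(range(len(posts)))  # union-find forest over post indices

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-pairs comparison is O(n^2): acceptable for a sketch only.
    for i, j in combinations(range(len(posts)), 2):
        if jaccard(sigs[i], sigs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(i)
    return sorted((g for g in groups.values() if len(g) >= min_size),
                  key=lambda g: g[0])


# Toy example: three near-duplicate promotional posts and one unrelated post.
posts = [
    "Get free followers now at example dot com",
    "get FREE followers now at example dot com!!",
    "Lovely weather in College Station today",
    "Get free followers now, visit example dot com",
]
print(extract_campaigns(posts))  # [[0, 1, 3]]
```

Connected components are the loosest possible grouping: a single chain of pairwise-similar posts merges everything along it into one campaign. Stricter alternatives, such as requiring every pair of posts in a group to be linked (maximal cliques) or requiring a minimum edge density, trade recall for coherence. Because the all-pairs loop is quadratic, a system working over millions of tweets would need candidate generation (for example, MinHash with locality-sensitive hashing) or a distributed framework such as MapReduce.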



Recommendations

Content-driven detection of campaigns in social media
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

We study the problem of detecting coordinated free text campaigns in large-scale social media. These campaigns -- ranging from coordinated spam messages to promotional and advertising campaigns to political astro-turfing -- are growing in significance ...
Towards Multimodal Campaign Detection: Including Image Information in Stream Clustering to Detect Social Media Campaigns
Disinformation in Open Online Media
This work explores the potential to include visual information from images in social media campaign recognition. The diverse content shared on social media platforms, including text, photos, videos, and links, necessitates a multimodal analysis ...
Strategic Temporality on Social Media During the General Election of the 2016 U.S. Presidential Campaign
#SMSociety17: Proceedings of the 8th International Conference on Social Media & Society

To date, little attention has been paid to the temporal nature of campaigns as they respond to events or react to the different stages of a political election -- what we define as strategic temporality. This article seeks to remedy this lack of research ...


Published in
ACM Transactions on Intelligent Systems and Technology, Volume 5, Issue 1
Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
December 2013, 520 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2542182
Editors:
Dr. Hui Xiong, Rutgers University (http://datamining.rutgers.edu/)
Dr. Shashi Shekhar, University of Minnesota (http://www-users.cs.umn.edu/~shekhar/)
Dr. Alexander Tuzhilin, New York University (http://pages.stern.nyu.edu/~atuzhili/)
Francesco Bonchi, Yahoo! Research Barcelona, Spain
Wray Buntine, NICTA, Australia and The Australian National University
Ricard Gavaldà, Technical University of Catalonia, Spain
Shengbo Guo, Xerox Research Centre Europe, France
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 January 2014
- Revised: 1 September 2012
- Accepted: 1 September 2012
- Received: 1 February 2012

Author Tags
Social media
campaign detection
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- 27 Total Citations
- 878 Total Downloads
- Downloads (last 12 months): 35
- Downloads (last 6 weeks): 6