Automated Crowdturfing Attacks and Defenses in Online Review Systems

Authors:
Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Haitao Zheng, Ben Y. Zhao

University of Chicago, Chicago, IL, USA

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017, Pages 1143–1158. https://doi.org/10.1145/3133956.3133990

Published: 30 October 2017

ABSTRACT

Malicious crowdsourcing forums are gaining traction as a means of spreading misinformation online, but they are limited by the cost of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks, or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can also control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect.
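To make the burstiness point concrete, here is a minimal sketch, not taken from the paper, of one simple signal a detector might use: the peak number of reviews that land in any fixed time window. The function name, the 24-hour window, and the synthetic timestamps are all assumptions chosen for illustration.

```python
from collections import Counter


def max_reviews_per_window(timestamps_hours, window_hours=24):
    """Return the largest number of reviews that fall in any fixed window.

    timestamps_hours: posting times in hours since an arbitrary start.
    window_hours: width of each non-overlapping window (assumed value).
    """
    buckets = Counter(int(t // window_hours) for t in timestamps_hours)
    return max(buckets.values()) if buckets else 0


# Hypothetical campaign: 30 reviews posted within about three hours (bursty).
bursty = [100 + 0.1 * i for i in range(30)]
# The same 30 reviews released at one per day (rate-controlled).
paced = [24.0 * i for i in range(30)]

bursty_peak = max_reviews_per_window(bursty)  # 30: all in one window
paced_peak = max_reviews_per_window(paced)    # 1: one review per window
```

A threshold on this peak would flag the bursty schedule and miss the paced one, which is the detection gap that an automated generator with a controllable output rate could exploit.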

Using Yelp reviews as an example platform, we show how a two-phased review generation and customization attack can produce reviews that state-of-the-art statistical detectors cannot distinguish from real ones. We conduct a survey-based user study to show that these reviews not only evade human detection but are also rated highly by users on "usefulness" metrics. Finally, we develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and show that they can be further curtailed by simple constraints imposed by online service providers.
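The abstract does not say which features expose the lossy transformation. As an illustration only, the sketch below assumes the loss shows up as a shift in character frequencies and measures that shift with total variation distance. The function names and sample strings are hypothetical, and this is not presented as the authors' detector.

```python
from collections import Counter


def char_distribution(text):
    """Map each character (case-folded) to its relative frequency in text."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two character distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical reference text standing in for a corpus of genuine reviews.
reference = char_distribution("The food was great and the staff were friendly.")
# A candidate review to score against the reference.
candidate = char_distribution("Great food great food great food great food.")

distance = total_variation(reference, candidate)
```

In practice the reference distribution would be estimated from a large set of reviews, and a decision threshold on `distance` would have to be tuned on labeled data; neither is specified here.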

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Social aspects of security and privacy

Published in
CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017
2682 pages
ISBN: 9781450349468
DOI: 10.1145/3133956
General Chair: Bhavani Thuraisingham (The University of Texas at Dallas, USA)
Program Chairs: David Evans (University of Virginia), Tal Malkin (Columbia University), Dongyan Xu (Purdue University)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowdturfing
fake review
opinion spam
web security
Qualifiers
- research-article
Acceptance Rates
CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%