Uncovering anomalous rating behaviors for rating systems

doi:10.1016/j.neucom.2018.05.001

Neurocomputing

Volume 308, 25 September 2018, Pages 205-226

https://doi.org/10.1016/j.neucom.2018.05.001

Abstract

Personalized collaborative filtering recommendation is a key component of online rating systems, but in practice it is also vulnerable to profile injection attacks. Although anomalous rating detection for online rating systems has attracted increasing attention in recent years, the detection performance of existing methods has not yet reached a satisfactory level. Eliminating the impact of interfering information on anomaly detection is crucial for reducing false alarm rates. Moreover, detecting anomalous ratings in unlabeled, real-world data remains a major challenge. In this paper, we investigate a two-stage detection framework to spot anomalous rating profiles. First, interfering rating profiles are identified by comprehensively analyzing the distributions of user activity, item popularity and special ratings, in order to eliminate sparse ratings. Then, based on the retained rating profiles, target item analysis is combined with non-linear structure clustering to further determine the attackers of concern. Extensive experimental comparisons under diverse attacks demonstrate the effectiveness of the proposed method against competing benchmarks. Additionally, we report interesting findings, including anomalous ratings and items, on two real-world datasets, Amazon and TripAdvisor.

Introduction

Online rating systems have developed significantly alongside social networks over the last decade. Rating data is ubiquitous on well-known e-commerce websites such as Amazon, Taobao, TripAdvisor and Yelp [5], [6], [11], [12], [21], [32], [48], [55], [61]. Personalized collaborative recommender systems play a crucial role in handling the increasingly prominent problem of information overload by automatically suggesting to a user items that might be of interest to her [1], [4], [23], [24], [25], [26], [27], [28], [41]. However, collaborative filtering recommender systems (CFRSs) are highly vulnerable to outside attacks, called profile injection attacks (a.k.a. shilling attacks) [6], [34], because recommender systems are entirely based on the input provided by users or customers [3], [7], [8], [14], [16], [19], [22], [37], [44], [49], [56], [61]. Profile injection attacks, in which attackers inject biased ratings in order to influence future recommendations, have been shown to be effective against collaborative filtering recommendation engines. According to the attackers' intention, shilling attacks can be classified into two basic categories: push attacks, which insert malicious profiles that rate a particular item highly, and nuke attacks, which insert malicious profiles aimed at downgrading the popularity of an item [31]. Anonymous or pseudonymous users in online systems can multiply their profiles and identities nearly indefinitely and use well-designed rating profiles to produce the recommendation behavior that the attackers desire. Therefore, proactively identifying malicious rating profiles is highly significant for personalized collaborative recommendation.
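To make the push/nuke distinction concrete, the following minimal Python sketch shows how a single attack profile might be generated. It assumes the widely used "average attack" model from the shilling-attack literature (filler items rated near each item's mean rating, the target item rated at the scale maximum for a push attack or at the minimum for a nuke attack), a rating scale from 1 to 5, and a hypothetical noise level of 0.5. The function names are illustrative and are not taken from this paper.

```python
import random


def item_means(ratings):
    """Mean rating of each item in a {user: {item: rating}} dictionary."""
    sums, counts = {}, {}
    for profile in ratings.values():
        for item, r in profile.items():
            sums[item] = sums.get(item, 0) + r
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def average_attack_profile(ratings, target, filler_size, push=True,
                           r_min=1, r_max=5, rng=random):
    """Build one hypothetical average-attack profile.

    Filler items receive ratings close to their item mean so the profile
    resembles a genuine user; the target item receives r_max (push attack)
    or r_min (nuke attack).
    """
    means = item_means(ratings)
    candidates = [item for item in means if item != target]
    fillers = rng.sample(candidates, min(filler_size, len(candidates)))
    profile = {}
    for item in fillers:
        noisy = round(rng.gauss(means[item], 0.5))  # assumed noise level
        profile[item] = min(r_max, max(r_min, noisy))  # clip to the scale
    profile[target] = r_max if push else r_min
    return profile
```

Because the filler ratings track item means, such a profile differs from a genuine one mainly in its extreme target rating, which illustrates why attackers that mimic authentic users are hard to identify.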

Securing collaborative filtering recommender systems against malicious attacks has become an important issue as recommender systems grow in popularity [13], [47]. Although previous research has shown promising results, defending against such attacks remains an open problem, and existing defenses have not reached a fully satisfactory level of performance [13], [36], [47], [57], [61], [62], [63], [64]. In particular, a strategy that can spot anomalous ratings in real-world data is highly desirable. Furthermore, developing a detection method that can effectively defend against diverse shilling attacks remains a major challenge. Moreover, the number of attack profiles in rating systems is very small compared with the number of genuine (authentic) profiles. This large disparity between the numbers of genuine and attack profiles is called the imbalanced distribution of rating profiles [6], [53]. The imbalanced distribution makes anomaly detection challenging because it is difficult to characterize the rating behaviors of users. How to eliminate part of the genuine profiles (interfering rating profiles), and thereby reduce the imbalance before anomaly detection, is therefore an important task, especially for large-scale, real-world data. For unlabeled, real-world datasets, investigating forensic metrics for abnormality that determine the users or items of concern is a practical problem that cannot be ignored.

In this paper, we present a two-stage detection framework to spot anomalous rating profiles. Faced with the imbalanced distribution of rating profiles, we first identify interfering rating profiles by comprehensively analyzing the distribution of user activity, the distribution of item popularity and special ratings, in order to eliminate sparse ratings. The goal of the first stage is to filter out as many interfering (genuine) rating profiles [36], [52] as possible while retaining all attack profiles. Based on the remaining rating profiles, target item analysis is then combined with non-linear structure clustering to further determine the attackers of concern. Because shilling attackers mimic the rating details of authentic users when crafting attack profiles, they are difficult to identify. Since the clusters are not linearly separable in the original space, a robust multiple kernel data clustering method is employed to distinguish attack profiles from authentic profiles in an appropriate feature space. Moreover, we explore evaluation metrics for abnormality forensics to discover interesting findings in two real-world datasets, TripAdvisor and Amazon. More importantly, we analyze the internal relationship between the historical ratings and reviews of items to spot anomalous items. Extensive experimental comparisons on diverse attack datasets demonstrate the effectiveness of the proposed detection method compared with competing benchmarks. In addition, we report interesting findings, including anomalous items and ratings, on the Amazon and TripAdvisor datasets.
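The first stage can be illustrated with a simplified filter. The sketch below is a hypothetical heuristic, not the authors' exact criteria: it assumes ratings stored as a `{user: {item: rating}}` dictionary, assumes that push or nuke targets tend to be low-popularity (long-tail) items, and treats a profile as interfering when it has fewer than `min_activity` ratings or gives no extreme rating (1 or 5) to any item rated by at most `max_item_popularity` users. All names and thresholds are illustrative assumptions.

```python
from collections import Counter


def split_interfering_profiles(ratings, min_activity=3,
                               max_item_popularity=2, extremes=(1, 5)):
    """Split users into (kept, interfering) using a hypothetical rule.

    Item popularity = number of users who rated the item.
    A profile is kept for the second stage only if it is not too sparse
    AND it gives an extreme rating to at least one long-tail item.
    """
    popularity = Counter(item for profile in ratings.values() for item in profile)
    kept, interfering = [], []
    for user, profile in ratings.items():
        too_sparse = len(profile) < min_activity
        extreme_on_tail = any(
            r in extremes and popularity[item] <= max_item_popularity
            for item, r in profile.items()
        )
        if extreme_on_tail and not too_sparse:
            kept.append(user)
        else:
            interfering.append(user)
    return kept, interfering
```

In a real system the thresholds would have to be tuned so that, as the text requires, essentially no attack profile is discarded; the example only shows how activity, popularity and extreme ratings can be combined into one filter.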

The main contributions of this paper are four-fold as follows:

•
Eliminating interfering profiles in advance, according to the distributions of user activity, item popularity and sparse ratings, offers a feasible approach to abnormality detection under the imbalanced distribution of rating profiles, and also helps characterize users' rating behaviors.
•
Combining target item analysis with non-linear structure clustering effectively narrows the scope for determining anomalous users, which further reduces the false alarm rate of the proposed approach.
•
To discover suspicious items or ratings in unlabeled, real-world datasets, the suspected items detected by the proposed approach are further examined by comprehensively analyzing the intrinsic association between the overall rating and each aspect rating of the same item, rating behavior aggregation, the distribution of rating intentions and the topological structure of suspicious items.
•
Extensive experiments on synthetic datasets under 10 different attacks and on real-world datasets, including Amazon and TripAdvisor, are conducted to demonstrate the effectiveness of the proposed approach.
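The target item analysis in the second contribution can be sketched as follows, under stated assumptions: the suspects are the profiles retained after the first stage, an item counts as a candidate target when at least a fraction `min_share` of the suspects rate it at an extreme value, and the attackers of concern are the suspects who gave an extreme rating to a candidate target. The function names and the 0.5 threshold are hypothetical; the paper's actual target item analysis is not detailed in this excerpt.

```python
from collections import Counter


def candidate_target_items(ratings, suspects, extremes=(1, 5), min_share=0.5):
    """Items rated at an extreme value by at least min_share of the suspects."""
    if not suspects:
        return []
    hits = Counter(
        item
        for user in suspects
        for item, r in ratings[user].items()
        if r in extremes
    )
    return sorted(item for item, c in hits.items() if c / len(suspects) >= min_share)


def attackers_of_concern(ratings, suspects, targets, extremes=(1, 5)):
    """Suspects who gave an extreme rating to at least one candidate target."""
    return [u for u in suspects
            if any(ratings[u].get(t) in extremes for t in targets)]
```

Restricting the final decision to users linked to a shared target is one way such an analysis can narrow the set of flagged users and, with it, the false alarm rate.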

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the proposed method in detail. Section 4 reports and analyzes the experimental results. Finally, we conclude the paper with a brief summary and discuss future work.


Related work

Detecting anomalous rating behaviors has received much attention over the last decade and has achieved impressive results. In this section, we discuss only the methods related to the presented work, grouped into three aspects: eliminating sparse ratings, clustering for shilling attack detection, and anomalous rating detection for real-world data.

The proposed approach

In this paper, a two-stage detection framework is proposed to detect shilling attackers and discover interesting findings in real-world datasets. First, the distributions of user ratings and items, together with special ratings, are analyzed to determine interfering rating profiles, which are then eliminated in advance. Based on the remaining rating profiles, target item analysis combined with non-linear spatial clustering is adopted to finally
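For the non-linear clustering step, the paper employs a robust multiple kernel clustering method (the reference list includes Du et al., "Robust multiple kernel k-means clustering using l21-norm"). The sketch below is deliberately simpler and swaps in plain single-kernel k-means with a Gaussian (RBF) kernel. It only illustrates how clustering in a kernel-induced feature space works, using the identity ‖φ(x_i) − μ_c‖² = K_ii − (2/|C|) Σ_{j∈C} K_ij + (1/|C|²) Σ_{j,l∈C} K_jl. The profile feature vectors, `gamma` and the initial labels are assumptions supplied by the caller.

```python
import math


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]


def kernel_kmeans(K, k, labels, max_iter=100):
    """Single-kernel k-means on a precomputed kernel matrix K.

    `labels` is an initial assignment in range(k); points are reassigned
    to the nearest cluster mean in feature space until nothing changes.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term (1/|C|^2) * sum_{j,l in C} K[j][l]
        within = [sum(K[j][l] for j in C for l in C) / len(C) ** 2 if C else 0.0
                  for C in clusters]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c, C in enumerate(clusters):
                if not C:
                    continue  # skip empty clusters
                d = K[i][i] - 2 * sum(K[i][j] for j in C) / len(C) + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels
```

A multiple kernel variant would learn a weighted combination of several such kernel matrices, and the robust l21-norm formulation additionally down-weights outlying profiles; both are beyond this sketch.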

Experiment simulation

In this section, the experimental settings are introduced first. The performance of each stage of the proposed detection method is then discussed under diverse attacks. Furthermore, we analyze the detection performance of all compared methods under different attacks. To discover interesting findings in real-world datasets, extensive experiments are conducted to demonstrate the practicability of the proposed approach. Finally, we briefly discuss the experimental results.
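Detection performance in this line of work is commonly summarized by the detection rate (recall on attack profiles) and the false alarm rate (the fraction of genuine profiles wrongly flagged), the latter being the quantity the second contribution aims to reduce. The exact metrics used in these experiments are not shown in this excerpt, so the following Python helper is an assumption-based sketch, with labels encoded as 1 for attack and 0 for genuine.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate, precision).

    Labels: 1 = attack profile, 0 = genuine profile.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return detection_rate, false_alarm_rate, precision
```

For example, with two attack profiles and four genuine ones, catching one attacker and wrongly flagging one genuine user gives a detection rate of 0.5, a false alarm rate of 0.25 and a precision of 0.5.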

Conclusions and future work

Collaborative filtering recommender systems are highly vulnerable to shilling attacks, also known as profile injection attacks. Sparse rating profiles and the limitations of existing rating behavior features remain persistent challenges for attack detection in CFRSs. In this paper, we proposed a relatively flexible two-stage detection framework to detect shilling attacks and discover anomalous ratings in real-world datasets. Analyzing the distribution of users, items and

Acknowledgments

The research is supported by the National Natural Science Foundation of China (No.: 61702412 and 61571360), Shaanxi Science & Technology Co-ordination & Innovation Project (No.: 2016KTZDGY05-09), the Innovation Project of Shaanxi Provincial Department of Education (No.: 17JF023) and Ph.D. Research Startup Funds of Xi’an University of Technology (No.: 112-256081704). In addition, three anonymous reviewers carefully read this paper and provided numerous constructive suggestions.

Zhihai Yang received the Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University, in 2016. He is currently a lecturer in the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. His research interests include information security, recommender system and data mining.

References (65)

A. Bilge et al.
A novel shilling attack detection method
Procedia Comput. Sci.
(2014)

J. Bobadilla et al.
Recommender systems survey
Knowl. Based Syst.
(2013)

C. Chung et al.
βP: a novel approach to filter out malicious rating profiles from recommender systems
Decis. Supp. Syst.
(2013)

J. Lu et al.
Recommender system application developments: a survey
Decis. Supp. Syst.
(2015)

G. Noh et al.
PSD: practical SYBIL detection schemes using stickiness and persistence in online recommender systems
Inf. Sci.
(2014)

Y. Wang et al.
A comparative study of shilling attack detectors for recommender systems
Proceedings of the Twelfth International Conference on Service Systems and Service Management (ICSSSM)
(2015)

H. Xia et al.
A novel item anomaly detection approach against shilling attacks in collaborative recommendation systems using the dynamic time interval segmentation technique
Inf. Sci.
(2015)

Z. Yang et al.
Estimating user behavior toward detecting anomalous ratings in rating systems
Knowl. Based Syst.
(2016)

Z. Yang et al.
Spotting anomalous ratings for rating systems by analyzing target users and items
Neurocomputing
(2017)

Z. Yang et al.
Re-scale AdaBoost for attack detection in collaborative filtering recommender systems
Knowl. Based Syst.
(2016)

F. Zhang et al.
HHT-SVM: an online method for detecting profile injection attacks in collaborative recommender systems
Knowl. Based Syst.
(2014)

Z. Zhang et al.
Graph-based detection of shilling attacks in recommender systems
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing
(2013)

Z. Zhang et al.
Detection of shilling attacks in recommender systems via spectral clustering
Proceedings of the International Conference on Information Fusion
(2014)

Z.H. Zhou et al.
Deep forest: towards an alternative to deep neural networks
Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI’17)
(2017)

G. Adomavicius et al.
Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
IEEE Trans. Knowl. Data Eng.
(2005)

R. Bhaumik et al.
A clustering approach to unsupervised attack detection in collaborative recommender systems
Proceedings of the Seventh IEEE ICML
(2011)

K. Bryan et al.
Unsupervised retrieval of attack profiles in collaborative recommender systems
Proceedings of the ACM Conference on Recommender Systems
(2008)

R. Burke et al.
Classification features for attack detection in collaborative recommender systems
Proceedings of the International Conference on Knowledge Discovery and Data Mining
(2006)

J. Cao et al.
Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system
World Wide Web
(2013)

L. Du et al.
Unsupervised feature selection with adaptive structure learning
Proceedings of the Twenty-first ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(2015)

L. Du et al.
Robust multiple kernel k-means clustering using l21-norm
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI)
(2015)

N. Günnemann et al.
Robust multivariate autoregression for anomaly detection in dynamic product ratings
Proceedings of the Twenty-third International Conference on World Wide Web
(2014)

S. Günnemann et al.
Detecting anomalies in dynamic rating data: a robust probabilistic model for rating evolution
Proceedings of KDD 2014
(2014)

I. Gunes et al.
Shilling attacks against recommender systems: a comprehensive survey
Artif. Intell. Rev.
(2012)

F. He et al.
Attack detection by rough set theory in recommendation system
Proceedings of the IEEE International Conference on Granular Computing
(2010)

H.C. Huang et al.
Multiple kernel learning algorithms
IEEE Trans. Fuzzy Syst.
(2012)

S. Huang et al.
A hybrid decision approach to detect profile injection attacks in collaborative recommender systems
Found. Intell. Syst.
(2012)

N. Hurley et al.
Statistical attack detection
Proceedings of the Third ACM Conference on Recommender Systems
(2009)

M. Jiang et al.
CatchSync: catching synchronized behavior in large directed graphs
Proceedings of the Twentieth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(2014)

S. Lam et al.
Shilling recommender systems for fun and profit
Proceedings of the Thirteenth International Conference on World Wide Web
(2004)

J. Lee et al.
Shilling attack detection: a new approach for a trustworthy recommender system
INFORMS J. Comput.
(2012)

B. Li et al.
Noisy but non-malicious user detection in social recommender systems
World Wide Web
(2013)


Qindong Sun received his Ph.D. degree from the School of Electronic and Information Engineering, Xi’an Jiaotong University, China. He is currently a professor in the Department of Computer Science and Engineering, Xi’an University of Technology. His research interests include network information security, online social networks and the Internet of Things.

Yaling Zhang received the B.S. degree in computer science in 1988 from Northwest University, Xi’an, China. She received the B.S. degree in computer science in 2001 and the Ph.D. degree in mechanism electron engineering in 2008, both from Xi’an University of Technology, Xi’an, China. She is currently a professor at Xi’an University of Technology. Her current research interests include cryptography and differential privacy protection in data mining.

Beibei Zhang received the Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University in 2012. He held a postdoctoral position at Xi’an Jiaotong University from 2012 to 2016. He is currently a lecturer in the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. His research interests include information propagation, topic detection and tracking, complex network structure evolution detection and data mining.
