Uncovering anomalous rating behaviors for rating systems
Introduction
Online rating systems have developed rapidly alongside social networks over the last decade. Rating data is ubiquitous on well-known e-commerce websites such as Amazon, Taobao, TripAdvisor and Yelp [5], [6], [11], [12], [21], [32], [48], [55], [61]. Personalized collaborative recommender systems play a crucial role in handling the increasingly prominent problem of information overload by automatically suggesting to a user items that might be of interest to her [1], [4], [23], [24], [25], [26], [27], [28], [41]. However, collaborative filtering recommender systems (CFRSs) are highly vulnerable to outside attacks, called profile injection attacks (a.k.a. shilling attacks) [6], [34], because they rely entirely on the input provided by users or customers [3], [7], [8], [14], [16], [19], [22], [37], [44], [49], [56], [61]. Profile injection attacks, in which attackers inject biased rating profiles in order to influence future recommendations, have been demonstrated to be effective against collaborative filtering recommendation engines. According to the intention of the attackers, shilling attacks fall into two basic categories: push attacks insert malicious profiles that rate a particular item highly, whereas nuke attacks insert malicious profiles aimed at downgrading the popularity of an item [31]. Anonymous or pseudonymous users in online systems can multiply their profiles and identities nearly indefinitely and use well-designed rating profiles to produce the recommendation behaviors that the attackers desire. Therefore, proactively identifying malicious rating profiles is essential for personalized collaborative recommendation.
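To make the push/nuke distinction concrete, the following is a minimal sketch of how a single synthetic attack profile could be assembled for detector evaluation. It is an illustration under stated assumptions, not the attack models used in the paper: the 1 to 5 rating scale, the uniformly random filler ratings, and the names `make_attack_profile`, `R_MIN` and `R_MAX` are all hypothetical.

```python
import random

# Assumed rating scale; the excerpt does not state the scale of any dataset.
R_MIN, R_MAX = 1, 5


def make_attack_profile(target_item, candidate_items, n_filler, kind, seed=None):
    """Build one illustrative attack profile as a dict {item_id: rating}.

    kind="push" gives the target item the maximum rating,
    kind="nuke" gives it the minimum rating.
    """
    if kind not in ("push", "nuke"):
        raise ValueError("kind must be 'push' or 'nuke'")
    rng = random.Random(seed)
    # Filler items are chosen at random from everything except the target.
    pool = [item for item in candidate_items if item != target_item]
    fillers = rng.sample(pool, n_filler)
    # Filler ratings are random so the profile loosely resembles normal activity.
    profile = {item: rng.randint(R_MIN, R_MAX) for item in fillers}
    # Only the target rating encodes the attacker's intention.
    profile[target_item] = R_MAX if kind == "push" else R_MIN
    return profile


push = make_attack_profile(7, range(20), n_filler=5, kind="push", seed=0)
nuke = make_attack_profile(7, range(20), n_filler=5, kind="nuke", seed=0)
```

The two profiles differ only in the rating assigned to the target item, which is why filler ratings alone give a detector little to work with.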
Securing collaborative filtering recommender systems against malicious attacks has become an important issue with the increasing popularity of recommender systems [13], [47]. Although previous research has shown promising results, defending against such attacks remains an unresolved problem, and existing detectors have not reached a fully satisfactory level of performance [13], [36], [47], [57], [61], [62], [63], [64]. In particular, a strategy that can spot anomalous ratings in real-world data is highly desirable. Furthermore, developing a detection method that can effectively defend against diverse shilling attacks remains a major challenge. Moreover, compared with the number of genuine profiles (authentic profiles), the number of attack profiles in rating systems is very small. This marked difference between the numbers of genuine and attack profiles is called the imbalanced distribution of rating profiles [6], [53]. The imbalanced distribution makes abnormality detection challenging because it is difficult to characterize the rating behaviors of users. How to eliminate a part of the genuine profiles (interfering rating profiles) and reduce the imbalance before anomaly detection is an important task, especially for large-scale and real-world data. For unlabeled real-world datasets, investigating abnormality forensics metrics for determining the concerned users or items is a realistic problem that cannot be ignored.
In this paper, we present a two-stage detection framework to spot anomalous rating profiles. To address the imbalanced distribution of rating profiles, interfering rating profiles are first determined by comprehensively analyzing the rating distribution of user activity, the distribution of item popularity and special ratings in order to eliminate sparse ratings. The goal of the first stage is to filter out as many interfering rating profiles (genuine profiles) [36], [52] as possible while retaining all attack profiles. Based on the remaining rating profiles, a combination of target item analysis and non-linear structure clustering is then adopted to further determine the concerned attackers. Since shilling attackers mimic the rating details of authentic users when constructing attack profiles, it is difficult to identify them. A robust multiple kernel data clustering method is employed to distinguish the attack profiles from authentic profiles in an appropriate feature space when the clusters are not linearly separable in the original space. Moreover, we also explore evaluation metrics of abnormality forensics for discovering interesting findings in two real-world datasets, TripAdvisor and Amazon. More importantly, the internal relationship between historical ratings and reviews of items is analyzed to spot anomalous items. Extensive experimental comparisons on diverse attack datasets demonstrate the effectiveness of the proposed detection method compared with competing benchmarks. In addition, interesting findings on the Amazon and TripAdvisor datasets, including anomalous items and ratings, are investigated.
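As a rough illustration of the first stage, the sketch below computes user activity and item popularity from a rating table and drops sparse profiles before any clustering is attempted. The excerpt does not give the paper's actual criteria, so the single cut-off `min_activity`, the data layout `{user: {item: rating}}` and all function names are assumptions made only for this example.

```python
from collections import Counter


def user_activity(ratings):
    """Number of items rated by each user."""
    return {user: len(items) for user, items in ratings.items()}


def item_popularity(ratings):
    """Number of users who rated each item."""
    popularity = Counter()
    for items in ratings.values():
        popularity.update(items.keys())
    return dict(popularity)


def filter_sparse_profiles(ratings, min_activity):
    """Keep only profiles with at least `min_activity` ratings.

    `min_activity` is a hypothetical threshold; in practice it would be
    chosen from the observed activity distribution so that attack
    profiles are retained.
    """
    activity = user_activity(ratings)
    return {u: r for u, r in ratings.items() if activity[u] >= min_activity}


ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 1},
    "u3": {"b": 2, "c": 5},
}
remaining = filter_sparse_profiles(ratings, min_activity=2)
```

Here `u2` is removed as a sparse profile, which shrinks the set passed to the second stage and reduces the imbalance between genuine and suspicious profiles.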
The main contributions of this paper are four-fold as follows:
- Eliminating interfering profiles in advance, according to the distributions of users’ activity, items’ popularity and sparse ratings, provides a feasible approach to abnormality detection under the imbalanced distribution of rating profiles, and also makes it easier to characterize the rating behaviors of users.
- Combining target item analysis and non-linear structure clustering effectively narrows the scope of candidate anomalous users, which further reduces the false alarm rate of the proposed approach.
- To discover suspicious items or ratings in unlabeled real-world datasets, suspected items detected by the proposed approach are further examined by comprehensively analyzing the intrinsic association between the overall rating and each aspect rating on the same item, rating behavior aggregation, rating intention distribution and the topological structure of suspicious items.
- Extensive experiments on synthetic datasets under 10 different attacks and on real-world datasets, including Amazon and TripAdvisor, are conducted to demonstrate the effectiveness of the proposed approach.
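The idea of clustering profiles in a kernel-induced feature space can be sketched as follows. This is deliberately a simplification: it runs plain kernel k-means on a fixed, equally weighted sum of two RBF kernels, whereas the robust multiple kernel method cited in the paper also learns the kernel weights and uses a robust norm. The feature vectors, the `gamma` values, the equal weights and all function names are assumptions for illustration.

```python
import math


def rbf_kernel(points, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||p - q||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices (weights are fixed here, not learned)."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]


def kernel_kmeans(K, k, max_iter=50):
    """Plain kernel k-means on a precomputed Gram matrix K."""
    n = len(K)
    labels = [i % k for i in range(n)]  # deterministic initial assignment
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise similarity inside each cluster (third distance term).
        within = [sum(K[i][j] for i in m for j in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # Squared feature-space distance from point i to centroid c.
                dist = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical 2-D profile features forming two well-separated groups.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = combine_kernels([rbf_kernel(features, 0.5), rbf_kernel(features, 0.05)],
                    weights=[0.5, 0.5])
labels = kernel_kmeans(K, k=2)
```

Because distances are computed only through the Gram matrix, the same routine applies to any kernel combination, which is what makes kernel clustering a natural fit when profiles are not linearly separable in the original space.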
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the proposed method in detail. In Section 4, experimental results are reported and analyzed. Finally, we conclude the paper with a brief summary and discuss future work.
Section snippets
Related work
Detecting anomalous rating behaviors has received much attention over the last decade and has achieved impressive results. In this section, we discuss only the methods related to the presented work, grouped into three aspects: eliminating sparse ratings, clustering for shilling attack detection, and anomalous rating detection for real-world data.
The proposed approach
In this paper, a two-stage detection framework is proposed to detect shilling attackers and discover interesting findings in real-world datasets. First, the distribution of user ratings, the item distribution and special ratings are analyzed to determine interfering rating profiles. The determined interfering rating profiles are then eliminated in advance. Based on the remaining rating profiles, combining target item analysis and non-linear spatial clustering is adopted to finally
Experiment simulation
In this section, experiment settings will be first introduced. The performance of each stage of the proposed detection method will be discussed in diverse attacks. Furthermore, we also analyze the detection performance of all presented methods in different attacks. To discover interesting findings in real-world datasets, extensive experiments are conducted to demonstrate the practicability of the proposed approach. Finally, we briefly discuss the experimental results.
Conclusions and future work
Collaborative filtering recommender systems are highly vulnerable to shilling attacks or profile injection attacks. The existence of sparse rating profiles and the limitation of existing rating behavior features are always the challenges for attack detection in CFRSs. In this paper, we proposed a relatively flexible detection framework which consists of two stages to detect shilling attacks and discover anomalous ratings in real-world datasets. Analyzing the distribution of users, items and
Acknowledgments
The research is supported by the National Natural Science Foundation of China (No.: 61702412 and 61571360), Shaanxi Science & Technology Co-ordination & Innovation Project (No.: 2016KTZDGY05-09), the Innovation Project of Shaanxi Provincial Department of Education (No.: 17JF023) and Ph.D. Research Startup Funds of Xi’an University of Technology (No.: 112-256081704). In addition, three anonymous reviewers carefully read this paper and provided numerous constructive suggestions.
Zhihai Yang received the Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University, in 2016. He is currently a lecturer in the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. His research interests include information security, recommender system and data mining.
References (65)
- et al. A novel shilling attack detection method. Procedia Comput. Sci. (2014)
- et al. Recommender systems survey. Knowl. Based Syst. (2013)
- et al. : a novel approach to filter out malicious rating profiles from recommender systems. J. Decis. Supp. Syst. (2013)
- et al. Recommender system application developments: a survey. Decis. Supp. Syst. (2015)
- et al. PSD: practical SYBIL detection schemes using stickiness and persistence in online recommender systems. Inf. Sci. (2014)
- et al. A comparative study of shilling attack detectors for recommender systems. Proceedings of the Twelfth International Conference on Service Systems and Service Management (ICSSSM) (2015)
- et al. A novel item anomaly detection approach against shilling attacks in collaborative recommendation systems using the dynamic time interval segmentation technique. Inf. Sci. (2015)
- et al. Estimating user behavior toward detecting anomalous ratings in rating systems. Knowl. Based Syst. (2016)
- et al. Spotting anomalous ratings for rating systems by analyzing target users and items. Neurocomputing (2017)
- et al. Re-scale AdaBoost for attack detection in collaborative filtering recommender systems. Knowl. Based Syst. (2016)
- HHT-SVM: an online method for detecting profile injection attacks in collaborative recommender systems. Knowl. Based Syst.
- Graph-based detection of shilling attacks in recommender systems. Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing
- Detection of shilling attacks in recommender systems via spectral clustering. Proceedings of the International Conference on Information Fusion
- Deep forest: towards an alternative to deep neural networks. Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI’17)
- Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng.
- A clustering approach to unsupervised attack detection in collaborative recommender systems. Proceedings of the seventh IEEE ICML
- Unsupervised retrieval of attack profiles in collaborative recommender systems. Proceedings of the ACM conference on Recommender Systems
- Classification features for attack detection in collaborative recommender systems. Proceedings of the International Conference on Knowledge Discovery and Data Mining
- Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system. World Wide Web
- Unsupervised feature selection with adaptive structure learning. Proceedings of the Twenty-first ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Robust multiple kernel k-means clustering using l21-norm. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI)
- Robust multivariate autoregression for anomaly detection in dynamic product ratings. Proceedings of the Twenty-third International Conference on World Wide Web
- Detecting anomalies in dynamic rating data: a robust probabilistic model for rating evolution. Proceedings of the KDD’2014
- Shilling attacks against recommender systems: a comprehensive survey. Artif. Intell. Rev.
- Attack detection by rough set theory in recommendation system. Proceedings of the IEEE International Conference on Granular Computing
- Multiple kernel learning algorithms. IEEE Trans. Fuzzy Syst.
- A hybrid decision approach to detect profile injection attacks in collaborative recommender systems. Found. Intell. Syst.
- Statistical attack detection. Proceedings of the Third ACM Conference on Recommender Systems
- Catchsync: catching synchronized behavior in large directed graphs. Proceedings of the Twentieth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Shilling recommender systems for fun and profit. Proceedings of the Thirteenth International Conference on World Wide Web
- Shilling attack detection: a new approach for a trustworthy recommender system. INFORMS J. Comput.
- Noisy but non-malicious user detection in social recommender systems. World Wide Web
Qindong Sun received his Ph.D. degree in School of Electronic and Information Engineering from the Xi’an Jiaotong University, China. He is currently a professor at the Department of Computer Science and Engineering of Xi’an University of Technology. His research interests include network information security, online social networks and internet of things.
Yaling Zhang received the B.S. degree in computer science in 1988 from Northwest University, Xi’an, China. She received the B.S. degree in computer science in 2001, and earned the Ph.D. degree in mechanism electron engineering in 2008, both from the Xi’an University of Technology, Xi’an, China. Currently, she is a professor at Xi’an University of Technology. Her current research interests include cryptography and differential privacy protection in data mining.
Beibei Zhang received the Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University, in 2012. He accepted a postdoctoral position in Xi’an Jiaotong University from 2012 to 2016. He is currently a lecturer in the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. His research interests include information propagation, topic detection and tracking, complex network structure evolution detection and data mining.