Elsevier

Neurocomputing

Volume 308, 25 September 2018, Pages 205-226

Uncovering anomalous rating behaviors for rating systems

https://doi.org/10.1016/j.neucom.2018.05.001

Abstract

Personalized collaborative filtering recommendation is a key component of online rating systems, yet in practice it is vulnerable to profile injection attacks. Although anomalous rating detection for online rating systems has attracted increasing attention in recent years, the detection performance of existing methods still leaves room for improvement. Eliminating the impact of interfering information on anomaly detection is crucial for reducing false alarm rates. Moreover, detecting anomalous ratings in unlabeled, real-world data remains a major challenge. In this paper, we investigate a two-stage detection framework to spot anomalous rating profiles. First, interfering rating profiles are identified by comprehensively analyzing the distributions of user activity, item popularity and special ratings, so that sparse ratings can be eliminated. Then, based on the retained rating profiles, target item analysis is combined with non-linear structure clustering to identify the attackers. Extensive experimental comparisons under diverse attacks demonstrate the effectiveness of the proposed method relative to competing benchmarks. In addition, we investigate interesting findings, including anomalous ratings and items, on two real-world datasets, Amazon and TripAdvisor.

Introduction

Online rating systems have developed significantly in parallel with social networks over the last decade. Rating data is ubiquitous on well-known e-commerce websites including Amazon, Taobao, TripAdvisor, Yelp, etc. [5], [6], [11], [12], [21], [32], [48], [55], [61]. Personalized collaborative recommender systems play a crucial role in handling the increasingly prominent problem of information overload by automatically suggesting items that might interest a user [1], [4], [23], [24], [25], [26], [27], [28], [41]. However, collaborative filtering recommender systems (CFRSs) are highly vulnerable to outside attacks, called profile injection attacks (a.k.a. shilling attacks) [6], [34], because recommender systems rely entirely on the input provided by users or customers [3], [7], [8], [14], [16], [19], [22], [37], [44], [49], [56], [61]. Profile injection attacks, in which attackers inject biased ratings in order to influence future recommendations, have been demonstrated to be effective against collaborative filtering recommendation engines. According to the intention of the attackers, shilling attacks fall into two basic categories: attacks that insert malicious profiles rating a particular item highly are called push attacks, whereas attacks that insert malicious profiles aimed at downgrading the popularity of an item are termed nuke attacks [31]. Anonymous or pseudonymous users of online systems can multiply their profiles and identities almost indefinitely and use well-designed rating profiles to produce the recommendation behavior they desire. Therefore, proactively identifying malicious rating profiles is essential for personalized collaborative recommendation.
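To make the push/nuke distinction concrete, the following is a minimal sketch of how a single injected profile might be built. It assumes a simple random-attack model (filler ratings drawn around a global mean, target item rated at the extreme of the scale); the function name, the 1-5 rating scale, and the mean and standard deviation defaults are illustrative assumptions, not values taken from the paper.

```python
import random


def make_attack_profile(item_ids, target_item, kind, filler_size,
                        r_min=1, r_max=5, global_mean=3.6, global_std=1.1,
                        rng=None):
    """Build one injected profile {item_id: rating} (hypothetical helper).

    Assumes a random-attack model; the scale and statistics are made up
    for illustration and are not taken from the paper.
    """
    if kind not in ("push", "nuke"):
        raise ValueError("kind must be 'push' or 'nuke'")
    rng = rng or random.Random()

    # Filler items are chosen at random among everything except the target.
    candidates = [i for i in item_ids if i != target_item]
    fillers = rng.sample(candidates, filler_size)

    profile = {}
    for item in fillers:
        # Filler ratings imitate ordinary users: drawn around the global
        # mean, then rounded and clipped to the rating scale.
        rating = round(rng.gauss(global_mean, global_std))
        profile[item] = max(r_min, min(r_max, rating))

    # The target item gets the extreme rating that serves the attacker:
    # the maximum for a push attack, the minimum for a nuke attack.
    profile[target_item] = r_max if kind == "push" else r_min
    return profile
```

Under these assumptions, a push profile and a nuke profile differ only in the rating assigned to the target item, which is one reason the filler ratings alone can make injected profiles hard to tell apart from genuine ones.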

Securing collaborative filtering recommender systems against malicious attacks has become an important issue with the increasing popularity of recommender systems [13], [47]. Although previous research has shown promising results, defending against such attacks remains an open problem, and detection performance is still far from satisfactory [13], [36], [47], [57], [61], [62], [63], [64]. In particular, a strategy that can spot anomalous ratings in real-world data is highly desirable. Furthermore, developing a detection method that can effectively defend against diverse shilling attacks remains a major challenge. Moreover, compared with the number of genuine profiles (authentic profiles), the number of attack profiles in rating systems is very small. This marked difference between the numbers of genuine and attack profiles is called the imbalanced distribution of rating profiles [6], [53]. The imbalanced distribution makes abnormality detection challenging because it is difficult to characterize the rating behaviors of users. How to eliminate a portion of the genuine profiles (interfering rating profiles) and thereby reduce the imbalance before anomaly detection is an important task, especially for large-scale, real-world data. For unlabeled, real-world datasets, investigating abnormality forensics metrics for determining the users or items of concern is a practical problem that cannot be ignored.

In this paper, we present a two-stage detection framework to spot anomalous rating profiles. Faced with the imbalanced distribution of rating profiles, we first identify interfering rating profiles by comprehensively analyzing the distribution of user activity, the distribution of item popularity and special ratings, so that sparse ratings can be eliminated. The goal of the first stage is to filter out as many interfering rating profiles (genuine profiles) [36], [52] as possible while retaining all attack profiles. Based on the remaining rating profiles, target item analysis is then combined with non-linear structure clustering to identify the attackers. Because shilling attackers mimic the rating details of authentic users when constructing attack profiles, they are difficult to identify. A robust multiple kernel data clustering method is employed to distinguish attack profiles from authentic profiles in an appropriate feature space when the clusters are not linearly separable in the original space. Moreover, we explore evaluation metrics of abnormality forensics for discovering interesting findings in two real-world datasets, TripAdvisor and Amazon. More importantly, we analyze the internal relationship between the historical ratings and reviews of items in order to spot anomalous items. Extensive experimental comparisons on diverse attack datasets demonstrate the effectiveness of the proposed detection method relative to competing benchmarks. In addition, we investigate interesting findings, including anomalous items and ratings, on the Amazon and TripAdvisor datasets.
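As a rough illustration of the first stage, the sketch below drops ratings on rarely rated items and then discards users who remain too inactive. The two thresholds, the function name, and the rule itself are assumptions made for illustration; the criteria actually used in the paper (including its treatment of special ratings) are not given in this excerpt.

```python
from collections import Counter


def filter_interfering_profiles(ratings, min_user_activity=5,
                                min_item_popularity=3):
    """Keep only profiles that stay dense enough after pruning rare items.

    ratings: dict mapping user -> {item: rating}.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    # Item popularity: how many users rated each item.
    popularity = Counter(item
                         for profile in ratings.values()
                         for item in profile)

    kept = {}
    for user, profile in ratings.items():
        # Ignore ratings on rarely rated items when measuring activity.
        dense = {item: r for item, r in profile.items()
                 if popularity[item] >= min_item_popularity}
        # User activity: number of ratings that survive the pruning.
        if len(dense) >= min_user_activity:
            kept[user] = dense
    return kept
```

A fixed cut-off of this kind can also discard attack profiles that contain few filler ratings, whereas the stated goal of the first stage is to retain all attack profiles, so any real threshold would have to be chosen with that constraint in mind.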

The main contributions of this paper are four-fold:

  • Eliminating interfering profiles in advance, according to the distributions of users’ activity, items’ popularity and sparse ratings, provides a feasible way to perform abnormality detection under the imbalanced distribution of rating profiles, and also helps to characterize the rating behaviors of users.

  • Combining target item analysis with non-linear structure clustering effectively narrows the scope of candidate anomalous users, which further reduces the false alarm rate of the proposed approach.

  • To discover suspicious items or ratings in unlabeled, real-world datasets, the suspected items detected by the proposed approach are further examined by comprehensively analyzing the intrinsic association between the overall rating and each aspect rating on the same item, rating behavior aggregation, rating intention distribution and the topological structure of suspicious items.

  • Extensive experiments on synthetic datasets under 10 different attacks and on real-world datasets, including Amazon and TripAdvisor, are conducted to demonstrate the effectiveness of the proposed approach.
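The clustering half of the second contribution can be illustrated with a minimal kernel k-means sketch. This is a simplified stand-in, not the paper's method: it averages several RBF kernels with uniform weights and runs plain kernel k-means. It does not learn kernel weights or add any robustness to outliers, both of which a robust multiple kernel method would be expected to provide. The feature vectors, bandwidths and function names are assumptions.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - y||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]


def combined_kernel(X, gammas=(0.1, 1.0, 10.0)):
    """Uniform average of RBF kernels (assumed weights, not learned)."""
    mats = [rbf_kernel(X, g) for g in gammas]
    n = len(X)
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def kernel_kmeans(K, init_labels, k=2, max_iter=50):
    """Plain kernel k-means on a precomputed Gram matrix K."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(k)]
        # The within-cluster term does not depend on the point being
        # assigned, so it is computed once per cluster.
        within = [sum(K[j][l] for j in m for l in m) / (len(m) ** 2)
                  if m else 0.0 for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # Squared feature-space distance to the cluster mean.
                d = (K[i][i]
                     - 2.0 * sum(K[i][j] for j in m) / len(m)
                     + within[c])
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

Each row of X would be a feature vector describing one retained rating profile; which features to use is left open here. Because distances are computed from the Gram matrix alone, swapping in a different kernel combination changes only combined_kernel.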

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the proposed method in detail. In Section 4, experimental results are reported and analyzed. Finally, we conclude the paper with a brief summary and discuss future work.

Related work

Detecting anomalous rating behaviors has received much attention over the last decade and has achieved impressive results. In this section, we discuss only the methods related to the present work, grouped into three aspects: eliminating sparse ratings, clustering for shilling attack detection, and anomalous rating detection for real-world data.

The proposed approach

In this paper, a two-stage detection framework is proposed to detect shilling attackers and discover interesting findings in real-world datasets. First, interfering rating profiles are identified by analyzing the distribution of user ratings, the item distribution and special ratings. The identified interfering rating profiles are then eliminated in advance. Based on the remaining rating profiles, combining target item analysis and non-linear spatial clustering is adopted to finally

Experiment simulation

In this section, we first introduce the experimental settings. We then discuss the performance of each stage of the proposed detection method under diverse attacks and analyze the detection performance of all presented methods under different attacks. To discover interesting findings in real-world datasets, extensive experiments are conducted to demonstrate the practicability of the proposed approach. Finally, we briefly discuss the experimental results.

Conclusions and future work

Collaborative filtering recommender systems are highly vulnerable to shilling attacks, also known as profile injection attacks. The existence of sparse rating profiles and the limitations of existing rating behavior features are persistent challenges for attack detection in CFRSs. In this paper, we proposed a relatively flexible two-stage detection framework to detect shilling attacks and discover anomalous ratings in real-world datasets. Analyzing the distribution of users, items and

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Nos. 61702412 and 61571360), the Shaanxi Science & Technology Co-ordination & Innovation Project (No. 2016KTZDGY05-09), the Innovation Project of Shaanxi Provincial Department of Education (No. 17JF023) and the Ph.D. Research Startup Funds of Xi’an University of Technology (No. 112-256081704). In addition, three anonymous reviewers carefully read this paper and provided numerous constructive suggestions.

Zhihai Yang received the Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University in 2016. He is currently a lecturer in the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. His research interests include information security, recommender systems and data mining.

References (65)

  • F. Zhang et al., HHT-SVM: an online method for detecting profile injection attacks in collaborative recommender systems, Knowl. Based Syst. (2014)
  • Z. Zhang et al., Graph-based detection of shilling attacks in recommender systems, Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (2013)
  • Z. Zhang et al., Detection of shilling attacks in recommender systems via spectral clustering, Proceedings of the International Conference on Information Fusion (2014)
  • Z.H. Zhou et al., Deep forest: towards an alternative to deep neural networks, Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI’17) (2017)
  • G. Adomavicius et al., Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. (2005)
  • R. Bhaumik et al., A clustering approach to unsupervised attack detection in collaborative recommender systems, Proceedings of the seventh IEEE ICML (2011)
  • K. Bryan et al., Unsupervised retrieval of attack profiles in collaborative recommender systems, Proceedings of the ACM Conference on Recommender Systems (2008)
  • R. Burke et al., Classification features for attack detection in collaborative recommender systems, Proceedings of the International Conference on Knowledge Discovery and Data Mining (2006)
  • J. Cao et al., Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system, World Wide Web (2013)
  • L. Du et al., Unsupervised feature selection with adaptive structure learning, Proceedings of the Twenty-first ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015)
  • L. Du et al., Robust multiple kernel k-means clustering using l21-norm, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI) (2015)
  • N. Günnemann et al., Robust multivariate autoregression for anomaly detection in dynamic product ratings, Proceedings of the Twenty-third International Conference on World Wide Web (2014)
  • S. Günnemann et al., Detecting anomalies in dynamic rating data: a robust probabilistic model for rating evolution, Proceedings of the KDD’2014 (2014)
  • I. Gunes et al., Shilling attacks against recommender systems: a comprehensive survey, Artif. Intell. Rev. (2012)
  • F. He et al., Attack detection by rough set theory in recommendation system, Proceedings of the IEEE International Conference on Granular Computing (2010)
  • H.C. Huang et al., Multiple kernel learning algorithms, IEEE Trans. Fuzzy Syst. (2012)
  • S. Huang et al., A hybrid decision approach to detect profile injection attacks in collaborative recommender systems, Found. Intell. Syst. (2012)
  • N. Hurley et al., Statistical attack detection, Proceedings of the Third ACM Conference on Recommender Systems (2009)
  • M. Jiang et al., Catchsync: catching synchronized behavior in large directed graphs, Proceedings of the Twentieth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014)
  • S. Lam et al., Shilling recommender systems for fun and profit, Proceedings of the Thirteenth International Conference on World Wide Web (2004)
  • J. Lee et al., Shilling attack detection: a new approach for a trustworthy recommender system, INFORMS J. Comput. (2012)
  • B. Li et al., Noisy but non-malicious user detection in social recommender systems, World Wide Web (2013)
Qindong Sun received his Ph.D. degree from the School of Electronic and Information Engineering, Xi’an Jiaotong University, China. He is currently a professor at the Department of Computer Science and Engineering of Xi’an University of Technology. His research interests include network information security, online social networks and the internet of things.

Yaling Zhang received the B.S. degree in computer science in 1988 from Northwest University, Xi’an, China. She received the B.S. degree in computer science in 2001, and earned the Ph.D. degree in mechanism electron engineering in 2008, both from the Xi’an University of Technology, Xi’an, China. Currently, she is a professor at Xi’an University of Technology. Her current research interests include cryptography and differential privacy protection in data mining.

Beibei Zhang received the Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University in 2012. He held a postdoctoral position at Xi’an Jiaotong University from 2012 to 2016. He is currently a lecturer in the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. His research interests include information propagation, topic detection and tracking, complex network structure evolution detection and data mining.
