Elsevier

Expert Systems with Applications

Volume 38, Issue 12, November–December 2011, Pages 14609-14623

A framework for collaborative filtering recommender systems

https://doi.org/10.1016/j.eswa.2011.05.021

Abstract

As recommender systems become more established on the Net, there is a growing need for an evaluation framework for collaborative filtering measures and methods that tests not only the prediction and recommendation results but also aspects that until now were considered secondary, such as the novelty of the recommendations and the users’ trust in them. This paper provides: (a) measures to evaluate the novelty of the users’ recommendations and the trust in their neighborhoods, (b) equations that formalize and unify the collaborative filtering process and its evaluation, and (c) a framework based on these elements that enables the quality of the results of any collaborative filtering method applied to a given recommender system to be evaluated using four graphs: quality of the predictions, the recommendations, the novelty and the trust.

Highlights

► Framework to evaluate the quality results of any collaborative filtering based recommender system.
► Provides 4 graphs: quality of the predictions, the recommendations, the novelty and the trust.
► Equations that formalize and unify the collaborative filtering process and its evaluation.

Introduction

Recommender systems (RS) are developed to attempt to reduce part of the information overload problem produced on the Net. As opposed to other traditional help systems, such as search engines (Google, Yahoo, etc.), RS generally base their operation on a Collaborative Filtering (CF) process, which provides personalized recommendations to active users of websites where different elements (products, films, holidays, etc.) can be rated.

RS are inspired by human social behavior, where it is common to take into account the tastes, opinions and experiences of our acquaintances when making all kinds of decisions (choosing films to watch, selecting schools for our children, choosing products to buy, etc.). Obviously, our decisions are modulated according to our interpretation of the similarity that exists between us and our group of acquaintances, in such a way that we rate the opinions and experiences of some more highly than others.

By emulating each step of our own behavior insofar as is possible, the CF process of RS firstly selects the group of users from the RS website that is most similar to us, and then provides us with a group of recommendations of elements that we have not rated yet (assuming in this way that they are new to us) and which have been rated the best by the group of users with similar tastes to us. This way, a trip to the Canary Islands could be recommended to an individual who has rated different destinations in the Caribbean very highly, based on the positive ratings about the holiday destination of “Canary Islands” of an important number of individuals who also rated destinations in the Caribbean very highly. This suggestion (recommendation) will often provide the user of the service with inspiring information from the collective knowledge of all other users of the service.
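
The two steps described above (select the most similar users, then recommend unrated items that those users rated best) can be sketched as follows. This is a minimal illustration, not the paper's formalization: the toy ratings, the function names `similarity` and `recommend`, the 1 to 5 rating scale and the mean-squared-difference similarity are all assumptions made for the example.

```python
# Toy ratings on an assumed 1-5 scale: user -> {item: rating}. All values are invented.
RATINGS = {
    "ana":   {"Cuba": 5, "Jamaica": 5, "Bahamas": 4},
    "ben":   {"Cuba": 5, "Jamaica": 4, "Canary Islands": 5},
    "carla": {"Cuba": 4, "Jamaica": 5, "Bahamas": 5, "Canary Islands": 4},
    "dan":   {"Cuba": 1, "Jamaica": 2, "Alps": 5},
}


def similarity(a, b):
    """Similarity in [0, 1]: one minus the normalized mean squared
    difference over the items both users have rated (assumed metric)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    # Dividing by 4 (the rating range 5 - 1) keeps each difference in [0, 1].
    msd = sum(((a[i] - b[i]) / 4) ** 2 for i in common) / len(common)
    return 1.0 - msd


def recommend(ratings, user, k=2, n=1):
    """Return the n unrated items with the highest mean rating
    among the k users most similar to `user`."""
    others = [(similarity(ratings[user], r), u)
              for u, r in ratings.items() if u != user]
    neighbours = [u for _, u in sorted(others, reverse=True)[:k]]

    # Collect neighbour ratings for items the active user has not rated.
    candidate_ratings = {}
    for u in neighbours:
        for item, rating in ratings[u].items():
            if item not in ratings[user]:
                candidate_ratings.setdefault(item, []).append(rating)

    predicted = {item: sum(rs) / len(rs) for item, rs in candidate_ratings.items()}
    # Highest predicted rating first; ties broken alphabetically.
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:n]


print(recommend(RATINGS, "ana"))
```

With these invented ratings, the two users closest to "ana" are "ben" and "carla", both of whom rated Caribbean destinations highly, so the only candidate item is the one they rated and she did not.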

RS cover a wide variety of applications (Baraglia and Silvestri, 2004, Bobadilla et al., 2009, Fesenmaier et al., 2002, Jinghua et al., 2007, Serrano et al., 2011), although those related to movie recommendations are by far the best known and most widely used in the research field (Antonopoulus and Salter, 2006, Konstan et al., 2004).

A substantial part of the research in the area of CF focuses on how to determine which users are similar to the given one; in order to tackle this task, there are fundamentally three approaches: memory-based methods, model-based methods and hybrid approaches.

Memory-based methods (Bobadilla et al., in press, Bobadilla et al., 2010, Kong et al., 2005, Sanchez et al., 2008, Symeonidis et al., 2008) use similarity metrics and act directly on the rating matrix that contains the ratings of all users who have expressed their preferences on the collaborative service; these metrics mathematically express a distance between two users based on each of their ratings. Model-based methods (Adomavicius & Tuzhilin, 2005) use the rating matrix to create a model from which the sets of similar users will be established. Among the most widely used models are Bayesian classifiers (Cho, Hong, & Park, 2007), neural networks (Ingoo, Kyong, & Tae, 2003) and fuzzy systems (Yager, 2003). Generally, commercial RS use memory-based methods (Giaglis & Lekakos, 2006), whilst model-based methods are usually associated with research RS.
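
As an illustration of a memory-based similarity metric computed directly on the rating matrix, the sketch below uses Pearson correlation restricted to co-rated items. The passage above does not name a specific metric, so the choice of Pearson correlation, the function name `pearson`, the use of `None` for missing ratings and the small matrix are assumptions for the example only.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two rows of the rating matrix,
    computed only over items that both users rated (None = not rated)."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0  # not enough co-rated items to measure correlation
    mean_u = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    den = sqrt(sum((a - mean_u) ** 2 for a, _ in pairs)
               * sum((b - mean_v) ** 2 for _, b in pairs))
    return num / den if den else 0.0


# Rows are users, columns are items; values are invented.
matrix = [
    [5, 4, None, 1],
    [4, 3, 2, None],
    [1, 2, 5, 5],
]

print(pearson(matrix[0], matrix[1]))  # users 0 and 1 agree on their co-rated items
print(pearson(matrix[0], matrix[2]))  # users 0 and 2 have opposite tastes
```

A value near 1 marks a candidate neighbor, while a value near -1 marks a user whose tastes run opposite to those of the active user.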

Regardless of the method used in the CF stage, the technical aim generally pursued is to minimize the prediction errors, by making the accuracy (Fuyuki et al., 2006, Giaglis and Lekakos, 2006, Li and Yamada, 2004, Manolopoulus et al., 2007, Su and Khoshgoftaar, 2009) of the RS as high as possible; nevertheless, there are other aims and issues that need to be taken into account: avoiding overspecialization, finding good items, trust in the recommendations, novelty, precision and recall measures, sparsity, cold-start issues, etc.

The framework proposed in the paper gives special importance to the quality of the predictions and the recommendations, as well as to the novelty and trust results. Whilst the importance of the quality obtained in the predictions and recommendations has been studied in detail since the start of the RS, the quality results in novelty and trust provided by the different methods and metrics used in CF have not been evaluated in depth.

Measuring the quality of the trust results in recommendations becomes even more complicated as we are entering a particularly subjective field, where each specific user can grant more or less importance to various aspects that are selected as relevant to gain their trust in the recommendations offered (recommendation of recent elements, such as film premieres, introduction of novel elements, etc.). Another additional problem is the number of nuances that can be taken into account together with the lack of consensus to define them; in this way we can find studies on trust, reputation, credibility, importance, expertise, competence, reliability, etc. which sometimes pursue the same objective and other times do not.

In Buhwan, Jaewook, and Hyunbo (2009) we can see some novel memory-based methods that incorporate a user’s credit level instead of using the similarity between users. Kwiseok, Jinhyung, and Yongtae (2009) employ a multidimensional credibility model, based on source credibility from consumer psychology, and provide a credible neighbor selection method, although the equations involved require a large number of parameters whose adjustment is difficult or arbitrary. O’Donovan and Smyth (2005) present two computational models of trust and show how they can be readily incorporated into CF frameworks. Kitisin and Neuman (2006) propose an approach that includes social factors, e.g. the user’s past behavior and reputation, as an element of trust that can be incorporated into the RS. Zhang (2008) and Hijikata et al. (2009) tackle the novelty issue: the first paper proposes a novel topic diversity metric which explores hierarchical domain knowledge, whilst the second infers items that a user does not know by calculating the similarity of users or items based on information about what items users already know. An aspect related to the trust measures is the capacity to provide justifications for the recommendations made; Symeonidis et al. (2008) propose an approach that attains both accurate and justifiable recommendations, constructing a feature profile for the users to reveal their favorite features.

To date, various publications have tackled the way RS are evaluated. Among the most significant is Herlocker, Konstan, Riedl, and Terveen (2004), which reviews the key decisions in evaluating CF RS: the user tasks, the type of analysis and datasets being used, the ways in which prediction quality is measured and the user-based evaluation of the system as a whole. Hernández and Gaudioso (2008) is a current study which proposes a recommendation filtering process based on the distinction between interactive and non-interactive subsystems. General publications and reviews also exist which include the most commonly accepted metrics, aggregation approaches and evaluation measures: mean absolute error, coverage, precision, recall and derivatives of these: mean squared error, normalized mean absolute error, ROC and fallout. Goldberg, Roeder, Gupta, and Perkins (2001) focus on the aspects not related to the evaluation, while Breese, Heckerman, and Kadie (1998) compare the predictive accuracy of various methods in a set of representative problem domains. Candillier, Meyer, and Boullé (2007) and Schafer, Frankowski, Herlocker, and Sen (2007) review the main CF methods proposed in the literature.
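
Two of the measures listed above, mean absolute error and coverage, can be sketched as follows. The definitions used here are common simplified ones and are an assumption: MAE is the mean of |prediction - rating| over the held-out ratings that could be predicted, and coverage is the fraction of held-out ratings for which a prediction could be made. The function name `mae_and_coverage` and the data are hypothetical.

```python
def mae_and_coverage(predictions, actual):
    """predictions: {(user, item): predicted rating, or None if no prediction}
    actual:      {(user, item): held-out true rating}
    Returns (MAE over predictable pairs, fraction of pairs that were predictable)."""
    errors = [abs(predictions[key] - rating)
              for key, rating in actual.items()
              if predictions.get(key) is not None]
    coverage = len(errors) / len(actual) if actual else 0.0
    mae = sum(errors) / len(errors) if errors else None
    return mae, coverage


# Invented held-out ratings and predictions; one pair could not be predicted.
actual = {("u1", "a"): 4, ("u1", "b"): 2, ("u2", "a"): 5, ("u2", "c"): 3}
predictions = {("u1", "a"): 3.5, ("u1", "b"): 3.0, ("u2", "a"): 5.0, ("u2", "c"): None}

mae, coverage = mae_and_coverage(predictions, actual)
print(mae, coverage)  # prints: 0.5 0.75
```

Reporting the two values together matters because a method can lower its MAE simply by declining to predict the difficult cases, which shows up as lower coverage.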

Among the most significant papers that propose a CF framework is Herlocker, Konstan, Borchers, and Riedl (1999), which evaluates the following: similarity weight, significance weighting, variance weighting, neighborhood selection and rating normalization. Hernández and Gaudioso (2008) propose a framework in which any RS is formed by two different subsystems, one of them to guide the user and the other to provide useful/interesting items. Koutrika, Bercovitz, and Garcia (2009) is a recent and very interesting framework which introduces levels of abstraction in the CF process, making modifications to the RS more flexible.

The RS frameworks proposed until now present two deficiencies which we aim to tackle in this paper. The first of these is the lack of formalization in the evaluation methods; although the quality metrics are well defined, there are a variety of details in the implementation of the methods which, in the event they are not specified, can lead to the generation of different results in similar experiments. The second deficiency is the absence of quality measures of the results in aspects such as novelty and trust of the recommendations.

The following section of this paper develops a complete series of mathematical formalizations based on set theory, backed by a running example which aids understanding and by case studies which clarify the aspects and alternatives presented; in this section, we also obtain the combination of metric, aggregation approach and standardization method which provides the best results, enabling it to be used as a reference to evaluate metrics designed by the scientific community. In Section 3 we specify the evaluation measures proposed in the framework, which include the quality analysis of the following aspects: predictions (estimations), recommendations, novelty and trust; this same section shows the results obtained by using MovieLens 1M and NetFlix. Finally, we set out our most relevant conclusions.

Framework specifications

This section provides both the equations on which the prediction/recommendation process in the CF stage is based and the equations that support the quality evaluation process offered in the proposed framework; among the latter are the traditional MAE, coverage, precision and recall, and those developed specifically to complete the framework: novelty-precision, novelty-recall, trust-precision, trust-recall.
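
For the traditional precision and recall measures named above, a minimal sketch follows. It is not the paper's formalization: treating a held-out item as relevant when its rating reaches a threshold, the threshold value of 4, the function name `precision_recall` and the data are all assumptions. The novelty and trust variants are defined in the full paper and are not reconstructed here.

```python
def precision_recall(recommended, held_out, threshold=4):
    """recommended: list of items recommended to one user
    held_out:    {item: true rating} for that user's test ratings
    An item counts as relevant when its true rating is >= threshold (assumed rule)."""
    relevant = {item for item, rating in held_out.items() if rating >= threshold}
    hits = relevant & set(recommended)
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four recommendations, four relevant held-out items.
recommended = ["a", "b", "c", "d"]
held_out = {"a": 5, "b": 2, "c": 4, "e": 4, "f": 5}

precision, recall = precision_recall(recommended, held_out)
print(precision, recall)  # prints: 0.5 0.5
```

Here two of the four recommended items ("a" and "c") are relevant, and two of the four relevant items were recommended, so both measures equal 0.5.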

The objective of formalizing the prediction, recommendation and evaluation processes

Proposed framework and results

The framework with which we propose to test the different CF similarity measure metrics includes the quality analysis of the following aspects: predictions, recommendations, novelty and trust. Once a suitable reference metric is considered, we will be able to compare the results obtained with the proposed metric to those obtained with the reference metric.

In each RS in operation we can decide the importance given to the quality of each of the four aspects included in the framework (predictions,

Conclusions

It is important for CF frameworks to include the specification of equations used to evaluate the results of the similarity metrics and methods, so that we can make certain that all the experiments are reproducible and comparable and, therefore, it is possible to establish them in a unified way to compare the advantages and disadvantages of the various methods and metrics proposed by the scientific community.

In the field of CF, even though RS show a broad tradition and extensive experience in

Acknowledgments

We thank the GroupLens Research Group and NetFlix.

References (35)

  • J. Bobadilla et al. A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge Based Systems (2010)
  • J. Bobadilla et al. Collaborative filtering adapted to recommender systems of e-learning. Knowledge Based Systems (2009)
  • R.R. Yager. Fuzzy logic methods in recommender systems. Fuzzy Sets and Systems (2003)
  • E. Adomavicius et al. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering (2005)
  • N. Antonopoulus et al. CinemaScreen recommender agent: Combining collaborative and content-based filtering. IEEE Intelligent Systems (2006)
  • Baraglia, R., & Silvestri, F. (2004). An online recommender system for large web sites. In Proceedings of the...
  • Bobadilla, J., Ortega, F., & Hernando, A. (in press). A collaborative filtering similarity measure based on...
  • J.S. Breese et al. Empirical analysis of predictive algorithms for collaborative filtering
  • J. Buhwan et al. User credit-based collaborative filtering. Expert Systems with Applications (2009)
  • L. Candillier et al. Comparing state-of-the-art collaborative filtering systems. LNAI (2007)
  • S.B. Cho et al. Location-based recommendation system using bayesian user’s preference model in mobile devices. LNCS (2007)
  • D.R. Fesenmaier et al. Intelligent systems for tourism. Intelligent Systems (2002)
  • Fuyuki, I., Quan, T. K., & Shinichi, H. (2006). Improving accuracy of recommender systems by clustering items based on...
  • GM. Giaglis et al. Improving the prediction accuracy of recommendation algorithms: approaches anchored on human factors. Interacting with Computers (2006)
  • K. Goldberg et al. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval (2001)
  • J.L. Herlocker et al. An algorithmic framework for performing collaborative filtering. SIGIR (1999)
  • J.L. Herlocker et al. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (2004)