A framework for collaborative filtering recommender systems

doi:10.1016/j.eswa.2011.05.021

Expert Systems with Applications

Volume 38, Issue 12, November–December 2011, Pages 14609-14623

https://doi.org/10.1016/j.eswa.2011.05.021

Abstract

As recommender systems become more established on the Net, there is a growing need for an evaluation framework for collaborative filtering measures and methods that can test not only the prediction and recommendation results but also aspects that until now were considered secondary, such as the novelty of the recommendations and the users’ trust in them. This paper provides: (a) measures to evaluate the novelty of the users’ recommendations and the trust in their neighborhoods, (b) equations that formalize and unify the collaborative filtering process and its evaluation, and (c) a framework based on these elements that enables the quality of the results of any collaborative filtering method applied to a given recommender system to be evaluated using four graphs: quality of the predictions, of the recommendations, of the novelty and of the trust.

Highlights

► Framework to evaluate the quality results of any collaborative filtering based recommender system.
► Provides 4 graphs: quality of the predictions, the recommendations, the novelty and the trust.
► Equations that formalize and unify the collaborative filtering process and its evaluation.

Introduction

Recommender systems (RS) are developed to attempt to reduce part of the information overload problem produced on the Net. As opposed to other traditional help systems, such as search engines (Google, Yahoo, etc.), RS generally base their operation on a Collaborative Filtering (CF) process, which provides personalized recommendations to active users of websites where different elements (products, films, holidays, etc.) can be rated.

RS are inspired by human social behavior, where it is common to take into account the tastes, opinions and experiences of our acquaintances when making all kinds of decisions (choosing films to watch, selecting schools for our children, choosing products to buy, etc.). Obviously, our decisions are modulated according to our interpretation of the similarity that exists between us and our group of acquaintances, in such a way that we rate the opinions and experiences of some more highly than others.

By emulating each step of our own behavior insofar as is possible, the CF process of RS firstly selects the group of users from the RS website that is most similar to us, and then provides us with a group of recommendations of elements that we have not rated yet (assuming in this way that they are new to us) and which have been rated the best by the group of users with similar tastes to us. This way, a trip to the Canary Islands could be recommended to an individual who has rated different destinations in the Caribbean very highly, based on the positive ratings about the holiday destination of “Canary Islands” of an important number of individuals who also rated destinations in the Caribbean very highly. This suggestion (recommendation) will often provide the user of the service with inspiring information from the collective knowledge of all other users of the service.
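The two steps described above (select the most similar users, then recommend unrated items that those users rated best) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the user names, the destinations other than the Canary Islands, the ratings, the 1 to 5 scale, the `similarity` function (a normalized mean absolute difference) and the parameters `k` and `n` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def similarity(a: Dict[str, float], b: Dict[str, float],
               r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Illustrative similarity: 1 minus the normalized mean absolute
    difference over co-rated items (0.0 when nothing is co-rated)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    mean_diff = sum(abs(a[i] - b[i]) for i in common) / len(common)
    return 1.0 - mean_diff / (r_max - r_min)


def recommend(ratings: Ratings, active: str,
              k: int = 2, n: int = 1) -> List[Tuple[str, float]]:
    """Return the n items with the highest predicted rating for `active`."""
    # Step 1: pick the k users most similar to the active user.
    others = [u for u in ratings if u != active]
    neighbors = sorted(
        others,
        key=lambda u: similarity(ratings[active], ratings[u]),
        reverse=True,
    )[:k]
    # Step 2: predict each item the active user has not rated as the
    # plain average of the neighbors' ratings for that item.
    votes: Dict[str, List[float]] = {}
    for u in neighbors:
        for item, rating in ratings[u].items():
            if item not in ratings[active]:
                votes.setdefault(item, []).append(rating)
    predictions = {item: sum(rs) / len(rs) for item, rs in votes.items()}
    return sorted(predictions.items(), key=lambda p: p[1], reverse=True)[:n]


# Hypothetical ratings echoing the holiday example in the text.
ratings: Ratings = {
    "ana": {"Cuba": 5, "Dominican Republic": 5},
    "ben": {"Cuba": 5, "Dominican Republic": 4, "Canary Islands": 5},
    "eva": {"Cuba": 5, "Dominican Republic": 5, "Canary Islands": 4},
    "tom": {"Cuba": 1, "Dominican Republic": 2, "Alps": 5},
}

print(recommend(ratings, "ana", k=2, n=1))  # -> [('Canary Islands', 4.5)]
```

Here "ana" is matched with "eva" and "ben", who share her high ratings for Caribbean destinations, so their ratings for the Canary Islands drive the recommendation. A plain average is used for the prediction only to keep the sketch short.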

RS cover a wide variety of applications (Baraglia and Silvestri, 2004, Bobadilla et al., 2009, Fesenmaier et al., 2002, Jinghua et al., 2007, Serrano et al., 2011), although those related to movie recommendations are by far the best known and most widely used in the research field (Antonopoulus and Salter, 2006, Konstan et al., 2004).

A substantial part of the research in the area of CF focuses on how to determine which users are similar to the given one; in order to tackle this task, there are fundamentally three approaches: memory-based methods, model-based methods and hybrid approaches.

Memory-based methods (Bobadilla et al., in press, Bobadilla et al., 2010, Kong et al., 2005, Sanchez et al., 2008, Symeonidis et al., 2008) use similarity metrics and act directly on the rating matrix that contains the ratings of all users who have expressed their preferences on the collaborative service; these metrics mathematically express a distance between two users based on each of their ratings. Model-based methods (Adomavicius & Tuzhilin, 2005) use the rating matrix to create a model from which the sets of similar users will be established. Among the most widely used models are Bayesian classifiers (Cho, Hong, & Park, 2007), neural networks (Ingoo, Kyong, & Tae, 2003) and fuzzy systems (Yager, 2003). Generally, commercial RS use memory-based methods (Giaglis & Lekakos, 2006), whilst model-based methods are usually associated with research RS.
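As an illustration of a memory-based similarity metric acting directly on the matrix of user ratings, the sketch below computes the Pearson correlation between two users over the items both have rated. Pearson correlation is chosen here as a common example and is an assumption; this text does not say which metric the paper uses. The matrix values, the use of `None` for unrated items and the function name `pearson_similarity` are likewise hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one user's row of the matrix; None = not rated


def pearson_similarity(u: Row, v: Row) -> float:
    """Pearson correlation between two users over their co-rated items.

    Means are taken over the co-rated items only (one common variant).
    Returns 0.0 when fewer than two items are co-rated or when either
    user's co-rated ratings have zero variance.
    """
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_u = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    den = (math.sqrt(sum((a - mean_u) ** 2 for a, _ in pairs))
           * math.sqrt(sum((b - mean_v) ** 2 for _, b in pairs)))
    return num / den if den > 0 else 0.0


# Hypothetical matrix: rows are users, columns are items.
matrix: List[Row] = [
    [5, 4, None, 1],   # user 0
    [4, 3, 5, None],   # user 1: same ordering as user 0 on co-rated items
    [1, 2, 4, 5],      # user 2: opposite ordering to user 0
]

print(pearson_similarity(matrix[0], matrix[1]))  # close to  1.0
print(pearson_similarity(matrix[0], matrix[2]))  # close to -1.0
```

Values near 1 indicate users whose ratings rise and fall together, values near -1 indicate opposite tastes, and the result can feed a neighbor-selection step directly.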

Regardless of the method used in the CF stage, the technical aim generally pursued is to minimize the prediction errors, by making the accuracy (Fuyuki et al., 2006, Giaglis and Lekakos, 2006, Li and Yamada, 2004, Manolopoulus et al., 2007, Su and Khoshgoftaar, 2009) of the RS as high as possible; nevertheless, other objectives also need to be taken into account: avoiding overspecialization, finding good items, the trust placed in the recommendations, novelty, precision and recall measures, sparsity, cold-start issues, etc.

The framework proposed in the paper gives special importance to the quality of the predictions and the recommendations, as well as to the novelty and trust results. Whilst the importance of the quality obtained in the predictions and recommendations has been studied in detail since the start of the RS, the quality results in novelty and trust provided by the different methods and metrics used in CF have not been evaluated in depth.

Measuring the quality of the trust results in recommendations becomes even more complicated as we are entering a particularly subjective field, where each specific user can grant more or less importance to various aspects that are selected as relevant to gain their trust in the recommendations offered (recommendation of recent elements, such as film premieres, introduction of novel elements, etc.). Another additional problem is the number of nuances that can be taken into account together with the lack of consensus to define them; in this way we can find studies on trust, reputation, credibility, importance, expertise, competence, reliability, etc. which sometimes pursue the same objective and other times do not.

Buhwan, Jaewook, and Hyunbo (2009) describe some novel memory-based methods that use a user's level of credit instead of the similarity between users. Kwiseok, Jinhyung, and Yongtae (2009) employ a multidimensional credibility model, source credibility from consumer psychology, and provide a credible neighbor selection method, although the equations involved require a great number of parameters that are difficult or arbitrary to adjust. O’Donovan and Smyth (2005) present two computational models of trust and show how they can be readily incorporated into CF frameworks. Kitisin and Neuman (2006) propose an approach that includes social factors, e.g. the user’s past behavior and reputation, together as an element of trust that can be incorporated into the RS. Zhang (2008) and Hijikata et al. (2009) tackle the novelty issue: the first paper proposes a novel topic diversity metric which explores hierarchical domain knowledge, whilst the second infers items that a user does not know by calculating the similarity of users or items based on information about what items users already know. An aspect related to the trust measures is the capacity to provide justifications for the recommendations made; Symeonidis et al. (2008) propose an approach that attains both accurate and justifiable recommendations, constructing a feature profile for the users to reveal their favorite features.

To date, various publications have tackled the way RS are evaluated. Among the most significant is Herlocker, Konstan, Riedl, and Terveen (2004), which reviews the key decisions in evaluating CF RS: the user tasks, the type of analysis and datasets being used, the ways in which prediction quality is measured and the user-based evaluation of the system as a whole. Hernández and Gaudioso (2008) is a recent study which proposes a recommendation filtering process based on the distinction between interactive and non-interactive subsystems. General publications and reviews also exist which include the most commonly accepted metrics, aggregation approaches and evaluation measures: mean absolute error, coverage, precision, recall and derivatives of these: mean squared error, normalized mean absolute error, ROC and fallout. Goldberg, Roeder, Gupta, and Perkins (2001) focus on the aspects not related to the evaluation, and Breese, Heckerman, and Kadie (1998) compare the predictive accuracy of various methods in a set of representative problem domains. Candillier, Meyer, and Boullé (2007) and Schafer, Frankowski, Herlocker, and Sen (2007) review the main CF methods proposed in the literature.

Among the most significant papers that propose a CF framework is Herlocker, Konstan, Borchers, and Riedl (1999), which evaluates the following: similarity weight, significance weighting, variance weighting, neighborhood selection and rating normalization. Hernández and Gaudioso (2008) propose a framework in which any RS is formed by two different subsystems, one of them to guide the user and the other to provide useful/interesting items. Koutrika, Bercovitz, and Garcia (2009) present a recent and very interesting framework which introduces levels of abstraction into the CF process, making modifications to the RS more flexible.

The RS frameworks proposed until now present two deficiencies which we aim to tackle in this paper. The first of these is the lack of formalization in the evaluation methods; although the quality metrics are well defined, there are a variety of details in the implementation of the methods which, in the event they are not specified, can lead to the generation of different results in similar experiments. The second deficiency is the absence of quality measures of the results in aspects such as novelty and trust of the recommendations.

The following section of this paper develops a complete series of mathematical formalizations based on set theory, backed by a running example which aids understanding and by case studies which show clarifying results of the aspects and alternatives shown; in this section, we also obtain the combination of metric, aggregation approach and standardization method which provides the best results, enabling it to be used as a reference to evaluate metrics designed by the scientific community. In Section 3 we specify the evaluation measures proposed in the framework, which include the quality analysis of the following aspects: predictions (estimations), recommendations, novelty and trust; this same section shows the results obtained by using MovieLens 1M and NetFlix. Finally, we set out our most relevant conclusions.

Section snippets

Framework specifications

This section provides both the equations on which the prediction/recommendation process in the CF stage is based and the equations that support the quality evaluation process offered in the proposed framework; the latter include the traditional MAE, coverage, precision and recall, and those developed specifically to complete the framework: novelty-precision, novelty-recall, trust-precision and trust-recall.
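To make the traditional measures named above concrete, the following sketch computes MAE, coverage, precision and recall for a single user. It uses common textbook definitions, which are an assumption: the paper's own formalization is not reproduced in this text and may differ in detail (for example, in how coverage is defined or how results are aggregated over users). The relevance `threshold`, the function names and the sample data are hypothetical. The novelty and trust variants are not sketched because their definitions are not given here.

```python
from typing import Dict, List, Tuple


def mae(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Mean absolute error over items with both a prediction and a rating."""
    common = predicted.keys() & actual.keys()
    if not common:
        return float("nan")
    return sum(abs(predicted[i] - actual[i]) for i in common) / len(common)


def coverage(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Fraction of the held-out ratings for which a prediction was possible."""
    if not actual:
        return 0.0
    return len(predicted.keys() & actual.keys()) / len(actual)


def precision_recall(recommended: List[str], actual: Dict[str, float],
                     threshold: float = 4.0) -> Tuple[float, float]:
    """Precision and recall, treating ratings >= threshold as relevant."""
    relevant = {i for i, r in actual.items() if r >= threshold}
    hits = relevant & set(recommended)
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical held-out ratings and system output for one user.
actual = {"a": 5, "b": 3, "c": 4, "d": 1}
predicted = {"a": 4.5, "b": 3.5, "c": 3.0}   # no prediction for "d"
recommended = ["a", "b"]

print(mae(predicted, actual))                 # (0.5 + 0.5 + 1.0) / 3
print(coverage(predicted, actual))            # 0.75
print(precision_recall(recommended, actual))  # (0.5, 0.5)
```

In an experiment these per-user values would typically be averaged over all test users; the averaging scheme is one of the implementation details that a formalized framework needs to pin down.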

The objective of formalizing the prediction, recommendation and evaluation processes

Proposed framework and results

The framework with which we propose to test the different CF similarity measure metrics includes the quality analysis of the following aspects: predictions, recommendations, novelty and trust. Once a suitable reference metric is considered, we will be able to compare the results obtained with the proposed metric to those obtained with the reference metric.

In each RS in operation we can decide the importance given to the quality of each of the four aspects included in the framework (predictions,

Conclusions

It is important for CF frameworks to include the specification of equations used to evaluate the results of the similarity metrics and methods, so that we can make certain that all the experiments are reproducible and comparable and, therefore, it is possible to establish them in a unified way to compare the advantages and disadvantages of the various methods and metrics proposed by the scientific community.

In the field of CF, even though RS show a broad tradition and extensive experience in

Acknowledgments

Our thanks to the GroupLens Research Group and to NetFlix.

References (35)

J. Bobadilla et al. A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge Based Systems (2010)
J. Bobadilla et al. Collaborative filtering adapted to recommender systems of e-learning. Knowledge Based Systems (2009)
R.R. Yager. Fuzzy logic methods in recommender systems. Fuzzy Sets and Systems (2003)
E. Adomavicius et al. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering (2005)
N. Antonopoulus et al. CinemaScreen recommender agent: Combining collaborative and content-based filtering. IEEE Intelligent Systems (2006)
Baraglia, R., & Silvestri, F. (2004). An online recommender system for large web sites. In Proceedings of the...
Bobadilla, J., Ortega, F., & Hernando, A. (in press). A collaborative filtering similarity measure based on...
J.S. Breese et al. Empirical analysis of predictive algorithms for collaborative filtering
J. Buhwan et al. User credit-based collaborative filtering. Expert Systems with Applications (2009)
L. Candillier et al. Comparing state-of-the-art collaborative filtering systems. LNAI (2007)
S.B. Cho et al. Location-based recommendation system using bayesian user’s preference model in mobile devices. LNCS (2007)
D.R. Fesenmaier et al. Intelligent systems for tourism. Intelligent Systems (2002)
Fuyuki, I., Quan, T. K., & Shinichi, H. (2006). Improving accuracy of recommender systems by clustering items based on...
G.M. Giaglis et al. Improving the prediction accuracy of recommendation algorithms: approaches anchored on human factors. Interacting with Computers (2006)
K. Goldberg et al. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval (2001)
J.L. Herlocker et al. An algorithmic framework for performing collaborative filtering. SIGIR (1999)
J.L. Herlocker et al. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (2004)

