An iterative semi-explicit rating method for building collaborative recommender systems

https://doi.org/10.1016/j.eswa.2008.07.085

Abstract

Collaborative filtering plays a key role in recent recommender systems. It uses a user-item preference matrix rated either explicitly (i.e., explicit rating) or implicitly (i.e., implicit feedback). Although explicit rating captures preferences better, it often results in a severely sparse matrix. The paper presents a novel iterative semi-explicit rating method that extrapolates unrated elements in a semi-supervised manner. Extrapolation is simply an aggregation of neighbor ratings, and iterative extrapolations result in a dense preference matrix. Preliminary simulation results show that recommendation using the semi-explicit rating data outperforms recommendation using the pure explicit data alone.

Introduction

Recommender systems have gained more importance than ever before with the increasing popularity of the Internet and social networking, e.g., electronic commerce, Web 2.0, and web personalization. Over the last decade, they have been among the most successful applications both in academia and in industry. Success stories can be found in recommending books and CDs at Amazon.com (Linden, Smith, & York, 2003), movies by MovieLens (Miller, Albert, Lam, Konstan, & Riedl, 2003), news by GroupLens (Konstan et al., 1997) and by MONERS (Lee & Park, 2007), ESL reading lessons (Hsu, 2008), and so forth. Nonetheless, the current state of the art shows that they require further improvements to become more effective and applicable to a broader range of real-life applications. For example, further work is needed on better methods for representing user behavior and the information about the items to be recommended, on more advanced recommendation methods that incorporate various contextual information into the recommendation process and utilize multi-criteria ratings, and on less intrusive and more flexible recommendation methods (Adomavicius & Tuzhilin, 2005). This paper concentrates on better capturing user behavior, i.e., on rating the user preference.

Rating for recommender systems (or collaborative filtering in particular) results in a user-item preference matrix by means of either explicit rating or implicit rating. In explicit rating, each user examines items and assigns them values on a rating scale, while in implicit rating the values are inferred from the user’s behaviors such as purchase of the item, access to the information content, time spent reading the content, actions (e.g., save, print, delete) applied to the content, etc. It is reported that explicit rating captures user preferences for items more accurately than implicit rating does (Nichols, 1998). However, the underlying problem of explicit rating, i.e., data sparsity (which is usually more severe than that of implicit rating), makes it hard to use the rating matrix in practice, i.e., to recommend items to an active user.

The paper proposes a novel rating method, namely semi-explicit rating (SER), to overcome the sparsity problem. The proposed method extrapolates the rating scores of unrated elements following the principle of semi-supervised learning (Jeong et al., 2008; Lee & Lee, 2005, 2006, 2007), in that a large number of the remaining unlabeled/unrated elements are estimated by mathematically manipulating a few labeled/rated elements. In particular, to enhance the recommendation accuracy, the proposed method iteratively updates the user-item preference matrix until it stabilizes.

The remainder of the paper is organized as follows: Section 2 addresses previous work on recommender systems, especially on collaborative filtering. Section 3 presents the details of the proposed method, followed by preliminary validation via numerical experiments in Section 4. Finally, concluding remarks and future work are given in Section 5.

Section snippets

Related works

Due to massive diversity in algorithms and applications, this section briefly reviews the key research branches of the recommender systems and collaborative filtering relevant to this paper. For more comprehensive reviews and comparison, see references such as Adomavicius and Tuzhilin (2005), Deshpande and Karypis (2004) and Candillier, Meyer, and Boullé (2007).

The recommendation problem is to maximize an active user’s satisfaction by suggesting to him/her a set of items chosen from many candidates. According to

Semi-explicit rating and recommendation prediction

This section presents a novel extrapolation method, namely semi-explicit rating (SER), that estimates unrated elements in the user-item preference matrix. The method is based on the semi-supervised learning principle, in that a large number of unrated elements are filled in by numerical inference from a few (sparse) explicit ratings.
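The idea of filling unrated elements by repeatedly aggregating neighbor ratings can be illustrated with a minimal Python sketch. This is not the paper's exact procedure, whose details are not given in this excerpt. The sketch assumes user-based neighbors, cosine similarity over zero-filled rating vectors, a similarity-weighted mean as the aggregation, and a stopping rule based on the largest change between passes. The names `iterative_fill`, `cosine_similarity_rows`, `k`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np

def cosine_similarity_rows(est, filled):
    """User-user cosine similarity; cells not yet filled count as 0 (assumption)."""
    X = np.where(filled, est, 0.0)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0              # avoid division by zero for empty rows
    S = (X @ X.T) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)             # a user is not their own neighbor
    return S

def iterative_fill(R, k=2, tol=1e-4, max_iter=50):
    """Fill NaN cells of R by a similarity-weighted mean of neighbor ratings.

    Explicit ratings are never overwritten; estimated cells are refreshed on
    every pass until the largest change falls below `tol` (assumed rule).
    """
    explicit = ~np.isnan(R)
    est = np.where(explicit, R, 0.0)
    filled = explicit.copy()
    for _ in range(max_iter):
        S = cosine_similarity_rows(est, filled)
        new_est, new_filled = est.copy(), filled.copy()
        for u in range(R.shape[0]):
            neighbors = np.argsort(S[u])[::-1][:k]   # k most similar users
            for i in np.where(~explicit[u])[0]:
                nb = [v for v in neighbors if filled[v, i] and S[u, v] > 0]
                if not nb:
                    continue                         # no usable neighbor yet
                w = S[u, nb]
                new_est[u, i] = np.dot(w, est[nb, i]) / w.sum()
                new_filled[u, i] = True
        delta = np.max(np.abs(new_est - est))
        grew = (new_filled != filled).any()
        est, filled = new_est, new_filled
        if delta < tol and not grew:
            break
    return np.where(filled, est, np.nan)

# Toy 4-user x 4-item matrix on a 1-5 scale; NaN marks an unrated element.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 4.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 2.0, 4.0],
])
dense = iterative_fill(R)
print(np.round(dense, 2))
```

Because each estimate is a weighted average of ratings already in the matrix, the filled values stay within the original rating scale, and cells that could not be estimated on one pass can be reached on a later pass once their neighbors have been filled.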

Simulation setting

Preliminary simulations are conducted to validate the underpinning concept of the proposed method. The simulation is limited in scope, as it is intended only to show the validity of the method. The dataset used is the MovieLens (ML) data, which contain 100,000 explicit ratings (on a 1–5 rating scale) from 943 users and 1682 items (Sarwar et al., 2001). Note that the ML data are very sparse: the sparsity level is about 93.7% (i.e., $1 - \frac{\text{nonzero entries}}{\text{total entries}} = 1 - \frac{100{,}000}{943 \times 1682}$). For the underlying
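The sparsity level quoted for the ML data can be checked with a few lines of Python. The function name `sparsity` is a hypothetical helper introduced for illustration; the counts are the ones stated in the text.

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - n_ratings / (n_users * n_items)

# Counts stated for the MovieLens data: 100,000 ratings, 943 users, 1682 items.
ml_sparsity = sparsity(100_000, 943, 1682)
print(f"{ml_sparsity:.1%}")  # prints 93.7%
```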

Conclusion

Recommender systems, or collaborative filtering in particular, have become omnipresent in various applications such as product recommendation, spam filtering, web personalization, etc. As the amount of information content grows, the importance of accurate recommender systems increases. The availability of correct user-item preference matrices is critical to building a better system. The explicit rating method usually gives a better preference matrix than the implicit rating methods do.

Acknowledgements

Thanks to Shyong Lam and Jon Herlocker for cleaning up and generating the MovieLens (ML) data set, and to Cai-Nicolas Ziegler and Ron Hornbaker for the Book-Crossing (BX) data set. This work was supported partially by the Korea Research Foundation under the Grant No. KRF-2008-314-D00483 and partially by the KOSEF under the Grant No. R01-2007-000-20792-0.

References (21)

  • M.-H. Hsu. A personalized English learning recommender system for ESL students. Expert Systems with Applications (2008).
  • B. Jeong et al. A novel method for measuring semantic similarity for XML matching. Expert Systems with Applications (2008).
  • J.-S. Lee et al. Classification-based collaborative filtering using market basket data. Expert Systems with Applications (2005).
  • H. Lee et al. MONERS: A news recommender for the mobile web. Expert Systems with Applications (2007).
  • G. Adomavicius et al. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering (2005).
  • Araujo, R., Trielli, G., Orair, G., Ferreira, W. M., Jr., R., & Guedes, D. (2006). ParTriCluster: A scalable parallel...
  • Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative...
  • L. Candillier et al. Comparing state-of-the-art collaborative filtering systems. Machine Learning and Data Mining in Pattern Recognition, LNCS (2007).
  • M. Deshpande et al. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (2004).
  • M. Grcar et al. Data sparsity issues in the collaborative filtering framework. Advances in Web Mining and Web Usage Analysis, LNAI (2006).
There are more references available in the full text version of this article.
