Elsevier

Electronic Commerce Research and Applications

Volume 26, November–December 2017, Pages 101-108

Using community preference for overcoming sparsity and cold-start problems in collaborative filtering system offering soft ratings

https://doi.org/10.1016/j.elerap.2017.10.002

Highlights

  • This paper proposes a new collaborative filtering recommender system offering soft ratings.

  • In the system, community preferences are used for overcoming sparsity and cold-start problems.

  • The new system is experimentally tested on the Flixster data set.

Abstract

This paper introduces a new collaborative filtering recommender system that is capable of offering soft ratings as well as integrating with a social network containing all users. Offering soft ratings is known as a new methodology for modeling subjective, qualitative, and imperfect information about user preferences, as well as a more realistic and flexible means for users to express their preferences on products and services. Additionally, in the system, community preferences that are extracted from the social network are employed for overcoming sparsity and cold-start problems. In the experiment, the new system is tested using a data set culled from Flixster, a social network focused on movies. The experiment’s results show that this system is more effective than the selected baseline in terms of recommendation accuracy.

Introduction

Recommender systems (RSs) (Adomavicius and Tuzhilin, 2005, Hwang et al., 2010, Kim et al., 2002) have been developing rapidly since they were introduced in the 1990s; in practice, RSs have been applied in a variety of e-commerce applications (Linden et al., 2003, Shambour and Lu, 2011, Kim et al., 2011). In general, RSs collect information about user preferences from multiple sources, estimate user preferences on unseen items, and then generate suitable recommendations based on the estimated data. Consequently, the quality of the recommendations in an RS depends mainly on the representation of user preferences and the accuracy of the estimations.

Conventional RSs described in the literature usually provide a rating domain defined as a totally ordered finite set of rating scores, and represent a user’s preference for an item as a rating score (i.e., a hard rating). However, user preferences are naturally subjective and qualitative; therefore, representing user preferences as hard ratings is not appropriate in some cases (Nguyen and Huynh, 2015). For example, consider an RS offering hard ratings with a rating domain Θ={1,2,3,4,5}. Suppose that a user has rated two items, I1 and I3, with rating scores of 4 and 5, respectively; further, this user wants to rate item I2 with a rating score indicating that it is better than I1 but worse than I3. With a single rating score, the user can evaluate item I2 only as 4 or 5; as a consequence, this user may hesitate to express his/her preference for the item. In this scenario, it would be more comfortable for the user to use a combination of rating scores, such as {4,5} (i.e., a soft rating). Moreover, even when a user has evaluated an item by using a hard rating, this rating might encode imperfect information. For instance, when a user has rated an item with a rating score such as 3, we cannot know exactly what this user thought about the item, because the user may have wanted to evaluate the item as “at least 3” or “3 for 90%”. To address such situations, the use of soft ratings in RSs has recently been studied and developed for the purpose of capturing subjective, qualitative, and imperfect information about user preferences (Wickramarathne et al., 2011, Nguyen and Huynh, 2014, Nguyen and Huynh, 2015). With a rating domain Θ={1,2,3,4,5}, a user can express his/her preference for an item by using a soft rating as follows:

  • 3 for sure ({3} with a probability of 1.0)

  • At least 3 ({3,4,5} with a probability of 1.0)

  • 3 for 90% ({3} and Θ with probabilities of 0.9 and 0.1, respectively)

  • Less than 3 ({1,2} with a probability of 1.0)

According to the aforementioned studies, RSs offering soft ratings are developed based on the Dempster-Shafer theory (DST) (Dempster, 1967, Shafer, 1976), which is considered one of the most general theories for modeling imperfect information (Wickramarathne et al., 2011).
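The four soft ratings listed above can be written down directly as mappings from subsets of Θ to probabilities. The following is a minimal sketch, assuming a plain Python dictionary keyed by `frozenset` as the representation; the variable names are illustrative and are not taken from the paper.

```python
# Rating domain from the running example.
THETA = frozenset({1, 2, 3, 4, 5})

# Each soft rating maps subsets of THETA to probabilities.
# Subsets that are not listed implicitly receive probability 0.
three_for_sure = {frozenset({3}): 1.0}
at_least_three = {frozenset({3, 4, 5}): 1.0}
three_for_90_percent = {frozenset({3}): 0.9, THETA: 0.1}
less_than_three = {frozenset({1, 2}): 1.0}

soft_ratings = {
    "3 for sure": three_for_sure,
    "At least 3": at_least_three,
    "3 for 90%": three_for_90_percent,
    "Less than 3": less_than_three,
}

for label, rating in soft_ratings.items():
    # Print subsets as sorted lists so the output is easy to read.
    readable = {tuple(sorted(subset)): p for subset, p in rating.items()}
    print(label, readable)
```

A hard rating is then the special case in which a single one-element subset carries all of the probability, as in "3 for sure".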

Furthermore, in the RS research area, recommendation techniques can be divided into three main categories: collaborative filtering, content-based, and hybrid (Adomavicius and Tuzhilin, 2005). Among these, collaborative filtering is regarded as the most well-known technique (Ricci et al., 2011, Jannach et al., 2012). To provide personalized recommendations to an active user, collaborative filtering RSs commonly try to find other users who are expected to have preferences similar to those of the user, and employ those users’ ratings to estimate the original user’s preferences for unseen items. However, collaborative filtering RSs also suffer from two fundamental limitations known as sparsity and cold-start problems (Adomavicius and Tuzhilin, 2005). The first problem is caused when each user only rates a very small number of items; as a result, the number of ratings is insufficient and recommendation performance is significantly affected (Huang et al., 2004). The second problem arises because of missing information about new items and new users.
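To make the neighborhood idea and its limitations concrete, the following is a minimal sketch of generic user-based collaborative filtering on hard ratings. It is not the system proposed in this paper: cosine similarity, the similarity-weighted mean, the function names, and the toy data are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(active, item, all_ratings):
    """Similarity-weighted mean of other users' ratings on `item`.

    Returns None when no similar user has rated the item, which is the
    situation that sparsity and cold-start produce.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for user, ratings in all_ratings.items():
        if user == active or item not in ratings:
            continue
        sim = cosine_similarity(all_ratings[active], ratings)
        if sim > 0:
            weighted_sum += sim * ratings[item]
            weight_total += sim
    return weighted_sum / weight_total if weight_total else None


# Toy data (hypothetical): U4 is a new user with no ratings at all.
toy_ratings = {
    "U1": {"I1": 4, "I3": 5},
    "U2": {"I1": 4, "I2": 5, "I3": 5},
    "U3": {"I1": 1, "I2": 2},
    "U4": {},
}

print(predict("U1", "I2", toy_ratings))
print(predict("U4", "I2", toy_ratings))
```

For the new user U4, no similarity can be computed, so no prediction is possible from ratings alone; this is the gap that additional information about users is meant to fill.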

Social networks are growing very fast and play a significant role on the Internet. Additionally, communication and collaboration in social networks have become more and more convenient and frequent. In a social network, users naturally form various communities whose members interact frequently with one another (Tang and Liu, 2010). These social relations might influence individual behaviors and decisions, including those related to buying items. Commonly, when seeking advice before buying new items, most people tend to trust recommendations from friends in the same community more than recommendations from outside users. In a community, each member has his/her own preference for an item, and the overall preference of all members is called the community preference for that item. In practice, community preferences can be used for better understanding of user behaviors and ratings; thus, these preferences can potentially be used for predicting the missing information about user preferences. In other words, community preferences might be useful for overcoming the limitations in collaborative filtering RSs.

On the basis of this observation, this paper proposes a new collaborative filtering RS that is capable of (1) representing user preferences as soft ratings and (2) exploiting community preferences derived from a social network containing all users, which helps address the aforementioned sparsity and cold-start problems. In the system, community preferences are employed for predicting all unprovided user ratings on items (including new users and new items); subsequently, active users receive personalized recommendations based mainly on the provided and predicted ratings.

The remainder of this paper is organized as follows. In Section 2, background information about DST is introduced. In Section 3, related work is discussed. The proposed system is described in Section 4. In Section 5, experiments are presented, and their results are discussed. Finally, conclusions and suggestions for future research are presented in Section 6.

Section snippets

Dempster-Shafer theory

DST (Dempster, 1967, Shafer, 1976), also called evidence theory or the theory of belief functions, is a well-known theory for modeling uncertain, imprecise, and incomplete information. In the context of this theory, let us consider a problem domain represented by a finite set $\Theta=\{\theta_1,\theta_2,\ldots,\theta_L\}$ of mutually exclusive and exhaustive hypotheses (Shafer, 1976). A function $m:2^\Theta\to[0,1]$ is called a mass function if it satisfies $m(\emptyset)=0$ and $\sum_{A\subseteq\Theta} m(A)=1$.

A subset $A\subseteq\Theta$ with $m(A)>0$ is called a focal element. In
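The two conditions on a mass function and the notion of a focal element translate into a short check. The following is a minimal sketch, assuming a mass function is stored as a dictionary from `frozenset` to `float` in which unlisted subsets have mass 0; the helper names `is_mass_function` and `focal_elements` are hypothetical.

```python
import math


def is_mass_function(m, theta):
    """Check the definition: no mass on the empty set, every key is a
    subset of theta with a value in [0, 1], and the masses sum to 1."""
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    for subset, mass in m.items():
        if not subset <= theta or not 0.0 <= mass <= 1.0:
            return False
    return math.isclose(sum(m.values()), 1.0)


def focal_elements(m):
    """Return the subsets that carry strictly positive mass."""
    return [subset for subset, mass in m.items() if mass > 0]


theta = frozenset({1, 2, 3, 4, 5})
m = {frozenset({3}): 0.9, theta: 0.1}

print(is_mass_function(m, theta))
print([sorted(a) for a in focal_elements(m)])
```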

Related work

Over the years, many researchers have focused on tackling the two fundamental problems in collaborative filtering RSs, and various methods have been developed for addressing these problems. Among the previous studies, matrix factorization (Barjasteh et al., 2015, Bauer and Nanopoulos, 2014, Guo et al., 2015), which exploits latent factors for predicting all unprovided ratings, is a popular method. In addition, some authors proposed combining collaborative filtering with

Data modeling

Let $\Theta$ denote a rating domain consisting of $L$ preference labels, $\Theta=\{\theta_1,\theta_2,\ldots,\theta_L\}$. In addition, let $U$ and $I$ denote a set containing $M$ users and a set consisting of $N$ items, respectively; in other words, $U=\{U_1,U_2,\ldots,U_M\}$ and $I=\{I_1,I_2,\ldots,I_N\}$. Users can express their preferences on items by using soft ratings; each soft rating of user $U_i\in U$ on item $I_k\in I$ is modeled by a mass function $r_{i,k}:2^\Theta\to[0,1]$. Further, Dempster’s rule of combination (Dempster, 1967) is used for combining information about user
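For readers unfamiliar with Dempster's rule of combination, the following is a minimal sketch of its standard normalized form for two mass functions. How the paper applies the rule is not shown in this excerpt, so the function name `dempster_combine`, the dictionary representation, and the two example ratings are assumptions.

```python
import math


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    The mass of each pair of focal elements is multiplied and assigned
    to their intersection; mass that falls on the empty set (the
    conflict) is discarded and the remainder is renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                conflict += mass_a * mass_b
    if math.isclose(conflict, 1.0):
        raise ValueError("total conflict: the rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


THETA = frozenset({1, 2, 3, 4, 5})
r_a = {frozenset({3}): 0.9, THETA: 0.1}   # "3 for 90%"
r_b = {frozenset({3, 4, 5}): 1.0}         # "at least 3"

for subset, mass in dempster_combine(r_a, r_b).items():
    print(sorted(subset), round(mass, 3))
```

Because the two example ratings do not conflict, no renormalization is needed here; the rule raises an error only when the two sources contradict each other completely.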

Experiment

For evaluating the proposed system, data sets need to contain (1) rating data, (2) a social network, and (3) context information. After considering the available data sets, we found that the Flixster data set (Nguyen and Huynh, 2014) satisfies these requirements. Thus, we selected this data set for the experiments.

The Flixster data set contains 535,013 hard ratings from 3,827 users on 1,210 movies with a rating domain consisting of 10 elements, Θ={0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0} (L=10); the data
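The counts quoted above give a rough sense of how many user-item pairs have no rating. The following is a minimal sketch of that arithmetic, assuming the three figures are taken at face value and that each rating corresponds to a distinct user-item pair; the variable names are illustrative.

```python
# Figures quoted in the text for the Flixster data set.
num_ratings = 535_013
num_users = 3_827
num_items = 1_210

# Number of cells in the full user-item rating matrix.
possible_pairs = num_users * num_items

# Fraction of cells that are filled (density) and empty (sparsity).
density = num_ratings / possible_pairs
sparsity = 1.0 - density

print(f"{possible_pairs:,} possible user-item pairs")
print(f"density  = {density:.4f}")
print(f"sparsity = {sparsity:.4f}")
```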

Conclusion

Compared to RSs offering hard ratings, RSs using soft ratings are more effective because they (1) provide better modeling of subjective, qualitative, and imperfect information about user preferences and (2) allow users to express preferences in a more realistic and flexible manner. Recently, social networks have been developing very fast, and these networks play a significant role on the Internet. It is possible that social relations in social networks can naturally influence individual

Acknowledgement

This research work was supported by JSPS KAKENHI Grant No. 25240049.

References (42)

  • B. Lika et al. Facing the cold start problem in recommender systems. Expert Syst. Appl. (2014)
  • P. Smets et al. The transferable belief model. Artif. Intell. (1994)
  • Z. Sun et al. Recommender systems based on social networks. J. Syst. Software (2015)
  • H. Wu et al. Collaborative topic regression with social trust ensemble for recommendation in social media systems. Knowl. Based Syst. (2016)
  • G. Adomavicius et al. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. (2005)
  • I. Barjasteh et al. Cold-start item and user recommendation with decoupled completion and transduction
  • A.P. Dempster. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Stat. (1967)
  • M. Girvan et al. Community structure in social and biological networks. Proc. Natl. Acad. Sci. (2002)
  • Grcar, M., Mladenic, D., Fortuna, B., Grobelnik, M., 2005. Data sparsity issues in the collaborative filtering...
  • Gregory, S., 2007. An algorithm to find overlapping community structure in networks. In: Proceedings of the 11th...
  • Gregory, S., 2008. A fast algorithm to find overlapping communities in networks. In: Proceedings of the European...

This paper is an extended and revised version of the conference paper presented at CSoNet-2016 (Nguyen and Huynh, 2016).