ABSTRACT
The Web offers rich relational data with different semantics. In this paper, we address the problem of document recommendation in a digital library, where the documents in question are networked by citations and are associated with other entities by various relations. Because a single graph is sparse and graph construction is noisy, we propose a new method for combining multiple graphs to measure document similarities, where different factorization strategies are used based on the nature of each graph. In particular, the new method seeks a single low-dimensional embedding of documents that captures their relative similarities in a latent space. Based on the obtained embedding, a new recommendation framework is developed using semi-supervised learning on graphs. In addition, we address the scalability issue and propose an incremental algorithm, which significantly improves efficiency by calculating the embedding for new incoming documents only. The batch and incremental methods are evaluated on two real-world datasets prepared from CiteSeer. Experiments demonstrate significant quality improvement for our batch method, and significant efficiency improvement with tolerable quality loss for our incremental method.
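The idea of a single embedding shared across several graphs, plus an embedding step for new documents only, can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps in a plain truncated SVD of the weighted, column-stacked graph matrices, and uses the standard folding-in projection for new documents. The function names (`joint_embedding`, `fold_in`), the weights, and the random toy matrices are all assumptions made for illustration.

```python
import numpy as np


def joint_embedding(graphs, weights, k):
    """Shared document embedding from several document-by-entity matrices.

    Assumption: each matrix has one row per document. Stacking them side by
    side and taking a rank-k SVD minimises the weighted sum of squared
    reconstruction errors with one document factor shared by all graphs.
    """
    stacked = np.hstack([np.sqrt(w) * g for g, w in zip(graphs, weights)])
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k]  # (documents x k), (k x stacked columns)


def fold_in(new_rows, weights, basis):
    """Embed new documents only, by projecting onto the fixed basis."""
    stacked = np.hstack([np.sqrt(w) * r for r, w in zip(new_rows, weights)])
    return stacked @ basis.T


# Hypothetical toy data: 6 documents, a citation graph and an authorship graph.
rng = np.random.default_rng(0)
citations = (rng.random((6, 6)) < 0.3).astype(float)
authorship = (rng.random((6, 4)) < 0.4).astype(float)
weights = [1.0, 0.5]

embedding, basis = joint_embedding([citations, authorship], weights, k=2)

# Treat the first document as if it had just arrived and embed it alone.
new_doc = fold_in([citations[:1], authorship[:1]], weights, basis)
```

Because the basis is held fixed, folding in costs one matrix product per new document instead of a full refactorization. In this sketch the basis is never updated, so embedding quality would be expected to degrade as the collection drifts away from the documents used to compute it.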
Index Terms
- Learning multiple graphs for document recommendations