Abstract
The emergence of Web 2.0 and the consequent success of social network Web sites such as Del.icio.us and Flickr have introduced a new concept called social bookmarking, or tagging. Tagging is the act of attaching a relevant, user-defined keyword to a document, image, or video, which helps users better organize and share their collections of content. With the rapid growth of Web 2.0, tagged data is becoming increasingly abundant on social network Web sites. An interesting problem is how to automate the process of recommending tags to users when a new resource becomes available.
In this article, we address tag recommendation from a machine learning perspective. Based on our empirical observations of two large-scale datasets, we first argue that the user-centered approach to tag recommendation is not very effective in practice. Consequently, we propose two novel document-centered approaches that make effective and efficient tag recommendations in real scenarios. The first, a graph-based method, represents the tagged data as two bipartite graphs, (document, tag) and (document, word), and then finds document topics using graph partitioning algorithms. The second, a prototype-based method, finds the most representative documents within the data collections and uses a sparse multiclass Gaussian process classifier for efficient document classification. For both methods, tags are ranked within each topic cluster/class by a novel ranking method. Recommendations are made by first classifying a new document into one or more topic clusters/classes and then selecting the most relevant tags from those clusters/classes as machine-recommended tags.
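The two-step recommendation procedure (classify a new document into topic clusters, then pick the most relevant tags from those clusters) can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' method: the topic clusters are given up front instead of being found by graph partitioning or the prototype-based method, a nearest-centroid cosine classifier stands in for the sparse multiclass Gaussian process classifier, and tags are ranked by relative frequency within each cluster rather than by the paper's ranking method. The function names `build_clusters` and `recommend`, their parameters, and the toy data are hypothetical.

```python
from collections import Counter
import math


def build_clusters(docs):
    """Aggregate word counts and tag counts per topic cluster.

    `docs` is a list of (words, tags, cluster_id) triples. The cluster ids are
    assumed to be given; in the paper they would come from graph partitioning
    or the prototype-based method, which this sketch does not reproduce.
    """
    word_counts, tag_counts = {}, {}
    for words, tags, cid in docs:
        word_counts.setdefault(cid, Counter()).update(words)
        tag_counts.setdefault(cid, Counter()).update(tags)
    return word_counts, tag_counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(new_words, word_counts, tag_counts, top_clusters=2, k=3):
    """Recommend k tags for a new document given as a list of words."""
    query = Counter(new_words)
    # Step 1: soft-assign the new document to its most similar clusters.
    # Hypothetical stand-in: nearest-centroid cosine similarity instead of
    # the paper's classifiers (e.g., the sparse multiclass Gaussian process).
    sims = sorted(
        ((cosine(query, wc), cid) for cid, wc in word_counts.items()),
        reverse=True,
    )[:top_clusters]
    total = sum(s for s, _ in sims)
    if total == 0:
        return []  # no word overlap with any cluster
    scores = Counter()
    for sim, cid in sims:
        if sim == 0:
            continue  # skip clusters with no overlap
        weight = sim / total  # membership weight of this cluster
        n = sum(tag_counts[cid].values())
        # Step 2: rank tags within the cluster by relative frequency
        # (an assumed stand-in for the paper's ranking method) and
        # accumulate them, weighted by cluster membership.
        for tag, count in tag_counts[cid].items():
            scores[tag] += weight * count / n
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


# Hypothetical toy data: (words, tags, cluster id).
docs = [
    (["gaussian", "process", "kernel"], ["ml", "gp"], 0),
    (["kernel", "svm", "margin"], ["ml", "svm"], 0),
    (["graph", "partition", "spectral"], ["graph", "clustering"], 1),
    (["photo", "camera", "lens"], ["photography"], 2),
]
wc, tc = build_clusters(docs)
print(recommend(["kernel", "graph"], wc, tc))  # ['ml', 'clustering', 'graph']
```

Weighting each cluster's tags by the document's membership weight lets a document that straddles two topics draw tags from both, which corresponds to classifying it into "one or more topic clusters/classes"; in the example, the query overlaps both the machine-learning and graph clusters, so tags from each appear in the result.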
Experiments on real-world data from Del.icio.us, CiteULike, and BibSonomy examine both the quality of the tag recommendations and the efficiency of our recommendation algorithms. The results suggest that our document-centered models can substantially improve tag recommendation performance compared with user-centered methods, as well as with LDA topic models and SVM classifiers.
Index Terms
- Automatic tag recommendation algorithms for social recommender systems
Recommendations
Real-time automatic tag recommendation
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Tags are user-generated labels for entities. Existing research on tag recommendation either focuses on improving its accuracy or on automating the process, while ignoring the efficiency issue. We propose a highly-automated novel framework for real-time ...
A sparse gaussian processes classification framework for fast tag suggestions
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
Tagged data is rapidly becoming more available on the World Wide Web. Web sites which populate tagging services offer a good way for Internet users to share their knowledge. An interesting problem is how to make tag suggestions when a new resource ...
Tag Recommendation Based on Collaborative Filtering and Text Similarity
ETCS '11: Proceedings of the 2011 Third International Workshop on Education Technology and Computer Science - Volume 02
In current social tagging system, users can freely add tags for the uploaded resources, which caused a problem that many tags could not describe the resource properly and even have some spelling errors. This problem may bring unnecessary troubles for ...