skip to main content
research-article

Automatic tag recommendation algorithms for social recommender systems

Published:17 February 2011Publication History
Skip Abstract Section

Abstract

The emergence of Web 2.0 and the consequent success of social network Web sites such as Del.icio.us and Flickr introduce us to a new concept called social bookmarking, or tagging. Tagging is the action of connecting a relevant user-defined keyword to a document, image, or video, which helps the user to better organize and share their collections of interesting stuff. With the rapid growth of Web 2.0, tagged data is becoming more and more abundant on the social network Web sites. An interesting problem is how to automate the process of making tag recommendations to users when a new resource becomes available.

In this article, we address the issue of tag recommendation from a machine learning perspective. From our empirical observation of two large-scale datasets, we first argue that the user-centered approach for tag recommendation is not very effective in practice. Consequently, we propose two novel document-centered approaches that are capable of making effective and efficient tag recommendations in real scenarios. The first, graph-based, method represents the tagged data in two bipartite graphs, (document, tag) and (document, word), then finds document topics by leveraging graph partitioning algorithms. The second, prototype-based, method aims at finding the most representative documents within the data collections and advocates a sparse multiclass Gaussian process classifier for efficient document classification. For both methods, tags are ranked within each topic cluster/class by a novel ranking method. Recommendations are performed by first classifying a new document into one or more topic clusters/classes, and then selecting the most relevant tags from those clusters/classes as machine-recommended tags.

Experiments on real-world data from Del.icio.us, CiteULike, and BibSonomy examine the quality of tag recommendation as well as the efficiency of our recommendation algorithms. The results suggest that our document-centered models can substantially improve the performance of tag recommendations when compared to the user-centered methods, as well as topic models LDA and SVM classifiers.

References

  1. Begelman, G., Keller, P., and Smadja, F. 2006. Automated tag clustering: Improving search and exploration in the tag space. In Proceedings of the Collaborative Web Tagging Workshop (WWW'06).Google ScholarGoogle Scholar
  2. Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993--1022. Google ScholarGoogle ScholarCross RefCross Ref
  3. Bogers, T. and van den Bosch, A. 2008. Recommending scientific articles using citeulike. In Proceedings of the ACM Conference on Recommender Systems (RecSys'08). ACM, New York, NY, 287--290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Breese, J. S., Heckerman, D., and Kadie, C. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. 43--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Brinker, K., Furnkranz, J., and Hullermeier, E. 2006. A unified model for multilabel classification and ranking. In Proceedings of the European Conference on Artificial Intelligence (ECAI'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Chirita, P. A., Costache, S., Nejdl, W., and Handschuh, S. 2007. P-tag: large scale automatic generation of personalized annotation tags for the web. In Proceedings of the 16th International Conference on World Wide Web (WWW'07). ACM Press, New York, NY, 845--854. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cristianini, N. and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Sens B, 39, 1, 1--38.Google ScholarGoogle Scholar
  9. Farooq, U., Song, Y., Carroll, J. M., and Giles, C. L. 2007. Social bookmarking for scholarly digital libraries. IEEE Internet Comput. 29--35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Figueiredo, M. A. T. and Jain, A. K. 2002. Unsupervised learning of finite mixture models. IEEE Trans. Patt. Anal. Mach. Intell. 24, 3, 381--396. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Golder, S. and Huberman, B. 2006. Usage patterns of collaborative tagging systems. J. Inform. Sci. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Golub, G. H. and Loan, C. F. V. 1996. Matrix Computations 3rd Ed. Johns Hopkins University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jaeschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., and Stumme, G. 2007. Tag recommendations in folksonomies. In Workshop Proceedings of Lernen—Wissensentdeckung—Adaptivitt (LWA'07). Martin-Luther-Universität Halle-Wittenberg, 13--20.Google ScholarGoogle Scholar
  14. Johnson, R. and Zhang, T. 2007. On the effectiveness of laplacian normalization for graph semi-supervised learning. J. Mach. Learn. Res. 8, 1489--1517. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kendall, M. 1938. A new measure of rank correlation. Biometrika 30, 81--89.Google ScholarGoogle ScholarCross RefCross Ref
  16. Kohonen, T. 2001. Self Organization Maps. Springer.Google ScholarGoogle Scholar
  17. Kullback, S. and Leibler, R. A. 1951. On information and sufficiency. Annl. Math. Stat. 22, 79--86.Google ScholarGoogle ScholarCross RefCross Ref
  18. Lawrence, N., Seeger, M., and Herbrich, R. 2003. Fast sparse gaussian process methods: The informative vector machine. In Proceedings of Neural Information Processing Systems (NIPS15). 609--616.Google ScholarGoogle Scholar
  19. Li, J. and Wang, J. Z. 2006. Real-time computerized annotation of pictures. In Proceedings of the International Conference on Multimedia (MULTIMEDIA'06). 911--920. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Li, J. and Zha, H. 2006. Two-way poisson mixture models for simultaneous document classification and word clustering. Comput. Stat. Data Anal. 50, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Platt, J. C. 2000. Probabilities for SV machines. In Advances in Large Margin Classifiers, 61--74.Google ScholarGoogle Scholar
  22. Rasmussen, C. E. and Williams, C. K. I. 2006. Gaussian Processes for Machine Learning. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Schlattmann, P. 2003. Estimating the number of components in a finite mixture model: the special case of homogeneity. Comput. Stat. Data Anal. 41, 3-4, 441--451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Seeger, M. and Jordan, M. 2004. Sparse gaussian process classification with multiple classes. Tech. rep. 661, Department of Statistics, University of California at Berkeley.Google ScholarGoogle Scholar
  25. Seeger, M. and Williams, C. 2003. Fast forward selection to speed up sparse gaussian process regression. In Proceedings of the Workshop on AI and Statistics.Google ScholarGoogle Scholar
  26. Seo, S., Bode, M., and Obermayer, K. 2003. Soft nearest prototype classification. IEEE Trans. Neural Net. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Song, Y., Zhang, L., and Giles, C. L. 2008. Sparse gaussian processes classification for fast tag recommendation. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM'08). ACM, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Song, Y., Zhuang, Z., Li, H., Zhao, Q., Li, J., Lee, W.-C., and Giles, C. L. 2008. Real-time automatic tag recommendation. In Proceedings of the Annual International ACM SIGIR Conference (SIGIR'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y. 2008. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the ACM Conference on Recommender Systems (RecSys'08). ACM, New York, NY, 43--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Tsoumakas, G. and Katakis, I. 2007. Multi-label classification: An overview. Intl. J. Data Warehous. Mining 3, 3, 1--13.Google ScholarGoogle ScholarCross RefCross Ref
  31. Zha, H., He, X., Ding, C., Simon, H., and Gu, M. 2001. Bipartite graph partitioning and data clustering. In Proceedings of the 10th International Conference on Information and Knowledge Management (CIKM'01). ACM Press, New York, NY, 25--32. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Automatic tag recommendation algorithms for social recommender systems

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on the Web
        ACM Transactions on the Web  Volume 5, Issue 1
        February 2011
        150 pages
        ISSN:1559-1131
        EISSN:1559-114X
        DOI:10.1145/1921591
        Issue’s Table of Contents

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 February 2011
        • Accepted: 1 November 2009
        • Revised: 1 March 2009
        • Received: 1 September 2008
        Published in tweb Volume 5, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader