ABSTRACT
In this paper, we study the "social tag prediction" problem: given a set of objects and a set of tags applied to those objects by users, can we predict whether a given tag could or should be applied to a particular object? We investigated this question using one of the largest crawls of the social bookmarking system del.icio.us gathered to date. For URLs in del.icio.us, we predicted tags based on page text, anchor text, surrounding hosts, and other tags applied to the URL. We found an entropy-based metric that captures the generality of a particular tag and informs an analysis of how well that tag can be predicted. We also found that tag-based association rules can produce very high-precision predictions and give deeper insight into the relationships between tags. Our results have implications both for the study of tagging systems as potential information retrieval tools and for the design of such systems.
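The abstract does not say how the tag-based association rules are mined. As a rough illustration only, the sketch below uses the standard support/confidence framing of association rules, restricted to single-tag rules of the form "tag A implies tag B". The function name `tag_rules`, the thresholds, and the toy `url_tags` data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def tag_rules(url_tags, min_support=2, min_confidence=0.8):
    """Mine single-tag rules A -> B from a mapping of URL -> set of tags.

    support(A -> B)    = number of URLs carrying both A and B
    confidence(A -> B) = support(A -> B) / number of URLs carrying A
    Returns a list of (A, B, support, confidence), best rules first.
    """
    tag_count = Counter()   # URLs per tag
    pair_count = Counter()  # URLs per ordered tag pair (A, B)
    for tags in url_tags.values():
        unique = set(tags)
        tag_count.update(unique)
        pair_count.update(permutations(sorted(unique), 2))

    rules = []
    for (a, b), n_ab in pair_count.items():
        confidence = n_ab / tag_count[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules.append((a, b, n_ab, confidence))
    # Highest confidence first, then highest support, then alphabetical.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy data: four bookmarked URLs and their tags.
url_tags = {
    "u1": {"python", "programming"},
    "u2": {"python", "programming", "tutorial"},
    "u3": {"programming", "java"},
    "u4": {"python", "programming"},
}
rules = tag_rules(url_tags)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support}  confidence={confidence:.2f}")
```

On this toy data only "python -> programming" survives both thresholds. The reverse rule has confidence 3/4 and is dropped, which shows how a confidence cutoff favors precision: a rule fires only when the antecedent tag almost always co-occurs with the predicted tag.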
Index Terms
- Social tag prediction