Skip to main content
Erschienen in: The VLDB Journal 2/2014

01.04.2014 | Special Issue Paper

An expressive framework and efficient algorithms for the analysis of collaborative tagging

verfasst von: Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, Cong Yu

Erschienen in: The VLDB Journal | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components, user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. In this paper, we develop a dual mining framework to explore tagging behavior. This framework is centered around two opposing measures, similarity and diversity, applied to one or more tagging components, and therefore enables a wide range of analysis scenarios such as characterizing similar users tagging diverse items with similar tags or diverse users tagging similar items with diverse tags. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and they are NP-Complete in general. We design four sets of efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly out-perform the exact brute-force approach without compromising analysis result quality.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Since the user and item dimensions share the same characteristics in the dual mining framework, we present here only the user dimension for simplicity.
 
3
Two tagging action groups are neighbors if they are directly connected in the lattice.
 
Literatur
1.
Zurück zum Zitat Amer-Yahia, S., Huang, J., Yu, C.: Building community-centric information exploration applications on social content sites. In: SIGMOD, pp. 947–952 (2009) Amer-Yahia, S., Huang, J., Yu, C.: Building community-centric information exploration applications on social content sites. In: SIGMOD, pp. 947–952 (2009)
2.
Zurück zum Zitat Amer-Yahia, S., Huang, J., Yu, C.: Jelly: a language for building community-centric information exploration applications. In: ICDE, pp. 1588–1594 (2009) Amer-Yahia, S., Huang, J., Yu, C.: Jelly: a language for building community-centric information exploration applications. In: ICDE, pp. 1588–1594 (2009)
3.
Zurück zum Zitat Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)CrossRef Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)CrossRef
4.
Zurück zum Zitat Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)MATH Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)MATH
5.
Zurück zum Zitat Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466 (2004) Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466 (2004)
6.
Zurück zum Zitat Bundschus, M., Yu, S., Tresp, V., Rettinger, A., Dejori, M., Kriegel, H.-P.: Hierarchical bayesian models for collaborative tagging systems. In: ICDM, pp. 728–733 (2009) Bundschus, M., Yu, S., Tresp, V., Rettinger, A., Dejori, M., Kriegel, H.-P.: Hierarchical bayesian models for collaborative tagging systems. In: ICDM, pp. 728–733 (2009)
7.
Zurück zum Zitat Charikar, M.: Similarity estimation techniques from rounding algorithms. In: STOC, pp. 380–388 (2002) Charikar, M.: Similarity estimation techniques from rounding algorithms. In: STOC, pp. 380–388 (2002)
8.
Zurück zum Zitat Chen, Y., Harper, F.M., Konstan, J.A., Li, S.X.: Social comparisons and contributions to online communities: a field experiment on movielens. In: Computational Social Systems and the Internet (2007) Chen, Y., Harper, F.M., Konstan, J.A., Li, S.X.: Social comparisons and contributions to online communities: a field experiment on movielens. In: Computational Social Systems and the Internet (2007)
9.
Zurück zum Zitat Das, M., Amer-Yahia, S., Das, G., Yu, C.: Mri: meaningful interpretations of collaborative ratings. In: PVLDB, pp. 1063–1074 (2011) Das, M., Amer-Yahia, S., Das, G., Yu, C.: Mri: meaningful interpretations of collaborative ratings. In: PVLDB, pp. 1063–1074 (2011)
10.
Zurück zum Zitat Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling lsh for performance tuning. In: CIKM, pp. 669–678 (2008) Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling lsh for performance tuning. In: CIKM, pp. 669–678 (2008)
11.
Zurück zum Zitat Erkut, E., Baptie, T., Hohenbalken, B.V.: The discrete p-maxian location problem. Comput. OR 17(1), 51–61 (1990)CrossRefMATH Erkut, E., Baptie, T., Hohenbalken, B.V.: The discrete p-maxian location problem. Comput. OR 17(1), 51–61 (1990)CrossRefMATH
12.
Zurück zum Zitat Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB, pp. 518–529 (1999) Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB, pp. 518–529 (1999)
13.
Zurück zum Zitat Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)CrossRefMATHMathSciNet Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)CrossRefMATHMathSciNet
14.
Zurück zum Zitat Golder, S.A., Huberman, B.A.: The structure of collaborative tagging systems. CoRR, abs/cs/0508082 (2005) Golder, S.A., Huberman, B.A.: The structure of collaborative tagging systems. CoRR, abs/cs/0508082 (2005)
15.
Zurück zum Zitat Golder, S.A., Huberman, B.A.: Usage patterns of collaborative tagging systems. J. Inf. Sci. 32(2), 198–208 (2006)CrossRef Golder, S.A., Huberman, B.A.: Usage patterns of collaborative tagging systems. J. Inf. Sci. 32(2), 198–208 (2006)CrossRef
16.
Zurück zum Zitat Goldfeld, S.M., Quandt, R.E., Trotter, H.F.: Maximization by quadratic hill-climbing. Econ. Soc. 34 (1966) Goldfeld, S.M., Quandt, R.E., Trotter, H.F.: Maximization by quadratic hill-climbing. Econ. Soc. 34 (1966)
17.
Zurück zum Zitat Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)CrossRef Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)CrossRef
18.
Zurück zum Zitat Guo, Y., Joshi, J.B.D.: Topic-based personalized recommendation for collaborative tagging system. In: HT, pp. 61–66 (2010) Guo, Y., Joshi, J.B.D.: Topic-based personalized recommendation for collaborative tagging system. In: HT, pp. 61–66 (2010)
19.
Zurück zum Zitat Gwizdka, J.: Of kings, traffic signs and flowers: exploring navigation of tagged documents. In: HT, pp. 167–172 (2010) Gwizdka, J.: Of kings, traffic signs and flowers: exploring navigation of tagged documents. In: HT, pp. 167–172 (2010)
20.
Zurück zum Zitat Handler, G., Mirchandani, P.: Location on networks: theory and algorithms. MIT Press series in signal processing, optimization, and, control (1979) Handler, G., Mirchandani, P.: Location on networks: theory and algorithms. MIT Press series in signal processing, optimization, and, control (1979)
21.
Zurück zum Zitat Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: CSCW, pp. 241–250 (2000) Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: CSCW, pp. 241–250 (2000)
22.
Zurück zum Zitat Heymann, P., Paepcke, A., Garcia-Molina, H.: Tagging human knowledge. In: WSDM, pp. 51–60 (2010) Heymann, P., Paepcke, A., Garcia-Molina, H.: Tagging human knowledge. In: WSDM, pp. 51–60 (2010)
23.
Zurück zum Zitat Heymann, P., Ramage, D., Garcia-Molina, H.: Social tag prediction. In: SIGIR, pp. 531–538 (2008) Heymann, P., Ramage, D., Garcia-Molina, H.: Social tag prediction. In: SIGIR, pp. 531–538 (2008)
24.
Zurück zum Zitat Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613 (1998) Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613 (1998)
25.
Zurück zum Zitat Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011)CrossRef Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011)CrossRef
27.
Zurück zum Zitat Körner, C., Kern, R., Grahsl, H.-P., Strohmaier, M.: Of categorizers and describers: an evaluation of quantitative measures for tagging motivation. In: HT, pp. 157–166 (2010) Körner, C., Kern, R., Grahsl, H.-P., Strohmaier, M.: Of categorizers and describers: an evaluation of quantitative measures for tagging motivation. In: HT, pp. 157–166 (2010)
28.
Zurück zum Zitat Liang, H., Xu, Y., Li, Y., Nayak, R., Tao, X.: Connecting users and items with weighted tags for personalized item recommendations. In: HT, pp. 51–60 (2010) Liang, H., Xu, Y., Li, Y., Nayak, R., Tao, X.: Connecting users and items with weighted tags for personalized item recommendations. In: HT, pp. 51–60 (2010)
29.
Zurück zum Zitat Liu, K., Fang, B., Zhang, W.: Speak the same language with your friends: augmenting tag recommenders with social relations. In: HT, pp. 45–50 (2010) Liu, K., Fang, B., Zhang, W.: Speak the same language with your friends: augmenting tag recommenders with social relations. In: HT, pp. 45–50 (2010)
30.
Zurück zum Zitat Lu, C., Hu, X., Chen, X., Park, J.-R., He, T., Li, Z.: The topic-perspective model for social tagging systems. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pp. 683–692. ACM, New York (2010) Lu, C., Hu, X., Chen, X., Park, J.-R., He, T., Li, Z.: The topic-perspective model for social tagging systems. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pp. 683–692. ACM, New York (2010)
31.
Zurück zum Zitat Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe lsh: efficient indexing for high-dimensional similarity search. In: VLDB, pp. 950–961 (2007) Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe lsh: efficient indexing for high-dimensional similarity search. In: VLDB, pp. 950–961 (2007)
32.
Zurück zum Zitat Manning, C.D., Raghavan, P., Schtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)CrossRefMATH Manning, C.D., Raghavan, P., Schtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)CrossRefMATH
33.
Zurück zum Zitat Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP (1), 331–340 (2009) Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP (1), 331–340 (2009)
34.
Zurück zum Zitat Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms. Comput. J. 26(4), 354–359 (1983)CrossRefMATH Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms. Comput. J. 26(4), 354–359 (1983)CrossRefMATH
35.
Zurück zum Zitat Ramage, D., Dumais, S., Liebling, D.: Characterizing microblogs with topic models. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, AAAI (2010) Ramage, D., Dumais, S., Liebling, D.: Characterizing microblogs with topic models. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, AAAI (2010)
36.
Zurück zum Zitat Ramage, D., Heymann, P., Manning, C.D., Garcia-Molina, H.: Clustering the tagged web. In: WSDM, pp. 54–63 (2009) Ramage, D., Heymann, P., Manning, C.D., Garcia-Molina, H.: Clustering the tagged web. In: WSDM, pp. 54–63 (2009)
37.
Zurück zum Zitat Ramakrishnan, R., Chen, B.-C.: Exploratory mining in cube space. Data Min. Knowl. Discov. 15(1), 29–54 (2007)CrossRefMathSciNet Ramakrishnan, R., Chen, B.-C.: Exploratory mining in cube space. Data Min. Knowl. Discov. 15(1), 29–54 (2007)CrossRefMathSciNet
38.
Zurück zum Zitat Ravi, S.S., Rosenkrantz, D.J., Tayi, G.K.: Facility dispersion problems: Heuristics and special cases (extended abstract). In: WADS, pp. 355–366 (1991) Ravi, S.S., Rosenkrantz, D.J., Tayi, G.K.: Facility dispersion problems: Heuristics and special cases (extended abstract). In: WADS, pp. 355–366 (1991)
39.
Zurück zum Zitat Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)CrossRef Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)CrossRef
40.
Zurück zum Zitat Sathe, G., Sarawagi, S.: Intelligent rollups in multidimensional olap data. In: VLDB, pp. 531–540 (2001) Sathe, G., Sarawagi, S.: Intelligent rollups in multidimensional olap data. In: VLDB, pp. 531–540 (2001)
41.
Zurück zum Zitat Sen, S., Lam, S.K., Rashid, A.M., Cosley, D., Frankowski, D., Osterhouse, J., Harper, F.M., Riedl, J.: Tagging, communities, vocabulary, evolution. In: CSCW, pp. 181–190 (2006) Sen, S., Lam, S.K., Rashid, A.M., Cosley, D., Frankowski, D., Osterhouse, J., Harper, F.M., Riedl, J.: Tagging, communities, vocabulary, evolution. In: CSCW, pp. 181–190 (2006)
42.
Zurück zum Zitat Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)CrossRef Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)CrossRef
43.
Zurück zum Zitat Vaughan, D.E., Jacobson, S.H., Kaul, H.: Analyzing the performance of simultaneous generalized hill climbing algorithms. Comput. Opt. Appl. 37, 103–119 (2007)CrossRefMATHMathSciNet Vaughan, D.E., Jacobson, S.H., Kaul, H.: Analyzing the performance of simultaneous generalized hill climbing algorithms. Comput. Opt. Appl. 37, 103–119 (2007)CrossRefMATHMathSciNet
44.
Zurück zum Zitat Venetis, P., Koutrika, G., Garcia-Molina, H.: On the selection of tags for tag clouds. In: WSDM, pp. 835–844 (2011) Venetis, P., Koutrika, G., Garcia-Molina, H.: On the selection of tags for tag clouds. In: WSDM, pp. 835–844 (2011)
45.
Zurück zum Zitat Wu, P., Sismanis, Y., Reinwald, B.: Towards keyword-driven analytical processing. In: SIGMOD Conference, pp. 617–628 (2007) Wu, P., Sismanis, Y., Reinwald, B.: Towards keyword-driven analytical processing. In: SIGMOD Conference, pp. 617–628 (2007)
46.
Zurück zum Zitat Yu, C., Lakshmanan, L., Amer-Yahia, S.: It takes variety to make a world: diversification in recommender systems. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09, pp. 368–378. ACM, New York (2009) Yu, C., Lakshmanan, L., Amer-Yahia, S.: It takes variety to make a world: diversification in recommender systems. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09, pp. 368–378. ACM, New York (2009)
47.
Zurück zum Zitat Zhou, D., Bian, J., Zheng, S., Zha, H., Giles, C.L.: Exploring social annotations for information retrieval. In: WWW, pp. 715–724 (2008) Zhou, D., Bian, J., Zheng, S., Zha, H., Giles, C.L.: Exploring social annotations for information retrieval. In: WWW, pp. 715–724 (2008)
Metadaten
Titel
An expressive framework and efficient algorithms for the analysis of collaborative tagging
verfasst von
Mahashweta Das
Saravanan Thirumuruganathan
Sihem Amer-Yahia
Gautam Das
Cong Yu
Publikationsdatum
01.04.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 2/2014
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-013-0341-y

Weitere Artikel der Ausgabe 2/2014

The VLDB Journal 2/2014 Zur Ausgabe