Published in: Journal of Intelligent Information Systems 3/2015

01.06.2015

Semantic grounding of social annotations for enhancing resource classification in folksonomies

Written by: Antonela Tommasel, Daniela Godoy


Abstract

User-generated annotations in tagging or bookmarking sites such as Flickr or Delicious can provide a promising source of information for aiding tasks such as Web resource classification. However, the use of tags brings up some challenges. Since there are no constraints on the terms that can be used for tagging, noise and ambiguity are introduced when users annotate resources. Moreover, traditional bag-of-words representations ignore connections between terms and, thus, are affected by synonymy and hyponymy. Although tag-based representations are a valuable source for classifying resources, the problems associated with the unsupervised nature of tags may hinder classification results. This paper presents an approach for semantically analysing social annotations in order to attain enriched concept-based representations of Web resources. Representations are enriched with concepts extracted from WordNet and Wikipedia to overcome problems caused by natural language and to enhance the quality of the information available for effective resource classification. Several strategies for tag pre-processing, concept disambiguation and incorporation of semantic entities into representations are discussed and evaluated in this paper. Experimental results showed that the strategies proposed to associate tags with conceptual entities improve resource classification results, outperforming traditional approaches based on bag-of-words representations.
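The synonymy problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the `TOY_CONCEPTS` map, the concept labels and the helper names are hypothetical stand-ins for the WordNet and Wikipedia lookups and the disambiguation strategies described in the paper, which are not reproduced here. The sketch only shows why two resources tagged with different synonyms look unrelated in a bag-of-tags representation but similar once tags are mapped to shared concepts.

```python
from collections import Counter
from math import sqrt

# Hypothetical tag-to-concept map (an assumption for illustration only).
# A real system would query WordNet or Wikipedia and disambiguate senses.
TOY_CONCEPTS = {
    "car": "concept:automobile",
    "auto": "concept:automobile",
    "automobile": "concept:automobile",
    "photo": "concept:photograph",
    "picture": "concept:photograph",
}


def bag_of_tags(tags):
    """Plain bag-of-words vector: each distinct tag is its own feature."""
    return Counter(tag.lower() for tag in tags)


def concept_vector(tags):
    """Concept-based vector: tags with a known concept share one feature;
    tags missing from the toy map are kept unchanged."""
    return Counter(TOY_CONCEPTS.get(tag.lower(), tag.lower()) for tag in tags)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two resources annotated with synonymous but different tags.
resource_a = ["car", "photo"]
resource_b = ["auto", "picture"]

tag_sim = cosine(bag_of_tags(resource_a), bag_of_tags(resource_b))
concept_sim = cosine(concept_vector(resource_a), concept_vector(resource_b))
print(f"bag-of-tags similarity: {tag_sim:.2f}")
print(f"concept-based similarity: {concept_sim:.2f}")
```

In this toy case the two resources share no tags, so their bag-of-tags similarity is zero, while their concept vectors coincide. Ambiguous tags, which the paper addresses through concept disambiguation, are deliberately left out of the sketch.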


Metadata
Title
Semantic grounding of social annotations for enhancing resource classification in folksonomies
Written by
Antonela Tommasel
Daniela Godoy
Publication date
01.06.2015
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2015
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-014-0339-y
