
2013 | OriginalPaper | Chapter

A Fully Semantic Approach to Large Scale Text Categorization

Authors: Nicoletta Dessì, Stefania Dessì, Barbara Pes

Published in: Information Sciences and Systems 2013

Publisher: Springer International Publishing


Abstract

Text categorization is usually performed by supervised algorithms trained on large amounts of hand-labelled documents, which are labor-intensive to produce and often not available. To avoid this drawback, this paper proposes a text categorization approach designed to fully exploit semantic resources. It employs ontological knowledge not only as lexical support for disambiguating terms and deriving their sense inventory, but also to classify documents into topic categories. Specifically, our work applies two corpus-based thesauri (i.e. WordNet and WordNet Domains) to select the correct sense of each word in a document, and uses the domain names for classification purposes. The experiments presented show that our approach performs well in classifying a large corpus of documents. A key part of the paper is the discussion of important aspects related to the use of surrounding words and of different methods for word sense disambiguation.
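The general idea described in the abstract (disambiguate each word using its surrounding words, then let the domain labels of the chosen senses decide the document category) can be illustrated with a minimal sketch. This is not the authors' implementation: the sense inventory `TOY_SENSES`, the gloss-overlap scoring, the majority vote, and the function names `pick_sense` and `classify` are all assumptions made for illustration. A real system would read senses from WordNet and domain labels from WordNet Domains, and might use a different disambiguation method.

```python
from collections import Counter

# Hypothetical toy sense inventory: word -> list of (sense_id, domain, gloss words).
# It stands in for WordNet senses labelled with WordNet Domains names.
TOY_SENSES = {
    "bank": [
        ("bank#1", "economy", {"money", "deposit", "loan", "account"}),
        ("bank#2", "geography", {"river", "slope", "water", "land"}),
    ],
    "loan": [("loan#1", "economy", {"money", "bank", "interest", "debt"})],
    "interest": [
        ("interest#1", "economy", {"money", "loan", "rate", "percentage"}),
        ("interest#2", "psychology", {"curiosity", "attention", "concern"}),
    ],
}


def pick_sense(word, context):
    """Return the sense whose gloss shares the most words with the context.

    Assumption: a simple gloss-overlap score; ties go to the first listed sense.
    Returns None for words that are not in the toy inventory.
    """
    senses = TOY_SENSES.get(word)
    if not senses:
        return None
    return max(senses, key=lambda sense: len(sense[2] & context))


def classify(tokens, window=3):
    """Assign a document to the domain that most disambiguated words vote for."""
    votes = Counter()
    for i, word in enumerate(tokens):
        # Surrounding words: up to `window` tokens on each side of the target.
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        sense = pick_sense(word, context)
        if sense is not None:
            votes[sense[1]] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(classify("the bank approved the loan at a low interest rate".split()))
    print(classify("the river bank was covered with water".split()))
```

The `window` parameter controls how many surrounding words feed the disambiguation step, which is the kind of design choice the abstract says the paper discusses. No training documents are needed in this sketch, since the category comes directly from the domain labels of the selected senses.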


Metadata
Title
A Fully Semantic Approach to Large Scale Text Categorization
Authors
Nicoletta Dessì
Stefania Dessì
Barbara Pes
Copyright Year
2013
DOI
https://doi.org/10.1007/978-3-319-01604-7_15
