Published in: Soft Computing 8/2010

01-06-2010 | Focus

Fuzzy optimized self-organizing maps and their application to document clustering

Authors: Francisco P. Romero, Arturo Peralta, Andres Soto, Jose A. Olivas, Jesus Serrano-Guerrero

Abstract

This paper presents an approach that combines fuzzy logic techniques with self-organizing maps (SOM) to handle conceptual aspects of document clusters and to reduce training time. A concept frequency formula is introduced to measure the degree to which a concept is present in a document. The formula builds on new fuzzy formulas for the polysemy degree of a term and the synonymy degree between terms. The approach also adds several fuzzy improvements: automatic choice of the map topology, heuristic map initialization, a fuzzy similarity measure, and a keyword extraction process. Experiments on the Reuters collection compare the proposed system with classic SOM approaches, with performance measured in terms of F-measure and training time. The results show that the proposed approach produces good results with less training time than classic SOM techniques.
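
The abstract reports clustering quality as an F-measure but does not give the formula. As an illustration only, the sketch below computes the class-weighted best-match F-measure that is commonly used to score a flat clustering against reference categories. It is an assumption that the paper uses this variant. The function name `clustering_f_measure` and the toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure of a flat clustering.

    For each reference class, take the best F1 score over all clusters,
    then average those scores weighted by class size.
    Assumption: this is one common definition, not necessarily the paper's.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("inputs must not be empty")

    class_sizes = Counter(true_labels)               # documents per class
    cluster_sizes = Counter(cluster_ids)             # documents per cluster
    overlap = Counter(zip(true_labels, cluster_ids)) # documents per (class, cluster)

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            shared = overlap.get((cls, clu), 0)
            if shared == 0:
                continue
            precision = shared / clu_size
            recall = shared / cls_size
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
        total += (cls_size / n) * best
    return total


if __name__ == "__main__":
    # Hypothetical toy data: four documents, two reference categories.
    labels = ["a", "a", "b", "b"]
    print(clustering_f_measure(labels, [0, 0, 1, 1]))  # clusters match the classes
    print(clustering_f_measure(labels, [0, 0, 0, 0]))  # everything in one cluster
```

With this definition, a clustering that reproduces the reference categories exactly scores 1.0, and merging all documents into one cluster is penalized through lower precision.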

Metadata
Title
Fuzzy optimized self-organizing maps and their application to document clustering
Authors
Francisco P. Romero
Arturo Peralta
Andres Soto
Jose A. Olivas
Jesus Serrano-Guerrero
Publication date
01-06-2010
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 8/2010
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-009-0468-3