Published in: Neural Computing and Applications, Issue 3/2010

1 April 2010 | Original Article

Visualization and clustering of categorical data with probabilistic self-organizing map

Authors: Mustapha Lebbah, Khalid Benabdeslem


Abstract

This paper introduces a self-organizing map dedicated to the clustering, analysis, and visualization of categorical data. Topological maps usually handle categorical data through an encoding stage: the categorical data are converted into numerical vectors and a traditional numerical algorithm (SOM) is run. We propose instead a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so no coding step is needed to encode the variables. We evaluate the effectiveness of our model on four examples using real data. Our experiments show that the model yields good-quality results on categorical data.
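
To make the idea of neurons represented by probability tables concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes, purely for illustration, that each neuron holds one probability table per categorical variable and that the variables are independent given the neuron. The names `neurons`, `log_likelihood`, and `best_neuron`, and the toy tables themselves, are hypothetical. Training and the map topology are left out entirely.

```python
import math

# Hypothetical toy setup: two categorical variables, used as-is (no numeric encoding).
# Each neuron stores one probability table (a dict) per variable.
neurons = [
    {"color": {"red": 0.8, "blue": 0.2}, "shape": {"round": 0.9, "square": 0.1}},
    {"color": {"red": 0.1, "blue": 0.9}, "shape": {"round": 0.3, "square": 0.7}},
]

def log_likelihood(tables, observation):
    """Log-probability of an observation under one neuron's tables.

    Assumption (not from the abstract): variables are independent
    given the neuron, so the log-probabilities simply add up.
    """
    return sum(math.log(tables[var][value]) for var, value in observation.items())

def best_neuron(neurons, observation):
    """Index of the neuron whose tables give the observation the highest likelihood."""
    scores = [log_likelihood(tables, observation) for tables in neurons]
    return max(range(len(neurons)), key=scores.__getitem__)

obs = {"color": "blue", "shape": "square"}
winner = best_neuron(neurons, obs)
```

The observation is matched to a neuron by looking up category labels directly in the tables, which is the sense in which no numerical coding of the variables is required in this sketch.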

Metadata
Title
Visualization and clustering of categorical data with probabilistic self-organizing map
Authors
Mustapha Lebbah
Khalid Benabdeslem
Publication date
1 April 2010
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 3/2010
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-009-0299-2