Published in: Neural Computing and Applications 4/2010

01.06.2010 | Original Article

Cluster identification and separation in the growing self-organizing map: application in protein sequence classification

Authors: Norashikin Ahmad, Damminda Alahakoon, Rowena Chau


Abstract

The growing self-organizing map (GSOM) was introduced as an improvement on the self-organizing map (SOM) algorithm for clustering and knowledge discovery. Unlike the traditional SOM, the GSOM has a dynamic structure that allows nodes to grow, reflecting the knowledge discovered from the input data as learning progresses. The spread factor (SF) parameter of the GSOM controls the spread of the map, giving an analyst the flexibility to examine clusters at different granularities. Although the GSOM has been applied in various areas and has proven effective in knowledge discovery tasks, no comprehensive study has examined the effect of the spread factor value on cluster formation and separation. The aim of this paper is therefore to investigate the effect of the spread factor value on cluster separation in the GSOM. We used the simple k-means algorithm to identify clusters in the GSOM. Using the Davies–Bouldin index, clusters formed at different spread factor values are obtained and the resulting clusters are analyzed. In this work, we show that clusters can become more separated as the spread factor value increases. Hierarchical clusters can then be constructed by mapping the GSOM clusters obtained at different spread factor values.
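The cluster identification step described in the abstract (k-means over the map, with the Davies–Bouldin index used to compare candidate clusterings) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the trained GSOM is available as a list of node weight vectors, stands in three synthetic 2-D blobs for those vectors, and keeps the lowest index over a few random restarts for each k. All function names (`kmeans`, `davies_bouldin`, `db_by_k`) and the demo data are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower means more compact, better separated).

    Assumes at least two non-empty clusters with distinct centroids.
    """
    ids = sorted(set(labels))
    if len(ids) < 2:
        raise ValueError("need at least two clusters")
    groups = {c: [p for p, lab in zip(points, labels) if lab == c] for c in ids}
    cents = {c: mean(g) for c, g in groups.items()}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {
        c: sum(dist(p, cents[c]) for p in groups[c]) / len(groups[c]) for c in ids
    }
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in ids if j != i)
        for i in ids
    ]
    return sum(worst) / len(ids)


def db_by_k(points, k_values, restarts=10):
    """For each k, keep the lowest Davies-Bouldin index over several seeds."""
    scores = {}
    for k in k_values:
        best = math.inf
        for seed in range(restarts):
            labels = kmeans(points, k, seed=seed)
            if len(set(labels)) > 1:
                best = min(best, davies_bouldin(points, labels))
        scores[k] = best
    return scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for trained GSOM node weight vectors: three 2-D blobs.
    nodes = [
        [cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for _ in range(10)
    ]
    scores = db_by_k(nodes, range(2, 7))
    for k, score in scores.items():
        print(f"k={k}: DB index = {score:.3f}")
    print("lowest DB index at k =", min(scores, key=scores.get))
```

Under these assumptions, repeating the same scan on maps trained at different spread factor values would give one preferred clustering per map, which is the kind of comparison the abstract describes. The restart loop is a simplification added here because a single k-means run can settle in a poor local optimum.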

Metadata
Title
Cluster identification and separation in the growing self-organizing map: application in protein sequence classification
Authors
Norashikin Ahmad
Damminda Alahakoon
Rowena Chau
Publication date
01.06.2010
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 4/2010
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-009-0300-0
