Published in: Neural Computing and Applications 7-8/2013

01-06-2013 | Original Article

Bi-clustering continuous data with self-organizing map

Authors: Khalid Benabdeslem, Kais Allab

Abstract

In this paper, we present a new SOM-based bi-clustering approach for continuous data, called Bi-SOM (Bi-clustering based on Self-Organizing Map). The goal of bi-clustering is to group the rows and columns of a given data matrix simultaneously. In addition, we address three issues related to this task: (1) the topological visualization of bi-clusters with respect to their neighborhood relation, (2) the optimization of these bi-clusters in macro-blocks, and (3) dimensionality reduction through the iterative elimination of noise blocks. Finally, we report experiments on several data sets to validate our approach in comparison with other bi-clustering methods.
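
To make the idea of grouping rows and columns of a continuous matrix with a self-organizing map more concrete, here is a minimal sketch. It is not the Bi-SOM algorithm itself, whose details the abstract does not give. It is a simpler stand-in, assumed for illustration only: one tiny 1-D SOM is trained on the rows and another on the columns, and the two label vectors together cut the matrix into blocks. The function names (`train_som`, `best_unit`, `two_way_som`, `reorder`), the evenly spaced initialisation, the decay schedule and the toy matrix are all hypothetical choices, not taken from the paper.

```python
import math

def best_unit(protos, v):
    """Index of the prototype closest to v (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(p, v)) for p in protos]
    return dists.index(min(dists))

def train_som(vectors, n_units=2, epochs=30):
    """Train a tiny 1-D SOM (illustrative only) and return its prototypes."""
    n = len(vectors)
    # Assumption: deterministic initialisation from evenly spaced samples.
    protos = [list(vectors[round(u * (n - 1) / max(1, n_units - 1))])
              for u in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs              # shrinks from 1 towards 0
        lr = 0.5 * decay                          # learning rate
        sigma = max(1e-3, n_units / 2 * decay)    # neighbourhood width on the map
        for v in vectors:
            bmu = best_unit(protos, v)            # best matching unit
            for u, p in enumerate(protos):
                # Gaussian neighbourhood: units near the BMU move more.
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(p)):
                    p[d] += lr * h * (v[d] - p[d])
    return protos

def two_way_som(matrix, n_row_units=2, n_col_units=2):
    """Cluster rows and columns separately; return (row_labels, col_labels)."""
    cols = [list(c) for c in zip(*matrix)]        # transpose to get column vectors
    row_protos = train_som(matrix, n_row_units)
    col_protos = train_som(cols, n_col_units)
    row_labels = [best_unit(row_protos, r) for r in matrix]
    col_labels = [best_unit(col_protos, c) for c in cols]
    return row_labels, col_labels

def reorder(matrix, row_labels, col_labels):
    """Permute rows and columns so that members of a group sit together."""
    row_order = sorted(range(len(matrix)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(matrix[0])), key=lambda j: col_labels[j])
    return [[matrix[i][j] for j in col_order] for i in row_order]

# Invented toy data: two row groups and two column groups, interleaved.
matrix = [
    [9.1, 1.0, 8.8, 9.0, 1.2, 0.9],
    [1.1, 9.2, 0.8, 1.0, 8.9, 9.0],
    [0.9, 8.8, 1.2, 1.1, 9.1, 9.3],
    [8.9, 1.1, 9.2, 9.1, 0.8, 1.0],
    [9.0, 0.9, 9.0, 8.7, 1.0, 1.1],
    [1.0, 9.0, 1.0, 0.9, 9.0, 8.8],
]

row_labels, col_labels = two_way_som(matrix)
blocks = reorder(matrix, row_labels, col_labels)
for row in blocks:
    print(row)
```

On this toy matrix the reordering should gather the high and low values into contiguous blocks, which is the kind of structure a bi-clustering method looks for. Clustering the two dimensions independently, as done here, ignores the coupling between row and column groups that a true simultaneous method exploits.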

Metadata
Title: Bi-clustering continuous data with self-organizing map
Authors: Khalid Benabdeslem, Kais Allab
Publication date: 01-06-2013
Publisher: Springer-Verlag
Published in: Neural Computing and Applications, Issue 7-8/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-012-1047-6
