Published in: Knowledge and Information Systems 1/2019

20 March 2018 | Regular Paper

Connectedness-based subspace clustering

Authors: Namita Jain, C. A. Murthy


Abstract

An algorithm for density-based subspace clustering is proposed here. Unlike existing density-based subspace clustering algorithms, which find clusters using spatial proximity, the proposed method groups features when they share common high-density regions. It is capable of finding subspace clusters based on both linear and nonlinear relationships between features. Also unlike existing density-based subspace clustering algorithms, it does not require the user to provide parameter values for density estimation: these values are calculated for each pair of features from the distribution of the data in the space corresponding to that pair. This allows the proposed approach to find subspace clusters in which relationships between different features exist at different scales. The performance of the proposed algorithm is compared with that of other subspace clustering methods on artificial and real-life datasets. On 5 artificial datasets, the proposed method finds the embedded subspace clusters with a greater G score. On 4 real-life datasets, it finds subspace clusters corresponding to known classes with greater accuracy.
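
The abstract does not say how the per-pair density parameters are computed. As a rough illustration of the idea only, the sketch below derives one density radius for each pair of features from the data in that pair's 2-D space. The estimator used (median nearest-neighbour distance) and the name `pairwise_scales` are assumptions made for this example, not the authors' method.

```python
import itertools
import numpy as np

def pairwise_scales(X):
    """Return one data-driven density radius per pair of features.

    Assumption: the radius is the median nearest-neighbour distance in the
    2-D space spanned by the pair. The estimator used in the paper is not
    described in the abstract.
    """
    d = X.shape[1]
    scales = {}
    for i, j in itertools.combinations(range(d), 2):
        P = X[:, [i, j]]                        # project onto the feature pair
        diff = P[:, None, :] - P[None, :, :]    # all pairwise differences
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)          # ignore self-distances
        scales[(i, j)] = float(np.median(dist.min(axis=1)))
    return scales

# Hypothetical data: features 0 and 1 are nonlinearly related on a small
# scale, while feature 2 is unrelated noise on a much larger scale.
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 200)
X = np.column_stack([t, t ** 2, 1000 * rng.uniform(0, 1, 200)])
scales = pairwise_scales(X)
print(scales)
```

Because each radius comes from the pair's own data, the tightly related pair (0, 1) gets a much smaller radius than pairs involving the large-scale feature 2, so no single user-supplied value has to fit every pair.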


Metadata
Title: Connectedness-based subspace clustering
Authors: Namita Jain, C. A. Murthy
Publication date: 20 March 2018
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 1/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-018-1181-2
