Published in: Knowledge and Information Systems 3/2016

01.06.2016 | Regular Paper

Co-clustering of multi-view datasets

Authors: Syed Fawad Hussain, Shariq Bashir

Abstract

In many clustering problems, we have access to multiple sources of data representing different aspects of the problem. Each of these data sources separately represents an association between entities. Multi-view clustering integrates clustering information from these heterogeneous sources and has been shown to improve results over single-view clustering. Co-clustering, on the other hand, has been widely used to improve clustering results on a single view by exploiting the duality between objects and their attributes. In this paper, we propose a multi-view clustering setting in the context of a co-clustering framework. Our underlying assumption is that similarity values generated from the individual data sources can be transferred from one view to the other(s), resulting in a better clustering of the data. We provide empirical evidence, on several datasets, that this framework yields better clustering accuracy than that obtained from any of the single views.

Footnotes
1
In practice, a sample is used to estimate the value corresponding to \(\rho \,\%\) rather than sorting the values in the entire matrix and removing the lowest values.
 
2
We ignore here the normalization factor for the sake of clarity.
 
3
Values for \(\rho =1\) and \(\lambda =0\) are omitted since they will result in all 0’s (pruning 100 % of similarity values in R) and all 1’s (raising all values in M to the power 0) in the R and M matrices, respectively.
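
Footnotes 1 and 3 can be illustrated with a minimal sketch. It assumes that R is a dense similarity matrix held as a NumPy array, that \(\rho\) is given as a fraction in [0, 1], and that "raising all values in M to the power \(\lambda\)" means an element-wise power. The function name `prune_lowest`, the `sample_size` and `seed` parameters, and the explicit handling of \(\rho = 0\) and \(\rho = 1\) are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def prune_lowest(R, rho, sample_size=10_000, seed=0):
    """Zero out roughly the lowest `rho` fraction of the entries of R.

    The cut-off is estimated from a random sample of entries instead of
    sorting the whole matrix (the idea described in footnote 1).
    All parameter names here are illustrative assumptions.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if rho == 0.0:
        return R.copy()            # nothing to prune
    if rho == 1.0:
        return np.zeros_like(R)    # degenerate case from footnote 3: all 0's

    rng = np.random.default_rng(seed)
    flat = R.ravel()
    n = min(sample_size, flat.size)
    sample = rng.choice(flat, size=n, replace=False)

    # Estimated value below which about rho of the entries fall.
    threshold = np.quantile(sample, rho)

    pruned = R.copy()
    pruned[pruned < threshold] = 0.0
    return pruned


if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    R = data_rng.random((200, 200))
    M = data_rng.random((5, 5))

    pruned = prune_lowest(R, rho=0.3, sample_size=2_000)
    print("fraction pruned:", np.mean(pruned == 0.0))

    # Degenerate settings mentioned in footnote 3.
    print("rho = 1 gives all zeros:", not prune_lowest(R, 1.0).any())
    print("lambda = 0 gives all ones:", bool(np.all(np.power(M, 0) == 1.0)))
```

Because the threshold comes from a sample, the pruned fraction only approximates \(\rho\); a larger `sample_size` tightens the estimate at the cost of more work.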
 
Metadata
Title
Co-clustering of multi-view datasets
Authors
Syed Fawad Hussain
Shariq Bashir
Publication date
01.06.2016
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0861-4
