Knowledge and Information Systems

01.06.2016 | Regular Paper

Co-clustering of multi-view datasets

Authors: Syed Fawad Hussain, Shariq Bashir

Published in: Knowledge and Information Systems | Issue 3/2016

Abstract

In many clustering problems, we have access to multiple sources of data representing different aspects of the problem. Each of these data sources separately represents an association between entities. Multi-view clustering integrates clustering information from these heterogeneous sources and has been shown to improve results over single-view clustering. Co-clustering, on the other hand, has been widely used to improve clustering results on a single view by exploiting the duality between objects and their attributes. In this paper, we propose a multi-view clustering setting in the context of a co-clustering framework. Our underlying assumption is that similarity values generated from the individual data sources can be transferred from one view to the other(s), resulting in a better clustering of the data. We provide empirical evidence, tested on different datasets, that this framework yields better clustering accuracy than that obtained from any of the single views.

In practice, a sample is used to estimate the value corresponding to \(\rho \,\%\) rather than sorting the values in the entire matrix and removing the lowest values.
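The sampling idea in this note can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `prune_lowest`, the `sample_size` and `seed` parameters, and the choice to treat \(\rho\) as a fraction between 0 and 1 are all hypothetical.

```python
import numpy as np

def prune_lowest(R, rho, sample_size=10_000, seed=0):
    """Zero out roughly the lowest `rho` fraction of the entries of R.

    The cut-off value is estimated from a random sample of entries
    instead of sorting the whole matrix (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    flat = R.ravel()
    n = min(sample_size, flat.size)
    # Draw n entries without replacement and take their rho-quantile.
    sample = rng.choice(flat, size=n, replace=False)
    threshold = np.quantile(sample, rho)
    # Entries below the estimated cut-off are set to zero.
    pruned = np.where(R < threshold, 0.0, R)
    return pruned, threshold

# Usage on a random similarity-like matrix (made-up data).
R = np.random.default_rng(1).random((200, 200))
pruned, threshold = prune_lowest(R, rho=0.3, sample_size=5000)
print(f"estimated cut-off: {threshold:.3f}")
print(f"fraction pruned:   {np.mean(pruned == 0.0):.3f}")
```

Because the cut-off comes from a sample, the fraction actually pruned is only approximately \(\rho\); a larger sample tightens the estimate at a higher cost.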

We ignore here the normalization factor for the sake of clarity.

Values for \(\rho =1\) and \(\lambda =0\) are omitted since they will result in all 0’s (pruning 100 % of similarity values in R) and all 1’s (raising all values in M to the power 0) in the R and M matrices, respectively.
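The two degenerate settings can be checked on a toy example. The sketch below is illustrative only: the matrices are made up, `prune_exact` is a hypothetical helper that prunes by a full sort, and \(\rho\) is assumed to be a fraction between 0 and 1.

```python
import numpy as np

def prune_exact(R, rho):
    """Zero out the lowest `rho` fraction of entries of R via a full sort."""
    k = int(round(rho * R.size))
    out = R.copy()
    if k > 0:
        # Flat indices of the k smallest entries.
        idx = np.argsort(R, axis=None)[:k]
        np.put(out, idx, 0.0)
    return out

# Made-up 2 x 2 matrices standing in for R and M.
R = np.array([[0.1, 0.4], [0.8, 0.3]])
M = np.array([[0.2, 0.9], [0.5, 0.7]])

print(prune_exact(R, 1.0))  # rho = 1: every similarity value is pruned
print(M ** 0)               # lambda = 0: every entry becomes 1
print(prune_exact(R, 0.5))  # intermediate rho keeps the larger half
```

In both extreme cases the matrix no longer carries any information about the data, which is consistent with leaving these settings out.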

Title: Co-clustering of multi-view datasets
Authors: Syed Fawad Hussain, Shariq Bashir
Publication date: 01.06.2016
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 3/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-015-0861-4
