nach oben

Knowledge and Information Systems

Erschienen in:

28.11.2016 | Regular Paper

rFILTA: relevant and nonredundant view discovery from collections of clusterings via filtering and ranking

verfasst von: Yang Lei, Nguyen Xuan Vinh, Jeffrey Chan, James Bailey

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Meta-clustering is a popular approach for finding multiple clusterings in the dataset, taking a large number of base clusterings as input for further user navigation and refinement. However, the effectiveness of meta-clustering is highly dependent on the distribution of the base clusterings and open challenges exist with regard to its stability and noise tolerance. In addition, the clustering views returned may not all be relevant, hence there is open challenge on how to rank those clustering views. In this paper we propose a simple and effective filtering algorithm that can be flexibly used in conjunction with any meta-clustering method. In addition, we propose an unsupervised method to rank the returned clustering views. We evaluate the framework (rFILTA) on both synthetic and real-world datasets, and see how its use can enhance the clustering view discovery for complex scenarios.

Vorheriger Artikel Identification of critical situations via Event Processing and Event Trust Analysis

Nächster Artikel Exploiting interactions of review text, hidden user communities and item groups, and time for collaborative filtering

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

Please refer to Sect. 9.5.1 for more details about the dataset and experiments.

The similarity between two clusterings are measured by adjusted mutual information (AMI), which will be introduced in Sect. 4.1.

We demonstrate this point in the experiments part, refer to Fig. 21.

The normalized version of mutual information for scaling mutual information to [0, 1].

Even though we are not sampling the clustering space uniformly, the size of the meta-cluster can be considered as one reasonable standard.

The images of card are downloaded from https://code.google.com/p/vectorized-playing-cards/.

The code of feature extraction is available at https://github.com/adikhosla/feature-extraction.

www.cs.cmu.edu/webkb.

Azimi J, Fern X (2009) Adaptive cluster ensemble selection. In: IJCAI vol 9, pp 992–997

Bache K, Lichman M (2013) UCI machine learning repository. URL http://archive.ics.uci.edu/ml

Bae E, Bailey J Coala (2006) A novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: Sixth international conference on data mining, 2006 (ICDM’06). IEEE, pp 53–62

Bailey J (2013) Alternative clustering analysis: a review. In: Aggarwal C, Reddy C (eds) Data clustering: algorithms and applications. CRC Press, Boca Raton

Caruana R, Elhaway M, Nguyen N, Smith C (2006) Meta clustering. In: Proceedings of ICDM, pp 107–118

Cui Y, Fern XZ, Dy JG (2007) Multi-view clustering via orthogonalization. In: Proceedings of ICDM, pp 133–142

Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE computer society Conference on computer vision and pattern recognition, 2005 (CVPR’2005) IEEE, vol 1, pp 886–893

Dang XH, Bailey J (2010) A hierarchical information theoretic technique for the discovery of non linear alternative clusterings. In: Proceedings of the of (KDD’10), pp 573–582

Dang XH, Bailey J (2014) Generating multiple alternative clusterings via globally optimal subspaces. Data Min Knowl Discov 28(3):569–592MathSciNetCrossRefMATH

10.

Dang XH, Bailey J (2015) A framework to uncover multiple alternative clusterings. Mach Learn 98(1–2):7–30MathSciNetCrossRefMATH

11.

Davidson I, Qi Z (2008) Finding alternative clusterings using constraints. In: Proceedings of ICDM, pp 773–778

12.

Faivishevsky L, Goldberger J (2010) Nonparametric information theoretic clustering algorithm. In: Proceedings of ICML, pp 351–358

13.

Fern XZ, Lin W (2008) Cluster ensemble selection. Stat Anal Data Min 1(3):128–141MathSciNetCrossRef

14.

Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data (TKDD) 1(1):4CrossRef

15.

Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theor Comput Sci 38:293–306MathSciNetCrossRefMATH

16.

Gullo F, Domeniconi C, Tagarelli A (2015) Metacluster-based projective clustering ensembles. Mach Learn 98(1–2):181–216MathSciNetCrossRefMATH

17.

Hadjitodorov ST, Kuncheva LI, Todorova LP (2006) Moderate diversity for better cluster ensembles. Inf Fusion 7(3):264–275CrossRef

18.

Havens TC, Bezdek JC (2012) An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans Knowl Data Eng 24(5):813–822CrossRef

19.

Havens TC, Bezdek JC, Keller JM, Popescu M (2009) Clustering in ordered dissimilarity data. Int J Int Syst 24(5):504–528CrossRefMATH

20.

Hossain MS, Ramakrishnan N, Davidson I, Watson LT (2013) How to “alternatize” a clustering algorithm. Data Min Knowl Discov 27(2):193–224MathSciNetCrossRefMATH

21.

Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood CliffsMATH

22.

Jain P, Meka R, Dhillon IS (2008) Simultaneous unsupervised learning of disparate clusterings. Stat Anal Data Min: ASA Data Sci J 1(3):195–210MathSciNetCrossRef

23.

Jaskowiak PA, Moulavi D, Furtado AC, Campello RJ, Zimek A, Sander J (2016) On strategies for building effective ensembles of relative clustering validity criteria. Knowl Inf Syst 47(2):329–354CrossRef

24.

Lei Y, Vinh NX, Chan J, Bailey J (2014) Filta Better view discovery from collections of clusterings via filtering. Machine learning and knowledge discovery in databases. Springer, Berlin, pp 145–160

25.

Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, CambridgeCrossRefMATH

26.

Naldi MC, Carvalho A, Campello RJ (2013) Cluster ensemble selection based on relative validity indexes. Data Min Knowl Discov 27(2):259–289MathSciNetCrossRefMATH

27.

Nguyen N, Caruana R (2007) Consensus clusterings. In: Seventh IEEE international conference on data mining (ICDM’2007). IEEE, pp 607–612

28.

Nie F, Xu D, Li X (2012) Initialization independent clustering with actively self-training method. IEEE Trans Syst, Man, Cybern, Part B (Cybern) 42(1):17–27CrossRef

29.

Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 977–986

30.

Nilsback ME, Zisserman A (2006) A visual vocabulary for flower classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 2, pp 1447–1454

31.

Niu D, Dy JG, Jordan MI (2014) Iterative discovery of multiple alternativeclustering views. IEEE Trans Pattern Anal Mach Intell 36(7):1340–1353CrossRef

32.

Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238CrossRef

33.

Phillips JM, Raman P, Venkatasubramanian S (2011) Generating a diverse set of high-quality clusterings. arXiv:1108.0017

34.

Pihur V, Datta S, Datta S (2007) Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics 23(13):1607–1615CrossRef

35.

Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65CrossRefMATH

36.

Sheng W, Swift S, Zhang L, Liu X (2005) A weighted sum validity function for clustering with a hybrid niching genetic algorithm. IEEE Trans Syst, Man, Cybern, Part B (Cybern) 35(6):1156–1167CrossRef

37.

Strehl A, Ghosh J (2003) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MathSciNetMATH

38.

Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881CrossRef

39.

Vinh NX, Epps J (2010) minCEntropy: a novel information theoretic approach for the generation of alternative clusterings. In: Proceedings of the ICDM, pp 521–530

40.

Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of ICML. ACM, pp 1073–1080

41.

Wang L, Nguyen UT, Bezdek JC, Leckie CA, Ramamohanarao K (2010) iVAT and aVAT: enhanced visual analysis for cluster tendency assessment. In: Proceedings of PAKDD, pp 16–27

42.

Wang H, Shan H, Banerjee A (2011) Bayesian cluster ensembles. Stat Anal Data Min 4(1):54–70MathSciNetCrossRef

43.

Zhang Y, Li T (2011) Extending consensus clustering to explore multiple clustering views. In: Proceedings of the SDM, pp 920–931

Titel: rFILTA: relevant and nonredundant view discovery from collections of clusterings via filtering and ranking
verfasst von: Yang Lei
Nguyen Xuan Vinh
Jeffrey Chan
James Bailey
Publikationsdatum: 28.11.2016
Verlag: Springer London
Erschienen in: Knowledge and Information Systems / Ausgabe 1/2017
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-016-1008-y

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 1/2017

Synchronization-based scalable subspace clustering of high-dimensional data

Identification of critical situations via Event Processing and Event Trust Analysis

Wind speed parameters sensitivity analysis based on fractals and neuro-fuzzy selection technique

Random indexing of multidimensional data

A review of task scheduling based on meta-heuristics approach in cloud computing

A user parameter-free approach for mining robust sequential classification rules

Premium Partner