Published in: Knowledge and Information Systems 3/2017

04.10.2016 | Regular Paper

Finding multiple stable clusterings

Authors: Juhua Hu, Qi Qian, Jian Pei, Rong Jin, Shenghuo Zhu

Abstract

Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has enjoyed many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: how to model the quality of clusterings, and how to find multiple stable clusterings. We introduce to multi-clustering the notion of clustering stability based on the Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We mathematically prove that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method, MSC. One advantage of our method over the state-of-the-art multi-clustering methods is that it provides users a feature subspace for understanding each clustering solution. Another advantage is that MSC does not require users to specify the number of clusters or the number of alternative clusterings, which is usually difficult for users without any guidance; instead, our method heuristically estimates the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that clearly demonstrates the effectiveness of our method.
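The stability notion above can be illustrated with a minimal sketch (not the authors' MSC algorithm): given a similarity matrix, compute the normalized graph Laplacian and measure the gap between its k-th and (k+1)-th smallest eigenvalues. The matrix `W` below is an illustrative toy example; a larger gap suggests a more stable k-way clustering.

```python
import numpy as np

def laplacian_eigengap(W, k):
    """Gap between the k-th and (k+1)-th smallest eigenvalues of the
    normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)  # returned in ascending order
    return eigvals[k] - eigvals[k - 1]

# Two well-separated blocks: the large gap at k = 2 signals that a
# 2-way partition of this similarity graph is stable.
W = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
gap = laplacian_eigengap(W, 2)
```

Here the two diagonal blocks are disconnected, so the Laplacian has two zero eigenvalues and the gap at k = 2 is close to 1; adding cross-block similarity would shrink it.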


Metadata
Title
Finding multiple stable clusterings
Authors
Juhua Hu
Qi Qian
Jian Pei
Rong Jin
Shenghuo Zhu
Publication date
04.10.2016
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2017
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-016-0998-9
