Skip to main content
Erschienen in: Knowledge and Information Systems 1/2017

02.12.2016 | Regular Paper

Synchronization-based scalable subspace clustering of high-dimensional data

verfasst von: Junming Shao, Xinzuo Wang, Qinli Yang, Claudia Plant, Christian Böhm

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

How to address the challenges of the “curse of dimensionality” and “scalability” in clustering simultaneously? In this paper, we propose arbitrarily oriented synchronized clusters (ORSC), a novel effective and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Control of complex processes is achieved by simple operations based on interactions between objects. Relying on the weighted interaction model and iterative dynamic clustering, our approach ORSC (a) naturally detects correlation clusters in arbitrarily oriented subspaces, including arbitrarily shaped nonlinear correlation clusters. Our approach is (b) robust against noise and outliers. In contrast to previous methods, ORSC is (c) easy to parameterize, since there is no need to specify the subspace dimensionality or other difficult parameters. Instead, all interesting subspaces are detected in a fully automatic way. Finally, (d) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets. Extensive experiments have demonstrated the effectiveness and efficiency of our approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
2.
Zurück zum Zitat Aggarwal CC, Wolf JL, Yu PS et al (1999) Fast algorithms for projected clustering. ACM SIGMOD international conference on management of data, pp 61–72 Aggarwal CC, Wolf JL, Yu PS et al (1999) Fast algorithms for projected clustering. ACM SIGMOD international conference on management of data, pp 61–72
3.
Zurück zum Zitat Aggarwal CC, Yu P S (2000) Finding generalized projected clusters in high dimensional spaces. ACM SIGMOD international conference on management of data, pp 70–81 Aggarwal CC, Yu P S (2000) Finding generalized projected clusters in high dimensional spaces. ACM SIGMOD international conference on management of data, pp 70–81
4.
Zurück zum Zitat Agrawal R, Gehrke JE, Gunopulos D et al (1998) Automatic subspace clustering of high dimensional data for data mining applications. ACM SIGMOD international conference on management of data, pp 94–105 Agrawal R, Gehrke JE, Gunopulos D et al (1998) Automatic subspace clustering of high dimensional data for data mining applications. ACM SIGMOD international conference on management of data, pp 94–105
5.
Zurück zum Zitat Ankerst M, Breunig MM, Kriegel HP et al (1999) Optics: ordering points to identify the clustering structure. ACM SIGMOD international conference on management of data, pp 49–60 Ankerst M, Breunig MM, Kriegel HP et al (1999) Optics: ordering points to identify the clustering structure. ACM SIGMOD international conference on management of data, pp 49–60
6.
Zurück zum Zitat Arenas A, Diaz-Guilera A, Perez-Vicente CJ (2006) Synchronization reveals topological scales in complex networks. Phys Rev Lett 96(11):1–4CrossRefMATH Arenas A, Diaz-Guilera A, Perez-Vicente CJ (2006) Synchronization reveals topological scales in complex networks. Phys Rev Lett 96(11):1–4CrossRefMATH
7.
8.
Zurück zum Zitat Bahrololoum A, Nezamabadi-pour H, Saryazdi S (2015) A data clustering approach based on universal gravity rule. Eng Appl Artif Intell 45:415–428CrossRef Bahrololoum A, Nezamabadi-pour H, Saryazdi S (2015) A data clustering approach based on universal gravity rule. Eng Appl Artif Intell 45:415–428CrossRef
9.
Zurück zum Zitat Böhm C, Kailing K, Kröger P et al (2004) Computing clusters of correlation connected objects. ACM SIGMOD international conference on management of data, pp 455–466 Böhm C, Kailing K, Kröger P et al (2004) Computing clusters of correlation connected objects. ACM SIGMOD international conference on management of data, pp 455–466
10.
Zurück zum Zitat Böhm C, Plant C, Shao J et al (2010) Clustering by synchronization. ACM SIGKDD international conference on knowledge discovery and data mining, pp 583–592 Böhm C, Plant C, Shao J et al (2010) Clustering by synchronization. ACM SIGKDD international conference on knowledge discovery and data mining, pp 583–592
11.
Zurück zum Zitat Cheng CH, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. ACM SIGKDD international conference on knowledge discovery and data mining, pp 84–93 Cheng CH, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. ACM SIGKDD international conference on knowledge discovery and data mining, pp 84–93
12.
Zurück zum Zitat Elhamifar E, Vidal R (2013) Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell 35(11):2765–2781CrossRef Elhamifar E, Vidal R (2013) Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell 35(11):2765–2781CrossRef
14.
Zurück zum Zitat Givoni I, Chung C, Frey B (2011) Hierarchical affinity propagation. 27th conference on uncertainty in artificial intelligence, Barcelona, Spain Givoni I, Chung C, Frey B (2011) Hierarchical affinity propagation. 27th conference on uncertainty in artificial intelligence, Barcelona, Spain
15.
Zurück zum Zitat Goil S, Nagesh H, Choudhary A (1999) MAFIA: efficient and scalable subspace clustering for very large data sets. ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452 Goil S, Nagesh H, Choudhary A (1999) MAFIA: efficient and scalable subspace clustering for very large data sets. ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452
16.
Zurück zum Zitat Günnemann S, Faloutsos C (2013) Mixed membership subspace clustering. IEEE international conference on data mining, pp 221–230 Günnemann S, Faloutsos C (2013) Mixed membership subspace clustering. IEEE international conference on data mining, pp 221–230
17.
Zurück zum Zitat Hinneburg A, Keim DA (1999) Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. International conference on very large data bases, pp 506–517 Hinneburg A, Keim DA (1999) Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. International conference on very large data bases, pp 506–517
18.
Zurück zum Zitat Huang J, Sun H, Kang J et al (2013) ESC: an efficient synchronization-based clustering algorithm. Knowl Based Syst 40:111–122CrossRef Huang J, Sun H, Kang J et al (2013) ESC: an efficient synchronization-based clustering algorithm. Knowl Based Syst 40:111–122CrossRef
19.
Zurück zum Zitat Indulska M, Orlowska M (2002) Gravity based spatial clustering. ACM international symposium on advances in geographic information systems, pp 125–130 Indulska M, Orlowska M (2002) Gravity based spatial clustering. ACM international symposium on advances in geographic information systems, pp 125–130
20.
Zurück zum Zitat Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Upper Saddle RiverMATH Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Upper Saddle RiverMATH
21.
Zurück zum Zitat Kailing K, Kriegel HP, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. SIAM international conference on data mining, p 4 Kailing K, Kriegel HP, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. SIAM international conference on data mining, p 4
22.
Zurück zum Zitat Kim CS, Bae CS, Tcha HJ (2008) A phase synchronization clustering algorithm for identifying interesting groups of genes from cell cycle expression data. BMC Bioinform 9:1CrossRef Kim CS, Bae CS, Tcha HJ (2008) A phase synchronization clustering algorithm for identifying interesting groups of genes from cell cycle expression data. BMC Bioinform 9:1CrossRef
23.
Zurück zum Zitat Kuramoto Y(1975) Self-entrainment of a population of coupled nonlinear oscillators. In: Araki H (ed) Proceedings of the international symposium on mathematical problems in theoretical physics. Lecture notes in physics. Springer, New York, pp 420–422 Kuramoto Y(1975) Self-entrainment of a population of coupled nonlinear oscillators. In: Araki H (ed) Proceedings of the international symposium on mathematical problems in theoretical physics. Lecture notes in physics. Springer, New York, pp 420–422
24.
25.
Zurück zum Zitat Liu J, Wang W (2003) Op-cluster: clustering by tendency in high dimensional space. IEEE international conference on data mining, pp 187–194 Liu J, Wang W (2003) Op-cluster: clustering by tendency in high dimensional space. IEEE international conference on data mining, pp 187–194
26.
Zurück zum Zitat Oyang Y, Chen C, Yang T (2001) A study on the hierarchical data clustering algorithm based on gravity theory. Principles of data mining and knowledge discovery, pp 350–361 Oyang Y, Chen C, Yang T (2001) A study on the hierarchical data clustering algorithm based on gravity theory. Principles of data mining and knowledge discovery, pp 350–361
27.
Zurück zum Zitat Procopiuc CM, Jones M, Agarwal PK et al (2002) A Monte Carlo algorithm for fast projective clustering. ACM SIGMOD international conference on management of data, pp 418–427 Procopiuc CM, Jones M, Agarwal PK et al (2002) A Monte Carlo algorithm for fast projective clustering. ACM SIGMOD international conference on management of data, pp 418–427
28.
Zurück zum Zitat Shao J (2012) Synchronization on data mining: a universal concept for knowledge discovery. LAP LAMBERT Academic Publishing, Saarbrücken Shao J (2012) Synchronization on data mining: a universal concept for knowledge discovery. LAP LAMBERT Academic Publishing, Saarbrücken
29.
Zurück zum Zitat Shao J, He X, Böhm C et al (2013) Synchronization-inspired partitioning and hierarchical clustering. IEEE Trans Knowl Discov Data Eng 25(4):893–905CrossRef Shao J, He X, Böhm C et al (2013) Synchronization-inspired partitioning and hierarchical clustering. IEEE Trans Knowl Discov Data Eng 25(4):893–905CrossRef
30.
Zurück zum Zitat Shao J, Yang Q, Dang H et al (2016) Scalable clustering by iterative partitioning and point attractor representation. ACM Trans Knowl Discov Data 11(1):5CrossRef Shao J, Yang Q, Dang H et al (2016) Scalable clustering by iterative partitioning and point attractor representation. ACM Trans Knowl Discov Data 11(1):5CrossRef
31.
Zurück zum Zitat Shao J, Ahmadi Z, Kramer S (2014) Prototype-based Learning on concept-drifting data streams. ACM SIGKDD international conference on knowledge discovery and data mining, pp 512–521 Shao J, Ahmadi Z, Kramer S (2014) Prototype-based Learning on concept-drifting data streams. ACM SIGKDD international conference on knowledge discovery and data mining, pp 512–521
32.
Zurück zum Zitat Shao J, Böhm C, Yang Q et al (2010) Synchronization based outlier detection. ECML/PKDD 2010, pp 245–260 Shao J, Böhm C, Yang Q et al (2010) Synchronization based outlier detection. ECML/PKDD 2010, pp 245–260
33.
Zurück zum Zitat Shao J, He X, Yang Q et al (2013) Robust synchronization-based graph clustering. Pacific-Asia conference on knowledge discovery and data mining, pp 249–260 Shao J, He X, Yang Q et al (2013) Robust synchronization-based graph clustering. Pacific-Asia conference on knowledge discovery and data mining, pp 249–260
34.
Zurück zum Zitat Tung AKH, Xu X, Ooi BC (2005) Curler: finding and visualizing nonlinear correlated clusters. ACM SIGMOD international conference on management of data, pp 467–478 Tung AKH, Xu X, Ooi BC (2005) Curler: finding and visualizing nonlinear correlated clusters. ACM SIGMOD international conference on management of data, pp 467–478
35.
Zurück zum Zitat Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary?. In: The 26th international conference on machine learning, pp 1073–1080 Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary?. In: The 26th international conference on machine learning, pp 1073–1080
36.
Zurück zum Zitat Wang H, Wang W, Yang J et al (2002) Clustering by pattern similarity in large data sets. ACM SIGMOD international conference on management of data, pp 394–405 Wang H, Wang W, Yang J et al (2002) Clustering by pattern similarity in large data sets. ACM SIGMOD international conference on management of data, pp 394–405
37.
Zurück zum Zitat Ying W, Chung F, Wang S (2014) Scaling up synchronization-inspired partitioning clustering. IEEE Trans Knowl Data Eng 26(8):2045–2057CrossRef Ying W, Chung F, Wang S (2014) Scaling up synchronization-inspired partitioning clustering. IEEE Trans Knowl Data Eng 26(8):2045–2057CrossRef
38.
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1996) An efficient data clustering method for very large databases. ACM SIGMOD international conference on management of data, pp 103–114 Zhang T, Ramakrishnan R, Livny M (1996) An efficient data clustering method for very large databases. ACM SIGMOD international conference on management of data, pp 103–114
Metadaten
Titel
Synchronization-based scalable subspace clustering of high-dimensional data
verfasst von
Junming Shao
Xinzuo Wang
Qinli Yang
Claudia Plant
Christian Böhm
Publikationsdatum
02.12.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2017
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-016-1013-1

Weitere Artikel der Ausgabe 1/2017

Knowledge and Information Systems 1/2017 Zur Ausgabe

Premium Partner