Skip to main content
Erschienen in: Journal of Classification 2/2020

24.04.2019

Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering

verfasst von: Mohammad Rezaei

Erschienen in: Journal of Classification | Ausgabe 2/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Fast centroid-based clustering algorithms such as k-means usually converge to a local optimum. In this work, we propose a method for constructing a better clustering from two such suboptimal clustering solutions based on the fact that each suboptimal clustering has benefits regarding to including some of the correct clusters. We develop the new method COTCLUS to find two centroids from one clustering and replace them by two centroids from the other clustering so that the maximum decrease in the mean square error of the first clustering is achieved. After modifying centroids, k-means algorithm with few iterations is applied for fine-tuning. In an iterative algorithm, the resulting clustering is further improved using a new given clustering. The proposed method can find optimal clustering in a very small number of iterations for datasets with well-separated clusters. We demonstrate by experiments that the proposed method outperforms the selected competitive methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. Paper Proceedings of 18th annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035. Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. Paper Proceedings of 18th annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035.
Zurück zum Zitat Brusco, M. J., & Steinley, D. (2007). A comparison of heuristic procedures for minimum within-cluster sums of squares partitioning. Psychometrika, 72(4), 583–600.MathSciNetCrossRef Brusco, M. J., & Steinley, D. (2007). A comparison of heuristic procedures for minimum within-cluster sums of squares partitioning. Psychometrika, 72(4), 583–600.MathSciNetCrossRef
Zurück zum Zitat Celebi, M. E., Kingravi, H. A., & Vela, P. A. (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications, 40(1), 200–210.CrossRef Celebi, M. E., Kingravi, H. A., & Vela, P. A. (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications, 40(1), 200–210.CrossRef
Zurück zum Zitat de Amorim, R. C., Makarenkov, V., & Mirkin, B. (2016). A-Ward pβ: effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation. Information Sciences, 370, 343–354.CrossRef de Amorim, R. C., Makarenkov, V., & Mirkin, B. (2016). A-Ward pβ: effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation. Information Sciences, 370, 343–354.CrossRef
Zurück zum Zitat Fränti, P., & Kivijärvi, J. (2000). Randomised local search algorithm for the clustering problem. Pattern Analysis & Applications, 3(4), 358–369.MathSciNetCrossRef Fränti, P., & Kivijärvi, J. (2000). Randomised local search algorithm for the clustering problem. Pattern Analysis & Applications, 3(4), 358–369.MathSciNetCrossRef
Zurück zum Zitat Fränti, P., & Virmajoki, O. (2006). Iterative shrinking method for clustering problems. Pattern Recognition, 39(5), 761–775.CrossRef Fränti, P., & Virmajoki, O. (2006). Iterative shrinking method for clustering problems. Pattern Recognition, 39(5), 761–775.CrossRef
Zurück zum Zitat Franti, P., Virmajoki, O., & Hautamaki, V. (2006). Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1875–1881.CrossRef Franti, P., Virmajoki, O., & Hautamaki, V. (2006). Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1875–1881.CrossRef
Zurück zum Zitat Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835–850.CrossRef Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835–850.CrossRef
Zurück zum Zitat Hansen, P., & Mladenović, N. (2001). J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recognition, 34(2), 405–413.CrossRef Hansen, P., & Mladenović, N. (2001). J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recognition, 34(2), 405–413.CrossRef
Zurück zum Zitat Hruschka, E. R., Campello, R. J., & Freitas, A. A. (2009). A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39(2), 133–155.CrossRef Hruschka, E. R., Campello, R. J., & Freitas, A. A. (2009). A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39(2), 133–155.CrossRef
Zurück zum Zitat Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRef Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRef
Zurück zum Zitat Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM computing surveys (CSUR), 31(3), 264–323.CrossRef Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM computing surveys (CSUR), 31(3), 264–323.CrossRef
Zurück zum Zitat Klein, R. W., & Dubes, R. C. (1989). Experiments in projection and clustering by simulated annealing. Pattern Recognition, 22(2), 213–220.CrossRef Klein, R. W., & Dubes, R. C. (1989). Experiments in projection and clustering by simulated annealing. Pattern Recognition, 22(2), 213–220.CrossRef
Zurück zum Zitat Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451–461.CrossRef Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451–461.CrossRef
Zurück zum Zitat Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33(9), 1455–1465.CrossRef Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33(9), 1455–1465.CrossRef
Zurück zum Zitat Rezaei, M., & Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8), 2173–2186.CrossRef Rezaei, M., & Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8), 2173–2186.CrossRef
Zurück zum Zitat Selim, S. Z., & Alsultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003–1008.MathSciNetCrossRef Selim, S. Z., & Alsultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003–1008.MathSciNetCrossRef
Zurück zum Zitat Steinley, D. (2003). Local optima in k-means clustering: what you don't know may hurt you. Psychological Methods, 8(3), 294–304.CrossRef Steinley, D. (2003). Local optima in k-means clustering: what you don't know may hurt you. Psychological Methods, 8(3), 294–304.CrossRef
Zurück zum Zitat Steinley, D., & Brusco, M. J. (2007). Initializing k-means batch clustering: a critical evaluation of several techniques. Journal of Classification, 24(1), 99–121.MathSciNetCrossRef Steinley, D., & Brusco, M. J. (2007). Initializing k-means batch clustering: a critical evaluation of several techniques. Journal of Classification, 24(1), 99–121.MathSciNetCrossRef
Zurück zum Zitat Tzortzis, G., & Likas, A. (2014). The MinMax k-means clustering algorithm. Pattern Recognition, 47(7), 2505–2516.CrossRef Tzortzis, G., & Likas, A. (2014). The MinMax k-means clustering algorithm. Pattern Recognition, 47(7), 2505–2516.CrossRef
Zurück zum Zitat Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.MathSciNetCrossRef Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.MathSciNetCrossRef
Zurück zum Zitat Zhang, T., Ramakrishnan, R., & Livny, M. (1997). BIRCH: a new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2), 141–182.CrossRef Zhang, T., Ramakrishnan, R., & Livny, M. (1997). BIRCH: a new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2), 141–182.CrossRef
Metadaten
Titel
Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering
verfasst von
Mohammad Rezaei
Publikationsdatum
24.04.2019
Verlag
Springer US
Erschienen in
Journal of Classification / Ausgabe 2/2020
Print ISSN: 0176-4268
Elektronische ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-018-9296-4

Weitere Artikel der Ausgabe 2/2020

Journal of Classification 2/2020 Zur Ausgabe

Premium Partner