
2017 | Original Paper | Book Chapter

K-normal: An Improved K-means for Dealing with Clusters of Different Sizes

Authors: Yonggang Lu, Jiangang Qiao, Xiaochun Wang

Published in: Intelligent Computing Methodologies

Publisher: Springer International Publishing


Abstract

K-means is the most well-known and widely used classical clustering method, benefiting from its efficiency and ease of implementation. However, k-means has three main drawbacks: its final results depend heavily on the selection of the initial cluster centers, the number of clusters must be predefined, and it can only find clusters of similar sizes. Much work has been done on improving the selection of the initial cluster centers and on determining the number of clusters, but very little on enabling k-means to deal with clusters of different sizes. In this paper, we propose a new clustering method, called k-normal, which learns cluster sizes alongside cluster centers in the same iterative process. The proposed k-normal method can identify clusters of different sizes while retaining the efficiency of k-means. Although the Expectation Maximization (EM) method based on Gaussian mixture models can also identify clusters of different sizes, it has a much higher computational complexity than both k-normal and k-means. Experiments on a synthetic dataset and seven real datasets show that k-normal outperforms k-means on all the datasets. Compared with the EM method, k-normal still produces better results on six of the eight datasets while being considerably more efficient.
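The abstract only states the main idea: learn a per-cluster size alongside each cluster center within a k-means-style loop. A minimal sketch of that idea, assuming hard assignments scored under an isotropic normal per cluster (the paper's exact update rules are not given here, so the variance estimate and the score function below are illustrative assumptions):

```python
import numpy as np

def k_normal(X, k, n_iter=50, seed=0):
    """Sketch of a k-means variant that also learns a per-cluster size
    (isotropic variance sigma^2), so points are assigned by a normalized
    distance rather than raw Euclidean distance. Illustrative only."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    sigma2 = np.full(k, X.var())              # initial cluster sizes
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: negative log-density of an isotropic normal,
        # up to a constant, so larger clusters attract farther points.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        score = d2 / sigma2 + d * np.log(sigma2)
        new_labels = score.argmin(1)
        if (new_labels == labels).all():
            break
        labels = new_labels
        # Update step: re-estimate each center and its size from members.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
                sigma2[j] = max(((pts - centers[j]) ** 2).sum(-1).mean() / d,
                                1e-12)
    return labels, centers, sigma2
```

With equal and fixed sigma2 the assignment step reduces to standard k-means; letting sigma2 vary per cluster is what lets a large, spread-out cluster keep its boundary points instead of losing them to a small, tight neighbor.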


Metadata
Title
K-normal: An Improved K-means for Dealing with Clusters of Different Sizes
Authors
Yonggang Lu
Jiangang Qiao
Xiaochun Wang
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-63315-2_29