nach oben

Erschienen in:

2017 | OriginalPaper | Buchkapitel

Accelerating K-Means by Grouping Points Automatically

verfasst von : Qiao Yu, Bi-Ru Dai

Erschienen in: Big Data Analytics and Knowledge Discovery

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applicable in various domains such as computer vision, market segmentation, social network analysis, etc. However, k-means wastes a large amount of time on the unnecessary distance calculations. Thus accelerating k-means has become a worthy and important topic. Accelerated k-means algorithms can achieve the same result as k-means, but only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than the state-of-the-art accelerated k-means algorithms. The additional memory consumption of our algorithm is also much less than other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means by grouping number of points automatically during the iterations. It can balance these expenses well between distance calculations and the filtering time cost. We conduct extensive experiments on the real world datasets. In the experiments, real world datasets verify that Fission-Fusion k-means can considerably outperform the state-of-the-art accelerated k-means algorithms especially when the datasets are low-dimensional and the number of clusters is quite large. In addition, for more separated and naturally-clustered datasets, our algorithm is relatively faster than other accelerated k-means algorithms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Modeling Data Flow Execution in a Parallel Environment

Nächstes Kapitel A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage

Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)MathSciNetCrossRefMATH

Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2007)

Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1177–1178 (2010)

Wang, J., Wang, J., Ke, Q., Zeng, G., and Li, S.: Fast approximate k-means via cluster closures. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3037–3044 (2012)

Pelleg, D., Moore, A.: Accelerating exact k-means algorithms with geometric reasoning. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 277–281 (1999)

Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24, 881–892 (2002)CrossRefMATH

Elkan, C.: Using the triangle inequality to accelerate k- means. In: Proceedings of the 20th International Conference on Machine Learning (ICML), pp. 147–153 (2003)

Hamerly, G.: Making k-means even faster. In: SIAM International Conference on Data Mining (SDM), pp. 130–140 (2010)

Drake, J., Hamerly, G.: Accelerated k-means with adaptive distance bounds. In: 5th NIPS Workshop on Optimization for Machine Learning, pp. 579–587 (2012)

10.

Drake, J.: Faster k-means clustering (2013). Accessed online 19 August 2015

11.

Ding, Y., Zhao, Y., Shen, X., Musuvathi, M., Mytkowicz, T.: Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In: Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 579–587 (2015)

12.

Ryšavý, P., Hamerly, G.: Geometric methods to accelerate k-means algorithms. In: SIAM International Conference on Data Mining (SDM), pp. 324–332 (2016)

13.

Bottesch, T., Bühler, T., Kächele, M.: Speeding up k-means by approximating euclidean distances via block vectors. In: Proceedings of the 33rd International Conference on Machine Learning, New York (2016)

14.

Newling, J., Fleuret, F.: Fast K-means with accurate bounds. In: Proceedings of the 33rd International Conference on Machine Learning, New York (2016)

15.

Bache, K., Lichman, M.: UCI machine learning repository (2013). url: http://archive.ics.uci.edu/ml/

16.

Joensuu: clustering datasets – Joensuu homepage url: https://cs.joensuu.fi/sipu/datasets/

17.

Rong-En, F.: LIBSVM homepage url:https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

Titel: Accelerating K-Means by Grouping Points Automatically
verfasst von: Qiao Yu
Bi-Ru Dai
Verlag: Springer International Publishing
Buch: Big Data Analytics and Knowledge Discovery
Print ISBN: 978-3-319-64282-6

Electronic ISBN: 978-3-319-64283-3

Copyright-Jahr: 2017
DOI: https://doi.org/10.1007/978-3-319-64283-3_15

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"