Abstract
In the k-means clustering algorithm, the choice of initial cluster centers affects clustering efficiency. The widely used k-means++ algorithm can effectively improve both the speed and the accuracy of k-means. However, k-means does not scale well to massive datasets because it must traverse the dataset multiple times. In this paper, building on the k-means++ and grid clustering algorithms, we propose a fast and efficient grid-based k-means++ clustering algorithm that can process large-scale data efficiently. First, the N-dimensional space is partitioned into disjoint rectangular grid cells. Then, dense grid cells are identified from per-cell statistics. Finally, a modified k-means++ clustering algorithm is applied to the gridded dataset. Experimental results on a simulated dataset show that, compared with the original k-means++ algorithm, the proposed algorithm obtains the cluster centers quickly and handles clustering of large-scale datasets effectively.
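The abstract does not specify how k-means++ is modified for the gridded data, so the sketch below is one plausible reading and not the authors' method. It makes several assumptions. Cells are hypercubes of a fixed side length `cell_size`. A cell counts as dense when it holds at least `min_count` points, which is a hypothetical threshold. Each dense cell is represented by the centroid of its points and weighted by its point count. Weighted k-means++ seeding followed by weighted Lloyd iterations then runs on those representatives. All function and parameter names are illustrative.

```python
import math
import random
from collections import defaultdict


def summarize_grid(points, cell_size):
    """One pass over the data: map each point to its grid cell and return
    {cell index tuple: (point count, centroid of the cell's points)}."""
    counts = defaultdict(int)
    sums = {}
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        counts[key] += 1
        sums[key] = [s + x for s, x in zip(sums[key], p)] if key in sums else list(p)
    return {k: (counts[k], [s / counts[k] for s in sums[k]]) for k in counts}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeanspp(reps, weights, k, rng):
    """k-means++ seeding where each representative stands for `weight` points:
    sample proportionally to weight * D(x)^2 (assumed modification)."""
    centers = [rng.choices(reps, weights=weights, k=1)[0]]
    d2 = [sq_dist(r, centers[0]) for r in reps]
    while len(centers) < k:
        probs = [w * d for w, d in zip(weights, d2)]
        if sum(probs) == 0:  # every representative already coincides with a center
            break
        c = rng.choices(reps, weights=probs, k=1)[0]
        centers.append(c)
        d2 = [min(d, sq_dist(r, c)) for d, r in zip(d2, reps)]
    return centers


def weighted_lloyd(reps, weights, centers, iters):
    """Lloyd iterations on the weighted representatives instead of raw points."""
    dim = len(centers[0])
    for _ in range(iters):
        acc = [[0.0] * dim for _ in centers]
        tot = [0.0] * len(centers)
        for r, w in zip(reps, weights):
            j = min(range(len(centers)), key=lambda i: sq_dist(r, centers[i]))
            tot[j] += w
            acc[j] = [a + w * x for a, x in zip(acc[j], r)]
        new = [[a / tot[j] for a in acc[j]] if tot[j] > 0 else centers[j]
               for j in range(len(centers))]
        if new == centers:  # converged
            break
        centers = new
    return centers


def grid_kmeanspp(points, k, cell_size, min_count=1, iters=20, seed=0):
    """Hypothetical grid-based k-means++: grid the data, keep dense cells,
    then cluster the weighted cell centroids."""
    rng = random.Random(seed)
    summary = summarize_grid(points, cell_size)
    dense = [(c, n) for n, c in summary.values() if n >= min_count]
    if len(dense) < k:
        raise ValueError("fewer dense cells than clusters; lower min_count or cell_size")
    reps = [c for c, _ in dense]
    weights = [n for _, n in dense]
    centers = weighted_kmeanspp(reps, weights, k, rng)
    return weighted_lloyd(reps, weights, centers, iters)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(500)] + \
           [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(500)]
    print(sorted(grid_kmeanspp(data, k=2, cell_size=0.5, min_count=3)))
```

In this sketch only the gridding step reads every point. Seeding and refinement touch only the dense-cell representatives, so their cost scales with the number of dense cells rather than the number of points. Dropping sparse cells also discards isolated outliers. The `min_count` threshold controls that trade-off, and `cell_size` controls how coarsely the data are approximated.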
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Y., Zhu, Z. (2019). A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets. In: Krömer, P., Zhang, H., Liang, Y., Pan, JS. (eds) Proceedings of the Fifth Euro-China Conference on Intelligent Data Analysis and Applications. ECC 2018. Advances in Intelligent Systems and Computing, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-03766-6_57
DOI: https://doi.org/10.1007/978-3-030-03766-6_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03765-9
Online ISBN: 978-3-030-03766-6
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)