Skip to main content
Log in

Grid-DPC: Improved density peaks clustering based on spatial grid walk

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Traditional clustering methods need to find the initial centers first. A reasonable cluster center can improve the efficiency and accuracy of the algorithm. However, finding centers is not an easy task. It often needs much calculation and easily falls into local optimal points. In allusion to the problem, an improved density peaks clustering algorithm based on spatial grid walk (Grid-DPC) is proposed. Grid-DPC uses a spatial grid walk method to determine the initial cluster centers, avoiding the traditional method requiring multiple iterations to optimize the centers and preventing falling into the local optimal. In terms of density definition, Grid-DPC uses grid density instead of the traditional method, which reduces the time cost of finding neighbors. In terms of the random walk strategy, the adjacent higher grid weight direction and Lévy Flight based step length method are adopted to improve the convergence speed of the algorithm effectively. Aiming at the phenomenon of “target loss” that may occur in random walks, the variable neighborhood search method is used to help the algorithm find high-weight grids and prolong the number of steps per walk as long as possible. Simulation experiments show that the algorithm can effectively find clustering centers, complete the task of clustering arbitrarily distributed data, and has high efficiency for massive data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Dogan A, Birant D (2021) Machine learning and data mining in manufacturing. Expert Syst Appl 166:114060. https://doi.org/10.1016/j.eswa.2020.114060

    Article  Google Scholar 

  2. Maia J, Severiano C, Guimarães F, Castro C, Lemos A, Galindo J, Weiss M (2020) Evolving clustering algorithm based on mixture of typicalities for stream data mining. Future Generation Computer Systems, 106. https://doi.org/10.1016/j.future.2020.01.017

  3. Adolfsson A, Ackerman M, Brownstein NC (2019) To cluster, or not to cluster: an analysis of clusterability methods. Pattern Recogn 88:13–26. https://doi.org/10.1016/j.patcog.2018.10.026

    Article  Google Scholar 

  4. Chauhan R, Batra P, Chaudhary S (2014) A survey of density based clustering algorithms. www.ijcst.com 5. https://doi.org/10.1007/s11704-019-9059-

  5. Zaman M, Hassan A (2019) Improved statistical features-based control chart patterns recognition using anfis with fuzzy clustering. Neural Comput Appl 31:5935–5949. https://doi.org/10.1007/s00521-018-3388-2

    Article  Google Scholar 

  6. Chen L (2019) Opportunities and challenges for resource management and machine learning clusters, pp 165–166. https://doi.org/10.1145/3368235.3369376

  7. Curiel A, Gutiérrez-Soto C, Soto-Borquez P-N, Galdames P (2020) Measuring the effects of summarization in cluster-based information retrieval, 1–8. https://doi.org/10.1109/SCCC51225.2020.9281189

  8. Gao X, Shen W, Zhang L, Hu J, Fortin N, Frostig R, Ombao H (2020) Regularized matrix data clustering and its application to image analysis. Biometrics 01:1–13. https://doi.org/10.1111/biom.13354

    Google Scholar 

  9. Saha J, Mukherjee J (2021) Cnak: Cluster number assisted k-means. Pattern Recogn, 110. https://doi.org/10.1016/j.patcog.2020.107625

  10. Song K, Yao X, Nie F, Li X, Xu M (2021) Weighted bilateral k-means algorithm for fast co-clustering and fast spectral clustering. Pattern Recogn, 109. https://doi.org/10.1016/j.patcog.2020.107560

  11. Pan Pin-chen LY-J, Jiang H (2019) Initial center optimization method of k-means algorithm within non-independent and identi cally distribution context. J Chin Comput Syst 40:1254–1259

    Google Scholar 

  12. Chen Guang-ping HJ, Wang W-P (2012) Improved initial clustering center selection method for k-means algorithm. J Chin Comput Syst 33:1320–1323

    Google Scholar 

  13. Dinh T, Huynh V-N (2020) K-pbc: an improved cluster center initialization for categorical data clustering. Appl Intell, 2610–2632. https://doi.org/10.1007/s10489-020-01677-5

  14. Liu Y, Jiang Y, Hou T, Liu F (2021) A new robust fuzzy clustering validity index for imbalanced data sets. Inf Sci 547:579–591. https://doi.org/10.1016/j.ins.2020.08.041

    Article  MathSciNet  MATH  Google Scholar 

  15. Bouguettaya A, Yu Q, Liu X, Zhou X, Song A (2015) Efficient agglomerative hierarchical clustering. Expert Syst Appl 42(5):2785–2797. https://doi.org/10.1016/j.eswa.2014.09.054

    Article  Google Scholar 

  16. Ros F, Guillaume S, El Hajji M, Riad R (2020) Kdmutual: a novel clustering algorithm combining mutual neighboring and hierarchical approaches using a new selection criterion. Knowl-Based Syst, 204. https://doi.org/10.1016/j.knosys.2020.106220

  17. Xu Q, Zhang Q, Liu J, Luo B (2020) Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Syst Appl, 151. https://doi.org/10.1016/j.eswa.2020.113367

  18. Shi P, Zhao Z, Zhong H, Shen H, Ding L (2020) An improved agglomerative hierarchical clustering anomaly detection method for scientific data. Concurrency and Computation: Practice and Experience, 33. https://doi.org/10.1002/cpe.6077

  19. Pasupathi S, Shanmuganthan V, Madasamy K, Robinson H, Kim M (2021) Trend analysis using agglomerative hierarchical clustering approach for time series big data. J Supercomput 77:1–20. https://doi.org/10.1007/s11227-020-03580-9

    Article  Google Scholar 

  20. Cirrincione G, Ciravegna G, Barbiero P, Randazzo V, Pasero E (2020) The gh-exin neural network for hierarchical clustering. Neural Netw 121:57–73. https://doi.org/10.1016/j.neunet.2019.07.018

    Article  Google Scholar 

  21. Wang X, Wang X, Wilkes M (2020) A fast distance-based outlier detection technique using a divisive hierarchical clustering algorithm, pp 39–69. https://doi.org/10.1007/978-981-15-9519-6_3

  22. Yamada Y, Masuyama N, Amako N, Nojima Y, Chu Kiong L, Ishibuchi H (2020) Divisive hierarchical clustering based on adaptive resonance theory, 1–6. https://doi.org/10.1109/CcS49175.2020.9231474

  23. Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114. https://doi.org/10.1145/235968.233324

    Article  Google Scholar 

  24. Guha S, Rastogi R, Shim K (1998) Cure: an efficient clustering algorithm for large databases. Inf Syst 26:35–58. https://doi.org/10.1016/S0306-4379(01)00008-4

    Article  MATH  Google Scholar 

  25. Guha S, Rastogi R, Shim K (2002) Rock: a robust clustering algorithm for categorical attributes. Inf Syst 25(5):345–366. https://doi.org/10.1016/S0306-4379(00)00022-3

    Article  Google Scholar 

  26. Shi J, Zhu Q, Li J (2021) A novel hierarchical clustering algorithm with merging strategy based on shared subordinates. Applied Intelligence. https://doi.org/10.1007/s10489-021-02830-4

  27. Alex R, Alessandro L (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496. https://doi.org/10.1126/science.1242072

    Article  Google Scholar 

  28. Wang S, Li Q, Zhao C, Zhu X, Yuan H, Dai T (2021) Extreme clustering – a clustering method via density extreme points. Inf Sci 542:24–39. https://doi.org/10.1016/j.ins.2020.06.069

    Article  MathSciNet  MATH  Google Scholar 

  29. Hou J, Zhang A, Qi N (2020) Density peak clustering based on relative density relationship. Pattern Recognition 108:107554. https://doi.org/10.1016/j.patcog.2020.107554

    Article  Google Scholar 

  30. Hamidzadeh J, Monsefi R, Sadoghi Yazdi H (2015) Irahc: Instance reduction algorithm using hyperrectangle clustering. Pattern Recogn 48(5):1878–1889. https://doi.org/10.1016/j.patcog.2014.11.005

    Article  MATH  Google Scholar 

  31. Flores KG, Garza SE (2020) Density peaks clustering with gap-based automatic center detection. Knowl-Based Syst, 206. https://doi.org/10.1016/j.knosys.2020.106350

  32. Dogan A, Birant D (2021) K-centroid link: a novel hierarchical clustering linkage method. Applied Intelligence. https://doi.org/10.1007/s10489-021-02624-8

  33. Skurichina M, Duin R (2002) Limited bagging, boosting and the random subspace method for linear classifiers. Pattern Anal Applic, 5. https://doi.org/10.1007/s100440200011

  34. Yu Q (2017) Optimization of initial clustering centers selection method for k-means algorithm. Comput Syst Appl 26:170–174. https://doi.org/10.15888/j.cnki.csa.005733

    Google Scholar 

  35. Zhang S, Zhao H (2017) Algorithm research of optimal cluster number and initial cluster center. Appl Res Comput 34:1617–1620. https://doi.org/10.3969/j.issn.1001-3695.2017.06.004

    Google Scholar 

  36. Chen Y, Hu X, Fan W, Shen L, Zhang Z, Liu X, Du J, Li H, Chen Y, Li H (2020) Fast density peak clustering for large scale data based on knn. Knowledge-Based Systems 187:104824. https://doi.org/10.1016/j.knosys.2019.06.032

    Article  Google Scholar 

  37. Ullah I, Youn H, Han Y-H (2021) An efficient data aggregation and outlier detection scheme based on radial basis function neural network for wsn. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-020-02703-7

  38. Paikray H, Das P, Panda S (2021) Optimal multi-robot path planning using particle swarm optimization algorithm improved by sine and cosine algorithms. Arabian Journal for Science and Engineering. https://doi.org/10.1007/s13369-020-05046-9

  39. Surono S, Putri R (2020) Optimization of fuzzy c-means clustering algorithm with combination of minkowski and chebyshev distance using principal component analysis. International Journal of Fuzzy Systems. https://doi.org/10.1007/s40815-020-00997-5

  40. Pearson K (1905) The problem of the random walk. Nature 72:1865

    Article  Google Scholar 

  41. Liting G (2018) Improved wolf pack algorithm based on adaptive step length and adjustable scouting direction. Journal of Zhejiang University (Science Edition) 45:284–293. https://doi.org/10.3785/j.issn.1008-9497.2018.03.003

    Google Scholar 

  42. Mendonça JRG (2020) A numerical investigation into the scaling behavior of the longest increasing subsequences of the symmetric ultra-fat tailed random walk. Phys Lett A 384(29):126753. https://doi.org/10.1016/j.physleta.2020.126753

    Article  MathSciNet  Google Scholar 

  43. Abdel-Basset M, Hessin A-N, Abdel-Fatah L (2018) A comprehensive study of cuckoo-inspired algorithms. Neural Comput Appl 29(2):345–361. https://doi.org/10.1007/s00521-016-2464-8

    Article  Google Scholar 

  44. Schelling B, Plant C (2016) Clustering with the levy walk: hunting for clusters. In: 2016 IEEE 16Th international conference on data mining workshops (ICDMW), pp 1251–1260. https://doi.org/10.1109/ICDMW.2016.0180

  45. kelidari M, Hamidzadeh J (2021) Feature selection by using chaotic cuckoo optimization algorithm with levy flight, opposition-based learning and disruption operator. Soft Comput 25:1–23. https://doi.org/10.1007/s00500-020-05349-x

    Article  Google Scholar 

  46. Amiri E, Mahmoudi S (2016) Efficient protocol for data clustering by fuzzy cuckoo optimization algorithm. Appl Soft Comput 41:15–21. https://doi.org/10.1016/j.asoc.2015.12.00

    Article  Google Scholar 

  47. Boushaki SI, Kamel N, Bendjeghaba O (2018) A new quantum chaotic cuckoo search algorithm for data clustering. Expert Syst Appl 96:358–372. https://doi.org/10.1016/j.eswa.2017.12.001

    Article  MATH  Google Scholar 

  48. Niu X, Zheng Y, Fournier Viger P, Wang B (2021) Parallel grid-based density peak clustering of big trajectory data. Applied Intelligence. https://doi.org/10.1007/s10489-021-02757-w

  49. Mantegna R (1994) Accurate algorithm for numerical simulation of levy stable stochastic processes. Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics 49:4677–4683

    Article  Google Scholar 

  50. Zhang G, Zhang C, Zhang H (2018) Improved k-means algorithm based on density canopy. Knowledge-Based Systems 145:289–297. https://doi.org/10.1016/j.knosys.2018.01.031

    Article  Google Scholar 

  51. Yuan X-T, Hu B-G, He R (2012) Agglomerative mean-shift clustering. IEEE Trans Knowl Data Eng 24(2):209–219. https://doi.org/10.1109/TKDE.2010.232

    Article  Google Scholar 

  52. de Amorim RC, Hennig C (2015) Recovering the number of clusters in data sets with noise features using feature rescaling factors. Inf Sci 324:126–145. https://doi.org/10.1016/j.ins.2015.06.039

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

The work is supported by the National Natural Science Foundation of China [Grant Nos. U1931209], Key Research and Development Projects of Shanxi Province [Grant Nos. 201903D121116], Central Government Guides Local Science and Technology Development Funds [Grant No. 20201070], Fundamental Research Program of Shanxi Province [Grant Nos. 20210302123223, 202103021224275], and the Graduate Education Innovation Project of Shanxi Province [Grant No.2019BY117].

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to JiangHui Cai.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liang, B., Cai, J. & Yang, H. Grid-DPC: Improved density peaks clustering based on spatial grid walk. Appl Intell 53, 3221–3239 (2023). https://doi.org/10.1007/s10489-022-03705-y

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-022-03705-y

Keywords

Navigation