nach oben

The Journal of Supercomputing

Erschienen in:

26.11.2020

K-DBSCAN: An improved DBSCAN algorithm for big data

verfasst von: Nahid Gholizadeh, Hamid Saadatfar, Nooshin Hanafi

Erschienen in: The Journal of Supercomputing | Ausgabe 6/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Big data storage and processing are among the most important challenges now. Among data mining algorithms, DBSCAN is a common clustering method. One of the most important drawbacks of this algorithm is its low execution speed. This study aims to accelerate the DBSCAN execution speed so that the algorithm can respond to big datasets in an acceptable period of time. To overcome the problem, an initial grouping was applied to the data in this article through the K-means++ algorithm. DBSCAN was then employed to perform clustering in each group separately. As a result, the computational burden of DBSCAN execution reduced and the clustering execution speed increased significantly. Finally, border clusters were merged if necessary. According to the results of executing the proposed algorithm, it managed to greatly reduce the DBSCAN execution time (98% in the best-case scenario) with no significant changes in the qualitative evaluation criteria for clustering.

Vorheriger Artikel Parallelization of the self-organized maps algorithm for federated learning on distributed sources

Nächster Artikel An evolutionary multi-hidden Markov model for intelligent threat sensing in industrial internet of things

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

Storey V, Song I (2017) Big data technologies and management: What conceptual modeling can do. Data KnowlEng 108:50–67CrossRef

Ianni M, Masciari E, Mazzeo G, Zaniolo C (2018) Efficient big data clustering. In: 22nd International Database Engineering & Applications Symposium, pp 103–109. ACM

Arora P, Deepali D, Varshney S (2016) Analysis of K-Means and K-Medoids algorithm for big data. ProcediaCompuSci 78:507–512

Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM ComputSurv 31(3):264–323

Zhu J, Zeng M, Huang J, Liao S, Cai C, Zheng L (2020) Vehicle re-identification using quadruple directional deep learning features. IEEE Trans IntellTranspSyst 21(1):410–420

Liu S, Liu M, Li P, Zhao J, Zhu Z, Wang X (2017) SAR image denoising via sparse representation in Shearlet domain based on continuous cycle spinning. IEEE Trans Geosci Remote Sens 55(5):2985–2992CrossRef

Pei S, Shen T, Wang X, Gu C, Ning Z, Ye X, Xiong N (2020) 3DACN: 3D augmented convolutional network for time series data. InfSci 513:17–29

Qiao S, Li T, Li H, Peng J, Chen H (2012) A new blockmodeling based hierarchical clustering algorithm for web social networks. EngApplArtifIntell 25(3):640–647

Che D, Safran M, Peng Z (2013) From big data to big data mining: challenges, issues, and opportunities. In: International Conference on Database Systems for Advanced Applications, pp 1–15. Springer

10.

Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: 2nd International Conference on Knowledge Discovery and Data Mining, pp 226–231.

11.

David A, Sergei V (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp 1027–1035. ACM

12.

Katal A, Wazid M, and Goudar R (2013) Big data: Issues, challenges, tools and good practices. In: 2013 6th International Conference on Contemporary Computing, pp 404–409. IEEE

13.

Shahrivari S (2014) Beyond batch processing: Towards real-time and streaming big data. Computers 3(4):117–129CrossRef

14.

Chen M, Mao S, Liu Y (2014) Big data: A survey. Mob NetwAppl 19(2):171–209CrossRef

15.

Shirkhorshidi A, Aghabozorgi S, Wah T, Herawan T (2014) Big data clustering: a review. In: International Conference on Computational Science and its Applications, pp 707–720. Springer

16.

LIU B (2006) A fast density-based clustering algorithm for large databases. In: 2006 International Conference on Machine Learning and Cybernetics, pp 996–1000. IEEE

17.

Wu Y, Guo J, ZHANG X (2007) A linear DBSCAN algorithm based on LSH. In: 2007 International Conference on Machine Learning and Cybernetics, Vol 5, pp 2608–2614. IEEE

18.

Dogan Y, Birant D, Kut A (2013) SOM++: Integration of self-organizing map and K-Means++ algorithms. In: International Conference on Machine Learning and Data Mining in Pattern Recognition, pp 246–259. Springer

19.

Bakr A, Ghanem N, Ismail M (2015) Efficient incremental density-based algorithm for clustering large datasets. Alex Eng J 54(4):1147–1154CrossRef

20.

Xu T, Chiang H, Liu G, Tan C (2015) Hierarchical K-means method for clustering large-scale advanced metering infrastructure data. IEEE Trans Power Delivery 32(2):609–616CrossRef

21.

Ismkhan H (2018) I-k-means++: An iterative clustering algorithm based on an enhanced version of the k-means. PattRecogn 79:402–413

22.

Brown D, Japa A, Shi Y (2019) A fast density-grid based clustering method Daniel Brown. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference, pp 0048–0054. IEEE

23.

Mathur V, Mehta J, Singh S (2019) "HCA-DBSCAN: HyperCube accelerated density based spatial clustering for applications with noise," in 33rd Conference on Neural Information Processing Systems (arXiv preprint).

24.

Luchi D, Rodrigues A, Varejao F (2019) Sampling approaches for applying DBSCAN to large datasets. Pattern RecognLett 117:90–96CrossRef

25.

Chen Y, Zhou L, Pei S, Yu Z, Chen Y, Liu X, Du J, Xiong N (2019) KNN-BLOCK DBSCAN Fast Clustering for Large-Scale Data. IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp 1–15.

26.

Chen Y, Zhou L, Bouguila N, Wang C, Chen Y, Du J (2020) BLOCK-DBSCAN Fast clustering for large scale data. PattRecogn 109:107627

27.

Cui X, Zhu P, Yang X, Li K, Ji C (2014) Optimized big data K-means clustering using MapReduce. J Supercompu 70(3):1249–1259CrossRef

28.

Sinha A, Jana P (2016) A novel K-means based clustering algorithm for big data. In: Conference on Advances in Computing, Communications and Informatics, pp 1875–1879. IEEE

29.

Song H, Lee J (2018) RP-DBSCAN: A superfast parallel DBSCAN algorithm based on random partitioning. In: 2018 International Conference on Management of Data, pp 1173–1187.

30.

Li S (2020) An Improved DBSCAN Algorithm Based on the Neighbor Similarity and Fast Nearest Neighbor Query. IEEE Access 8:47468–47476CrossRef

31.

José-García A, Gómez-Flores W (2016) Automatic clustering using nature-inspired metaheuristics: A survey. Appl Soft Compu 41:192–213CrossRef

32.

Davies D, Bouldin D (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1(2):224–227CrossRef

33.

Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J ComputAppl Math 20:53–65CrossRef

34.

UCI. http://archive.ics.uci.edu/ml/index.php. Accessed 1 June 2020

35.

GitHub. https://vincentarelbundock.github.io/Rdatasets/datasets.html. Accessed 1 June 2020

Titel: K-DBSCAN: An improved DBSCAN algorithm for big data
verfasst von: Nahid Gholizadeh
Hamid Saadatfar
Nooshin Hanafi
Publikationsdatum: 26.11.2020
Verlag: Springer US
Erschienen in: The Journal of Supercomputing / Ausgabe 6/2021
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-020-03524-3

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 6/2021

Query processing optimization in broadcasting XML data in mobile communications

Error detection of arithmetic expressions

New set of adapted Gegenbauer–Chebyshev invariant moments for image recognition and classification

An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems

DIESEL: A novel deep learning-based tool for SpMV computations and solving sparse linear equation systems

Parallelization of the self-organized maps algorithm for federated learning on distributed sources