AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

Written by: Jeong-Hun Kim, Jong-Hyeok Choi, Kwan-Hee Yoo, Aziz Nasridinov

Published in: The Journal of Supercomputing | Issue 1/2019

Published: 8 May 2018


Abstract

Clustering is a typical data mining technique that partitions a dataset into multiple subsets of similar objects according to similarity metrics. In particular, density-based algorithms can find clusters of different shapes and sizes while remaining robust to noise objects. DBSCAN, a representative density-based algorithm, finds clusters by defining the density criterion with two global parameters, \( \varepsilon \)-distance and \( MinPts \). However, most density-based algorithms, including DBSCAN, find clusters incorrectly when densities vary, because the density criterion is fixed by the global parameters and is misapplied to clusters of other densities. Although studies have sought to determine optimal parameters or to improve clustering performance using additional parameters and computations, these approaches significantly increase the running time of clustering, particularly when the dataset is large. In this study, we focus on minimizing the additional computation required to determine the parameters by using an approximate adaptive \( \varepsilon \)-distance for each density, while still finding the clusters with varying densities that DBSCAN cannot find. Specifically, we propose a new tree structure based on a quadtree to define a dataset density layer. In addition, we propose approximate adaptive DBSCAN (AA-DBSCAN) and kAA-DBSCAN, whose clustering performance is similar to that of existing algorithms for finding clusters with varying densities while significantly reducing the running time required to perform clustering. We evaluate the proposed algorithms, AA-DBSCAN and kAA-DBSCAN, in extensive experiments comparing them with state-of-the-art algorithms. Experimental results demonstrate an improvement in clustering performance and a reduction in running time for the proposed algorithms.
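The limitation described in the abstract, a single global \( \varepsilon \)-distance and \( MinPts \) being misapplied to clusters of different densities, can be illustrated with a small sketch. The code below is a textbook DBSCAN written for illustration only; it is not the authors' AA-DBSCAN or kAA-DBSCAN, and it does not use their quadtree-based structure. The helper names (`grid`, `region_query`, `dbscan`) and the toy dataset (two dense grids that lie close together plus one sparse grid far away) are assumptions chosen to make the effect visible.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def grid(ox, oy, step, n=4):
    """Hypothetical helper: an n x n grid of points starting at (ox, oy)."""
    return [(ox + step * a, oy + step * b) for a in range(n) for b in range(n)]


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with one global (eps, min_pts) pair, O(n^2) search."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Two dense grids (spacing 0.1, gap 0.7) and one sparse grid (spacing 1.0).
points = grid(0.0, 0.0, 0.1) + grid(1.0, 0.0, 0.1) + grid(10.0, 10.0, 1.0)

small = dbscan(points, eps=0.15, min_pts=4)  # tuned to the dense grids
large = dbscan(points, eps=1.5, min_pts=4)   # tuned to the sparse grid

print(sorted(set(small)))  # [-1, 0, 1]: sparse grid is labeled as noise
print(sorted(set(large)))  # [0, 1]: the two dense grids are merged
```

In this toy dataset the sparse grid needs an \( \varepsilon \) of at least 1.0 to form a cluster, while the two dense grids merge as soon as \( \varepsilon \) reaches their gap of 0.7, so no single global value recovers all three clusters. This is the kind of case that motivates a per-density, adaptive \( \varepsilon \)-distance.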




Title: AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities
Authors: Jeong-Hun Kim, Jong-Hyeok Choi, Kwan-Hee Yoo, Aziz Nasridinov
Publication date: 8 May 2018
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 1/2019
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-018-2380-z
