Top

Pattern Analysis and Applications

Published in:

05-04-2019 | Theoretical advances

`DBSCAN`-like clustering method for various data densities

Authors: Rudolf Scitovski, Kristian Sabo

Published in: Pattern Analysis and Applications | Issue 2/2020

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

In this paper, we propose a modification of the well-known DBSCAN algorithm, which recognizes clusters with various data densities in a given set of data points \({\mathcal {A}}=\{a^i\in {\mathbb {R}}^n:i=1,\dots ,m\}\). First, we define the parameter \(MinPts=\lfloor \ln |{\mathcal {A}}|\rfloor\) and after that, by using a standard procedure from DBSCAN algorithm, for each \(a\in {\mathcal {A}}\) we determine radius \(\epsilon _a\) of the circle containing MinPts elements from the set \({\mathcal {A}}\). We group the set of all these radii into the most appropriate number (t) of clusters by using Least Squares distance-like function applying SymDIRECT or SepDIRECT algorithm. In that way, we obtain parameters \(\epsilon _1>\dots >\epsilon _t\). Furthermore, for parameters \(\{MinPts,\epsilon _1\}\) we construct a partition starting with one cluster and then add new clusters for as long as the isolated groups of at least MinPts data points in some circle with radius \(\epsilon _1\) exist. We follow a similar procedure for other parameters \(\epsilon _2,\dots ,\epsilon _t\). After the implementation of the algorithm, a larger number of clusters appear than can be expected in the optimal partition. Along with defined criteria, some of them are merged by applying a merging process for which a detailed algorithm has been written. Compared to the standard DBSCAN algorithm, we show an obvious advantage for the case of data with various densities.

previous article Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

next article Smooth estimates of multiple quantiles in dynamically varying data streams

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Aggarwall CC, Reddy CK (2013) Data clustering: algorithms and applications. CRC data mining and knowledge discovery series. Chapman & Hall, London

Akinlar C, Topal C (2013) Edcircles: a real-time circle detector with a false detection control. Pattern Recognit 46:725–740

Amami R, Smiti A (2017) An incremental method combining density clustering and support vector machines for voice pathology detection. Comput Electr Eng 57:257–265

Andrade G, Ramos G, Madeira D, Sachetto R, Ferreira R, Rocha L (2013) G-DBSCAN: a GPU accelerated algorithm for density-based clustering. Procedia Comput Sci 18:369–378

Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. ACM Sigmod Rec 28:49–60

Bagirov AM, Ugon J, Webb D (2011) Fast modified global \(k\)-means algorithm for incremental cluster construction. Pattern Recognit 44:866–876MATH

Bakr AM, Ghanem NM, Ismail MA (2015) Efficient incremental density-based algorithm for clustering large datasets. Alex Eng J 54:1147–1154

Bezdek JC, Keller J, Krisnapuram R, Pal NR (2005) Fuzzy models and algorithms for pattern recognition and image processing. Springer, New YorkMATH

Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial–temporal data. Data Knowl Eng 60:208–221

10.

Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed \(k\)-means: an attempt to robustify quantizers. Ann Stat 25(2):553–576MathSciNetMATH

11.

Darong H, Peng W (2012) Grid-based DBSCAN algorithm with referential parameters. Phys Procedia 24:1166–1170

12.

Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of second SIAM international conference on data mining, San Francisco

13.

Ester M, Krieogel H, Sander J (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: 2nd International conference on knowledge discovery and data mining (KDD-96), Portland, pp 226–231

14.

Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8:32–49MathSciNetMATH

15.

Fritz H, García-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136MathSciNetMATH

16.

Grbić R, Grahovac D, Scitovski R (2016) A method for solving the multiple ellipses detection problem. Pattern Recognit 60:824–834

17.

Grbić R, Nyarko EK, Scitovski R (2013) A modification of the DIRECT method for Lipschitz global optimization for a symmetric function. J Glob Optim 57:1193–1212MathSciNetMATH

18.

Gunawan A (2013). A Faster Algorithm for DBSCAN. Ph.D. thesis, Technische Universiteit Eindhoven

19.

Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218MATH

20.

Jiang H, Li J, Yi S, Wang X, Hu X (2011) A new hybrid method based on partitioning-based DBSCAN and ant clustering. Expert Syst Appl 38:9373–9381

21.

Jones DR (2001) The direct global optimization algorithm. In: Floudas CA, Pardalos PM (eds) The encyclopedia of optimization. Kluwer Academic Publishers, Dordrect, pp 431–440

22.

Jones DR, Perttunen CD, Stuckman BE (1993) Lipschitzian optimization without the Lipschitz constant. J Optim Theory Appl 79:157–181MathSciNetMATH

23.

Karami A, Johansson R (2014) Choosing DBSCAN parameters automatically using differential evolution. Int J Comput Appl 91:1–11

24.

Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH

25.

Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognit 58:39–48

26.

Lai HP, Visani M, Boucher A, Ogier JM (2012) An experimental comparison of clustering methods for content-based indexing of large image databases. Pattern Anal Appl 15:345–366MathSciNet

27.

Li Z, Zhang Y, Gong H, Liu G, Li W, Tang X (2017) An automatic and efficient coronary arteries extraction method in CT angiographies. Biomed Signal Process Control 36:221–233

28.

Louhichi S, Gzara M, Ben-Abdallah H (2017) Unsupervised varied density based clustering algorithm using spline. Pattern Recognit Lett 93:48–57

29.

MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297

30.

Marošević T, Sabo K, Taler P (2013) A mathematical model for uniform distribution voters per constituencies. Croat Oper Res Rev 4:53–64

31.

McCallum A, Nigam K, Ungar LH (2000) Efficient clustering of high-dimensional data sets with application to reference matching. In: International conference on knowledge discovery and data mining. DBLP

32.

Mimaroglu S, Aksehirli E (2011) Improving DBSCAN’s execution time by using a pruning technique on bit vectors. Pattern Recognit Lett 32:1572–1580

33.

Morales-Esteban A, Martínez-Álvarez F, Scitovski S, Scitovski R (2014) A fast partitioning algorithm using adaptive Mahalanobis clustering with application to seismic zoning. Comput Geosci 73:132–141

34.

Sabo K, Scitovski R (2015) An approach to cluster separability in a partition. Inf Sci 305:208–218MathSciNetMATH

35.

Sabo K, Scitovski R, Vazler I (2013) One-dimensional center-based \(l_1\)-clustering method. Optim Lett 7:5–22MathSciNetMATH

36.

Scitovski R (2017) A new global optimization method for a symmetric Lipschitz continuous function and application to searching for a globally optimal partition of a one-dimensional set. J Glob Optim 68:713–727MathSciNetMATH

37.

Scitovski R, Marošević T (2014) Multiple circle detection based on center-based clustering. Pattern Recognit Lett 52:9–16

38.

Scitovski R, Sabo K (2014) Analysis of the \(k\)-means algorithm in the case of data points occurring on the border of two or more clusters. Knowl Based Syst 57:1–7

39.

Scitovski R, Scitovski S (2013) A fast partitioning algorithm and its application to earthquake investigation. Comput Geosci 59:124–131

40.

Scitovski R, Vidović I, Bajer D (2016) A new fast fuzzy partitioning algorithm. Expert Syst Appl 51:143–150

41.

Späth H (1983) Cluster-formation und analyse. R. Oldenburg Verlag, MünchenMATH

42.

Steinbach M, Tan PN, Potter VKC, Klooster S (2002) Data mining for the discovery of ocean climate indices, In: Mining scientific datasets workshop, 2nd Annual SIAM international conference on data mining

43.

Teboulle M, Berkhin P, Dhilon I, Guan Y, Kogan J (2006) Clustering with entropy-like \(k\)-means algorithms. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data. Springer, Berlin, pp 127–160

44.

Theodoridis S, Koutroumbas K (2009) Pattern recognition, 4th edn. Academic Press, BurlingtonMATH

45.

Vendramin L, Campello RJGB, Hruschka ER (2009) On the comparison of relative clustering validity criteria, In: Proceedings of the SIAM international conference on data mining, SDM 2009, April 30–May 2, 2009. SIAM, Sparks, pp 733–744

46.

Viswanath P, Babu VS (2009) Rough-DBSCAN: a fast hybrid density based clustering method for large data sets. Pattern Recognit Lett 30:1477–1488

47.

Wolfram Research I (2016) Mathematica, version 11.0 edition. Wolfram Research, Inc., Champaign

48.

Xie J, Gao H, Xie W, Liu X, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted \(K\)-nearest neighbors. Inf Sci 354:19–40

49.

Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, New YorkMATH

50.

Zhu Y, Ting KM, Carman MJ (2016) Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognit 60:983–997MATH

Title: DBSCAN-like clustering method for various data densities
Authors: Rudolf Scitovski
Kristian Sabo
Publication date: 05-04-2019
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 2/2020
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-019-00809-z

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 2/2020

Smooth estimates of multiple quantiles in dynamically varying data streams

Fractal dimension of synthesized and natural color images in Lab space

Violence detection in videos for an intelligent surveillance system using MoBSIFT and movement filtering algorithm

Pedestrian detection using multiple feature channels and contour cues with census transform histogram and random forest classifier

BISDBx: towards batch-incremental clustering for dynamic datasets using SNN-DBSCAN

Manifold ranking graph regularization non-negative matrix factorization with global and local structures

Premium Partner