Skip to main content
Erschienen in: Pattern Analysis and Applications 2/2020

05.04.2019 | Theoretical advances

DBSCAN-like clustering method for various data densities

verfasst von: Rudolf Scitovski, Kristian Sabo

Erschienen in: Pattern Analysis and Applications | Ausgabe 2/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we propose a modification of the well-known DBSCAN algorithm, which recognizes clusters with various data densities in a given set of data points \({\mathcal {A}}=\{a^i\in {\mathbb {R}}^n:i=1,\dots ,m\}\). First, we define the parameter \(MinPts=\lfloor \ln |{\mathcal {A}}|\rfloor\) and after that, by using a standard procedure from DBSCAN algorithm, for each \(a\in {\mathcal {A}}\) we determine radius \(\epsilon _a\) of the circle containing MinPts elements from the set \({\mathcal {A}}\). We group the set of all these radii into the most appropriate number (t) of clusters by using Least Squares distance-like function applying SymDIRECT or SepDIRECT algorithm. In that way, we obtain parameters \(\epsilon _1>\dots >\epsilon _t\). Furthermore, for parameters \(\{MinPts,\epsilon _1\}\) we construct a partition starting with one cluster and then add new clusters for as long as the isolated groups of at least MinPts data points in some circle with radius \(\epsilon _1\) exist. We follow a similar procedure for other parameters \(\epsilon _2,\dots ,\epsilon _t\). After the implementation of the algorithm, a larger number of clusters appear than can be expected in the optimal partition. Along with defined criteria, some of them are merged by applying a merging process for which a detailed algorithm has been written. Compared to the standard DBSCAN algorithm, we show an obvious advantage for the case of data with various densities.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aggarwall CC, Reddy CK (2013) Data clustering: algorithms and applications. CRC data mining and knowledge discovery series. Chapman & Hall, London Aggarwall CC, Reddy CK (2013) Data clustering: algorithms and applications. CRC data mining and knowledge discovery series. Chapman & Hall, London
2.
Zurück zum Zitat Akinlar C, Topal C (2013) Edcircles: a real-time circle detector with a false detection control. Pattern Recognit 46:725–740 Akinlar C, Topal C (2013) Edcircles: a real-time circle detector with a false detection control. Pattern Recognit 46:725–740
3.
Zurück zum Zitat Amami R, Smiti A (2017) An incremental method combining density clustering and support vector machines for voice pathology detection. Comput Electr Eng 57:257–265 Amami R, Smiti A (2017) An incremental method combining density clustering and support vector machines for voice pathology detection. Comput Electr Eng 57:257–265
4.
Zurück zum Zitat Andrade G, Ramos G, Madeira D, Sachetto R, Ferreira R, Rocha L (2013) G-DBSCAN: a GPU accelerated algorithm for density-based clustering. Procedia Comput Sci 18:369–378 Andrade G, Ramos G, Madeira D, Sachetto R, Ferreira R, Rocha L (2013) G-DBSCAN: a GPU accelerated algorithm for density-based clustering. Procedia Comput Sci 18:369–378
5.
Zurück zum Zitat Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. ACM Sigmod Rec 28:49–60 Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. ACM Sigmod Rec 28:49–60
6.
Zurück zum Zitat Bagirov AM, Ugon J, Webb D (2011) Fast modified global \(k\)-means algorithm for incremental cluster construction. Pattern Recognit 44:866–876MATH Bagirov AM, Ugon J, Webb D (2011) Fast modified global \(k\)-means algorithm for incremental cluster construction. Pattern Recognit 44:866–876MATH
7.
Zurück zum Zitat Bakr AM, Ghanem NM, Ismail MA (2015) Efficient incremental density-based algorithm for clustering large datasets. Alex Eng J 54:1147–1154 Bakr AM, Ghanem NM, Ismail MA (2015) Efficient incremental density-based algorithm for clustering large datasets. Alex Eng J 54:1147–1154
8.
Zurück zum Zitat Bezdek JC, Keller J, Krisnapuram R, Pal NR (2005) Fuzzy models and algorithms for pattern recognition and image processing. Springer, New YorkMATH Bezdek JC, Keller J, Krisnapuram R, Pal NR (2005) Fuzzy models and algorithms for pattern recognition and image processing. Springer, New YorkMATH
9.
Zurück zum Zitat Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial–temporal data. Data Knowl Eng 60:208–221 Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial–temporal data. Data Knowl Eng 60:208–221
10.
Zurück zum Zitat Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed \(k\)-means: an attempt to robustify quantizers. Ann Stat 25(2):553–576MathSciNetMATH Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed \(k\)-means: an attempt to robustify quantizers. Ann Stat 25(2):553–576MathSciNetMATH
11.
Zurück zum Zitat Darong H, Peng W (2012) Grid-based DBSCAN algorithm with referential parameters. Phys Procedia 24:1166–1170 Darong H, Peng W (2012) Grid-based DBSCAN algorithm with referential parameters. Phys Procedia 24:1166–1170
12.
Zurück zum Zitat Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of second SIAM international conference on data mining, San Francisco Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of second SIAM international conference on data mining, San Francisco
13.
Zurück zum Zitat Ester M, Krieogel H, Sander J (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: 2nd International conference on knowledge discovery and data mining (KDD-96), Portland, pp 226–231 Ester M, Krieogel H, Sander J (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: 2nd International conference on knowledge discovery and data mining (KDD-96), Portland, pp 226–231
14.
Zurück zum Zitat Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8:32–49MathSciNetMATH Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8:32–49MathSciNetMATH
15.
Zurück zum Zitat Fritz H, García-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136MathSciNetMATH Fritz H, García-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136MathSciNetMATH
16.
Zurück zum Zitat Grbić R, Grahovac D, Scitovski R (2016) A method for solving the multiple ellipses detection problem. Pattern Recognit 60:824–834 Grbić R, Grahovac D, Scitovski R (2016) A method for solving the multiple ellipses detection problem. Pattern Recognit 60:824–834
17.
Zurück zum Zitat Grbić R, Nyarko EK, Scitovski R (2013) A modification of the DIRECT method for Lipschitz global optimization for a symmetric function. J Glob Optim 57:1193–1212MathSciNetMATH Grbić R, Nyarko EK, Scitovski R (2013) A modification of the DIRECT method for Lipschitz global optimization for a symmetric function. J Glob Optim 57:1193–1212MathSciNetMATH
18.
Zurück zum Zitat Gunawan A (2013). A Faster Algorithm for DBSCAN. Ph.D. thesis, Technische Universiteit Eindhoven Gunawan A (2013). A Faster Algorithm for DBSCAN. Ph.D. thesis, Technische Universiteit Eindhoven
19.
Zurück zum Zitat Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218MATH Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218MATH
20.
Zurück zum Zitat Jiang H, Li J, Yi S, Wang X, Hu X (2011) A new hybrid method based on partitioning-based DBSCAN and ant clustering. Expert Syst Appl 38:9373–9381 Jiang H, Li J, Yi S, Wang X, Hu X (2011) A new hybrid method based on partitioning-based DBSCAN and ant clustering. Expert Syst Appl 38:9373–9381
21.
Zurück zum Zitat Jones DR (2001) The direct global optimization algorithm. In: Floudas CA, Pardalos PM (eds) The encyclopedia of optimization. Kluwer Academic Publishers, Dordrect, pp 431–440 Jones DR (2001) The direct global optimization algorithm. In: Floudas CA, Pardalos PM (eds) The encyclopedia of optimization. Kluwer Academic Publishers, Dordrect, pp 431–440
22.
Zurück zum Zitat Jones DR, Perttunen CD, Stuckman BE (1993) Lipschitzian optimization without the Lipschitz constant. J Optim Theory Appl 79:157–181MathSciNetMATH Jones DR, Perttunen CD, Stuckman BE (1993) Lipschitzian optimization without the Lipschitz constant. J Optim Theory Appl 79:157–181MathSciNetMATH
23.
Zurück zum Zitat Karami A, Johansson R (2014) Choosing DBSCAN parameters automatically using differential evolution. Int J Comput Appl 91:1–11 Karami A, Johansson R (2014) Choosing DBSCAN parameters automatically using differential evolution. Int J Comput Appl 91:1–11
24.
Zurück zum Zitat Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH
25.
Zurück zum Zitat Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognit 58:39–48 Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognit 58:39–48
26.
Zurück zum Zitat Lai HP, Visani M, Boucher A, Ogier JM (2012) An experimental comparison of clustering methods for content-based indexing of large image databases. Pattern Anal Appl 15:345–366MathSciNet Lai HP, Visani M, Boucher A, Ogier JM (2012) An experimental comparison of clustering methods for content-based indexing of large image databases. Pattern Anal Appl 15:345–366MathSciNet
27.
Zurück zum Zitat Li Z, Zhang Y, Gong H, Liu G, Li W, Tang X (2017) An automatic and efficient coronary arteries extraction method in CT angiographies. Biomed Signal Process Control 36:221–233 Li Z, Zhang Y, Gong H, Liu G, Li W, Tang X (2017) An automatic and efficient coronary arteries extraction method in CT angiographies. Biomed Signal Process Control 36:221–233
28.
Zurück zum Zitat Louhichi S, Gzara M, Ben-Abdallah H (2017) Unsupervised varied density based clustering algorithm using spline. Pattern Recognit Lett 93:48–57 Louhichi S, Gzara M, Ben-Abdallah H (2017) Unsupervised varied density based clustering algorithm using spline. Pattern Recognit Lett 93:48–57
29.
Zurück zum Zitat MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297 MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297
30.
Zurück zum Zitat Marošević T, Sabo K, Taler P (2013) A mathematical model for uniform distribution voters per constituencies. Croat Oper Res Rev 4:53–64 Marošević T, Sabo K, Taler P (2013) A mathematical model for uniform distribution voters per constituencies. Croat Oper Res Rev 4:53–64
31.
Zurück zum Zitat McCallum A, Nigam K, Ungar LH (2000) Efficient clustering of high-dimensional data sets with application to reference matching. In: International conference on knowledge discovery and data mining. DBLP McCallum A, Nigam K, Ungar LH (2000) Efficient clustering of high-dimensional data sets with application to reference matching. In: International conference on knowledge discovery and data mining. DBLP
32.
Zurück zum Zitat Mimaroglu S, Aksehirli E (2011) Improving DBSCAN’s execution time by using a pruning technique on bit vectors. Pattern Recognit Lett 32:1572–1580 Mimaroglu S, Aksehirli E (2011) Improving DBSCAN’s execution time by using a pruning technique on bit vectors. Pattern Recognit Lett 32:1572–1580
33.
Zurück zum Zitat Morales-Esteban A, Martínez-Álvarez F, Scitovski S, Scitovski R (2014) A fast partitioning algorithm using adaptive Mahalanobis clustering with application to seismic zoning. Comput Geosci 73:132–141 Morales-Esteban A, Martínez-Álvarez F, Scitovski S, Scitovski R (2014) A fast partitioning algorithm using adaptive Mahalanobis clustering with application to seismic zoning. Comput Geosci 73:132–141
34.
Zurück zum Zitat Sabo K, Scitovski R (2015) An approach to cluster separability in a partition. Inf Sci 305:208–218MathSciNetMATH Sabo K, Scitovski R (2015) An approach to cluster separability in a partition. Inf Sci 305:208–218MathSciNetMATH
35.
Zurück zum Zitat Sabo K, Scitovski R, Vazler I (2013) One-dimensional center-based \(l_1\)-clustering method. Optim Lett 7:5–22MathSciNetMATH Sabo K, Scitovski R, Vazler I (2013) One-dimensional center-based \(l_1\)-clustering method. Optim Lett 7:5–22MathSciNetMATH
36.
Zurück zum Zitat Scitovski R (2017) A new global optimization method for a symmetric Lipschitz continuous function and application to searching for a globally optimal partition of a one-dimensional set. J Glob Optim 68:713–727MathSciNetMATH Scitovski R (2017) A new global optimization method for a symmetric Lipschitz continuous function and application to searching for a globally optimal partition of a one-dimensional set. J Glob Optim 68:713–727MathSciNetMATH
37.
Zurück zum Zitat Scitovski R, Marošević T (2014) Multiple circle detection based on center-based clustering. Pattern Recognit Lett 52:9–16 Scitovski R, Marošević T (2014) Multiple circle detection based on center-based clustering. Pattern Recognit Lett 52:9–16
38.
Zurück zum Zitat Scitovski R, Sabo K (2014) Analysis of the \(k\)-means algorithm in the case of data points occurring on the border of two or more clusters. Knowl Based Syst 57:1–7 Scitovski R, Sabo K (2014) Analysis of the \(k\)-means algorithm in the case of data points occurring on the border of two or more clusters. Knowl Based Syst 57:1–7
39.
Zurück zum Zitat Scitovski R, Scitovski S (2013) A fast partitioning algorithm and its application to earthquake investigation. Comput Geosci 59:124–131 Scitovski R, Scitovski S (2013) A fast partitioning algorithm and its application to earthquake investigation. Comput Geosci 59:124–131
40.
Zurück zum Zitat Scitovski R, Vidović I, Bajer D (2016) A new fast fuzzy partitioning algorithm. Expert Syst Appl 51:143–150 Scitovski R, Vidović I, Bajer D (2016) A new fast fuzzy partitioning algorithm. Expert Syst Appl 51:143–150
41.
Zurück zum Zitat Späth H (1983) Cluster-formation und analyse. R. Oldenburg Verlag, MünchenMATH Späth H (1983) Cluster-formation und analyse. R. Oldenburg Verlag, MünchenMATH
42.
Zurück zum Zitat Steinbach M, Tan PN, Potter VKC, Klooster S (2002) Data mining for the discovery of ocean climate indices, In: Mining scientific datasets workshop, 2nd Annual SIAM international conference on data mining Steinbach M, Tan PN, Potter VKC, Klooster S (2002) Data mining for the discovery of ocean climate indices, In: Mining scientific datasets workshop, 2nd Annual SIAM international conference on data mining
43.
Zurück zum Zitat Teboulle M, Berkhin P, Dhilon I, Guan Y, Kogan J (2006) Clustering with entropy-like \(k\)-means algorithms. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data. Springer, Berlin, pp 127–160 Teboulle M, Berkhin P, Dhilon I, Guan Y, Kogan J (2006) Clustering with entropy-like \(k\)-means algorithms. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data. Springer, Berlin, pp 127–160
44.
Zurück zum Zitat Theodoridis S, Koutroumbas K (2009) Pattern recognition, 4th edn. Academic Press, BurlingtonMATH Theodoridis S, Koutroumbas K (2009) Pattern recognition, 4th edn. Academic Press, BurlingtonMATH
45.
Zurück zum Zitat Vendramin L, Campello RJGB, Hruschka ER (2009) On the comparison of relative clustering validity criteria, In: Proceedings of the SIAM international conference on data mining, SDM 2009, April 30–May 2, 2009. SIAM, Sparks, pp 733–744 Vendramin L, Campello RJGB, Hruschka ER (2009) On the comparison of relative clustering validity criteria, In: Proceedings of the SIAM international conference on data mining, SDM 2009, April 30–May 2, 2009. SIAM, Sparks, pp 733–744
46.
Zurück zum Zitat Viswanath P, Babu VS (2009) Rough-DBSCAN: a fast hybrid density based clustering method for large data sets. Pattern Recognit Lett 30:1477–1488 Viswanath P, Babu VS (2009) Rough-DBSCAN: a fast hybrid density based clustering method for large data sets. Pattern Recognit Lett 30:1477–1488
47.
Zurück zum Zitat Wolfram Research I (2016) Mathematica, version 11.0 edition. Wolfram Research, Inc., Champaign Wolfram Research I (2016) Mathematica, version 11.0 edition. Wolfram Research, Inc., Champaign
48.
Zurück zum Zitat Xie J, Gao H, Xie W, Liu X, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted \(K\)-nearest neighbors. Inf Sci 354:19–40 Xie J, Gao H, Xie W, Liu X, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted \(K\)-nearest neighbors. Inf Sci 354:19–40
49.
Zurück zum Zitat Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, New YorkMATH Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, New YorkMATH
50.
Zurück zum Zitat Zhu Y, Ting KM, Carman MJ (2016) Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognit 60:983–997MATH Zhu Y, Ting KM, Carman MJ (2016) Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognit 60:983–997MATH
Metadaten
Titel
DBSCAN-like clustering method for various data densities
verfasst von
Rudolf Scitovski
Kristian Sabo
Publikationsdatum
05.04.2019
Verlag
Springer London
Erschienen in
Pattern Analysis and Applications / Ausgabe 2/2020
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-019-00809-z

Weitere Artikel der Ausgabe 2/2020

Pattern Analysis and Applications 2/2020 Zur Ausgabe

Premium Partner