2018 | OriginalPaper | Book chapter

Distributed DBSCAN Algorithm – Concept and Experimental Evaluation

Abstract

One of the most popular clustering algorithms is DBSCAN, which is known to be efficient and highly resistant to noise. In this paper we propose a distributed implementation of it. Distributed computing is a rapidly growing approach to solving problems on big datasets using a multinode cluster rather than parallelization within a single computer. Using its features properly can lead to higher performance and, probably more importantly, higher scalability. To show the added value of this way of designing and implementing algorithms, we compare our results with GPU parallelization. On the basis of the obtained results, we formulate proposals for improving our solution.
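
For readers unfamiliar with DBSCAN, the following is a minimal single-node sketch of the classic algorithm in Python. It is not the distributed implementation proposed in the chapter, which this page does not describe. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, and the brute-force neighbour search is an assumption made for brevity.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point `(50, 50)` is labelled as noise instead of being forced into a cluster, which illustrates the noise resistance mentioned in the abstract. Each neighbourhood query here scans the whole dataset, so the sketch costs quadratic time and is meant only to convey the idea.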

Metadata
Title
Distributed DBSCAN Algorithm – Concept and Experimental Evaluation
Authors
Adam Merk
Piotr Cal
Michał Woźniak
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-59162-9_49