2015 | Original Paper | Book Chapter

Parallel Canopy Clustering on GPUs

Authors: Yusuke Kozawa, Fumitaka Hayashi, Toshiyuki Amagasa, Hiroyuki Kitagawa

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

Canopy clustering is a preprocessing method for standard clustering algorithms such as k-means and hierarchical agglomerative clustering. It can greatly reduce the computational cost of these algorithms. However, a naïve implementation of canopy clustering can itself take a vast amount of time on massive data. To address this problem, we present efficient algorithms and implementations of canopy clustering on GPUs, which have recently evolved into general-purpose many-core processors. We not only accelerate the computation of the original canopy clustering, but also propose an algorithm that uses a grid index. This algorithm partitions the data into cells to reduce redundant computations and, at the same time, to exploit the parallelism of GPUs. Experiments show that the proposed GPU implementations are on average 2 times faster than multi-threaded SIMD implementations on two octa-core CPUs.
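For readers unfamiliar with the method, the following is a minimal sequential sketch of canopy clustering as it is commonly described: a loose threshold decides canopy membership and a tight threshold decides which points may no longer start a new canopy. It is not the chapter's GPU or grid-index algorithm. The function name `canopy_clustering`, the parameters `t1` and `t2`, the Euclidean distance, and the choice of always taking the first remaining point as the next center are all assumptions made for illustration.

```python
import math


def canopy_clustering(points, t1, t2, distance=math.dist):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold, t2 the tight threshold (0 <= t2 <= t1).
    Returns a list of (center_index, member_indices) pairs.
    """
    if not 0 <= t2 <= t1:
        raise ValueError("thresholds must satisfy 0 <= t2 <= t1")

    remaining = list(range(len(points)))  # indices still eligible as centers
    canopies = []
    while remaining:
        # Assumption: deterministic pick; a random pick is also common.
        center = remaining[0]
        # Every point within t1 of the center joins this canopy.
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer start a canopy.
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) > t2]
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
    for center, members in canopy_clustering(pts, t1=2.0, t2=1.0):
        print(center, members)
```

Each iteration computes the distance from one center to every point, which is the cost that becomes prohibitive on massive data when implemented naïvely. A subsequent clustering step such as k-means would then only compare points that share a canopy.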

Title: Parallel Canopy Clustering on GPUs
Authors: Yusuke Kozawa, Fumitaka Hayashi, Toshiyuki Amagasa, Hiroyuki Kitagawa
Publisher: Springer International Publishing
Book: Database and Expert Systems Applications
Print ISBN: 978-3-319-22848-8
Electronic ISBN: 978-3-319-22849-5
Copyright year: 2015
DOI: https://doi.org/10.1007/978-3-319-22849-5_23
