Skip to main content

2015 | OriginalPaper | Buchkapitel

Parallel Canopy Clustering on GPUs

verfasst von : Yusuke Kozawa, Fumitaka Hayashi, Toshiyuki Amagasa, Hiroyuki Kitagawa

Erschienen in: Database and Expert Systems Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Canopy clustering is a preprocessing method for standard clustering algorithms such as k-means and hierarchical agglomerative clustering. Canopy clustering can greatly reduce the computational cost of clustering algorithms. However, canopy clustering itself may also take a vast amount of time for handling massive data, if we naïvely implement it. To address this problem, we present efficient algorithms and implementations of canopy clustering on GPUs, which have evolved recently as general-purpose many-core processors. We not only accelerate the computation of original canopy clustering, but also propose an algorithm using grid index. This algorithm partitions the data into cells to reduce redundant computations and, at the same time, to exploit the parallelism of GPUs. Experiments show that the proposed implementations on the GPU is 2 times faster on average than multi-threaded, SIMD implementations on two octa-core CPUs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the SC, 18:1–18:11 (2009) Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the SC, 18:1–18:11 (2009)
2.
Zurück zum Zitat Böhm, C., Noll, R., Plant, C., Wackersreuther, B.: Density-based clustering using graphics processors. In: Proceedings of the CIKM, pp. 661–670 (2009) Böhm, C., Noll, R., Plant, C., Wackersreuther, B.: Density-based clustering using graphics processors. In: Proceedings of the CIKM, pp. 661–670 (2009)
3.
Zurück zum Zitat Dash, M., Petrutiu, S., Scheuermann, P.: pPOP: Fast yet accurate parallel hierarchical clustering using partitioning. Data Knowl. Eng. 61(3), 563–578 (2007)CrossRef Dash, M., Petrutiu, S., Scheuermann, P.: pPOP: Fast yet accurate parallel hierarchical clustering using partitioning. Data Knowl. Eng. 61(3), 563–578 (2007)CrossRef
4.
Zurück zum Zitat Fan, Z.G., Wu, Y., Wu, B.: Maximum normalized spacing for efficient visual clustering. In: Proceedings of the CIKM, pp. 409–418 (2010) Fan, Z.G., Wu, Y., Wu, B.: Maximum normalized spacing for efficient visual clustering. In: Proceedings of the CIKM, pp. 409–418 (2010)
6.
Zurück zum Zitat Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann, Burlington (2011) Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann, Burlington (2011)
7.
Zurück zum Zitat He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational Query Coprocessing on Graphics Processors. ACM Trans. Database Syst. 34(4), 21:1–21:39 (2009)CrossRef He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational Query Coprocessing on Graphics Processors. ACM Trans. Database Syst. 34(4), 21:1–21:39 (2009)CrossRef
9.
Zurück zum Zitat Kohlhoff, K.J., Pande, V.S., Altman, R.B.: K-Means for parallel architectures using All-Prefix-sum sorting and updating steps. IEEE Trans. Parallel Distrib. Syst. 24(8), 1602–1612 (2013)CrossRefMATH Kohlhoff, K.J., Pande, V.S., Altman, R.B.: K-Means for parallel architectures using All-Prefix-sum sorting and updating steps. IEEE Trans. Parallel Distrib. Syst. 24(8), 1602–1612 (2013)CrossRefMATH
10.
Zurück zum Zitat Li, Y., Zhao, K., Chu, X., Liu, J.: Speeding up k-Means algorithm by GPUs. J. Comput. Syst. Sci. 79(2), 216–229 (2013)MathSciNetCrossRef Li, Y., Zhao, K., Chu, X., Liu, J.: Speeding up k-Means algorithm by GPUs. J. Comput. Syst. Sci. 79(2), 216–229 (2013)MathSciNetCrossRef
11.
Zurück zum Zitat Li, Q., Wang, P., Wang, W., Hu, H., Li, Z., Li, J.: An efficient K-means clustering algorithm on MapReduce. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) DASFAA 2014, Part I. LNCS, vol. 8421, pp. 357–371. Springer, Heidelberg (2014) CrossRef Li, Q., Wang, P., Wang, W., Hu, H., Li, Z., Li, J.: An efficient K-means clustering algorithm on MapReduce. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) DASFAA 2014, Part I. LNCS, vol. 8421, pp. 357–371. Springer, Heidelberg (2014) CrossRef
12.
Zurück zum Zitat McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of High-dimensional data sets with application to reference matching. In: Proceedings of the KDD, pp. 169–178 (2000) McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of High-dimensional data sets with application to reference matching. In: Proceedings of the KDD, pp. 169–178 (2000)
14.
Zurück zum Zitat Owens, J.D., Houston, M., Luebke, D., Green, S., Stone, J.E., Phillips, J.C.: Proc. IEEE GPU Comput. 96(5), 879–899 (2008) Owens, J.D., Houston, M., Luebke, D., Green, S., Stone, J.E., Phillips, J.C.: Proc. IEEE GPU Comput. 96(5), 879–899 (2008)
15.
Zurück zum Zitat Patwary, M.A., Palsetia, D., Agrawal, A., Liao, W.k., Manne, F., Choudhary, A.: A new scalable parallel DBSCAN algorithm using the disjoint-set data structure. In: SC, pp. 62:1–62:11 (2012) Patwary, M.A., Palsetia, D., Agrawal, A., Liao, W.k., Manne, F., Choudhary, A.: A new scalable parallel DBSCAN algorithm using the disjoint-set data structure. In: SC, pp. 62:1–62:11 (2012)
16.
Zurück zum Zitat Shalom, S.A.A., Dash, M.: Efficient partitioning based hierarchical agglomerative clustering using graphics accelerators with CUDA. Int. J. Artif. Intell. Appl. 4(2), 13–33 (2013) Shalom, S.A.A., Dash, M.: Efficient partitioning based hierarchical agglomerative clustering using graphics accelerators with CUDA. Int. J. Artif. Intell. Appl. 4(2), 13–33 (2013)
17.
Zurück zum Zitat Soroush, E., Balazinska, M., Wang, D.: ArrayStore: a storage manager for complex parallel array processing. In: SIGMOD, pp. 253–264 (2011) Soroush, E., Balazinska, M., Wang, D.: ArrayStore: a storage manager for complex parallel array processing. In: SIGMOD, pp. 253–264 (2011)
18.
Zurück zum Zitat Wasif, M., Narayanan, P.: Scalable clustering using multiple GPUs. In: HiPC, pp. 1–10 (2011) Wasif, M., Narayanan, P.: Scalable clustering using multiple GPUs. In: HiPC, pp. 1–10 (2011)
19.
Zurück zum Zitat Welton, B., Samanas, E., Miller, B.P.: Mr. Scan: Extreme scale density-based clustering using a tree-based network of GPGPU nodes. In: SC, 84:1–84:11 (2013) Welton, B., Samanas, E., Miller, B.P.: Mr. Scan: Extreme scale density-based clustering using a tree-based network of GPGPU nodes. In: SC, 84:1–84:11 (2013)
20.
Zurück zum Zitat Wu, H., Diamos, G., Cadambi, S., Yalamanchili, S.: Kernel weaver: automatically fusing database primitives for efficient GPU computation. In: MICRO, pp. 107–118 (2012) Wu, H., Diamos, G., Cadambi, S., Yalamanchili, S.: Kernel weaver: automatically fusing database primitives for efficient GPU computation. In: MICRO, pp. 107–118 (2012)
Metadaten
Titel
Parallel Canopy Clustering on GPUs
verfasst von
Yusuke Kozawa
Fumitaka Hayashi
Toshiyuki Amagasa
Hiroyuki Kitagawa
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_23