Skip to main content
Erschienen in: Soft Computing 3/2014

01.03.2014 | Methodologies and Application

A parallel and scalable CAST-based clustering algorithm on GPU

verfasst von: Kawuu W. Lin, Chun-Hung Lin, Chun-Yuan Hsiao

Erschienen in: Soft Computing | Ausgabe 3/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The advances in nanometer technology and integrated circuit technology enable the graphics card to attach individual memory and one or more processing units, named GPU, in which most of the graphing instructions can be processed in parallel. Obviously, the computation resource can be used to improve the execution efficiency of not only graphing applications but other time consuming applications like data mining. The Clustering Affinity Search Technique is a famous clustering algorithm, which is widely used in clustering the biological data. In this paper, we will propose an algorithm that can utilize the GPU and the individual memory of graphics card to accelerate the execution. The experimental results show that our proposed algorithm can deliver excellent performance in terms of execution time and is scalable to very large databases.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile
Zurück zum Zitat Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle, Washington Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle, Washington
Zurück zum Zitat Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadephia, Pennsylvania, USA, pp 49–60 Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadephia, Pennsylvania, USA, pp 49–60
Zurück zum Zitat Bai H, He L, Ouyang D, Li Z, Li H (2009) K-means on commodity GPUs with CUDA. In: Proceedings of the WRI World Congress on Computer Science and Information Engineering, Los Angeles, California USA Bai H, He L, Ouyang D, Li Z, Li H (2009) K-means on commodity GPUs with CUDA. In: Proceedings of the WRI World Congress on Computer Science and Information Engineering, Los Angeles, California USA
Zurück zum Zitat Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6(3/4):281–297 Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6(3/4):281–297
Zurück zum Zitat Cheeseman P, Stutz J (1996) Bayesian classification (AutoClass): theory and results. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, pp 153–180 Cheeseman P, Stutz J (1996) Bayesian classification (AutoClass): theory and results. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, pp 153–180
Zurück zum Zitat Dagdeviren O, Erciyes K (2010) Graph matching-based distributed clustering and backbone formation algorithms for sensor networks. Comput J 53:1553–1575CrossRef Dagdeviren O, Erciyes K (2010) Graph matching-based distributed clustering and backbone formation algorithms for sensor networks. Comput J 53:1553–1575CrossRef
Zurück zum Zitat Ester M, Kriegel H-P, Sander J, Xu X, (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, Portland, Orgon, pp 226–231 Ester M, Kriegel H-P, Sander J, Xu X, (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, Portland, Orgon, pp 226–231
Zurück zum Zitat Fang W, Lau KK, Lu M, Xiao X, Lam CK, Yang PY, He BS, Luo Q, Sander PV, Yang K (2008) Parallel data mining on graphics processors. Hong Kong University of Science and Technology Tech Rep (HKUSTCS0807) Fang W, Lau KK, Lu M, Xiao X, Lam CK, Yang PY, He BS, Luo Q, Sander PV, Yang K (2008) Parallel data mining on graphics processors. Hong Kong University of Science and Technology Tech Rep (HKUSTCS0807)
Zurück zum Zitat Fisher D (1987) Improving inference through conceptual clustering. In: Proceedings of the sixth national conference on artificial intelligence, vol 2, pp 461–465 Fisher D (1987) Improving inference through conceptual clustering. In: Proceedings of the sixth national conference on artificial intelligence, vol 2, pp 461–465
Zurück zum Zitat Gennari JH, Langley P, Fisher D (1989) Models of incremental concept formation. Artif Intell 40(1–3):11–61CrossRef Gennari JH, Langley P, Fisher D (1989) Models of incremental concept formation. Artif Intell 40(1–3):11–61CrossRef
Zurück zum Zitat Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, New York, pp 73–84 Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, New York, pp 73–84
Zurück zum Zitat Guha S, Rastogi R, Shim K (2000) ROCK: a robust clustering algorithm for categorical attributes. Inf Syst 25(5):345–366 Guha S, Rastogi R, Shim K (2000) ROCK: a robust clustering algorithm for categorical attributes. Inf Syst 25(5):345–366
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 1–12
Zurück zum Zitat Karypis G, Han EH, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. Computer 32(8): 68–75CrossRef Karypis G, Han EH, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. Computer 32(8): 68–75CrossRef
Zurück zum Zitat Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Zurück zum Zitat Kohonen T (1990) The self-organizing map. Proc IEEE, pp 1464–1480 Kohonen T (1990) The self-organizing map. Proc IEEE, pp 1464–1480
Zurück zum Zitat Lin KW, Deng DJ (2010) A novel parallel algorithm for frequent pattern mining with privacy preserved in cloud computing environments. Int J Ad Hoc Ubiquitous Comput 6(4):205–215CrossRef Lin KW, Deng DJ (2010) A novel parallel algorithm for frequent pattern mining with privacy preserved in cloud computing environments. Int J Ad Hoc Ubiquitous Comput 6(4):205–215CrossRef
Zurück zum Zitat McQueen JB (1967) Some methods of classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297 McQueen JB (1967) Some methods of classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
Zurück zum Zitat Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th VLDB conference, Santiago, Chile, pp 144–155 Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th VLDB conference, Santiago, Chile, pp 144–155
Zurück zum Zitat Sheikholeslami G, Chatterjee S, Zhang A (1998) WaveCluster : a multi-resolution clustering approach for very large spatial databases. In: Proceedings of the 24rd international conference on very large data bases Sheikholeslami G, Chatterjee S, Zhang A (1998) WaveCluster : a multi-resolution clustering approach for very large spatial databases. In: Proceedings of the 24rd international conference on very large data bases
Zurück zum Zitat Spanakis G, Siolas G, Stafylopatis A (2012) Exploiting wikipedia knowledge for conceptual hierarchical clustering of documents. Comput J 55:299–312CrossRef Spanakis G, Siolas G, Stafylopatis A (2012) Exploiting wikipedia knowledge for conceptual hierarchical clustering of documents. Comput J 55:299–312CrossRef
Zurück zum Zitat Tseng S, Kao C-P (2003) Mining and validation gene expression patterns: an integrated approach and applications. Informatica 27:21–27MathSciNet Tseng S, Kao C-P (2003) Mining and validation gene expression patterns: an integrated approach and applications. Informatica 27:21–27MathSciNet
Zurück zum Zitat Voorhees EM (1986) Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf Process Manag 22(6):465–476CrossRef Voorhees EM (1986) Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf Process Manag 22(6):465–476CrossRef
Zurück zum Zitat Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd international conference on very large data bases, pp 186–195 Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd international conference on very large data bases, pp 186–195
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. in: Proceedings of the 1996 ACM SIGMOD international conference on management of data, pp 103–114 Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. in: Proceedings of the 1996 ACM SIGMOD international conference on management of data, pp 103–114
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M, (1997) BIRCH: a new data clustering algorithm and its applications. Data Min Knowl Discov, pp 141–182 Zhang T, Ramakrishnan R, Livny M, (1997) BIRCH: a new data clustering algorithm and its applications. Data Min Knowl Discov, pp 141–182
Zurück zum Zitat Zhou J, Yu K-M, Wu B-C (2010) Parallel frequent patterns mining algorithm on GPU. In: Proceedings of the 2010 IEEE international conference on systems man and cybernetics, pp 435–440 Zhou J, Yu K-M, Wu B-C (2010) Parallel frequent patterns mining algorithm on GPU. In: Proceedings of the 2010 IEEE international conference on systems man and cybernetics, pp 435–440
Metadaten
Titel
A parallel and scalable CAST-based clustering algorithm on GPU
verfasst von
Kawuu W. Lin
Chun-Hung Lin
Chun-Yuan Hsiao
Publikationsdatum
01.03.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 3/2014
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-013-1074-y

Weitere Artikel der Ausgabe 3/2014

Soft Computing 3/2014 Zur Ausgabe