Skip to main content
Top
Published in: International Journal of Data Science and Analytics 4/2018

30-01-2018 | Regular Paper

Parallel edge-based visual assessment of cluster tendency on GPU

Authors: Tao Meng, Bo Yuan

Published in: International Journal of Data Science and Analytics | Issue 4/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The visual assessment of (cluster) tendency (VAT) algorithm is an effective tool for investigating cluster tendency, which produces an intuitive image of matrix as the representation of complex datasets. The improved VAT (iVAT) incorporates a path-based distance metric into VAT to improve its effectiveness on complex-shaped datasets. The efficient formulation of the iVAT algorithm (efiVAT) further reduces the computational complexity of iVAT from \(O(N^3)\) to \(O(N^2)\). In this paper, we propose eVAT, an edge-based algorithm that can replicate the output of efiVAT but is more efficient and more suitable for parallelism. We also propose a parallel scheme to accelerate eVAT using NVIDIA GPU and CUDA architecture. We show that, on a range of datasets, the GPU-based eVAT features good scalability and can achieve significant speedups.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Bezdek, J.C., Hathaway, R.J.: VAT: a tool for visual assessment of (cluster) tendency. In: International Joint Conference on Neural Networks, pp. 2225–2230 (2002) Bezdek, J.C., Hathaway, R.J.: VAT: a tool for visual assessment of (cluster) tendency. In: International Joint Conference on Neural Networks, pp. 2225–2230 (2002)
2.
go back to reference Bezdek, J.C., Hathaway, R.J., Huband, J.M.: Visual assessment of clustering tendency for rectangular dissimilarity matrices. IEEE Trans. Fuzzy Syst. 15(5), 890–903 (2007)CrossRef Bezdek, J.C., Hathaway, R.J., Huband, J.M.: Visual assessment of clustering tendency for rectangular dissimilarity matrices. IEEE Trans. Fuzzy Syst. 15(5), 890–903 (2007)CrossRef
3.
go back to reference Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs. Elsevier, Hoboken (2012) Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs. Elsevier, Hoboken (2012)
4.
go back to reference Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)MATH
6.
go back to reference Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. In: International Workshop on Data Management on New Hardware, Damon 2009, pp. 34–42. Providence June (2009) Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. In: International Workshop on Data Management on New Hardware, Damon 2009, pp. 34–42. Providence June (2009)
7.
go back to reference Farber, R.: CUDA Application Design and Development. Morgan Kaufmann Publishers Inc., Burlington (2011) Farber, R.: CUDA Application Design and Development. Morgan Kaufmann Publishers Inc., Burlington (2011)
8.
go back to reference Govindaraju, N., Gray, J., Kumar, R., Manocha, D.: GPUTeraSort: high performance graphics co-processor sorting for large database management. In: ACM SIGMOD International Conference on Management of Data, pp. 325–336. Chicago June (2006) Govindaraju, N., Gray, J., Kumar, R., Manocha, D.: GPUTeraSort: high performance graphics co-processor sorting for large database management. In: ACM SIGMOD International Conference on Management of Data, pp. 325–336. Chicago June (2006)
9.
go back to reference Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: ACM SIGMOD International Conference on Management of Data, pp. 215–226. Paris June (2004) Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: ACM SIGMOD International Conference on Management of Data, pp. 215–226. Paris June (2004)
10.
go back to reference Govindaraju, N.K., Raghuvanshi, N., Manocha, D.: Fast and approximate stream mining of quantiles and frequencies using graphics processors. In: ACM SIGMOD International Conference on Management of Data, pp. 611–622 (2005) Govindaraju, N.K., Raghuvanshi, N., Manocha, D.: Fast and approximate stream mining of quantiles and frequencies using graphics processors. In: ACM SIGMOD International Conference on Management of Data, pp. 611–622 (2005)
11.
go back to reference Hathaway, R.J., Bezdek, J.C., Huband, J.M.: Scalable visual assessment of cluster tendency for large data sets. Pattern Recognit. 39(7), 1315–1324 (2006)CrossRef Hathaway, R.J., Bezdek, J.C., Huband, J.M.: Scalable visual assessment of cluster tendency for large data sets. Pattern Recognit. 39(7), 1315–1324 (2006)CrossRef
12.
go back to reference Havens, T.C., Bezdek, J.C.: An efficient f of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans. Knowl. Data Eng. 24(5), 813–822 (2012)CrossRef Havens, T.C., Bezdek, J.C.: An efficient f of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans. Knowl. Data Eng. 24(5), 813–822 (2012)CrossRef
13.
go back to reference Havens, T.C., Bezdek, J.C., Keller, J.M., Popescu, M.: Clustering in ordered dissimilarity data. Int. J. Intell. Syst. 24(5), 504–528 (2009)CrossRef Havens, T.C., Bezdek, J.C., Keller, J.M., Popescu, M.: Clustering in ordered dissimilarity data. Int. J. Intell. Syst. 24(5), 504–528 (2009)CrossRef
14.
go back to reference He, B., Govindaraju, N.K., Luo, Q., Smith, B.: Efficient gather and scatter operations on graphics processors. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, p. 46 (2007) He, B., Govindaraju, N.K., Luo, Q., Smith, B.: Efficient gather and scatter operations on graphics processors. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, p. 46 (2007)
15.
go back to reference He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., Sander, P.: Relational joins on graphics processors. In: ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 511–524, Vancouver June (2008) He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., Sander, P.: Relational joins on graphics processors. In: ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 511–524, Vancouver June (2008)
16.
go back to reference Huband, J.M., Bezdek, J.C., Hathaway, R.J.: Revised visual assessment of (cluster) tendency (reVAT). In: International Conference of the North American Fuzzy Information Processing Society, vol. 1, pp. 101–104 (2004) Huband, J.M., Bezdek, J.C., Hathaway, R.J.: Revised visual assessment of (cluster) tendency (reVAT). In: International Conference of the North American Fuzzy Information Processing Society, vol. 1, pp. 101–104 (2004)
17.
go back to reference Huband, J.M., Bezdek, J.C., Hathaway, R.J.: bigVAT: visual assessment of cluster tendency for large data sets. Pattern Recognit. 38(11), 1875–1886 (2005)CrossRef Huband, J.M., Bezdek, J.C., Hathaway, R.J.: bigVAT: visual assessment of cluster tendency for large data sets. Pattern Recognit. 38(11), 1875–1886 (2005)CrossRef
18.
go back to reference Larsen, E.S., Mcallister, D.: Fast matrix multiplies using graphics hardware. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 43–43 (2001) Larsen, E.S., Mcallister, D.: Fast matrix multiplies using graphics hardware. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 43–43 (2001)
20.
go back to reference Meng, T., Yuan, B.: Parallel visual assessment of cluster tendency on GPU. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 429–440. Springer (2017) Meng, T., Yuan, B.: Parallel visual assessment of cluster tendency on GPU. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 429–440. Springer (2017)
22.
go back to reference Pakhira, M.K.: Finding number of clusters before finding clusters. Procedia Technol. 4(11), 27–37 (2012)CrossRef Pakhira, M.K.: Finding number of clusters before finding clusters. Procedia Technol. 4(11), 27–37 (2012)CrossRef
23.
go back to reference Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(10), 2825–2830 (2013)MathSciNetMATH Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(10), 2825–2830 (2013)MathSciNetMATH
24.
go back to reference Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36(6), 1389–1401 (2014)CrossRef Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36(6), 1389–1401 (2014)CrossRef
25.
go back to reference Sart, D., Mueen, A., Najjar, W., Keogh, E., Niennattrakul, V.: Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In: ICDM 2010, The IEEE International Conference on Data Mining, , pp. 14–17. Sydney December (2010) Sart, D., Mueen, A., Najjar, W., Keogh, E., Niennattrakul, V.: Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In: ICDM 2010, The IEEE International Conference on Data Mining, , pp. 14–17. Sydney December (2010)
26.
go back to reference Sledge, I.J., Huband, J.M., Bezdek, J.C.: (Automatic) Cluster count extraction from unlabeled data sets. In: 5th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 3–13 (2008) Sledge, I.J., Huband, J.M., Bezdek, J.C.: (Automatic) Cluster count extraction from unlabeled data sets. In: 5th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 3–13 (2008)
27.
go back to reference Wang, L., Geng, X., Bezdek, J., Leckie, C., Kotagiri, R.: SpecVAT: enhanced visual cluster analysis. In: IEEE International Conference on Data Mining, pp. 638–647 (2008) Wang, L., Geng, X., Bezdek, J., Leckie, C., Kotagiri, R.: SpecVAT: enhanced visual cluster analysis. In: IEEE International Conference on Data Mining, pp. 638–647 (2008)
28.
go back to reference Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J.: Automatically determining the number of clusters in unlabeled data sets. IEEE Trans. Knowl. Data Eng. 21(3), 335–350 (2009)CrossRef Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J.: Automatically determining the number of clusters in unlabeled data sets. IEEE Trans. Knowl. Data Eng. 21(3), 335–350 (2009)CrossRef
29.
go back to reference Wang, L., Nguyen, U.T., Bezdek, J.C., Leckie, C.A., Ramamohanarao, K.: iVAT and aVAT: enhanced visual analysis for cluster tendency assessment. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 16–27. Springer (2010) Wang, L., Nguyen, U.T., Bezdek, J.C., Leckie, C.A., Ramamohanarao, K.: iVAT and aVAT: enhanced visual analysis for cluster tendency assessment. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 16–27. Springer (2010)
30.
go back to reference Wilt, N.: The CUDA Handbook: A Comprehensive Guide to GPU Programming. Pearson Education, London (2013) Wilt, N.: The CUDA Handbook: A Comprehensive Guide to GPU Programming. Pearson Education, London (2013)
Metadata
Title
Parallel edge-based visual assessment of cluster tendency on GPU
Authors
Tao Meng
Bo Yuan
Publication date
30-01-2018
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 4/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-018-0100-7

Other articles of this Issue 4/2018

International Journal of Data Science and Analytics 4/2018 Go to the issue

Premium Partner