Published in: Cluster Computing 2/2017

28-02-2017

High-performance data mining with intelligent SSD

Authors: Yong-Yeon Jo, Sang-Wook Kim, Sung-Woo Cho, Duck-Ho Bae, Hyunok Oh


Abstract

An intuitive way to process big data efficiently is to reduce the volume of data transferred over the storage interface to the host system. This is why the notion of the intelligent SSD (iSSD), which gives processing power to the SSD, was proposed. There is a rich literature on iSSD; however, no real implementation has been made available to the public yet. Most prior work aims to quantify the benefits of iSSD through analytical modeling. In this paper, we first develop an iSSD simulator and use it to demonstrate the potential of iSSD in data mining. Our iSSD simulator runs on top of the gem5 simulator and fully simulates all the processes of data mining algorithms running in iSSD with cycle-level accuracy. We then address how to exploit all the computing resources for efficient processing of data mining algorithms. Today, CPU, GPU, and SSD are equipped together in most computing environments. If the SSD is later replaced with an iSSD, we obtain a new computing environment in which the three computing resources collaborate with one another to process big data effectively. For this, scheduling is required to decide which computing resource runs which function at which time. Our heterogeneous scheduling considers the types of computing resources, the memory sizes of the computing resources, and inter-processor communication times, including I/O time in the SSD. Our scheduling results show that processing in the collaborative environment outperforms processing in the traditional one by up to about 10 times.
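
To make the scheduling problem in the abstract concrete, the following is a minimal sketch of a generic greedy earliest-finish-time placement over three resources. It is not the scheduling scheme of the paper: the task names, execution times, memory sizes, and link costs are all hypothetical values chosen for illustration, and the heuristic is a textbook list-scheduling approach that simply takes into account the same three factors the abstract lists (resource type, memory size, and communication time).

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    cost: dict      # resource name -> execution time (hypothetical units)
    mem: int        # working-set size needed on the chosen resource
    out_size: int   # data volume handed to successor tasks
    deps: list = field(default_factory=list)

def schedule(tasks, mem_cap, link_cost):
    """Greedy earliest-finish-time placement.

    tasks must be listed in topological order. mem_cap maps each resource
    to its memory capacity; link_cost maps an unordered resource pair to
    the transfer time per unit of data.
    """
    ready_at = {r: 0.0 for r in mem_cap}   # when each resource becomes free
    placed = {}                            # task name -> (resource, start, finish)
    by_name = {t.name: t for t in tasks}
    for t in tasks:
        best = None
        for r, cap in mem_cap.items():
            # Skip resources that cannot run the task or cannot hold its data.
            if r not in t.cost or t.mem > cap:
                continue
            # Inputs arrive once each predecessor has finished and, if it ran
            # on a different resource, its output has been transferred.
            arrive = 0.0
            for d in t.deps:
                d_res, _, d_fin = placed[d]
                transfer = 0.0
                if d_res != r:
                    transfer = by_name[d].out_size * link_cost[frozenset((d_res, r))]
                arrive = max(arrive, d_fin + transfer)
            start = max(ready_at[r], arrive)
            finish = start + t.cost[r]
            if best is None or finish < best[2]:
                best = (r, start, finish)
        if best is None:
            raise ValueError(f"no resource can hold task {t.name}")
        placed[t.name] = best
        ready_at[best[0]] = best[2]
    return placed

# Hypothetical three-stage pipeline: scan the data, compute on it, merge results.
tasks = [
    Task("scan", {"cpu": 10, "issd": 4}, mem=1, out_size=2),
    Task("compute", {"cpu": 6, "gpu": 2, "issd": 12}, mem=4, out_size=1, deps=["scan"]),
    Task("merge", {"cpu": 1, "gpu": 3, "issd": 3}, mem=1, out_size=0, deps=["compute"]),
]
mem_cap = {"cpu": 16, "gpu": 4, "issd": 2}          # assumed capacities
link_cost = {
    frozenset(("cpu", "gpu")): 0.5,                  # assumed per-unit transfer times
    frozenset(("cpu", "issd")): 1.0,
    frozenset(("gpu", "issd")): 1.5,
}

plan = schedule(tasks, mem_cap, link_cost)
for name, (res, start, finish) in plan.items():
    print(f"{name}: {res} from {start} to {finish}")
```

With these assumed numbers the scan stays inside the iSSD, the compute stage moves to the GPU because it does not fit in the iSSD's memory, and the merge lands on the CPU. A real scheduler would also need to model overlapping transfers and would not necessarily be greedy.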


Footnotes
3
The SDF graphs of data mining algorithms are also used in our heterogeneous scheduling scheme in Sect. 5.
 
Metadata
Title: High-performance data mining with intelligent SSD
Authors: Yong-Yeon Jo, Sang-Wook Kim, Sung-Woo Cho, Duck-Ho Bae, Hyunok Oh
Publication date: 28-02-2017
Publisher: Springer US
Published in: Cluster Computing, Issue 2/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-0789-4
