
2019 | OriginalPaper | Book chapter

nativeNDP: Processing Big Data Analytics on Native Storage Nodes

Written by: Tobias Vinçon, Sergey Hardock, Christian Riegger, Andreas Koch, Ilia Petrov

Published in: Advances in Databases and Information Systems

Publisher: Springer International Publishing


Abstract

Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet data cleansing, preparation, dataset characterization, and the computation of statistics or metrics are frequent steps. They are mostly performed ad hoc, in an explorative manner, and mandate low response times. However, such steps are I/O-intensive and typically very slow because of low data locality and inadequate interfaces and abstractions along the stack. The usual result is prohibitively expensive scans of the full dataset and transformations at interface boundaries.
In this paper, we examine R as an analytical tool that manages large persistent datasets in Ceph, a widespread cluster file system. We propose nativeNDP, a framework for Near-Data Processing that pushes down primitive R tasks and executes them in situ, directly within the storage device of a cluster node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.


Metadata
Title
nativeNDP: Processing Big Data Analytics on Native Storage Nodes
Written by
Tobias Vinçon
Sergey Hardock
Christian Riegger
Andreas Koch
Ilia Petrov
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-28730-6_9
