International Journal of Parallel Programming

9 April 2019

A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks

Authors: Bo Wang, Jie Tang, Rui Zhang, Wei Ding, Deyu Qi

Published in: International Journal of Parallel Programming | Issue 3/2019

Abstract

Artificial intelligence applications that depend heavily on deep learning and computer vision processing have become popular. Their strong demand for low-latency or real-time services makes Spark, an in-memory big data computing framework, the preferred replacement for earlier disk-based big data computing. For an in-memory framework, a reasonable arrangement of data in storage is the key factor in performance. However, existing optimizations based on cache replacement strategies and storage selection mechanisms all rely on an imprecise model of available memory, which leads to poor decisions. To address this issue, we propose an available memory model that captures accurate information about the memory space that is about to be freed by sensing the dependencies between data. We also propose a maximum memory requirement model for execution prediction, which excludes the redundancy introduced by inactive blocks. With these two models, we build DASS, a dependency-aware storage selection mechanism for Spark that makes dynamic and fine-grained storage decisions. Our experiments show that, compared with previous methods, DASS effectively reduces the cost of garbage collection and of re-computing RDD blocks, and improves computing performance by 77.4%.
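The idea of counting memory that is about to be freed can be illustrated with a small sketch. This is not the DASS implementation, and the abstract does not give the paper's actual models. The names `CachedBlock`, `to_be_freed_mb`, `available_mb` and `choose_storage`, and the rule that a block becomes freeable once all stages that read it have finished, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CachedBlock:
    """A cached data block (hypothetical stand-in for an RDD block)."""
    name: str
    size_mb: int
    # Stages that still need to read this block (its downstream dependents).
    consumers: set = field(default_factory=set)


def to_be_freed_mb(blocks, finished_stages):
    """Memory held by blocks whose consumers have all finished (assumed rule)."""
    return sum(b.size_mb for b in blocks if b.consumers <= finished_stages)


def available_mb(capacity_mb, blocks, finished_stages):
    """Currently free memory plus memory that is about to be freed."""
    used = sum(b.size_mb for b in blocks)
    return capacity_mb - used + to_be_freed_mb(blocks, finished_stages)


def choose_storage(new_block_mb, capacity_mb, blocks, finished_stages):
    """Keep the new block in memory only if the dependency-aware estimate fits it."""
    if new_block_mb <= available_mb(capacity_mb, blocks, finished_stages):
        return "MEMORY"
    return "DISK"


cached = [
    CachedBlock("a", 300, {1}),
    CachedBlock("b", 400, {1, 2}),
    CachedBlock("c", 200, {3}),
]
print(available_mb(1000, cached, finished_stages={1}))         # 400
print(choose_storage(350, 1000, cached, finished_stages={1}))  # MEMORY
```

In this toy setup only 100 MB is free right now, so a model that ignores dependencies would send the 350 MB block to disk. Counting block `a`, whose only consumer has finished, raises the estimate to 400 MB and keeps the block in memory.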

An extended algorithm of CSAS.

Title: A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks
Authors: Bo Wang, Jie Tang, Rui Zhang, Wei Ding, Deyu Qi
Publication date: 9 April 2019
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 3/2019
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-018-0612-8