Published in: International Journal of Parallel Programming 3/2019

09 April 2019

A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks

Authors: Bo Wang, Jie Tang, Rui Zhang, Wei Ding, Deyu Qi


Abstract

Artificial intelligence applications that depend heavily on deep learning and computer vision processing have become popular. Their strong demand for low-latency or real-time services makes Spark, an in-memory big data computing framework, the best choice to replace earlier disk-based big data computing. For an in-memory framework, a reasonable arrangement of data in storage is the key factor in performance. However, existing optimizations based on cache replacement strategies and storage selection mechanisms all rely on an imprecise model of available memory, which leads to poor decisions. To address this issue, we propose an available memory model that captures accurate information about the memory space to be freed by sensing the dependencies between the data. We also propose a maximum memory requirement model for execution prediction that excludes the redundancy from inactive blocks. With these two models, we build DASS, a dependency-aware storage selection mechanism for Spark that makes dynamic and fine-grained storage decisions. Our experiments show that, compared with previous methods, DASS effectively reduces the cost of garbage collection and of re-computing RDD blocks, and improves computing performance by 77.4%.
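
The difference between a dependency-unaware and a dependency-aware estimate of available memory can be illustrated with a small sketch. The Python code below is not DASS and is not taken from the paper. It assumes a hypothetical `Block` record that carries a size and a count of remaining downstream references, and it treats cached blocks with no remaining references as memory that is about to be freed. All names, and the simplified formula, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical cached block: name, size in MB, and how many
    not-yet-run computations still depend on it."""
    name: str
    size_mb: int
    remaining_refs: int


def naive_available(capacity_mb, blocks):
    """Available memory if every cached block is assumed to stay resident."""
    return capacity_mb - sum(b.size_mb for b in blocks)


def dependency_aware_available(capacity_mb, blocks):
    """Available memory when blocks with no remaining dependents are
    counted as space that is about to be freed (an assumed simplification)."""
    live = sum(b.size_mb for b in blocks if b.remaining_refs > 0)
    return capacity_mb - live


def can_cache_in_memory(new_block_mb, capacity_mb, blocks):
    """Illustrative storage decision: keep the new block in memory only if
    it fits within the dependency-aware estimate."""
    return new_block_mb <= dependency_aware_available(capacity_mb, blocks)


# Example data (made up for illustration).
cached = [
    Block("rdd_1_0", 300, remaining_refs=0),  # no later computation reads it
    Block("rdd_2_0", 400, remaining_refs=2),  # still needed downstream
]
```

With a capacity of 1000 MB, the naive estimate leaves 300 MB while the dependency-aware estimate leaves 600 MB, so a 500 MB block would be kept in memory only under the second estimate.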


Footnotes
1. An extended algorithm of CSAS.
 
References
1. Yu, Y., Wang, W., Zhang, J., Letaief, K.B.: LRC: dependency-aware cache management for data analytics clusters (2017)
2. Liu, Z., Ng, T.S.E.: Leaky buffer: a novel abstraction for relieving memory pressure from cluster data processing frameworks. IEEE Trans. Parallel Distrib. Syst. 28(1), 128–140 (2017)
8. Saha, B., Shah, H., Seth, S., Vijayaraghavan, G., Murthy, A., Curino, C.: Apache Tez: a unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp. 1357–1369 (2015). https://doi.org/10.1145/2723372.2742790
10. Nicolae, B., Costa, C.H.A., Misale, C., Katrinis, K., Park, Y.: Leveraging adaptive I/O to optimize collective data shuffling patterns for big data analytics. IEEE Trans. Parallel Distrib. Syst. 28(6), 1663–1674 (2017)
14. Nguyen, K., Wang, K., Bu, Y., Fang, L., Hu, J., Xu, G.: Facade: a compiler and runtime for (almost) object-bounded big data applications. SIGPLAN Not. 50(4), 675–690 (2015)
15. Koliopoulos, A.K., Yiapanis, P., Tekiner, F., Nenadic, G., Keane, J.: Towards automatic memory tuning for in-memory big data analytics in clusters. In: Proceedings 2016 IEEE international congress on big data (BigData congress), pp. 353–356 (2016)
16. Wang, B., Tang, J., Zhang, R., Gu, Z.: CSAS: cost-based storage auto-selection, a fine grained storage selection mechanism for spark. In: Proceedings network and parallel computing: 14th IFIP WG 10.3 international conference (NPC 2017), pp. 150–154 (2017). https://doi.org/10.1007/978-3-319-68210-5_18
17. Li, M., Tan, J., Wang, Y., Zhang, L., Salapura, V.: Sparkbench: a comprehensive benchmarking suite for in memory data analytic platform spark. In: Proceedings the 12th ACM international conference on computing frontiers, pp. 1–8 (2015). https://doi.org/10.1145/2742854.2747283
18. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings the 9th USENIX conference on networked systems design and implementation, pp. 2 (2012)
20. Chen, Q.A., et al.: Parameter optimization for spark jobs based on runtime data analysis. China Comput. Eng. Sci. 38(1), 11–19 (2016)
22. Wang, G.L., et al.: A performance automatic optimization method for spark, Patent CN 105868019 A (2016)
23. Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of mapreduce programs. In: Proceedings of the VLDB, pp. 1111–1122 (2011)
24. Geng, Y., Shi, X., Pei, C., Jin, H., Jiang, W.: LCS: an efficient data eviction strategy for Spark. Int. J. Parallel Program. 45, 1–13 (2016)
25. Duan, M., et al.: Selection and replacement algorithms for memory performance improvement in spark. Concurr. Comput. Pract. Exp. 28(8), 2473–2486 (2016)
26. Zhao, Y., et al.: An adaptive tuning strategy on spark based on in-memory computation characteristics. In: Proceedings ICACT, pp. 484–488 (2016)
Metadata
Title
A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks
Authors
Bo Wang
Jie Tang
Rui Zhang
Wei Ding
Deyu Qi
Publication date
09 April 2019
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 3/2019
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-018-0612-8
