Top

The Journal of Supercomputing

Published in:

27-07-2022

Memory management optimization strategy in Spark framework based on less contention

Authors: Yixin Song, Junyang Yu, JinJiang Wang, Xin He

Published in: The Journal of Supercomputing | Issue 2/2023

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

The parallel computing framework Spark 2.x adopts a unified memory management model. In the case of the memory bottleneck, the memory allocation of active tasks and the RDD(Resilient Distributed Datasets) cache causes memory contention, which may reduce computing resource utilization and persistence acceleration effects, thus affecting program execution efficiency. To this end, we propose less contention management strategy, abbreviated as MCM, to reduce the negative impact of memory contention. MCM is divided into two steps: Firstly, the task minimum memory priority guarantee algorithm is priority to meet the minimum resources of tasks for execution, to optimize the number of active tasks. Secondly, considering contention costs, the persisted location selection algorithm dynamically selects the best storage location to improve the effect of persistence acceleration. The experimental results comfirm that MCM has wonderful adaptability and scalability. In the case of serious memory bottleneck, MCM obviously reduces job execution time. Compared with similar works, such as only_memory, only_disk, memory_and_disk, DMAOM and SACM, MCM reduces the execution time by 28.3% .

previous article A hybrid bi-objective scheduling algorithm for execution of scientific workflows on cloud platforms with execution time and reliability approach

next article Application of edge computing combined with deep learning model in the dynamic evolution of network public opinion in emergencies

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Mostafaeipour A, Rafsanjani AJ, Ahmadi M, Dhanraj JA (2020) Investigating the performance of hadoop and spark platforms on machine learning algorithms. J Supercomput 77(2):1–28

Apache. Apache Spark. https://spark.apache.org. Accessed 24 Oct 2021

Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning Spark. O’Reilly Media Inc. Sebastopol pp 1-30

Zhang XW, Li ZH, Liu GS, Xu JJ, Xie TK (2018) A spark scheduling strategy for heterogeneous cluster. Comput Mater Continua 55(3):405–417

Ahmed N, Barczak A, Susnjak T et al (2020) A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench. J Big Data 110(7):1–18

Hu ZY, Shi XH, Ke ZX et al (2020) Estimating the memory consumption of big data applications based on program analysis. Scientia Sinica Inf 50(8):1178–1196

Hong-tao M, Song-ping Y, Fang L, Nong XIAO et al (2017) Research on memory management and cache replacement policies in spark. Comput Sci 44(06):37–41

Apache.Spark memory management overview. http://Spark.apache.org/docs/latest/tuning.html#memory-management-overview. Accessed 24 Oct (2021)

Zaharia M, Xin R, Wendell P et al (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65CrossRef

10.

Apache.Unified Memory Management in Spark 1.6. https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-Spark-10000.pdf. Accessed 24 Dec (2019)

11.

Zhao Z, Zhang H, Geng X, Ma H (2019). Resource-aware cache management for in-memory data analytics frameworks. In: 2019 IEEE international conference on parallel & distributed processing with applications, big data & cloud computing, sustainable computing & communications, social computing & networking (ISPA/BDCloud/SocialCom/SustainCom), pp 365-371

12.

Bian C (2017) Research on key technologies of memory computing framework performance optimization (Ph.D. Thesis). Xinjiang University, China

13.

Ying C T (2017) Research on storage layer fault tolerance and optimization strategy in memory computing environment (Ph.D. Thesis). Xinjiang University, China

14.

L Yuan (2018) Research and optimization on resource usage and allocation strategy for spark (M.S. Thesis). Huazhong University of Science and Technology, China

15.

Geng Yuanzhen et al (2017) LCS: an efficient data eviction strategy for spark. Int J Parallel Prog 45(6):1285–1297CrossRef

16.

Adinew DM, Shijie Z, Liao Y (2020) Spark performance optimization analysis in memory management with deploy mode in standalone cluster computing. In: 2020 IEEE 36th international conference on data engineering (ICDE), IEEE, pp 58-69

17.

Wang SZ, Zhang YP, Zhang L, Cao N, Pang CY (2018) An improved memory cache management study based on spark. Comput Mater Continua 56(04):415–431

18.

Yun W, Yuchen Ding (2020) Research on efficient RDD self-cache replacement strategy in Spark. Appl Res Comput 37(10):3043–3047

19.

Wangjian L, Yongfeng H, Congkai Bao (2018) Memory optimization of Spark parallellel computing framework. Comput Eng Sci 40(04):587–593

20.

Wang B, Tang J, Zhang R, Ding W, Qi D (2019) LCRC: a dependency-aware cache management policy for spark. In: 2018 IEEE international conference on parallel & distributed processing with applications, ubiquitous computing & communications, big data & cloud computing, social computing & networking, sustainable computing & communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). IEEE. pp 27-46

21.

Young N (1994) The k-server dual and loose competitiveness for paging. Algorithmica 11(6):525–541MathSciNetCrossRef

22.

Li C, Cox AL (2015) GD-Wheel: a cost-aware replacement policy for key-value stores. In: the tenth European conference on computer systems,ACM, pp 1-15

23.

Duan M, Li K, Tang Z, Xiao G, Li K (2016) Selection and replacement algorithms for memory performance improvement in Spark. Concurr Comput Pract Exp 28(8):2473–2486CrossRef

24.

Heng Liu, Liang Tan (2018) New RDD partition weight cache replacement algorithm in spark. J Chin Comput Syst 39(10):2279–2284

25.

Bian C, Yu J, Ying CT et al (2017) Self-adaptive strategy for cache management in spark. Acta Electron Sin 45(2):278–284

26.

Kang M, Lee JG (2020) Effect of garbage collection in iterative algorithms on Spark: an experimental analysis. J Supercomput 76(01):7204–7218CrossRef

27.

Bian C, Yu J, Xiu YR et al (2019) Parallelism deduction algorithm for spark. J Univ Electron Sci Technol China 48(04):567–574

28.

Zhuo T, Az A, Xz A, Li YC, Kl A (2020) Dynamic memory-aware scheduling in Spark computing environment. J Parallel Distrib Comput 14(01):10–22

29.

Xu L, Li M, Zhang L, Butt AR, Wang Y, Hu ZZ (2016) MEMTUNE: dynamic memory management for in-memory data analytic platforms. In: proceedings of IEEE international parallel and distributed processing symposium (IPDPS), pp 383–392

30.

Wang Suzhen et al (2019) A dynamic memory allocation optimization mechanism based on spark. CMC Comput Mater Continua 61(02):739–757

31.

Karau H, Warren R (2016) High performance Spark: best practices for scaling and optimizing Apache Spark. O’Reilly Media Inc, Sebastopol pp 1-10

32.

Li C, Cai Q, Luo Y (2022) Data balancing-based intermediate data partitioning and check point-based cache recovery in spark environment. Supercomput 78(08):3561–3604CrossRef

33.

Elmeiligy MA, Desouky A, Elghamrawy SM (2021) An efficient parallel indexing structure for multi-dimensional big data using spark. Supercomput 77(03):11187–11214CrossRef

34.

Raj S, Ramesh D, Sethi KK (2021) A spark-based apriori algorithm with reduced shuffle overhead. Supercomput 77(03):133–151CrossRef

35.

Zhu Z, Shen Q, Yang Y, Wu Z (2017) MCS: memory constraint strategy for unified memory manager in spark. In: IEEE international conference on parallel & distributed systems, IEEE. pp 41-60

36.

Jia D, Bhimani J, Nguyen SN, Sheng B, Mi N (2019) ATuMm: auto-tuning memory manager in apache spark. In: 2019 IEEE 38th international performance computing and communications conference (IPCCC), IEEE. pp 14-33

37.

Hussain MA, Tsai TH (2021) Memory access optimization for on-chip transfer learning. IEEE Trans Circuits Syst 68(04):1507–1519CrossRef

38.

Kumari P, Saxena AS (2021) Advanced fusion ACO approach for memory optimization in cloud computing environment. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV) pp 168-172

39.

Allen T, Ge R (2021) In-depth analyses of unified virtual memory system for GPU accelerated computing. In: the international conference for high performance computing, networking, storage and analysis pp 1-15

40.

Bender MA (2021) External-memory dictionaries in the affine and PDAM models. ACM Trans Parallel Comput 8(03):1–20MathSciNetCrossRef

41.

Saha R, Pundir YP, Pal PK (2021) Design of an area and energy-efficient last-level cache memory using STT-MRAM. J Magn Magn Mater 529(03):167882CrossRef

42.

Chaudhuri M (2021) Zero directory eviction victim: unbounded coherence directory and core cache isolation. In: 2021 IEEE international symposium on high-performance computer architecture (HPCA) IEEE, pp 277-290

43.

Apache. Apache Spark web interfaces. https://Spark.apache.org/docs/latest/monitoring.html. Accessed 24 Dec (2021)

Title: Memory management optimization strategy in Spark framework based on less contention
Authors: Yixin Song
Junyang Yu
JinJiang Wang
Xin He
Publication date: 27-07-2022
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 2/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-022-04663-5

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Other articles of this Issue 2/2023

Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study

Optimized task scheduling and preemption for distributed resource management in fog-assisted IoT environment

Enabling efficient and reliable IoT deployment in 5G and LTE cellular areas for optimized service provisioning

Secure cloud storage with anonymous deduplication using ID-based key management

Graph entropies-graph energies indices for quantifying network structural irregularity

Traffic-aware dynamic controller placement in SDN using NFV

Premium Partner