Skip to main content
Top
Published in: The Journal of Supercomputing 2/2023

27-07-2022

Memory management optimization strategy in Spark framework based on less contention

Authors: Yixin Song, Junyang Yu, JinJiang Wang, Xin He

Published in: The Journal of Supercomputing | Issue 2/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The parallel computing framework Spark 2.x adopts a unified memory management model. In the case of the memory bottleneck, the memory allocation of active tasks and the RDD(Resilient Distributed Datasets) cache causes memory contention, which may reduce computing resource utilization and persistence acceleration effects, thus affecting program execution efficiency. To this end, we propose less contention management strategy, abbreviated as MCM, to reduce the negative impact of memory contention. MCM is divided into two steps: Firstly, the task minimum memory priority guarantee algorithm is priority to meet the minimum resources of tasks for execution, to optimize the number of active tasks. Secondly, considering contention costs, the persisted location selection algorithm dynamically selects the best storage location to improve the effect of persistence acceleration. The experimental results comfirm that MCM has wonderful adaptability and scalability. In the case of serious memory bottleneck, MCM obviously reduces job execution time. Compared with similar works, such as only_memory, only_disk, memory_and_disk, DMAOM and SACM, MCM reduces the execution time by 28.3% .

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Mostafaeipour A, Rafsanjani AJ, Ahmadi M, Dhanraj JA (2020) Investigating the performance of hadoop and spark platforms on machine learning algorithms. J Supercomput 77(2):1–28 Mostafaeipour A, Rafsanjani AJ, Ahmadi M, Dhanraj JA (2020) Investigating the performance of hadoop and spark platforms on machine learning algorithms. J Supercomput 77(2):1–28
3.
go back to reference Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning Spark. O’Reilly Media Inc. Sebastopol pp 1-30 Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning Spark. O’Reilly Media Inc. Sebastopol pp 1-30
4.
go back to reference Zhang XW, Li ZH, Liu GS, Xu JJ, Xie TK (2018) A spark scheduling strategy for heterogeneous cluster. Comput Mater Continua 55(3):405–417 Zhang XW, Li ZH, Liu GS, Xu JJ, Xie TK (2018) A spark scheduling strategy for heterogeneous cluster. Comput Mater Continua 55(3):405–417
5.
go back to reference Ahmed N, Barczak A, Susnjak T et al (2020) A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench. J Big Data 110(7):1–18 Ahmed N, Barczak A, Susnjak T et al (2020) A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench. J Big Data 110(7):1–18
6.
go back to reference Hu ZY, Shi XH, Ke ZX et al (2020) Estimating the memory consumption of big data applications based on program analysis. Scientia Sinica Inf 50(8):1178–1196 Hu ZY, Shi XH, Ke ZX et al (2020) Estimating the memory consumption of big data applications based on program analysis. Scientia Sinica Inf 50(8):1178–1196
7.
go back to reference Hong-tao M, Song-ping Y, Fang L, Nong XIAO et al (2017) Research on memory management and cache replacement policies in spark. Comput Sci 44(06):37–41 Hong-tao M, Song-ping Y, Fang L, Nong XIAO et al (2017) Research on memory management and cache replacement policies in spark. Comput Sci 44(06):37–41
9.
go back to reference Zaharia M, Xin R, Wendell P et al (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65CrossRef Zaharia M, Xin R, Wendell P et al (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65CrossRef
11.
go back to reference Zhao Z, Zhang H, Geng X, Ma H (2019). Resource-aware cache management for in-memory data analytics frameworks. In: 2019 IEEE international conference on parallel & distributed processing with applications, big data & cloud computing, sustainable computing & communications, social computing & networking (ISPA/BDCloud/SocialCom/SustainCom), pp 365-371 Zhao Z, Zhang H, Geng X, Ma H (2019). Resource-aware cache management for in-memory data analytics frameworks. In: 2019 IEEE international conference on parallel & distributed processing with applications, big data & cloud computing, sustainable computing & communications, social computing & networking (ISPA/BDCloud/SocialCom/SustainCom), pp 365-371
12.
go back to reference Bian C (2017) Research on key technologies of memory computing framework performance optimization (Ph.D. Thesis). Xinjiang University, China Bian C (2017) Research on key technologies of memory computing framework performance optimization (Ph.D. Thesis). Xinjiang University, China
13.
go back to reference Ying C T (2017) Research on storage layer fault tolerance and optimization strategy in memory computing environment (Ph.D. Thesis). Xinjiang University, China Ying C T (2017) Research on storage layer fault tolerance and optimization strategy in memory computing environment (Ph.D. Thesis). Xinjiang University, China
14.
go back to reference L Yuan (2018) Research and optimization on resource usage and allocation strategy for spark (M.S. Thesis). Huazhong University of Science and Technology, China L Yuan (2018) Research and optimization on resource usage and allocation strategy for spark (M.S. Thesis). Huazhong University of Science and Technology, China
15.
go back to reference Geng Yuanzhen et al (2017) LCS: an efficient data eviction strategy for spark. Int J Parallel Prog 45(6):1285–1297CrossRef Geng Yuanzhen et al (2017) LCS: an efficient data eviction strategy for spark. Int J Parallel Prog 45(6):1285–1297CrossRef
16.
go back to reference Adinew DM, Shijie Z, Liao Y (2020) Spark performance optimization analysis in memory management with deploy mode in standalone cluster computing. In: 2020 IEEE 36th international conference on data engineering (ICDE), IEEE, pp 58-69 Adinew DM, Shijie Z, Liao Y (2020) Spark performance optimization analysis in memory management with deploy mode in standalone cluster computing. In: 2020 IEEE 36th international conference on data engineering (ICDE), IEEE, pp 58-69
17.
go back to reference Wang SZ, Zhang YP, Zhang L, Cao N, Pang CY (2018) An improved memory cache management study based on spark. Comput Mater Continua 56(04):415–431 Wang SZ, Zhang YP, Zhang L, Cao N, Pang CY (2018) An improved memory cache management study based on spark. Comput Mater Continua 56(04):415–431
18.
go back to reference Yun W, Yuchen Ding (2020) Research on efficient RDD self-cache replacement strategy in Spark. Appl Res Comput 37(10):3043–3047 Yun W, Yuchen Ding (2020) Research on efficient RDD self-cache replacement strategy in Spark. Appl Res Comput 37(10):3043–3047
19.
go back to reference Wangjian L, Yongfeng H, Congkai Bao (2018) Memory optimization of Spark parallellel computing framework. Comput Eng Sci 40(04):587–593 Wangjian L, Yongfeng H, Congkai Bao (2018) Memory optimization of Spark parallellel computing framework. Comput Eng Sci 40(04):587–593
20.
go back to reference Wang B, Tang J, Zhang R, Ding W, Qi D (2019) LCRC: a dependency-aware cache management policy for spark. In: 2018 IEEE international conference on parallel & distributed processing with applications, ubiquitous computing & communications, big data & cloud computing, social computing & networking, sustainable computing & communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). IEEE. pp 27-46 Wang B, Tang J, Zhang R, Ding W, Qi D (2019) LCRC: a dependency-aware cache management policy for spark. In: 2018 IEEE international conference on parallel & distributed processing with applications, ubiquitous computing & communications, big data & cloud computing, social computing & networking, sustainable computing & communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). IEEE. pp 27-46
22.
go back to reference Li C, Cox AL (2015) GD-Wheel: a cost-aware replacement policy for key-value stores. In: the tenth European conference on computer systems,ACM, pp 1-15 Li C, Cox AL (2015) GD-Wheel: a cost-aware replacement policy for key-value stores. In: the tenth European conference on computer systems,ACM, pp 1-15
23.
go back to reference Duan M, Li K, Tang Z, Xiao G, Li K (2016) Selection and replacement algorithms for memory performance improvement in Spark. Concurr Comput Pract Exp 28(8):2473–2486CrossRef Duan M, Li K, Tang Z, Xiao G, Li K (2016) Selection and replacement algorithms for memory performance improvement in Spark. Concurr Comput Pract Exp 28(8):2473–2486CrossRef
24.
go back to reference Heng Liu, Liang Tan (2018) New RDD partition weight cache replacement algorithm in spark. J Chin Comput Syst 39(10):2279–2284 Heng Liu, Liang Tan (2018) New RDD partition weight cache replacement algorithm in spark. J Chin Comput Syst 39(10):2279–2284
25.
go back to reference Bian C, Yu J, Ying CT et al (2017) Self-adaptive strategy for cache management in spark. Acta Electron Sin 45(2):278–284 Bian C, Yu J, Ying CT et al (2017) Self-adaptive strategy for cache management in spark. Acta Electron Sin 45(2):278–284
26.
go back to reference Kang M, Lee JG (2020) Effect of garbage collection in iterative algorithms on Spark: an experimental analysis. J Supercomput 76(01):7204–7218CrossRef Kang M, Lee JG (2020) Effect of garbage collection in iterative algorithms on Spark: an experimental analysis. J Supercomput 76(01):7204–7218CrossRef
27.
go back to reference Bian C, Yu J, Xiu YR et al (2019) Parallelism deduction algorithm for spark. J Univ Electron Sci Technol China 48(04):567–574 Bian C, Yu J, Xiu YR et al (2019) Parallelism deduction algorithm for spark. J Univ Electron Sci Technol China 48(04):567–574
28.
go back to reference Zhuo T, Az A, Xz A, Li YC, Kl A (2020) Dynamic memory-aware scheduling in Spark computing environment. J Parallel Distrib Comput 14(01):10–22 Zhuo T, Az A, Xz A, Li YC, Kl A (2020) Dynamic memory-aware scheduling in Spark computing environment. J Parallel Distrib Comput 14(01):10–22
29.
go back to reference Xu L, Li M, Zhang L, Butt AR, Wang Y, Hu ZZ (2016) MEMTUNE: dynamic memory management for in-memory data analytic platforms. In: proceedings of IEEE international parallel and distributed processing symposium (IPDPS), pp 383–392 Xu L, Li M, Zhang L, Butt AR, Wang Y, Hu ZZ (2016) MEMTUNE: dynamic memory management for in-memory data analytic platforms. In: proceedings of IEEE international parallel and distributed processing symposium (IPDPS), pp 383–392
30.
go back to reference Wang Suzhen et al (2019) A dynamic memory allocation optimization mechanism based on spark. CMC Comput Mater Continua 61(02):739–757 Wang Suzhen et al (2019) A dynamic memory allocation optimization mechanism based on spark. CMC Comput Mater Continua 61(02):739–757
31.
go back to reference Karau H, Warren R (2016) High performance Spark: best practices for scaling and optimizing Apache Spark. O’Reilly Media Inc, Sebastopol pp 1-10 Karau H, Warren R (2016) High performance Spark: best practices for scaling and optimizing Apache Spark. O’Reilly Media Inc, Sebastopol pp 1-10
32.
go back to reference Li C, Cai Q, Luo Y (2022) Data balancing-based intermediate data partitioning and check point-based cache recovery in spark environment. Supercomput 78(08):3561–3604CrossRef Li C, Cai Q, Luo Y (2022) Data balancing-based intermediate data partitioning and check point-based cache recovery in spark environment. Supercomput 78(08):3561–3604CrossRef
33.
go back to reference Elmeiligy MA, Desouky A, Elghamrawy SM (2021) An efficient parallel indexing structure for multi-dimensional big data using spark. Supercomput 77(03):11187–11214CrossRef Elmeiligy MA, Desouky A, Elghamrawy SM (2021) An efficient parallel indexing structure for multi-dimensional big data using spark. Supercomput 77(03):11187–11214CrossRef
34.
go back to reference Raj S, Ramesh D, Sethi KK (2021) A spark-based apriori algorithm with reduced shuffle overhead. Supercomput 77(03):133–151CrossRef Raj S, Ramesh D, Sethi KK (2021) A spark-based apriori algorithm with reduced shuffle overhead. Supercomput 77(03):133–151CrossRef
35.
go back to reference Zhu Z, Shen Q, Yang Y, Wu Z (2017) MCS: memory constraint strategy for unified memory manager in spark. In: IEEE international conference on parallel & distributed systems, IEEE. pp 41-60 Zhu Z, Shen Q, Yang Y, Wu Z (2017) MCS: memory constraint strategy for unified memory manager in spark. In: IEEE international conference on parallel & distributed systems, IEEE. pp 41-60
36.
go back to reference Jia D, Bhimani J, Nguyen SN, Sheng B, Mi N (2019) ATuMm: auto-tuning memory manager in apache spark. In: 2019 IEEE 38th international performance computing and communications conference (IPCCC), IEEE. pp 14-33 Jia D, Bhimani J, Nguyen SN, Sheng B, Mi N (2019) ATuMm: auto-tuning memory manager in apache spark. In: 2019 IEEE 38th international performance computing and communications conference (IPCCC), IEEE. pp 14-33
37.
go back to reference Hussain MA, Tsai TH (2021) Memory access optimization for on-chip transfer learning. IEEE Trans Circuits Syst 68(04):1507–1519CrossRef Hussain MA, Tsai TH (2021) Memory access optimization for on-chip transfer learning. IEEE Trans Circuits Syst 68(04):1507–1519CrossRef
38.
go back to reference Kumari P, Saxena AS (2021) Advanced fusion ACO approach for memory optimization in cloud computing environment. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV) pp 168-172 Kumari P, Saxena AS (2021) Advanced fusion ACO approach for memory optimization in cloud computing environment. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV) pp 168-172
39.
go back to reference Allen T, Ge R (2021) In-depth analyses of unified virtual memory system for GPU accelerated computing. In: the international conference for high performance computing, networking, storage and analysis pp 1-15 Allen T, Ge R (2021) In-depth analyses of unified virtual memory system for GPU accelerated computing. In: the international conference for high performance computing, networking, storage and analysis pp 1-15
40.
go back to reference Bender MA (2021) External-memory dictionaries in the affine and PDAM models. ACM Trans Parallel Comput 8(03):1–20MathSciNetCrossRef Bender MA (2021) External-memory dictionaries in the affine and PDAM models. ACM Trans Parallel Comput 8(03):1–20MathSciNetCrossRef
41.
go back to reference Saha R, Pundir YP, Pal PK (2021) Design of an area and energy-efficient last-level cache memory using STT-MRAM. J Magn Magn Mater 529(03):167882CrossRef Saha R, Pundir YP, Pal PK (2021) Design of an area and energy-efficient last-level cache memory using STT-MRAM. J Magn Magn Mater 529(03):167882CrossRef
42.
go back to reference Chaudhuri M (2021) Zero directory eviction victim: unbounded coherence directory and core cache isolation. In: 2021 IEEE international symposium on high-performance computer architecture (HPCA) IEEE, pp 277-290 Chaudhuri M (2021) Zero directory eviction victim: unbounded coherence directory and core cache isolation. In: 2021 IEEE international symposium on high-performance computer architecture (HPCA) IEEE, pp 277-290
Metadata
Title
Memory management optimization strategy in Spark framework based on less contention
Authors
Yixin Song
Junyang Yu
JinJiang Wang
Xin He
Publication date
27-07-2022
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-022-04663-5

Other articles of this Issue 2/2023

The Journal of Supercomputing 2/2023 Go to the issue

Premium Partner