Published in: Computing 6/2022

14.02.2022 | Regular Paper

Scalability and performance analysis of BDPS in clouds

Authors: Yuegang Li, Dongyang Ou, Xin Zhou, Congfeng Jiang, Christophe Cérin

Abstract

The increasing demand for big data processing has led to commercial off-the-shelf (COTS) and cloud-based big data analytics services. Large cloud service vendors provide customized big data processing systems (BDPS), which are more cost-effective to operate and maintain than self-owned platforms. End users can rent big data analytics services under a pay-as-you-go cost model. However, when a user's data size increases, the rented BDPS must be scaled to achieve approximately the same performance, measured for example by task completion time and normalized system throughput. Unfortunately, there is no effective way to help end users choose between scaling up and scaling out when expanding their existing rented BDPS. Moreover, there is no metric for measuring the scalability of BDPS. Furthermore, the performance of BDPS services is not consistent across time slots because of co-location and workload placement policies in modern internet data centers. To this end, this paper proposes a scalability metric for BDPS in clouds that can mitigate these issues. The metric quantifies the scalability of BDPS consistently under different system expansion configurations. The paper also reports experiments on real BDPS platforms and derives optimization approaches for better BDPS scalability, such as file compression during the Shuffle phase in MapReduce. The experimental results demonstrate the validity of the proposed optimization strategies.
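
The abstract names compression of intermediate files during Shuffle as one optimization. The following is a minimal sketch of why that can help, not the authors' experimental setup: it serializes hypothetical word-count style map output and compares the bytes that would cross the network with and without compression. The function name `shuffle_bytes`, the sample records, and the choice of zlib are assumptions made for illustration only. In Hadoop MapReduce, map-output compression is typically enabled through the `mapreduce.map.output.compress` property; the abstract does not state which settings the paper used.

```python
import zlib


def shuffle_bytes(records, compress=False, level=6):
    """Return the size in bytes of the serialized (key, value) records.

    This only models the payload a mapper would hand to the shuffle;
    it is a hypothetical helper, not part of any MapReduce API.
    """
    payload = "\n".join(f"{key}\t{value}" for key, value in records).encode("utf-8")
    if compress:
        payload = zlib.compress(payload, level)
    return len(payload)


# Hypothetical map output: a few keys repeated many times, as in word count.
records = [(word, 1) for word in ["cloud", "data", "scale", "shuffle"] * 5000]

raw = shuffle_bytes(records)
packed = shuffle_bytes(records, compress=True)
ratio = packed / raw
print(f"raw={raw} B, compressed={packed} B, ratio={ratio:.3f}")
```

Repetitive intermediate data like this compresses well, so fewer bytes are transferred at the cost of extra CPU time for compression and decompression. Whether the trade pays off on a given rented cluster depends on its network and CPU headroom, which is the kind of question the paper's experiments address.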

Metadata
Title
Scalability and performance analysis of BDPS in clouds
Authors
Yuegang Li
Dongyang Ou
Xin Zhou
Congfeng Jiang
Christophe Cérin
Publication date
14.02.2022
Publisher
Springer Vienna
Published in
Computing / Issue 6/2022
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-022-01056-7
