Published in: The Journal of Supercomputing 6/2022

6 January 2022

OFP-TM: an online VM failure prediction and tolerance model towards high availability of cloud computing environments

Authors: Deepika Saxena, Ashutosh Kumar Singh


Abstract

Cloud computing now underpins nearly every digital service, and its resource usage has risen exponentially as a result. The ever-growing demand for cloud resources undermines service availability, leading to critical challenges such as cloud outages, SLA violations, and excessive power consumption. Previous approaches have addressed this problem by using multiple cloud platforms or by running multiple replicas of a Virtual Machine (VM), which results in high operational cost. This paper addresses the problem from a different perspective by proposing a novel Online virtual machine Failure Prediction and Tolerance Model (OFP-TM) with high-availability awareness embedded in physical machines as well as virtual machines. Failure-prone VMs are estimated in real time from their future resource usage, using an ensemble-based resource predictor. These VMs are assigned to a failure tolerance unit comprising a resource provision matrix and a Selection Box (S-Box) mechanism, which triggers the migration of failure-prone VMs and handles any outage beforehand while maintaining the desired level of availability for cloud users. The proposed model is evaluated and compared against existing related approaches by simulating a cloud environment and running several experiments on a real-world workload, the Google Cluster dataset. The results indicate that OFP-TM improves availability by up to 33.5% and reduces the number of live VM migrations by up to 83.3%, compared with operation without OFP-TM.
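The abstract does not specify the ensemble predictor or the rule for flagging a VM, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes three simple base predictors (last value, moving average, linear trend), an unweighted average as the ensemble, and a hypothetical rule that marks a VM as failure-prone when its forecast usage exceeds 90% of its capacity. All function names and the threshold are assumptions introduced here.

```python
from statistics import mean


def predict_last_value(history):
    """Naive base predictor: the next usage equals the latest sample."""
    return history[-1]


def predict_moving_average(history, window=3):
    """Base predictor: mean of the most recent `window` samples."""
    return mean(history[-window:])


def predict_linear_trend(history):
    """Base predictor: extrapolate by the average step between samples."""
    if len(history) < 2:
        return history[-1]
    step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + step


def ensemble_forecast(history):
    """Unweighted average of the base predictors (an assumed combination rule)."""
    return mean([
        predict_last_value(history),
        predict_moving_average(history),
        predict_linear_trend(history),
    ])


def failure_prone_vms(usage_by_vm, capacity_by_vm, threshold=0.9):
    """Return the IDs of VMs whose forecast usage exceeds threshold * capacity.

    The threshold rule is hypothetical; the paper's actual criterion is not
    given in the abstract.
    """
    flagged = []
    for vm_id, history in usage_by_vm.items():
        if ensemble_forecast(history) > threshold * capacity_by_vm[vm_id]:
            flagged.append(vm_id)
    return flagged


if __name__ == "__main__":
    # Made-up usage histories, expressed as fractions of a capacity of 1.0.
    usage = {"vm-a": [0.2, 0.3, 0.4], "vm-b": [0.6, 0.8, 1.0]}
    capacity = {"vm-a": 1.0, "vm-b": 1.0}
    print(failure_prone_vms(usage, capacity))  # flags only "vm-b"
```

In a setup like this, the flagged VMs would be the candidates handed to a migration step before the predicted overload occurs. A real predictor would be trained on workload traces rather than built from fixed heuristics.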


Metadata
Title
OFP-TM: an online VM failure prediction and tolerance model towards high availability of cloud computing environments
Authors
Deepika Saxena
Ashutosh Kumar Singh
Publication date
6 January 2022
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2022
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-021-04235-z
