Published in: The Journal of Supercomputing 3/2014

01.09.2014

On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism

Written by: Chao-Tung Yang, Jung-Chun Liu, Ching-Hsien Hsu, Wei-Li Chou

Abstract

Virtualization, particularly in cloud computing, is a common strategy for making better use of existing computing resources. Hadoop, one of the Apache projects, is designed to scale up from single servers to thousands of machines, each offering local computation and storage. However, guaranteeing both the stability and the reliability of virtualized systems has become an important issue. To address it, we used existing open-source software and platforms, namely the Xen hypervisor virtualization technology and the OpenNebula virtual machine management tool. After extending the capabilities of these components, we developed a mechanism, called virtualization fault tolerance (VFT), that provides high availability for Hadoop. We considered a practical problem, the single-point-of-failure issue that occurs frequently in virtualization systems, and the experimental results confirm that the downtime interval can be greatly shortened even when a failure occurs. As a result, VFT is useful not only for Hadoop applications but also for other cluster-based systems.
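
The link between a shorter downtime interval and higher availability can be illustrated with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below is not taken from the article: the function names and the sample figures in the usage note are assumptions chosen only to show the arithmetic.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year, used only for illustration


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a service is up.

    mtbf_hours: mean time between failures (hypothetical input).
    mttr_hours: mean time to repair, i.e. the downtime per failure.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_per_year_hours(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - avail) * HOURS_PER_YEAR
```

For example, with an assumed MTBF of 1000 hours, cutting the recovery time from 1 hour to 0.1 hour raises `availability(1000.0, 1.0)` to `availability(1000.0, 0.1)`, and `downtime_per_year_hours` falls accordingly. These numbers are illustrative and are not results reported in the article.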


Metadata
Title
On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism
Authors
Chao-Tung Yang
Jung-Chun Liu
Ching-Hsien Hsu
Wei-Li Chou
Publication date
01.09.2014
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 3/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-1045-1
