Published in: Cluster Computing 1/2016

01 March 2016

A decentralized fault tolerance model based on level of performance for grid environment

Authors: Mohammed Rebbah, Yahya Slimani, Abdelkader Benyettou, Lionel Brunie



Abstract

Computational grids have the potential to solve large-scale scientific problems using heterogeneous and geographically distributed resources. At this scale, failures of computer resources and networks are no longer exceptions but part of normal system behavior. Therefore, one of the most valuable characteristics of grid tools, apart from the performance they can achieve, is fault tolerance, which is a significant and complex issue in grid computing systems. In this paper, we propose a fault tolerance model for grid computing systems, named DCFT. This model is based on dynamic colored graphs and does not replicate computer resources. The proposed fault tolerance model consists of two stages. In the first stage, each node is described by a state vector, and we assign each attribute of the state vector one of three colors (green, blue, or red) based on its level of performance. In the second stage, we classify the nodes of a grid into three categories: computer resources that are identical in terms of performance, the more efficient ones, and the less efficient ones. We use the colors of the nodes to develop a new fault tolerance strategy based on the level of performance. We simulate the proposed model using the SimGrid simulator and Graphstream. Experimental results show that the proposed model performs very well in a large grid environment.
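
The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the coloring rule, the attributes of the state vector, or how attribute colors are combined, so everything specific here is an assumption made for illustration: each attribute is compared with a reference node, higher values are better, green means better than the reference, blue means roughly equal, red means worse, and a node's category is decided by a simple majority of green versus red attributes. The attribute names (`cpu_mips`, `memory_mb`, `bandwidth_mbps`) and the 5% tolerance are hypothetical, and this is not the DCFT algorithm itself.

```python
from typing import Dict

GREEN, BLUE, RED = "green", "blue", "red"


def color_attribute(value: float, reference: float, tolerance: float = 0.05) -> str:
    """Stage 1 (assumed rule): color one attribute relative to a reference value.

    Higher values are assumed to mean better performance.
    """
    if value > reference * (1 + tolerance):
        return GREEN  # assumed meaning: better than the reference
    if value < reference * (1 - tolerance):
        return RED    # assumed meaning: worse than the reference
    return BLUE       # assumed meaning: about the same as the reference


def color_state_vector(
    state: Dict[str, float],
    reference: Dict[str, float],
    tolerance: float = 0.05,
) -> Dict[str, str]:
    """Color every attribute of a node's state vector."""
    return {
        name: color_attribute(state[name], reference[name], tolerance)
        for name in reference
    }


def classify_node(colors: Dict[str, str]) -> str:
    """Stage 2 (assumed rule): majority of green vs. red attributes."""
    greens = sum(1 for c in colors.values() if c == GREEN)
    reds = sum(1 for c in colors.values() if c == RED)
    if greens > reds:
        return "more efficient"
    if reds > greens:
        return "less efficient"
    return "identical"


if __name__ == "__main__":
    # Hypothetical reference node and candidate node.
    reference = {"cpu_mips": 1000.0, "memory_mb": 4096.0, "bandwidth_mbps": 100.0}
    node = {"cpu_mips": 2000.0, "memory_mb": 8192.0, "bandwidth_mbps": 100.0}
    colors = color_state_vector(node, reference)
    print(colors, classify_node(colors))
```

Under these assumptions, a fault tolerance strategy could prefer nodes classified as identical or more efficient when choosing where to move the work of a failed node; the selection policy actually used by the authors is described in the full paper, not in this abstract.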


Metadata
Title
A decentralized fault tolerance model based on level of performance for grid environment
Authors
Mohammed Rebbah
Yahya Slimani
Abdelkader Benyettou
Lionel Brunie
Publication date
01 March 2016
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2016
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-015-0497-x
