Published in: Cluster Computing 3/2014

1 September 2014

A Threshold-based Dynamic Data Replication and Parallel Job Scheduling strategy to enhance Data Grid

Written by: N. Mansouri

Abstract

Data Grids provide an environment for large, data-intensive applications that produce and process enormous volumes of data. Such environments must therefore manage data and schedule jobs at the same time, and these two operations have to be tightly coupled to achieve the best results. Replication techniques are widely used to increase data availability, reduce query latency, and improve load balancing in Data Grids. Effective resource scheduling is likewise a challenging research issue. In this paper we propose a job scheduling policy, called Parallel Job Scheduling (PJS), and a dynamic data replication strategy, called Threshold-based Dynamic Data Replication (TDDR), to improve data access efficiency in a hierarchical Data Grid. PJS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at the data sources. The main idea of TDDR is to use a threshold value to determine whether the requested replica needs to be copied to the node. TDDR determines this threshold dynamically based on data request arrival rates and available storage capacities. To overcome the problem of limited storage space in each node, we design an efficient replica replacement strategy that works in two stages. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. Simulation results show that the proposed algorithms perform better than other algorithms in terms of Mean Job Time, Number of Intercommunications, Number of Replications, Computing Resource Usage, and Effective Network Usage.
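
The two-stage replica replacement described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how many files stage one deletes or how stage two combines its four factors. The cutoff `cheap_transfer_s`, the formula in `stage_two_score`, and all field names are therefore hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_mb: float
    transfer_time_s: float   # estimated time to fetch the file again
    last_request_s: float    # timestamp of the most recent request
    access_count: int


def stage_two_score(r: Replica, now_s: float) -> float:
    """Hypothetical value score (assumption, not from the paper).

    A low score marks a better eviction candidate: rarely accessed,
    long idle, large, and quick to transfer again.
    """
    idle = max(now_s - r.last_request_s, 1.0)
    return r.access_count * r.transfer_time_s / (idle * r.size_mb)


def choose_victims(replicas, needed_mb, now_s, cheap_transfer_s=5.0):
    """Return the replicas to delete so that at least needed_mb is freed.

    Returns None if deleting every replica would still not free enough.
    cheap_transfer_s is an assumed cutoff for "minimum transfer time".
    """
    victims, freed = [], 0.0

    # Stage 1: delete the replicas that are cheapest to transfer again.
    cheap = sorted(
        (r for r in replicas if r.transfer_time_s <= cheap_transfer_s),
        key=lambda r: r.transfer_time_s,
    )
    for r in cheap:
        if freed >= needed_mb:
            break
        victims.append(r)
        freed += r.size_mb

    # Stage 2: if space is still short, rank the rest by the combined score.
    if freed < needed_mb:
        chosen = {v.name for v in victims}
        rest = sorted(
            (r for r in replicas if r.name not in chosen),
            key=lambda r: stage_two_score(r, now_s),
        )
        for r in rest:
            if freed >= needed_mb:
                break
            victims.append(r)
            freed += r.size_mb

    return victims if freed >= needed_mb else None
```

With four stored replicas and a request for 120 MB, only stage one runs in this sketch. A request for 300 MB falls through to stage two, which then evicts the replica with the lowest score.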

Metadata
Title
A Threshold-based Dynamic Data Replication and Parallel Job Scheduling strategy to enhance Data Grid
Author
N. Mansouri
Publication date
1 September 2014
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2014
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-013-0330-3