Published in: Cluster Computing 2/2014

01 June 2014

DA-TC: a novel application execution model in multicluster systems

Authors: Zhifeng Yun, Zhou Lei, Gabrielle Allen, Daniel S. Katz, J. Ramanujam

Abstract

The availability of a large number of separate clusters has given rise to the field of multicluster systems, in which these resources are coupled so that their combined capacity can be applied to large-scale, compute-intensive applications. However, automatic load balancing of jobs across the participating autonomic systems is challenging. We developed a novel user-space execution model named DA-TC to address workload allocation for applications with a large number of sequential jobs in multicluster systems. With this model, task assignment is load-balanced dynamically, and slower resources become beneficial factors rather than bottlenecks for application execution. The effectiveness of this strategy is demonstrated through theoretical analysis. The model is also evaluated through extensive experimental studies, and the results show that, compared with the traditional method, the proposed DA-TC model can significantly improve application execution in terms of application turnaround time and system reliability in multicluster environments.
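The claim that slower resources can help rather than hinder can be illustrated with a toy comparison. The sketch below is not the DA-TC algorithm, whose details the abstract does not give; it only contrasts a static even split of identical sequential tasks with a generic greedy, pull-based assignment in which each cluster takes the next task as soon as it is free. The function names, the relative speeds, and the absence of queue waiting time are all assumptions made for illustration.

```python
import heapq


def static_makespan(num_tasks, task_work, speeds):
    """Split tasks evenly up front; the slowest cluster decides the finish time."""
    base, extra = divmod(num_tasks, len(speeds))
    finish_times = []
    for i, speed in enumerate(speeds):
        count = base + (1 if i < extra else 0)
        finish_times.append(count * task_work / speed)
    return max(finish_times)


def dynamic_makespan(num_tasks, task_work, speeds):
    """Greedy pull: whichever cluster becomes free first takes the next task."""
    # Heap entries are (time at which the cluster is free, cluster index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    makespan = 0.0
    for _ in range(num_tasks):
        now, i = heapq.heappop(free_at)
        done = now + task_work / speeds[i]
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, i))
    return makespan


# Hypothetical setup: 10 unit-size tasks, one fast cluster (speed 4)
# and one slow cluster (speed 1).
speeds = [4.0, 1.0]
print(static_makespan(10, 1.0, speeds))    # even split: slow cluster is the bottleneck
print(dynamic_makespan(10, 1.0, speeds))   # greedy pull with both clusters
print(dynamic_makespan(10, 1.0, [4.0]))    # fast cluster alone
```

In this toy setting the even split finishes only when the slow cluster completes its share, whereas the pull-based run finishes sooner than the fast cluster working alone, which is the sense in which a slow resource contributes rather than delays.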

Footnotes
1. Task clustering technologies can partially remove the inter-task communication.
2. Some clusters may be overloaded by other users and take a long time to allocate resources for their assigned TCs.
3. We consider only a single CPU per node. The model can easily be extended to multicore systems if jobs are scheduled at the core level.
4. A system is said to be greedy if it never leaves any resource idle unless there is no job waiting for that resource.
5. Multiple tasks are merged into one single job.
6. The experiment can easily be extended to heterogeneous participating clusters with different local scheduling systems and access methods.
Metadata
Title
DA-TC: a novel application execution model in multicluster systems
Authors
Zhifeng Yun
Zhou Lei
Gabrielle Allen
Daniel S. Katz
J. Ramanujam
Publication date
01 June 2014
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2014
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-012-0228-5
