Published in: Journal of Scheduling 2/2018

20 July 2017

Multi-stage resource-aware scheduling for data centers with heterogeneous servers

Authors: Tony T. Tran, Meghana Padmanabhan, Peter Yun Zhang, Heyse Li, Douglas G. Down, J. Christopher Beck


Abstract

This paper presents a three-stage algorithm for resource-aware scheduling of computational jobs in a large-scale heterogeneous data center. The algorithm aims to allocate job classes to machine configurations to attain an efficient mapping between job resource request profiles and machine resource capacity profiles. The first stage uses a queueing model that treats the system in an aggregated manner with pooled machines and jobs represented as a fluid flow. The latter two stages use combinatorial optimization techniques to solve a shorter-term, more accurate representation of the problem using the first-stage, long-term solution for heuristic guidance. In the second stage, jobs and machines are discretized. A linear programming model is used to obtain a solution to the discrete problem that maximizes the system capacity given a restriction on the job class and machine configuration pairings based on the solution of the first stage. The final stage is a scheduling policy that uses the solution from the second stage to guide the dispatching of arriving jobs to machines. We present experimental results of our algorithm on both Google workload trace data and generated data and show that it outperforms existing schedulers. These results illustrate the importance of considering heterogeneity of both job and machine configuration profiles in making effective scheduling decisions.
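
The dispatching idea in the final stage can be illustrated with a minimal sketch. It is not the authors' policy: the `Machine` class, the `dispatch` function, the `allowed_pairs` set, and the resource names are all hypothetical and chosen only for illustration. The sketch assumes that an earlier planning stage has produced a set of permitted (job class, machine configuration) pairings, and it sends an arriving job to the first permitted machine that still has enough free capacity in every resource.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Machine:
    """A machine with a configuration label and free capacity per resource."""
    config: str
    free: Dict[str, float]


def dispatch(
    job_class: str,
    demand: Dict[str, float],
    machines: List[Machine],
    allowed_pairs: Set[Tuple[str, str]],
) -> Optional[int]:
    """Return the index of the first machine that can take the job, else None.

    allowed_pairs is a hypothetical stand-in for the class-to-configuration
    pairings that an earlier planning stage would provide.
    """
    for index, machine in enumerate(machines):
        if (job_class, machine.config) not in allowed_pairs:
            continue  # this pairing is not permitted by the plan
        fits = all(machine.free.get(r, 0.0) >= need for r, need in demand.items())
        if fits:
            for r, need in demand.items():
                machine.free[r] -= need  # reserve the requested resources
            return index
    return None  # no feasible machine: the job would have to wait


# Illustrative data only; not taken from the paper or from the Google trace.
machines = [
    Machine("small", {"cpu": 2.0, "mem": 4.0}),
    Machine("large", {"cpu": 8.0, "mem": 32.0}),
]
allowed_pairs = {("cpu_heavy", "large"), ("light", "small")}
first = dispatch("cpu_heavy", {"cpu": 6.0, "mem": 8.0}, machines, allowed_pairs)
second = dispatch("cpu_heavy", {"cpu": 6.0, "mem": 8.0}, machines, allowed_pairs)
print(first, second)
```

In this toy run the first job is placed on the large machine and the second finds no machine with enough free CPU, so it would wait. A real policy would also need job departures, queueing, and a rule for choosing among several feasible machines, none of which is modeled here.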


Footnotes
1
Earlier work on our algorithm, which appeared at the Multidisciplinary International Scheduling Conference: Theory and Applications (MISTA) 2015, presented a comparison only to the Greedy policy. We have extended that paper by improving our algorithm, adding a comparison to the Tetris scheduler, and significantly expanding the experimentation.
 
2
It may be beneficial to consider the dominant resource classification of Dominant Resource Fairness when creating such an ordering (Ghodsi et al. 2011).
 
4
We examine the impact of processing time variation in subsequent experiments (see Sect. 5.4.3).
 
5
Note that \(\lambda ^*\) represents an upper bound on the system load that can be handled. The bound may not be tight depending on the fragmentation of resources on a machine and/or the inefficiencies in the scheduling model used.
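
Footnote 2 refers to the dominant resource classification of Dominant Resource Fairness (Ghodsi et al. 2011). As a minimal sketch of that notion, the dominant resource of a request is the resource for which the requested amount is the largest fraction of the total capacity. The function name `dominant_resource` and the dictionary layout are assumptions made for this illustration, and how the classification would feed into an ordering is not shown.

```python
from typing import Dict, Tuple


def dominant_resource(
    demand: Dict[str, float], capacity: Dict[str, float]
) -> Tuple[str, float]:
    """Return the resource with the largest demand/capacity share, and that share."""
    shares = {r: demand[r] / capacity[r] for r in demand}
    resource = max(shares, key=lambda r: shares[r])
    return resource, shares[resource]


# Illustrative capacities: 9 CPUs and 18 GB of memory in total.
capacity = {"cpu": 9.0, "mem": 18.0}
job_a = {"cpu": 1.0, "mem": 4.0}  # shares: cpu 1/9, mem 4/18, so memory dominates
job_b = {"cpu": 3.0, "mem": 1.0}  # shares: cpu 3/9, mem 1/18, so CPU dominates
print(dominant_resource(job_a, capacity)[0], dominant_resource(job_b, capacity)[0])
```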
 
References
Al-Azzoni, I., & Down, D. G. (2008). Linear programming-based affinity scheduling of independent tasks on heterogeneous computing systems. IEEE Transactions on Parallel and Distributed Systems, 19(12), 1671–1682.
Andradóttir, S., Ayhan, H., & Down, D. G. (2003). Dynamic server allocation for queueing networks with flexible servers. Operations Research, 51(6), 952–968.
Berral, J. L., Goiri, Í., Nou, R., Julià, F., Guitart, J., Gavaldà, R., & Torres, J. (2010). Towards energy-aware scheduling in data centers using machine learning. In Proceedings of the 1st international conference on energy-efficient computing and networking (pp. 215–224). ACM.
Dai, J. G., & Meyn, S. P. (1995). Stability and convergence of moments for multiclass queueing networks via fluid limit models. IEEE Transactions on Automatic Control, 40(11), 1889–1904.
Gandhi, A., Harchol-Balter, M., & Kozuch, M. A. (2012). Are sleep states effective in data centers? In International green computing conference (IGCC) (pp. 1–10). IEEE.
Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., & Stoica, I. (2011). Dominant resource fairness: Fair allocation of multiple resource types. In Proceedings of the 8th USENIX conference on networked systems design and implementation (Vol. 11, pp. 323–336).
Grandl, R., Ananthanarayanan, G., Kandula, S., Rao, S., & Akella, A. (2014). Multi-resource packing for cluster schedulers. In Proceedings of the 2014 ACM conference on SIGCOMM (pp. 455–466). ACM.
Guazzone, M., Anglano, C., & Canonico, M. (2012). Exploiting VM migration for the automated power and performance management of green cloud computing systems. In Energy efficient data centers (Vol. 7396, pp. 81–92). Springer.
Guenter, B., Jain, N., & Williams, C. (2011). Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning. In INFOCOM, 2011 proceedings IEEE (pp. 1332–1340). IEEE.
He, Y.-T., & Down, D. G. (2008). Limited choice and locality considerations for load balancing. Performance Evaluation, 65(9), 670–687.
Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., & Goldberg, A. (2009). Quincy: Fair scheduling for distributed computing clusters. In Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles (pp. 261–276). ACM.
Jain, R., Chiu, D.-M., & Hawe, W. (1984). A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. In Digital equipment corporation research technical report TR-301 (pp. 1–37).
Kim, J.-K., Shivle, S., Siegel, H. J., Maciejewski, A. A., Braun, T. D., Schneider, M., et al. (2007). Dynamically mapping tasks with priorities and multiple deadlines in a heterogeneous environment. Journal of Parallel and Distributed Computing, 67(2), 154–169.
Le, K., Bianchini, R., Zhang, J., Jaluria, Y., Meng, J., & Nguyen, T. D. (2011). Reducing electricity cost through virtual machine placement in high performance computing clouds. In Proceedings of the international conference for high performance computing, networking, storage and analysis (p. 22). ACM.
Liu, Z., Lin, M., Wierman, A., Low, S. H., & Andrew, L. L. H. (2011). Greening geographical load balancing. In Proceedings of the ACM SIGMETRICS joint international conference on measurement and modeling of computer systems (pp. 233–244). ACM.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Maguluri, S. T., Srikant, R., & Ying, L. (2012a). Heavy traffic optimal resource allocation algorithms for cloud computing clusters. In Proceedings of the 24th international teletraffic congress (p. 25). International Teletraffic Congress.
Maguluri, S. T., Srikant, R., & Ying, L. (2012b). Stochastic models of load balancing and scheduling in cloud computing clusters. In Proceedings IEEE INFOCOM (pp. 702–710). IEEE.
Mann, Z. Á. (2015). Allocation of virtual machines in cloud data centers–A survey of problem models and optimization algorithms. ACM Computing Surveys, 48(1), 1–31.
Mishra, A. K., Hellerstein, J. L., Cirne, W., & Das, C. R. (2010). Towards characterizing cloud backend workloads: Insights from Google compute clusters. ACM SIGMETRICS Performance Evaluation Review, 37(4), 34–41.
Ousterhout, K., Wendell, P., Zaharia, M., & Stoica, I. (2013). Sparrow: Distributed, low latency scheduling. In Proceedings of the twenty-fourth ACM symposium on operating systems principles (pp. 69–84). ACM.
Rasooli, A., & Down, D. G. (2014). COSHH: A classification and optimization based scheduler for heterogeneous Hadoop systems. Future Generation Computer Systems, 36, 1–15.
Reiss, C., Tumanov, A., Ganger, G. R., Katz, R. H., & Kozuch, M. A. (2012). Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the third ACM symposium on cloud computing (pp. 1–13). ACM.
Salehi, M. A., Krishna, P. R., Deepak, K. S., & Buyya, R. (2012). Preemption-aware energy management in virtualized data centers. In 2012 IEEE 5th international conference on cloud computing (CLOUD) (pp. 844–851). IEEE.
Tang, Q., Gupta, S. K. S., & Varsamopoulos, G. (2007). Thermal-aware task scheduling for data centers through minimizing heat recirculation. In IEEE international conference on cluster computing (pp. 129–138). IEEE.
Tarplee, K. M., Friese, R., Maciejewski, A. A., Siegel, H. J., & Chong, E. K. P. (2016). Energy and makespan tradeoffs in heterogeneous computing systems using efficient linear programming techniques. IEEE Transactions on Parallel and Distributed Systems, 27(6), 1633–1646.
Terekhov, D., Tran, T. T., Down, D. G., & Beck, J. C. (2014). Integrating queueing theory and scheduling for dynamic scheduling problems. Journal of Artificial Intelligence Research, 50, 535–572.
Wang, L., Von Laszewski, G., Dayal, J., He, X., Younge, A. J., & Furlani, T. R. (2009). Towards thermal aware workload scheduling in a data center. In 2009 10th international symposium on pervasive systems, algorithms, and networks (ISPAN) (pp. 116–122). IEEE.
Zaharia, M., Borthakur, D., Sen Sarma, J., Elmeleegy, K., Shenker, S., & Stoica, I. (2010). Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European conference on computer systems (pp. 265–278). ACM.
Metadata
Title
Multi-stage resource-aware scheduling for data centers with heterogeneous servers
Authors
Tony T. Tran
Meghana Padmanabhan
Peter Yun Zhang
Heyse Li
Douglas G. Down
J. Christopher Beck
Publication date
20 July 2017
Publisher
Springer US
Published in
Journal of Scheduling / Issue 2/2018
Print ISSN: 1094-6136
Electronic ISSN: 1099-1425
DOI
https://doi.org/10.1007/s10951-017-0537-x
