
Joint Optimization of Computation and Communication Resources for GPU Allocation in Heterogeneous Clusters

  • 2026
  • OriginalPaper
  • Chapter


Abstract

This chapter examines GPU resource allocation in heterogeneous clusters, focusing on distributed training of deep learning models. It identifies a limitation of existing allocation schemes: they either prioritize computational efficiency at the cost of significant communication overhead, or reduce communication overhead at the cost of computational efficiency. The chapter introduces HetSpeed, a resource allocation method that jointly optimizes computational efficiency and communication overhead. HetSpeed formulates job placement as an integer linear programming (ILP) problem, discusses its NP-hardness, and solves it with a method based on submodular optimization. Experimental results show that, compared with state-of-the-art solutions, HetSpeed minimizes the total cross-rack traffic of the cluster, shortens the average job completion time by 27.8%, and improves the cluster speedup value by 21.3%. The chapter also analyzes HetSpeed's performance in a range of scenarios.
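The abstract states that HetSpeed's solution is based on submodular optimization but does not give its objective. For orientation only, the sketch below shows the textbook greedy algorithm for maximizing a monotone submodular function under a cardinality constraint (select k items), which is known to achieve a (1 − 1/e) approximation. Everything concrete here is hypothetical: the GPU names, speeds, rack labels, `toy_objective`, and `greedy_submodular` are illustrative assumptions and not HetSpeed's formulation.

```python
import math
from typing import Callable, Dict, FrozenSet, List

# Hypothetical heterogeneous cluster: GPU id -> relative speed, GPU id -> rack.
SPEEDS: Dict[str, float] = {"r1-a100": 3.0, "r1-v100": 2.0, "r2-a100": 3.0, "r2-t4": 1.0}
RACK_OF: Dict[str, str] = {"r1-a100": "r1", "r1-v100": "r1", "r2-a100": "r2", "r2-t4": "r2"}


def toy_objective(selected: FrozenSet[str]) -> float:
    """Hypothetical monotone submodular utility: sum over racks of sqrt(aggregate speed)."""
    per_rack: Dict[str, float] = {}
    for gpu in selected:
        rack = RACK_OF[gpu]
        per_rack[rack] = per_rack.get(rack, 0.0) + SPEEDS[gpu]
    return sum(math.sqrt(total) for total in per_rack.values())


def greedy_submodular(
    candidates: List[str], f: Callable[[FrozenSet[str]], float], k: int
) -> List[str]:
    """Pick up to k candidates, each round adding the one with the largest marginal gain."""
    chosen: List[str] = []
    current: FrozenSet[str] = frozenset()
    for _ in range(min(k, len(candidates))):
        base = f(current)
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in current:
                continue
            gain = f(current | {c}) - base  # marginal gain of adding c
            if gain > best_gain:  # strict comparison: ties keep the earlier candidate
                best, best_gain = c, gain
        chosen.append(best)
        current = current | {best}
    return chosen


picked = greedy_submodular(list(SPEEDS), toy_objective, k=2)
print(picked)  # ['r1-a100', 'r2-a100']
```

Because each rack's score is a concave function of its aggregate speed, marginal gains shrink as a rack fills up, so this toy objective spreads GPUs across racks. An objective that also accounts for cross-rack traffic, which HetSpeed aims to minimize, would favor different placements.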


Authors: Jiacheng Zhu, Chu Xu, Gongming Zhao, Hongli Xu, Gangyi Luo, Hao Zheng
Copyright Year: 2026
DOI: https://doi.org/10.1007/978-3-032-10459-5_12