Skip to main content
main-content
Top

Hint

Swipe to navigate through the articles of this issue

06-11-2020 | Issue 6/2021

The Journal of Supercomputing 6/2021

CRState: checkpoint/restart of OpenCL program for in-kernel applications

Journal:
The Journal of Supercomputing > Issue 6/2021
Authors:
Genlang Chen, Jiajian Zhang, Zufang Zhu, Qiangqiang Jiang, Hai Jiang, Chaoyi Pang
Important notes
Grant No. Natural Science Foundation of Zhejiang Province (LY20F020001).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Abstract

The checkpoint/restart mechanism is critical in a preemptive system because clusters with this mechanism will be improved in terms of fault tolerance, load balance, and resource utilization. As graphics processing units (GPUs) have more recently become commonplace with the advent of general-purpose computation, and open computing language (OpenCL) programs are portable across various CPUs and GPUs, it is increasingly important to set up checkpoint/restart mechanism in OpenCL programs. However, due to the complexity of the internal computational state of the GPU, there is currently no effective and reasonable checkpoint/restart scheme for OpenCL applications. This paper proposes a feasible system, checkpoint/restart state (CRState), to achieve checkpoint/restart in GPU kernels. The computation states including heap, data segments, local memory, stack and code segments in the underlying hardware are identified and concretized in order to establish an association between the underlying level state and the application level representation. Then, a pre-compiler is developed to insert primitives into OpenCL programs at compile time so that major components of the computation state will be extracted at runtime. Since the computation state is duplicated at application level, such OpenCL programs can be preempted and ported across heterogeneous devices. A comprehensive example and ten authoritative benchmark programs are selected to demonstrate the feasibility and effectiveness of the proposed system.

Please log in to get access to this content

To get access to this content you need the following product:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 58.000 Bücher
  • über 300 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Testen Sie jetzt 30 Tage kostenlos.

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 50.000 Bücher
  • über 380 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Umwelt
  • Maschinenbau + Werkstoffe




Testen Sie jetzt 30 Tage kostenlos.

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 69.000 Bücher
  • über 500 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Umwelt
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Testen Sie jetzt 30 Tage kostenlos.

Literature
About this article

Other articles of this Issue 6/2021

The Journal of Supercomputing 6/2021 Go to the issue

Premium Partner

    Image Credits