2021 | Book

High Performance Computing in Science and Engineering

4th International Conference, HPCSE 2019, Karolinka, Czech Republic, May 20–23, 2019, Revised Selected Papers

Editors: Prof. Dr. Tomáš Kozubek, Prof. Dr. Peter Arbenz, Jiří Jaroš, Prof. Lubomír Říha, Dr. Jakub Šístek, Petr Tichý

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the thoroughly refereed post-conference proceedings of the 4th International Conference on High Performance Computing in Science and Engineering, HPCSE 2019, held in Karolinka, Czech Republic, in May 2019.

The 9 papers presented in this volume were carefully reviewed and selected from 13 submissions. The conference provides an international forum for exchanging ideas among researchers involved in scientific and parallel computing, including theory and applications, as well as applied and computational mathematics. The focus of HPCSE 2019 was on models, algorithms, and software tools that facilitate efficient and convenient utilization of modern parallel and distributed computing architectures, as well as on large-scale applications.

Table of Contents

Frontmatter
Thermal Characterization of a Tier0 Datacenter Room in Normal and Thermal Emergency Conditions
Abstract
Datacenters are at the heart of the AI, Industry 4.0 and cloud revolution. A datacenter contains a large number of computing nodes hosted in a large temperature-controlled room. Due to the increasing total power and power density of computing nodes, the overall datacenter compute capacity is often capped by peak power consumption and temperature bottlenecks. To preserve the assumption of homogeneous performance across all nodes, complex cooling solutions are required, but they might not be sufficient. In this work, we analysed and characterised the thermal properties of a Tier0 datacenter deploying advanced hybrid cooling technologies: specifically, we studied the spatial and temporal heterogeneity during production and during cooling emergency hazards. This paper gives the first quantitative evidence of thermal bottlenecks under a real-life production workload, showing significant spatial thermal heterogeneity that could be exploited by thermal-aware job scheduling and run-time workload adaptation and distribution across the datacenter room.
Mohsen Seyedkazemi Ardebili, Carlo Cavazzoni, Luca Benini, Andrea Bartolini
Towards Local-Failure Local-Recovery in PDE Frameworks: The Case of Linear Solvers
Abstract
It is expected that with the appearance of exascale supercomputers the mean time between failures will decrease. Classical checkpoint-restart approaches are too expensive at that scale. Local-failure local-recovery (LFLR) strategies are an option that promises to reduce these costs, but actually implementing them in any sufficiently large simulation environment is a challenging task. In this paper we discuss how LFLR methods can be incorporated into a PDE framework, focusing on the linear solvers as the innermost component. We discuss how Krylov solvers can be modified to support LFLR, and present numerical tests. We exemplify our approach by reporting on the implementation of these features in the Dune framework, present C++ software abstractions that simplify the incorporation of LFLR techniques, and show how we use these in our solver library. To reduce the memory costs of full remote backups, we further investigate the benefits of lossy compression and in-memory checkpointing.
Mirco Altenbernd, Nils-Arne Dreier, Christian Engwer, Dominik Göddeke
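The LFLR idea described in the abstract above can be illustrated with a minimal sketch: keep an in-memory backup of the local solver state, and on a node failure restore that state and continue iterating instead of restarting the whole run. This is a hypothetical toy (a Jacobi iteration, not the paper's Dune/Krylov implementation), and all names in it are made up for illustration.

```python
# Sketch of local-failure local-recovery (LFLR) in an iterative solver.
# Hypothetical example, not the paper's Dune implementation.

def jacobi_with_lflr(A, b, iters=200, checkpoint_every=10, fail_at=55):
    n = len(b)
    x = [0.0] * n
    backup = list(x)                    # in-memory backup of the local state
    for k in range(iters):
        if k % checkpoint_every == 0:
            backup = list(x)            # refresh the checkpoint periodically
        if k == fail_at:
            x = [0.0] * n               # simulated local data loss
            x = list(backup)            # local recovery: restore, don't restart
        x_new = []
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - s) / A[i][i])
        x = x_new
    return x

# Diagonally dominant 2x2 system with exact solution (1, 2)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [6.0, 7.0]
x = jacobi_with_lflr(A, b)
```

The iteration converges despite the simulated failure because the recovery rolls the local state back at most `checkpoint_every` iterations, rather than to the initial guess.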
Complexity Analysis of a Fast Directional Matrix-Vector Multiplication
Abstract
We consider a fast, data-sparse directional method to realize matrix-vector products related to point evaluations of the Helmholtz kernel. The method is based on a hierarchical partitioning of the point sets and the matrix. The considered directional multi-level approximation of the Helmholtz kernel can be applied efficiently even on high-frequency levels. We provide a detailed analysis of the almost linear asymptotic complexity of the presented method. Our numerical experiments are in good agreement with the provided theory.
Günther Of, Raphael Watschinger
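For orientation (this formula is standard in the fast directional literature, not quoted from the paper): the directional approach rests on splitting the oscillatory Helmholtz kernel into a plane wave in a direction \(d\) and a remainder,

```latex
\frac{e^{\mathrm{i}\kappa |x-y|}}{|x-y|}
  \;=\;
  \frac{e^{\mathrm{i}\kappa \left( |x-y| - \langle d,\, x-y \rangle \right)}}{|x-y|}
  \; e^{\mathrm{i}\kappa \langle d,\, x-y \rangle},
```

where the first factor is smooth, and hence admits a low-rank (e.g. interpolation-based) approximation, as long as \(x-y\) lies in a sufficiently narrow cone around \(d\); the plane-wave factor is carried along exactly. On high-frequency levels the cones are refined, which is what keeps the approximation effective there.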
Fast Large-Scale Boundary Element Algorithms
Abstract
Boundary element methods (BEM) reduce a partial differential equation in a domain to an integral equation on the domain’s boundary. They are particularly attractive for solving problems on unbounded domains, but handling the dense matrices corresponding to the integral operators requires efficient algorithms.
This article describes two approaches that allow us to solve boundary element equations on surface meshes consisting of several millions of triangles while preserving the optimal convergence rates of the Galerkin discretization.
Steffen Börm
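The efficiency argument behind the hierarchical methods in the abstract above can be made concrete with a small sketch: an admissible (far-field) block of the dense BEM matrix is replaced by a rank-\(k\) factorization \(A \approx UV^{T}\), which is applied as \(U(V^{T}x)\) in \(k(n+m)\) operations instead of \(nm\). This toy example is hypothetical and uses rank 1 for brevity.

```python
# Sketch: applying a low-rank block without forming the dense matrix.
# Hypothetical toy example, rank k = 1.

def matvec_dense(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matvec_lowrank(U, V, x):
    # y = U (V^T x): cost k*(n+m) instead of n*m
    k = len(U[0])
    t = [sum(V[i][j] * x[i] for i in range(len(V))) for j in range(k)]
    return [sum(U[i][j] * t[j] for j in range(k)) for i in range(len(U))]

# Rank-1 block A[i][j] = u[i] * v[j]
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0]
A = [[ui * vj for vj in v] for ui in u]
U = [[ui] for ui in u]
V = [[vj] for vj in v]
x = [1.0, -1.0]
```

For a mesh with millions of triangles, storing and applying the dense matrix is infeasible, while the block-wise low-rank representation brings both memory and matvec cost down to almost linear complexity.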
Solving Large-Scale Interior Eigenvalue Problems to Investigate the Vibrational Properties of the Boson Peak Regime in Amorphous Materials
Abstract
Amorphous solids, like metallic glasses, exhibit an excess of low-frequency vibrational states reflecting the break-up of sound due to the strong structural disorder inherent to these materials. This range of frequencies is referred to as the boson peak regime, and how the corresponding eigenmodes relate to the underlying atomic-scale disorder remains an active research topic. In this paper we investigate the use of a polynomial filtered eigensolver for the computation and study of low-frequency eigenmodes of a Hessian matrix located in a specific interval close to the boson peak regime. A distributed-memory parallel implementation of a polynomial filtered eigensolver is presented. Our implementation, based on the Trilinos framework, is then applied to the Hessian matrix of an atomistic bulk metallic glass structure derived from a molecular dynamics simulation, to compute eigenmodes close to the boson peak. In addition, we study the parallel scalability of our implementation on multicore nodes. Our calculations concur with previous atomistic results, and additionally demonstrate a broad cross-over of boson peak frequencies within which sound is seen to break up.
Giuseppe Accaputo, Peter M. Derlet, Peter Arbenz
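The polynomial filtering idea mentioned in the abstract above can be sketched in a few lines: map the unwanted spectral interval \([a,b]\) affinely onto \([-1,1]\), where Chebyshev polynomials stay bounded, so that eigencomponents outside \([a,b]\) are amplified by the polynomial's rapid growth there. The sketch below (hypothetical, not the paper's Trilinos implementation) uses a diagonal test matrix so the effect is easy to verify.

```python
# Sketch of Chebyshev polynomial filtering for interior/low eigenvalues.
# Hypothetical toy example with a diagonal matrix; not the paper's code.

def cheb_filter(Adiag, v, a, b, degree):
    # l(t) = (2t - a - b) / (b - a) maps the unwanted interval [a, b] to [-1, 1]
    def lA(w):
        return [((2 * d - (a + b)) * wi) / (b - a) for d, wi in zip(Adiag, w)]
    # Three-term Chebyshev recurrence: T_{j+1} = 2 l(A) T_j - T_{j-1}
    t_prev, t_cur = v, lA(v)
    for _ in range(degree - 1):
        t_prev, t_cur = t_cur, [2 * x - y for x, y in zip(lA(t_cur), t_prev)]
    norm = max(abs(x) for x in t_cur)
    return [x / norm for x in t_cur]

# Eigenvalue 0.1 is the wanted low-frequency mode; 2.0 and 3.0 are unwanted.
Adiag = [0.1, 2.0, 3.0]
y = cheb_filter(Adiag, [1.0, 1.0, 1.0], a=1.5, b=3.5, degree=10)
# After filtering, the component along the low eigenvalue dominates.
```

In a real eigensolver the filtered operator \(p(A)\) is used inside a subspace iteration or Lanczos process, so that only matrix-vector products with \(A\) are needed, which is what makes the approach attractive for large distributed sparse Hessians.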
Performance Evaluation of Pseudospectral Ultrasound Simulations on a Cluster of Xeon Phi Accelerators
Abstract
The rapid development of novel procedures in medical ultrasonics, including treatment planning in therapeutic ultrasound and image reconstruction in photoacoustic tomography, leads to an increasing demand for large-scale ultrasound simulations. However, routine execution of such simulations using traditional methods, e.g., finite difference time domain, is expensive and often considered intractable due to the computational and memory requirements. The k-space corrected pseudospectral time domain method used by the k-Wave toolbox allows for significant reductions in spatial and temporal grid resolution. These improvements are achieved at the cost of the all-to-all communications inherent in the multi-dimensional fast Fourier transforms. To improve data locality, reduce communication and allow efficient use of accelerators, we recently implemented a domain decomposition technique based on a local Fourier basis.
In this paper, we investigate whether it is feasible to run the distributed k-Wave implementation on the Salomon cluster equipped with 864 Intel Xeon Phi (Knights Corner) accelerators. The results show the immaturity of the KNC platform, with issues ranging from limited support of InfiniBand and LustreFS in Intel MPI on this platform to the poor performance of 3D FFTs achieved by Intel MKL on the KNC architecture. Yet, we show that it is possible to achieve strong and weak scaling comparable to CPU-only platforms, albeit with runtimes \(1.8\times \) to \(4.3\times \) longer. However, the accounting policy for Salomon's accelerators is far more favorable, and thus their use reduces the computational cost significantly.
Filip Vaverka, Bradley E. Treeby, Jiri Jaros
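For context on the k-space correction mentioned in the abstract above (in the form commonly stated in the pseudospectral ultrasound literature, not quoted from this paper): the spectral spatial derivatives are multiplied by a sinc-shaped filter involving a reference sound speed \(c_{\mathrm{ref}}\) and the time step \(\Delta t\),

```latex
\frac{\partial p}{\partial \xi}
  \;\approx\;
  \mathcal{F}^{-1}\!\left\{
    \mathrm{i}\, k_{\xi}\,
    \operatorname{sinc}\!\left( \tfrac{1}{2}\, c_{\mathrm{ref}}\, |\mathbf{k}|\, \Delta t \right)
    \mathcal{F}\{p\}
  \right\},
```

which makes the discrete time stepping exact for a homogeneous medium. This is what permits the coarser temporal grids, while the forward and inverse transforms \(\mathcal{F}\), \(\mathcal{F}^{-1}\) are the source of the all-to-all communication discussed above.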
Estimation of Execution Parameters for k-Wave Simulations
Abstract
Estimation of execution parameters takes centre stage in the automatic offloading of complex biomedical workflows to cloud and high-performance facilities. Since ordinary users have no or very limited knowledge of the performance characteristics of particular tasks in the workflow, the scheduling system has to be able to select an appropriate amount of compute resources, e.g., compute nodes, GPUs, or processor cores, and estimate the execution time and cost.
The presented approach considers a fixed set of executables that can be used to create custom workflows, and collects performance data of successfully computed tasks. Since the workflows may differ in the structure and size of the input data, the execution parameters can only be obtained by searching the performance database and interpolating between similar tasks. This paper shows it is possible to predict the execution time and cost with high confidence. If the task parameters are found in the performance database, the mean interpolation error stays below 2.29%. If only similar tasks are found, the mean interpolation error may grow to 15%. Nevertheless, this is still an acceptable error, since the cluster performance itself may vary on the order of a few percent.
Marta Jaros, Tomas Sasak, Bradley E. Treeby, Jiri Jaros
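The lookup-and-interpolate idea in the abstract above can be sketched as follows: store measured runtimes keyed by a task-size parameter, return the recorded value on an exact hit, and otherwise interpolate linearly between the two nearest recorded sizes. This is a hypothetical one-parameter sketch, not the paper's actual database schema or interpolation scheme.

```python
# Hypothetical sketch of performance-database lookup with interpolation.

def estimate_runtime(db, size):
    # db: list of (task_size, measured_seconds), sorted by task_size
    if not db:
        raise ValueError("empty performance database")
    for s, t in db:
        if s == size:                # exact hit: use the recorded measurement
            return t
    below = [(s, t) for s, t in db if s < size]
    above = [(s, t) for s, t in db if s > size]
    if not below:                    # below the smallest record: clamp
        return above[0][1]
    if not above:                    # above the largest record: clamp
        return below[-1][1]
    (s0, t0), (s1, t1) = below[-1], above[0]
    w = (size - s0) / (s1 - s0)      # linear interpolation between neighbours
    return t0 + w * (t1 - t0)

db = [(128, 10.0), (256, 22.0), (512, 48.0)]
```

An exact hit corresponds to the sub-2.29% error case reported in the abstract, while interpolation between "similar tasks" corresponds to the larger-error case; a production system would interpolate over several task parameters, not just one.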
Analysis and Visualization of the Dynamic Behavior of HPC Applications
Abstract
The behavior of a parallel application can be presented in many ways, but performance visualization tools usually focus on communication graphs and runtime of processes or threads in specific (groups of) functions. A different approach is required when searching for the optimal configuration of tunable parameters, for which it is necessary to run the application several times and compare the resource consumption of these runs. We present RADAR visualizer, a tool that was originally developed to analyze such measurements and to detect the optimal configuration for each instrumented part of the code. In this case, the optimum was defined as the minimum energy consumption of the whole application, but any other metric can be defined.
RADAR visualizer presents the application behavior in several graphical representations and tables, including the amount of savings that can be reached. Together with our MERIC library, we provide a complete toolchain for HPC application behavior monitoring, data analysis, and graphical representation. The final step is dynamic tuning (applying the optimal settings for each region during the application runtime) for production runs of the analyzed application.
Ondrej Vysocky, Ivo Peterek, Martin Beseda, Matej Spetko, David Ulcak, Lubomir Riha
A Convenient Graph Connectedness for Digital Imagery
Abstract
In a simple undirected graph, we introduce a special connectedness induced by a set of paths of length 2. We focus on the 8-adjacency graph (with the vertex set \(\mathbb {Z}^2\)) and study the connectedness induced by a certain set of paths of length 2 in the graph. For this connectedness, we prove a digital Jordan curve theorem by determining the Jordan curves, i.e., the circles in the graph that separate \(\mathbb {Z}^2\) into exactly two connected components. These Jordan curves are shown to have an advantage over those given by the Khalimsky topology on \(\mathbb {Z}^2\).
Josef Šlapal
Backmatter
Metadata
Copyright Year
2021
Electronic ISBN
978-3-030-67077-1
Print ISBN
978-3-030-67076-4
DOI
https://doi.org/10.1007/978-3-030-67077-1
