
2021 | Book

Sustained Simulation Performance 2019 and 2020

Proceedings of the Joint Workshop on Sustained Simulation Performance, University of Stuttgart (HLRS) and Tohoku University, 2019 and 2020

Editors: Prof. Dr. Michael M. Resch, Manuela Wossough, Dr. Wolfgang Bez, Dr. Erich Focht, Prof. Hiroaki Kobayashi

Publisher: Springer International Publishing


About this book

This book presents the state of the art in High Performance Computing on modern supercomputer architectures. It addresses trends in hardware and software development in general. The contributions cover a broad range of topics, from performance evaluation in the context of power efficiency to Computational Fluid Dynamics and High Performance Data Analytics. In addition, they explore new topics such as the use of High Performance Computers in the field of Artificial Intelligence and Machine Learning. All contributions are based on selected papers presented at the 30th Workshop on Sustained Simulation Performance (WSSP), held at the High Performance Computing Center, University of Stuttgart, Germany, in October 2019, and on papers prepared for the Workshop on Sustained Simulation Performance planned for March 2020, which could not take place due to the Covid-19 pandemic.

Table of Contents

Frontmatter

Performance and Power

Frontmatter
Performance Evaluation of SX-Aurora TSUBASA and Its QA-Assisted Application Design
Abstract
In this article, we present an overview of our ongoing project, R&D of a Quantum-Annealing-Assisted Next-Generation HPC Infrastructure and Its Applications. We describe our design concept for a new computing infrastructure for the post-Moore era: the integration of classical HPC engines and a quantum-annealing engine into a single system image, realizing an ensemble of domain-specific architectures. We also present a performance evaluation of SX-Aurora TSUBASA, the central system of this infrastructure, using well-known benchmark kernels. We discuss the sustained performance, power efficiency, and scalability of the vector engines of SX-Aurora TSUBASA using the HPL, Himeno, and HPCG benchmarks. Moreover, as an example of quantum-annealing-assisted application design, we show how a quantum-annealing data-processing mechanism is introduced into large-scale data clustering.
Hiroaki Kobayashi, Kazuhiko Komatsu
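The abstract does not spell out the clustering formulation, but quantum annealers solve QUBO (quadratic unconstrained binary optimization) problems, so a data-clustering task of the kind mentioned can be sketched as a QUBO. The sketch below is illustrative only: the one-hot penalty weight and the brute-force solver standing in for the annealer are assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

def clustering_qubo_energy(x, dist, n_points, n_clusters, penalty):
    """Energy of a one-hot QUBO clustering assignment.

    x[i, k] = 1 if point i is assigned to cluster k. Pairs in the
    same cluster pay their pairwise distance; the penalty term
    enforces exactly one cluster per point.
    """
    same = 0.0
    for k in range(n_clusters):
        for i in range(n_points):
            for j in range(i + 1, n_points):
                same += dist[i, j] * x[i, k] * x[j, k]
    onehot = penalty * np.sum((x.sum(axis=1) - 1) ** 2)
    return same + onehot

def brute_force_anneal(dist, n_clusters, penalty=10.0):
    """Stand-in for the annealer: exhaustively minimize the QUBO."""
    n = dist.shape[0]
    best, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=n * n_clusters):
        x = np.array(bits).reshape(n, n_clusters)
        e = clustering_qubo_energy(x, dist, n, n_clusters, penalty)
        if e < best_e:
            best, best_e = x, e
    return best

# Two well-separated pairs of points on a line.
pts = np.array([0.0, 0.1, 5.0, 5.1])
dist = np.abs(pts[:, None] - pts[None, :])
x = brute_force_anneal(dist, n_clusters=2)
labels = x.argmax(axis=1)
print(labels)  # points 0,1 share one label; points 2,3 the other
```

On a real annealer the same energy function would be handed over as a QUBO matrix; the exhaustive search here only works for toy sizes.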
Towards Energy Efficient Computing Based on the Estimation of Energy Consumption
Abstract
The amount of computing power in the world keeps increasing, as do the computational needs of industry and society. This also increases the total energy consumption of ICT, which has reached the level of billions of dollars spent every year and an emission footprint of millions of tons of CO\(_2\) per year. These economic and ecological costs motivate the search for more efficient computation, as does the target of exascale computing and the performance levels beyond it. We argue for a shift from considering only computation time when optimizing code to also considering the efficient use of energy. As a first step towards energy-efficient computing, we record the energy consumption of the algorithms used and then use those results to select the most energy-efficient algorithm among those available, which may require increasing the level of parallelization and/or the computation time while still fulfilling the application requirements. Note that the cooling systems of an HPC installation may consume as much energy as the computing nodes themselves, which means that any energy savings from energy-efficient programming are effectively doubled.
José Miguel Montañana Aliaga, Alexey Cheptsov, Antonio Hervás
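The selection step described above — pick the recorded algorithm variant with the lowest energy that still meets the application's requirements — can be sketched in a few lines. The profile numbers and variant names below are hypothetical, not measurements from the chapter.

```python
def pick_energy_efficient(profiles, deadline_s):
    """Select the recorded variant with the lowest energy consumption
    that still satisfies the application's runtime requirement."""
    feasible = {name: p for name, p in profiles.items()
                if p["runtime_s"] <= deadline_s}
    if not feasible:
        raise ValueError("no recorded variant meets the deadline")
    return min(feasible, key=lambda name: feasible[name]["energy_j"])

# Hypothetical recorded measurements: more parallel variants finish
# faster but may spend more total energy.
profiles = {
    "serial":      {"runtime_s": 120.0, "energy_j": 9_000.0},
    "omp_8cores":  {"runtime_s": 20.0,  "energy_j": 11_000.0},
    "omp_32cores": {"runtime_s": 8.0,   "energy_j": 16_000.0},
}
print(pick_energy_efficient(profiles, deadline_s=30.0))  # omp_8cores
```

With a loose deadline the serial variant wins on energy; under a tight deadline the selection trades energy for parallel speed, which is exactly the trade-off the chapter describes.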
Speeding Up Vector Engine Offloading with AVEO
Abstract
Vector Engine Offloading (VEO) was the first implementation of an API for programming the SX-Aurora TSUBASA Vector Engine (VE) like an accelerator, i.e. writing programs for the host CPU that call offloading kernels running on the VE. The native VE programming model using OpenMP and MPI still dominates in applications, but CUDA, HIP, OpenMP Target, OpenACC, and OpenCL are gaining more and more traction. This report introduces AVEO, an alternative VE offloading implementation with a VEO-compatible API. It was redesigned to solve a set of problems in VEO and to improve call latency as well as memory transfer bandwidth. The results show latency improvements of up to a factor of 18, and bandwidth increases of a factor of 8–10 for small buffers and 15–20% for very large buffers. We describe implementation details and remote memory access mechanisms as well as API extensions. This development should contribute to making accelerator-style hybrid programming more attractive on the vector engine, ease the porting of hybrid programs, and support the development of more sophisticated hybrid programming frameworks.
Erich Focht

Numerics and Optimization

Frontmatter
Optimizations of DNS Codes for Turbulence on SX-Aurora TSUBASA
Abstract
Direct numerical simulations (DNSs) of incompressible turbulence have been performed since the late 1960s, but simulations that reproduce strongly nonlinear turbulent flows as in the real world have not yet been realized. We have implemented two parallel Fourier-spectral DNS codes, using a one-dimensional domain decomposition (slab decomposition) and a two-dimensional domain decomposition (pencil decomposition), for a cutting-edge vector supercomputer in order to carry out larger DNSs than ever before. In a DNS based on the Fourier spectral method, the three-dimensional fast Fourier transforms (3D-FFTs) account for more than 90% of the computation time. In this article, we therefore optimize our FFT codes for vector computers on SX-Aurora TSUBASA and measure their vector execution performance. After optimization, the calculation time of the pencil-decomposition code is 1.6 times shorter than before optimization.
Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi
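The structure behind both decompositions is that a 3D FFT factorizes into three sweeps of 1D FFTs, one axis at a time, with a data transpose between sweeps so the active axis is always locally contiguous. A serial NumPy sketch of that pattern (the real codes distribute the slabs or pencils over MPI ranks, with all-to-all communication playing the role of the transposes):

```python
import numpy as np

def fft3d_pencil_style(u):
    """3D FFT as three sweeps of 1D FFTs, one axis at a time.

    In a distributed pencil decomposition each sweep transforms the
    locally contiguous axis, with an all-to-all transpose between
    sweeps; here the transposes are plain np.swapaxes.
    """
    # Sweep 1: FFT along z (last axis, contiguous in C order).
    u = np.fft.fft(u, axis=2)
    # Transpose so y becomes the contiguous axis, then FFT along y.
    u = np.swapaxes(u, 1, 2)
    u = np.fft.fft(u, axis=2)
    u = np.swapaxes(u, 1, 2)
    # Sweep 3: transpose and FFT along x.
    u = np.swapaxes(u, 0, 2)
    u = np.fft.fft(u, axis=2)
    return np.swapaxes(u, 0, 2)

rng = np.random.default_rng(0)
u = rng.standard_normal((8, 8, 8))
assert np.allclose(fft3d_pencil_style(u), np.fft.fftn(u))
```

Slab decomposition needs one transpose instead of two, but is limited to at most N ranks for an N³ grid, which is why the pencil variant matters for the largest DNSs.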
Dynamic Load Balancing for Coupled Simulation Methods
Abstract
A dynamic load balancing technique for simulation methods based on hierarchical Cartesian meshes is presented for two applications in this paper. The first is a hybrid CFD/CAA solver for the prediction of aeroacoustic noise. In this application, a finite-volume method for the large-eddy simulation of the turbulent flow field is coupled to a discontinuous Galerkin method for the solution of the acoustic perturbation equations to predict the generation and propagation of the sound field. The second simulation method predicts the combustion process of a premixed fuel. The turbulent flow field is again predicted by large-eddy simulation using the finite-volume method, which is coupled to a level-set solver used for the prediction of the flame surface. In both applications, a joint Cartesian mesh is used for the involved solvers, which allows the computational load to be efficiently redistributed using a space-filling curve. The results show that dynamic load balancing can enhance the parallel efficiency even for static meshes. The simulation of the combustion process with a solution-adaptive mesh technique demonstrates the necessity of a dynamic load balancing technique.
Matthias Meinke, Ansgar Niemöller, Sohel Herff, Wolfgang Schröder
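The space-filling-curve idea can be sketched concisely: order the mesh cells along the curve, then cut the ordered sequence into contiguous pieces of roughly equal accumulated weight, one per rank. The sketch below uses a 2D Morton (Z-order) curve and a greedy cut for illustration; the chapter's solvers, curve choice, and partitioning criteria are not reproduced here.

```python
def morton2d(x, y, bits=8):
    """Interleave the bits of (x, y) to get a Z-order curve index."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return code

def partition_by_sfc(cells, weights, n_ranks):
    """Sort cells along the curve, then cut the curve into contiguous
    pieces of roughly equal accumulated weight."""
    order = sorted(range(len(cells)), key=lambda i: morton2d(*cells[i]))
    target = sum(weights) / n_ranks
    parts, current, acc = [[] for _ in range(n_ranks)], 0, 0.0
    for i in order:
        # Move to the next rank once this one carries its share.
        if acc >= target * (current + 1) and current < n_ranks - 1:
            current += 1
        parts[current].append(cells[i])
        acc += weights[i]
    return parts

# 4x4 grid of unit-weight cells split across 2 ranks.
cells = [(x, y) for x in range(4) for y in range(4)]
parts = partition_by_sfc(cells, [1.0] * 16, n_ranks=2)
print([len(p) for p in parts])  # [8, 8]
```

Because curve locality keeps each contiguous piece spatially compact, rebalancing after a weight change only shifts cells near the cut points, which is what makes the redistribution cheap.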
Brinkman Penalization and Boundary Layer in High-Order Discontinuous Galerkin
Abstract
In this chapter we look into a high-order representation of complex geometries by the Brinkman penalization method. We focus on the effect of this immersed-boundary model on the boundary layers in a high-order discontinuous Galerkin scheme. High-order approximations are attractive on modern computing architectures, as they require few degrees of freedom to represent smooth solutions, resulting in a smaller memory footprint compared to lower-order discretizations. A significant hurdle in using high-order methods for problems involving complex geometries is matching the surface description of the geometry with the discretization scheme. Brinkman penalization offers a way to achieve this without the need for complicated mesh generation and special elements. We investigated its use in our high-order discontinuous Galerkin implementation Ateles in [2], where we looked at inviscid effects like reflections. Here we investigate the viscous boundary layer close to a wall, modeled by the penalization.
Neda Ebrahimi Pour, Nikhil Anand, Felix Bernhards, Harald Klimach, Sabine Roller
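The core of Brinkman penalization is adding a forcing term \(-(\chi/\eta)\,u\) to the momentum equation, where the mask \(\chi\) is 1 inside the solid and \(\eta \ll 1\), so the velocity is driven to zero inside the obstacle without a body-fitted mesh. A minimal 1D finite-difference sketch (not the Ateles DG scheme; grid, viscosity, and the implicit treatment of the stiff penalty term are choices made here for a stable toy example):

```python
import numpy as np

# 1D velocity field diffusing on [0, 1]; the right half is a solid
# obstacle represented by the mask chi, penalized with strength 1/eta.
n, nu, eta = 100, 0.1, 1e-4
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
dt = 0.2 * dx**2 / nu            # explicit-diffusion stability limit
chi = (x > 0.5).astype(float)    # 1 inside the obstacle, 0 in the fluid
u = np.ones(n)                   # start with uniform flow everywhere

for _ in range(2000):
    lap = np.zeros(n)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    # Diffusion step with the Brinkman term -(chi/eta) * u treated
    # implicitly, since 1/eta makes it far too stiff for explicit
    # time stepping at this dt.
    u = (u + dt * nu * lap) / (1.0 + dt * chi / eta)

print(u[x > 0.6].max())  # velocity deep inside the solid is ~0
```

The residual velocity inside the solid scales with \(\sqrt{\nu\eta}\), which is why the chapter's question — what the penalization does to the viscous boundary layer just outside the wall — is the interesting one.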

Data Handling and New Concepts

Frontmatter
Handling Large Numerical Data-Sets: Viability of a Lossy Compressor for CFD-simulations
Abstract
Over the years, a steady increase in computing power has enabled scientists and engineers to develop increasingly complex applications for machine learning and scientific computing. But while these applications promise to solve some of the most difficult problems we face today, their data hunger also reveals an ever-increasing I/O bottleneck. It is therefore imperative that we develop I/O strategies to better utilize the raw power of our high-performance machines and improve the usability and efficiency of our tools. To achieve this goal, we have developed the BigWhoop compression library based on the JPEG 2000 standard. It enables the efficient lossy compression of numerical data sets while minimizing information loss and the introduction of compression artifacts. This paper presents a comparative study using the Taylor–Green vortex test case to demonstrate the superior compression performance of BigWhoop compared to contemporary solutions. An evaluation of compression-related distortion at high compression ratios proves its feasibility for both visualization and statistical analysis.
Patrick Vogler, Ulrich Rist
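The viability question the chapter studies — how much can a numerical field be shrunk before the distortion harms visualization or statistics — can be illustrated with a crude stand-in coder. The sketch below is not BigWhoop or JPEG 2000: it only does uniform quantization plus deflate, chosen because both are in the Python standard toolbox, but it shows the same ratio-versus-error measurement.

```python
import zlib
import numpy as np

def compress_lossy(field, n_bits=12):
    """Quantize a float field to n_bits levels, then deflate.
    A crude stand-in for a transform coder like JPEG 2000."""
    lo, hi = field.min(), field.max()
    levels = (1 << n_bits) - 1
    q = np.round((field - lo) / (hi - lo) * levels).astype(np.uint16)
    payload = zlib.compress(q.tobytes(), level=9)
    return payload, (lo, hi, levels, field.shape)

def decompress_lossy(payload, meta):
    lo, hi, levels, shape = meta
    q = np.frombuffer(zlib.decompress(payload), dtype=np.uint16)
    return (q.reshape(shape) / levels) * (hi - lo) + lo

# Smooth 2D field standing in for one variable of a CFD snapshot.
x = np.linspace(0, 2 * np.pi, 256)
field = np.sin(x)[:, None] * np.cos(x)[None, :]
payload, meta = compress_lossy(field)
restored = decompress_lossy(payload, meta)

ratio = field.nbytes / len(payload)
max_err = np.abs(field - restored).max()
print(f"ratio {ratio:.1f}:1, max abs error {max_err:.2e}")
```

The quantization step bounds the pointwise error at half a level, i.e. \((\mathrm{hi}-\mathrm{lo})/2^{n\_bits+1}\); a wavelet-based coder like JPEG 2000 reaches far better ratios at the same error by decorrelating the field first.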
A Method for Stream Data Analysis
Abstract
Owing to recent advances in hardware and software, many applications generate huge amounts of data at great velocity, making big data streams ubiquitous. Unlike static data analysis, processing a data stream imposes new challenges: algorithms and methods must deal with incoming data incrementally, with limited memory and time. Furthermore, due to the inherently dynamic characteristics of streaming data, algorithms are often required to handle problems such as concept drift, temporal dependencies, and load imbalance. In this paper, we discuss state-of-the-art research on data stream analysis that employs rigorous and methodical approaches, especially deep learning. In addition, we propose a new method for processing data streams based on the latest developments in GANs, and finally we discuss future work.
Li Zhong
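The constraint named above — process each element once, with O(1) memory — is the defining property of stream algorithms. A classic minimal example (Welford's online mean/variance; chosen here only to illustrate the incremental-update pattern, not the chapter's GAN-based method):

```python
class RunningStats:
    """Welford's online algorithm: one pass over the stream,
    constant memory -- the incremental update style that stream
    processing requires."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        """Fold one new stream element into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean, stats.variance)  # 5.0 and 32/7
```

Concept-drift detectors build on the same idea: they maintain such summaries over a recent window and flag when new elements deviate significantly from them.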
CFD Simulation with Microservices for Geoscience Applications
Abstract
Current geoscience applications face two major challenges: integration with numerous diverse sensor devices, and use in real-time scenarios. While the challenge of sensor integration is addressed by the concept of Cyber-Physical Systems, which incorporates sensor data into application workflows, High Performance Computing helps minimize execution time to meet real-time requirements. However, existing programming models do not allow scientific workflows to take advantage of both technologies simultaneously. This paper offers an approach to encapsulating workflow-based applications in services that are flexible enough to run on heterogeneous, distributed infrastructures spanning both industrial sensor services and parallel computing systems. The approach is demonstrated in a computational fluid dynamics simulation study of aerodynamic processes in complex underground mine ventilation networks.
Alexey Cheptsov
Containerization and Orchestration on HPC Systems
Abstract
Containerization has demonstrated its efficiency for application deployment in Cloud clusters. HPC systems are starting to adopt containers as well, since containers can encapsulate complex programs together with their dependencies in isolated environments, making applications more portable. Nevertheless, conventional HPC workload managers lack micro-service support and deeply integrated container management, in contrast to container orchestrators. To enable the synergy of Cloud and HPC clusters, we propose the preliminary design of a feedback-control scheduler that performs efficient container scheduling while taking advantage of the scheduling policies of both the container orchestrator (Kubernetes) and the HPC workload manager (TORQUE).
Naweiluo Zhou
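The decision such a scheduler makes can be sketched as a routing policy with a feedback input. Everything below is hypothetical — the thresholds, the job attributes, and the policy itself are illustrative assumptions, not the chapter's preliminary design:

```python
def schedule(job, hpc_queue_len, cloud_load, mpi_threshold=4):
    """Toy routing policy: tightly coupled MPI jobs go to the HPC
    workload manager, small service-like jobs to the container
    orchestrator -- unless feedback (queue length, cluster load)
    says that side is saturated."""
    if job["mpi_ranks"] >= mpi_threshold:
        # Large parallel job: prefer TORQUE, spill over if backlogged.
        return "torque" if hpc_queue_len < 100 else "kubernetes"
    # Small containerized job: prefer Kubernetes, spill over if loaded.
    return "kubernetes" if cloud_load < 0.9 else "torque"

print(schedule({"mpi_ranks": 64}, hpc_queue_len=10, cloud_load=0.5))   # torque
print(schedule({"mpi_ranks": 1},  hpc_queue_len=10, cloud_load=0.5))   # kubernetes
print(schedule({"mpi_ranks": 1},  hpc_queue_len=10, cloud_load=0.95))  # torque
```

The "feedback control" aspect lies in the second and third arguments: observed load flows back into the next placement decision rather than the policy being fixed in advance.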

Trends in HPC and AI

Frontmatter
The Role of Machine Learning and Artificial Intelligence in High Performance Computing
Abstract
High Performance Computing has recently been challenged by the advent of Data Analytics (DA), Machine Learning (ML) and Artificial Intelligence (AI). In this paper, we first look at the situation of HPC, which is mainly shaped by the end of Moore's law and increasing electrical power consumption. We then explore the role these technologies can play when they come together, and how DA, ML and AI can change the scientific and industrial use of simulation on high performance computers. Finally, we suggest how to use this convergence of technologies to solve new problems.
Michael M. Resch, Bastian Koller
Trends and Emerging Technologies in AI
Abstract
The growth of artificial intelligence (AI) is accelerating. AI has left the research and innovation labs and nowadays plays a significant role in our everyday lives. Its impact on society is tangible: autonomous cars produced by Tesla, voice assistants such as Siri, and AI systems that beat renowned champions in board games like Go. All these advancements are facilitated by powerful computing infrastructures based on HPC and advanced AI-specific hardware, as well as highly optimized AI codes. In this paper, we give an overview of current and future trends in AI, as well as of the emerging technologies that drive AI innovation. We devote a significant part of the paper to the ethical questions that arise whenever citizens interact with AI systems; these translate directly into key topics such as the transparency, trustworthiness, and explainability of AI systems. The paper therefore discusses several approaches from the new research field of explainable AI (XAI). Finally, we briefly present AI-specific hardware that may find its way into HPC.
Dennis Hoppe
Synergetic Build-up of National Competence Centres All over Europe
Abstract
This chapter presents the rationale behind, and the implementation strategy for, the setup of National Competence Centres for HPC and associated technologies all over Europe. Furthermore, it presents how such a national activity can benefit from coordination and support activities at the European level, and how all of this covers the actions needed in Europe to boost the uptake and impact of HPC.
Bastian Koller, Natalie Lewandowski
Metadata
Title
Sustained Simulation Performance 2019 and 2020
Editors
Prof. Dr. Michael M. Resch
Manuela Wossough
Dr. Wolfgang Bez
Dr. Erich Focht
Prof. Hiroaki Kobayashi
Copyright Year
2021
Electronic ISBN
978-3-030-68049-7
Print ISBN
978-3-030-68048-0
DOI
https://doi.org/10.1007/978-3-030-68049-7
