
2017 | Book

High-Performance Scientific Computing

First JARA-HPC Symposium, JHPCS 2016, Aachen, Germany, October 4–5, 2016, Revised Selected Papers

Editors: Edoardo Di Napoli, Marc-André Hermanns, Hristo Iliev, Andreas Lintermann, Alexander Peyser

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the thoroughly refereed post-conference proceedings of the First JARA High-Performance Computing Symposium, JARA-HPC 2016, held in Aachen, Germany, in October 2016.
The 21 full papers presented were carefully reviewed and selected from 26 submissions. They cover many diverse topics, such as coupling methods and strategies in Computational Fluid Dynamics (CFD), performance portability and applications in HPC, as well as provenance tracking for large-scale simulations.

Table of Contents

Frontmatter

Efficient HPC-Optimized Multi-Physics Coupling Strategies in CFD

Frontmatter
Partitioned High Performance Code Coupling Applied to CFD
Abstract
Based on in situ observations obtained in the context of multiphysics and multicomponent simulations in the Computational Fluid Dynamics community, the parallel performance of code coupling is first discussed. Overheads due to the coupling steps are then analyzed with a simple toy model. Many parameters can impact the communication times, such as the number of cores, the communication mode (synchronous or asynchronous), the global size of the exchanged fields, or the amount of data per core. Results show that the respective partitioning of the coupled codes, as well as the core distribution on the machine, plays an important role in exchange times and thus in the total CPU hours needed by an application. For the synchronous communications presented in this paper, two main improvements independent of the coupler can be achieved by incorporating knowledge of the coupling into the preprocessing step of the solvers: constrained co-partitioning and process placement. These conclusions extend directly to other fields of application, such as climate science, where the coupling between ocean and atmosphere is of primary importance.
Florent Duchaine, Sandrine Berger, Gabriel Staffelbach, Laurent Gicquel
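The abstract's toy model of synchronous exchange times can be illustrated with a minimal sketch. All constants (latency, on-node and off-node bandwidth) and the `off_node_fraction` parameter are illustrative assumptions, not values from the paper:

```python
def exchange_time(field_size, cores, off_node_fraction,
                  latency=1e-6, bw_on_node=50e9, bw_off_node=10e9):
    """Time for one synchronous coupling exchange (seconds).

    Each core sends its share of the field; data crossing a node
    boundary sees a lower effective bandwidth. With synchronous
    communication the step completes when the slowest core finishes.
    All constants are illustrative assumptions.
    """
    per_core_bytes = field_size / cores
    on_node = per_core_bytes * (1.0 - off_node_fraction) / bw_on_node
    off_node = per_core_bytes * off_node_fraction / bw_off_node
    return latency + on_node + off_node

# Co-partitioning and careful process placement reduce the fraction
# of coupling data that must cross node boundaries:
t_naive = exchange_time(8e8, 1024, off_node_fraction=0.9)
t_coloc = exchange_time(8e8, 1024, off_node_fraction=0.1)
```

With synchronous exchanges, the latency floor remains even as the per-core payload shrinks, so adding cores alone does not eliminate the coupling overhead; lowering the off-node fraction through co-partitioning and placement attacks the bandwidth term directly.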
Dynamic Load Balancing for Large-Scale Multiphysics Simulations
Abstract
In parallel computing, load balancing is an essential component of any efficient and scalable simulation code. Static data decomposition methods have proven to work well for symmetric workloads, but in today's multiphysics simulations with asymmetric workloads, the resulting imbalance prevents good scalability on future generations of parallel architectures. We present our work on developing a general dynamic load balancing framework for multiphysics simulations on hierarchical Cartesian meshes. Using a weighted dual-graph-based workload estimation and constrained multilevel graph partitioning, the runtime of industrial applications running on the K computer could be reduced by 40%.
Niclas Jansson, Rahul Bale, Keiji Onishi, Makoto Tsubokura
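The paper uses constrained multilevel graph partitioning on a weighted dual graph; as a hedged stand-in, the core idea of weighting cells by their cost can be sketched with a simple longest-processing-time heuristic (function name, cell names, and weights are invented for illustration):

```python
import heapq

def partition_weighted(cells, nparts):
    """Greedily assign weighted cells to partitions, heaviest first,
    always to the currently lightest partition (longest-processing-time
    heuristic). A stand-in for the constrained multilevel graph
    partitioning used in the paper, which also minimizes the edge cut
    of the dual graph -- something this toy ignores."""
    parts = [(0.0, i, []) for i in range(nparts)]
    heapq.heapify(parts)
    for weight, cell in sorted(cells, reverse=True):
        load, i, members = heapq.heappop(parts)
        members.append(cell)
        heapq.heappush(parts, (load + weight, i, members))
    return parts

# Asymmetric multiphysics workload: many cheap fluid cells, a few
# expensive structure cells (weights are invented for illustration).
cells = [(1.0, f"fluid{i}") for i in range(64)] + \
        [(8.0, f"solid{i}") for i in range(8)]
parts = partition_weighted(cells, 4)
loads = sorted(load for load, _, _ in parts)
```

A static, unweighted decomposition would split cells evenly by count and leave the parts holding solid cells badly overloaded; weighting by estimated cost bounds the imbalance by the largest single cell weight.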
On the Significance of Exposure Time in Computational Blood Damage Estimation
Abstract
The reliability of common stress-based power law models for hemolysis estimation in blood pumps is still not satisfactory. Stress-based models are based on an instantaneous shear stress measure and therefore implicitly assume that red blood cells deform immediately under the action of forces. In contrast, a strain-based model considers the entire deformation history of the cells. By applying a viscoelastic tensor equation for the stress computation, the effect of exposure time is represented as a biophysical phenomenon. Comparisons of stress-based and strain-based hemolysis models in a centrifugal blood pump show very significant differences: stress peaks with short exposure times contribute to the overall hemolysis in the stress-based model, whereas regions with increased shear and long exposure times are responsible for the damage predicted by the strain-based model.
Lutz Pauli, Marek Behr
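Stress-based power-law models of the kind discussed here are commonly written as HI = C·t^α·τ^β. The sketch below uses widely cited Giersiepen-type constants purely for illustration, and does not reproduce the paper's strain-based viscoelastic model; both operating points are invented:

```python
def hemolysis_power_law(tau, t, C=3.62e-5, alpha=0.785, beta=2.416):
    """Stress-based power-law hemolysis index (% released hemoglobin):
    HI = C * t**alpha * tau**beta, with scalar shear stress tau in Pa
    and exposure time t in s. The constants are the widely cited
    Giersiepen-type values, used here purely for illustration."""
    return C * t**alpha * tau**beta

# A short, sharp stress peak versus moderate shear at long exposure:
hi_peak = hemolysis_power_law(tau=500.0, t=1e-3)  # blade-tip-like peak
hi_long = hemolysis_power_law(tau=50.0, t=0.5)    # recirculation zone
```

Because β > α, the instantaneous-stress model lets brief peaks dominate the damage estimate; a strain-based model instead tracks how far cells actually deform over their exposure history, which is the distinction the paper quantifies.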
A Partitioned Methodology for Conjugate Heat Transfer on Dynamic Structures
Abstract
A partitioned coupling approach for conjugate heat transfer applications is presented. The coupling scheme is based on the extension to thermal coupling of a parallel algebraic domain composition method already validated on fluid-structure interaction problems. The method alters the original Dirichlet-Neumann approach by enforcing the boundary conditions over the subdomains through matrix operations. The algorithm is tested on two conjugate heat transfer benchmark cases: flow over a heated cylinder and flow over a flat plate. The results indicate good agreement with previous research and encourage its application to large-scale problems.
Miguel Zavala-Aké, Daniel Mira, Mariano Vázquez, Guillaume Houzeaux
Farfield Noise Prediction Using Large-Scale Lattice-Boltzmann Simulations
Abstract
In order to predict farfield noise created by the flow over complex geometries, high-fidelity flow simulations based on the Lattice-Boltzmann solver PowerFLOW are used in conjunction with the acoustic analogy solver PowerACOUSTICS. Since the flow needs to be well resolved both spatially and temporally, the simulations are usually carried out on a large number of computational cores to achieve adequate turnaround times. This paper provides the background on the two-step methodology and gives an overview of aero-acoustics computations in aerospace, ranging from an isolated airframe component to the entire aircraft system.
Benjamin Duda, Ehab Fares
FEniCS-HPC: Coupled Multiphysics in Computational Fluid Dynamics
Abstract
We present a framework for coupled multiphysics in computational fluid dynamics, targeting massively parallel systems. Our strategy is based on general problem formulations in the form of partial differential equations and the finite element method, which enable automation and optimization of a set of fundamental algorithms. We describe these algorithms, including finite element matrix assembly, adaptive mesh refinement, and mesh smoothing, as well as multiphysics coupling methodologies such as unified continuum fluid-structure interaction (FSI) and aeroacoustics by coupled acoustic analogies. The framework is implemented as FEniCS open source software components, optimized for massively parallel computing. Examples of applications are presented, including simulation of the aeroacoustic noise generated by an airplane landing gear, simulation of the blood flow in the human heart, and simulation of the human voice organ.
Johan Hoffman, Johan Jansson, Niyazi Cem Degirmenci, Jeannette Hiromi Spühler, Rodrigo Vilela De Abreu, Niclas Jansson, Aurélien Larcher
The Direct-Hybrid Method for Computational Aeroacoustics on HPC Systems
Abstract
Classic hybrid methods for computational aeroacoustics use different solvers and methods to predict the flow field and the acoustic pressure field in two separate steps, which involves data exchange via disk I/O between the solvers. This limits the efficiency of the approach, as parallel I/O usually does not scale well to large numbers of cores. In this work, a highly scalable direct-hybrid scheme is presented, in which the flow and acoustics simulations run simultaneously. That is, all data between the two solvers is transferred in memory, avoiding the restrictions of the I/O subsystem. Results for the simulation of a pair of co-rotating vortices show that the method is able to correctly predict the acoustic pressure field and that it is suitable for highly parallel simulations.
Michael Schlottke-Lakemper, Hans Yu, Sven Berger, Andreas Lintermann, Matthias Meinke, Wolfgang Schröder
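The direct-hybrid coupling pattern can be sketched in miniature. Both update rules below are invented placeholders, not the paper's CFD and acoustic solvers; only the structure matters: the flow step hands its source terms to the acoustics step in memory within one shared time loop, with no disk I/O between solvers.

```python
def flow_step(state):
    state["amplitude"] *= 0.99                 # stand-in flow update
    return {"sources": state["amplitude"]}     # acoustic source terms

def acoustics_step(pressure, sources):
    return 0.9 * pressure + sources            # stand-in acoustic update

state, pressure = {"amplitude": 1.0}, 0.0
for _ in range(100):
    exchanged = flow_step(state)               # hand-off stays in memory
    pressure = acoustics_step(pressure, exchanged["sources"])
```

In the classic two-step approach, the loop over `flow_step` would finish first and write every `sources` field to disk before a separate acoustics run could start; running both steps per iteration removes that I/O bottleneck.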
A Novel Approach for Efficient Storage and Retrieval of Tabulated Chemistry in Reactive Flow Simulations
Abstract
Turbulent combustion is a typical example of a multi-scale problem, coupling different ranges of time and length scales of the flow field and the chemical reactions. Due to scale separation and the availability of suitable coupling procedures, tabulated chemistry approaches have emerged as an effective method for describing turbulence-chemistry interaction (TCI). However, different flame configurations, complex fuels, and multiphase flows, among other things, increase both the number of tabulated variables and the dimension of the database, and thus its overall size. With larger database sizes, the requirements for computing time and memory management have become a crucial issue for CFD applications. In the present study, the novel flatkernel approach for efficient memory management at reduced computational cost is developed. This new software-library-based approach uses polynomial fitting to represent the database. The resulting functions are generated as source code and compiled into a shared library, taking advantage of automatic compiler optimization. Since the shared library is also memory-managed by the operating system, the flatkernel approach leads to reduced memory and computing time requirements in the coupled CFD application. The approach is applied to scale-resolving Large Eddy Simulations (LES) coupled with the flamelet-progress-variable approach (LES-FPV) for combustion modeling of a reactive jet-in-crossflow configuration. The evaluation of the simulations focuses on a comparison between the novel method and an existing approach with respect to memory and computing time requirements.
Sebastian Popp, Steffen Weise, Christian Hasse
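The code-generation step described above can be sketched as follows: fitted polynomial coefficients are emitted as compilable source in Horner form, so the tabulated property becomes a function call instead of an in-memory lookup table. The function name `omega_c`, the coefficient values, and the single-variable form are illustrative assumptions; the fit itself would be done offline.

```python
def generate_source(name, coeffs):
    """Emit C source for a fitted polynomial in Horner form.
    `coeffs` lists coefficients from the highest power down."""
    horner = str(coeffs[0])
    for c in coeffs[1:]:
        horner = f"({horner} * x + {c})"
    return f"double {name}(double x) {{ return {horner}; }}"

def horner_eval(coeffs, x):
    """Reference evaluation matching the generated code."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y

# Hypothetical fitted coefficients for, e.g., a progress-variable
# source term as a function of one table coordinate.
coeffs = [2.0, -3.0, 0.5, 1.0]
src = generate_source("omega_c", coeffs)
```

Compiling the generated source into a shared library replaces per-lookup interpolation with a handful of fused multiply-adds, and hands the memory management of the (much smaller) data to the operating system's loader.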
Multi-scale Coupling for Predictive Injector Simulations
Abstract
Predictive simulations of full fuel injection systems, e.g. for diesel engines, could be very important for reducing the emissions of current engines but are still rare. Besides the numerical issues arising from discontinuities across the liquid-gas interface, the different scales relevant for the nozzle-internal flow, the primary breakup in the vicinity of the nozzle, and the secondary breakup and evaporation further downstream make efficient simulation of the full injection system challenging. This paper introduces a multi-scale coupling approach to overcome this issue, leading to efficient and predictive injector simulations. After a brief description of the numerical methods used in this study, the coupling among nozzle-internal flow, primary breakup, and secondary breakup with evaporation is introduced and analyzed with respect to computational efficiency and physical accuracy. Finally, the simulation framework is applied to the “Spray A” case of the Engine Combustion Network.
Mathis Bode, Marco Davidovic, Heinz Pitsch

Domain-Specific Applications and High-Performance Computing

Frontmatter
Ab Initio Description of Optoelectronic Properties at Defective Interfaces in Solar Cells
Abstract
In order to optimize the optoelectronic properties of novel solar cell architectures, such as the amorphous-crystalline interface in silicon heterojunction devices, we calculate and analyze the local microscopic structure at this interface and in bulk a-Si:H, in particular with respect to the impact of material inhomogeneities. The microscopic information is used to extract macroscopic material properties, and to identify localized defect states, which govern the recombination properties encoded in quantities such as capture cross sections used in the Shockley-Read-Hall theory. To this end, atomic configurations for a-Si:H and a-Si:H/c-Si interfaces are generated using molecular dynamics. Density functional theory calculations are then applied to these configurations in order to obtain the electronic wave functions. These are analyzed and characterized with respect to their localization and their contribution to the (local) density of states. GW calculations are performed for the a-Si:H configuration in order to obtain a quasi-particle corrected absorption spectrum. The results suggest that the quasi-particle corrections can be approximated through a scissors shift of the Kohn-Sham energies.
Philippe Czaja, Massimo Celino, Simone Giusepponi, Michele Gusso, Urs Aeberhard
Scale Bridging Simulations of Large Elastic Deformations and Bainitic Transformations
Abstract
The multiscale process of bainitic microstructure formation is still insufficiently understood from a theoretical and simulation perspective. Production processes of press-hardened bainitic steels lead to large deformations, and as a particular aspect we investigate the role of large elastic strains, starting from ab initio methods, bridging them to phase field crystal continuum approaches, and connecting the results to macroscopic deformation laws. Our investigations show that the phase field crystal model covers large deformations in the nonlinear elastic regime very well. Concerning the microstructure evolution, we use a multi-phase-field model including carbon diffusion, carbide formation, and elastic effects. For all the covered aspects we use efficient numerical schemes, which are implemented on GPUs using CUDA.
Marc Weikamp, Claas Hüter, Mingxuan Lin, Ulrich Prahl, Diego Schicchi, Martin Hunkel, Robert Spatschek
Ab Initio Modelling of Electrode Material Properties
Abstract
We discuss elastic and thermodynamic aspects of LiCoO\(_2\) in the context of fracture propagation and hot-spot formation. Approaching the problem via ab initio modelling, we can access the delithiated states, which is difficult experimentally. Application of density functional theory in the quasi-harmonic approximation provides good agreement with the experimentally available data for isobaric heat capacities, suggesting that it can complement the thermodynamic databases required for the modelling of heat flows. The results for the mechanical characteristics suggest a brittle-to-ductile transition with varying lithium content, and crack orientations perpendicular to the basal plane, as indicated experimentally by the obtained elastic tensors.
Siaufung O. Dang, Marco Prill, Claas Hüter, Martin Finsterbusch, Robert Spatschek
Overlapping of Communication and Computation in nb3dfft for 3D Fast Fourier Transformations
Abstract
For efficiency and accuracy of Direct Numerical Simulations (DNS) of turbulent flows, pseudo-spectral methods can be employed, where the governing equations are solved partly in Fourier space. The in-house-developed 3D-FFT library nb3dfft is optimized for the special needs of pseudo-spectral DNS, particularly for the scientific code psOpen used by the Institute for Combustion Technology at RWTH Aachen University. In this paper we discuss the method of overlapping communication and computation of multiple FFTs at the same time.
Jens Henrik Göbbert, Hristo Iliev, Cedrick Ansorge, Heinz Pitsch
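The overlap pattern can be sketched in miniature: while one field's data redistribution is "in flight", the transform of the previous field is computed. The sketch uses a thread pool and a naive DFT as stand-ins for nonblocking MPI and the real FFT, so it shows the schedule, not nb3dfft's implementation:

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def dft(x):
    """Naive DFT as a stand-in for one pencil's 1-D FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def redistribute(field):
    """Stand-in for the all-to-all that re-pencils the data; it is a
    numerical no-op here, but in the real library it is the step that
    nonblocking communication hides behind computation."""
    return list(field)

def pipelined_ffts(fields):
    """Transform a batch of fields, overlapping the 'communication'
    of field i+1 with the computation on field i."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(redistribute, fields[0])
        for nxt in list(fields[1:]) + [None]:
            ready = pending.result()
            if nxt is not None:
                pending = pool.submit(redistribute, nxt)  # in flight...
            results.append(dft(ready))  # ...while we compute
    return results

batch = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
out = pipelined_ffts(batch)
```

The pipeline pays the communication latency of only the first field; every subsequent redistribution proceeds concurrently with a transform, which is what makes transforming multiple FFTs at the same time worthwhile.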
Towards Simulating Data-Driven Brain Models at the Point Neuron Level on Petascale Computers
Abstract
We present a solution to two important problems that arise in the simulation of large data-driven neural networks: (a) efficient loading of network descriptions and (b) efficient instantiation of the network by executing the model specification. To address the first problem, we present a general data format, PointBrainH5, to store the network information, along with the parallel-distributed RTC algorithm to efficiently load and instantiate a network model. We test the data format and algorithm on a data-driven simulation of the size of a full mouse brain on 4 racks of an IBM Blue Gene/Q. The model comprised 75 million neurons with 664 billion synapses and occupied 15 TB on disk. Loading and instantiation of the network on 4 racks of the Blue Gene/Q took 30 min. We observe good scaling for up to 16,384 nodes.
Till Schumann, Csaba Erő, Marc-Oliver Gewaltig, Fabien Jonathan Delalondre
Parallel Adaptive Integration in High-Performance Functional Renormalization Group Computations
Abstract
The conceptual framework provided by the functional Renormalization Group (fRG) has become a formidable tool to study correlated electron systems on lattices, which, in turn, has provided great insights into our understanding of complex many-body phenomena such as high-temperature superconductivity or topological states of matter. In this work we present one of the latest realizations of fRG, which makes use of an adaptive numerical quadrature scheme specifically tailored to the fRG scheme described. The final result is an increase in performance thanks to improved parallelism and scalability.
Julian Lichtenstein, Jan Winkelmann, David Sánchez de la Peña, Toni Vidović, Edoardo Di Napoli

Performance Portability

Frontmatter
Performance Optimization of Parallel Applications in Diverse On-Demand Development Teams
Abstract
Current supercomputing platforms and scientific application codes have grown rapidly in complexity over the past years. Multi-scale, multi-domain simulations on one hand and deep hierarchies in large-scale computing platforms on the other make it exceedingly harder to map the former onto the latter and fully exploit the available computational power. The complexity of the software and hardware components involved calls for in-depth expertise that can only be met by diversity in the application development teams. With its model of simulation labs and cross-sectional groups, JARA-HPC enables such diverse teams to form on demand to solve concrete development problems. This work showcases the effectiveness of this model with two application case studies involving the JARA-HPC cross-sectional group “Parallel Efficiency” and simulation labs and domain-specific development teams. For one application, we show the results of a completed optimization and the estimated financial impact of the combined efforts. For the other application, we present results from an ongoing engagement, where we show how an on-demand team investigates the behavior of dynamic load balancing schemes for an MD particle simulation, leading to a better overall understanding of the application and revealing targets for further investigation.
Hristo Iliev, Marc-André Hermanns, Jens Henrik Göbbert, René Halver, Christian Terboven, Bernd Mohr, Matthias S. Müller
Hybrid CPU-GPU Generation of the Hamiltonian and Overlap Matrices in FLAPW Methods
Abstract
In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and on the portability of performance and scalability. The target of our work is FLEUR, a software package for electronic structure calculations developed at Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to obtain efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures, specifically to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes that off-load parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups of up to 5\(\times \) with respect to our optimized shared-memory code, which in turn means between 7.5\(\times \) and 12.5\(\times \) speedup with respect to the original FLEUR code.
Diego Fabregat-Traver, Davor Davidović, Markus Höhnerbach, Edoardo Di Napoli
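Speedups of this kind can be put in perspective with a simple Amdahl-style off-load model; the fractions and kernel speedups below are illustrative assumptions, not measurements from the paper:

```python
def offload_speedup(offloaded_fraction, gpu_speedup):
    """Amdahl-style estimate of overall application speedup when a
    fraction of the runtime is off-loaded to kernels that run
    gpu_speedup times faster on the GPUs. An illustrative model."""
    return 1.0 / ((1.0 - offloaded_fraction)
                  + offloaded_fraction / gpu_speedup)

# Even if 90% of the runtime moves to GPU kernels that are 20x faster,
# the remaining CPU-side 10% caps the overall gain:
overall = offload_speedup(0.9, 20.0)
```

A 20x-faster GPU kernel yields only about a 7x overall speedup if 10% of the runtime stays on the CPU, which is why identifying further parts of the computation to off-load matters for the second part of such a study.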
Visualizing Performance Data with Respect to the Simulated Geometry
Abstract
Understanding the performance behaviour of high-performance computing (HPC) applications based on performance profiles is a challenging task. Phenomena in the performance behaviour can stem from the HPC system itself, from the application’s code, but also from the application domain. In order to analyse the latter phenomena, we propose a system that visualizes profile-based performance data in its spatial context in the application domain, i.e., on the geometry processed by the application. It thus helps HPC experts and simulation experts understand the performance data better. Furthermore, it reduces the initially large search space by automatically labelling those parts of the data that reveal variation in performance and thus require detailed analysis.
Tom Vierjahn, Torsten W. Kuhlen, Matthias S. Müller, Bernd Hentschel

Provenance Tracking

Frontmatter
Framework for Sharing of Highly Resolved Turbulence Simulation Data
Abstract
The growing computational capabilities of today's supercomputers have made highly resolved turbulence simulations possible. The large datasets and the tremendous amount of required compute resources create serious new challenges when attempting to share the data between different research groups. Even more difficult to resolve is the incompatibility of the data formats and numerical approaches used for turbulence simulations, whose details are often only known to the simulation code developer. In this paper, a framework for sharing data of large-scale simulations is presented, which simplifies access and further post-processing even beyond a single supercomputing center. It combines established services to provide an easy-to-manage-and-extend software setup without the need to standardize a database or data format. Among other advantages, it enables the use of direct file outputs from simulation runs, which are often archived anyway.
Bastian Tweddell, Jens Henrik Göbbert, Michael Gauding, Benjamin Weyers, Björn Hagemeier
UniProv: A Flexible Provenance Tracking System for UNICORE
Abstract
In this paper we present a flexible provenance management system called UniProv. UniProv is an ongoing development project providing provenance tracking in scientific workflows and data management particularly in the field of neuroscience, thus allowing users to validate and reproduce tasks and results of their experiments.
The primary goal is to equip the commonly used Grid middleware UNICORE [1] and its incorporated workflow engine with the provenance capturing mechanism of UniProv. We also explain an approach that uses predefined patterns to ensure compatibility with the W3C PROV [2] Data Model and to map the provenance information properly to a Neo4j graph database.
André Giesler, Myriam Czekala, Björn Hagemeier, Richard Grunzke
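A minimal sketch of what such a mapping can look like: one job execution is expressed in W3C PROV concepts (Entity, Activity, Agent) and emitted as Cypher-style CREATE statements for a Neo4j-like store. The relationship names follow the PROV vocabulary, but the class, ids, and property layout are illustrative, not UniProv's actual schema:

```python
class ProvGraph:
    def __init__(self):
        self.nodes = {}   # id -> (PROV label, properties)
        self.edges = []   # (source id, relation, target id)

    def add(self, label, ident, **props):
        self.nodes[ident] = (label, props)

    def relate(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def to_cypher(self):
        """Emit Cypher-style statements; a real importer would batch
        these into one query or use MATCH before relating nodes."""
        lines = [f"CREATE (n_{i}:{label} {{id: '{i}'}})"
                 for i, (label, _props) in self.nodes.items()]
        lines += [f"CREATE (n_{s})-[:{r}]->(n_{d})"
                  for s, r, d in self.edges]
        return "\n".join(lines)

# One hypothetical UNICORE job mapped to PROV:
g = ProvGraph()
g.add("Agent", "alice")
g.add("Activity", "job_42", tool="UNICORE")
g.add("Entity", "result_h5")
g.relate("result_h5", "wasGeneratedBy", "job_42")
g.relate("job_42", "wasAssociatedWith", "alice")
cypher = g.to_cypher()
```

Once in the graph store, reproducibility questions become path queries: following `wasGeneratedBy` and `used` edges backwards from a result reconstructs the chain of jobs and inputs that produced it.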
A Collaborative Simulation-Analysis Workflow for Computational Neuroscience Using HPC
Abstract
Workflows for the acquisition and analysis of data in the natural sciences exhibit a growing degree of complexity and heterogeneity, are increasingly performed in large collaborative efforts, and often require the use of high-performance computing (HPC). Here, we explore the reasons for these new challenges and demands and discuss their impact with a focus on the scientific domain of computational neuroscience. We argue for the need of software platforms integrating HPC systems that allow scientists to construct, comprehend and execute workflows composed of diverse data generation and processing steps using different tools. As a use case we present a concrete implementation of such a complex workflow, covering diverse topics such as HPC-based simulation using the NEST software, access to the SpiNNaker neuromorphic hardware platform, complex data analysis using the Elephant library, and interactive visualization methods for facilitating further analysis. Tools are embedded into a web-based software platform under development by the Human Brain Project, called the Collaboratory. On the basis of this implementation, we discuss the state of the art and future challenges in constructing large, collaborative workflows with access to HPC resources.
Johanna Senk, Alper Yegenoglu, Olivier Amblet, Yury Brukau, Andrew Davison, David Roland Lester, Anna Lührs, Pietro Quaglio, Vahid Rostami, Andrew Rowley, Bernd Schuller, Alan Barry Stokes, Sacha Jennifer van Albada, Daniel Zielasko, Markus Diesmann, Benjamin Weyers, Michael Denker, Sonja Grün
Backmatter
Metadata
Title
High-Performance Scientific Computing
Editors
Edoardo Di Napoli
Marc-André Hermanns
Hristo Iliev
Andreas Lintermann
Alexander Peyser
Copyright Year
2017
Electronic ISBN
978-3-319-53862-4
Print ISBN
978-3-319-53861-7
DOI
https://doi.org/10.1007/978-3-319-53862-4
