2020 | Book

High Performance Computing

ISC High Performance 2020 International Workshops, Frankfurt, Germany, June 21–25, 2020, Revised Selected Papers

About this book

This book constitutes the refereed post-conference proceedings of six workshops held at the 35th International ISC High Performance 2020 Conference, in Frankfurt, Germany, in June 2020:
First Workshop on Compiler-assisted Correctness Checking and Performance Optimization for HPC (C3PO); First International Workshop on the Application of Machine Learning Techniques to Computational Fluid Dynamics Simulations and Analysis (CFDML); HPC I/O in the Data Center Workshop (HPC-IODC); First Workshop “Machine Learning on HPC Systems” (MLHPCS); First International Workshop on Monitoring and Data Analytics (MODA); and 15th Workshop on Virtualization in High-Performance Cloud Computing (VHPC).

The 25 full papers included in this volume were carefully reviewed and selected. They cover all aspects of research, development, and application of large-scale, high-performance experimental and commercial systems. Topics include high-performance computing (HPC), computer architecture and hardware, programming models, system software, performance analysis and modeling, compiler analysis and optimization techniques, software sustainability, scientific applications, and deep learning.

Table of Contents

Frontmatter

First Workshop on Compiler-Assisted Correctness Checking and Performance Optimization for HPC (C3PO’20)

Frontmatter
Compiler-Assisted Type-Safe Checkpointing
Abstract
TyCart is a tool for type-safe checkpoint/restart that extends the memory-allocation sanitizer TypeART with type asserts. Type asserts let the developer specify type requirements on memory regions and, in our example implementation, are used to implement a type-safe interface for the existing checkpoint libraries FTI and VeloC. We evaluate our approach on a set of mini-apps and an application from astrophysics. The approach shows runtime and memory overheads below 5% in the smaller benchmarks; in the astrophysics application, the runtime overhead reaches 30% and the memory overhead 70%.
Jan-Patrick Lehr, Alexander Hück, Moritz Fischer, Christian Bischof
Static Analysis to Enhance Programmability and Performance in OmpSs-2
Abstract
Task-based parallel programming models based on compiler directives have proved their effectiveness at describing parallelism in High-Performance Computing (HPC) applications. Recent studies show that cutting-edge Real-Time applications, such as those for unmanned vehicles, can successfully exploit these models. In this scenario, OpenMP is a de facto standard for HPC and is being studied for Real-Time systems due to its time-predictability and delimited functional safety. However, changes to OpenMP take time to be standardized because the standard must carry a large community along. OmpSs, instead, is a task-based model for fast prototyping that has been a forerunner of OpenMP since its inception. OmpSs-2, its successor, aims at the same goal and defines several features that can be introduced in future versions of OpenMP. This work targets compiler-based optimizations to enhance the programmability and performance of OmpSs-2. Regarding the former, we present an algorithm to determine the data-sharing attributes of OmpSs-2 tasks. Regarding the latter, we introduce a new algorithm to automatically release OmpSs-2 task dependencies before a task has completed. This work evaluates both algorithms on a set of well-known benchmarks and discusses their applicability to the current and future specifications of OpenMP.
Adrian Munera, Sara Royuela, Roger Ferrer, Raul Peñacoba, Eduardo Quiñones
Automatic Detection of MPI Assertions
Abstract
The 2019 MPI standard draft specification includes the addition of defined communicator info hints. These hints are assertions that an application makes to an MPI implementation so that a more optimized implementation is possible. The 2019 draft specification defines four assertions: mpi_assert_no_any_tag, mpi_assert_no_any_source, mpi_assert_exact_length and mpi_assert_allow_overtaking. In this paper we explore the capability of a Clang/LLVM-based static analysis to check whether these assertions hold for a given program. With this tool, existing codebases can benefit from this new addition to the MPI standard without the need for costly human intervention.
Tim Jammer, Christian Iwainsky, Christian Bischof
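
For illustration, here is a minimal sketch (in Python with mpi4py; hypothetical usage, since the paper analyses C/C++ codes with Clang/LLVM) of how an application would declare such hints on a communicator. The hint keys are the ones named in the abstract; everything else is an assumption.

```python
# Minimal sketch (mpi4py; hypothetical usage, the paper targets C/C++ codes):
# declare the MPI assertion hints from the abstract on a communicator.
from mpi4py import MPI

comm = MPI.COMM_WORLD

info = MPI.Info.Create()
# Promise the implementation that this communicator never uses
# MPI_ANY_SOURCE or MPI_ANY_TAG, enabling internal optimizations.
info.Set("mpi_assert_no_any_source", "true")
info.Set("mpi_assert_no_any_tag", "true")

hinted = comm.Dup_with_info(info)  # communicator carrying the assertions
info.Free()

# All communication on `hinted` must honor the asserted restrictions;
# this is the property the Clang/LLVM analysis checks statically.
if hinted.Get_size() > 1:
    if hinted.Get_rank() == 0:
        hinted.send(b"payload", dest=1, tag=0)
    elif hinted.Get_rank() == 1:
        hinted.recv(source=0, tag=0)  # explicit source and tag, as asserted

hinted.Free()
```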
Automatic Code Motion to Extend MPI Nonblocking Overlap Window
Abstract
HPC applications rely on a distributed-memory parallel programming model to improve the overall execution time. This leads to spawning multiple processes that need to communicate with each other to make the code progress. These communications involve overheads caused by network latencies and synchronizations between processes. One possible approach to reduce these overheads is to overlap communications with computations. MPI allows this through its nonblocking communication mode: a nonblocking communication is composed of an initialization and a completion call. It is then possible to overlap the communication by inserting computations between these two calls. The use of nonblocking collective calls is, however, still marginal and adds a new layer of complexity. In this paper we propose an automatic static optimization that (i) transforms blocking MPI communications into their nonblocking counterparts and (ii) performs extensive code motion to increase the size of the overlapping interval between initialization and completion calls. Our method is implemented in LLVM as a compilation pass and shows promising results on two mini-applications.
Van Man Nguyen, Emmanuelle Saillard, Julien Jaeger, Denis Barthou, Patrick Carribault
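
The transformation can be pictured with a small hedged example (mpi4py here, not the authors' LLVM pass): the blocking call is split into initiation and completion, and independent computation is moved into the window between them.

```python
# Hedged illustration (mpi4py, not the authors' LLVM pass) of the code
# motion: the blocking send becomes Isend/Wait, and independent work is
# moved into the overlap window between initiation and completion.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

halo = np.ones(1024, dtype=np.float64)          # data to exchange
interior = np.ones(1 << 20, dtype=np.float64)   # independent local data

if comm.Get_size() > 1:
    if rank == 0:
        # Before: comm.Send(halo, dest=1, tag=7); interior *= 2.0
        req = comm.Isend(halo, dest=1, tag=7)   # initiation call
        interior *= 2.0                         # overlapped computation
        req.Wait()                              # completion call, moved late
    elif rank == 1:
        buf = np.empty(1024, dtype=np.float64)
        comm.Recv(buf, source=0, tag=7)
```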

First International Workshop on the Application of Machine Learning Techniques to Computational Fluid Dynamics Simulations and Analysis (CFDML)

Frontmatter
Complete Deep Computer-Vision Methodology for Investigating Hydrodynamic Instabilities
Abstract
In fluid dynamics, one of the most important research fields is hydrodynamic instabilities and their evolution in different flow regimes. The investigation of these instabilities is concerned with highly non-linear dynamics. Currently, three main methods are used to understand such phenomena – namely analytical and statistical models, experiments, and simulations – and all of them are primarily investigated and correlated using human expertise. This work demonstrates how a major portion of this research effort could and should be analysed using recent breakthrough advancements in the field of Computer Vision with Deep Learning (CVDL, or Deep Computer-Vision). Specifically, this work targets and evaluates specific state-of-the-art techniques – such as Image Retrieval, Template Matching, Parameters Regression and Spatiotemporal Prediction – for the quantitative and qualitative benefits they provide. To do so, this research focuses mainly on one of the most representative instabilities, the Rayleigh-Taylor instability (RTI). We include an annotated database of images returned from simulations of RTI (RayleAI). Finally, adjusted experimental results and novel physical loss methodologies were used to validate the correspondence of the predicted results to actual physical reality and to evaluate model efficiency. The techniques developed and proved in this work can serve as essential tools for physicists in the field of hydrodynamics for investigating a variety of physical systems. Some of them can be easily applied to already existing simulation results, while others could be used via Transfer Learning for research on other instabilities. All models, as well as the dataset created for this work, are publicly available at: https://github.com/scientific-computing-nrcn/SimulAI.
Re’em Harel, Matan Rusanovsky, Yehonatan Fridman, Assaf Shimony, Gal Oren

Open Access

Prediction of Acoustic Fields Using a Lattice-Boltzmann Method and Deep Learning
Abstract
Using traditional computational fluid dynamics and aeroacoustics methods, the accurate simulation of aeroacoustic sources requires high compute resources to resolve all necessary physical phenomena. In contrast, once trained, artificial neural networks such as deep encoder-decoder convolutional networks can predict aeroacoustics at lower cost and, depending on the quality of the employed network, also at high accuracy. An architecture for such a neural network is developed to predict the sound pressure level in a 2D square domain. It is trained on numerical results from up to 20,000 GPU-based lattice-Boltzmann simulations that include randomly distributed rectangular and circular objects and monopole sources. The types of boundary conditions, the monopole locations, and the cell distances for objects and monopoles serve as input to the network. Parameters are studied to tune the predictions and to increase their accuracy. The complexity of the setup is successively increased along three cases, and the impact of the number of feature maps, the type of loss function, and the amount of training data on the prediction accuracy is investigated. An optimal choice of the parameters leads to network-predicted results that are in good agreement with the simulated findings. This is corroborated by negligible differences in the sound pressure level between the simulated and the network-predicted results along characteristic lines and by small mean errors.
Mario Rüttgers, Seong-Ryong Koh, Jenia Jitsev, Wolfgang Schröder, Andreas Lintermann
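
To make the network side concrete, here is a minimal encoder-decoder sketch (PyTorch; layer counts and sizes are assumptions, not the paper's exact architecture) mapping input channels such as boundary-condition type and source location to a predicted sound-pressure-level field.

```python
# Minimal encoder-decoder sketch (PyTorch; sizes and depth are assumptions,
# not the paper's exact architecture). Input channels could encode the
# boundary-condition type, monopole locations, and cell distances.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):  # x: (batch, in_ch, H, W)
        return self.dec(self.enc(x))

model = EncoderDecoder()
spl = model(torch.randn(1, 3, 128, 128))  # predicted SPL field: (1, 1, 128, 128)
```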
Unsupervised Learning of Particle Image Velocimetry
Abstract
Particle Image Velocimetry (PIV) is a classical flow estimation problem which is widely considered and utilised, especially as a diagnostic tool in experimental fluid dynamics and the remote sensing of environmental flows. Recently, the development of deep learning based methods has inspired new approaches to tackle the PIV problem. These supervised learning based methods are driven by large volumes of data with ground truth training information. However, it is difficult to collect reliable ground truth data in large-scale, real-world scenarios. Although synthetic datasets can be used as alternatives, the gap between the training set-ups and real-world scenarios limits applicability. We present here what we believe to be the first work which takes an unsupervised learning based approach to tackle PIV problems. The proposed approach is inspired by classic optical flow methods. Instead of using ground truth data, we make use of the photometric loss between two consecutive image frames, a consistency loss in bidirectional flow estimates, and a spatial smoothness loss to construct the total unsupervised loss function. The approach shows significant potential and advantages for fluid flow estimation. Results presented here demonstrate that our method outputs competitive results compared with classical PIV methods as well as supervised learning based methods on a broad PIV dataset, and even outperforms these existing approaches in some difficult flow cases. Code and trained models are available at https://github.com/erizmr/UnLiteFlowNet-PIV.
Mingrui Zhang, Matthew D. Piggott
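
The unsupervised objective can be sketched as follows (NumPy/SciPy; a simplified single-direction version that omits the bidirectional consistency term named in the abstract).

```python
# Simplified single-direction sketch (NumPy/SciPy) of the unsupervised PIV
# objective: photometric loss after warping plus flow smoothness; the
# bidirectional consistency term from the abstract is omitted for brevity.
import numpy as np
from scipy.ndimage import map_coordinates

def unsupervised_piv_loss(img1, img2, flow, alpha=0.1):
    """img1, img2: (H, W) frames; flow: (2, H, W) displacements (u, v)."""
    h, w = img1.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Warp frame 2 back onto frame 1 with the estimated flow.
    coords = np.stack([yy + flow[1], xx + flow[0]])
    warped = map_coordinates(img2, coords, order=1, mode="nearest")
    photometric = np.abs(warped - img1).mean()
    # First-order smoothness on both flow components, both directions.
    smooth = sum(np.abs(np.diff(c, axis=a)).mean()
                 for c in flow for a in (0, 1))
    return photometric + alpha * smooth
```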
Reduced Order Modeling of Dynamical Systems Using Artificial Neural Networks Applied to Water Circulation
Abstract
General circulation models are essential tools in weather and hydrodynamic simulation. They solve discretized, complex physical equations in order to compute evolutionary states of dynamical systems, such as the hydrodynamics of a lake. However, high-resolution numerical solutions using such models are extremely computationally expensive and time-consuming, often requiring a high performance computing architecture to be executed satisfactorily. Machine learning (ML)-based low-dimensional surrogate models are a promising alternative to speed up these simulations without undermining the quality of predictions. In this work, we develop two examples of fast, reliable, low-dimensional surrogate models to produce a 36 h forecast of the depth-averaged hydrodynamics at Lake George, NY, USA. Our ML approach uses two widespread artificial neural network (ANN) architectures: fully connected neural networks and long short-term memory networks. These ANN architectures are first validated in the deterministic and chaotic regimes of the Lorenz system and then combined with proper orthogonal decomposition (to reduce the dimensionality of the incoming input data) to emulate the depth-averaged hydrodynamics of a flow simulator called SUNTANS. Results show the ANN-based reduced order models have promising accuracy levels (within 6% of the prediction range) and advocate for further investigation into hydrodynamic applications.
Alberto Costa Nogueira Jr., João Lucas de Sousa Almeida, Guillaume Auger, Campbell D. Watson
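
A minimal sketch of the proper orthogonal decomposition (POD) step follows (NumPy; an assumed workflow, not the authors' code): an SVD basis compresses the snapshot matrix so a small ANN can evolve a handful of coefficients instead of the full state.

```python
# Assumed POD workflow (NumPy), not the authors' code: build an SVD basis
# from snapshots so an ANN can evolve a few coefficients instead of the
# full depth-averaged state.
import numpy as np

snapshots = np.random.rand(10_000, 500)        # (state dim, time steps)
mean = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)

energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.99)) + 1     # modes for 99% energy
basis = U[:, :r]                               # POD modes

coeffs = basis.T @ (snapshots - mean)          # reduced coordinates (r, steps)
# A fully connected network or LSTM is then trained to map
# coeffs[:, t] -> coeffs[:, t+1]; the full field is recovered by:
approx = mean + basis @ coeffs
```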
Parameter Identification of RANS Turbulence Model Using Physics-Embedded Neural Network
Abstract
Identifying the appropriate parameters of a turbulence model for a class of flows usually requires extensive experimentation and numerical simulation. Therefore, even a modest improvement of the turbulence model can significantly reduce the overall cost of a three-dimensional, time-dependent simulation. In this paper we demonstrate a novel method to find the optimal parameters of the Reynolds-averaged Navier–Stokes (RANS) turbulence model using high-fidelity direct numerical simulation (DNS) data. A physics-informed neural network (PINN) embedded with the turbulent transport equations is studied; physical loss functions are proposed to explicitly impose information from the transport equations on the neural network. This approach solves an inverse problem by treating the five parameters of the turbulence model as random variables, with the turbulent kinetic energy and dissipation rate as known quantities from the DNS simulation. The objective is to optimize the five parameters of the turbulence closures using the PINN, leveraging the limited data available from costly high-fidelity DNS. We validated this method on two test cases of flow over a bump. The recommended values were found to be Cε1 = 1.302, Cε2 = 1.862, Cμ = 0.09, σK = 0.75, σε = 0.273; the mean absolute error of the velocity profile between RANS and DNS decreased by 22% when using these neural-network-inferred parameters.
Shirui Luo, Madhu Vellakal, Seid Koric, Volodymyr Kindratenko, Jiahuan Cui
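
Schematically, the inverse problem looks like the sketch below (PyTorch; the residual is a placeholder, not the actual k-epsilon transport equations, and the initial values are the standard closure coefficients).

```python
# Schematic sketch (PyTorch): the five closure coefficients are trainable
# parameters; the residual below is a PLACEHOLDER, not the actual k-epsilon
# transport equations. Initial values are the standard closure coefficients.
import torch

params = {name: torch.nn.Parameter(torch.tensor(init)) for name, init in
          [("C_eps1", 1.44), ("C_eps2", 1.92), ("C_mu", 0.09),
           ("sigma_k", 1.0), ("sigma_eps", 1.3)]}

def physics_residual(k, eps, p):
    # Placeholder: a real PINN evaluates the transport-equation residuals
    # with autograd derivatives of the velocity, k, and epsilon fields.
    nu_t = p["C_mu"] * k**2 / eps                      # eddy viscosity
    diff = nu_t / p["sigma_k"] - nu_t / p["sigma_eps"]
    return ((p["C_eps1"] - p["C_eps2"]) * eps / k + diff).pow(2).mean()

k = torch.rand(1000) + 0.1    # stand-ins for DNS kinetic energy ...
eps = torch.rand(1000) + 0.1  # ... and dissipation-rate samples

opt = torch.optim.Adam(params.values(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = physics_residual(k, eps, params)
    loss.backward()
    opt.step()
```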

HPC I/O in the Data Center Workshop (HPC-IODC)

Frontmatter
Investigating the Overhead of the REST Protocol When Using Cloud Services for HPC Storage
Abstract
With the significant advances in Cloud Computing, exploring the usage of Cloud technology in HPC workflows is inevitable. While many Cloud vendors offer to move complete HPC workloads into the Cloud, this is limited by the massive demand for computing power alongside the storage resources typically required by I/O-intensive HPC applications. It is widely believed that HPC hardware and software protocols like MPI yield superior performance and lower resource consumption compared to the HTTP transfer protocol used by RESTful Web Services that are prominent in Cloud execution and Cloud storage. With the advent of enhanced versions of HTTP, it is time to re-evaluate the effective usage of cloud-based storage in HPC and its ability to cope with various types of data-intensive workloads. In this paper, we investigate the overhead of the REST protocol via HTTP compared to the HPC-native communication protocol MPI when storing and retrieving objects. Although we compare MPI in a communication use case, we can still evaluate the impact of data communication and, therewith, the efficiency of data transfer for data access patterns. We accomplish this by modeling the impact of data transfer using measurable performance metrics. Hence, our contribution is the creation of a performance model based on hardware counters that provides an analytical representation of data transfer over current and future protocols. We validate this model by comparing the results obtained for REST and MPI on two different cluster systems, one equipped with InfiniBand and one with Gigabit Ethernet. The evaluation shows that REST can be a viable, performant, and resource-efficient solution, in particular for accessing large files.
Frank Gadban, Julian Kunkel, Thomas Ludwig
Characterizing I/O Optimization Effect Through Holistic Log Data Analysis of Parallel File Systems and Interconnects
Abstract
Recent HPC systems utilize parallel file systems such as GPFS and Lustre to cope with the huge demand of data-intensive applications. Although most HPC systems provide performance tuning tools on compute nodes, there is little opportunity to tune I/O activities on parallel file systems, including the high-speed interconnects between compute nodes and file systems. We propose an I/O performance optimization framework that uses log data of parallel file systems and interconnects in a holistic way to improve the performance of HPC systems, including I/O nodes and parallel file systems. We demonstrate our framework on the K computer with two I/O benchmarks for the original and an enhanced MPI-IO implementation. Its I/O analysis reveals that the I/O performance improvements achieved by the enhanced MPI-IO implementation are due to more effective utilization of parallel file systems and the interconnects among I/O nodes compared with the original MPI-IO implementation.
Yuichi Tsujita, Yoshitaka Furutani, Hajime Hida, Keiji Yamamoto, Atsuya Uno
The Importance of Temporal Behavior When Classifying Job IO Patterns Using Machine Learning Techniques
Abstract
Every day, supercomputers execute thousands of jobs with different characteristics. Data centers monitor the behavior of jobs to support the users and improve the infrastructure, for instance, by optimizing jobs or by determining guidelines for the next procurement. The classification of jobs into groups that express similar run-time behavior aids this analysis, as it reduces the number of representative jobs to look into. The state of the practice is to investigate job similarity by looking at job profiles that summarize the dynamics of job execution into one-dimensional statistics, neglecting temporal behavior.
In this work, we utilize machine learning techniques to cluster and classify parallel jobs based on the similarity in their temporal IO behavior to highlight the importance of temporal behavior when comparing jobs. Our contribution is the qualitative and quantitative evaluation of different IO characterizations and similarity measurements that work toward the development of a suitable clustering algorithm.
We explore IO characteristics from monitoring data of one million parallel jobs and cluster them into groups of similar jobs. To this end, the time series of various IO statistics are converted into features using different similarity metrics that customize the classification. We discuss conventional ML techniques that are applied to job profiles and contrast this with the analysis of time series data, where we apply the Levenshtein distance as a distance metric. While the employed Levenshtein algorithms are not yet optimal, the results suggest that temporal behavior is key to identifying related patterns.
Eugen Betke, Julian Kunkel
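
The time-series comparison can be sketched as follows (plain Python/NumPy; the quantization scheme is an assumption): each job's IO metric series is encoded as a symbol string and compared with the Levenshtein edit distance.

```python
# Sketch (plain Python/NumPy; the quantization scheme is an assumption):
# encode each job's IO time series as a symbol string, then compare jobs
# with the Levenshtein edit distance.
import numpy as np

def quantize(series, bins=(0.25, 0.5, 0.75)):
    """Map a normalized IO metric series to a string like 'aabdc'."""
    return "".join("abcd"[level] for level in np.digitize(series, bins))

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

job_a = quantize(np.random.rand(48))  # e.g. 48 ten-minute IO segments
job_b = quantize(np.random.rand(48))
print(levenshtein(job_a, job_b))      # small distance = similar IO behavior
```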

1st Workshop “Machine Learning on HPC Systems” (MLHPCS)

Frontmatter
GOPHER, an HPC Framework for Large Scale Graph Exploration and Inference
Abstract
Biological ontologies, such as the Human Phenotype Ontology (HPO) and the Gene Ontology (GO), are extensively used in biomedical research to investigate the complex relationship that exists between the phenome and the genome. The interpretation of the encoded information requires methods that efficiently interoperate between multiple ontologies providing molecular details of disease-related features. To this aim, we present GenOtype PHenotype ExplOrer (GOPHER), a framework to infer associations between HPO and GO terms harnessing machine learning and the large-scale parallelism and scalability of High-Performance Computing. The method enables mapping genotypic features to phenotypic features, thus providing a valid tool for bridging functional and pathological annotations. GOPHER can improve the interpretation of molecular processes involved in pathological conditions, displaying a vast range of applications in biomedicine.
Marc Josep-Fabregó, Xavier Teruel, Victor Gimenez-Abalos, Davide Cirillo, Dario Garcia-Gasulla, Sergio Alvarez-Napagao, Marta García-Gasulla, Eduard Ayguadé, Alfonso Valencia
Ensembles of Networks Produced from Neural Architecture Search
Abstract
Neural architecture search (NAS) is a popular topic at the intersection of deep learning and high performance computing. NAS focuses on optimizing the architecture of neural networks along with their hyperparameters in order to produce networks with superior performance. Much of the focus has been on how to produce a single best network to solve a machine learning problem, but as NAS methods produce many networks that work very well, this affords the opportunity to ensemble these networks to produce an improved result. Additionally, the diversity of network structures produced by NAS drives a natural bias towards diversity of predictions produced by the individual networks. This results in an improved ensemble over simply creating an ensemble that contains duplicates of the best network architecture retrained to have unique weights.
Emily J. Herron, Steven R. Young, Thomas E. Potok
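
The ensembling step itself is simple; a hedged sketch (NumPy, assuming each NAS-produced network outputs per-class probabilities) is given below.

```python
# Hedged sketch (NumPy; assumes each NAS-produced network outputs class
# probabilities): the ensemble is a plain average over member predictions.
import numpy as np

# probs[i]: network i's predictions, shape (n_samples, n_classes)
probs = [np.random.dirichlet(np.ones(10), size=256) for _ in range(5)]

ensemble = np.mean(probs, axis=0)      # unweighted average of members
predictions = ensemble.argmax(axis=1)  # final ensemble labels
```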
SmartPred: Unsupervised Hard Disk Failure Detection
Abstract
Due to the rapidly increasing storage consumption worldwide, as well as the expectation of continuous availability of information, the complexity of administration in today's data centers is growing permanently. Integrated techniques for monitoring hard disks can increase the reliability of storage systems. However, these techniques often lack intelligent data analysis to perform predictive maintenance. To solve this problem, machine learning algorithms can be used to detect potential failures in advance and prevent them. In this paper, an unsupervised model for predicting hard disk failures based on Isolation Forest is proposed. Consequently, a method is presented that can deal with highly imbalanced datasets, as the experiment on the Backblaze benchmark dataset demonstrates.
Philipp Rombach, Janis Keuper
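
A minimal sketch of the unsupervised setup follows (scikit-learn; the feature columns are stand-ins for SMART attributes, not the paper's exact feature set).

```python
# Minimal sketch (scikit-learn; the feature columns are stand-ins for SMART
# attributes such as reallocated sectors or temperature): fit an Isolation
# Forest without labels and flag anomalous drives.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 6))   # telemetry of (mostly) healthy drives
X_new = rng.normal(size=(100, 6))      # drives to screen

clf = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
clf.fit(X_train)

labels = clf.predict(X_new)            # -1 = anomaly (potential failure)
scores = clf.decision_function(X_new)  # lower = more anomalous
print(f"{(labels == -1).sum()} drives flagged for inspection")
```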

1st International Workshop on Monitoring and Data Analytics (MODA20)

Frontmatter
Application IO Analysis with Lustre Monitoring Using LASSi for ARCHER
Abstract
Supercomputers today have to support a complex workload, with new Big Data and AI workloads adding to the more traditional HPC ones. It is important that we understand these workloads, which constitute a mix of applications from different domains with different IO requirements. In some cases these applications place significant stress on the filesystem and may impact other applications making use of the shared resource. Today, ARCHER, the UK National Supercomputing service, supports a diverse range of applications such as Climate Modelling, Bio-molecular Simulation, Material Science and Computational Fluid Dynamics. We describe LASSi, a framework developed by the ARCHER Centre of Excellence to analyse application slowdown and IO usage on the shared (Lustre) filesystem.
LASSi combines application job information from the scheduler with Lustre IO monitoring statistics to construct the IO profile of applications interacting with the filesystem. We show how the metric-based, application-centric approach taken by LASSi was used both to understand application contention and to reveal interesting aspects of IO on ARCHER. In this paper we concentrate on new analysis of years of data collected from the ARCHER system. We study the general IO usage and trends in different ARCHER projects. We highlight how different application groups interact with the filesystem by building a metric-based IO profile. This IO analysis of projects and applications enables project managers, HPC administrators, application developers and scientists not only to understand IO requirements but also to plan for the future. This information can further be used for re-engineering applications, resource allocation planning and filesystem sizing for future systems.
Karthee Sivalingam, Harvey Richardson
AI-Driven Holistic Approach to Energy Efficient HPC
Abstract
The rapid growth of the world-wide Information Technology (IT) infrastructure, fueled by the demands of the global Digital Economy, and the associated demand for electrical power create a significant impact on the environment. Over the past decade, power usage effectiveness (PUE) has been the major focus for improving the energy efficiency of Data Centres in particular. While PUE did result in significant energy efficiency improvements, it is not sufficient by itself. Huge energy efficiency gains are expected from optimizing hardware utilization, cooling, and software stacks. We present an AI-driven holistic approach to energy and power management in data centres, which can be described as Energy Aware Scheduling (EAS). EAS uses AI-driven, workload-aware software-hardware co-design to optimize the energy efficiency of a data centre.
Robert Tracey, Lan Hoang, Felix Subelet, Vadim Elisseev
Characterizing HPC Performance Variation with Monitoring and Unsupervised Learning
Abstract
As HPC systems grow larger and more complex, characterizing the relationships between their different components and gaining insight into their behavior becomes difficult. In turn, this puts a burden on both system administrators and developers who aim to improve the efficiency and reliability of systems, algorithms and applications. Automated approaches capable of extracting a system's behavior, as well as identifying anomalies and outliers, are more necessary than ever.
In this work we discuss our exploratory study of Bayesian Gaussian mixture models, an unsupervised machine learning technique, to characterize the performance of an HPC system’s components, as well as to identify anomalies, based on sensor data. We propose an algorithmic framework for this purpose, implement it within the DCDB monitoring and operational data analytics system, and present several case studies carried out using data from a production HPC system.
Gence Ozer, Alessio Netti, Daniele Tafani, Martin Schulz
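
A minimal sketch of the clustering step follows (scikit-learn; the sensor features are stand-ins): fit a Bayesian Gaussian mixture, let variational inference prune unused components, and flag low-likelihood samples as anomalies.

```python
# Minimal sketch (scikit-learn; the sensor features are stand-ins): fit a
# Bayesian Gaussian mixture to per-component sensor samples, let variational
# inference prune unused components, and flag low-likelihood samples.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Stand-in for samples such as (power draw, temperature, memory bandwidth).
sensors = np.vstack([rng.normal(0, 1, (800, 3)), rng.normal(5, 1, (800, 3))])

bgm = BayesianGaussianMixture(n_components=10, weight_concentration_prior=0.01,
                              max_iter=500, random_state=0).fit(sensors)

behaviors = bgm.predict(sensors)      # each cluster = a recurring behavior
log_lik = bgm.score_samples(sensors)  # per-sample log-likelihood
outliers = np.where(log_lik < np.quantile(log_lik, 0.01))[0]  # anomalies
```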

15th Workshop on Virtualization in High-Performance Cloud Computing (VHPC’20)

Frontmatter

Open Access

Service Function Chaining Based on Segment Routing Using P4 and SR-IOV (P4-SFC)
Abstract
In this paper we describe P4-SFC, which supports service function chaining (SFC) based on a single P4-capable switch and off-the-shelf components. It utilizes MPLS-based segment routing for traffic forwarding in the network and SR-IOV for efficient packet handling on hosts. We describe the P4-SFC architecture and demonstrate its feasibility with a prototype using the Tofino Edgecore Wedge 100BF-32X as the P4 switch. Performance tests show that the L2 throughput for VNFs on a host is significantly higher when they are connected to the host's network interface card via SR-IOV rather than through a software switch.
Andreas Stockmayer, Stephan Hinselmann, Marco Häberle, Michael Menth
Seamlessly Managing HPC Workloads Through Kubernetes
Abstract
This paper describes an approach to integrate the job management of High Performance Computing (HPC) infrastructures into cloud architectures by managing HPC workloads seamlessly from the cloud job scheduler. The paper presents hpc-connector, an open source tool that is designed to manage the full life cycle of jobs in the HPC infrastructure from the cloud job scheduler by interacting with the workload manager of the HPC system. The key point is that, by running hpc-connector in the cloud infrastructure, the execution of a job running in the HPC infrastructure can be reflected in the cloud infrastructure. If the user cancels the cloud job, hpc-connector, which catches Operating System (OS) signals (for example, SIGINT), will cancel the job in the HPC infrastructure too. Furthermore, it can retrieve logs if requested. Therefore, by using hpc-connector, the cloud job scheduler can manage jobs in the HPC infrastructure without requiring any special privileges, as it does not need changes to the job scheduler. Finally, we perform an experiment training a neural network for automated segmentation of Neuroblastoma tumours on the Prometheus supercomputer, using hpc-connector as a batch job from a Kubernetes infrastructure.
Sergio López-Huguet, J. Damià Segrelles, Marek Kasztelnik, Marian Bubak, Ignacio Blanquer
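
The signal-forwarding idea can be sketched as follows (plain Python; the ssh/scancel command and job id are hypothetical, not hpc-connector's actual implementation).

```python
# Hedged sketch (plain Python; the ssh/scancel command and job id are
# hypothetical, not hpc-connector's actual implementation) of the signal
# forwarding described in the abstract.
import signal
import subprocess
import sys

hpc_job_id = "123456"  # hypothetical id from the earlier HPC submission

def cancel_remote_job(signum, frame):
    # Forward the cancellation to the HPC workload manager (Slurm here).
    subprocess.run(["ssh", "hpc-login", "scancel", hpc_job_id], check=False)
    sys.exit(128 + signum)

# When Kubernetes stops the pod (SIGTERM) or the user interrupts (SIGINT),
# the HPC-side job is cancelled too, keeping both schedulers consistent.
signal.signal(signal.SIGINT, cancel_remote_job)
signal.signal(signal.SIGTERM, cancel_remote_job)

# ... poll the HPC job state and stream logs until completion ...
```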
Interference-Aware Orchestration in Kubernetes
Abstract
Nowadays, an increasing number of workloads, e.g. data serving, analytics, AI, and HPC workloads, are executed on the Cloud. Although multi-tenancy has gained a lot of attention for optimizing resource efficiency, current state-of-the-art resource orchestrators rely on typical metrics, such as CPU or memory utilization, for placing incoming workloads on the available pool of resources, thus neglecting the interference effects of workload co-location. In this paper, we design an interference-aware cloud orchestrator based on micro-architectural event monitoring. We integrate our solution with Kubernetes, one of the most widely used and commercially adopted cloud orchestration frameworks, and show that we achieve up to 32% higher performance compared to its default scheduler for a variety of representative cloud workloads.
Achilleas Tzenetopoulos, Dimosthenis Masouros, Sotirios Xydis, Dimitrios Soudris
RustyHermit: A Scalable, Rust-Based Virtual Execution Environment
Abstract
System-level development has been dominated by programming languages such as C/C++ for decades. These languages are inherently unsafe, error-prone, and a major reason for vulnerabilities. High-level programming languages with a secure memory model and strong type system are able to improve the quality of the system software. This paper explores the programming language Rust for development of a scalable, virtual execution environment and presents the integration of a Rust-based IP stack into RustyHermit. RustyHermit is part of the standard Rust toolchain and common Rust applications are able to build on top of RustyHermit.
Stefan Lankes, Jonathan Klimt, Jens Breitbart, Simon Pickartz
Rootless Containers with Podman for HPC
Abstract
Containers have become popular in HPC environments to improve the mobility of applications and the delivery of user-supplied code. In this paper we evaluate Podman, an enterprise container engine that supports rootless containers, in combination with runc and crun as container runtimes, using a real-world workload with LS-DYNA and the industry-standard benchmarks sysbench and STREAM. The results suggest that Podman with crun introduces only a low overhead, similar to that of HPC-focused container technologies.
Holger Gantikow, Steffen Walter, Christoph Reich

Open Access

Bioinformatics Application with Kubeflow for Batch Processing in Clouds
Abstract
Bioinformatics pipelines make extensive use of HPC batch processing. The rapid growth of data volumes and computational complexity, especially for modern applications such as machine learning algorithms, imposes significant challenges on local HPC facilities. Many attempts have been made to burst HPC batch processing into clouds with virtual machines. They all suffer from some common issues, for example: very high overhead, slow scale-up and scale-down, and the near impossibility of being cloud-agnostic.
We have successfully deployed and run several pipelines on Kubernetes in OpenStack, Google Cloud Platform and Amazon Web Services. In particular, we use Kubeflow on top of Kubernetes for more sophisticated job scheduling, workflow management, and first-class support for machine learning. We chose Kubeflow/Kubernetes to avoid the overhead of provisioning virtual machines, to achieve rapid scaling with containers, and to be truly cloud-agnostic in all cloud environments.
Kubeflow on Kubernetes also creates some new challenges in deployment, data access, performance monitoring, etc. We will discuss the details of these challenges and provide our solutions. We will demonstrate how our solutions work across all three very different clouds for both classical pipelines and new ones for machine learning.
David Yu Yuan, Tony Wildish
Converging HPC, Big Data and Cloud Technologies for Precision Agriculture Data Analytics on Supercomputers
Abstract
The convergence of HPC and Big Data, along with the influence of the Cloud, is playing an important role in the democratization of HPC. The increasing computational-power needs of Data Analytics have added new fields of interest for HPC facilities, but also new problems, such as interoperability with the Cloud and ease of use. Besides typical HPC applications, these infrastructures are now asked to handle more complex workflows combining Machine Learning, Big Data and HPC. This brings challenges to the resource management, scheduling and environment deployment layers. Hence, enhancements are needed to allow multiple frameworks to be deployed under common system management while providing the right abstractions to facilitate adoption.
This paper presents the architecture adopted for the parallel and distributed execution management software stack of the Cybele EU-funded project, which is put in place on production HPC centers to execute hybrid data analytics workflows in the context of precision agriculture and livestock farming applications. The design is based on: Kubernetes as a higher-level orchestrator of Big Data components, hybrid workflows and a common interface to submit HPC or Big Data jobs; Slurm or Torque for HPC resource management; and the Singularity containerization platform for the dynamic deployment of the different Data Analytics frameworks on HPC. The paper showcases precision agriculture workflows being executed on the architecture and provides initial performance evaluation results and insights for the whole prototype design.
Yiannis Georgiou, Naweiluo Zhou, Li Zhong, Dennis Hoppe, Marcin Pospieszny, Nikela Papadopoulou, Kostis Nikas, Orestis Lagkas Nikolos, Pavlos Kranas, Sophia Karagiorgou, Eric Pascolo, Michael Mercier, Pedro Velho
Backmatter
Metadata
Title
High Performance Computing
Editors
Heike Jagode
Hartwig Anzt
Guido Juckeland
Hatem Ltaief
Copyright Year
2020
Electronic ISBN
978-3-030-59851-8
Print ISBN
978-3-030-59850-1
DOI
https://doi.org/10.1007/978-3-030-59851-8
