
Open Access | 2020 | Book


Supercomputing Frontiers

6th Asian Conference, SCFA 2020, Singapore, February 24–27, 2020, Proceedings


About this book

This open access book constitutes the refereed proceedings of the 6th Asian Supercomputing Conference, SCFA 2020, which was planned to be held in February 2020 but whose physical conference was cancelled due to the COVID-19 pandemic. The 8 full papers presented in this book were carefully reviewed and selected from 22 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platforms, container image configuration workflows, large-scale applications, and scheduling.

Table of Contents

Frontmatter

File Systems, Storage and Communication

Frontmatter

Open Access

A BeeGFS-Based Caching File System for Data-Intensive Parallel Computing
Abstract
Modern high-performance computing (HPC) systems increasingly use large amounts of fast storage, such as solid-state drives (SSDs), to accelerate disk access times. This approach is exemplified by the design of “burst buffers”, but more general caching systems have also been built. This paper proposes extending an existing parallel file system to provide such a file caching layer. The solution unifies data access for both internal storage and external file systems under a single namespace. It improves storage performance by exploiting data locality across storage tiers, and increases data sharing between compute nodes and across applications. Leveraging data striping and metadata partitioning, the system supports high-speed parallel I/O for data-intensive parallel computing. Data consistency across tiers is maintained automatically using a cache-aware access algorithm. A prototype has been built using BeeGFS to demonstrate rapid access to an underlying IBM Spectrum Scale file system. Performance evaluation demonstrates a significant improvement in efficiency over the external parallel file system.
David Abramson, Chao Jin, Justin Luong, Jake Carroll
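
The cache-aware access described above can be pictured with a small, purely illustrative sketch (not the authors' implementation): files are opened through the fast BeeGFS cache tier, staged in from the external file system on a miss, and written back to keep the tiers consistent. The mount points and the staging policy below are assumptions.

    import os, shutil

    CACHE_ROOT = "/beegfs/cache"       # fast SSD-backed cache tier (assumed mount point)
    EXTERNAL_ROOT = "/gpfs/projects"   # external parallel file system (assumed mount point)

    def cached_open(relpath, mode="rb"):
        """Open a file through the cache tier, staging it in on a miss."""
        cached = os.path.join(CACHE_ROOT, relpath)
        external = os.path.join(EXTERNAL_ROOT, relpath)
        if not os.path.exists(cached):                       # cache miss
            os.makedirs(os.path.dirname(cached), exist_ok=True)
            shutil.copy2(external, cached)                    # stage data into the fast tier
        return open(cached, mode)                             # subsequent I/O hits the cache

    def writeback(relpath):
        """Flush a cached file back to the external tier to keep both consistent."""
        shutil.copy2(os.path.join(CACHE_ROOT, relpath),
                     os.path.join(EXTERNAL_ROOT, relpath))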

Open Access

Multiple HPC Environments-Aware Container Image Configuration Workflow for Large-Scale All-to-All Protein–Protein Docking Calculations
Abstract
Containers offer considerable portability advantages across different computing environments. These advantages are realized by isolating processes from the host system while ensuring minimal performance overhead; as a result, containers are becoming popular in computational science. However, container image configuration has drawbacks when operating under HPC environments with different specifications: users need sound knowledge of the systems, container runtimes, container image formats, and library compatibilities in each HPC environment. This study introduces an HPC container workflow, based on the HPC Container Maker (HPCCM) framework, that produces customized container image configurations for different HPC systems. It does so by accounting for differences in container runtime, container image format, and library compatibility between the host system and the inside of the container. The authors applied the proposed workflow to a high-performance protein–protein docking application, MEGADOCK, which performs massively parallel all-to-all docking calculations using GPU, OpenMP, and MPI hybrid parallelization, and deployed it in target HPC environments with different GPU devices and system interconnects. The evaluation confirms that the parallel performance of the container application configured using the proposed workflow exceeded a strong-scaling value of 0.95 for half of the computing nodes in the ABCI system (512 nodes with 2,048 NVIDIA V100 GPUs) and one-third of those in the TSUBAME 3.0 system (180 nodes with 720 NVIDIA P100 GPUs).
Kento Aoyama, Hiroki Watanabe, Masahito Ohue, Yutaka Akiyama
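
For readers unfamiliar with HPCCM, the sketch below shows what a parameterized recipe of the kind this workflow generates might look like. It is a minimal illustration, not the paper's actual recipe: the base image, the library versions, and the "target" user argument are assumptions.

    # recipe.py - evaluated by the `hpccm` command-line tool, which predefines
    # Stage0, the building blocks, and the USERARG dictionary.
    target = USERARG.get('target', 'abci')      # hypothetical per-system switch

    Stage0 += baseimage(image='nvidia/cuda:10.0-devel-ubuntu18.04')
    Stage0 += gnu()

    if target == 'abci':
        # Match the host InfiniBand stack and MPI of the target system (versions assumed).
        Stage0 += mlnx_ofed(version='4.5-1.0.1.0')
        Stage0 += openmpi(version='3.1.3', cuda=True, infiniband=True)
    else:  # e.g. 'tsubame3'
        Stage0 += mlnx_ofed(version='4.4-1.0.0.0')
        Stage0 += openmpi(version='2.1.6', cuda=True, infiniband=True)

    # Generate a definition file for the desired container runtime, e.g.:
    #   hpccm --recipe recipe.py --format singularity --userarg target=abci > Singularity.def
    #   hpccm --recipe recipe.py --format docker --userarg target=tsubame3 > Dockerfile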

Open Access

DAOS: A Scale-Out High Performance Storage Stack for Storage Class Memory
Abstract
The Distributed Asynchronous Object Storage (DAOS) is an open source scale-out storage system designed from the ground up to support Storage Class Memory (SCM) and NVMe storage in user space. Its advanced storage API enables native support for structured, semi-structured, and unstructured data models, overcoming the limitations of traditional POSIX-based parallel file systems. For HPC workloads, DAOS provides direct MPI-IO and HDF5 support as well as POSIX access for legacy applications. In this paper we present the architecture of the DAOS storage engine and its high-level application interfaces. We also describe initial performance results of DAOS on the IO500 benchmarks.
Zhen Liang, Johann Lombardi, Mohamad Chaarawi, Michael Hennecke
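
As a flavour of the MPI-IO path mentioned in the abstract, the sketch below writes one fixed-size record per rank through mpi4py. It assumes an MPI library whose ROMIO was built with the DAOS driver, typically selected with a "daos:" path prefix; the pool and container names are placeholders.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    record = ("hello from rank %d\n" % rank).encode().ljust(64)   # fixed 64-byte record

    # Route the file through the (assumed) DAOS MPI-IO driver via the path prefix.
    fh = MPI.File.Open(comm, "daos:/mypool/mycont/out.dat",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    fh.Write_at(rank * len(record), record)   # each rank writes its own slot
    fh.Close()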

Open Access

Cloud Platform Optimization for HPC
Abstract
The special requirements of HPC have typically been tacked onto existing cloud infrastructure and practices. As a result, most cloud offerings are not completely optimized for HPC, or are not yet feature-complete compared with the traditional supercomputing experience. This work addresses the progress made in (1) optimizing the performance of HPC workloads in a cloud environment, and (2) evolving the usability of cloud HPC environments. Specifically, it discusses efforts to minimize and eliminate the impact of virtualization on HPC workloads on cloud infrastructure and to move towards a more familiar supercomputing experience. Initial experience with “cloud-native” HPC is also discussed. This work is inspired by, and relevant to, HPC workloads in many disciplines, including earth sciences and manufacturing.
Aman Verma

Applications and Scheduling

Frontmatter

Open Access

swGBDT: Efficient Gradient Boosted Decision Tree on Sunway Many-Core Processor
Abstract
Gradient Boosted Decision Trees (GBDT) is a practical machine learning method that has been widely used in application fields such as recommendation systems. Optimizing the performance of GBDT on heterogeneous many-core processors exposes several challenges, such as designing an efficient parallelization scheme and mitigating the latency of irregular memory accesses. In this paper, we propose swGBDT, an efficient GBDT implementation on the Sunway processor. In swGBDT, we divide the 64 CPEs in a core group into multiple roles, such as loader, saver, and worker, in order to hide the latency of irregular global memory accesses. In addition, we partition the data into two granularities, block and tile, to better utilize the LDM on each CPE for data caching. Moreover, we utilize register communication for collaboration among CPEs. Our evaluation with representative datasets shows that swGBDT achieves 4.6× and 2× speedup on average compared to the serial implementation on the MPE and parallel XGBoost on the CPEs, respectively.
Bohong Yin, Yunchun Li, Ming Dun, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian
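
The loader/worker/saver split is specific to the Sunway CPEs and register communication, but the pipelining idea it embodies can be illustrated with ordinary threads and bounded queues, as in the purely conceptual sketch below (the tile contents and the per-tile computation are placeholders).

    import queue, threading

    def loader(tiles, in_q):
        for tile in tiles:                   # "loader" role: fetch tiles from (global) memory
            in_q.put(tile)
        in_q.put(None)                       # sentinel: no more work

    def worker(in_q, out_q):
        while (tile := in_q.get()) is not None:
            out_q.put(sum(tile))             # "worker" role: compute on the locally held tile
        out_q.put(None)

    def saver(out_q, results):
        while (res := out_q.get()) is not None:
            results.append(res)              # "saver" role: write results back

    tiles = [list(range(i, i + 4)) for i in range(0, 16, 4)]    # placeholder tiles
    in_q, out_q, results = queue.Queue(2), queue.Queue(2), []   # bounded queues ~ limited LDM
    threads = [threading.Thread(target=loader, args=(tiles, in_q)),
               threading.Thread(target=worker, args=(in_q, out_q)),
               threading.Thread(target=saver, args=(out_q, results))]
    for t in threads: t.start()
    for t in threads: t.join()
    print(results)                           # [6, 22, 38, 54]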

Open Access

Numerical Simulations of Serrated Propellers to Reduce Noise
Abstract
The objective of this research is to investigate, through numerical simulations, the effect of serrations on quadcopter propeller blades on noise reduction. Different variants of the 5-inch 5030 propeller are tested: standard, modified, and serrated. The modified propeller has a portion of its blade's trailing edge cut off to achieve the same surface area as that of the serrated blades, to ensure a fairer comparison. Three-dimensional simulations of the propellers have been performed using an immersed boundary method (IBM) Navier–Stokes finite volume solver to obtain the velocity and pressure fields. An acoustic model, based on the well-known Ffowcs Williams-Hawkings (FW-H) formulation, is then used to predict the far-field noise caused by the rotating blades of the propeller. Results show that, due to the reduction in blade surface area, there is a drop in the thrust produced by the modified and serrated propellers compared to the standard one. However, comparing the modified and serrated propellers with different serration wavelengths, we found that certain wavelengths show a reduction in noise while maintaining similar thrust, because the serrations break up the larger vortices into smaller ones. This shows that there is potential in using serrated propellers for noise reduction.
Wee-beng Tay, Zhenbo Lu, Sai Sudha Ramesh, Boo-cheong Khoo
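
For reference, the FW-H acoustic analogy mentioned above is commonly written (in its standard form, not reproduced from the paper) as:

    \left(\frac{1}{c_0^{2}}\frac{\partial^{2}}{\partial t^{2}}-\nabla^{2}\right)p'(\mathbf{x},t)
      = \frac{\partial^{2}}{\partial x_i\,\partial x_j}\left[T_{ij}\,H(f)\right]
      - \frac{\partial}{\partial x_i}\left[\left(P_{ij}n_j+\rho u_i(u_n-v_n)\right)\delta(f)\right]
      + \frac{\partial}{\partial t}\left[\left(\rho_0 v_n+\rho(u_n-v_n)\right)\delta(f)\right]

where f = 0 defines the moving blade surface, H and \delta are the Heaviside and Dirac delta functions, T_{ij} is the Lighthill stress tensor, and the three right-hand-side terms correspond to the quadrupole, loading, and thickness noise sources, respectively.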

Open Access

High-Performance Computing in Maritime and Offshore Applications
Abstract
The development of supercomputing technologies has enabled a shift towards high-fidelity simulations that are used to complement physical modelling. At the Technology Centre for Offshore and Marine, Singapore (TCOMS), such simulations are used for high-resolution investigations into particular aspects of fluid-structure interactions, in order to better understand and thereby predict the generation of important flow features or the complex hydrodynamic interactions between components onboard ships and floating structures. In addition, by building on the outputs of such simulations, data-driven models of actual physical systems are being developed, which in turn can be used as digital twins for real-time predictions of behaviour and responses under complex real-world environmental loads. In this paper, examples of these high-resolution investigations, as well as the development of digital twins, are described and discussed.
Kie Hian Chua, Harrif Santo, Yuting Jin, Hui Liang, Yun Zhi Law, Gautham R. Ramesh, Lucas Yiew, Yingying Zheng, Allan Ross Magee

Open Access

Correcting Job Walltime in a Resource-Constrained Environment
Abstract
A resource-constrained HPC system such as the Computing and Archiving Research Environment (COARE) facility provides a collaborative platform for researchers to run computationally intensive experiments addressing societal issues. However, users encounter job processing delays that result in low research productivity. Known causes are the limited system capacity and the relatively long and rarely modified default walltime. In this study, we selected and characterized real HPC workloads. We then reviewed and applied recommended runtime- or walltime-based predictive-corrective scheduling techniques to reduce long job queues and scheduling slowdown. Using simulations to determine walltime scheduling performance in environments with limited capacity, we showed that our proposed walltime correction, especially its simple version, is enough to increase scheduling productivity. Our experiments significantly reduced the average bounded scheduling slowdown in COARE, by 98.95% with a predictive-corrective approach and by 99.90% with a correction-only algorithm. Systems with large job diversity, as well as those comprising mostly short jobs, significantly lowered delays and slowdown, notably with walltime correction. These simulation results strengthen our recommendation that administrators of resource-constrained systems start using walltime correction, even without prediction, to increase HPC productivity.
Jessi Christa Rubio, Aira Villapando, Christian Matira, Jeffrey Aborot
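
As a rough illustration of what a correction-only scheme can look like (a simplified sketch, not the authors' exact algorithm), the snippet below starts each job class with a short assumed walltime and raises the estimate whenever a completed job overruns it; the default walltime and growth factor are assumptions.

    DEFAULT_WALLTIME = 3600          # assumed initial estimate (seconds)
    CORRECTION_FACTOR = 2.0          # assumed growth factor on underestimation

    estimates = {}                   # job class -> current walltime estimate

    def walltime_for(job_class):
        return estimates.get(job_class, DEFAULT_WALLTIME)

    def correct(job_class, actual_runtime):
        """After a job completes, raise the estimate until it covers the observed runtime."""
        est = walltime_for(job_class)
        while est < actual_runtime:
            est *= CORRECTION_FACTOR
        estimates[job_class] = est

    # Example: a 2.5-hour job in the (hypothetical) class "md" pushes the
    # estimate from 1 hour to 4 hours for subsequent submissions.
    correct("md", 9000)
    print(walltime_for("md"))        # 14400.0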
Backmatter
Metadata
Title
Supercomputing Frontiers
Edited by
Dhabaleswar K. Panda
Copyright Year
2020
Electronic ISBN
978-3-030-48842-0
Print ISBN
978-3-030-48841-3
DOI
https://doi.org/10.1007/978-3-030-48842-0