2021 | Book

High Performance Computing

7th Latin American Conference, CARLA 2020, Cuenca, Ecuador, September 2–4, 2020, Revised Selected Papers

About this book

This book constitutes the revised selected papers of the 7th Latin American High Performance Computing Conference, CARLA 2020, held in Cuenca, Ecuador, in September 2020. Due to the COVID-19 pandemic, the conference was held virtually.

The 15 revised full papers presented were carefully reviewed and selected from 36 submissions. The papers included in this book are organized in two topical sections: High Performance Computing Applications, and High Performance Computing and Artificial Intelligence.

Table of Contents

Frontmatter

High Performance Computing Applications

Frontmatter
Dynamically Distributing Tasks from an Unattended Parallel Compiler with Cloudbook
Abstract
This work presents a dynamic version of Cloudbook, a new tool for automatically and unattendedly parallelizing code that also distributes the resulting tasks dynamically. Cloudbook is designed for Python code and, above all, performs the parallelization taking into account the number and main characteristics of the available infrastructure (performance, connection bandwidth, etc.) to optimize the execution dynamically. Cloudbook is designed to allow developers to obtain the technical benefits of automated distribution and parallelization of programs with a very low learning cost: it only requires labelling the original code with a reduced set of pragmas located at function headers. Results of tests carried out with Cloudbook on several codes over a real infrastructure are presented as well.
José J. García-Aranda, Juan Ramos-Díaz, Sergio Molina-Cardín, Xavier Larriva-Novo, Andrés Bustos, Luis A. Galindo, Rafael Mayo-García
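The pragma-based labelling described in the abstract can be pictured with a short sketch. The pragma names below are hypothetical placeholders, not the actual Cloudbook annotation set, which is defined by the tool itself.

    # Minimal sketch of pragma-labelled Python code for unattended parallelization.
    # The pragma spellings are hypothetical; only the idea of annotating function
    # headers is taken from the abstract.

    #__CLOUDBOOK:PARALLEL__
    def process_chunk(chunk):
        # Pure function over one data chunk, so it can run as an independent task.
        return sum(x * x for x in chunk)

    #__CLOUDBOOK:NONSHARED__
    def merge(partial_results):
        # Runs in a single deployment unit and combines the partial results.
        return sum(partial_results)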
Fostering Remote Visualization: Experiences in Two Different HPC Sites
Abstract
Visualization of scientific data is crucial for scientific discovery, providing insight into the results of simulations and experiments. Remote visualization is of particular importance for accessing infrastructure, data, and computational resources while avoiding data movement between where data is produced and where it is analyzed. It also enables geographically diverse collaboration and enhances the user experience through graphical user interfaces. This paper presents two approaches deployed by two different HPC centers: the SC3 - Supercomputación y Cálculo Científico Center in Colombia and the Oak Ridge Leadership Computing Facility in the USA. We overview our remote visualization experiences, adopted technologies, use cases, and challenges encountered. Rather than attempting a general picture of remote visualization, our contribution is to highlight the commonality between the two approaches in terms of the end goal and to show their fitness for their respective contexts, given the differences between the centers in terms of purposes, needs, resources, and national impact.
Sergio Augusto Gélvez Cortés, César A. Bernal, Carlos J. Barrios, Benjamín Hernández
High Performance Computing Simulations of Granular Media in Silos
Abstract
This article presents the application of high performance computing for efficient simulations of granular media in silos. Granular media are extensively used in industry, where their storage and proper treatment pose several challenges to the scientific community. A relevant problem concerns the study of granular media stored in a silo. Determining the behavior of the media during the loading and discharge stages is critical. Knowing how the stored particles interact with each other and with the storage structure can lead to understanding and preventing undesirable effects (e.g., the collapse of the structure) during silo operation. Charging and discharging processes of granular media in silos are frequently studied using computer simulations. High performance computing helps researchers perform granular media simulations for systems with a large number of particles, in order to model realistic situations in reasonable computing times. This article describes the application of a parallel/distributed high performance computing approach for studying the mechanisms that control the charging and discharging of silos, in which grains pass through a bottleneck. Simulations are performed applying the Discrete Element Method, and the experimental evaluation is carried out on the high performance computing infrastructure of the National Supercomputing Center in Uruguay. The analysis includes large realistic scenarios considering the physical properties of different grains, involving up to 450,000 particles. The proposed implementation reduced the execution time of simulations by up to 42%, demonstrating the capability of the proposed parallel/distributed computing approach to scale properly to large problem instances.
Miguel Da Silva, Sergio Nesmachnow, Santiago Iturriaga, Gabriel Usera
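For readers unfamiliar with the Discrete Element Method mentioned above, the sketch below shows the kind of per-contact computation such simulations repeat for every particle pair: a linear spring-dashpot normal force between two overlapping spheres. The stiffness and damping constants are illustrative toy values, unrelated to the grain properties used in the paper.

    import math

    def normal_contact_force(p1, p2, r1, r2, v1, v2, k=1.0e5, c=5.0):
        """Linear spring-dashpot normal force between two spheres (toy constants)."""
        dx = [b - a for a, b in zip(p1, p2)]
        dist = math.sqrt(sum(d * d for d in dx))
        overlap = (r1 + r2) - dist
        if overlap <= 0.0 or dist == 0.0:
            return [0.0, 0.0, 0.0]          # particles are not in contact
        n = [d / dist for d in dx]          # unit normal pointing from p1 to p2
        # Relative normal velocity (positive when the particles are approaching)
        vrel = sum((va - vb) * ni for va, vb, ni in zip(v1, v2, n))
        fmag = k * overlap + c * vrel       # elastic + dissipative contributions
        return [-fmag * ni for ni in n]     # force acting on particle 1

    # Example: two unit-radius grains, slightly overlapping and approaching each other.
    f = normal_contact_force([0, 0, 0], [1.9, 0, 0], 1.0, 1.0, [0.1, 0, 0], [0, 0, 0])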
Performance Analysis of Main Public Cloud Big Data Services Processing Brazilian Government Data
Abstract
The growing amount of information generated by big data systems has driven the use of tools that facilitate its processing, such as Hadoop and its entire ecosystem. These tools can run on computational clouds, whose benefits include on-demand payment, self-service, and elasticity. This article evaluates three cloud services that deliver fully configured Hadoop ecosystems: AWS Elastic MapReduce (EMR), Google Dataproc, and Microsoft HDInsight. The evaluation measured their performance and computational resource consumption while running workloads based on data from Bolsa Família, a social welfare program of the Brazilian Government. The results showed that HDInsight had the best runtime performance. Variations in the consumption of resources related to memory, disk activity, cost, and processing were found, providing insight into the strategy of each provider that can be useful in decision-making processes.
Leonardo Rebouças de Carvalho, Marcelo Augusto da Cruz Motta, Aleteia Patricia Favacho de Araújo
Accelerating Machine Learning Algorithms with TensorFlow Using Thread Mapping Policies
Abstract
Machine Learning (ML) algorithms are increasingly being used in various scientific and industrial problems, and their execution time is an important concern. In this work, we explore thread mappings in multi-core architectures and their impact on ML algorithms running with Python and TensorFlow. Using smart thread mapping, we were able to reduce the execution time of the training and inference phases by up to 46% and 29%, respectively.
Matheus W. Camargo, Matheus S. Serpa, Danilo Carastan-Santos, Alexandre Carissimi, Philippe O. A. Navaux
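The thread-level tuning explored in this work relies on controlling how many threads TensorFlow and the underlying OpenMP runtime use and where they are placed. The snippet below is a minimal sketch of such a configuration; the thread counts and affinity values are examples, not the mapping policies evaluated by the authors.

    import os

    # Affinity hints read by OpenMP runtimes (effective mainly with OpenMP-enabled
    # TensorFlow builds); the values are illustrative, not the paper's policies.
    os.environ.setdefault("OMP_NUM_THREADS", "16")
    os.environ.setdefault("OMP_PROC_BIND", "close")   # keep threads near their parent
    os.environ.setdefault("OMP_PLACES", "cores")      # one thread per physical core

    import tensorflow as tf  # imported after the environment variables are set

    # TensorFlow thread pools: intra-op (inside one operator) and inter-op (between operators).
    tf.config.threading.set_intra_op_parallelism_threads(16)
    tf.config.threading.set_inter_op_parallelism_threads(2)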
Methodology for Design and Implementation an Efficient HPC Cluster
Abstract
For years, clusters for HPC have been implemented through the typical process of obtaining the source code and configuring and compiling each of the tools that make up the infrastructure services. Each administrator, based on their experience and knowledge, makes a series of design decisions to implement a cluster that is considered efficient, installing base tools such as NTP, NFS, a task manager (e.g., SLURM), LDAP, among others. In order to reduce these times, several open-source initiatives have emerged, such as Rocks, which allows the rapid implementation of an HPC cluster despite its low configuration flexibility. OpenHPC emerges as an alternative that provides the necessary tools in a software repository and that, once installed, allows the same flexibility of customization and adaptation as if they had been installed in the typical way. It is worth mentioning that OpenHPC provides all of these standardized tools in order to spread best practices in building and managing HPC data centers; unlike Rocks, however, OpenHPC requires a pre-design of the platform, including the network infrastructure, the storage services, and the different tools to implement, which demands prior knowledge of each of them by the administrator. The objective of this paper is to present the fundamental basis for implementing an efficient cluster using OpenHPC, not as a technical installation guide, but rather as the series of steps in the methodology used by the Supercomputación y Cálculo Científico Laboratory SC3.
L. A. Torres, Carlos J. Barrios
Estimating the Execution Time of the Coupled Stage in Multiscale Numerical Simulations
Abstract
Estimating the execution time of high-performance computing (HPC) applications is an issue that affects both shared computing infrastructures and their users. The goal of the present work is to estimate the execution time of simulation applications driven by multiscale numerical methods. In computational terms, these methods induce a two-stage simulation process. Fundamentally, the number of possibilities for configuring this two-stage process tends to be much larger than that of classical, one-stage numerical methods. This scenario makes it harder to provide accurate estimates of the execution time of multiscale simulations by using classical regression techniques. We propose a methodology that explores the idiosyncrasies of multiscale simulators to reduce the uncertainty of the predictions. In this paper we apply it to the specific challenge of estimating the execution time of these simulators based on knowledge about the influence of each parameter of the numerical method they employ. We consider the multiscale hybrid-mixed (MHM) finite element method as a specific multiscale method to validate our methodology. We compared our proposed technique with three well-known regression approaches: a model-based tree (M5P), a Bayesian nonparametric method (Gaussian process regression, GPR), and a state-of-the-art ensemble method (Random Forest). We found that the root-mean-square error (RMSE) on the test dataset for our technique was considerably lower than that obtained by these three approaches. We conclude that an educated consideration of the numerical parameters of the MHM method when estimating the execution time of the simulations helps to obtain more accurate models. We believe this conclusion can be easily generalized to other multiscale numerical methods.
Juan H. L. Fabian, Antônio T. A. Gomes, Eduardo Ogasawara
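As a baseline for the comparison described above, the following sketch trains two of the cited reference regressors (Random Forest and GPR) and reports RMSE with scikit-learn. The data is a synthetic placeholder standing in for the simulator's configuration parameters and measured execution times.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic placeholder data: rows = simulation runs, columns = configuration
    # parameters of the numerical method; y = measured execution time.
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 10, size=(200, 4))
    y = X[:, 0] ** 2 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 1, 200)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                        ("GPR", GaussianProcessRegressor())]:
        model.fit(X_tr, y_tr)
        rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
        print(f"{name}: RMSE = {rmse:.2f}")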

High Performance Computing and Artificial Intelligence

Frontmatter
Using HPC as a Competitive Advantage in an International Robotics Challenge
Abstract
Researchers in every field of knowledge are moving towards the use of supercomputing facilities because the computing power these facilities provide is not achievable by individual research groups, and using them reduces costs and time. Additionally, there is a growing trend towards the use of GPU clusters in HPC centers to accelerate highly parallel codes such as those related to the training of artificial neural networks. This paper presents a successful use case of a supercomputing facility, SCAYLE - Centro de Supercomputación de Castilla y León - (Spain), by a group of robotics researchers participating in an international robotics competition, the ERL Smart CIty RObotic Challenge (SciRoc). The goal of the paper is to show that HPC facilities may be required to provide particular SLAs (Service Level Agreements). In the case described, the HPC services were used to train neural networks for object recognition that could not easily be trained on-site and that could not be trained in advance because of the regulations of the competition.
Claudia Álvarez Aparicio, Jonatan Ginés, Miguel A. Santamarta, Francisco Martín Rico, Ángel M. Guerrero Higueras, Francisco J. Rodríguez Lera, Vicente Matellán Olivera
A Survey on Privacy-Preserving Machine Learning with Fully Homomorphic Encryption
Abstract
The secure and efficient processing of private information in the cloud computing paradigm is still an open issue. New security threats arise with the increasing volume of data moved into cloud storage, where cloud providers require high levels of trust and data breaches are significant problems. Encrypting the data with conventional schemes is considered the best option to avoid security problems. However, a decryption step is necessary whenever the data must be processed, which brings back the initial problem of data vulnerability: the user cannot operate on the data directly and must download it to perform the computations locally. In this context, Fully Homomorphic Encryption (FHE) is considered the holy grail of cryptography for solving these cybersecurity problems: it allows a non-trustworthy third-party resource to blindly process encrypted information without disclosing confidential data. FHE is a valuable capability in a world of distributed computation and heterogeneous networking. In this survey, we present a comprehensive review of theoretical concepts, state of the art, limitations, potential applications, and development tools in the domain of FHE. Moreover, we show the intersection of FHE and machine learning from a theoretical and a practical point of view and identify potential research directions to enrich Machine Learning as a Service, a new paradigm of cloud computing. Specifically, this paper aims to be a guide for researchers and practitioners interested in learning, applying, and extending knowledge in FHE over machine learning.
Luis Bernardo Pulido-Gaytan, Andrei Tchernykh, Jorge M. Cortés-Mendoza, Mikhail Babenko, Gleb Radchenko
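The "blind processing" idea at the core of FHE can be sketched with an existing FHE library. The example below assumes the TenSEAL library (a CKKS implementation) is installed and shows a server-side dot product, such as one layer of a linear model, evaluated directly on encrypted data; the encryption parameters and values are illustrative only.

    import tenseal as ts  # assumes TenSEAL is installed (pip install tenseal)

    # CKKS context for approximate arithmetic on encrypted real numbers.
    context = ts.context(ts.SCHEME_TYPE.CKKS,
                         poly_modulus_degree=8192,
                         coeff_mod_bit_sizes=[60, 40, 40, 60])
    context.global_scale = 2 ** 40
    context.generate_galois_keys()

    # The client encrypts its data; a third party operates blindly on ciphertexts.
    enc_x = ts.ckks_vector(context, [1.0, 2.0, 3.0])
    weights = [0.5, -1.0, 2.0]

    enc_score = enc_x.dot(weights)   # homomorphic dot product on encrypted input

    print(enc_score.decrypt())       # only the secret-key holder can read the result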
Distributed Greedy Approach for Autonomous Surveillance Using Unmanned Aerial Vehicles
Abstract
This article presents a distributed approach for autonomous exploration and surveillance using unmanned aerial vehicles. The proposed solution applies the agent-oriented paradigm to implement a cooperative approach that solves the problem efficiently. A specific state machine is proposed for the unmanned aerial vehicles to implement the coordination needed to explore and monitor a set of points of interest without a centralized infrastructure. The system is conceived to run on low-cost commercial unmanned aerial vehicles, providing an affordable solution to the problem. The experimental evaluation is performed over real and synthetic scenarios. Relevant metrics are studied, including coverage of the explored area and surveillance of the defined points of interest, considering the flight autonomy limitations imposed by the battery charge. Results demonstrate the validity and applicability of the proposed distributed approach and the effectiveness of the greedy exploration strategy in fulfilling the considered goals.
Santiago Behak, Giovani Rondán, Martín Zanetti, Santiago Iturriaga, Sergio Nesmachnow
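The greedy exploration strategy evaluated in this article can be illustrated with a toy decision rule: pick the nearest unvisited point of interest that still leaves enough battery to return to base. The function below is a simplified sketch of that idea, not the state machine proposed by the authors.

    import math

    def next_target(uav_pos, battery, base, pois, visited, speed=1.0, drain=1.0):
        """Greedy choice: nearest unvisited POI that still allows a safe return."""
        best, best_d = None, float("inf")
        for poi in pois:
            if poi in visited:
                continue
            d = math.dist(uav_pos, poi)
            # Battery needed to reach the POI and then fly back to the base station.
            needed = drain * (d + math.dist(poi, base)) / speed
            if needed <= battery and d < best_d:
                best, best_d = poi, d
        return best if best is not None else base   # no feasible POI: head home

    # Example with three points of interest and a partially depleted battery.
    target = next_target((0.0, 0.0), 8.0, (0.0, 0.0),
                         [(3.0, 0.0), (1.0, 1.0), (6.0, 6.0)], visited={(1.0, 1.0)})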
Electricity Demand Forecasting Using Computational Intelligence and High Performance Computing
Abstract
This article presents the application of parallel computing to build different computational intelligence models for forecasting the hourly electricity demand of the following day. The short-term forecast of electricity demand is a crucial problem for defining the dispatch of generators; in turn, it is necessary for defining demand response policies related to smart grids. Computational intelligence models have emerged as successful methods for prediction in recent years. The large amount of data available from different sources and the great development of supercomputing make it possible to build models with adequate complexity to represent all the variables that improve the prediction. Parallel computing techniques are applied to obtain two artificial neural network architectures and their related parameters to forecast the total electricity demand of Uruguay for the next day. These techniques consist of training and evaluating models with different architectures and sets of parameters in parallel, using grid search; furthermore, each model is trained using TensorFlow with fine-grained GPU parallelism. Considering the high computing demands of the applied techniques, they are developed and executed on the high performance computing platform provided by the National Supercomputing Center (Cluster-UY), Uruguay. Standard performance metrics are applied to evaluate the proposed models. The experimental evaluation of the best model reports excellent forecasting results: the model has a mean absolute percentage error of 4.3% when applied to the prediction of unseen data.
Rodrigo Porteiro, Sergio Nesmachnow
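The parallel grid search described above amounts to launching many independent training jobs, one per combination of architecture and hyperparameters. The sketch below outlines that structure with Keras; the search space, layer sizes, and 24-output head are illustrative assumptions rather than the paper's actual configuration.

    import itertools
    import tensorflow as tf

    # Hypothetical search space; the paper's architectures and grids differ.
    grid = {"units": [32, 64], "layers": [1, 2], "lr": [1e-3, 1e-4]}

    def build_model(units, layers, lr, n_features=24):
        model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
        for _ in range(layers):
            model.add(tf.keras.layers.Dense(units, activation="relu"))
        model.add(tf.keras.layers.Dense(24))   # hourly demand for the next day
        model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mae")
        return model

    # Each configuration is an independent training job, so in practice this loop
    # is distributed across cluster nodes/GPUs (e.g., one job per combination).
    for units, layers, lr in itertools.product(grid["units"], grid["layers"], grid["lr"]):
        model = build_model(units, layers, lr)
        print(f"built model: units={units}, layers={layers}, lr={lr}")
        # model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=...)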
Parallel/Distributed Generative Adversarial Neural Networks for Data Augmentation of COVID-19 Training Images
Abstract
This article presents an approach using parallel/distributed generative adversarial networks for image data augmentation, applied to generating COVID-19 training samples for computational intelligence methods. This is a relevant problem nowadays, considering the recent COVID-19 pandemic. Computational intelligence and learning methods are useful tools to assist physicians in diagnosing diseases and in acquiring valuable medical knowledge. A specific generative adversarial network approach trained using a co-evolutionary algorithm is implemented, including a three-level parallel approach that combines distributed memory and fine-grained parallelization using CPU and GPU. The experimental evaluation of the proposed method was performed on the high performance computing infrastructure provided by the National Supercomputing Center, Uruguay. The main experimental results indicate that the proposed model is able to generate accurate images, and that the 3×3 version of the distributed GAN has a more robust training process, allowing it to generate better and more diverse images.
Jamal Toutouh, Mathias Esteban, Sergio Nesmachnow
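The 3×3 configuration mentioned in the abstract refers to a spatial grid of co-evolving generator/discriminator pairs. The fragment below sketches only the toroidal-grid topology and neighbour lookup behind such a scheme; the training logic and the exact neighbourhood used by the authors are not reproduced.

    # Toy sketch of a 3x3 toroidal grid of (generator, discriminator) cells used in
    # spatially distributed coevolutionary GAN training; only the topology is shown.
    SIZE = 3

    def neighbors(row, col, size=SIZE):
        """Four adjacent cells with wrap-around (torus)."""
        return [((row - 1) % size, col), ((row + 1) % size, col),
                (row, (col - 1) % size), (row, (col + 1) % size)]

    cells = {(r, c): {"generator": f"G{r}{c}", "discriminator": f"D{r}{c}"}
             for r in range(SIZE) for c in range(SIZE)}

    # In each round a cell would train its pair against models gathered from its
    # neighbourhood, then exchange the best models back with those neighbours.
    for (r, c) in cells:
        peers = [cells[n]["discriminator"] for n in neighbors(r, c)]
        print(f"cell ({r},{c}) trains against {peers}")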
Analysis of Regularization in Deep Learning Models on Testbed Architectures
Abstract
Deep Learning models have come into significant use in biology and healthcare, including genomics, medical imaging, EEGs, and electronic medical records [14]. During training, these models can be affected by overfitting, which arises mainly because Deep Learning models try to fit the training data as closely as possible, reducing the training error at the cost of increasing the validation error. To avoid this, different techniques have been developed to reduce overfitting, among them Lasso and Ridge regularization, weight decay, batch normalization, early stopping, data augmentation, and dropout. In this research, the impact of the neural network architecture, the batch size, and the dropout value on the reduction of overfitting, as well as on the execution time of the tests, is analyzed. As identified in the tests, the neural network architectures with the highest number of hidden layers are the ones that adapt most closely to the training data set, which makes them more prone to overfitting.
Félix Armando Mejía Cajicá, John A. García Henao, Carlos Jaime Barrios Hernández, Michel Riveill
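Dropout, one of the regularization techniques analyzed here, is typically inserted between dense layers and tuned together with the batch size. The Keras sketch below shows that setup; the layer sizes, dropout rate, and batch size are illustrative values, not the configurations reported in the study.

    import tensorflow as tf

    # Illustrative network; depth, dropout rate, and batch size are the kinds of
    # knobs the study varies, not the exact configurations it reports.
    def build_classifier(n_features, n_classes, dropout_rate=0.5):
        return tf.keras.Sequential([
            tf.keras.Input(shape=(n_features,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dropout(dropout_rate),   # randomly zeroes units during training
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(dropout_rate),
            tf.keras.layers.Dense(n_classes, activation="softmax"),
        ])

    model = build_classifier(n_features=64, n_classes=10)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, batch_size=64, validation_split=0.2, epochs=...)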
Computer Application for the Detection of Skin Diseases in Photographic Images Using Convolutional Neural Networks
Abstract
The aim of the present work was to build an efficient computer application for the detection of skin diseases from photographic images, using convolutional neural network algorithms. This tool is aimed at supporting diagnostic processes. The research prioritizes the diseases impetigo and psoriasis, which are common in cities of the Peruvian Amazon; the city of Iquitos is taken as a case study. An image bank of 1640 images was generated, and three algorithms were evaluated: Inception V3, VGG 16, and ResNet 50. Excellent results in the detection of skin diseases were achieved with the Inception V3 algorithm.
Alejandro Reátegui Pezo, Isaac Ocampo Yahuarcani, Angela Milagros Nuñez Satalaya, Lelis Antony Saravia Llaja, Carlos Alberto García Cortegano, Astrid Fariza Panduro Ahuanari
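A common way to build such a classifier with a limited image bank is transfer learning on top of one of the cited architectures. The sketch below assumes Keras with ImageNet weights for Inception V3 and a small two-class head (impetigo vs. psoriasis); it is an illustrative setup, not the authors' exact model.

    import tensorflow as tf

    # Transfer-learning sketch with Inception V3 for a two-class skin-disease problem;
    # the input size and head layers are illustrative choices.
    base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                             input_shape=(299, 299, 3))
    base.trainable = False   # reuse ImageNet features, train only the new head

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),   # impetigo vs. psoriasis
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])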
Neocortex and Bridges-2: A High Performance AI+HPC Ecosystem for Science, Discovery, and Societal Good
Abstract
Artificial intelligence (AI) is transforming research through the analysis of massive datasets and by accelerating simulations by factors of up to a billion. Such acceleration eclipses the speedups that were made possible through improvements in CPU process and design and other kinds of algorithmic advances. It sets the stage for a new era of discovery in which previously intractable challenges will become surmountable, with applications in fields such as discovering the causes of cancer and rare diseases, developing effective, affordable drugs, improving food sustainability, developing a detailed understanding of environmental factors to support the protection of biodiversity, and developing alternative energy sources as a step toward reversing climate change. To succeed, the research community requires a high-performance computational ecosystem that seamlessly and efficiently brings together scalable AI, general-purpose computing, and large-scale data management. The authors, at the Pittsburgh Supercomputing Center (PSC), launched a second-generation computational ecosystem to enable AI-enabled research, bringing together carefully designed systems and groundbreaking technologies to provide, at no cost, a uniquely capable platform to the research community. It consists of two major systems: Neocortex and Bridges-2. Neocortex embodies a revolutionary processor architecture to vastly shorten the time required for deep learning training, foster greater integration of deep learning with scientific workflows, and accelerate graph analytics. Bridges-2 integrates additional scalable AI, high-performance computing (HPC), and high-performance parallel file systems for simulation, data pre- and post-processing, visualization, and Big Data as a Service. Neocortex and Bridges-2 are integrated to form a tightly coupled and highly flexible ecosystem for AI- and data-driven research.
Paola A. Buitrago, Nicholas A. Nystrom
Backmatter
Metadata
Title
High Performance Computing
Editors
Sergio Nesmachnow
Harold Castro
Andrei Tchernykh
Copyright Year
2021
Electronic ISBN
978-3-030-68035-0
Print ISBN
978-3-030-68034-3
DOI
https://doi.org/10.1007/978-3-030-68035-0
