
## About this book

This book constitutes the refereed proceedings of the 6th Latin American High Performance Computing Conference, CARLA 2019, held in Turrialba, Costa Rica, in September 2019.

The 32 revised full papers presented were carefully reviewed and selected from 62 submissions. The papers included in this book are organized according to the conference tracks: the regular track on high performance computing (applications; algorithms and models; architectures and infrastructures) and the special track on bioinspired processing (BIP) (neural and evolutionary approaches; image and signal processing; biodiversity informatics and computational biology).

## Table of contents

### Optimizing Water Cooling Applications on Shared Memory Systems

The Network Search method is not yet widely used in computational simulations because of the long processing time needed to calculate its solutions. This paper therefore analyzes the gains achieved by a parallel implementation of the Network Search method algorithm for shared memory systems. Applied to a real water cooling system, the parallel implementation reduced total execution time by up to 160 times and energy consumption by up to 60 times. Given this significant reduction in execution time, the Network Search method can be applied to different scientific problems in place of other methods whose results are less accurate.

Edson Luiz Padoin, Andressa Tais Diefenthaler, Matheus S. Serpa, Pablo José Pavan, Emmanuell D. Carreño, Philippe O. A. Navaux, Jean-François Mehaut

### Collaborative Development and Use of Scientific Applications in Orlando Tools: Integration, Delivery, and Deployment

The paper addresses practical challenges related to the development and application of distributed software packages of the Orlando Tools framework to solve real problems. Such packages form a special class of scientific applications characterized by a wide class of problem solvers, a modular software structure, algorithmic knowledge implemented by modules, scalable computations, execution on heterogeneous resources, etc. The framework is adapted to various categories of users: developers, administrators, and end-users. Unlike other tools for developing scientific applications, Orlando Tools supports the intensive evolution of algorithmic knowledge, both by adapting existing knowledge and by designing new knowledge, which allows it to extend the class of solved problems. We implement and automate the non-trivial technological sequence of the collaborative development and use of packages, including the continuous integration, delivery, deployment, and execution of package modules in a heterogeneous distributed environment that integrates grid and cloud computing. This approach reduces the complexity of the collaborative development and use of packages, and increases the predictability of software operation by detecting and eliminating errors early, which significantly reduces the cost of correcting them.

Alexander Feoktistov, Sergei Gorsky, Ivan Sidorov, Igor Bychkov, Andrei Tchernykh, Alexei Edelev

### BS-SOLCTRA: Towards a Parallel Magnetic Plasma Confinement Simulation Framework for Modular Stellarator Devices

Hand in hand, computer simulations and High Performance Computing are catalyzing advances in experimental and theoretical fusion physics and in the design and construction of new confinement devices that are spearheading the quest for alternative energy sources. This paper presents the Biot-Savart Solver for Computing and Tracing Magnetic Field Lines (BS-SOLCTRA), a field line tracing code developed during the first Stellarator of Costa Rica (SCR-1) campaign. We present the process of turning BS-SOLCTRA into a full parallel simulation framework for stellarator devices. Message passing, shared-memory programming, and vectorization form the underlying parallel infrastructure and provide scalable execution. The parallel simulator achieves a 1,550× speedup over the original sequential version. We also present the powerful new scientific visualization capabilities added to the BS-SOLCTRA framework.

Diego Jiménez, Luis Campos-Duarte, Ricardo Solano-Piedra, Luis Alonso Araya-Solano, Esteban Meneses, Iván Vargas
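BS-SOLCTRA's field line tracing rests on repeatedly evaluating the Biot-Savart law. As a minimal sketch (not the BS-SOLCTRA code; the function names and the midpoint-rule discretization are assumptions), the field of a coil represented as a closed polygon of straight segments can be approximated by summing the contribution of each segment as a current element located at its midpoint. Each evaluation point is independent of the others, which is the kind of work that message passing, threads, and vectorization can split up.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A (classical SI value)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def biot_savart(coil, current, point):
    """Approximate magnetic field (tesla) at `point` due to `current` (A)
    flowing through a closed polygonal coil given as a list of (x, y, z)
    vertices in metres. Each straight segment is treated as a current
    element located at its midpoint (midpoint rule)."""
    b = [0.0, 0.0, 0.0]
    n = len(coil)
    for i in range(n):
        p0, p1 = coil[i], coil[(i + 1) % n]  # last vertex connects to first
        dl = [p1[k] - p0[k] for k in range(3)]
        mid = [0.5 * (p0[k] + p1[k]) for k in range(3)]
        r = [point[k] - mid[k] for k in range(3)]
        dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
        scale = MU0 * current / (4.0 * math.pi * dist ** 3)
        c = cross(dl, r)
        for k in range(3):
            b[k] += scale * c[k]
    return tuple(b)


if __name__ == "__main__":
    # Circular loop of radius R in the xy-plane, discretized into n segments.
    R, I, n = 0.2, 1000.0, 1000
    loop = [(R * math.cos(2 * math.pi * j / n),
             R * math.sin(2 * math.pi * j / n), 0.0) for j in range(n)]
    bz = biot_savart(loop, I, (0.0, 0.0, 0.0))[2]
    print(bz, MU0 * I / (2 * R))  # numerical vs analytic field at the centre
```

For a circular loop, the numerical result at the centre approaches the analytic value μ0·I/(2R) as the number of segments grows, which makes a convenient sanity check.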

### Optimizing Big Data Network Transfers in FPGA SoC Clusters: TECBrain Case Study

Spiking Neural Network (SNN) simulators based on clusters of FPGA-based System-on-Chip (SoC) devices involve transmitting large amounts of data (from hundreds of MB to tens of GB per second) to and from a data host, usually a PC or a server. TECBrain is an SNN simulator that currently uses Ethernet to transmit the results of its simulations, which can take hours if the effective connection speed is around 100 Mbps. This paper proposes data transfer techniques that optimize transmission by grouping data into packages that make the most of the payload size and by using thread-level parallelism, aiming to minimize the impact of multiple clients transmitting at the same time. The proposed method achieves its highest throughput when inserting simulation results directly into a NoSQL database. Using the proposed optimization techniques over an Ethernet connection, the minimum overhead reached is 2.93% (against a theoretical 2.47%) for five nodes sending data simultaneously from C++, with speeds of up to 95 Mbps on a 100 Mbps network. In addition, the maximum database insertion speed reached is 32.5 MB/s using large packages and parallelism, which is 26% of the bandwidth of a 1 Gbps link.

Luis G. León-Vega, Kaleb Alfaro-Badilla, Alfonso Chacón-Rodríguez, Carlos Salazar-García
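The idea of grouping simulation results into packages that fill the available payload can be illustrated with a greedy packer. This is a minimal sketch, not TECBrain's implementation: the record format, the function name `pack_records`, and the 1460-byte payload in the example (a common TCP payload size on a standard 1500-byte Ethernet MTU) are assumptions. Fewer, fuller packets mean fewer per-packet headers for the same amount of data.

```python
def pack_records(records, max_payload):
    """Greedily concatenate byte-string records into packets of at most
    `max_payload` bytes, preserving record order."""
    packets, current, size = [], [], 0
    for rec in records:
        if len(rec) > max_payload:
            raise ValueError("record does not fit in a single payload")
        if size + len(rec) > max_payload:  # current packet is full: flush it
            packets.append(b"".join(current))
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:  # flush the last, partially filled packet
        packets.append(b"".join(current))
    return packets


if __name__ == "__main__":
    # 100 hypothetical 100-byte simulation records.
    records = [bytes([i % 256]) * 100 for i in range(100)]
    packets = pack_records(records, max_payload=1460)
    print(len(packets), "packets instead of", len(records))
```

Here 14 records fit in each 1460-byte payload, so the 100 records travel in 8 packets rather than 100.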

### A Load Balancing Algorithm for Fog Computing Environments

Fog Computing is characterized as an intermediate layer between the Internet of Things layer and the Cloud Computing layer that pre-processes information closer to the sensors. However, given the increasing demand from numerous IoT applications, Fog nodes, even when close to the sensors, tend to become overloaded, which compromises the response times of IoT applications with latency restrictions and, consequently, users’ quality of experience. In this work, we investigate ways to mitigate this problem by keeping the load homogeneously distributed across the Fog, even in heterogeneous environments, through real-time dynamic load balancing of tasks among the computational nodes that compose it. To this end, we present an algorithm model that takes into account the dynamics and heterogeneity of the Fog’s computational nodes and allocates each task to the most appropriate node according to policies predefined by the network administrator. Results show that the proposed approach achieves a homogeneous distribution of tasks among the Fog nodes and reduces response times compared with another proposed solution.

Eder Pereira, Ivânia A. Fischer, Roseclea D. Medina, Emmanuell D. Carreno, Edson Luiz Padoin
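A common way to balance tasks across heterogeneous nodes is to send each task to the node that would finish it earliest, given its current backlog and processing capacity. The sketch below illustrates only that generic greedy policy; it is not the algorithm model from this paper, and the `FogNode` fields and capacity units are hypothetical. The `key` passed to `min` is the natural place to plug in an administrator-defined policy.

```python
from dataclasses import dataclass


@dataclass
class FogNode:
    name: str
    capacity: float           # hypothetical: work units processed per second
    queued_work: float = 0.0  # work units already assigned to this node

    def finish_time(self, extra_work=0.0):
        """Time to drain the queue if `extra_work` were added to it."""
        return (self.queued_work + extra_work) / self.capacity


def assign(task_work, nodes):
    """Send the task to the node with the earliest projected finish time."""
    best = min(nodes, key=lambda node: node.finish_time(task_work))
    best.queued_work += task_work
    return best.name


if __name__ == "__main__":
    nodes = [FogNode("fast", capacity=2.0), FogNode("slow", capacity=1.0)]
    print([assign(1.0, nodes) for _ in range(6)])
    print([node.finish_time() for node in nodes])  # equal: load is balanced
```

With six unit tasks, the faster node receives four and the slower one two, so both finish at the same time.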

### Multi-objective Configuration of a Secured Distributed Cloud Data Storage

Cloud storage is one of the most popular models of cloud computing. It benefits from a shared set of configurable resources without the limitations of local data storage infrastructures. However, it brings several cybersecurity issues. In this work, we address methods for mitigating risks to confidentiality, integrity, and availability, and of information leakage associated with information loss or change, technical failures, and denial of access. We rely on a configurable secret sharing scheme and error correction codes based on the Redundant Residue Number System (RRNS). To dynamically configure RRNS parameters to cope with different objective preferences, workloads, and cloud properties, we take into account several conflicting objectives: probability of information loss/change, extraction time, and data redundancy. We propose an approach based on genetic algorithms that is effective for multi-objective optimization. We implement NSGA-II, SPEA2, and MOCell using the JMetal 5.6 framework and provide their experimental analysis using eleven real cloud storage providers. We show that MOCell gives the best results, obtaining a better Pareto optimal front approximation and better quality indicators such as inverted generational distance, additive epsilon indicator, and hypervolume. We conclude that multi-objective genetic algorithms can be used efficiently for storage optimization and adaptation in a non-stationary multi-cloud environment.

Luis Enrique García-Hernández, Andrei Tchernykh, Vanessa Miranda-López, Mikhail Babenko, Arutyun Avetisyan, Raul Rivera-Rodriguez, Gleb Radchenko, Carlos Jaime Barrios-Hernandez, Harold Castro, Alexander Yu. Drozdov
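The RRNS idea behind this configurable scheme can be shown with small numbers. A value is stored as its residues modulo n pairwise coprime moduli, and any k residues recover it through the Chinese Remainder Theorem, provided the value is smaller than the product of the k smallest moduli; up to n − k shares can therefore be lost. This minimal sketch uses illustrative moduli and function names of our own choosing, omits the encryption, error correction, and parameter optimization discussed in the paper, and needs Python 3.8+ for `math.prod` and `pow(x, -1, m)`.

```python
from itertools import combinations
from math import prod


def rrns_encode(value, moduli):
    """Split `value` into (modulus, residue) shares."""
    return [(m, value % m) for m in moduli]


def crt(shares):
    """Chinese Remainder Theorem: recover x mod M from (modulus, residue)
    pairs with pairwise coprime moduli, where M is their product."""
    big_m = prod(m for m, _ in shares)
    x = 0
    for m, r in shares:
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)  # pow(mi, -1, m) is the modular inverse
    return x % big_m


if __name__ == "__main__":
    moduli = [7, 11, 13, 17, 19]  # illustrative pairwise coprime moduli
    k = 3                         # any 3 of the 5 shares suffice
    secret = 842                  # must be < 7 * 11 * 13 = 1001
    shares = rrns_encode(secret, moduli)
    for subset in combinations(shares, k):  # i.e. lose any 2 shares
        assert crt(list(subset)) == secret
    print("recovered from every", k, "of", len(moduli), "shares")
```

Choosing n and k trades redundancy (storage overhead) against tolerance to lost or unavailable shares, which is the kind of conflicting objective the multi-objective search balances.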

### Bounding Volume Hierarchy Acceleration Through Tightly Coupled Heterogeneous Computing

Bounding Volume Hierarchy (BVH) is the main acceleration mechanism used to improve ray tracing rendering time. Several research efforts have been made to optimize the BVH algorithm for GPU and CPU architectures. Nonetheless, as far as we know, no study has targeted the APU (Accelerated Processing Unit), which has a CPU and an integrated GPU on the same die. The APU has the advantage of being able to share workloads between its internal processors (CPU and GPU) through heterogeneous computing. We crafted a specific implementation of the ray tracing algorithm with BVH traversal for the APU architecture and compared the performance of this SoC against equivalent CPU and GPU implementations. The APU outperformed the other architectures.

### Towards a Lightweight Method to Predict the Performance of Sparse Triangular Solvers on Heterogeneous Hardware Platforms

The solution of sparse triangular linear systems (SpTrSV) is a fundamental building block for many numerical methods. Its widespread use in different fields and the considerable computational cost of this operation have motivated several efforts to accelerate it on different hardware platforms, in particular on those equipped with massively-parallel processors. Until recently, the dominant approach to parallelizing this operation on such hardware was the level-set method, which relies on a costly preprocessing phase. For this reason, much of the research on the subject focuses on the case where several triangular linear systems have to be solved for the same matrix. However, recent efforts have proposed efficient one-phase routines that can be advantageous even when only one SpTrSV needs to be applied for each matrix. In these cases, the choice of solver strongly depends on the degree of parallelism offered by the linear system. In this work we provide an inexpensive algorithm to estimate the degree of parallelism of a triangular matrix, and explore some heuristics to select between the SpTrSV routine provided by the Intel MKL library and our one-phase GPU solver. The experimental evaluation shows that our proposal generally achieves accurate predictions with runtimes two orders of magnitude lower than the state-of-the-art method for computing the DAG levels.

Raúl Marichal, Ernesto Dufrechou, Pablo Ezzatti
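The level-set analysis mentioned in this abstract assigns each row of a lower-triangular matrix to a level: a row depends on every earlier row with a nonzero in its off-diagonal part, and rows in the same level can be solved in parallel. The sketch below computes these DAG levels for a CSR matrix, plus the average number of rows per level as a simple proxy for the degree of parallelism. It illustrates an exact level computation of the kind the authors' inexpensive estimator is compared against, not the estimator itself; the function names are hypothetical.

```python
def dag_levels(n, row_ptr, col_idx):
    """Level of each row of an n x n sparse lower-triangular matrix in CSR
    format. Row i must wait for every row j < i with L[i, j] != 0."""
    level = [0] * n
    for i in range(n):  # rows in order, so all dependencies are already known
        lvl = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j < i:  # off-diagonal entry: dependency on row j
                lvl = max(lvl, level[j] + 1)
        level[i] = lvl
    return level


def average_parallelism(levels):
    """Rows per level: n divided by the number of levels."""
    return len(levels) / (max(levels) + 1)


if __name__ == "__main__":
    # L = [[a, 0, 0, 0],
    #      [0, b, 0, 0],
    #      [c, 0, d, 0],
    #      [0, e, f, g]]
    row_ptr = [0, 1, 2, 4, 7]
    col_idx = [0, 1, 0, 2, 1, 2, 3]
    levels = dag_levels(4, row_ptr, col_idx)
    print(levels, average_parallelism(levels))
```

A diagonal matrix yields a single level (fully parallel), while a bidiagonal one yields n levels (fully sequential); real matrices fall in between, which is why the choice of solver depends on this quantity.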

### Accelerating the Calculation of Friedman Test Tables on Many-Core Processors

The Friedman Test was proposed in 1937 to analyze tables of ranks, like those arising from a wine contest. If we have N judges and k wines, the standard problem is to analyze a table of N rows and k columns holding the opinions of the judges. Friedman’s Test is used to accept or reject the null hypothesis that all the wines are equivalent. Friedman offered an asymptotically valid approximation as well as exact tables for low k and N. The accuracy of the asymptotic approximation for moderate k and N was low, so extended tables were required. The published ones were mostly computed using Monte Carlo techniques. The effort required to compute the extended tables for the case without ties was significant (over 100 years of CPU time); an alternative using many-core processors is described here for the general case with ties. The solution can also be used for other similar tests that still lack sufficiently large tables.

Diego Irigaray, Ernesto Dufrechou, Martín Pedemonte, Pablo Ezzatti, Carlos López-Vázquez
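Under the null hypothesis, every row of the table is an equally likely permutation of the ranks 1..k, so an exact table can in principle be built by enumerating all (k!)^N rank tables and tallying Friedman’s statistic. The brute-force sketch below does this for the no-ties case only; the paper addresses the general case with ties, and its many-core method is not reproduced here. Function names are our own. The growth of (k!)^N shows why extended tables are expensive: k = 5 and N = 10 already give 120^10 ≈ 6.2 × 10^20 tables.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def friedman_q(table):
    """Friedman statistic for an N x k table of ranks 1..k without ties:
    Q = 12 / (N k (k + 1)) * sum_j R_j**2 - 3 N (k + 1),
    where R_j is the rank sum of column j."""
    n, k = len(table), len(table[0])
    col_sums = [sum(row[j] for row in table) for j in range(k)]
    return (Fraction(12, n * k * (k + 1)) * sum(s * s for s in col_sums)
            - 3 * n * (k + 1))


def exact_null_distribution(n, k):
    """Exact P(Q = q) under H0 by enumerating all (k!)**n rank tables."""
    rows = list(permutations(range(1, k + 1)))
    counts = Counter(friedman_q(t) for t in product(rows, repeat=n))
    total = len(rows) ** n
    return {q: Fraction(c, total) for q, c in sorted(counts.items())}


if __name__ == "__main__":
    dist = exact_null_distribution(n=2, k=3)
    for q, p in dist.items():
        print(f"Q = {q}: P = {p}")
```

Exact fractions avoid rounding in the tail probabilities, which are what a critical-value table is read from.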

### Modelling Road Saturation Dynamics on a Complex Transportation Network Based on GPS Navigation Software Data

High traffic concentration during weekdays in the Great Metropolitan Area of Costa Rica causes severe traffic congestion and high costs for the population. A deep understanding of the dynamics of traffic congestion is crucial to design and implement long-term solutions. Given the lack of official data to study traffic congestion, we model it using a transportation network based on data captured throughout 2018 by a GPS navigation software application (Waze), provided by the Ministry of Public Works and Transportation (MOPT in Spanish). In this paper, we focus on the data transformation procedure used to create the transportation network and propose a traffic congestion classification based on the available data. We developed a practical methodology consisting of four main stages: data preparation, road network modelling, road saturation estimation, and saturation dynamics analysis. The results show that it is possible to model road saturation levels using the proposed methodology. We were able to classify road segments into five categories that effectively represent the levels of road saturation. This classification gives a clear overview of the real-world conditions faced by road network users.

Mariana Cubero-Corella, Esteban Durán-Monge, Warner Díaz, Esteban Meneses, Steffan Gómez-Campos

### ExaMPI: A Modern Design and Implementation to Accelerate Message Passing Interface Innovation

Deep experimentation with Message Passing Interface (MPI) implementations, which are quite large and complex, is difficult; this substantially raises the cost and complexity of proof-of-concept activities and limits the community of potential contributors to new and better MPI features and implementations alike. To enable researchers to experiment rapidly and easily with new concepts, algorithms, and internal protocols for MPI, we introduce ExaMPI, a modern MPI-3.x subset with a robust MPI-4.x roadmap. We discuss its design, early implementation, and ongoing use in parallel programming research, plus specific research activities enabled by ExaMPI. Architecturally, ExaMPI is a C++17-based library designed for modularity, extensibility, and understandability. The code base supports both native C++ threading with thread-safe data structures and a modular progress engine. In addition, the transport abstraction implements UDP, TCP, OFED verbs, and LibFabrics for high-performance networks. By enabling researchers with ExaMPI, we seek to accelerate innovation and increase the number of new experiments and experimenters, all while expanding MPI’s applicability.

Anthony Skjellum, Martin Rüfenacht, Nawrin Sultana, Derek Schafer, Ignacio Laguna, Kathryn Mohror

### Assessing Kokkos Performance on Selected Architectures

Performance Portability frameworks allow developers to write code for a familiar High-Performance Computing (HPC) architecture and, with minimal development effort over time, port it to other HPC architectures with little to no loss of performance. In our research, we ran experiments with the same codebase on Serial, OpenMP, and CUDA execution and memory spaces and compared it with the Kokkos Performance Portability framework. We assessed how well these approaches meet the goals of Performance Portability by solving a thermal conduction model on a 2D plate on multiple architectures (NVIDIA K20, P100, V100, and Xavier; Intel Xeon; IBM Power 9; ARM64) and collected execution times (wall-clock) and performance counters with perf and nvprof for analysis. We used the Serial model to establish a baseline and to confirm that the model converges in both the native and the Kokkos code. The OpenMP and CUDA models were used to analyze the parallelization strategy compared with the Kokkos framework for the same execution and memory spaces.

Chang Phuong, Noman Saied, Craig Tanis
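The thermal conduction benchmark is a stencil computation. The following minimal serial sketch assumes an explicit finite-difference (FTCS) scheme with fixed boundary temperatures; the paper’s exact discretization is not given in the abstract, and the function names are hypothetical. Each interior point is updated from its four neighbours, and because every update within a step reads only the previous grid, the double loop is exactly what a Kokkos `parallel_for` over an `MDRangePolicy`, an OpenMP `parallel for`, or a CUDA kernel would distribute.

```python
def heat_step(u, alpha):
    """One explicit (FTCS) step of the 2D heat equation on a uniform grid.

    u: list of rows; boundary values are held fixed (Dirichlet).
    alpha: D * dt / dx**2; must be <= 0.25 for stability.
    """
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]  # write into a copy: all reads see step t
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            lap = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4 * u[i][j]
            new[i][j] = u[i][j] + alpha * lap
    return new


def solve(u, alpha=0.25, tol=1e-6, max_steps=100_000):
    """Iterate until the largest change in one step falls below `tol`."""
    for step in range(max_steps):
        new = heat_step(u, alpha)
        change = max(abs(a - b) for ra, rb in zip(new, u) for a, b in zip(ra, rb))
        u = new
        if change < tol:
            return u, step + 1
    return u, max_steps


if __name__ == "__main__":
    n = 20
    plate = [[0.0] * n for _ in range(n)]
    plate[0] = [100.0] * n  # hot top edge, other edges held at 0
    steady, steps = solve(plate)
    print(steps, steady[n // 2][n // 2])
```

The serial version doubles as the convergence baseline the abstract describes: a parallel or Kokkos port should reach the same steady state.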

### Improving the Simulation of Biologically Accurate Neural Networks Using Data Flow HLS Transformations on Heterogeneous SoC-FPGA Platforms

This work proposes a hardware performance-oriented design methodology for generating efficient high-level synthesis (HLS) code for data multiprocessing on a heterogeneous platform. The methodology is tested on a typical complex neuroscientific application: the biologically accurate modeling of a brain region known as the inferior olivary nucleus (ION). The ION cells are described using a multi-compartmental model based on the extended Hodgkin-Huxley membrane model (eHH), which requires the solution of a set of coupled differential equations. The proposed methodology is tested against alternative HPC implementations (a multi-core i7-7820HQ CPU and a Virtex7 FPGA) of the same ION model for different neural network sizes. Results show that the solution runs 4 to 10 times faster than our previous implementation on the same board and closes the performance gap with a Virtex7 implementation without using the AXI-HP channels at full capacity.

Kaleb Alfaro-Badilla, Andrés Arroyo-Romero, Carlos Salazar-García, Luis G. León-Vega, Javier Espinoza-González, Franklin Hernández-Castro, Alfonso Chacón-Rodríguez, Georgios Smaragdos, Christos Strydis

### Delivering Scalable Deep Learning to Research with Bridges-AI

Artificial intelligence (AI), particularly deep learning, is enabling tremendous advances and is itself of great research interest. To address these research requirements, the Pittsburgh Supercomputing Center (PSC) expanded its Bridges supercomputer with Bridges-AI, providing the world’s most powerful AI servers to the U.S. national research community and its international collaborators. We describe the motivation and architecture of Bridges-AI and its integration with Bridges, which adds to Bridges’ capabilities for scalable, converged high-performance computing (HPC), AI, and Big Data. We then describe the software environment of Bridges-AI, particularly the introduction of containers for deep learning frameworks, machine learning, and graph analytics, and PSC’s approach to container deployment. We close with a discussion of the range of research challenges in which Bridges-AI is enabling breakthroughs, highlighting the development of AI-driven methods to identify immune responses with automated tumor detection in breast cancer.

Paola A. Buitrago, Nicholas A. Nystrom, Rajarsi Gupta, Joel Saltz

### Towards a Platform to Evaluate the Impact of Resource Information Distribution in IoT Environments

Internet of Things (IoT) is a paradigm in which every object can communicate through the Internet. Cloud Computing is designed to provide computational resources to geographically distributed customers following an elastic payment strategy. Fog/Edge Computing aims to decrease bandwidth usage by keeping computation near the source of data, avoiding the collapse of network infrastructure that moving all the data from the edge to cloud data centers would cause. Fog and Cloud environments define a large-scale distributed system composed of heterogeneous resources with huge theoretical computing power, but using these resources poses challenges to distributed applications and scheduling policies. In this work, we show the initial steps towards developing a tool that supports evaluating the impact of the quality of the resource information used to guide scheduling policies. This tool combines simulation and validation and simplifies the deployment of experiments on both sides. The evaluation of this initial proof of concept consists of deploying experiments with different numbers of devices in a single site and in three different sites across France. Our results show good agreement between the simulation and validation platforms.

Paula Verghelet, Esteban Mocskos

### GPU Support for Automatic Generation of Finite-Differences Stencil Kernels

The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphics processing units (GPUs) are an attractive architectural target for stencil computations because of their high degree of data parallelism. However, rapid architectural and technological progress makes it difficult even for the most proficient programmers to keep up with technological advances at the micro-architectural level. In this work, we present an extension to Devito©, an open source compiler designed to produce highly optimized finite difference kernels for use in inversion methods. We integrate it with the Oxford Parallel Domain Specific Language (OP-DSL) to enable automatic code generation for GPU architectures from a high-level representation. We aim to let users who code at a symbolic representation level effortlessly leverage the processing capacity of GPU architectures. The implemented backend is evaluated on an NVIDIA® GTX Titan Z and an NVIDIA® Tesla V100 in terms of operational intensity through the roofline model, for varying space-order discretization levels of 3D acoustic isotropic wave propagation stencil kernels with and without symbolic optimizations. It achieves approximately 63% of the V100’s peak performance and 24% of the Titan Z’s peak performance for stencil kernels over grids with $$256^{3}$$ points. Our study reveals that improving memory usage should be the most efficient strategy for increasing the performance of the implemented solution on the evaluated architectures.

Vitor Hugo Mickus Rodrigues, Lucas Cavalcante, Maelso Bruno Pereira, Fabio Luporini, István Reguly, Gerard Gorman, Samuel Xavier de Souza
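The roofline model used in this evaluation bounds attainable performance by min(peak compute, operational intensity × peak memory bandwidth). A minimal sketch follows; the device numbers in the example are hypothetical placeholders, not the GTX Titan Z or V100 specifications. In the roofline view, a kernel whose operational intensity lies left of the ridge point is limited by memory bandwidth rather than by compute.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound; intensity in FLOP/byte, bandwidth in GB/s."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Operational intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    peak, bw = 10_000.0, 800.0  # hypothetical device: 10 TFLOP/s, 800 GB/s
    ridge = ridge_point(peak, bw)
    print("ridge point:", ridge, "FLOP/byte")
    for oi in (1.0, 4.0, 12.5, 40.0):
        bound = attainable_gflops(peak, bw, oi)
        regime = "memory-bound" if oi < ridge else "compute-bound"
        print(f"OI {oi:5.1f}: <= {bound:8.1f} GFLOP/s ({regime})")
```

Dividing a kernel’s measured GFLOP/s by this bound, rather than by the compute peak alone, indicates how much headroom remains at its operational intensity.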

### Adding Probabilistic Certainty to Improve Performance of Convolutional Neural Networks

Convolutional Neural Networks (CNN) are successfully being used for different computer vision tasks, from labeling cancerous cells in medical images to identifying traffic signals in self-driving cars. Supervised CNNs classify raw input data according to the patterns learned from a training set. This set is typically obtained by manually labeling images, which can introduce uncertainty into the data. The level of expertise of the professionals labeling the training set sometimes varies widely, and some of the images may be unclear and difficult to label. This leads to data sets with pictures labeled differently by different experts, or to uncertainty in the experts’ opinions. These kinds of errors in the training set occur more frequently when the CNN task is to classify numerous labels with similar characteristics. For example, when labeling damage to civil infrastructure after an earthquake, there are more than two hundred different labels, some of them similar to each other, and the experts labeling the sets frequently disagree on which one to use. In this paper, we use probabilistic analysis to evaluate both the likelihood of the labels in the training set (produced by the CNN) and the uncertainty of that likelihood. The uncertainty in the likelihood is represented by a probability density and represents a spreading (as it were) of the CNN’s likelihood estimate over a range of values dictated by the uncertainty in the truth set.

Maria Pantoja, Robert Kleinhenz, Drazen Fabris

### Assessing the Impact of a Preprocessing Stage on Deep Learning Architectures for Breast Tumor Multi-class Classification with Histopathological Images

In this work, we assess the impact of the adaptive unsharp mask filter as a preprocessing stage for breast tumour multi-class classification with histopathological images, evaluating two state-of-the-art architectures not tested so far on this problem, to our knowledge (DenseNet and SqueezeNet), and a 5-layer baseline deep learning architecture. SqueezeNet is an efficient architecture that can be useful in environments with restrictive computational resources. According to the results, the filter improved accuracy by 2% to 4% in the 5-layer baseline architecture; on the other hand, it had a negative impact on DenseNet and SqueezeNet, which lost 2% to 6% accuracy. Hence, simpler deep learning architectures can take more advantage of filters than complex architectures, which are able to learn the implemented preprocessing filter. SqueezeNet yielded the highest per-parameter accuracy, while DenseNet achieved 96% accuracy, defeating previous state-of-the-art architectures by 1% to 5%, making DenseNet a considerably more efficient architecture for breast tumour classification.

Iván Calvo, Saul Calderon, Jordina Torrents-Barrena, Erick Muñoz, Domenec Puig
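Unsharp masking sharpens an image by adding back a scaled difference between the image and a blurred copy: out = img + amount · (img − blur(img)). The sketch below shows the plain, non-adaptive version with a 3×3 box blur on a grayscale image stored as nested lists. The adaptive filter evaluated in this paper varies the sharpening locally, and that adaptation is not reproduced here; function names and the `amount` parameter are illustrative.

```python
def box_blur(img):
    """3x3 mean filter with edge pixels replicated at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp to the image
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj]
            out[i][j] = acc / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Non-adaptive unsharp mask: boost the high-frequency residual."""
    blurred = box_blur(img)
    return [[p + amount * (p - b) for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]


if __name__ == "__main__":
    step = [[0.0, 0.0, 10.0, 10.0] for _ in range(3)]  # a vertical edge
    for row in unsharp_mask(step):
        print([round(v, 2) for v in row])
```

Flat regions are left unchanged, while the two sides of an edge are pushed apart (undershoot on the dark side, overshoot on the bright side), which is what makes edges look crisper.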

### Assessing the Robustness of Recurrent Neural Networks to Enhance the Spectrum of Reverberated Speech

Implementing voice recognition and voice analysis systems in real-life contexts presents important challenges, especially when signal recording conditions are adverse. One of the conditions that degrades the signal, and which has been studied in recent years, is reverberation. Reverberation is produced by sound wave reflections that reach the microphone from multiple directions. Several Deep Learning-based methods have been proposed to improve speech signals degraded by reverberation and have proven effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have shown surprising results in these tasks. In this work, we present a proposal to evaluate the robustness of these neural networks in learning different reverberation conditions without any prior information. The results show that fewer LSTM networks need to be trained to improve speech signals, since a single network can learn several conditions simultaneously, in contrast with the current method of training a network for every single condition or noise level. The evaluation is based on quality measurements of the signal’s spectrum (distance and perceptual quality) in comparison with the reverberated version. The results support the claim that LSTM networks are able to enhance the signal in any of five conditions, all trained simultaneously, with results equivalent to training a network for each reverberation condition.

Carolina Paniagua-Peñaranda, Marisol Zeledón-Córdoba, Marvin Coto-Jiménez

### A Performance Evaluation of Several Artificial Neural Networks for Mapping Speech Spectrum Parameters

In this work, we compare different neural network architectures for the task of mapping spectral coefficients of noisy speech signals to those corresponding to natural speech. Previous works on the subject have used fully-connected multilayer perceptron (MLP) networks and recurrent neural networks (LSTM and BLSTM). Several references report initial trial-and-error processes to determine which architecture to use. Finding the best network type and size is of great importance due to the considerable training time required by some recurrent network models. In our work, we conducted extensive tests, training more than five hundred networks with several architectures to determine which cases present significant differences. The results show that, for this application of neural networks, architectures with more layers or more neurons are not the most convenient, both for the time required to train them and for the fit achieved. These results depend on the complexity of the task (the signal-to-noise ratio, or SNR) and the amount of data available. This exploration can guide the most efficient use of these types of neural networks in future mapping applications, and can help optimize resources in future studies by reducing computational time and complexity.

Víctor Yeom-Song, Marisol Zeledón-Córdoba, Marvin Coto-Jiménez

### Using Cluster Analysis to Assess the Impact of Dataset Heterogeneity on Deep Convolutional Network Accuracy: A First Glance

In this paper we performed cluster analysis using Fuzzy K-means over the image-based features of two models to assess how dataset heterogeneity impacts model accuracy. A highly heterogeneous dataset is linked with sparse data samples, which usually impacts the overall model generalization and accuracy on test samples. We propose to measure the Coefficient of Variation (CV) in the resulting clusters to estimate data heterogeneity as a metric for predicting model generalization and test accuracy. We show that highly heterogeneous datasets are common when the number of samples is insufficient, thus yielding a high CV. In our experiments with two different models and datasets, higher CV values decreased model test accuracy considerably. We tested ResNet 18 to solve binary classification of x-ray teeth scans, and VGG16 to solve age regression from hand x-ray scans. The results suggest that cluster analysis can be used to identify the influence of heterogeneity on CNN model test accuracy. According to our experiments, a CV $$< 5\%$$ is recommended to yield a satisfactory model test accuracy.

Mauro Mendez, Saul Calderon, Pascal N. Tyrrell
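The coefficient of variation is the standard deviation divided by the mean, usually expressed as a percentage. A minimal sketch follows. The abstract does not state which per-cluster quantity the CV is computed over, so the example applies it to hypothetical lists of values, and it uses the population standard deviation (the sample version would give slightly different numbers).

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV in percent, using the population standard deviation."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * pstdev(values) / mu


if __name__ == "__main__":
    homogeneous = [101, 99, 100, 102, 98]  # hypothetical per-cluster values
    heterogeneous = [2, 4, 4, 4, 5, 5, 7, 9]
    for name, vals in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
        cv = coefficient_of_variation(vals)
        side = "below" if cv < 5 else "above"
        print(f"{name}: CV = {cv:.1f}% ({side} the 5% threshold)")
```

Because the CV is scale-free, the same threshold can be compared across features or datasets measured in different units.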

### Evolutionary Approach for Bus Synchronization

This article presents the application of evolutionary algorithms to solve the bus synchronization problem. The problem model includes extended synchronization points, accounting for every pair of bus stops in a city, and the transfer demands for each pair of lines at each pair of bus stops. A specific evolutionary algorithm is proposed to efficiently solve the problem, and results are compared with intuitive algorithms and with the current planning of the transportation system on real scenarios from the city of Montevideo, Uruguay. Experimental results indicate that the proposed evolutionary algorithm improves synchronization by up to 13.33% with respect to the current planning and systematically outperforms other baseline methods.

Sergio Nesmachnow, Jonathan Muraña, Gerardo Goñi, Renzo Massobrio, Andrei Tchernykh

### Autonomous Flight of Unmanned Aerial Vehicles Using Evolutionary Algorithms

This article explores the application of evolutionary algorithms and agent-oriented programming to the problem of searching for and monitoring targets with a fleet of unmanned aerial vehicles. The subproblem of static off-line planning is studied to find initial flight plans for each vehicle in the fleet, using evolutionary algorithms to find trade-offs between the size of the explored area, the proximity of the vehicles, and the monitoring of points of interest defined in the area. The results of the experimental analysis on representative instances of the surveillance problem indicate that the proposed techniques are capable of computing effective flight plans.

Américo Gaudín, Gabriel Madruga, Carlos Rodríguez, Santiago Iturriaga, Sergio Nesmachnow, Claudio Paz, Gregoire Danoy, Pascal Bouvry

### An Experimental Study on Fundamental Frequency Detection in Reverberated Speech with Pre-trained Recurrent Neural Networks

The detection of the fundamental frequency ($$f_{0}$$) in speech signals is relevant in areas such as automatic speech recognition and identification, with multiple potential applications, for example in virtual assistants, assistive technology devices, and biomedical applications. It has been acknowledged that the extraction of this parameter is affected under adverse conditions, for example when reverberation or background noise is present. In this paper, we present a new method to improve the detection of $$f_{0}$$ in speech signals with reverberation, based on initialized Long Short-term Memory (LSTM) neural networks. In previous works, LSTM networks have used weights initialized with random numbers. We propose an initialization in the form of an auto-associative memory, which learns the identity function from non-reverberated data. The advantages of our proposal are shown using different objective quality measures, in particular in the detection of segments with and without $$f_{0}$$.

Andrei Alfaro-Picado, Stacy Solís-Cerdas, Marvin Coto-Jiménez

### Measuring the Effect of Reverberation on Statistical Parametric Speech Synthesis

Text-to-speech (TTS) synthesis is the technique of generating intelligible speech from a given text. The most recent TTS techniques are based on machine learning, implementing systems that learn linguistic specifications and their corresponding speech signal parameters. Given the growing interest in implementing verbal communication systems in devices such as cell phones, car navigation systems, and personal assistants, it is important to use speech data from many sources. The speech recordings available for this purpose are not always of the best quality, for example when an artificial voice is created from historical recordings, or for a person of whom only a small set of recordings exists. In these cases, the adverse conditions in the data pose an additional challenge. One such condition is reverberation, which results from the different paths a speech signal can take in an environment before being captured by a microphone. In the present work, we quantitatively explore the effect of different levels of reverberation on the quality of artificial voices generated from such recordings. The results show that the quality of the generated artificial speech is considerably affected by any level of reverberation. Thus, the application of speech enhancement algorithms should always be considered before and after any TTS process.

Marvin Coto-Jiménez
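Reverberation, described in this abstract as the product of the multiple paths a signal takes before reaching the microphone, is commonly simulated by convolving a dry signal with a room impulse response (RIR). The sketch below assumes a synthetic RIR made of Gaussian noise under an exponential envelope whose amplitude drops by 60 dB after RT60 seconds, so RT60 acts as the "reverberation level". The abstract does not say how the reverberated data were produced; the function names and parameters here are hypothetical.

```python
import math
import random


def synthetic_rir(rt60, fs, seed=0):
    """Hypothetical room impulse response: Gaussian noise under an exponential envelope
    whose amplitude drops by 60 dB (a factor of 1000) after rt60 seconds."""
    rng = random.Random(seed)
    n = int(rt60 * fs)
    decay = 3.0 * math.log(10.0) / rt60      # 10**(-3 t / rt60) == exp(-decay * t)
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i / fs) for i in range(n)]
    rir[0] = 1.0                             # direct (unreflected) path
    return rir


def convolve(x, h):
    """Direct linear convolution, O(len(x) * len(h)); adequate for short examples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def reverberate(x, rt60, fs, seed=0):
    """Convolve a dry signal with a synthetic RIR; a larger rt60 means stronger reverberation."""
    return convolve(x, synthetic_rir(rt60, fs, seed))


if __name__ == "__main__":
    fs = 4000
    dry = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]  # 0.1 s, 200 Hz tone
    for rt60 in (0.1, 0.3):
        wet = reverberate(dry, rt60, fs)
        print(rt60, len(wet))
```

Measured RIRs or image-source room models would be more realistic; the noise-based RIR is only meant to show how one parameter can sweep the reverberation level.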

### Enhancing Speech Recorded from a Wearable Sensor Using a Collection of Autoencoders

Assistive Technology (AT) is a concept that includes the use of technological devices to improve the learning process or the general capabilities of people with disabilities. One of the major tasks of AT is the development of devices that offer alternative or augmentative communication capabilities.

In this work, we implemented a simple AT device with a low-cost sensor for recording speech signals, in which the sound is perceived as low-quality and corrupted. Thus, it is not suitable for integration into speech recognition systems, automatic transcription, or general recognition of vocal-tract sounds for people with disabilities.

We propose the use of a group of artificial neural networks that improve different aspects of the signal. Studies of speech enhancement commonly focus on improving specific conditions of the signal, such as background noise, reverberation, or natural noises. In this case, the conditions that degrade the sound are unknown, and this uncertainty makes speech enhancement more challenging in a real-life application.

The results show the capacity of the artificial neural networks to enhance the quality of the sound, according to several objective evaluation measures. Therefore, this proposal could become a way of treating these kinds of signals to improve robust speech recognition systems and increase the real possibilities of implementing low-cost AT devices.

Astryd González-Salazar, Michelle Gutiérrez-Muñoz, Marvin Coto-Jiménez

### Insight GT: A Public, Fast, Web Image Ground Truth Authoring Tool

This paper proposes to the community the development of a public, fast, web-based image Ground Truth Authoring Tool (GTAT). Image ground truth authoring tools are key to generating training and validation data for image segmentation and classification systems. The paper briefly reviews similar publicly available GTATs, their features, and their shortcomings, in order to identify the key features that a public GTAT for the community still lacks. Based on the desired features identified, we aim to develop a free and open GTAT in the future.

Barrantes-Garro Joel, Rodríguez-Morales Hellen, Garnier-Artiñano Adrián, Calderón-Ramírez Saúl, Porras-Jiménez Fabian, Corrales-Arley Luís Carlos, Brenes-Camacho Ricardo

### Comparison of Four Automatic Classifiers for Cancer Cell Phenotypes Using M-Phase Features Extracted from Brightfield Microscopy Images

In our in vitro study to model and understand the regulation networks that control the life and death of cells, it is fundamental to quantify the contribution of each of the cancer cell phenotypes: apoptosis, cell cycle arrest, DNA damage repair, and DNA damage proliferation. For that purpose, an automatic microscope is used to generate several images of cell populations using brightfield microscopy. In the scientific literature, several methods to extract features from microscopy images are available, but mostly for fluorescence or phase-contrast microscopy, which have the disadvantage of being phototoxic to the cells and are therefore unsuitable for our study. In this paper, a successful method to automatically extract and classify the phenotypes of cancer cells is presented. The method uses features extracted automatically from the M-phase (mitosis) of cells in images obtained by brightfield microscopy. The classification results are validated by comparing them with the correct, manually annotated class of each instance. Four classifiers are compared: Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), k-Nearest Neighbours (kNN), and Random Forests (RF), using standard metrics such as precision, recall, and F1-score. The LDA classifier provided the best results, reaching an overall F1-score of 0.78 and an overall weighted F1-score of 0.88.

Francisco Siles, Andrés Mora-Zúñga, Steve Quiros
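The abstract reports both an "overall" and an "overall weighted" F1-score. A common reading, assumed here rather than confirmed by the abstract, is that the former is the macro average (mean of per-class F1) and the latter weights each class's F1 by its number of true instances (support). The sketch computes both from label lists; the phenotype labels and counts in the example are illustrative, not the paper's data.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1, support[c])
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 (unweighted mean over classes) and support-weighted F1."""
    scores = per_class_prf(y_true, y_pred)
    f1s = [s[2] for s in scores.values()]
    macro = sum(f1s) / len(f1s)
    total = sum(s[3] for s in scores.values())
    weighted = sum(s[2] * s[3] for s in scores.values()) / total
    return macro, weighted


if __name__ == "__main__":
    # Illustrative, imbalanced labels: the majority class is classified well,
    # the two minority classes less so.
    y_true = ["proliferation"] * 6 + ["arrest"] * 2 + ["apoptosis"] * 2
    y_pred = ["proliferation"] * 6 + ["arrest", "proliferation"] + ["apoptosis", "arrest"]
    print(macro_and_weighted_f1(y_true, y_pred))
```

With imbalanced classes, the weighted score is pulled toward the majority classes, so it can sit noticeably above the macro score when minority phenotypes are harder to classify.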

### Diaforá: A Visualization Tool for the Comparison of Biological Taxonomies

We address the problem of visualizing differences between two versions of a biological taxonomy. Given the dynamic nature of taxonomic work, taxonomists are often faced with alternative versions of a taxonomy that need to be reconciled. However, visual comparison of hierarchies is an open problem that involves several difficult challenges in Visual Analytics: first, how to display not one but two possibly large taxonomies on a fixed-size screen; second, how to highlight all differences between the two taxonomies. We present Diaforá, an interactive tool that infers and visualizes the differences. Automatic inference is achieved by incorporating taxonomic rules to identify operations such as merging, splitting, and renaming of taxa, among others. Differences are highlighted using the edge drawing technique, enhanced with a number of features suggested by users of a prototype version. Diaforá has been implemented and tested with real-world taxonomies, such as those of Bryozoa and Annelida, as well as with artificial taxonomies.

Lilliana Sancho-Chavarría, Carlos Gómez-Soza, Fabian Beck, Erick Mata-Montero
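To make the idea of inferring merge, split, and rename operations concrete, the sketch below compares two taxonomy versions using one simple, hypothetical rule set; it is not Diaforá's actual inference. Each version is assumed to be a mapping from taxon name to the set of species (leaves) it contains. A removed and an added taxon with identical species sets count as a rename; an added taxon whose species set is exactly the union of two or more removed taxa counts as a merge; the mirror case counts as a split. Taxa that keep their name but change content are ignored in this simplification.

```python
def diff_taxa(old, new):
    """old/new: {taxon name: iterable of species names}. Returns a list of inferred operations."""
    old = {k: frozenset(v) for k, v in old.items()}
    new = {k: frozenset(v) for k, v in new.items()}
    ops = []
    old_only = {n: s for n, s in old.items() if n not in new}
    new_only = {n: s for n, s in new.items() if n not in old}

    # Renames: a removed and an added taxon with identical species sets.
    for o, s in list(old_only.items()):
        for n, t in list(new_only.items()):
            if s == t:
                ops.append(("rename", o, n))
                del old_only[o], new_only[n]
                break

    # Merges: an added taxon that is exactly the union of two or more removed taxa.
    for n, t in new_only.items():
        parts = sorted(o for o, s in old_only.items() if s <= t)
        if len(parts) >= 2 and frozenset().union(*(old_only[o] for o in parts)) == t:
            ops.append(("merge", tuple(parts), n))

    # Splits: a removed taxon that is exactly the union of two or more added taxa.
    for o, s in old_only.items():
        parts = sorted(n for n, t in new_only.items() if t <= s)
        if len(parts) >= 2 and frozenset().union(*(new_only[n] for n in parts)) == s:
            ops.append(("split", o, tuple(parts)))
    return ops


if __name__ == "__main__":
    species = {"s1", "s2", "s3", "s4", "s5", "s6", "s7"}
    old = {"Fam": species, "GenusA": {"s1", "s2"}, "GenusB": {"s3"},
           "GenusC": {"s4", "s5"}, "GenusD": {"s6", "s7"}}
    new = {"Fam": species, "GenusAB": {"s1", "s2", "s3"}, "GenusE": {"s4", "s5"},
           "GenusD1": {"s6"}, "GenusD2": {"s7"}}
    for op in diff_taxa(old, new):
        print(op)
```

Comparing taxa by their species sets makes the rules independent of where a taxon sits in the drawing; real taxonomic rules (ranks, synonyms, authorship) would need more context than this sketch uses.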

### A First Glance into Reversing Senescence on Herbarium Sample Images Through Conditional Generative Adversarial Networks

In this paper, we describe a novel approach to perform senescence reversal on photos of leaves based on Conditional Generative Adversarial Networks, which have been used successfully for similar tasks on human faces and for other picture-to-picture translations. We show that their use can lead to a valid solution to this problem, provided that the challenge of creating a large and comprehensive dataset is overcome. Additionally, we present a new dataset of 120 paired photos of leaves in their fresh and senescent states, manually collected for this work. Using the structural similarity index (SSIM) to compare the ground truth with the generated images, we obtained an average of 0.9.

Juan Villacis-Llobet, Marco Lucio-Troya, Marvin Calvo-Navarro, Saul Calderon-Ramirez, Erick Mata-Montero
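The structural similarity index mentioned in this abstract compares luminance, contrast, and structure through image means, variances, and covariance. The sketch below evaluates the standard SSIM formula over a single window covering the whole image, with the usual constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ for dynamic range $L$. This global variant is a simplification assumed for illustration: standard SSIM implementations average the same formula over local (often Gaussian-weighted) windows, and the abstract does not state which implementation was used.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM over one window spanning the whole image; x, y are equal-length grayscale pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of the same size")
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    # Population variances and covariance of the pixel values.
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


if __name__ == "__main__":
    img = [10, 50, 90, 130, 170, 210, 250, 30]
    noisy = [v + d for v, d in zip(img, [5, -5, 5, -5, 5, -5, 5, -5])]
    print(global_ssim(img, img), global_ssim(img, noisy))
```

Identical images give a value of 1; small perturbations lower it slightly, and structurally opposite images (for example, an inverted image) drive it toward or below zero.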

### Performance Evaluation of Parallel Inference of Large Phylogenetic Trees in Santos Dumont Supercomputer: A Practical Approach

Modern high-throughput techniques in analytical chemistry and molecular biology produce massive amounts of data. Omics sciences cover complex areas such as next-generation sequencing for genomics, systems biology studies of biochemical pathways, and the discovery of novel bioactive compounds, and they can be fostered by the use of high-performance computing. Nowadays, the effective use of supercomputers plays an important role in phyloinformatics, since most of its applications are memory- or compute-bound and involve a large number of simple, regular computations that exhibit potentially massive parallelism. Phyloinformatics analyses cover phylogenomic and computational evolutionary studies of the genomes of organisms. RAxML is a popular phylogenomic software package based on maximum likelihood algorithms for the analysis of phylogenetic trees, which requires substantial computing power to process large amounts of data. RAxML implements several variants of the phylogenetic likelihood function kernel (SSE3, AVX, AVX2) and offers coarse-grained and fine-grained parallelism via its Hybrid and MPI/PThreads versions. The present paper explores the performance and scalability of RAxML on the Santos Dumont supercomputer. Machine learning analyses were applied to support the choice of features that lead to efficient resource allocation on Santos Dumont. Features such as cluster type, number of cores, input data size, and historical RAxML performance results were used to generate the predictive models for allocating computational resources. In the experiments, the hybrid version of RAxML significantly improves the speedup while maintaining efficiency above 75%.

Kary Ocaña, Carla Osthoff, Micaella Coelho, Marcelo Galheigo, Isabela Canuto, Douglas de Oliveira, Daniel de Oliveira
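The speedup and efficiency figures in this abstract follow the usual definitions: speedup $S_p = T_1 / T_p$ and efficiency $E_p = S_p / p$, where $T_1$ is the serial run time and $T_p$ the run time on $p$ cores (for a hybrid MPI plus threads run, $p$ is assumed to count all cores used). The sketch also picks the largest core count that keeps efficiency at or above a threshold such as 75%. The timings in the example are hypothetical, not measurements from Santos Dumont.

```python
def speedup(t_serial, t_parallel):
    """S_p = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """E_p = S_p / p, i.e. the fraction of ideal linear speedup achieved."""
    return speedup(t_serial, t_parallel) / workers


def max_workers_above(timings, threshold=0.75):
    """timings: {core count: wall-clock seconds}; must include 1 as the serial baseline."""
    t1 = timings[1]
    ok = [p for p, tp in timings.items() if efficiency(t1, tp, p) >= threshold]
    return max(ok)


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds for increasing core counts.
    timings = {1: 1200.0, 8: 170.0, 16: 90.0, 32: 55.0, 64: 40.0}
    for p, tp in sorted(timings.items()):
        print(p, round(speedup(timings[1], tp), 2), round(efficiency(timings[1], tp, p), 3))
    print("largest core count with E >= 0.75:", max_workers_above(timings))
```

A rule like this is the kind of decision that the predictive models described in the abstract could inform ahead of time, instead of measuring every configuration.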

### Matching of EM Map Segments to Structurally-Relevant Bio-molecular Regions

Electron microscopy is a technique used to determine the structure of bio-molecular machines via three-dimensional images (called maps). In some cases, state-of-the-art methods can determine structures at resolutions that allow the identification of up to secondary structural features, but such resolutions are not yet widespread. Furthermore, because molecular interactions often require atomic-level detail to be understood, it is still necessary to complement current maps with techniques that provide finer-grained structural details. We applied segmentation techniques to maps in the Electron Microscopy Data Bank (EMDB), the standard community repository for these data. We assessed the potential of these algorithms to match functionally relevant regions in their atomic-resolution counterparts by comparing against three protein systems, each with multiple atomic-detailed domains. We found that at least 80% of amino acid residues in 7 out of 12 domains were assigned to single segments, suggesting that there is potential to match the lower-resolution segmented regions to their atomic counterparts. We also qualitatively analyzed the potential on other EMDB structures and generated the raw segmentation information for the complete EMDB for interested researchers to use. Results can be accessed online, and the developed library is provided as part of an open-source project.

Manuel Zumbado-Corrales, Luis Castillo-Valverde, José Salas-Bonilla, Julio Víquez-Murillo, Daisuke Kihara, Juan Esquivel-Rodríguez
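The "at least 80% of residues assigned to single segments" criterion in this abstract can be expressed as a simple check per domain. The sketch assumes, hypothetically, that each residue has already been labelled with the ID of the map segment containing it (or `None` if it falls outside every segment); a domain counts as matched when the most common segment holds at least the threshold fraction of all its residues. How the paper maps residues to segments is not described in the abstract, and the domain data below are illustrative.

```python
from collections import Counter


def dominant_fraction(segment_labels):
    """Fraction of a domain's residues that fall in its single most common segment.
    Residues labelled None (outside every segment) count toward the total only."""
    counts = Counter(label for label in segment_labels if label is not None)
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(segment_labels)


def domains_matched(domains, threshold=0.8):
    """domains: {domain name: [segment id per residue]}; names whose dominant fraction >= threshold."""
    return sorted(d for d, labels in domains.items() if dominant_fraction(labels) >= threshold)


if __name__ == "__main__":
    domains = {
        "D1": [3] * 9 + [4],          # 90% of residues in segment 3
        "D2": [1] * 5 + [2] * 5,      # split evenly across two segments
        "D3": [7] * 8 + [8, None],    # 80% in segment 7, one residue unassigned
    }
    print(domains_matched(domains))
```

Counting unassigned residues in the denominator keeps the criterion conservative: residues outside the segmented volume cannot inflate a domain's match score.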

### Backmatter
