
2015 | Book

High Performance Computing

Second Latin American Conference, CARLA 2015, Petrópolis, Brazil, August 26-28, 2015, Proceedings

Edited by: Carla Osthoff, Philippe Olivier Alexandre Navaux, Carlos Jaime Barrios Hernandez, Pedro L. Silva Dias

Publisher: Springer International Publishing

Book series: Communications in Computer and Information Science


About this book

This book constitutes the proceedings of the Second Latin American Conference on High Performance Computing, CARLA 2015, a joint conference of the High-Performance Computing Latin America Community, HPCLATAM, and the Conferencia Latino Americana de Computación de Alto Rendimiento, CLCAR, held in Petrópolis, Brazil, in August 2015.

The 11 papers presented in this volume were carefully reviewed and selected from 17 submissions. They are organized in topical sections named: grid and cloud computing; GPU and MIC computing: methods, libraries and applications; and scientific computing applications.

Table of Contents

Frontmatter

Grid and Cloud Computing

Frontmatter
Running Multi-relational Data Mining Processes in the Cloud: A Practical Approach for Social Networks
Abstract
Multi-relational Data Mining (MRDM) algorithms are the appropriate approach for inferring knowledge from databases containing multiple relationships between non-homogeneous entities, which is precisely the case of datasets obtained from social networks. However, to achieve such expressivity, the search space of candidate hypotheses in MRDM algorithms is more complex than in traditional data mining algorithms. To keep the search space of hypotheses feasible, MRDM algorithms adopt several language biases during the mining process. Because of that, when running an MRDM-based system, the user needs to execute the same set of data mining tasks a number of times, each time assuming a different combination of parameters, in order to obtain a good final hypothesis. This makes manual control of such a complex process tedious, laborious and error-prone. In addition, running the same MRDM process several times is time-consuming. Thus, the automatic execution of each parameter setting through parallelization techniques becomes essential. In this paper, we propose an approach named LPFlow4SN that models an MRDM process as a scientific workflow and then executes it in parallel in the cloud, thus benefiting from existing Scientific Workflow Management Systems. Experimental results reinforce the potential of running parallel scientific workflows in the cloud to automatically control the MRDM process while improving its overall execution performance.
Aline Paes, Daniel de Oliveira
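For context, a minimal sketch of the parameter-sweep pattern the abstract describes: the same mining task is executed once per combination of language-bias parameters, in parallel, and the best-scoring hypothesis is kept. The function and parameter names are hypothetical placeholders; LPFlow4SN itself orchestrates this as a scientific workflow executed in the cloud rather than on a single machine.

```python
# Sketch of running the same (stub) mining task over every combination of
# language-bias parameters in parallel and keeping the best hypothesis.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def run_mining_task(params):
    # Placeholder for one MRDM run with a fixed parameter setting.
    max_clause_length, min_coverage, search_depth = params
    score = 0.0  # ... train and evaluate a hypothesis here ...
    return params, score

def sweep():
    grid = product([3, 5, 7],      # max clause length (hypothetical bias)
                   [0.1, 0.2],     # minimum coverage (hypothetical bias)
                   [2, 4])         # search depth (hypothetical bias)
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_mining_task, grid))
    return max(results, key=lambda r: r[1])   # best-scoring setting

if __name__ == "__main__":
    best_params, best_score = sweep()
```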
Methods for Job Scheduling on Computational Grids: Review and Comparison
Abstract
This paper provides a review of heuristic and metaheuristic methods for solving the job scheduling problem in grid systems under the ETC (Expected Time to Compute) model. The problem is an important issue for efficient resource management in computational grids, which is performed by the schedulers of these High Performance Computing systems. We present an overview of the methods and a comparison of the results reported in papers that use the ETC model. The best methods are identified according to the Braun et al. instances [8], the ETC model instances most widely used in the literature. This survey can point new researchers directly to the best scheduling algorithms already available and serve as a starting point for deeper future work.
Edson Flórez, Carlos J. Barrios, Johnatan E. Pecero
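To make the ETC model concrete, here is a minimal sketch of Min-Min, one of the classic heuristics such surveys compare: each job's expected time on each machine is given by an ETC matrix, and the job with the smallest earliest completion time is assigned first. This is a generic illustration, not code from the paper, and the example values are hypothetical.

```python
# Min-Min heuristic on an ETC (Expected Time to Compute) matrix:
# etc[j][m] is the expected running time of job j on machine m.
def min_min(etc):
    n_jobs, n_machines = len(etc), len(etc[0])
    ready = [0.0] * n_machines            # ready time of each machine
    unassigned = set(range(n_jobs))
    schedule = {}                         # job -> machine
    while unassigned:
        best = None                       # (completion time, job, machine)
        for j in unassigned:
            c, m = min((ready[k] + etc[j][k], k) for k in range(n_machines))
            if best is None or c < best[0]:
                best = (c, j, m)
        c, j, m = best
        schedule[j] = m
        ready[m] = c
        unassigned.remove(j)
    return schedule, max(ready)           # assignment and makespan

# Example with 3 jobs and 2 machines (hypothetical ETC values).
schedule, makespan = min_min([[4.0, 6.0], [3.0, 2.0], [5.0, 5.0]])
```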
Cloud Computing for Fluorescence Correlation Spectroscopy Simulations
Abstract
Fluorescence microscopy techniques and protein labeling set an inflection point in the way cells are studied. Fluorescence correlation spectroscopy is extremely useful for quantitatively measuring the movement of molecules in living cells. This article presents the design and implementation of a system for fluorescence analysis through stochastic simulations, using distributed computing techniques over a cloud infrastructure. A highly scalable architecture, accessible to many users, is proposed for studying complex cellular biological processes. A MapReduce algorithm that allows the parallel execution of multiple simulations is developed over a distributed Hadoop cluster using the Microsoft Azure cloud platform. The experimental analysis shows the correctness of the implementation and its utility as a tool for scientific computing in the cloud.
Lucía Marroig, Camila Riverón, Sergio Nesmachnow, Esteban Mocskos
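The system described runs its MapReduce job on a Hadoop cluster over Microsoft Azure. The sketch below only illustrates the map/reduce decomposition for independent stochastic simulations: the map phase runs one simulation per random seed, and the reduce phase averages the results. Python's multiprocessing stands in for the distributed runtime, and the simulation itself is a stub.

```python
# Map/reduce decomposition for independent stochastic simulations.
import random
from multiprocessing import Pool

def map_simulation(seed):
    # One stochastic fluorescence simulation (stub): returns per-lag
    # correlation samples for this random seed.
    rng = random.Random(seed)
    return [rng.random() for _ in range(100)]

def reduce_results(per_run_samples):
    # Average the correlation samples lag-by-lag across all runs.
    n_runs = len(per_run_samples)
    return [sum(vals) / n_runs for vals in zip(*per_run_samples)]

if __name__ == "__main__":
    with Pool() as pool:
        runs = pool.map(map_simulation, range(32))   # 32 independent seeds
    curve = reduce_results(runs)
```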
Porting a Numerical Atmospheric Model to a Cloud Service
Abstract
Cloud computing has emerged as a viable environment for scientific computation. The charging model and the elastic capability to allocate machines as needed are attractive for applications that traditionally execute on clusters or supercomputers. This paper presents our experiences in porting a weather prediction application to an IaaS cloud and executing it there. We compared the execution of this application on our local cluster against the execution at the IaaS provider. Our results show that processing and networking in the cloud are limiting factors compared to a physical cluster. On the other hand, storing input and output data in the cloud is a promising option for sharing results and for building a test-bed for a weather research platform in the cloud. Performance results show that a cloud infrastructure can be a viable alternative for HPC applications.
Emmanuell D. Carreño, Eduardo Roloff, Philippe O. A. Navaux
Determining the Real Capacity of a Desktop Cloud
Abstract
Computer laboratories at universities are underutilized most of the time [1]. Having an average measure of their computing resource usage would allow researchers to harvest the available capacity by deploying opportunistic infrastructures, that is, infrastructures mostly supported by idle computing resources that run in parallel with the tasks performed by the resource owner (end-user). In this paper we measure such usage in terms of CPU and RAM. The metrics were obtained using the SIGAR library on 70 desktops belonging to two independent laboratories during the three busiest weeks of the semester. We found that average CPU usage is less than 5 % while RAM usage is around 25 %. The results show that, in terms of floating point operations per second (FLOPS), there is a capacity of 24 GFLOPS that can be effectively harvested by deploying opportunistic infrastructures to support e-Science, without affecting the performance perceived by end-users and while avoiding underutilization and the acquisition of new hardware.
Carlos E. Gómez, César O. Díaz, César A. Forero, Eduardo Rosales, Harold Castro
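As a back-of-the-envelope illustration of the kind of estimate the abstract reports, harvestable capacity can be approximated as the per-machine peak multiplied by the fraction of CPU left idle, summed over the fleet. The values below are hypothetical placeholders, not the paper's measurements (which were taken with the SIGAR library on 70 desktops).

```python
# Rough estimate of harvestable capacity in a desktop cloud: idle CPU
# fraction times per-machine peak, summed over machines. All numbers
# here are hypothetical placeholders, not the paper's measurements.
def harvestable_gflops(machines):
    # machines: list of (peak_gflops, avg_cpu_usage) per desktop.
    return sum(peak * (1.0 - usage) for peak, usage in machines)

# 70 identical desktops, hypothetical usable peak of 0.5 GFLOPS each,
# 5 % average CPU usage as in the abstract.
fleet = [(0.5, 0.05)] * 70
print(f"Harvestable: {harvestable_gflops(fleet):.1f} GFLOPS")
```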
Improvements to Super-Peer Policy Communication Mechanisms
Abstract
The use of large distributed computing infrastructures has become a fundamental component of most scientific and technological projects. Due to their highly distributed nature, one of the key topics to be addressed in large distributed systems (such as Grids and Federations of Clouds) is determining the availability and state of resources. Having up-to-date information about resources in the system is extremely important, as it is consumed by the scheduler for selecting the appropriate target for each job to be served.
The way in which this information is obtained and distributed is known as the Resource Information Distribution Policy. A centralized organization presents several drawbacks, for example, a single point of failure. Nevertheless, the static hierarchy has become the de facto implementation of grid information systems.
There is growing interest in interaction with the Peer to Peer (P2P) paradigm, pushing towards scalable solutions. The Super Peer Policy (SP) is a decentralized policy that presents a notable improvement in terms of response time and expected number of results compared with a fully decentralized one. While the hierarchical policy is valuable for small and medium-sized Grids, SP is more effective in very large systems and is therefore more scalable.
In this work, we analyze SP focusing on the communication between super-peers. An improvement to the standard protocol is proposed, which leads to two new SP policies outperforming the standard implementation: N-SP and A2A-SP. These policies are analyzed in terms of performance on Exponential and Barabási network topologies, network consumption and scalability.
Paula Verghelet, Esteban Mocskos
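To make the super-peer idea concrete, here is a minimal sketch in which ordinary peers report their resource state to a local super-peer and queries are answered from the super-peer's aggregated view. The N-SP and A2A-SP variants proposed in the paper differ in how super-peers exchange these views among themselves, which this sketch does not model.

```python
# Minimal super-peer sketch: peers push state, queries read the view.
class SuperPeer:
    def __init__(self):
        self.view = {}                    # peer_id -> resource state

    def report(self, peer_id, state):
        self.view[peer_id] = state        # e.g. {"free_cores": 4}

    def query(self, predicate):
        # Answer a resource query from the locally aggregated view.
        return [p for p, s in self.view.items() if predicate(s)]

sp = SuperPeer()
sp.report("peer-1", {"free_cores": 4})
sp.report("peer-2", {"free_cores": 0})
idle = sp.query(lambda s: s["free_cores"] > 0)    # -> ["peer-1"]
```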

GPU and MIC Computing: Methods, Libraries and Applications

Frontmatter
Asynchronous in Situ Processing with Gromacs: Taking Advantage of GPUs
Abstract
Numerical simulations on supercomputers are producing an ever-growing amount of data. Efficient production and analysis of these data are key to future discoveries. The in situ paradigm is emerging as a promising solution to avoid the I/O bottleneck encountered in the file system, for both the simulation and the analytics, by processing the data in memory as soon as they are produced. Various strategies and implementations have been proposed in recent years to support in situ treatments with a low impact on simulation performance. Yet, little effort has been made to perform in situ analytics with hybrid simulations that support accelerators such as GPUs. In this article, we propose a study of in situ strategies with Gromacs, a molecular dynamics simulation code supporting multiple GPUs, as our target application. We specifically focus on the computational resource usage of the machine by the simulation and the in situ analytics. We finally extend the usual in situ placement strategies to the case of in situ analytics running on a GPU and study their impact on both Gromacs performance and the resource usage of the machine. We show in particular that running in situ analytics on the GPU can be more efficient than on the CPU, especially when the CPU is the bottleneck of the simulation.
Monica L. Hernandez, Matthieu Dreher, Carlos J. Barrios, Bruno Raffin
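A schematic of the asynchronous in situ pattern the abstract discusses: the simulation hands each newly produced frame to an analysis worker through an in-memory queue instead of writing it to the file system. This is a generic illustration of the pattern, not the Gromacs/multi-GPU integration studied in the paper.

```python
# Asynchronous in situ processing: analytics consume frames from an
# in-memory queue while the simulation keeps producing new ones.
import queue
import threading

frames = queue.Queue(maxsize=4)       # bounded queue limits memory pressure

def analyze(frame):
    pass                              # e.g. compute per-frame statistics

def analytics_worker():
    while True:
        frame = frames.get()
        if frame is None:             # sentinel: simulation finished
            break
        analyze(frame)                # runs concurrently with the simulation

worker = threading.Thread(target=analytics_worker)
worker.start()
for step in range(1000):
    frame = [0.0] * 3                 # stand-in for simulated atom data
    frames.put(frame)                 # may block if analytics lags behind
frames.put(None)
worker.join()
```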
Solving Linear Systems on the Intel Xeon-Phi Accelerator via the Gauss-Huard Algorithm
Abstract
The solution of linear systems is a key operation in many scientific and engineering applications. Traditional solvers are based on the LU factorization of the coefficient matrix, and optimized implementations of this method are available in well-known dense linear algebra libraries for most hardware architectures. The Gauss-Huard algorithm (GHA) is a reliable alternative method with a computational cost close to that of the LU-based approach. In this work we present several implementations of GHA on the Intel Xeon Phi coprocessor. The experimental results show that our solvers based on GHA represent a competitive alternative to LU-based solvers, being an appealing method for the solution of small to medium linear systems, with remarkable reductions in the time-to-solution for systems of dimension \(n\le 4,000\).
Ernesto Dufrechou, Pablo Ezzatti, Enrique S. Quintana-Ortí, Alfredo Remón
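For reference, a plain NumPy sketch of the Gauss-Huard elimination scheme in its unpivoted textbook form; a production implementation for the Xeon Phi would add column pivoting, blocking and parallelism, none of which are shown here.

```python
# Gauss-Huard elimination for Ax = b (no pivoting, no blocking). At step k
# the k-th row is reduced against the previous rows, normalized, and then
# column k is eliminated above the diagonal; the total cost is roughly
# 2n^3/3 flops, the same order as an LU-based solve.
import numpy as np

def gauss_huard(A, b):
    n = len(b)
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    for k in range(n):
        # Reduce row k against the previously processed rows 0..k-1.
        for i in range(k):
            M[k, i:] -= M[k, i] * M[i, i:]
        # Normalize row k so that M[k, k] == 1.
        M[k, k:] /= M[k, k]
        # Eliminate the entries above the diagonal in column k.
        for i in range(k):
            M[i, k:] -= M[i, k] * M[k, k:]
    return M[:, n]

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x = gauss_huard(A, np.array([3.0, 4.0]))   # -> [1.0, 1.0]
```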
On a Dynamic Scheduling Approach to Execute OpenCL Jobs on APUs
Abstract
This work presents a dynamic scheduling approach used to load balance the computation between the CPU and GPU of an Accelerated Processing Unit (APU). The results show that the dynamic load balancing strategy was successful in reducing the computation time of a Human Immune System (HIS) simulator used as a benchmark. The dynamic scheduling approach accelerates the HIS code by up to 7 times compared to the parallel version that uses only the CPU cores, by up to \(32\,\%\) compared to the parallel version that uses only the GPU cores, and by up to \(9\,\%\) compared to our previous static scheduling approach.
Tiago Marques do Nascimento, Rodrigo Weber dos Santos, Marcelo Lobosco
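A generic sketch of the dynamic load-balancing idea: the iteration space is split into chunks and two workers, standing in for the CPU and GPU devices of the paper's OpenCL scheduler, pull chunks from a shared queue, so the faster device automatically processes more work. This is not the paper's implementation; the chunk size and worker bodies are placeholders.

```python
# Dynamic load balancing via a shared work queue: faster workers pull
# more chunks, so the CPU/GPU split adapts at run time.
import queue
import threading

def make_worker(name, process_chunk, chunks, done):
    def run():
        while True:
            try:
                chunk = chunks.get_nowait()
            except queue.Empty:
                return
            process_chunk(chunk)          # e.g. launch a kernel on this device
            done.append((name, chunk))
    return threading.Thread(target=run)

chunks = queue.Queue()
for start in range(0, 1_000_000, 50_000): # 20 chunks of 50,000 iterations
    chunks.put((start, start + 50_000))

done = []
workers = [make_worker("cpu", lambda c: None, chunks, done),
           make_worker("gpu", lambda c: None, chunks, done)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```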

Scientific Computing Applications

Frontmatter
Fine-Tuning Xeon Architecture Vectorization and Parallelization of a Numerical Method for Convection-Diffusion Equations
Abstract
This work describes the optimization process used to improve the performance of a convection-diffusion equation solver based on the HOPMOC method on the Xeon architecture, with the help of Intel tools: VTune Amplifier, compiler reports and Intel Advisor. HOPMOC is a finite difference method for solving parabolic equations with convective dominance on a cluster with multiple multicore nodes. The method is based on both the modified method of characteristics and the Hopscotch method, and it is implemented through an explicit-implicit operator splitting technique. This work studies the vectorized and parallelized versions of HOPMOC on a Xeon processor architecture, and shows performance improvements of up to 2 times per core due to vectorization techniques and gains of up to 30 times in a 54-core environment due to parallel strategies, compared to the sequential code.
Frederico Luís Cabral, Carla Osthoff, Diego Brandão, Mauricio Kischinhevsky
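As a small illustration of what vectorizing such a solver means, the sketch below contrasts a point-by-point loop with an array-sliced update for a plain explicit 1D convection-diffusion step. This is not the HOPMOC scheme (which combines the modified method of characteristics with a Hopscotch explicit-implicit splitting); it only shows how removing the scalar inner loop exposes the update to vectorization.

```python
# Explicit update for u_t + v u_x = d u_xx: scalar loop vs. sliced arrays.
import numpy as np

def step_loop(u, v, d, dt, dx):
    new = u.copy()
    for i in range(1, len(u) - 1):                 # scalar loop over points
        new[i] = (u[i]
                  - v * dt / (2 * dx) * (u[i + 1] - u[i - 1])
                  + d * dt / dx**2 * (u[i + 1] - 2 * u[i] + u[i - 1]))
    return new

def step_vectorized(u, v, d, dt, dx):
    new = u.copy()
    new[1:-1] = (u[1:-1]                           # whole-array (vectorizable)
                 - v * dt / (2 * dx) * (u[2:] - u[:-2])
                 + d * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    return new

u = np.exp(-np.linspace(-4, 4, 401) ** 2)          # initial Gaussian profile
assert np.allclose(step_loop(u, 1.0, 0.1, 1e-4, 0.02),
                   step_vectorized(u, 1.0, 0.1, 1e-4, 0.02))
```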
Parallel Performance Analysis of a Regional Numerical Weather Prediction Model in a Petascale Machine
Abstract
This paper presents the parallel performance achieved by a regional numerical weather prediction (NWP) model running on thousands of computing cores in a petascale supercomputing system. Good scalability was obtained with up to 13440 cores distributed over 670 nodes. These results enable the application to tackle large computational challenges, such as performing weather forecasts at very high spatial resolution.
Roberto Pinto Souto, Pedro Leite da Silva Dias, Franck Vigilant
Backmatter
Metadata
Title
High Performance Computing
Edited by
Carla Osthoff
Philippe Olivier Alexandre Navaux
Carlos Jaime Barrios Hernandez
Pedro L. Silva Dias
Copyright Year
2015
Electronic ISBN
978-3-319-26928-3
Print ISBN
978-3-319-26927-6
DOI
https://doi.org/10.1007/978-3-319-26928-3
