
2010 | Book

Parallel Processing and Applied Mathematics

8th International Conference, PPAM 2009, Wroclaw, Poland, September 13-16, 2009, Revised Selected Papers, Part II

Edited by: Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Workshop on Scheduling for Parallel Computing (SPC 2009)

Fully Polynomial Time Approximation Schemes for Scheduling Divisible Loads

In this paper we study divisible loads scheduling in heterogeneous systems with high bandwidth. Divisible loads represent computations which can be arbitrarily divided into parts and performed independently in parallel. We propose fully polynomial time approximation schemes for two optimization problems. The first problem consists in finding the maximum load which can be processed in a given time. It turns out that this problem can be reduced to minimization of a half-product. The second problem is computing the minimum time required to process load of a given size. The FPTAS solving this problem uses a dual approximation algorithm approach.

Joanna Berlińska
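The two optimization problems above can be sketched in miniature. Assuming communication time is negligible (the high-bandwidth regime the paper targets), the maximum load processable by a deadline T has a closed form, and the minimum-time problem can then be answered by bisecting on T with the first problem as an oracle, echoing the dual approximation idea; this is a toy sketch, not the authors' FPTAS:

```python
def max_load(speeds, T):
    # With negligible communication, worker i processes speeds[i] * T load units.
    return sum(s * T for s in speeds)

def min_time(speeds, load, eps=1e-9):
    """Minimum time to process `load`, found by bisecting on the deadline
    and using max_load as an oracle (the dual-approximation idea)."""
    lo, hi = 0.0, load / min(speeds)   # the slowest worker alone is an upper bound
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if max_load(speeds, mid) >= load:
            hi = mid                   # deadline mid suffices; try smaller
        else:
            lo = mid
    return hi
```

For speeds (2, 1, 0.5) and a load of 35 units, the bisection converges to a deadline of 10, matching the closed form 35 / (2 + 1 + 0.5).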
Semi-online Preemptive Scheduling: Study of Special Cases

We use the duality of linear programming to obtain exact formulas for the competitive ratio of semi-online preemptive scheduling on three and four machines. We use the linear programs from [3]. Namely, we consider online scheduling and semi-online scheduling with known sum of processing times. We solve the linear programs symbolically by examining all basic solution candidates. All solutions are obtained by computer, but all are verifiable by hand.

Tomáš Ebenlendr
Fast Multi-objective Rescheduling of Grid Jobs by Heuristics and Evolution

Scheduling of jobs to a computational grid is a permanent process due to the dynamic nature of the grid and the frequent arrival of new jobs. Thus, a permanent rescheduling of already planned and new jobs must be performed. This paper will continue and extend previous work, which focused on the tuning of our Global Optimising Resource Broker and Allocator GORBA in a static planning environment. A formal definition of the scheduling problem and a classification will be given. New heuristics for rescheduling exploiting the “old plan” will be introduced and it will be investigated how they contribute to the overall planning process. Furthermore, the maximal possible load, which can be handled within the given time frame of three minutes, will be examined for a grid of growing size of up to 6000 grid jobs and 600 resources.

Wilfried Jakob, Alexander Quinte, Karl-Uwe Stucky, Wolfgang Süß
Comparison of Program Task Scheduling Algorithms for Dynamic SMP Clusters with Communication on the Fly

The paper presents a comparison of two scheduling algorithms developed for program structurization for execution in dynamic SMP clusters implemented in Systems on Chip (SoC) technology. SoC modules are built of a set of processors, memory modules and a multi–bus interconnection network. A set of such SoCs is interconnected by a global communication network. Inter–processor communication inside SoC modules uses a novel technique of data transfers on the fly. The algorithms represent two different scheduling approaches. The first uses ETF–based genetically supported list scheduling heuristics to map nodes of a program to processors. The second is a clustering–based algorithm using Moldable Tasks (MT) to structure the graph. Both algorithms structure computations and local data transfers to introduce processor switching and data transfers on the fly. The algorithms were tested using a set of automatically generated parameterized program graphs. The results were compared to results obtained using a classic ETF–based list scheduling without data transmissions on the fly.

Łukasz Maśko, Marek Tudruj, Gregory Mounie, Denis Trystram
Study on GEO Metaheuristic for Solving Multiprocessor Scheduling Problem

We propose a solution of the multiprocessor scheduling problem based on applying a relatively new metaheuristic called Generalized Extremal Optimization (GEO). GEO is inspired by a simple coevolutionary model known as the Bak-Sneppen model. The model describes an ecosystem consisting of N species. Evolution in this model is driven by a process in which the weakest species in the ecosystem, together with its nearest neighbors, is always forced to mutate. This process shows the characteristics of a phenomenon called punctuated equilibrium, which is observed in evolutionary biology. We interpret the multiprocessor scheduling problem in terms of the Bak-Sneppen model and apply the GEO algorithm to solve it. We compare the GEO algorithm with the well-known Simulated Annealing (SA) algorithm. Both algorithms have some similarities, which are considered in this paper. Experimental results show that GEO, despite its simplicity, outperforms the SA algorithm over the whole range of scheduling instances.

Piotr Switalski, Franciszek Seredynski
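As a hedged illustration of how GEO might be applied to a scheduling problem (one common variant of the heuristic, not necessarily the authors' formulation): encode an assignment of tasks to two machines as a bit string, score every single-bit flip, rank the flips, and mutate one bit drawn with probability proportional to rank^(-tau):

```python
import random

def makespan(bits, times):
    """Completion time of the more loaded of the two machines."""
    m1 = sum(t for b, t in zip(bits, times) if b == 1)
    return max(m1, sum(times) - m1)

def geo_schedule(times, tau=1.5, iters=2000, seed=1):
    """Toy GEO for 2-machine scheduling: bit i says which machine runs
    task i. Each step scores every single-bit flip, ranks the flips,
    and mutates one bit drawn with probability ~ rank**(-tau)."""
    rng = random.Random(seed)
    n = len(times)
    bits = [rng.randint(0, 1) for _ in range(n)]
    best, best_bits = makespan(bits, times), bits[:]
    for _ in range(iters):
        flips = []
        for i in range(n):                 # score flipping bit i
            bits[i] ^= 1
            flips.append((makespan(bits, times), i))
            bits[i] ^= 1
        flips.sort()                       # most improving flip gets rank 1
        weights = [(r + 1) ** -tau for r in range(n)]
        i = rng.choices([i for _, i in flips], weights=weights)[0]
        bits[i] ^= 1                       # forced "species" mutation
        cur = makespan(bits, times)
        if cur < best:
            best, best_bits = cur, bits[:]
    return best, best_bits
```

Lowering tau makes the mutation more uniform (more exploration); raising it makes the search greedier, which mirrors the single tunable parameter that distinguishes GEO from SA's cooling schedule.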
Online Scheduling of Parallel Jobs on Hypercubes: Maximizing the Throughput

We study the online problem of scheduling unit-time parallel jobs on hypercubes. A parallel job has to be scheduled between its release time and deadline on a subcube of processors. The objective is to maximize the number of early jobs. We provide a 1.6-competitive algorithm for the problem and prove that no deterministic algorithm is better than 1.4-competitive.

Ondřej Zajíček, Jiří Sgall, Tomáš Ebenlendr

The Third Workshop on Language-Based Parallel Programming Models (WLPP 2009)

Verification of Causality Requirements in Java Memory Model Is Undecidable

The purpose of the Java memory model is to formalize the behavior of the shared memory in multithreaded Java programs. The subtlest points of its formalization are causality requirements that serve to provide safety and security guarantees for incorrectly synchronized Java programs. In this paper, we consider the problem of verifying whether an execution of a multithreaded Java program satisfies these causality requirements and show that this problem is undecidable.

Matko Botinčan, Paola Glavan, Davor Runje
A Team Object for CoArray Fortran

This paper outlines the features of a team object for CoArray Fortran to support multi-disciplinary applications. It combines object-oriented design, supported in Fortran 2003, with the parallel coarray model, supported in Fortran 2008. It extends the coarray model by adding state to a coarray object. The compiler and run-time environment use this state to dereference co-indices relative to the team that created the object. Methods are associated with a team object for synchronization, memory allocation and collective operations across the team.

Robert W. Numrich
On the Definition of Service Abstractions for Parallel Computing

The availability of real parallelism in multi-core based architectures has resurrected the interest in concurrent computing in general, and parallel computing in particular. New languages and libraries have been recently proposed to increase productivity in the context of these architectures. In this paper we present a novel approach that resorts to the service abstraction for annotating parallelism.

Hervé Paulino

The Second Workshop on Performance Evaluation of Parallel Applications on Large-Scale Systems

Performance Debugging of Parallel Compression on Multicore Machines

The power of contemporary processors is based more and more on multicore architectures. This kind of power is accessible only to parallel applications, which are able to provide work for each core. Creating a scalable parallel/multithreaded application that efficiently uses the available cores is a difficult task, especially if I/O performance must be considered as well. We consider a multithreaded database loader with a compression function. The performance of the loader is examined from a number of perspectives. Because compression is a computationally intensive task, parallel execution can potentially provide a big advantage in this case. A list of performance-related areas we encountered is presented and discussed. We identify and verify tools that allow us to deal with specific performance areas. We find that only an orchestrated employment of several tools can bring the desired effect. The discussion provides a general procedure one can follow when improving the performance of multithreaded programs. Key performance areas specific to the database loader are pointed out. Special interest is directed towards performance variations observed when many parallel threads are active on a multicore CPU. A significant slowdown of computations is observed if many threads are computing simultaneously. The slowdown is related mainly to memory access and cache behavior, and it is much larger for a Core2 Quad system than for a dual Xeon machine.

Janusz Borkowski
Energy Considerations for Divisible Load Processing

In this paper we analyze energy usage in divisible load processing. Divisible load theory (DLT) applies to computations which can be divided into parts of arbitrary sizes, and the parts can be independently processed in parallel. The shortest schedule for divisible load processing is determined by the speed of computation and communication. Energy usage for such a time-optimum schedule is analyzed in this paper. We propose a simple model of energy consumption. Two states of the computing system are taken into account: an active state and an idle state with reduced energy consumption. Energy consumption is examined as a function of system parameters. We point out possible ways of energy conservation. It is demonstrated that energy can be saved by use of parallel processing.

Maciej Drozdowski
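The two-state energy model described above can be written down directly. In this hypothetical instance with m identical machines, running a perfectly divisible load in parallel eliminates the idle-state energy that a sequential schedule accumulates on the waiting machines:

```python
def energy_parallel(load, speed, m, p_active, p_idle):
    """Energy over the schedule when the load is split evenly across
    m identical machines: everyone computes for the whole makespan,
    so no idle-state energy is drawn."""
    T = load / (m * speed)        # makespan of the parallel schedule
    return m * p_active * T

def energy_sequential(load, speed, m, p_active, p_idle):
    """Energy when one machine does all the work while the other
    m - 1 machines sit in the reduced-consumption idle state."""
    T = load / speed              # makespan of the sequential schedule
    return p_active * T + (m - 1) * p_idle * T
```

With load 100, unit speed, 4 machines, active power 10 and idle power 2, the parallel schedule costs 1000 energy units against 1600 sequentially: the active energy is identical, and the saving is exactly the idle energy of the three waiting machines, which illustrates the abstract's claim that parallel processing can save energy.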
Deskilling HPL
Using an Evolutionary Algorithm to Automate Cluster Benchmarking

The High-Performance Linpack (HPL) benchmark is the accepted standard for measuring the capacity of the world’s most powerful computers, which are ranked twice yearly in the Top 500 List. Since just a small deficit in performance can cost a computer several places, it is important to tune the benchmark to obtain the best possible result. However, the adjustment of HPL’s seventeen configuration parameters to obtain maximum performance is a time-consuming task that must be performed by hand. In a previous paper, we provided a preliminary study that proposed the tuning of HPL parameters by means of an Evolutionary Algorithm. The approach was validated on a small cluster. In this article, we extend this initial work by describing Acbea, a fully-automatic benchmark tuning tool that performs both the configuration and installation of HPL followed by an automatic search for optimized parameters that will lead to the best benchmark results. Experiments have been conducted to validate this tool on several clusters, exploiting in particular the Grid’5000 infrastructure.

Dominic Dunlop, Sébastien Varrette, Pascal Bouvry
Monitoring of SLA Parameters within VO for the SOA Paradigm

Current trends in modern scientific and business IT infrastructures pose certain requirements on the middleware layer in order to maximize the automation of all life-cycle phases of such infrastructures, including inception, deployment, execution, and dissolution. When this infrastructure is composed of resources of different organizations, for instance in the form of a Virtual Organization, the management of these resources is especially important for achieving new quality in business. In this paper we deal with a specific aspect of IT infrastructure management related to autonomous enforcement of Service Level Agreements between organizations sharing their resources within a Virtual Organization. The presented framework utilizes semantic technologies in order to virtualize the heterogeneity of underlying middleware components and to allow integration of services between these organizations.

Wlodzimierz Funika, Bartosz Kryza, Renata Slota, Jacek Kitowski, Kornel Skalkowski, Jakub Sendor, Dariusz Krol
A Role-Based Approach to Self-healing in Autonomous Monitoring Systems

The main intention of this paper is to introduce a new role-based approach to self-healing monitoring. This is preceded by an overview of existing approaches to the monitoring of distributed systems using self-healing features. Starting with a discussion of autonomous monitoring systems, we come to self-healing systems. These systems should be able to automatically resolve the problems that occur in a system under monitoring. The paper provides insight into various aspects of self-healing monitoring systems at the software and hardware level. A detailed description of a new agent-based system, AgeMon, is given later on. The system is based on the roles played by different types of agents. The self-healing features can be achieved by a form of cooperation of agents, e.g. monitoring agents, rule agents, database agents. The paper discusses the roles and gives an implementation background.

Włodzimierz Funika, Piotr Pȩgiel
Parallel Performance Evaluation of MIC(0) Preconditioning Algorithm for Voxel μFE Simulation

Numerical homogenization is used for up-scaling of a linear elasticity tensor of strongly heterogeneous micro-structures. The utilized approach assumes the presence of a periodic micro-structure and thus periodic boundary conditions. Rotated trilinear Rannacher-Turek finite elements are used for the discretization, while a parallel PCG method is used to solve the arising large-scale systems with sparse, symmetric, positive semidefinite matrices. The applied preconditioner is based on the modified incomplete Cholesky factorization MIC(0).

The test problem represents a trabecular bone tissue, and takes into account only the elastic response of the solid phase. The voxel micro-structure of the bone is extracted from a high resolution computer tomography image. Numerical tests performed on parallel computers demonstrate the efficiency of the developed algorithm.

Ivan Lirkov, Yavor Vutov, Marcin Paprzycki, Maria Ganzha
Parallel HAVEGE

The HAVEGE algorithm [1] [2] generates unpredictable random numbers by gathering entropy from internal processor states that are inherently volatile and impossible to tamper with in a controlled fashion by any application running on the target system. The method used to gather the entropy implies that its main loop will almost monopolize the CPU; the output depends on the operating system and other running applications, as well as some internal mechanisms that stir the processor states to generate an enormous amount of entropy. The algorithm was designed with single-core CPUs in mind, and without parallelization; however, the recent market explosion of multi-core CPUs and the stagnation of CPU frequencies justify the need to research a multithreaded parallel version of HAVEGE, capable of running the same algorithm loop on each core independently and transparently combining the results into one single output bitstream. This paper demonstrates how such a parallelization is possible and benchmarks the output speed of its implementation.

Alin Suciu, Tudor Carean, Andre Seznec, Kinga Marton

The Fourth Grid Applications and Middleware Workshop (GAMW 2009)

UNICORE Virtual Organizations System

In this paper we present a novel Virtual Organizations management system called UVOS, developed within the Chemomentum project. It is a complete and production-ready solution aiming at simplicity of deployment and management without sacrificing flexibility and functionality. The system can also be used as an underlying technology for more high-level solutions. The system was designed mainly for the UNICORE grid middleware but, as it uses the open SAML protocol and implements numerous SAML profiles, its adoption for other grid middlewares is straightforward. The paper compares the UVOS system to other existing and popular solutions: VOMS and Shibboleth used with GridShib.

Krzysztof Benedyczak, Marcin Lewandowski, Aleksander Nowiński, Piotr Bała
Application of ADMIRE Data Mining and Integration Technologies in Environmental Scenarios

In this paper we present our work on the engine for integration of environmental data. We present a suite of selected environmental scenarios, which are integrated into a novel data mining and integration environment being developed in the ADMIRE project. The scenarios have been chosen for their suitability for data mining by environmental experts. They deal with meteorological and hydrological problems, and apply the chosen solutions to pilot areas within Slovakia. The main challenge is that the environmental data required by the scenarios are maintained and provided by different organizations and are often in different formats. We present our approach to the specification and execution of data integration tasks, which deals with the distributed nature and heterogeneity of the required data resources.

Marek Ciglan, Ondrej Habala, Viet Tran, Ladislav Hluchy, Martin Kremler, Martin Gera
Performance Based Matchmaking on Grid

Grid Technologies supply users with high computational and storage resources to execute demanding applications. To this end, Grid environments must provide query and discovery tools, able to select the most suitable resource(s) satisfying application requirements. A description of application and resources, grounded on a common and shared basis, is therefore crucial to favour an effective pairing. A viable criterion to match demand (job) with supply (computational resource) is to characterize resources by means of their performance evaluated through benchmarks relevant to the application. We introduce GREEN, a distributed matchmaker, based on a two-level benchmarking methodology. GREEN facilitates the submission of jobs to the Grid, through the specification of both syntactic and performance requirements, independently of the underlying middleware and thus fostering Grid interoperability.

Andrea Clematis, Angelo Corana, Daniele D’Agostino, Antonella Galizia, Alfonso Quarati
Replica Management for National Data Storage

National Data Storage is a distributed data storage system intended to provide high quality backup, archiving and data access services. These services guarantee high level of data protection as well as high performance of data storing and retrieval by using replication techniques. In this paper some conceptual and implementation details on creating a Prediction and Load Balancing Subsystem for replica management are presented. Preliminary real system test results are also shown.

Renata Słota, Darin Nikolow, Marcin Kuta, Mariusz Kapanowski, Kornel Skałkowski, Marek Pogoda, Jacek Kitowski
Churn Tolerant Virtual Organization File System for Grids

A Grid computing environment allows forming Virtual Organizations (VOs) to aggregate and share resources. We present a VO File System (VOFS) which is a VO-aware distributed file system that allows VO members to share files. VOFS supports access and location transparency by maintaining a common file namespace, which is decentralized in order to improve robustness. VOFS includes a P2P system of file servers, a VO membership service and a policy and role based security mechanism. VOFS can be mounted to a local file system in order to access files using POSIX file API. VOFS can operate in a dynamic Grid environment (e.g. desktop Grids) since it tolerates unplanned resource arrival and departure (churn) while maintaining a single uniform namespace. It supports transparent disconnected operations that allow the user to work on files while being disconnected.

Leif Lindbäck, Vladimir Vlassov, Shahab Mokarizadeh, Gabriele Violino

The Fourth Workshop on Large Scale Computations on Grids (LaSCoG 2009)

Quasi-random Approach in the Grid Application SALUTE

Stochastic ALgorithms for Ultra-fast Transport in sEmiconductors (SALUTE) is a Grid application which integrates a set of novel Monte Carlo, quasi-Monte Carlo and hybrid algorithms for solving various computationally intensive problems important for industry (design of modern semiconductor devices). SALUTE studies memory and quantum effects during the femtosecond relaxation process due to electron-phonon interaction in one-band semiconductors or quantum wires.

There are two main reasons for running this application on the Grid: (i) quantum problems are very computationally intensive; (ii) the inherently parallel nature of Monte Carlo applications makes efficient use of Grid resources.

In this paper we study the quasirandom approach in SALUTE, using the scrambled Halton, Sobol and Niederreiter sequences. A large number of tests have been performed on the SEEGRID grid infrastructure using a specially developed grid implementation scheme. Novel results for energy and density distribution, obtained in the inhomogeneous case with an applied electric field, are presented.

Emanouil Atanassov, Aneta Karaivanova, Todor Gurov
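For reference, the unscrambled one-dimensional building block of the Halton sequence (the van der Corput radical inverse; the scrambling used in the paper is omitted) can be sketched as:

```python
def radical_inverse(i, base):
    """i-th element (i >= 1) of the van der Corput sequence in `base`:
    reflect the base-`base` digits of i about the radix point.
    A d-dimensional Halton point uses d distinct prime bases."""
    x, f = 0.0, 1.0
    while i > 0:
        f /= base
        x += f * (i % base)   # next digit, scaled to its reflected place
        i //= base
    return x
```

In base 2 the sequence starts 1/2, 1/4, 3/4, 1/8, ..., filling the unit interval far more evenly than pseudorandom draws, which is the low-discrepancy property quasi-Monte Carlo methods exploit.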
Mobile Agents for Management of Native Applications in GRID

Mobile Agents technology can be exploited to develop mobile Grid services. Here we present a Grid service for job management, implemented by Mobile Agents. Grid users can exploit the service through a WSRF interface, remaining unaware of the agent technology. A console has been designed to interface user applications with agents. Programmers are able to extend their applications with the ability to be check-pointed, suspended, resumed, cloned, and dispatched. This can be done simply by adding some methods to their code, which specialize management on the occurrence of particular events. This means that applications do not need to be rewritten in different languages or to adopt specific programming models. We realized a prototype implementation for management of native applications.

Rocco Aversa, Beniamino Di Martino, Renato Donini, Salvatore Venticinque
Leveraging Complex Event Processing for Grid Monitoring

Currently existing monitoring services for Grid infrastructures typically collect information from local agents and store it as data sets in global repositories. However, for some scenarios querying real-time streams of monitoring information would be extremely useful. In this paper, we evaluate Complex Event Processing technologies applied to real-time Grid monitoring. We present a monitoring system which uses CEP technologies to expose monitoring information as queryable data streams. We study an example use case – monitoring for job rescheduling. We also employ CEP technologies for data reduction, measure the overhead of monitoring, and conclude that real-time Grid monitoring is possible without excessive intrusiveness for resources and network.

Bartosz Balis, Bartosz Kowalewski, Marian Bubak
Designing Execution Control in Programs with Global Application States Monitoring

The paper is concerned with a methodology for designing execution control based on monitoring of application global states in parallel and distributed programs. The principles of global state monitoring in parallel programs are recalled. Then new control statement semantics related to global states monitoring are proposed, including synchronous and asynchronous approaches to the maintenance of information used for global state-based control in programs. Proposals of syntactic solutions for designing a control language which accounts for program global states in the execution control of parallel/distributed programs are presented.

Janusz Borkowski, Marek Tudruj
Distributed MIND – A New Processing Model Based on Mobile Interactive Documents

Collaborative computing involves human actors and artificial agents interacting in a distributed system to resolve a global problem, often formed dynamically during the computation process. Owing to the open nature of the system and non-cooperative settings, its computations are in general non-algorithmic, i.e. their outcome cannot be calculated in advance by any closed distributed system. The authors advocate a new processing model, based on the exchange of documents implemented as autonomous and mobile agents, providing adaptive and self-aware content as interface units.

Magdalena Godlewska, Bogdan Wiszniewski
A Framework for Observing Dynamics of Agent-Based Computations

The paper contains a description of a framework designed for observing dynamics of mobile-agent computational applications. Such observations are thought to provide a basis for the experimental verification of an existing stochastic model of agent-oriented distributed computations. Some test results are also provided, which show that the proposed observational environment does not disturb an observed system’s dynamics.

Jarosław Kawecki, Maciej Smołka
HyCube: A DHT Routing System Based on a Hierarchical Hypercube Geometry

This paper presents a DHT routing system based on a hierarchical hypercube geometry. An approach employing a novel variable metric adopting the Steinhaus transform is presented. The use of this metric and the hierarchical hypercube geometry makes it possible to achieve very good network performance and a very high level of resilience to node failures. The architecture of the network and the routing algorithm are presented. The results of simulations are included in the paper and compared with existing solutions.

Artur Olszak
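The Steinhaus transform mentioned above turns a base metric d and a fixed anchor point p into a new metric that shrinks distances near p. A minimal sketch with the Hamming distance on node identifiers (an illustrative choice, not necessarily HyCube's exact construction):

```python
def hamming(x, y):
    """Hamming distance between two integer node identifiers."""
    return bin(x ^ y).count("1")

def steinhaus(x, y, p, d=hamming):
    """Steinhaus transform of metric d with respect to anchor p:
    d_p(x, y) = 2 d(x, y) / (d(x, p) + d(y, p) + d(x, y)).
    The result is again a metric, bounded by 2."""
    if x == y:
        return 0.0
    return 2 * d(x, y) / (d(x, p) + d(y, p) + d(x, y))
```

For example, nodes 0b0001 and 0b0010 are at Hamming distance 2, but with anchor 0b0000 their transformed distance is 2·2 / (1 + 1 + 2) = 1.0; the same pair of identifiers anchored far away would come out larger, which is the kind of anchor-relative bias a routing metric can exploit.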

Workshop on Parallel Computational Biology (PBC 2009)

Accuracy and Performance of Single versus Double Precision Arithmetics for Maximum Likelihood Phylogeny Reconstruction

The multi-core revolution and the biological data flood that is generated by novel wet-lab techniques pose new technical challenges for large-scale inference of phylogenetic trees from molecular sequence data. We present the first assessment of accuracy and performance trade-offs between single and double precision arithmetics and the first SSE3 vectorization for computing the Phylogenetic Likelihood Kernel (PLK), which forms part of many state-of-the-art tools for phylogeny reconstruction and consumes 90-95% of the overall execution time of these tools. Moreover, the PLK also dominates memory consumption, which means that deploying single precision is desirable to accommodate increasing memory requirements and to devise efficient mappings to GPUs. We find that the accuracy provided by single precision is sufficient for conducting tree searches, but that the increased amount of scaling operations to prevent numerical underflow, even when using SSE3 operations that accelerate the single precision PLK by 60%, generates run-time penalties compared to double precision on medium-sized datasets. However, on large datasets, single precision can yield significant execution time savings of 40% because of increased cache efficiency, and also reduces memory footprints by 50%.

Simon A. Berger, Alexandros Stamatakis
Automated Design of Assemblable, Modular, Synthetic Chromosomes

The goal of the Saccharomyces cerevisiae v2.0 project is the complete synthesis of a re-designed genome for baker’s yeast. The resulting organism will permit systematic studies of eukaryotic chromosome structure that have been impossible to explore with traditional gene-at-a-time experiments. The efficiency of chemical synthesis of DNA does not yet permit direct synthesis of an entire chromosome, although it is now feasible to synthesize multi-kilobase pieces of DNA that can be combined into larger molecules. Designing a chromosome-sized sequence that can be assembled from smaller pieces has to date been accomplished by biological experts in a laborious and error-prone fashion. Here we pose DNA design as an optimization problem and obtain optimal solutions with a parallelizable dynamic programming algorithm.

Sarah M. Richardson, Brian S. Olson, Jessica S. Dymond, Randal Burns, Srinivasan Chandrasegaran, Jef D. Boeke, Amarda Shehu, Joel S. Bader
GPU Parallelization of Algebraic Dynamic Programming

Algebraic Dynamic Programming (ADP) is a framework to encode a broad range of optimization problems, including common bioinformatics problems like RNA folding or pairwise sequence alignment. The ADP compiler translates such ADP programs into C. As all the ADP problems have similar data dependencies in the dynamic programming tables, a generic parallelization is possible. We updated the compiler to include a parallel backend, launching a large number of independent threads. Depending on the application, we report speedups ranging from 6.1× to 25.8× on a Nvidia GTX 280 through the CUDA libraries.

Peter Steffen, Robert Giegerich, Mathieu Giraud
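As a concrete instance of the dynamic programming problems ADP can encode, the classic Nussinov base-pair maximization for RNA folding fills a triangular table with the data dependencies the abstract alludes to (this illustrates the problem class, not the ADP compiler's generated code):

```python
def nussinov(seq, min_loop=3):
    """Maximum number of complementary base pairs in an RNA secondary
    structure (Nussinov DP). M[i][j] holds the optimum for seq[i..j];
    every cell depends on cells of shorter spans, which is the shared
    dependency pattern a generic parallelization can exploit."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):         # shorter spans first
        for i in range(n - span):
            j = i + span
            best = M[i][j - 1]                  # j left unpaired
            for k in range(i, j - min_loop):    # j paired with k, keeping
                if (seq[k], seq[j]) in pairs:   # >= min_loop bases between
                    left = M[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + M[k + 1][j - 1])
            M[i][j] = best
    return M[0][n - 1] if n else 0
```

On the hairpin "GGGAAACCC" the table yields 3 stacked G-C pairs. All cells on one anti-diagonal (equal span) are independent, which is exactly the property that lets a GPU backend launch them as independent threads.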
Parallel Extreme Ray and Pathway Computation

Pathway analysis is a powerful tool to study metabolic reaction networks under steady state conditions. An elementary pathway constitutes a minimal set of reactions that can operate at steady state such that each reaction also proceeds in the appropriate direction. In mathematical terms, elementary pathways are the extreme rays of a polyhedral cone—the solution set of homogeneous equality and inequality constraints. Enumerating all extreme rays—given the constraints—is difficult, especially if the problem is degenerate and high dimensional. We present and compare two approaches for the parallel enumeration of extreme rays, both based on the double description method. This iterative algorithm has proven efficient especially for degenerate problems, but is difficult to parallelize due to its sequential operation. The first approach parallelizes single iteration steps individually. In the second approach, we introduce a born/die matrix to store intermediary results, allowing for parallelization across several iteration steps. We test our multi-core implementations on a 16 core machine using large examples from combinatorics and biology.

Availability: The implementation is available at http://csb.ethz.ch in the tools section (Java binaries); sources are available from the authors upon request.

Marco Terzer, Jörg Stelling

Minisymposium on Applications of Parallel Computation in Industry and Engineering

Parallelized Transient Elastic Wave Propagation in Orthotropic Structures

We discuss the parallelization of a Velocity-Stress FDTD (VS-FDTD) code for the simulation of the propagation of mechanical waves in three-dimensional microstructures. The C++ code has been parallelized using MPI making extensive use of the Blitz++ library for local computation. We present numerical results for pyramid shaped domains, representing the tip of a focusing lens. We discuss the parallel implementation on a cluster consisting of Opteron 250 nodes connected by a high-speed Quadrics QsNet II network.

Peter Arbenz, Jürg Bryner, Christine Tobler
Parallel Numerical Solver for Modelling of Electromagnetic Properties of Thin Conductive Layers

In this paper we develop a parallel numerical algorithm for modelling the electromagnetic properties of thin conductive layers. The explicit finite difference scheme is obtained after approximation of the system of differential equations on a staggered grid. The parallelization of the discrete algorithm is based on the domain decomposition method. The parallel linear algebra tool ParSol is used to obtain a semi-automatic implementation of this parallel algorithm on distributed memory computers. Some results of numerical simulations are presented and the efficiency of the proposed parallel algorithm is investigated.

Raimondas Čiegis, Žilvinas Kancleris, Gediminas Šlekas
Numerical Health Check of Industrial Simulation Codes from HPC Environments to New Hardware Technologies

The numerical health check of industrial codes, performed by studying round-off error propagation, is crucial to give confidence in the computed results. This problem is exacerbated in a supercomputing environment where trillions of floating-point operations may be performed every second. A parallel program based on domain decomposition, as shown in this paper, can compute slightly different results depending on the number of processors. This numerical health check is also needed to verify whether a numerical code (or some parts of it) still has acceptable accuracy when using single precision instead of double precision, which is useful for running numerical codes on new hardware technologies like GPUs, where double precision is unavailable or expensive. The round-off error propagation is measured with the MPFI (interval arithmetic approach) and CADNA (probabilistic approach) libraries.

Christophe Denis
Application of Parallel Technologies to Modeling Lithosphere Dynamics and Seismicity

A problem of modeling some processes in the crust is considered. A brief description of different modifications of the spherical block model of lithosphere dynamics and seismicity is presented; the emphasis is on incorporation of lithospheric inhomogeneities into the model. An approach to program realization based on the use of parallel technologies is applied. Results of numerical experiments with an approximation of the global system of tectonic plates are discussed.

Boris Digas, Lidiya Melnikova, Valerii Rozenberg
AMG for Linear Systems in Engine Flow Simulations

The performance of three fundamentally different AMG solvers for systems of linear equations in CFD simulations using the SIMPLE and PISO algorithms is examined. The presented data are discussed with respect to computational aspects of the parallelisation. They indicate that, for the compressible subsonic flows considered here, basic AMG methods not requiring Krylov acceleration are faster than approaches with a more expensive setup as well as recently presented k-cycle methods, but also that these methods need special treatment for parallel application.

Maximilian Emans
Parallel Implementation of a Steady State Thermal and Hydraulic Analysis of Pipe Networks in OpenMP

The considerable computation time of sequential algorithms for simulating thermal and flow distribution in pipe networks in practical applications motivates the study of their parallel implementation. The mathematical model formulated and studied in the paper requires the solution of a set of nonlinear equations, which are solved by the Newton-Raphson method. An object-oriented solver automatically formulates the equations for networks of arbitrary topology. The hydraulic model chosen as a benchmark consists of nodal flow and loop equations. A general decomposition algorithm for the analysis of flow and temperature distribution in a pipe network is presented, and speedup results of its parallel implementation are demonstrated.
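The Newton-Raphson iteration at the core of such a solver can be sketched on a toy two-equation network. The flow balance, the quadratic head-loss terms and the resistance values below are hypothetical stand-ins, not the paper's model:

```python
import numpy as np

def newton_raphson(f, jac, x0, tol=1e-10, max_iter=50):
    """Solve f(x) = 0 by Newton-Raphson: repeatedly solve J dx = -f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            return x
        x = x + np.linalg.solve(jac(x), -fx)
    return x

# Toy analogue of nodal-flow and loop equations: q1 + q2 = 3 and a
# quadratic head-loss balance r1*q1*|q1| = r2*q2*|q2| (made-up values).
r1, r2 = 2.0, 1.0
f = lambda q: np.array([q[0] + q[1] - 3.0,
                        r1 * q[0] * abs(q[0]) - r2 * q[1] * abs(q[1])])
jac = lambda q: np.array([[1.0, 1.0],
                          [2.0 * r1 * abs(q[0]), -2.0 * r2 * abs(q[1])]])
q = newton_raphson(f, jac, [1.0, 2.0])
```

The iteration converges quadratically near the solution; for pipe networks the Jacobian is assembled from the same head-loss derivatives used above.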

Mykhaylo Fedorov
High-Performance Ocean Color Monte Carlo Simulation in the Geo-info Project

The Geo-Info project aims to provide Geoscience experts with software toolkits tailored for selected target application domains. Within this framework, high-performance computing (HPC) solutions are here applied to optimize a Monte Carlo (MC) code to support Ocean Color (OC) investigations. The present paper introduces the Geo-Info project, describes the HPC solutions applied for the OC MC case study, and gives early performance results focusing on runtime, speedup, and parallel efficiency.

Tamito Kajiyama, Davide D’Alimonte, José C. Cunha, Giuseppe Zibordi
EULAG Model for Multiscale Flows – Towards the Petascale Generation of Mesoscale Numerical Weather Prediction

EULAG is an established, highly parallel model for simulating fluid flows across a wide range of scales. It is known to scale well up to 16000 processors on IBM Blue Gene/W. It is noteworthy for its non-oscillatory integration algorithm MPDATA, advanced elliptic solver and generalized coordinate formulation. In this paper we focus on complex orographic flows and present the perspective of implementing EULAG as a high resolution weather prediction tool for Europe.

As resolution of numerical models improves, numerical weather prediction enters the phase where traditional convection parameterization becomes obsolete and is replaced with a cloud-resolving approach. The boundary conditions, especially the topography, are becoming more and more complicated, demanding higher accuracy and better conservation properties from the numerical model construction. This calls for seeking fast and precise fluid solvers.

We present preliminary results of simulations of the flow over the realistic topography of the Alps, which demonstrate the model's capability to handle steep slopes. We evaluate the performance of the code on the IBM Blue Gene/L architecture and compare different I/O strategies, from simple one-node operations to an MPI I/O solution. An example of the application of VAPOR, a tool for the visualization and analysis of terascale data sets, is provided.

Zbigniew P. Piotrowski, Marcin J. Kurowski, Bogdan Rosa, Michal Z. Ziemianski
Parallel Implementation of Particle Tracking and Collision in a Turbulent Flow

Parallel algorithms for particle tracking are central to the modeling of a wide range of physical processes, including cloud formation, spray combustion, flows of ash from wildfires and reactions in nuclear systems. Here we focus on tracking the motion of cloud droplets with radii in the range from 10 to 60 μm that are suspended in a turbulent flow field. Gravity and droplet inertia are considered simultaneously. Our codes for turbulent flow and droplet motion are fully parallelized in MPI (message passing interface), allowing efficient computation of the dynamic and kinematic properties of a polydisperse suspension with more than 10^7 droplets. Previous direct numerical simulations (DNS) of turbulent collision, due to their numerical complexity, are typically limited to small Taylor-microscale flow Reynolds numbers (~100), or equivalently to a small physical domain size at a given flow dissipation rate in a turbulent cloud. The difficulty lies in the necessity of treating simultaneously a field representation of the turbulent flow and the free movement of particles. We demonstrate here how particle tracking and collision can be handled within the framework of a specific domain decomposition. Our newly developed MPI code can be run on computers with distributed memory and as such can take full advantage of available computational resources. We discuss the scalability of the major computational tasks in our code (collision detection, advancing particle positions, fluid velocity interpolation at particle locations, and implementation of the periodic boundary conditions), using up to 128 CPUs. In most tested cases we achieved parallel efficiency above 100%, due to a reduction in effective memory usage. Finally, our MPI results for pair statistics are validated against a previous OpenMP implementation.
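The collision-detection task mentioned above is typically built on a uniform cell grid, so that only droplets in neighbouring cells are tested pairwise; an MPI version distributes these cells over the decomposed domain. A serial, non-periodic sketch with made-up coordinates:

```python
import itertools
from collections import defaultdict

def detect_collisions(positions, radius):
    """Return index pairs of equal-radius droplets whose centres are
    closer than 2*radius, using a uniform cell grid so that only
    droplets in neighbouring cells are compared."""
    cell = 2.0 * radius                  # cell edge >= collision distance
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[tuple(int(c // cell) for c in p)].append(i)
    pairs = set()
    for key, members in grid.items():
        # scan this cell and its 26 neighbours (no periodic wrap here)
        for off in itertools.product((-1, 0, 1), repeat=3):
            nb = tuple(k + o for k, o in zip(key, off))
            for i in members:
                for j in grid.get(nb, ()):
                    if i < j:
                        d2 = sum((a - b) ** 2
                                 for a, b in zip(positions[i], positions[j]))
                        if d2 < (2.0 * radius) ** 2:
                            pairs.add((i, j))
    return pairs

drops = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 1.0, 1.0)]
hits = detect_collisions(drops, radius=0.05)
```

The grid reduces the naive O(N^2) pair check to near-linear cost for uniformly distributed droplets, which is what makes 10^7-droplet runs feasible.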

Bogdan Rosa, Lian-Ping Wang
A Distributed Multilevel Ant-Colony Approach for Finite Element Mesh Decomposition

The k-way finite element mesh (FEM) decomposition problem is an NP-complete problem which consists of finding a decomposition of a FEM into k balanced submeshes such that the number of cut edges is minimized. The multilevel ant-colony algorithm (MACA) is a fairly new and promising hybrid approach for solving different types of FEM-decomposition problems. The MACA is a swarm-based algorithm and therefore inherently suitable for parallel processing on many levels. Motivated by the good performance of the MACA and the possibility of improving its performance (computational cost and/or solution quality), in this paper we discuss the results of parallelizing the MACA at the largest scale (the colony level). Explicitly, we present the distributed MACA (DMACA) approach, which is based on the idea of parallel independent runs enhanced with cooperation in the form of a solution exchange among the concurrent searches. Experimental evaluation of the DMACA on a larger set of benchmark FEM-decomposition problems shows that, compared to the MACA, it can obtain solutions of equal quality in less computational time.
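The two objectives being traded off, edge cut and submesh balance, are easy to state precisely. A minimal quality metric, shown on a hypothetical six-element mesh graph:

```python
def partition_quality(edges, part, k):
    """Evaluate a k-way mesh partition: the number of cut edges and the
    load imbalance (largest submesh size over the ideal size). These are
    the two objectives a decomposition algorithm trades off."""
    cut = sum(1 for u, v in edges if part[u] != part[v])
    sizes = [0] * k
    for p in part:
        sizes[p] += 1
    imbalance = max(sizes) * k / len(part)   # 1.0 means perfectly balanced
    return cut, imbalance

# Tiny 'mesh' graph: a 2x3 grid of elements, with elements 2 and 5
# assigned to submesh 1 (an intentionally unbalanced toy partition).
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
part = [0, 0, 1, 0, 0, 1]
cut, imb = partition_quality(edges, part, k=2)
```

A decomposition heuristic such as the MACA searches for assignments that drive `cut` down while keeping `imbalance` close to 1.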

Katerina Taškova, Peter Korošec, Jurij Šilc

Minisymposium on Interval Analysis

Toward Definition of Systematic Criteria for the Comparison of Verified Solvers for Initial Value Problems

Solving initial value problems for ordinary differential equations is a common task in many disciplines. Over the last decades, several different verified techniques have been developed to compute enclosures of the exact result numerically. The obtained bounds are guaranteed to contain the corresponding solution to the initial value problem. Ideally, we want to calculate tight enclosures over sufficiently long time intervals for systems with uncertainties in both the initial conditions and system parameters. However, the existing solvers are not always equal in attaining this goal. On the one hand, the quality of the obtained results depends strongly on the types of ordinary differential equations that describe a given dynamical system. On the other hand, a great influence of the considered uncertainties can be observed. Our general aim is to provide assistance in choosing an appropriate verified initial value problem solver with its most suitable ‘tuning parameters’ for the application at hand. In this paper, we make first steps toward setting up a framework for the fair comparison of the different approaches. We suggest criteria, benchmark scenarios, and typical applications which can be used for the quantification of the efficiency of verified initial value problem solvers.

Ekaterina Auer, Andreas Rauh
Fuzzy Solution of Interval Nonlinear Equations

In [10,12], a new concept for solving interval and fuzzy linear equations, based on a generalized procedure of interval extension called the “interval extended zero” method, has been proposed. Central to this approach is the treatment of the “interval zero” as an interval centered around 0. It is shown that this proposition is not of a heuristic nature, but is a direct consequence of the interval subtraction operation. It is shown that the resulting solution of an interval linear equation based on the proposed method may be naturally treated as a fuzzy number. In the current report, the method is extended to the case of nonlinear interval equations. It is shown that, in contrast to the known methods, the new approach makes it possible to obtain both the positive and negative solutions of a quadratic interval equation.
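The observation that an "interval zero" should be an interval centered around 0 follows directly from standard interval subtraction, as a few lines make clear (this sketches only the motivating identity, not the method itself):

```python
def isub(x, y):
    """Standard interval subtraction: [a, b] - [c, d] = [a - d, b - c]."""
    return (x[0] - y[1], x[1] - y[0])

x = (2.0, 5.0)
z = isub(x, x)                  # x - x is not the degenerate zero [0, 0]
center = (z[0] + z[1]) / 2.0    # but an interval centered exactly at 0
```

For any interval [a, b], subtracting it from itself yields [a - b, b - a]: an interval of width 2(b - a) whose midpoint is 0.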

Ludmila Dymova
Solving Systems of Interval Linear Equations with Use of Modified Interval Division Procedure

A new approach to interval division based on the concept of the “interval extended zero” method [10,11] is proposed. This modified interval division is used for solving systems of interval linear equations. Seven known examples are used as an illustration of the method’s efficacy. It is shown that the new method provides results close to the so-called maximal inner solution. The method not only allows us to decrease the excess width effect, but also makes it possible to avoid inverted interval solutions.

Ludmila Dymova, Mariusz Pilarek, Roman Wyrzykowski
Remarks on Algorithms Implemented in Some C++ Libraries for Floating-Point Conversions and Interval Arithmetic

The main aim of the paper is to give a short presentation of selected conversion functions developed by the author. They are included in two C++ libraries. The FloatingPointConversion library is dedicated to conversions in the area of floating-point numbers, and the second one, the IntervalArithmetic library, carries out a similar task for interval values as well as supporting computations in floating-point interval arithmetic with a suitable CInterval class. The functions considered are all intended to be used with the Intel architectures (i.e. the IA-32 and the IA-64) and are dedicated to C++ compilers that specify 10 bytes for the long double data type.

Malgorzata A. Jankowska
An Interval Method for Seeking the Nash Equilibria of Non-cooperative Games

Computing Nash equilibria in continuous games is a difficult problem. In contrast to discrete games, algorithms developed for continuous ones are rather inefficient. This paper proposes a new approach: making use of interval methods, we try to solve the problem directly, seeking points that fulfill the Nash conditions. We also consider a shared-memory parallelization of the proposed algorithm. Preliminary numerical results are presented, and some new practical aspects of interval methods are considered.

Bartłomiej Jacek Kubica, Adam Woźniak
From Gauging Accuracy of Quantity Estimates to Gauging Accuracy and Resolution of Measuring Physical Fields

For a numerical physical quantity v, because of measurement imprecision, the measurement result $\widetilde v$ is, in general, different from the actual value v of this quantity. Depending on what we know about the measurement uncertainty $\Delta v\stackrel{\rm def}{=}\widetilde v-v$, we can use different techniques for dealing with this imprecision: probabilistic, interval, etc.

When we measure the values v(x) of physical fields at different locations x (and/or different moments of time), then, in addition to the same measurement uncertainty, we also encounter another type of localization uncertainty: the measured value may come not only from the desired location x, but also from nearby locations.

In this paper, we discuss how to handle this additional uncertainty.

Vladik Kreinovich, Irina Perfilieva
A New Method for Normalization of Interval Weights

A new method for normalization of interval weights, based on the so-called “interval extended zero” method, is proposed. Three desirable, intuitively obvious properties of a normalization procedure are defined. The main one is based on the assumption that the sum of normalized interval weights should be an interval centered around 1 with minimal width. The advantages of the new method are illustrated using six numerical examples. It is shown that the new method performs better than known methods for the normalization of interval weights, as it provides results with properties close to the desirable ones.
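The difficulty the paper addresses can be seen with naive normalization, dividing each interval weight by the interval sum using standard interval arithmetic: the normalized weights then sum to an interval that contains 1 but is wide and off-center (the weights below are hypothetical):

```python
def iadd(x, y):
    """Standard interval addition."""
    return (x[0] + y[0], x[1] + y[1])

def idiv(x, y):
    """Standard interval division, assuming y is strictly positive."""
    return (x[0] / y[1], x[1] / y[0])

weights = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
total = (sum(w[0] for w in weights), sum(w[1] for w in weights))  # (6, 9)
normalized = [idiv(w, total) for w in weights]

s = (0.0, 0.0)
for w in normalized:
    s = iadd(s, w)
# s contains 1 but has width 5/6 and midpoint 13/12, not 1
```

This excess width and off-center sum are exactly what a dedicated normalization method tries to avoid.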

Pavel Sevastjanov, Pavel Bartosiewicz, Kamil Tkacz
A Global Optimization Method for Solving Parametric Linear Systems Whose Input Data Are Rational Functions of Interval Parameters

An interval global optimization method combined with the Direct Method for solving parametric linear systems is used for computing a tight enclosure for the solution set of a parametric linear system whose input data are non-linear functions of interval parameters. Revised affine arithmetic is used to handle the nonlinear dependencies. The Direct Method performs a monotonicity test to speed up the convergence of the global optimization. It is shown that the monotonicity test significantly improves the convergence of the global optimization method. Some illustrative examples are solved by the discussed method, and the results are compared to literature data produced by other methods.

Iwona Skalna
Direct Method for Solving Parametric Interval Linear Systems with Non-affine Dependencies

Many real-life problems can be modelled by systems of linear equations or safely transformed to the linear case. When uncertain model parameters are introduced as intervals, a parametric interval linear system must be solved properly to cover all possible scenarios and yield useful results. In the general case, the system coefficients are nonlinear functions of the parameters. A Direct Method for solving such systems is proposed. Affine arithmetic is used to handle the nonlinear dependencies. Some illustrative examples are solved and the results are compared to literature data produced by other methods.

Iwona Skalna

Workshop on Complex Collective Systems

Evaluating Lava Flow Hazard at Mount Etna (Italy) by a Cellular Automata Based Methodology

The identification of areas that are more likely to be affected by lava eruptions is of fundamental relevance for mitigating possible consequences, both in terms of loss of human lives and of material properties. Here we show a methodology for defining flexible, highly detailed lava invasion susceptibility maps. It relies on an adequate knowledge of the volcano, assessed by an accurate analysis of its past behaviour, on a reliable Cellular Automata model for simulating lava flows on present topographic data, and on High Performance Parallel Computing for increasing computational efficiency. The application of the methodology to the case of Mt Etna, the most active volcano in Europe, points out its usefulness in land use planning and Civil Defence applications.

Maria Vittoria Avolio, Donato D’Ambrosio, Valeria Lupiano, Rocco Rongo, William Spataro
Application of CoSMoS Parallel Design Patterns to a Pedestrian Simulation

In this paper, we discuss the implementation of a simple pedestrian simulation that uses a multi-agent-based design pattern developed by the CoSMoS research group. Given the nature of Multi Agent Systems (MAS), parallel processing techniques are inevitably used in their implementation. Most of these approaches rely on conventional parallel programming techniques, such as threads, the Message Passing Interface (MPI) and Remote Method Invocation (RMI). The CoSMoS design patterns are founded on the use of Communicating Sequential Processes (CSP), a parallel computing paradigm that emphasises a process-oriented rather than object-oriented programming perspective.

Sarah Clayton, Neil Urquhard, Jon Kerridge
Artificial Intelligence of Virtual People in CA FF Pedestrian Dynamics Model

This paper deals with a mathematical model of pedestrian flows. We focus here on the “intelligence” of virtual people. From a macroscopic viewpoint pedestrian dynamics is already well simulated, but from a microscopic point of view typical features of people's movement still need to be implemented in models. At a minimum, such features include “keeping in mind” two strategies, the shortest path and the shortest time, and keeping a certain distance from other people and obstacles when possible. In this paper we implement a mathematical formalization of these features in a stochastic cellular automata (CA) Floor Field (FF) model.
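The shortest-path strategy in FF models is usually encoded as a static floor field: each free cell stores its distance to the nearest exit, and pedestrians move toward decreasing values. A minimal sketch via breadth-first search (the paper's model additionally has a dynamic field and distance-keeping terms, which are omitted here):

```python
from collections import deque

def static_floor_field(grid, exits):
    """Compute a static floor field: for each free cell the shortest
    4-neighbour distance to an exit, via breadth-first search.
    Obstacle cells (value 1) remain unreachable (None)."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    q = deque()
    for r, c in exits:
        dist[r][c] = 0
        q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist

# 0 = free cell, 1 = obstacle; a single exit in the top-left corner
room = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
field = static_floor_field(room, exits=[(0, 0)])
```

A pedestrian in the far corner reads off a distance of 4, since the obstacle in the middle forces a detour.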

Ekaterina Kirik, Tat’yana Yurgel’yan, Dmitriy Krouglov
Towards the Calibration of Pedestrian Stream Models

Every year people die at mass events when the crowd gets out of control. Urbanization and the increasing popularity of mass events, from soccer games to religious celebrations, reinforce this trend. Thus, there is a strong need to gain better control over crowd behavior. Simulation of pedestrian streams can help to achieve this goal. In order to be useful, crowd simulations must correctly reproduce real crowd behavior, which usually depends on the actual situation and a number of socio-cultural parameters. In other words, whatever model we come up with, it must be calibrated. Fundamental diagrams capture a large number of the socio-cultural characteristics in a very simple concept. In this paper we present a method to calibrate a pedestrian stream simulation tool so that it can reproduce an arbitrary fundamental diagram with high accuracy; that is, it correctly reproduces a given dependency of pedestrian speed on crowd density. We demonstrate the functionality of the method with a cellular automaton model.
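A widely used empirical fundamental diagram is the Kladek/Weidmann relation; calibrating a simulation tool means tuning model parameters until its measured speed-density curve matches such a target. The parameter values below are the commonly quoted Weidmann fits and should be treated as assumptions, not as the paper's calibration target:

```python
import math

def weidmann_speed(density, v0=1.34, gamma=1.913, rho_max=5.4):
    """Kladek/Weidmann fundamental diagram: walking speed (m/s) as a
    function of crowd density (persons/m^2). v0 is the free-flow speed,
    rho_max the jam density at which movement stops."""
    if density <= 0:
        return v0
    if density >= rho_max:
        return 0.0
    return v0 * (1.0 - math.exp(-gamma * (1.0 / density - 1.0 / rho_max)))

speeds = [weidmann_speed(rho) for rho in (0.5, 1.0, 2.0, 4.0)]
```

The curve decreases monotonically from the free-flow speed at low density to zero at the jam density, which is the qualitative shape any calibrated model must reproduce.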

Wolfram Klein, Gerta Köster, Andreas Meister
Two Concurrent Algorithms of Discrete Potential Field Construction

The increasing demand for computational power in contemporary applications has created the need to build faster CPUs and to construct more efficient algorithms. In this context, concurrent algorithms seem especially promising: depending on the number of available CPUs, they may offer significant reductions in computation time.

In this article two concurrent algorithms for potential field generation are proposed. They represent two different approaches to problem domain partitioning, called by the authors tearing and nibbling, respectively. It is shown that, depending on the problem topology, either the Tear algorithm or the Nibble algorithm is faster. Conclusions are summed up in the form of experimental results showing how the algorithms work in practice. Although the algorithms construct a discrete potential field according to a specific scheme, there should be no major problems in generalizing them to other potential field schemes.

Konrad Kułakowski, Jarosław Wąs
Frustration and Collectivity in Spatial Networks

In random networks decorated with Ising spins, an increase of the density of frustrations reduces the transition temperature of the spin-glass ordering. This result is in contradiction to the Bethe theory. Here we investigate if this effect depends on the small-world property of the network. The results on the specific heat and the spin susceptibility indicate that the effect appears also in spatial networks.

Anna Mańka-Krasoń, Krzysztof Kułakowski
Weakness Analysis of a Key Stream Generator Based on Cellular Automata

This paper exposes weaknesses of a secret-key cipher based on pseudo-random number generation. The pseudo-random number generator was previously described as high quality and passing various statistical tests (entropy, Marsaglia tests). It is operated by one-dimensional, two-state, non-uniform cellular automata with rules of radius one. Special rule assignments generate number sequences with zero entropy. The paper proposes a systematic construction that leads to such assignments, as well as the computation of the size of the weak key space. Finally, we envision solutions to this problem, and discuss the possibility to discover additional issues.

Frédéric Pinel, Pascal Bouvry
Fuzzy Cellular Model for On-Line Traffic Simulation

This paper introduces a fuzzy cellular model of road traffic intended for on-line applications in traffic control. The presented model uses fuzzy sets theory to deal with the uncertainty of both input data and simulation results. Vehicles are modelled individually, thus various classes of vehicles can be taken into consideration. In the proposed approach, all parameters of vehicles are described by means of fuzzy numbers. The model was implemented in a simulation of a vehicle-queue discharge process. Changes of the queue length were analysed in this experiment and compared to the results of the NaSch cellular automata model.

Bartłomiej Płaczek
Modeling Stop-and-Go Waves in Pedestrian Dynamics

Several spatially continuous pedestrian dynamics models have been validated against empirical data. We try to reproduce the experimental fundamental diagram (velocity versus density) with simulations. In addition to this quantitative criterion, we try to reproduce stop-and-go waves as a qualitative criterion, since stop-and-go waves are a characteristic phenomenon of single-file movement. Only one of the three investigated models satisfies both criteria.

Andrea Portz, Armin Seyfried
FPGA Realization of a Cellular Automata Based Epidemic Processor

More and more attention is paid to epidemics of infectious diseases that spread through populations across large regions and, under certain conditions, may result in increased mortality in the infected population. In this paper, an FPGA processor based on Cellular Automata (CA) that is able to produce successive epidemic fronts, as expected by theory, is presented. The proposed CA SIR (Susceptible, Infectious and Recovered) model successfully includes the effect of the movement of individuals on the epidemic propagation, as well as the effect of population vaccination, which reduces the epidemic spreading. The FPGA design results from the automatically produced synthesizable VHDL code of the CA model and is advantageous in terms of low cost, high speed and portability. Finally, the resulting hardware implementation of the CA model is considered a basic component of a wearable electronic system able to provide real-time information concerning the epidemic propagation in the part of the examined population under test.
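The core of a CA SIR model fits in a few lines. The update below is a minimal software sketch (synchronous updates, 4-neighbourhood, no movement or vaccination, arbitrary probabilities), not the paper's VHDL design; with deterministic infection it produces the successive diamond-shaped epidemic fronts expected by theory:

```python
import random

S, I, R = 0, 1, 2

def step(grid, p_infect, p_recover, rng=random.random):
    """One synchronous CA update: a susceptible cell is infected by an
    infectious 4-neighbour with probability p_infect; an infectious
    cell recovers with probability p_recover."""
    n = len(grid)
    new = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            if grid[r][c] == S:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n and 0 <= nc < n
                            and grid[nr][nc] == I and rng() < p_infect):
                        new[r][c] = I
                        break
            elif grid[r][c] == I and rng() < p_recover:
                new[r][c] = R
    return new

grid = [[S] * 11 for _ in range(11)]
grid[5][5] = I                       # single initial infectious cell
for _ in range(20):                  # deterministic spread: one front per step
    grid = step(grid, p_infect=1.0, p_recover=0.0)
infected = sum(cell == I for row in grid for cell in row)
```

With certain infection and no recovery, the front advances one cell per step in each lattice direction, so after 20 steps the whole 11 x 11 grid is infected.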

Pavlos Progias, Emmanouela Vardaki, Georgios Ch. Sirakoulis
Empirical Results for Pedestrian Dynamics at Bottlenecks

In recent years, several approaches to modeling pedestrian dynamics have been developed. Usually the focus is on the qualitative reproduction of empirically observed collective phenomena, like the dynamical formation of lanes. Although this gives an indication of the realism of a model, practical applications such as safety analysis require reliable quantitative predictions, which in turn call for reliable empirical data. Here we discuss the current status of one of the basic scenarios, the dynamics at bottlenecks, where there is currently no consensus even about the qualitative features of the flows.

Armin Seyfried, Andreas Schadschneider
Properties of Safe Cellular Automata-Based S-Boxes

In the paper we use a recently proposed cellular automata (CA) based methodology [9] to design 8 × n (n ≤ 8) S-boxes functionally equivalent to S-boxes used in current cryptographic standards. We provide an exhaustive experimental analysis of the proposed CA-based S-boxes in terms of non-linearity, autocorrelation and scalability, and compare the results with other proposals. We show that the proposed CA-based S-boxes have cryptographic properties comparable to or better than currently offered classical S-box tables.
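Non-linearity, one of the properties analysed, can be computed exactly for small S-boxes via the Walsh transform of each nonzero linear combination of output bits. A brute-force sketch, shown here on 3-bit toy examples rather than the 8 × n sizes of the paper:

```python
def nonlinearity(sbox, n):
    """Nonlinearity of an n-bit-to-n-bit S-box: the minimum Hamming
    distance between any nonzero linear combination of its output bits
    and the set of affine Boolean functions, via the Walsh transform."""
    size = 1 << n
    nl = size
    for b in range(1, size):                       # nonzero output mask b
        # truth table of the component function x -> parity(b & S(x))
        f = [bin(b & sbox[x]).count("1") & 1 for x in range(size)]
        w_max = max(abs(sum((-1) ** (f[x] ^ (bin(a & x).count("1") & 1))
                            for x in range(size)))
                    for a in range(size))          # Walsh spectrum maximum
        nl = min(nl, (size >> 1) - w_max // 2)
    return nl

# A linear S-box (the identity) has nonlinearity 0 -- the weakest case;
# the second permutation is an arbitrary 3-bit example for comparison.
nl_identity = nonlinearity(list(range(8)), 3)
nl_sample = nonlinearity([7, 6, 0, 4, 2, 5, 1, 3], 3)
```

Higher values mean the S-box is further from all affine functions, which is what the comparison against standard S-box tables measures.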

Miroslaw Szaban, Franciszek Seredynski
Backmatter
Metadaten
Titel
Parallel Processing and Applied Mathematics
herausgegeben von
Roman Wyrzykowski
Jack Dongarra
Konrad Karczewski
Jerzy Wasniewski
Copyright-Jahr
2010
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-14403-5
Print ISBN
978-3-642-14402-8
DOI
https://doi.org/10.1007/978-3-642-14403-5