
2014 | Book

Parallel Processing and Applied Mathematics

10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part II

Edited by: Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This two-volume set (LNCS 8384 and 8385) constitutes the refereed proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics, PPAM 2013, held in Warsaw, Poland, in September 2013. The 143 revised full papers presented in both volumes were carefully reviewed and selected from numerous submissions. The papers cover important fields of parallel/distributed/cloud computing and applied mathematics, such as numerical algorithms and parallel scientific computing; parallel non-numerical algorithms; tools and environments for parallel/distributed/cloud computing; applications of parallel computing; and applied mathematics, evolutionary computing and metaheuristics.

Table of Contents

Frontmatter

Workshop on Scheduling for Parallel Computing (SPC 2013)

Frontmatter
Scheduling Bag-of-Tasks Applications to Optimize Computation Time and Cost

Bag-of-tasks applications consist of independent tasks that can be performed in parallel. Although such problems are well known in classical scheduling theory, the distinctive feature of Grid and cloud applications is the importance of the cost factor: in addition to the traditional scheduling criterion of minimizing computation time, in Grids and clouds it is also important to minimize the cost of using resources. We study the structural properties of the time/cost model and explore how the existing scheduling techniques can be extended to handle the additional cost criterion. Due to the dynamic nature of distributed systems, one of the major requirements for scheduling algorithms is their speed. The heuristics we propose are fast and, as we show in our experiments, they compare favourably with the existing scheduling algorithms for distributed systems.
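To make the time/cost trade-off concrete, here is a minimal sketch (an editorial illustration, not the authors' heuristic) that greedily assigns each independent task to the machine minimizing a weighted combination of completion time and monetary cost; the `alpha` parameter and the LPT-style ordering are assumptions:

```python
def schedule(tasks, machines, alpha=0.5):
    """tasks: list of workloads; machines: list of (speed, cost_per_time_unit).
    alpha weights time (alpha) against monetary cost (1 - alpha)."""
    loads = [0.0] * len(machines)          # current finish time per machine
    total_cost = 0.0
    assignment = []
    for w in sorted(tasks, reverse=True):  # largest tasks first (LPT-style)
        best, best_score = None, float("inf")
        for i, (speed, cost_rate) in enumerate(machines):
            run = w / speed
            score = alpha * (loads[i] + run) + (1 - alpha) * (run * cost_rate)
            if score < best_score:
                best, best_score = i, score
        speed, cost_rate = machines[best]
        loads[best] += w / speed
        total_cost += (w / speed) * cost_rate
        assignment.append(best)
    return assignment, max(loads), total_cost
```

Varying `alpha` sweeps between a pure-makespan and a pure-cost objective, which is the kind of bi-criteria trade-off the paper studies.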

Anastasia Grekioti, Natalia V. Shakhlevich
Scheduling Moldable Tasks with Precedence Constraints and Arbitrary Speedup Functions on Multiprocessors

Due to the increasing number of cores of current parallel machines, the question arises to which cores parallel tasks should be mapped. Thus, parallel task scheduling is now more relevant than ever, especially under the moldable task model, in which tasks are allocated a fixed number of processors before execution. Scheduling algorithms commonly assume that the speedup function of moldable tasks is either non-decreasing, sub-linear or concave. In practice, however, the resulting speedup of parallel programs on current hardware with deep memory hierarchies is most often neither non-decreasing nor concave.

We present a new algorithm for the problem of scheduling moldable tasks with precedence constraints for the makespan objective and for arbitrary speedup functions. We show through simulation that the algorithm not only creates competitive schedules for moldable tasks with arbitrary speedup functions, but also outperforms other published heuristics and approximation algorithms for non-decreasing speedup functions.

Sascha Hunold
OStrich: Fair Scheduling for Multiple Submissions

Campaign Scheduling is characterized by multiple job submissions issued from multiple users over time. This model suits today's systems well, since most available parallel environments have multiple users sharing a common infrastructure. When scheduling the jobs submitted by various users individually, one crucial issue is to ensure fairness. This work presents a new fair scheduling algorithm called OStrich, whose principle is to maintain a virtual time-sharing schedule in which the same share of processors is assigned to each user. The completion times in the virtual schedule determine the execution order on the physical processors. The campaigns are thus interleaved in a fair way by OStrich. For independent sequential jobs, we show that OStrich guarantees the stretch of a campaign to be proportional to the campaign's size and the total number of users. The stretch measures by what factor a workload is slowed down relative to the time it takes on an unloaded system. The theoretical performance of our solution is assessed by simulations comparing OStrich with the classical FCFS algorithm on synthetic workload traces generated from two different user profiles, demonstrating how OStrich benefits both types of users, in contrast to FCFS.
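The stretch metric described above reduces to a one-line formula; the following sketch (with hypothetical timing values) illustrates it:

```python
def stretch(completion_time, release_time, ideal_time):
    """Stretch of a campaign: flow time (completion - release) divided by
    the time the same campaign would take on an unloaded system."""
    return (completion_time - release_time) / ideal_time

# e.g. a campaign released at t=0 that would need 10 s alone
# but finishes at t=30 has stretch 3.0
```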

Joseph Emeras, Vinicius Pinheiro, Krzysztof Rzadca, Denis Trystram
Fair Share Is Not Enough: Measuring Fairness in Scheduling with Cooperative Game Theory

We consider the problem of fair scheduling in a multi-organizational system in which organizations contribute their own resources to the global pool, along with the jobs to be processed on the common resources. We consider on-line, non-clairvoyant scheduling of sequential jobs without preemption. To ensure that the organizations are willing to cooperate, the scheduling algorithm must be fair.

To characterize fairness, we use a cooperative game theory approach. The contribution of an organization is computed based on how this organization influences the utility (which can be any metric, e.g., flow time, turnaround, resource allocation) of all organizations. Formally, the contribution of the organization is its Shapley value in the cooperative game. The scheduling algorithm should ensure that the contributions of the organizations are close to their utilities. Our previous work proves that this problem is NP-hard and hard to approximate.

In this paper we propose a heuristic scheduling algorithm for the fair scheduling problem. We experimentally evaluate the heuristic and compare its fairness to fair share, round robin, and the exact exponential algorithm. Our results show that the fairness of the heuristic algorithm is close to optimal. The difference between our heuristic and the fair share algorithm is more visible on longer traces with more organizations. These results show that assigning static target shares (as in the fair share algorithm) is not fair in multi-organizational systems, and that dynamic measures of organizations' contributions should be used instead.
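The Shapley value at the heart of this fairness notion can be illustrated with a tiny exact computation (an editorial sketch; tractable only for toy instances, consistent with the hardness result above):

```python
from itertools import permutations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values, averaging each player's marginal contribution
    over all orderings. utility maps a frozenset of players to a number.
    Exponential cost, so usable only for toy games."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = utility(frozenset(coalition))
            coalition.add(p)
            phi[p] += utility(frozenset(coalition)) - before
    n_fact = factorial(len(players))
    return {p: v / n_fact for p, v in phi.items()}
```

For an additive game (utility of a coalition = sum of individual values), each organization's Shapley value equals its individual value, as expected.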

Piotr Skowron, Krzysztof Rzadca
Setting up Clusters of Computing Units to Process Several Data Streams Efficiently

Let us consider an upper-bounded number of data streams to be processed by a Divisible Load application. The total workload is unknown, and the available communication and computation speeds may be poorly estimated a priori. This paper presents a resource selection method that aims at maximizing the throughput of this processing. From a set of processing units linked by a network, the method forms an optimal set of master-workers clusters. Results of simulations are presented to assess the efficiency of this method experimentally. Before focusing on the proposed resource selection method, the paper revisits the adaptive scheduling method on which it relies.

Daniel Millot, Christian Parrot

The 5th Workshop on Language-Based Parallel Programming Models (WLPP 2013)

Frontmatter
Towards Standardization of Measuring the Usability of Parallel Languages

The efforts of the research community and the software industry to make the art of parallel programming easier continue. Measuring the usability of contemporary parallel programming languages and libraries through empirical studies is the key to understanding how programmers think, design, code, and debug parallel programs. In this paper we break down the empirical experiments conducted in recent years into their component ingredients. By analyzing each component separately we can better understand what is missing in these experiments and thereby improve the outcome of future studies. The result of this work is a set of recommendations that aims to make usability studies more convincing so that parallel language designers will take them seriously.

Ami Marowka
Experiences with Implementing Task Pools in Chapel and X10

The Partitioned Global Address Space (PGAS) model is a promising approach to combine programmability and performance in an architecture-independent way. Well-known representatives of PGAS languages include Chapel and X10. Both languages incorporate object orientation, but fundamentally differ in their way of accessing remote memory as well as in synchronization constructs and other issues of language design.

This paper reports on and compares experiences in using the languages. We concentrate on the interplay between object orientation and parallelism/distribution, and other issues of coding task parallelism. In particular, we discuss the realization of patterns such as objects that internally contain distributed arrays, and suggest improvements such as support for activity-local and place-local data, as well as scalar variable-based reduction. Our study is based on Unbalanced Tree Search (UTS), a well-known benchmark that uses task pools.

Claudia Fohry, Jens Breitbart
Parampl: A Simple Approach for Parallel Execution of AMPL Programs

Due to the physical constraint on processor frequency scaling, current computer systems are equipped with more and more processing units. Parallel computing has therefore become an important paradigm in recent years. AMPL is a comprehensive algebraic modeling language for formulating optimization problems, but AMPL itself does not support defining tasks to be executed in parallel. Although in recent years parallelism has often been provided by solvers, which take advantage of multiple processing units, in many cases it is more efficient to formulate the problem in a decomposed way and apply various problem-specific enhancements. Moreover, as the number of cores keeps growing, it is possible to use both types of parallelism.

This paper presents the design of Parampl - a simple tool for parallel execution of AMPL programs. Parampl introduces explicit asynchronous execution of AMPL subproblems from within the program code. Such an extension implies a new view on AMPL programs, in which a programmer is able to define complex, parallelized optimization tasks and formulate algorithms solving optimization subproblems in parallel.

Artur Olszak, Andrzej Karbowski
Prototyping Framework for Parallel Numerical Computations

Our research focuses on simplifying parallel programming for distributed memory systems. Our goal is to build a unifying framework for creating, debugging, profiling, and verifying parallel applications. The result of this effort is the open-source tool Kaira. In this paper, we focus on prototyping of parallel applications. We have extended Kaira with the ability to generate parallel libraries. More precisely, we present a framework for fast prototyping of parallel numerical computations. We demonstrate our idea on a combination of parallel libraries generated by our tool Kaira and GNU Octave. Hence, a user can verify an idea in a short time, create a real running program, and check its performance and scalability.

Ondřej Meca, Stanislav Böhm, Marek Běhálek, Martin Šurkovský
Algorithms for In-Place Matrix Transposition

This paper presents an implementation of an in-place swap-based algorithm for transposing rectangular matrices, and a proof of correctness is also sketched. The implementation is based on an algorithm described by Tretyakov and Tyrtyshnikov [4], but we have introduced a number of variations. In particular, we show how the original algorithm can be modified to require constant additional memory. We also identify opportunities for exploiting parallelism.

Fred G. Gustavson, David W. Walker
FooPar: A Functional Object Oriented Parallel Framework in Scala

We present FooPar, an extension for highly efficient parallel computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar is designed to be modular and supports easy access to different communication backends for distributed memory architectures as well as to high performance math libraries. In this article we use it to parallelize matrix-matrix multiplication and show its scalability by an isoefficiency analysis. In addition, results based on an empirical analysis on two supercomputers are given. We achieve close-to-optimal performance with respect to theoretical peak performance. Based on this result we conclude that FooPar allows programmers to fully access Scala's design features without suffering from performance drops compared to implementations purely based on C and MPI.

Felix Palludan Hargreaves, Daniel Merkle
Effects of Segmented Finite Difference Time Domain on GPU

Finite Difference Time Domain (FDTD) is the most popular method in computational electromagnetics. In acoustics, FDTD is often used as a numerical analysis technique to model mechanical waves. FDTD is in general computationally expensive, due to the large number of time steps required for accurate results, but it is data parallel in nature. However, it is also memory bound. Although previous work has studied the effect of parallelizing FDTD on accelerators to reduce computational cost, the memory-bound problem has not been studied. In this work we consider the segmented FDTD (SFDTD) algorithm, which divides the problem space into segments to reduce both computational redundancy and memory usage. We exploit the memory hierarchy of the GPU to efficiently implement the SFDTD algorithm. To the best of our knowledge, this is the first work that studies the implementation of the SFDTD algorithm on GPU and its effect on memory.

Jose Juan Mijares Chan, Gagan Battoo, Parimala Thulasiraman, Ruppa K. Thulasiram
Optimization of an OpenCL-Based Multi-swarm PSO Algorithm on an APU

The multi-swarm particle swarm optimization (MPSO) algorithm incorporates multiple independent PSO swarms that cooperate by periodically exchanging information. In spite of its embarrassingly parallel nature, MPSO is memory bound, limiting its performance on data-parallel GPUs. Recently, heterogeneous multi-core architectures such as the AMD Accelerated Processing Unit (APU) have fused the CPU and GPU together on a single die, eliminating the traditional PCIe bottleneck between them. In this paper, we provide our experiences developing an OpenCL-based MPSO algorithm for the task scheduling problem on the APU architecture. We use the AMD A8-3530MX APU that packs four x86 computing cores and 80 four-way processing elements. We make effective use of hardware features such as the hierarchical memory structure on the APU, the four-way very long instruction word (VLIW) feature for vectorization, and global-to-local memory DMA transfers. We observe a 29 % decrease in overall execution time over our baseline implementation.

Wayne Franz, Parimala Thulasiraman, Ruppa K. Thulasiram
Core Allocation Policies on Multicore Platforms to Accelerate Forest Fire Spread Predictions

Software simulators are developed to predict forest fire spread. Such simulators require several input parameters which are usually difficult to know accurately. The input data uncertainty can provoke a mismatch between the predicted forest fire spread and the actual evolution. To overcome this uncertainty, a two-stage prediction methodology is used. In the first stage, a genetic algorithm is applied to find the input parameter set that best reproduces the actual fire evolution. Afterwards, the prediction is carried out using the calibrated input parameter set. This method reduces the prediction error, but increases the execution time in a context with hard time constraints. A new approach to speed up the two-stage prediction methodology by exploiting multicore architectures is proposed. A hybrid MPI-OpenMP application has been developed, and different allocation policies have been tested to accelerate the forest fire prediction with an efficient use of the available resources.

Tomàs Artés, Andrés Cencerrado, Ana Cortés, Tomàs Margalef

The 4th Workshop on Performance Evaluation of Parallel Applications on Large-Scale Systems

Frontmatter
The Effect of Parallelization on a Tetrahedral Mesh Optimization Method

A parallel algorithm for simultaneous untangling and smoothing of tetrahedral meshes is proposed in this paper. This algorithm is derived from a sequential mesh optimization method. We provide a detailed analysis of its parallel scalability and efficiency, load balancing, and parallelism bottlenecks using six benchmark meshes. In addition, the influence of three previously-published graph coloring techniques on the performance of our parallel algorithm is evaluated. We demonstrate that the proposed algorithm is highly scalable when run on a shared-memory computer with up to 128 Itanium 2 processors. However, some parallel deterioration is observed. Here, we analyze its main causes using a theoretical performance model and experimental results.

Domingo Benitez, Eduardo Rodríguez, José M. Escobar, Rafael Montenegro
Analysis of Partitioning Models and Metrics in Parallel Sparse Matrix-Vector Multiplication

Graph/hypergraph partitioning models and methods have been successfully used to minimize the communication among processors in several parallel computing applications. Parallel sparse matrix-vector multiplication (SpMxV) is one of the representative applications that renders these models and methods indispensable in many scientific computing contexts. We investigate the interplay of the partitioning metrics and execution times of SpMxV implementations in three libraries: Trilinos, PETSc, and an in-house one. We carry out experiments with up to 512 processors and investigate the results with regression analysis. Our experiments show that the partitioning metrics influence the performance greatly in a distributed memory setting. The regression analyses demonstrate which metric is the most influential for the execution time of the libraries.

Kamer Kaya, Bora Uçar, Ümit V. Çatalyürek
Achieving Memory Scalability in the Gysela Code to Fit Exascale Constraints

Gyrokinetic simulations lead to huge computational needs. Up to now, the semi-Lagrangian code Gysela has performed large simulations using up to 65k cores. To understand the nature of plasma turbulence more accurately, however, finer resolutions are desired, which makes Gysela a good candidate to exploit the computational power of future Exascale machines. Among the Exascale challenges, the reduced amount of memory per core is one of the most critical. This paper deals with memory management in order to reduce the memory peak, and presents an approach to understanding the memory behaviour of an application when dealing with very large meshes. This enables us to extrapolate the behaviour of Gysela to the expected capabilities of Exascale machines.

Fabien Rozar, Guillaume Latu, Jean Roman
Probabilistic Analysis of Barrier Eliminating Method Applied to Load-Imbalanced Parallel Application

In order to reduce the overhead of barrier synchronization, we previously proposed an algorithm which eliminates barrier synchronizations and evaluated its validity experimentally. We found that the algorithm is more effective for load-imbalanced programs than for load-balanced ones. However, the degree of load balance had not been discussed quantitatively. In this paper, we model the behavior of parallel programs. In our model, the execution time of a phase contained in a parallel program is represented as a random variable. To investigate how the degree of load balance influences the performance of our algorithm, we varied the coefficient of variation of the probability distribution which the random variable follows. Using the model, we evaluated the execution time of parallel programs and found that the theoretical results are consistent with the experimental ones.

Naoki Yonezawa, Ken’ichi Katou, Issei Kino, Koichi Wada
Multi-GPU Parallel Memetic Algorithm for Capacitated Vehicle Routing Problem

The goal of this paper is to propose and test a new memetic algorithm for the capacitated vehicle routing problem in parallel computing environment. In this paper we consider a simple variation of the vehicle routing problem in which the only parameter is the capacity of the vehicle and each client only needs one package. We analyze the efficiency of the algorithm using the hierarchical Parallel Random Access Machine (PRAM) model and run experiments with code written in CUDA.

Mieczysław Wodecki, Wojciech Bożejko, Michał Karpiński, Maciej Pacut
Parallel Applications Performance Evaluation Using the Concept of Granularity

With the advent of multi-core processors and the growing popularity of local cluster installations, a better understanding of parallel application behaviour becomes a necessity. Moreover, performance evaluation constitutes an intrinsic part of every application development process. The performance analysis can be carried out analytically or through experiments. The experimental approach bases its results on wall-time measurements and requires consecutive application executions, which is time-consuming. In this paper an alternative approach is proposed: utilizing the decomposition of execution time, separate analyses of the computation time and of the overheads related to parallel execution are used to calculate the granularity of the application and then to determine its efficiency. The usefulness of the new technique has been evaluated by comparing its results with those of classical ones. The obtained results suggest that the presented method can be used for performance evaluation of parallel applications.
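The decomposition described above boils down to two small formulas; the following sketch (an editorial illustration with hypothetical timings) shows how granularity links computation time, overhead, and efficiency:

```python
def granularity(t_comp, t_overhead):
    """Granularity: ratio of computation time to the overheads of
    parallel execution (communication, synchronization)."""
    return t_comp / t_overhead

def efficiency_from_granularity(g):
    """With the execution time decomposed as T = t_comp + t_overhead,
    the efficiency t_comp / T rewrites as G / (G + 1)."""
    return g / (g + 1.0)

# e.g. 9 s of computation and 1 s of overhead per process:
# G = 9, efficiency = 0.9
```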

Jan Kwiatkowski

Workshop on Parallel Computational Biology (PBC 2013)

Frontmatter
Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures

Running BWA in multithreaded mode on a multi-socket server results in poor scaling behaviour. This is because the current parallelisation strategy does not take into account the load imbalance that is inherent to the properties of the data being aligned, e.g. varying read lengths and numbers of mutations. Additional load imbalance is also caused by the BWA code not anticipating certain hardware characteristics of multi-socket multicores, such as the non-uniform memory access time of the different cores. We show that rewriting the parallel section using Cilk removes the load imbalance, resulting in a factor two performance improvement over the original BWA.

Charlotte Herzeel, Thomas J. Ashby, Pascal Costanza, Wolfgang De Meuter
K-mulus: Strategies for BLAST in the Cloud

With the increased availability of next-generation sequencing technologies, researchers are gathering more data than they are able to process and analyze. One of the most widely performed analyses is identifying regions of similarity between DNA or protein sequences using the Basic Local Alignment Search Tool, or BLAST. Due to the large amount of sequencing data produced, parallel implementations of BLAST are needed to process the data in a timely manner. While these implementations have been designed for researchers with access to computing grids, recent web-based services, such as Amazon’s Elastic Compute Cloud, now offer scalable, pay-as-you-go computing. In this paper, we present K-mulus, an application that performs distributed BLAST queries via Hadoop MapReduce using a collection of established parallelization strategies. In addition, we provide a method to speed up BLAST by clustering the sequence database to reduce the search space for a given query. Our results show that users must take into account the size of the BLAST database and the memory of the underlying hardware to efficiently carry out BLAST queries in parallel. Finally, we show that while our database clustering and indexing approach offers a significant theoretical speedup, in practice the distribution of protein sequences prevents this potential from being realized.

Christopher M. Hill, Carl H. Albach, Sebastian G. Angel, Mihai Pop
Faster GPU-Accelerated Smith-Waterman Algorithm with Alignment Backtracking for Short DNA Sequences

In this paper, we present a GPU-accelerated Smith-Waterman (SW) algorithm with Alignment Backtracking, called GSWAB, for short DNA sequences. This algorithm performs all-to-all pairwise alignments and retrieves optimal local alignments on CUDA-enabled GPUs. To facilitate fast alignment backtracking, we have investigated a tile-based SW implementation using the CUDA programming model. This tiled computing pattern enables us to more deeply explore the powerful compute capability of GPUs. We have evaluated the performance of GSWAB on a Kepler-based GeForce GTX Titan graphics card. The results show that GSWAB can achieve a performance of up to 56.8 GCUPS on large-scale datasets. Furthermore, our algorithm yields speedups of up to 53.4 and 10.9 over MSA-CUDA (the first stage) and gpu-pairAlign, respectively, on the same hardware configuration.
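For readers unfamiliar with the recurrence being tiled, here is a minimal serial, scoring-only Smith-Waterman (an editorial sketch with a linear gap penalty; the paper's GPU implementation is far more elaborate and also backtracks the alignment):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Minimal serial Smith-Waterman local alignment score with a linear
    gap penalty. The GPU version tiles this dynamic-programming matrix."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are clamped at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Each anti-diagonal of `H` depends only on the previous two, which is what makes the tiled, data-parallel GPU formulation possible.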

Yongchao Liu, Bertil Schmidt
Accelerating String Matching on MIC Architecture for Motif Extraction

Identifying repeated factors that occur in a string of letters, or common factors that occur in a set of strings, represents an important task in computer science and biology. Such patterns are called motifs, and the process of identifying them is called motif extraction. In biology, motifs may correspond to functional elements in DNA, RNA, or protein molecules. In this article, we orchestrate MoTeX, a high-performance computing tool for MoTif eXtraction from large-scale datasets, on the Many Integrated Core (MIC) architecture. MoTeX uses state-of-the-art algorithms for solving the fixed-length approximate string-matching problem. It comes in three flavors: a standard CPU version, an OpenMP version, and an MPI version. We compare the performance of our MIC implementation to the corresponding CPU version of MoTeX: our MIC implementation accelerates the computations by a factor of ten. We also compare the performance of our MIC implementation to the corresponding OpenMP version of MoTeX running on modern multicore architectures: our MIC implementation accelerates the computations by a factor of two.

Solon P. Pissis, Christian Goll, Pavlos Pavlidis, Alexandros Stamatakis
A Parallel, Distributed-Memory Framework for Comparative Motif Discovery

The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements (‘motifs’) based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best performing algorithms; however, they pose significant computational challenges, as the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches for phylogenetic footprinting are implemented: an alignment-based and an alignment-free method. The framework is able to statistically evaluate the conservation of motifs in a search space containing over 160 million candidate motifs using a distributed-memory cluster with 200 CPU cores in a few hours. Software is available from http://bioinformatics.intec.ugent.be/blsspeller/

Dieter De Witte, Michiel Van Bel, Pieter Audenaert, Piet Demeester, Bart Dhoedt, Klaas Vandepoele, Jan Fostier
Parallel Seed-Based Approach to Protein Structure Similarity Detection

Finding similarities between protein structures is a crucial task in molecular biology. Many tools exist for finding an optimal alignment between two proteins. These tools, however, only find one alignment even when multiple similar regions exist. We propose a new parallel heuristic-based approach to structural similarity detection between proteins that discovers multiple pairs of similar regions. We prove that the returned alignments have $$RMSDc$$ and $$RMSDd$$ lower than a given threshold. Computational complexity is addressed by taking advantage of both fine- and coarse-grain parallelism.

Guillaume Chapuis, Mathilde Le Boudic - Jamin, Rumen Andonov, Hristo Djidjev, Dominique Lavenier

Minisymposium on Applications of Parallel Computation in Industry and Engineering

Frontmatter
A Parallel Solver for the Time-Periodic Navier–Stokes Equations

We investigate parallel algorithms for the solution of the Navier–Stokes equations in space-time. For periodic solutions, the discretized problem can be written as a large non-linear system of equations. This system of equations is solved by a Newton iteration. The Newton correction is computed using a preconditioned GMRES solver. The parallel performance of the algorithm is illustrated.

Peter Arbenz, Daniel Hupp, Dominik Obrist
Parallel Numerical Algorithms for Simulation of Rectangular Waveguides by Using GPU

In this article we consider parallel numerical algorithms to solve a 3D mathematical model that describes wave propagation in a rectangular waveguide. The main goal is to formulate and analyze a minimal algorithmic template to solve this problem using the CUDA platform. This template is based on explicit finite difference schemes obtained after approximation of the systems of differential equations on a staggered grid. The parallelization of the discrete algorithm is based on the domain decomposition method. A theoretical complexity model is derived and the scalability of the parallel algorithm is investigated. Results of numerical simulations are presented.

Raimondas Čiegis, Andrej Bugajev, Žilvinas Kancleris, Gediminas Šlekas
OpenACC Parallelisation for Diffusion Problems, Applied to Temperature Distribution on a Honeycomb Around the Bee Brood: A Worked Example Using BiCGSTAB

We discuss a simple OpenACC implementation of the iterative BiCGSTAB algorithm for linear systems. Problems of this type arise in the numerical solution of diffusion-reaction problems, where the linear solver constitutes the most computationally expensive component of the simulation ($$\sim $$80 % of time spent) and therefore has often been the primary target for parallelization. We deploy and test this method on a desktop workstation with two supported GPUs, one targeted at high performance computing, the other a consumer-level GPU, to compute the temperature distribution on a honeycomb around the bee brood. The paper is written from a user’s, not from a GPU computing expert’s, perspective and aims to fill a gap we noticed between real-world application problems and the simple problems solved in introductory OpenACC tutorials or in benchmarking studies.
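As a serial reference point for the method being offloaded, here is a minimal unpreconditioned BiCGSTAB in plain Python (an editorial sketch for a tiny dense system; the paper's implementation uses OpenACC on GPUs and typically a preconditioner):

```python
def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Minimal unpreconditioned BiCGSTAB for a dense matrix given as a
    list of rows. Illustrative only; no safeguards against breakdown."""
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]

    x = [0.0] * n
    r = [bi - ri for bi, ri in zip(b, matvec(A, x))]   # initial residual
    r_hat = r[:]                                       # shadow residual
    rho = alpha = omega = 1.0
    v = p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:                     # early convergence
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            break
        rho = rho_new
    return x
```

The dominant costs are the matrix-vector products and dot products, which is why the linear solver is the natural target for OpenACC offloading.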

Hermann J. Eberl, Rangarajan Sudarsan
Application of CUDA for Acceleration of Calculations in Boundary Value Problems Solving Using PIES

The main purpose of this paper is to examine the application of modern parallel computing solutions to speed up the calculations in the numerical solution of parametric integral equation systems (PIES). Solving boundary value problems by PIES sometimes requires a large computing time, particularly for more complex problems. This paper presents the use of general-purpose computing on graphics processing units (GPGPU). Boundary value problems modelled by 3D Navier-Lamé equations are solved using PIES and NVidia CUDA technology. The testing example shows that the use of GPGPU significantly increases the speed of calculations in PIES.

Andrzej Kuzelewski, Eugeniusz Zieniuk, Agnieszka Boltuc
Modeling and Simulations of Beam Stabilization in Edge-Emitting Broad Area Semiconductor Devices

A 2+1 dimensional PDE traveling wave model describing the spatial-lateral dynamics of edge-emitting broad area semiconductor devices is considered. A numerical scheme based on a split-step Fourier method is presented and implemented on a parallel compute cluster. Simulations of the model equations are used for optimizing existing devices with respect to the emitted beam quality, as well as for creating and testing novel device design concepts.

Mindaugas Radziunas, Raimondas Čiegis
Concurrent Nomadic and Bundle Search: A Class of Parallel Algorithms for Local Optimization

We present a family of algorithms for local optimization that exploit the parallel architectures of contemporary computing systems to achieve significant performance enhancements. This capability is important for demanding real-time applications, as well as for problems with time-consuming objective functions. The proposed concurrent schemes, namely nomadic and bundle search, are based upon well-established techniques such as quasi-Newton updates and line searches. The parallelization strategy consists of (a) distributed computation of an approximation to the Hessian matrix and (b) parallel deployment of line searches on different directions (bundles) and from different starting points (nomads). Preliminary results showed that the new parallel algorithms can solve problems in fewer iterations than their serial rivals.

Costas Voglis, Dimitrios G. Papageorgiou, Isaac E. Lagaris
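The "nomads" idea above, independent line searches launched from different starting points with the best result kept, can be sketched as follows (a toy Python version using steepest descent with Armijo backtracking and a thread pool as a stand-in for the paper's parallel deployment; all names are ours):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def backtracking_line_search(f, grad, x, beta=0.5, c=1e-4, t0=1.0):
    """Armijo backtracking along the steepest-descent direction."""
    g = grad(x)
    d = -g
    t = t0
    while f(x + t * d) > f(x) + c * t * (g @ d):
        t *= beta
    return x + t * d

def nomadic_descent(f, grad, starts, iters=100):
    """Run independent descent 'nomads' in parallel; keep the best point."""
    def run(x0):
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = backtracking_line_search(f, grad, x)
        return x
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, starts))
    return min(results, key=f)
```

The paper's schemes additionally use quasi-Newton updates and distributed Hessian approximation, which this sketch omits.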
Parallel Multi-objective Memetic Algorithm for Competitive Facility Location

A hybrid genetic algorithm for global multi-objective optimization is parallelized and applied to solve competitive facility location problems. The impact of the local search on the performance of the parallel algorithm is investigated. An asynchronous version of the parallel genetic algorithm with local search is proposed and evaluated by solving a competitive facility location problem, utilizing a hybrid distributed and shared memory parallel programming model on a high performance computing system.

Algirdas Lančinskas, Julius Žilinskas
Parallelization of Encryption Algorithm Based on Chaos System and Neural Networks

In this paper, the results of parallelizing an encryption algorithm based on a chaos system and neural networks are presented. A data dependence analysis of loops was applied in order to parallelize the algorithm. The parallelism of the algorithm is demonstrated in accordance with the OpenMP standard. The study shows that the most time-consuming loops of the algorithm are suitable for parallelization. The efficiency of the resulting parallel program is also measured and reported.

Dariusz Burak

Minisymposium on HPC Applications in Physical Sciences

Frontmatter
Simulations of the Adsorption Behavior of Dendrimers

Using Monte Carlo simulations we study adsorption of dendrimers with flexible spacers onto a flat surface in a wide range of molecular weight, $$N$$, generation number, $$G$$, spacer length, $$S$$, and the monomer-surface interaction strength parameter, $$\tau $$. Our calculations indicate that for large values of $$N$$ the dendrimers exist in three $$\tau $$-dependent regions referred to as non-adsorbed, critical and adsorbed. Slightly below the critical point of adsorption, $$\tau _c$$, a weakly adsorbed state is approached in which the molecules stick to the surface and are spherical in shape. By further lowering $$\tau $$ below a spacer-length dependent value, $$\tau ^*(S)<\tau _c$$, a jumplike transition into a strongly adsorbed state occurs. Here, the dendrimers become flat and their lateral size is described by a 2D mean-field model.

Jarosław S. Kłos, Jens U. Sommer
An Optimized Lattice Boltzmann Code for BlueGene/Q

In this paper we describe an optimized implementation of a Lattice Boltzmann (LB) code on the BlueGene/Q system, the latest generation massively parallel system of the BlueGene family. We consider a state-of-the-art LB code that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure of LB algorithms offers several levels of algorithmic parallelism that can be matched by a massively parallel computer architecture. However, the complex memory access patterns associated with our LB model make it non-trivial to efficiently exploit all available parallelism. We describe our implementation strategies, based on previous experience with clusters of many-core processors and GPUs, present results, and analyze and compare performances.

Marcello Pivanti, Filippo Mantovani, Sebastiano Fabio Schifano, Raffaele Tripiccione, Luca Zenesini
A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW

In one of the most important methods in Density Functional Theory – the Full-Potential Linearized Augmented Plane Wave (FLAPW) method – dense generalized eigenproblems are organized in long sequences. Moreover each eigenproblem is strongly correlated to the next one in the sequence. We propose a novel approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials. The resulting solver, parallelized using the Elemental library framework, achieves excellent scalability and is competitive with current dense parallel eigensolvers.

Mario Berljafa, Edoardo Di Napoli
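The core idea named in the abstract above, subspace iteration accelerated with Chebyshev polynomials, can be sketched for a single symmetric eigenproblem (a toy Python/NumPy version; the paper's solver additionally reuses subspaces across the sequence of problems and runs on distributed memory via Elemental, none of which is shown here):

```python
import numpy as np

def cheb_filter(A, V, deg, a, b):
    """Apply the Chebyshev polynomial of degree deg, mapped so that the
    interval [a, b] is damped and eigenvalues below a are amplified."""
    e, c = (b - a) / 2.0, (b + a) / 2.0
    Y = (A @ V - c * V) / e            # degree-1 term
    V_prev = V
    for _ in range(2, deg + 1):        # three-term Chebyshev recurrence
        Y, V_prev = 2.0 * (A @ Y - c * Y) / e - V_prev, Y
    return Y

def chebyshev_subspace_iteration(A, k, deg=8, iters=5, seed=0):
    """Approximate the k smallest eigenvalues of a symmetric matrix A."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V = np.linalg.qr(rng.standard_normal((n, k)))[0]
    # here the damped interval is read off the exact spectrum for brevity;
    # in practice it is estimated, e.g. by a few Lanczos steps
    evals = np.linalg.eigvalsh(A)
    a, b = evals[k], evals[-1]
    for _ in range(iters):
        V = np.linalg.qr(cheb_filter(A, V, deg, a, b))[0]
        H = V.T @ A @ V                # Rayleigh-Ritz projection
        w, S = np.linalg.eigh(H)
        V = V @ S
    return w
```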
Sequential Monte Carlo in Bayesian Assessment of Contaminant Source Localization Based on the Sensors Concentration Measurements

Accidental atmospheric releases of hazardous material pose great risks to human health and the environment. In this context it is valuable to develop an emergency action support system that can quickly identify the probable location and characteristics of the release source based on measurements of the dangerous substance's concentration by a sensor network. Here the Bayesian approach emerges as a powerful tool, able to combine observed data with prior knowledge to gain a current understanding of unknown model parameters.

We have applied a methodology combining Bayesian inference with Sequential Monte Carlo (SMC) to the problem of atmospheric contaminant source localization. The algorithm's input data are the concentrations of a given substance arriving on-line from the distributed sensor network.

We have proposed different versions of the Hybrid SMC along with Markov Chain Monte Carlo (MCMC) algorithms and examined their effectiveness in estimating the probabilistic distributions of atmospheric release parameters. The proposed algorithms scan a 5-dimensional parameter space searching for the contaminant source coordinates, the release strength and the atmospheric transport dispersion coefficients.

Anna Wawrzynczak, Piotr Kopka, Mieczyslaw Borysiewicz
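The ingredients named above (particles weighted by the likelihood of sensor readings, resampling, an MCMC-style rejuvenation move) can be sketched in one spatial dimension. The Gaussian-shaped dispersion model, sensor layout and all parameters below are invented for illustration and are not taken from the paper:

```python
import math
import random

def smc_source_localization(sensors, readings, noise=0.05,
                            n_particles=2000, n_stages=5, seed=1):
    """Toy SMC estimate of a 1D source location from sensor readings."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 5.0) for _ in range(n_particles)]
    for _ in range(n_stages):
        # weight each particle by the likelihood of all sensor readings
        weights = []
        for x in particles:
            logw = 0.0
            for s, y in zip(sensors, readings):
                pred = math.exp(-(s - x) ** 2)   # modeled concentration
                logw += -((y - pred) ** 2) / (2 * noise ** 2)
            weights.append(math.exp(logw))
        # multinomial resampling proportional to the weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # jitter move (a simplified stand-in for the paper's MCMC step)
        particles = [x + rng.gauss(0.0, 0.1) for x in particles]
    return sum(particles) / n_particles
```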
Effective Parallelization of Quantum Simulations: Nanomagnetic Molecular Rings

The effective parallelization of processing exploiting the MPI library for the numerically exact quantum transfer matrix (QTM) and exact diagonalization (ED) deterministic simulations of chromium-based rings is proposed. In the QTM technique we have exploited parallelization of the summation in the partition function. The efficiency of the QTM calculations is above $$80\,\%$$ up to about $$1000$$ processes. With our test programs we calculated the low temperature torque, specific heat and entropy for the chromium ring Cr$$_8$$, exploiting a realistic Hamiltonian with single-ion anisotropy and alternation of the nearest neighbor exchange couplings. Our parallelized ED technique makes use of the self-scheduling scheme and the longest processing time algorithm to distribute and diagonalize separate blocks of a Hamiltonian matrix by slave processes. Its parallel processing scales very well, with efficiency above $$90\,\%$$, but only up to about 10 processes. This scheme is improved by processing more input data sets in one job, which leads to very good scalability up to an arbitrary number of processes. The scaling improves for both techniques when larger systems are considered.

Piotr Kozłowski, Grzegorz Musiał, Michał Antkowiak, Dante Gatteschi
DFT Study of the Cr$$_8$$ Molecular Magnet Within Chain-Model Approximations

We present a density functional theory (DFT) study of the electronic and magnetic properties of the Cr$$_8$$ molecular ring. The all-electron linearized augmented plane wave (LAPW) method implemented in the Wien2k package and the pseudopotential method implemented in the SIESTA package are used to calculate the electronic states and exchange coupling parameters of an infinite chain model system of Cr$$_8$$. We demonstrate how, under suitable modifications to the ring cycle structure, different one-dimensional chain models can be devised, capable of mimicking with good approximation the electronic and magnetic properties of the original Cr$$_8$$ molecule. Such models offer a unique opportunity, by virtue of the reduced computational effort, to carry out extensive investigations of a whole set of molecules belonging to the Cr-based molecular ring family.

Valerio Bellini, Daria M. Tomecka, Bartosz Brzostowski, Michał Wojciechowski, Filippo Troiani, Franca Manghi, Marco Affronte
Non-perturbative Methods in Phenomenological Simulations of Ring-Shape Molecular Nanomagnets

Two non-perturbative numerically exact methods, exact diagonalization and the quantum transfer matrix, are applied to computationally complex Heisenberg-like spin models of ring-shaped molecular nanomagnets and implemented in a high performance computing environment. These methods are applicable to a wide class of ring-shaped nanomagnets. For the hypothetical antiferromagnetic nanomagnet Ni$$_{12}$$ the influence of single-ion anisotropy on the ground states is investigated. For Cr$$_8$$ it is demonstrated that alternation of the nearest-neighbor bilinear exchange couplings leads to small changes in the magnetic torque with respect to the uniformly coupled system. Specific heat and entropy for Cr$$_8$$ are shown to be good indicators of crossing fields. The applicability of the Landé rule to both systems is checked.

Piotr Kozłowski, Grzegorz Musiał, Monika Haglauer, Wojciech Florek, Michał Antkowiak, Filippo Esposito, Dante Gatteschi
Non-uniform Quantum Spin Chains: Simulations of Static and Dynamic Properties

Since magnetic materials are often composed of magnetically isolated chains, their magnetic properties can be described by the one-dimensional quantum Heisenberg model. The quantum transfer matrix (QTM) method based on a checkerboard structure has been applied to quantum alternating spin chains. To increase the length of the transfer matrix in the Trotter direction we apply the density-matrix renormalization technique and check the efficiency of parallelization for a part of the code: the construction of the transfer matrix. Next, using the Matrix Product State representation, the time evolution of the ground-state magnetization is computed after a sudden change in the applied field.

Artur Barasiński, Bartosz Brzostowski, Ryszard Matysiak, Paweł Sobczak, Dariusz Woźniak

Minisymposium on Applied High Performance Numerical Algorithms in PDEs

Frontmatter
A Domain Decomposition Method for Discretization of Multiscale Elliptic Problems by Discontinuous Galerkin Method

In this paper boundary value problems for second order elliptic equations with highly discontinuous coefficients are considered on a 2D polygonal region. The problems are discretized by a discontinuous Galerkin (DG) finite element method (FEM) on triangular elements using piecewise linear functions.

The goal is to design and analyze a parallel algorithm for solving the discrete problem whose rate of convergence is independent of the jumps of the coefficients. The method discussed is an additive Schwarz method (ASM), which belongs to the class of domain decomposition methods and is one of the most efficient parallel algorithms for solving discretizations of PDEs.

It turns out that the convergence of the method presented here is almost optimal and only weakly depends on the jumps of the coefficients. The suggested method is very well suited for parallel computations.

Maksymilian Dryja
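The additive Schwarz idea underlying the abstract above, summing overlapping local subdomain solves to build a preconditioner, can be sketched on a 1D Poisson finite-difference problem (a toy Python/NumPy version; the paper's DG/FEM setting, coarse space and parallel implementation are not reproduced):

```python
import numpy as np

def poisson_1d(n):
    """Standard tridiagonal finite-difference Laplacian."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def asm_preconditioner(A, subdomains):
    """One-level additive Schwarz: a sum of overlapping local solves."""
    inverses = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in subdomains]
    def apply(r):
        z = np.zeros_like(r)
        for idx, Ainv in zip(subdomains, inverses):
            z[idx] += Ainv @ r[idx]    # R_i^T A_i^{-1} R_i r
        return z
    return apply

def pcg(A, b, M, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients with preconditioner M(r)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

Each local solve is independent, which is where the parallelism of ASM comes from.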
Parallel Preconditioner for the Finite Volume Element Discretization of Elliptic Problems

In this paper we present a parallel preconditioner for the standard Finite Volume (FV) discretization of elliptic problems, using the standard continuous piecewise linear Finite Element (FE) function space. The proposed preconditioner is constructed using an abstract framework of the Additive Schwarz Method, and is fully parallel. The convergence rate of the Generalized Minimal Residual (GMRES) method with this preconditioner is shown to be almost optimal, i.e., it depends poly-logarithmically on the mesh sizes.

Leszek Marcinkowski, Talal Rahman
Preconditioning Iterative Substructuring Methods Using Inexact Local Solvers

We consider several block preconditioners for iterative substructuring algorithms with inexact subdomain solvers, including incomplete Cholesky and V-cycle multigrid. Numerical results show that block triangular preconditioners are very competitive and in certain cases outperform presently used preconditioners based on full block triangular decomposition.

Piotr Krzyzanowski
Additive Schwarz Method for Nonsymmetric Local Discontinuous Galerkin Discretization of Elliptic Problem

In this paper we design a two-level additive Schwarz method for a non-symmetric elliptic problem in two dimensions, discretized by the local discontinuous Galerkin (LDG) method. To construct the preconditioner, we use the domain decomposition method. We also present results of numerical tests of this preconditioner. The condition number of the preconditioned system does not depend on the size of the fine mesh $$h$$, but only on the ratio of the coarse mesh size $$H$$ and the overlap measure $$\delta $$.

Filip Z. Klawe
Fast Numerical Method for 2D Initial-Boundary Value Problems for the Boltzmann Equation

We present a new numerical scheme for the initial-boundary value problem for the Boltzmann equation in two-dimensional physical space. It is based on a splitting procedure in which the collision equation is solved using the adaptive algorithm for the computation of the full three-dimensional Boltzmann collision operator on non-uniform velocity grids introduced in a previous paper by the authors. The computation of the collision operator is performed in parallel for every physical grid cell. For the two-dimensional transport equation we use a second order finite volume method. A numerical example showing the effectiveness of our method is given.

Alexei Heintz, Piotr Kowalczyk
Simulating Phase Transition Dynamics on Non-trivial Domains

Our goal is to investigate the influence of the geometry and topology of the domain $$\varOmega $$ on the solutions of the phase transition and other diffusion-driven phenomena in $$\varOmega $$, modeled e.g. by the Allen–Cahn, Cahn–Hilliard, reaction–diffusion equations. We present FEM numerical schemes for the Allen–Cahn and Cahn–Hilliard equations based on Eyre's algorithm and present some numerical results on split and dumbbell domains.

Łukasz Bolikowski, Maria Gokieli
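The time-stepping idea behind Eyre's algorithm, an unconditionally gradient-stable splitting that treats the stiff part implicitly and the rest explicitly, can be sketched for the Allen–Cahn equation. The paper works with FEM on non-trivial domains; below is only a 1D periodic finite-difference stand-in using a linearly stabilized semi-implicit step in the spirit of Eyre's splitting, with all parameters ours:

```python
import numpy as np

def allen_cahn_1d(n=64, eps=0.03, dt=0.1, steps=200, stab=2.0):
    """Semi-implicit (Eyre-style convexity-splitting) time stepping for
    u_t = eps^2 u_xx - (u^3 - u) on a periodic 1D grid."""
    h = 1.0 / n
    x = np.arange(n) * h
    # periodic second-difference Laplacian
    L = (-2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
         + np.eye(n, k=n - 1) + np.eye(n, k=-(n - 1))) / h ** 2
    # the implicit operator is constant, so it is factored (inverted) once
    Minv = np.linalg.inv((1.0 / dt + stab) * np.eye(n) - eps ** 2 * L)
    u = 0.05 * np.cos(2 * np.pi * x)   # small perturbation of u = 0
    for _ in range(steps):
        rhs = u / dt + stab * u - (u ** 3 - u)   # explicit nonlinear part
        u = Minv @ rhs
    return u
```

With the stabilization constant at least as large as the maximum of the nonlinearity's derivative on [-1, 1], each step is a single linear solve yet remains stable for large time steps, which is the practical appeal of this class of schemes.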
Variable Block Multilevel Iterative Solution of General Sparse Linear Systems

We present numerical results with variable block multilevel incomplete LU factorization preconditioners for solving sparse linear systems arising, e.g., from the discretization of 2D and 3D partial differential equations on unstructured meshes. The proposed method automatically detects and exploits any available block structure in the matrix to maximize computational efficiency. Both sequential and parallel experiments are shown on selected matrix problems in different application areas, also in comparison against other standard preconditioners.

Bruno Carpentieri, Jia Liao, Masha Sosonkina
An Automatic Way of Finding Robust Elimination Trees for a Multi-frontal Sparse Solver for Radical 2D Hierarchical Meshes

In this paper we present a dynamic programming algorithm for finding optimal elimination trees for the multi-frontal direct solver algorithm executed over two-dimensional meshes with point singularities. The elimination tree found by the optimization algorithm results in a linear computational cost of the sequential direct solver. Based on the optimal elimination tree found by the optimization algorithm, we construct a heuristic sequential multi-frontal direct solver algorithm resulting in a linear computational cost, as well as a heuristic parallel multi-frontal direct solver algorithm resulting in a logarithmic computational cost. The resulting parallel algorithm is implemented on the NVIDIA CUDA GPU architecture based on our graph-grammar approach.

Hassan AbouEisha, Piotr Gurgul, Anna Paszyńska, Maciek Paszyński, Krzysztof Kuźnik, Mikhail Moshkov
Parallel Efficiency of an Adaptive, Dynamically Balanced Flow Solver

Computations in Fluid Dynamics require minimisation of the time in which the result can be obtained. While parallel techniques allow for handling of large problems, it is adaptivity that ensures that computational effort is focused on interesting regions in time and space. Parallel efficiency, in a domain decomposition based approach, strongly depends on partitioning quality. For adaptive simulation, partitioning quality is lost due to the dynamic modification of the computational mesh. Maintaining high parallel efficiency requires rebalancing of the numerical load. This paper presents performance results of an adaptive and dynamically balanced in-house flow solver. The results indicate that the rebalancing technique can be used to remedy the adverse effects of adaptivity on overall parallel performance.

Stanislaw Gepner, Jerzy Majewski, Jacek Rokicki
Modification of the Newton’s Method for the Simulations of Gallium Nitride Semiconductor Devices

In this paper we present an application of Newton's method for the simulation of gallium nitride semiconductor devices in the steady state.

The drift-diffusion model of carrier transport in the semiconductor material is used. It consists of three nonlinear elliptic differential equations. We present a backtracking strategy for the coupled Newton's method, which takes into account the specific nature of the drift-diffusion equations and improves the convergence of the method.

Konrad Sakowski, Leszek Marcinkowski, Stanislaw Krukowski
Numerical Realization of the One-Dimensional Model of Burning Methanol
(Cluster Version)

This program, developed for the IBM Blue Gene/P, is based on the so-called time-splitting method. While currently the MPI standard is used, in the future a version utilizing the OpenMP shared memory standard will be developed as well.

Krzysztof Moszyński

Minisymposium on High Performance Computing Interval Methods

Frontmatter
A Shaving Method for Interval Linear Systems of Equations

We propose an iterative improvement method for an enclosure of the solution set of a system of interval linear equations. The method sequentially cuts off (shaves) parts of a given enclosure that contain no solution, thus yielding tighter enclosures. Since shaving can be done independently in each coordinate, the procedure is easily parallelized. Our approach is convenient for problems with wide input intervals, where traditional methods give poor enclosures. Finally, we present a limited computational study.

Milan Hladík, Jaroslav Horáček
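The shaving idea above, discarding slices of the enclosure on which some equation's interval residual provably cannot contain zero, can be demonstrated with naive interval arithmetic and bisection to locate the cut. This tiny sketch is ours and does not reproduce the paper's actual shaving operator:

```python
def imul(a, b):
    """Interval multiplication [a] * [b]."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def feasible(A, b, box, i, slice_i):
    """Can every equation's residual contain 0 when x_i is restricted?"""
    box = list(box)
    box[i] = slice_i
    for row, bk in zip(A, b):
        lo, hi = -bk[1], -bk[0]                 # start from -[b_k]
        for a, xj in zip(row, box):
            m = imul(a, xj)
            lo, hi = lo + m[0], hi + m[1]
        if not (lo <= 0.0 <= hi):
            return False                        # slice provably solution-free
    return True

def shave(A, b, box, i, upper, iters=60):
    """Tighten one bound of box[i] by bisecting for the infeasible part."""
    lo, hi = box[i]
    a_, b_ = (lo, hi) if upper else (hi, lo)    # a_: feasible side
    for _ in range(iters):
        m = 0.5 * (a_ + b_)
        slc = (m, hi) if upper else (lo, m)
        if feasible(A, b, box, i, slc):
            a_ = m
        else:
            b_ = m
    return (lo, b_) if upper else (b_, hi)
```

For the scalar system [2, 3] x = [4, 5] the exact solution hull is [4/3, 5/2], and shaving both bounds of a wide starting box recovers it; each coordinate of a larger system could be shaved independently, which is the parallelism the paper exploits.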
Finding Enclosures for Linear Systems Using Interval Matrix Multiplication in CUDA

In this paper we present CUDA kernels that compute an interval matrix product. Starting from a naive implementation, we investigate possible speedups using commonly known techniques from standard matrix multiplication. We also evaluate the speedup achieved when our kernels are used to accelerate a variant of an existing algorithm that finds an enclosure for the solution of a linear system. Moreover, the quality of the computed enclosure is discussed.

Alexander Dallmann, Philip-Daniel Beck, Jürgen Wolff von Gudenberg
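The kernels themselves are CUDA; one common way to organize an interval matrix product so that it maps onto ordinary GEMM-style kernels is the midpoint-radius representation (Rump's approach). A Python/NumPy sketch of the arithmetic, ignoring the directed rounding that a real implementation must add, and not necessarily the representation used in the paper:

```python
import numpy as np

def interval_matmul(Am, Ar, Bm, Br):
    """Enclosure of the product of interval matrices given as
    midpoint (Am, Bm) and nonnegative radius (Ar, Br) pairs.
    Uses four ordinary matrix products, so fast dense kernels can be
    reused; outward rounding is omitted in this sketch."""
    Cm = Am @ Bm
    Cr = np.abs(Am) @ Br + Ar @ np.abs(Bm) + Ar @ Br
    return Cm, Cr
```

Correctness follows from |(Am + dA)(Bm + dB) - Am Bm| <= |Am| Br + Ar |Bm| + Ar Br elementwise whenever |dA| <= Ar and |dB| <= Br.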
GPU Acceleration of Metaheuristics Solving Large Scale Parametric Interval Algebraic Systems

A study on the efficient solving of parametric interval linear systems using GPU computing is presented. Several illustrative examples from structural mechanics are employed to show that the proposed approach can significantly reduce computation time for this kind of problem. The stress is put on large uncertainties, which are usually hard to deal with using other, less time-consuming methods.

Jerzy Duda, Iwona Skalna
Parallel Approach to Monte Carlo Simulation for Option Price Sensitivities Using the Adjoint and Interval Analysis

This paper concerns a new approach to the evaluation of option price sensitivities using Monte Carlo simulation, based on the parallel GPU architecture and automatic differentiation methods. In order to study rounding errors, interval arithmetic is used. Considerations are based on two implementations of the algorithm, a sequential and a parallel one. For efficient differentiation, the adjoint method is employed. Computational experiments include analysis of performance, uncertainty error and rounding error, and consider the Black-Scholes and Heston models.

Grzegorz Kozikowski, Bartłomiej Jacek Kubica
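The core trick behind differentiating a Monte Carlo price, reusing each simulated path to accumulate the sensitivity alongside the payoff, can be sketched for a Black-Scholes call delta (a serial Python toy using the pathwise derivative; the paper's GPU parallelism, adjoint AD tooling and interval error analysis are not reproduced):

```python
import math
import random

def mc_call_price_and_delta(s0, k, r, sigma, t, n_paths=200000, seed=7):
    """Monte Carlo price and pathwise delta of a European call under
    Black-Scholes; the delta reuses each simulated path, the same idea
    that adjoint/pathwise differentiation exploits at scale."""
    rng = random.Random(seed)
    disc = math.exp(-r * t)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    price_sum = delta_sum = 0.0
    for _ in range(n_paths):
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        if st > k:
            price_sum += st - k
            delta_sum += st / s0     # d(payoff)/d(s0) along this path
    return disc * price_sum / n_paths, disc * delta_sum / n_paths
```

The estimate can be checked against the closed-form Black-Scholes delta N(d1).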
Subsquares Approach – A Simple Scheme for Solving Overdetermined Interval Linear Systems

In this work we present a new, simple but efficient scheme, the Subsquares approach, for developing algorithms that enclose the solution set of overdetermined interval linear systems. We show two algorithms based on this scheme: a simple algorithm as a motivation, followed by an improved one. Both algorithms can be easily parallelized. Their features are discussed and numerically tested.

Jaroslav Horáček, Milan Hladík
Using Quadratic Approximations in an Interval Method for Solving Underdetermined and Well-Determined Nonlinear Systems

This paper considers quadratic approximation as a narrowing tool in an interval branch-and-prune method. We seek the roots of such an approximate equation, a quadratic equation with interval parameters. Heuristics to decide when to use the developed operator are proposed. Numerical results for some benchmark problems are presented and analyzed.

Bartłomiej Jacek Kubica
The Definition of Interval-Valued Intuitionistic Fuzzy Sets in the Framework of Dempster-Shafer Theory

In this report, a critical analysis of conventional operations on interval-valued intuitionistic fuzzy values ($$IVIFVs$$) and their applicability to the solution of multiple criteria decision making ($$MCDM$$) problems in the interval-valued intuitionistic fuzzy setting is presented. It is shown that the classical definition of Atanassov's interval-valued intuitionistic fuzzy set ($$A$$-$$IVIFS$$) may lead to controversial results. Therefore, a new, more constructive definition of $$A$$-$$IVIFS$$ is proposed. It is shown that this new definition makes it possible to present $$IVIFVs$$ in the framework of the interval-extended Dempster-Shafer theory of evidence ($$DST$$) as belief intervals with bounds that are themselves belief intervals.

Ludmila Dymova, Pavel Sevastjanov
Interval Finite Difference Method for Solving the Problem of Bioheat Transfer Between Blood Vessel and Tissue

The paper concerns a problem of bioheat transfer between a single large blood vessel and the surrounding tissue. On the basis of the conventional finite difference scheme with the appropriate truncation error terms included, an interval finite difference method is proposed. The interval values that contain the local truncation error of the conventional scheme can be approximated in the way described.

Malgorzata A. Jankowska

Workshop on Complex Collective Systems

Frontmatter
Bridging the Gap: From Cellular Automata to Differential Equation Models for Pedestrian Dynamics

Cellular automata (CA) and ordinary differential equation (ODE) based models compete for dominance in microscopic pedestrian dynamics. Both are inspired by the idea that pedestrians are subject to forces. However, there are two major differences: in a CA, movement is restricted to a coarse grid and navigation is achieved directly by pointing the movement in the direction of the forces; force-based ODE models operate in continuous space and navigation is computed indirectly through the acceleration vector. We present two models emanating from the CA and ODE approaches that remove these two differences: the Optimal Steps Model and the Gradient Navigation Model. Both models are very robust and produce trajectories similar to each other, bridging the gap between the older models. Both approaches are grid-free and free of oscillations, supporting the hypothesis that the two major differences are also the two major weaknesses of the older models.

Felix Dietrich, Gerta Köster, Michael Seitz, Isabella von Sivers
Cellular Model of Pedestrian Dynamics with Adaptive Time Span

A cellular model of pedestrian dynamics based on the Floor Field model is presented. In contrast to the parallel update in the Floor Field model, the concept of an adaptive time span is introduced. This concept, together with the concept of bounds, supports spontaneous line formation and chaotic queueing in front of a bottleneck. Model simulations are compared to the "passing through" experiment, in which a phase transition from low to high density is observed.

Marek Bukáček, Pavel Hrabák, Milan Krbálek
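The basic floor-field mechanism behind models like the one above, a precomputed distance-to-exit field that agents descend greedily, can be sketched as follows (a minimal single-agent Python toy with a BFS static field; the paper's adaptive time span and bounds mechanisms are not reproduced):

```python
from collections import deque

def static_field(grid, exit_cell):
    """BFS distance-to-exit over free cells; walls are '#'."""
    rows, cols = len(grid), len(grid[0])
    dist = {exit_cell: 0}
    queue = deque([exit_cell])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def walk_to_exit(grid, start, exit_cell, max_steps=100):
    """Greedy descent of the static floor field, one cell per step."""
    dist = static_field(grid, exit_cell)
    pos, steps = start, 0
    while pos != exit_cell and steps < max_steps:
        r, c = pos
        neighbours = [(r + dr, c + dc)
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        pos = min((n for n in neighbours if n in dist), key=dist.get)
        steps += 1
    return pos, steps
```

Because the field is a BFS distance, each greedy step decreases it by exactly one, so the agent reaches the exit in the shortest number of steps.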
The Use of GPGPU in Continuous and Discrete Models of Crowd Dynamics

The aim of the study is twofold: firstly, to compare the possibilities of using GPGPU (General-Purpose Computing on Graphics Processing Units) in continuous and discrete crowd dynamics simulation; secondly, to draw conclusions on the applicability of GPUs in the engines of professional crowd simulations. For this purpose the authors have implemented two models of pedestrian dynamics: the continuous Social Forces model and the discrete, Cellular Automata based Social Distances model. The presented simulations refer to outdoor, large-area pedestrian movement.

Hubert Mróz, Jarosław Wąs, Paweł Topa
Modeling Behavioral Traits of Employees in a Workplace with Cellular Automata

The aim of this paper is to examine a parameterized working environment on the basis of the behavioral traits of employees in an organization. Firstly, we define several behavioral traits of the employees, including the employee's attitude in the workplace, the influence radius and her/his reluctance to adapt to organizational norms, stated as insistence. The combination of these traits allows us to model employee interactions to an extent satisfactory for a realistic model of the working environment. Secondly, we define two metrics illustrating the policies adopted by the organization either to restrain unwanted behavioral patterns or to impose desirable ones. Finally, the corresponding Cellular Automaton (CA) model enables us to utilize the aforementioned parameters and to simulate the workplace under study. The presented simulation results can be used as a complementary tool for managerial decisions, illustrating workplace dynamics and forecasting future trends.

Petros Saravakos, Georgios Ch. Sirakoulis
Probabilistic Pharmaceutical Modelling: A Comparison Between Synchronous and Asynchronous Cellular Automata

The field of pharmaceutical modelling has, in recent years, benefited from probabilistic methods based on cellular automata, which seek to overcome some of the limitations of differential equation based models. By modelling discrete structural element interactions instead, these are able to provide data quality adequate for the early design phases in drug modelling. In the relevant literature, both synchronous (CA) and asynchronous (ACA) types of automata have been used, without analysing their comparative impact on the model outputs. In this paper, we compare several variations of probabilistic CA and ACA update algorithms for building models of complex systems used in controlled drug delivery, analysing the advantages and disadvantages related to different modelling scenarios. Choosing the appropriate update mechanism, besides having an impact on the perceived realism of the simulation, also has practical benefits for the applicability of different model parallelisation algorithms and their performance when used in large-scale simulation contexts.

Marija Bezbradica, Heather J. Ruskin, Martin Crane
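The synchronous/asynchronous distinction discussed above can be made concrete with a deterministic 1D majority-rule automaton updated both ways (a toy of ours; the paper's automata are probabilistic drug-dissolution models, which this does not attempt to reproduce):

```python
def majority(left, mid, right):
    """Deterministic local rule: take the majority of the neighbourhood."""
    return 1 if left + mid + right >= 2 else 0

def step_synchronous(cells):
    """All cells read the same old configuration (classical CA)."""
    n = len(cells)
    return [majority(cells[i - 1], cells[i], cells[(i + 1) % n])
            for i in range(n)]

def step_asynchronous(cells):
    """Cells updated one by one in place, each seeing earlier updates (ACA)."""
    cells = cells[:]
    n = len(cells)
    for i in range(n):
        cells[i] = majority(cells[i - 1], cells[i], cells[(i + 1) % n])
    return cells
```

Starting from the alternating configuration, the two schemes diverge after a single step: the synchronous update flips every cell, while the left-to-right asynchronous sweep floods the ring with ones. This order dependence is exactly what makes ACA harder to parallelise.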
The Graph of Cellular Automata Applied for Modelling Tumour Induced Angiogenesis

Angiogenesis is the process of formation of a vascular network. Blocking tumour-induced angiogenesis is one of the treatments applied in oncology. Research involving computer simulations looks for the rules influencing the structure of the vascular network and its functionality. This paper summarizes the applications of the Graph of Cellular Automata modelling tool, developed by the author, for modelling tumour-induced angiogenesis. The vascular network, which is modelled by the graph, interacts with the surrounding tissue represented by the lattice of automata. The network is developed and reorganized according to locally acting factors (stimulators and inhibitors). The model includes blood flow calculations in the modelled vascular network.

Paweł Topa
Neighborhood Selection and Rules Identification for Cellular Automata: A Rough Sets Approach

In this paper a method is proposed which uses data mining techniques based on rough sets theory to select the neighborhood and determine the update rule for cellular automata (CA). According to the proposed approach, the neighborhood is detected by reduct calculations, and a rule-learning algorithm is applied to induce a set of decision rules that define the evolution of the CA. Experiments were performed with the use of synthetic as well as real-world data sets. The results show that the introduced method allows identification of both deterministic and probabilistic CA-based models of real-world phenomena.

Bartłomiej Płaczek
Coupling Lattice Boltzmann Gas and Level Set Method for Simulating Free Surface Flow in GPU/CUDA Environment

We present here a proof of concept of a novel, efficient method for modeling liquid/gas interface dynamics. Our approach consists in coupling the lattice Boltzmann gas (LBG) and level set (LS) methods. The inherent parallel character of LBG accelerated by level sets is the principal advantage of our approach over similar particle-based solvers. Consequently, this property allows for efficient use of our solver in a GPU/CUDA environment. We demonstrate preliminary results and GPU/CPU speedups by simulating two standard free surface fluid scenarios: the falling droplet and the breaking dam problems.

Tomir Kryza, Witold Dzwinel
Creation of Agent’s Vision of Social Network Through Episodic Memory

Human societies appear in many types of simulations. One of the most important and most difficult society elements to model is the social context. In this paper we show how social context can be provided using agents that are equipped with internal visions of the social relations between others. An internal vision is a representation of social relations from the agent's point of view, so, being subjective, it may be inconsistent with reality. We introduce the agent model and the mechanism of rebuilding the agent's internal vision, which is similar to that used by humans. An experimental proof of concept is also presented.

Michał Wrzeszcz, Jacek Kitowski
The Influence of Multi-agent Cooperation on the Efficiency of Taxi Dispatching

The paper deals with the problem of the optimal collaboration scheme in taxi dispatching between customers, taxi drivers and the dispatcher. The authors propose three strategies that differ in the amount of information exchanged between agents and the intensity of cooperation between taxi drivers and the dispatcher. The strategies are evaluated by means of a microscopic multi-agent transport simulator (MATSim) coupled with a dynamic vehicle routing optimizer (DVRP Optimizer), which allows one to realistically simulate dynamic taxi services as one of several different transport means, all embedded in a realistic environment. The evaluation is carried out on a scenario of the Polish city of Mielec. The results obtained prove that cooperation between the dispatcher and taxi drivers is of the utmost importance, while customer-dispatcher communication may be reduced to a minimum and compensated for by the use of more sophisticated dispatching strategies, without affecting the quality of service.

Michał Maciejewski, Kai Nagel
Basic Endogenous-Money Economy: An Agent-Based Approach

We present an agent-based model of a simple endogenous-money economy. The model simulates agents representing individual persons who can work, consume, invent new products and related production technologies, apply for a loan from the bank and start up a business. Through the interaction of persons with firms, we simulate the production of goods, consumption and the labour market. This setting allows us to explore how an endogenous-money economy may build up from scratch, as an emergent property of actions and interactions among heterogeneous agents, once money is injected into a non-monetary self-production (or barter) economy. We provide and discuss the results of several computational experiments under three scenarios: (1) with just one firm, (2) with a limited number of firms and an abundant workforce, and (3) with an unlimited number of firms.

Ivan Blecic, Arnaldo Cecchini, Giuseppe A. Trunfio
Backmatter
Metadata
Title
Parallel Processing and Applied Mathematics
edited by
Roman Wyrzykowski
Jack Dongarra
Konrad Karczewski
Jerzy Waśniewski
Copyright Year
2014
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-55195-6
Print ISBN
978-3-642-55194-9
DOI
https://doi.org/10.1007/978-3-642-55195-6