
About this Book

“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” Mark Twain, Life on the Mississippi

The challenges in succeeding with computational science are numerous and deeply affect all disciplines. NSF's 2006 Blue Ribbon Panel on Simulation-Based Engineering Science (SBES) states: 'researchers and educators [agree]: computational and simulation engineering sciences are fundamental to the security and welfare of the United States... We must overcome difficulties inherent in multiscale modeling, the development of next-generation algorithms, and the design... of dynamic data-driven application systems... We must determine better ways to integrate data-intensive computing, visualization, and simulation. Importantly, we must overhaul our educational system to foster the interdisciplinary study... The payoffs for meeting these challenges are profound.' The International Conference on Computational Science 2009 (ICCS 2009) explored how computational sciences are not only advancing the traditional hard science disciplines, but also stretching beyond, with applications in the arts, humanities, media and all aspects of research. This interdisciplinary conference drew academic and industry leaders from a variety of fields, including physics, astronomy, mathematics, music, digital media, biology and engineering. The conference also hosted computer and computational scientists who are designing and building the cyberinfrastructure necessary for next-generation computing. Discussions focused on innovative ways to collaborate and how computational science is changing the future of research. ICCS 2009: 'Compute. Discover. Innovate.' was hosted by the Center for Computation and Technology at Louisiana State University in Baton Rouge.



e-Science Applications and Systems


Electronic Structure Calculations and Adaptation Scheme in Multi-core Computing Environments

Multi-core processing environments have become the norm in generic computing and are being considered for adding an extra dimension to the execution of any application. The T2 Niagara processor is a distinctive environment: it consists of eight cores, each capable of running eight threads simultaneously. Applications like General Atomic and Molecular Electronic Structure (GAMESS), used for ab initio molecular quantum chemistry calculations, can be good indicators of the performance of such machines and can guide both hardware designers and application programmers. In this paper we benchmark GAMESS performance on a T2 Niagara processor for a couple of molecules. We also show the suitability of using a middleware-based adaptation algorithm with GAMESS in such a multi-core environment.

Lakshminarasimhan Seshagiri, Masha Sosonkina, Zhao Zhang

A Fuzzy Logic Fish School Model

This paper summarizes our work on fuzzy modeling for an Individual-oriented Model (IoM). Our model is particularly geared toward simulating the movement of fish schools, and is derived from the model by Huth and Wissel. The background and motivation for the problem, as well as a description of the biological model, are given here. A fuzzy logic implementation is discussed based on the mathematical model proposed. Finally, we describe the experiments performed to demonstrate that our model reproduces the real behavior of fish schools. Keywords: Fuzzy Logic, Individual-oriented Models (IoM), Fish School.
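A minimal sketch of how fuzzy membership can blend Huth-Wissel style behaviour zones (repulsion, alignment, attraction); the zone radii and triangular membership shapes here are illustrative assumptions, not the authors' calibrated model:

```python
import math

def memberships(d, repulse=0.5, align=2.0, attract=5.0):
    """Fuzzy membership of neighbour distance d in three behaviour
    zones; zone radii and triangular shapes are illustrative."""
    mu_rep = max(0.0, 1.0 - d / repulse)
    mu_ali = max(0.0, 1.0 - abs(d - align) / align)
    mu_att = min(1.0, max(0.0, (d - align) / (attract - align)))
    return mu_rep, mu_ali, mu_att

def steer(pos, heading, n_pos, n_heading):
    """Blend repulsion, alignment and attraction by fuzzy weight to
    get a fish's desired direction with respect to one neighbour."""
    dx, dy = n_pos[0] - pos[0], n_pos[1] - pos[1]
    d = math.hypot(dx, dy)
    mu_rep, mu_ali, mu_att = memberships(d)
    total = mu_rep + mu_ali + mu_att
    if total == 0.0:
        return heading  # neighbour out of range: keep current heading
    return ((mu_rep * -dx + mu_ali * n_heading[0] + mu_att * dx) / total,
            (mu_rep * -dy + mu_ali * n_heading[1] + mu_att * dy) / total)
```

A very close neighbour makes repulsion dominate, so the blended direction points away from it; a distant one is weighted toward attraction.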

Juan Carlos González, Christianne Dalforno, Remo Suppi, Emilio Luque

An Open Domain-Extensible Environment for Simulation-Based Scientific Investigation (ODESSI)

In scientific domains where discovery is driven by simulation modeling, common methodologies and procedures are applied for scientific investigation. ODESSI (Open Domain-extensible Environment for Simulation-based Scientific Investigation) is an environment that facilitates the representation and automatic conduction of scientific studies by capturing common methods for experimentation, analysis, and evaluation used in simulation science. Specific methods ODESSI will support include parameter studies, uncertainty quantification, and sensitivity analysis. By making these methods accessible in a programmable framework, ODESSI can be used to capture and run domain-specific investigations. ODESSI is demonstrated for a problem in the neuroscience domain involving computational modeling of human head electromagnetics for conductivity analysis and source localization.

Adnan M. Salman, Allen D. Malony, Matthew J. Sottile

Pattern-Based Genetic Algorithm Approach to Coverage Path Planning for Mobile Robots

The sensor-based mobile robot coverage path planning (MRCPP) problem is a challenging problem in robotic management. Here we develop a genetic algorithm (GA) for MRCPP problems. The area subject to coverage is modeled with disks representing the range of the sensing devices. The problem is then defined as finding a path that runs through the center of each disk at least once with minimal cost of full coverage. The proposed GA utilizes prioritized neighborhood-disk information to generate practical and high-quality paths for the mobile robot. Prioritized movement patterns are designed to generate efficient rectilinear coverage paths with no narrow-angle turns; they enable the GA to find optimal or near-optimal solutions. The results of the GA are compared with a well-known approach, the backtracking spiral algorithm (BSA). Experiments are also carried out using P3-DX mobile robots in a laboratory environment.
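A minimal sketch of the ingredients such a GA needs over disk centres: a path cost that charges both travelled distance and turns, and a mutation operator. The prioritized movement patterns of the actual approach are not modelled here:

```python
import math
import random

def path_cost(order, centers, turn_penalty=1.0):
    """Cost of visiting disk centres in the given order: total path
    length plus a penalty for each change of direction (a turn)."""
    cost, turns = 0.0, 0
    for i in range(1, len(order)):
        (x0, y0), (x1, y1) = centers[order[i - 1]], centers[order[i]]
        cost += math.hypot(x1 - x0, y1 - y0)
        if i >= 2:
            xp, yp = centers[order[i - 2]]
            if (x1 - x0, y1 - y0) != (x0 - xp, y0 - yp):
                turns += 1  # robot must stop and rotate here
    return cost + turn_penalty * turns

def mutate(order):
    """Swap mutation: exchange two positions in the visiting order."""
    a, b = random.sample(range(len(order)), 2)
    child = order[:]
    child[a], child[b] = child[b], child[a]
    return child
```

With this cost, a straight rectilinear sweep scores better than a zigzag over the same disks, which is the pressure that drives the search toward turn-free paths.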

Muzaffer Kapanoglu, Metin Ozkan, Ahmet Yazıcı, Osman Parlaktuna

Economic Models with Chaotic Money Exchange

This paper presents a novel study on gas-like models for economic systems. The interacting agents and the amount of exchanged money at each trade are selected with different levels of randomness, from a purely random way to a more chaotic one. Depending on the interaction rules, these statistical models can present different asymptotic distributions of money in a community of individuals with a closed economy.
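A minimal sketch of the purely random end of the exchange spectrum described above (the chaotic selection rules studied in the paper are not reproduced here):

```python
import random

def trade(money, trades=100_000, seed=42):
    """Gas-like money exchange: repeatedly pick two agents at random,
    pool their money and split the pool at a uniformly random
    fraction. Total money is conserved at every trade."""
    rng = random.Random(seed)
    n = len(money)
    for _ in range(trades):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue
        pool = money[i] + money[j]
        eps = rng.random()
        money[i], money[j] = eps * pool, (1.0 - eps) * pool
    return money

# closed economy of 500 agents, each starting with 100 units
wealth = trade([100.0] * 500)
```

Under this purely random rule the asymptotic distribution of money is exponential (Boltzmann-Gibbs like): most agents end up below the mean while a few hold large amounts, which is the kind of asymptotic behaviour the interaction rules control.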

Carmen Pellicer-Lostao, Ricardo López-Ruiz

Knowledge Aware Bisimulation and Anonymity

We propose a knowledge aware bisimulation equivalence relation on the applied pi calculus, which is well known for describing and analyzing security protocols. Our equivalence relation is especially useful in analyzing the property of anonymity. We give an analysis of iKP anonymity as a running example to show the effectiveness of this approach.

Han Zhu, Yonggen Gu, Xiaojuan Cai

A Parallel High-Order Discontinuous Galerkin Shallow Water Model

The depth-integrated shallow water equations are frequently used for simulating geophysical flows, such as storm-surges, tsunamis and river flooding. In this paper a parallel shallow water solver using an unstructured high-order discontinuous Galerkin method is presented. The spatial discretization of the model is based on the Nektar++ spectral/hp element library, and the model is numerically shown to exhibit the expected exponential convergence. The parallelism of the model has been achieved within the Cactus Framework. The model has so far been executed successfully on up to 128 cores, and it is shown that both weak and strong scaling are largely independent of the spatial order of the scheme. Results are also presented for the wave flume interaction with five upright cylinders.

Claes Eskilsson, Yaakoub El-Khamra, David Rideout, Gabrielle Allen, Q. Jim Chen, Mayank Tyagi

CelOWS: A Service Oriented Architecture to Define, Query and Reuse Biological Models

The amount of information generated by biological research has lead to an intensive use of models. Mathematical and computational modeling needs accurate description to share, reuse and simulate models as formulated by original authors. In this paper, we introduce the Cell Component Ontology - CelO, expressed in OWL-DL. This ontology captures both the structure of a cell model and the properties of functional components. We use this ontology in a Web project – CelOWS - to describe, query and compose CellML models. It aims to improve reuse and composition of existent components and allow semantic validation of new models.

Ely Edison Matos, Fernanda Campos, Regina Braga, Rodrigo Weber, Daniele Palazzi

Benefits of Parallel I/O in Ab Initio Nuclear Physics Calculations

Many modern scientific applications rely on highly parallel calculations that scale to tens of thousands of processors. However, most applications do not concentrate on parallelizing input/output operations. In particular, sequential I/O has been identified as a bottleneck for the highly scalable MFDn (Many Fermion Dynamics for nuclear structure) code performing ab initio nuclear structure calculations. In this paper, we develop interfaces and parallel I/O procedures to use a well-known parallel I/O library in MFDn. As a result, we gain efficient input/output of large datasets along with their portability and ease of use in the downstream processing.

Nikhil Laghave, Masha Sosonkina, Pieter Maris, James P. Vary

A Population-Based Approach for Diversified Protein Loop Structure Sampling

Protein loop structure modeling is regarded as a mini protein folding problem with significant scientific importance. Efficiently sampling the loop conformation space is a key step to computationally obtaining accurate loop structure models. Due to the large size of the conformation space and the complication of the scoring functions describing protein energy, it is difficult to obtain broad, diverse coverage of the loop conformations with low energy (score). In this article, we present a new population-based approach to sample the backbone conformations of protein loops. The main advantage of population-based approaches is that various selection schemes can be applied to enforce the conformations in a population to satisfy certain constraints. In our sampling approach, conformations are generated in dihedral-angle (φ, ψ) space, and the Differential Evolution (DE) method is employed to implement dihedral angle crossover for generating new conformations. A diversity selection scheme is applied to achieve diversified sampling. Using a narrowing gap selection scheme, decoys satisfying the loop closure condition are obtained by gradually eliminating conformations with large terminal gaps in a population. Our computational results on modeling long loop targets have shown diverse and broad coverage of the loop conformation space, which leads to consistently reaching native-like decoys in the sampling process.

Yaohang Li

Parameter Space Exploration Using Scientific Workflows

In recent years there has been interest in performing parameter space exploration across “scientific workflows”; however, many existing workflow tools are not well suited to this. In this paper we augment existing systems with a small set of special “actors” that implement the parameter estimation logic. Specifically, we discuss a set of new Kepler actors that support both complete and partial sweeps based on experimental design techniques. When combined with a novel parallel execution mechanism, we are able to execute parallel sweeps and searches across workflows that run on distributed “Grid” infrastructure. We illustrate our new system with a case study in cardiac cell modelling.
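The complete-sweep case such actors dispatch can be sketched as a Cartesian product over parameter levels; the parameter names below are hypothetical, not taken from the cardiac case study:

```python
import itertools

def full_factorial(space):
    """Complete parameter sweep: the Cartesian product of every
    parameter's levels, yielded as one dict per experiment."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))

# a hypothetical sweep over time step and stimulus level
runs = list(full_factorial({"dt": [0.01, 0.1], "stim": [1, 2, 3]}))
```

Each dict is one job to hand to the parallel execution mechanism; partial sweeps would draw a designed subset of this product instead.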

David Abramson, Blair Bethwaite, Colin Enticott, Slavisa Garic, Tom Peachey

Interactive Parallel Analysis on the ALICE Grid with the PROOF Framework

The ALICE experiment at the CERN LHC makes intensive use of a PROOF cluster for fast analysis and reconstruction. The current system (CAF, the CERN Analysis Facility) consists of 120 CPU cores and about 45 TB of local storage. PROOF enables interactive parallel processing of data distributed on clusters of computers or multi-core machines. Subsets of selected data are automatically staged onto CAF from the Grid storage systems. However, a cluster of the size of CAF can hold only a fraction of the 3 PB of data accumulated by ALICE each year. The impracticality of storing and processing such a data volume in a single computing centre leads to the need to extend the PROOF concept to the Grid paradigm.

Marco Meoni

A Comparison of Performance of Sequential Learning Algorithms on the Task of Named Entity Recognition for Indian Languages

We have taken up the issue of named entity recognition for Indian languages by presenting a comparative study of two sequential learning algorithms, namely Conditional Random Fields (CRF) and Support Vector Machines (SVM). Though we only have results for Hindi, the features used are language independent, and hence the same procedure could be applied to tag named entities in other Indian languages such as Telugu, Bengali and Marathi that have the same number of vowels and consonants. We have used CRF++ to implement the CRF algorithm and Yamcha to implement the SVM algorithm. The results show a superiority of CRF over SVM and are just a little lower than the highest results achieved for this task, which can be attributed to the absence of any pre-processing and post-processing steps. The system makes use of the contextual information of words along with various language-independent features to label the Named Entities (NEs).

Awaghad Ashish Krishnarao, Himanshu Gahlot, Amit Srinet, D. S. Kushwaha

Managing Multi-concern Application Complexity in AspectSBASCO

AspectSBASCO is a new programming environment that integrates modern technologies (i.e. software components, parallel skeletons and aspects) to support the development of parallel and distributed scientific applications. This multi-paradigm approach provides high-level parallelism, reusability, interoperability and a clearer separation of concerns. This paper focuses on a case study in which the programming model of AspectSBASCO is applied to the efficient solution of a relatively complex reaction-diffusion problem. In the application, the system of non-linear PDEs is solved in parallel using skeleton abstractions for domain decomposition methods. In addition, other concerns, including distributed simulation persistence, mesh adaptation procedures, dynamic processor re-mapping and state variable communication, are implemented in a modular way using (un)pluggable aspects. This style of application development leads to a better system decomposition, which is the key to improving software evolution, maintenance, productivity and reliability.

Manuel Díaz, Sergio Romero, Bartolomé Rubio, Enrique Soler, José M. Troya

Balancing Scientist Needs and Volunteer Preferences in Volunteer Computing Using Constraint Optimization

BOINC is a middleware for Volunteer Computing. In BOINC projects, heterogeneous resources distributed across the Internet are used for large-scale scientific simulations. The large need for resources in BOINC projects often competes with volunteer preferences: volunteers can impose limits on the use of their idle resources. Most of the time, maximum project performance can be achieved only when volunteer preferences are neglected.

To address this problem, we propose a novel optimization procedure based on constraint optimization techniques that actively allocates volunteer resources to improve project throughput and, at the same time, aims to preserve volunteer preferences. We show the increase in project throughput obtained with our approach and discuss the trade-off between volunteer preferences and project throughput.

James Atlas, Trilce Estrada, Keith Decker, Michela Taufer

Scheduling and Load Balancing


Hiding Communication Latency with Non-SPMD, Graph-Based Execution

Reformulating an algorithm to mask communication delays is crucial to maintaining scalability, but traditional solutions embed the overlap strategy in the application. We present an alternative approach, based on dataflow, that factors the overlap strategy out of the application. Using this approach we are able to reduce communication delays, meeting and in many cases exceeding the performance obtained with traditional hand-coded applications.

Jacob Sorensen, Scott B. Baden

New Optimal Load Allocation for Scheduling Divisible Data Grid Applications

In many data grid applications, data can be decomposed into multiple independent sub-datasets and distributed for parallel execution and analysis. This property has been successfully exploited using Divisible Load Theory (DLT), which has proved a powerful tool for modeling divisible load problems in data-intensive grids. Several scheduling models have been studied, but no optimal solution has been reached due to the heterogeneity of the grids. This paper proposes a new model, called Iterative DLT (IDLT), for scheduling divisible data grid applications. Recursive numerical closed-form solutions are derived to find the optimal workload assigned to the processing nodes. Experimental results show that the proposed IDLT model obtains better, almost optimal, solutions than other models.
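For intuition, the compute-only special case of divisible load scheduling has a simple closed form: with compute time load_i/speed_i and no communication cost, all workers finish simultaneously when each load share is proportional to worker speed. This is the classical DLT starting point, not the IDLT recursion itself:

```python
def allocate(total_load, speeds):
    """Split a divisible load among heterogeneous workers so that
    every worker finishes at the same time (no communication cost):
    load_i = total_load * speed_i / sum(speeds)."""
    s = sum(speeds)
    return [total_load * sp / s for sp in speeds]

# a worker twice as fast receives twice the load
chunks = allocate(1000.0, [4.0, 2.0, 1.0, 1.0])
```

Adding heterogeneous communication costs is what breaks this closed form and motivates iterative models such as IDLT.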



M. Othman, M. Abdullah, H. Ibrahim, S. Subramaniam

Dynamic Resizing of Parallel Scientific Simulations: A Case Study Using LAMMPS

Large-scale computational science simulations are a dominant component of the workload on modern supercomputers. Efficient use of high-end resources for these large computations is of considerable scientific and economic importance. However, conventional job schedulers limit flexibility in that they are 'static', i.e., the number of processors allocated to an application cannot be changed at runtime. In earlier work, we described ReSHAPE, a system that eliminates this drawback by supporting dynamic resizability in distributed-memory parallel applications. The goal of this paper is to present a case study highlighting the steps involved in adapting a production scientific simulation code to take advantage of ReSHAPE. LAMMPS, a widely used molecular dynamics code, is the test case. Minor extensions to LAMMPS allow it to be resized using ReSHAPE, and experimental results show that resizing significantly improves overall system utilization as well as the performance of an individual LAMMPS job.

Rajesh Sudarsan, Calvin J. Ribbens, Diana Farkas

Performance Evaluation of Collective Write Algorithms in MPI I/O

MPI is the de facto standard for message passing in parallel scientific applications. MPI-IO is a part of the MPI-2 specification defining file I/O operations in the MPI world. MPI-IO enables performance optimizations for collective file I/O operations, as it acts as a portability layer between the application and the file system. The goal of this study is to optimize collective file I/O operations. Three different algorithms for performing collective I/O operations have been developed, implemented, and evaluated on a PVFS2 file system and over NFS. The results indicate that different algorithms promise the highest write bandwidth for different numbers of processors, application settings and file systems, making a one-size-fits-all solution inefficient.
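One common family of collective-write strategies is two-phase I/O: partition the overall file range among a few aggregator ranks, which gather the pieces in their slice and issue large contiguous writes. This is a simplified plan builder in that spirit, not one of the three algorithms evaluated in the paper:

```python
def two_phase_plan(per_proc, n_agg):
    """Assign each process's write to an aggregator rank.
    per_proc maps rank -> (offset, nbytes); the file range is split
    evenly into n_agg slices, one slice per aggregator."""
    lo = min(off for off, _ in per_proc.values())
    hi = max(off + n for off, n in per_proc.values())
    span = (hi - lo + n_agg - 1) // n_agg  # bytes per aggregator slice
    plan = {a: [] for a in range(n_agg)}
    for rank, (off, n) in sorted(per_proc.items()):
        plan[min((off - lo) // span, n_agg - 1)].append((rank, off, n))
    return plan
```

Each aggregator can then merge its assigned pieces into one contiguous write, which is where the bandwidth gain over independent small writes comes from.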

Mohamad Chaarawi, Suneet Chandok, Edgar Gabriel

A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling

This paper presents an application-level non-blocking multicast scheme for dynamic DAG scheduling on large-scale distributed-memory systems. The multicast scheme takes into account both network topology and the space requirement of routing tables to achieve scalability. Specifically, we prove that the scheme is deadlock-free and completes within a provably bounded number of steps. The routing table chooses appropriate neighbors to store based on topology IDs and requires only a small amount of space. Although built upon MPI point-to-point operations, the experimental results show that our scheme is significantly better than the simple flat-tree method and is comparable to the vendor's collective MPI operations.
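For intuition on why tree-shaped multicast beats a flat tree, a classic binomial broadcast tree reaches all ranks in logarithmically many steps. This generic sketch is not the paper's topology-aware scheme:

```python
def binomial_children(rank, size):
    """Children of `rank` in a binomial broadcast tree rooted at 0:
    rank forwards the message to rank + 2^k for every power of two
    2^k > rank (while staying inside the communicator). The whole
    broadcast completes in ceil(log2(size)) steps."""
    children, step = [], 1
    while step < size:
        if step > rank and rank + step < size:
            children.append(rank + step)
        step <<= 1
    return children
```

In a flat tree the root sends size-1 messages serially; here the number of senders doubles every step, so each node also stores only a logarithmic number of children.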

Fengguang Song, Jack Dongarra, Shirley Moore

On the Origin of Grid Species: The Living Application

We present the living application, a method to autonomously manage applications on the grid. During its execution on the grid, the living application makes choices on the resources to use in order to complete its tasks. These choices can be based on its internal state, or on autonomously acquired knowledge from external sensors. By giving limited user capabilities to a living application, we enable it to port itself from one resource topology to another. The application performs these actions at runtime without depending on users or external workflow tools. We have applied this new concept in a special case of a living application: the living simulation. Today, many simulations require a wide range of numerical solvers and run most efficiently if specialized nodes are matched to the solvers. The idea of the living simulation is that it decides itself which grid machines to use based on the numerical solver currently in use.

Derek Groen, Stefan Harfst, Simon Portegies Zwart

Applying Processes Rescheduling over Irregular BSP Application

This paper evaluates process rescheduling for an irregular BSP (Bulk Synchronous Parallel) application. The application is based on dynamic programming, and its irregularity manifests as variation of computation density across the matrix cells. We use the MigBSP model for process rescheduling, which combines multiple metrics (computation, communication and memory) to decide on process migration. The main contribution of this paper is demonstrating the viability of process migration for irregular BSP applications. Instead of adjusting the load of each process by hand, we show that automatic process rebalancing is an effortless technique for obtaining performance. The results showed gains greater than 10% on our multi-cluster architecture. Moreover, an acceptable overhead from MigBSP was observed when no migrations happen during application execution.

Rodrigo da Rosa Righi, Laércio Lima Pilla, Alexandre Silva Carissimi, Philippe O. A. Navaux, Hans-Ulrich Heiss

Software Services and Tools


Support for Urgent Computing Based on Resource Virtualization

Virtualization technologies provide flexible execution environments that could bring important benefits for computational problems with strong deadlines. Large Grid infrastructures are becoming available nowadays, and they could be a suitable environment for running such on-demand computations, which might be used in decision-making processes. For these computations, we need to deliver as many resources as possible at particular times. These resources may be provided by different institutions belonging to a grid infrastructure, but two important issues must be addressed. First, all resources must be correctly configured and all components needed by the application must be properly installed; if even a small required piece is missing, applications will fail. Second, the execution of urgent applications must start quickly in order to produce useful results in time; if applications must wait in a queue, the results might be useless because they are obtained too late. To address these issues, we describe a job management service, based on virtualization techniques, that avoids configuration problems and increases the number of resources available to run applications with critical deadlines. We describe the main components of our service, which can be used on top of common batch queue systems, and we show experimental results that demonstrate the benefits of applying time-sharing techniques on the virtual machines to increase the performance of urgent computations.

Andrés Cencerrado, Miquel Ángel Senar, Ana Cortés

Dynamic Software Updates for Accelerating Scientific Discovery

Distributed parallel applications often run for hours or even days before arriving to a result. In the case of such long-running programs, the initial requirements could change after the program has started executing. To shorten the time it takes to arrive to a result when running a distributed computationally-intensive application, this paper proposes leveraging the power and flexibility of dynamic software updates. In particular, to enable flexible dynamic software updates, we introduce a novel binary rewriting approach that is more efficient than the existing techniques. While ensuring greater flexibility in enhancing a running program for new requirements, our binary rewriting technique incurs only negligible performance overhead. We validate our approach via a case study of dynamically changing a parallel scientific simulation.

Dong Kwan Kim, Myoungkyu Song, Eli Tilevich, Calvin J. Ribbens, Shawn A. Bohner

Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes

The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; therefore, computational scientists are often forced to choose between investing considerable time in tuning code or accepting lower performance. In this paper, we describe the first steps toward a fully automated system for the optimization of the matrix algebra kernels that are a foundational part of many scientific applications. To generate highly optimized code from a high-level MATLAB prototype, we define a three-step approach. To begin, we have developed a compiler that converts a MATLAB script into simple C code. We then use the polyhedral optimization system Pluto to optimize that code for coarse-grained parallelism and locality simultaneously. Finally, we annotate the resulting code with performance-tuning directives and use the empirical performance-tuning system Orio to generate many tuned versions of the same operation using different optimization techniques, such as loop unrolling and memory alignment. Orio performs an automated empirical search to select the best among the multiple optimized code variants. We discuss performance results on two architectures.

Boyana Norris, Albert Hartono, Elizabeth Jessup, Jeremy Siek

Automated Provenance Collection for CCA Component Assemblies

The problem of capturing provenance for computational tasks has recently received significant attention, due to the new set of beneficial uses (for optimization, debugging, etc.) of the recorded data. We develop a provenance collection system aimed at scientific applications based on the Common Component Architecture (CCA) that relieves scientists of the responsibility of manually instrumenting code in order to collect provenance data. Our system collects provenance data at the granularity of component instances, by automatically recording all method invocations between them, including all input and output parameters. By relying on asynchronous communication and using optimizations to handle large data arrays, the overhead of our system is low enough to allow continuous provenance collection.
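The invocation-level recording idea can be sketched in miniature with a decorator; this is a toy stand-in for the CCA instrumentation, not the actual system:

```python
import functools
import time

def record_provenance(log):
    """Decorator sketch of automatic provenance capture: every call to
    the wrapped method is appended to `log` with its inputs, output
    and a timestamp, with no manual instrumentation inside the body."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            log.append({"method": fn.__name__, "args": args,
                        "kwargs": kwargs, "result": result,
                        "time": time.time()})
            return result
        return inner
    return wrap

provenance = []

@record_provenance(provenance)
def solve(a, b):
    return a + b
```

In a component framework the same interception happens at the component boundary, so every inter-component method invocation is captured transparently.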

Kostadin Damevski, Hui Chen

Modular, Fine-Grained Adaptation of Parallel Programs

We present a modular approach to realizing fine-grained adaptation of program behavior in a parallel environment. Using a compositional framework based on function call interception and manipulation, the adaptive logic to monitor internal program states and control the behavior of program modules is written and managed as a separate code, thus supporting centralized design of complex adaptation strategies for adapting to dynamic changes within an application. By ‘catching’ the functions that execute in synchronization across the parallel environment and inserting the adaptive logic operations at the intercepted control points, the proposed method provides a convenient way of synchronous adaptation without disturbing the parallel execution and communication structure already established in the original program. Applying our method to a CFD (computational fluid dynamics) simulation program to implement example adaptation scenarios, we demonstrate how effectively applications can change their behavior through fine-grained control.

Pilsung Kang, Naresh K. C. Selvarasu, Naren Ramakrishnan, Calvin J. Ribbens, Danesh K. Tafti, Srinidhi Varadarajan

Evaluating Algorithms for Shared File Pointer Operations in MPI I/O

MPI-I/O is a part of the MPI-2 specification defining file I/O operations for parallel MPI applications. Compared to regular POSIX-style I/O functions, MPI I/O offers features like the distinction between individual file pointers on a per-process basis and a shared file pointer across a group of processes. The objective of this study is the evaluation of various algorithms for shared file pointer operations in MPI-I/O. We present three algorithms to provide shared file pointer operations on file systems that do not support file locking. The evaluation of the algorithms is carried out utilizing a parallel PVFS2 file system on an InfiniBand cluster and a local ext3 file system on an 8-core SMP.
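One way to provide a shared file pointer without file locking is to keep the pointer in a shared counter and let each writer claim its range with an atomic fetch-and-add before writing independently. This thread-based toy is only an analogue of the MPI-level algorithms evaluated in the paper:

```python
import threading

class SharedOffset:
    """Toy shared file pointer: each writer atomically claims the
    byte range [offset, offset + nbytes) via fetch-and-add on a
    shared counter, then writes its range on its own."""
    def __init__(self):
        self._off = 0
        self._lock = threading.Lock()

    def claim(self, nbytes):
        with self._lock:
            off = self._off
            self._off += nbytes
            return off
```

Because every claim is atomic, concurrent writers always receive disjoint, back-to-back ranges, which is exactly the invariant a shared file pointer must preserve.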

Ketan Kulkarni, Edgar Gabriel

Computer Networks


Load Balancing Scheme Based on Real Time Traffic in Wibro

WiBro operates in the 2.3 GHz band over a cellular communication infrastructure. WiBro is based on the IEEE 802.16e standard and is designed to maintain connectivity in mobile environments at speeds of up to 60 km/h. An ACR (Access Control Router) manages several RASs (Radio Access Stations). When a mobile node (MN) moves from its present domain to another domain managed by a different ACR, the MN sends a Binding Update to the HA (Home Agent) or CN (Correspondent Node). However, the ACR may become a single point of performance bottleneck, because it must not only handle signaling traffic but also process data tunneling traffic for all MNs registered in its domain. In this paper, we propose an ACR load balancing method based on priority queues. Quantitative results of the performance analysis show that our proposal achieves superior performance.
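A least-loaded priority-queue policy of the general kind proposed can be sketched as follows; the paper's exact queueing discipline may differ:

```python
import heapq

def assign(mobile_nodes, acrs):
    """Illustrative least-loaded policy: a heap keyed on current load
    always hands the next mobile node to the lightest ACR."""
    heap = [(0, name) for name in sorted(acrs)]
    heapq.heapify(heap)
    placement = {}
    for mn in mobile_nodes:
        load, name = heapq.heappop(heap)  # lightest ACR right now
        placement[mn] = name
        heapq.heappush(heap, (load + 1, name))
    return placement
```

Spreading MNs this way keeps any single ACR from having to tunnel the data traffic of a disproportionate share of registered nodes.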

Wongil Park, Hyoungjin Kim

Hybrid Retrieval Mechanisms in Vehicle-Based P2P Networks

Mobile P2P networks have potential applications in many fields, making them a focus of current research. However, mobile P2P networks are subject to the limitations of transmission range, wireless bandwidth, and highly dynamic network topology, giving rise to many new challenges for efficient search. In this paper, we propose a hybrid search approach, which is automatic and economical in mobile P2P networks. The region covered by a mobile P2P network is partitioned into subregions, each of which can be identified by a unique ID and known to all peers. All the subregions then construct a mobile Kademlia (MKad) network. The proposed hybrid retrieval approach aims to utilize flooding-based and DHT-based schemes in MKad for indexing and searching according to designed utility functions. Our experiments show that the proposed approach is more accurate and efficient than existing methods.
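Mapping a coordinate to the unique ID of its subregion can be sketched with a simple rectangular grid; the cell size and row width below are assumptions for illustration, not necessarily the paper's partitioning:

```python
def subregion_id(x, y, cell=100.0, cols=64):
    """Map a coordinate to its subregion's unique ID, assuming the
    covered region is cut into cell-by-cell squares laid out in rows
    of `cols` subregions (row-major numbering)."""
    return int(y // cell) * cols + int(x // cell)
```

Since every peer can compute the same ID locally, the subregions can serve directly as the nodes of the mobile Kademlia overlay without any coordination.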

Quanqing Xu, Heng Tao Shen, Zaiben Chen, Bin Cui, Xiaofang Zhou, Yafei Dai

An Algorithm for Unrestored Flow Optimization in Survivable Networks Based on p-Cycles

This paper provides a compound algorithm for the Unrestorable Flow Optimisation (UFO) problem, formulated for computer networks protected by p-cycles and built on the mathematical model and solution approaches proposed in our complementary paper [1]. The components of the algorithm were selected carefully and then tested experimentally in order to compose the best final algorithm. Results of extensive computational tests on common benchmarks are presented, along with some practical conclusions drawn from the research.

Adam Smutnicki

Unrestored Flow Optimization in Survivable Networks Based on p-Cycles

This paper deals with the Unrestorable Flow Optimisation (UFO) problem in networks protected by p-cycles. This novel protection technique is an efficient tool for ensuring the survivability of computer networks. We formulate a mathematical model of the UFO problem, discuss its theoretical properties, and propose an original solution algorithm based chiefly on metaheuristics. The algorithm combines a k-shortest-paths method, a multi-knapsack problem, a p-cycle generator, linear programming and some local search procedures.

Adam Smutnicki

Simulation of Complex Systems


Hierarchical Modelling and Model Adaptivity for Gas Flow on Networks

We are interested in the simulation and optimization of gas transport in networks. Different regions of the network may be modelled by different equations. There are three models based on the Euler equations that describe the gas flow in pipelines at qualitatively different levels of detail: a nonlinear model, a semilinear model and a stationary (also called algebraic) model. For the whole network, adequate initial and boundary values as well as coupling conditions at the junctions are needed. Using adjoint techniques, one can specify model error estimators for the simplified models. A strategy to adaptively apply the different models in different regions of the network, while maintaining the accuracy of the solution, is presented.

Pia Bales, Oliver Kolb, Jens Lang

A Hierarchical Methodology to Specify and Simulate Complex Computational Systems

We introduce a novel methodology to formally specify complex multi-agent systems. Our approach allows us to redefine computational problems in terms of agents that perform certain tasks. In our view, a system is formed by the combination of atomic and complex agents. Atomic agents are in charge of executing atomic tasks while complex agents reunite and summarize the properties of their underlying atomic agents. Basically, our approach consists in specifying the smaller parts of the problem as atomic agents. Each atomic agent is in charge of executing a small transformation of resources. Afterwards, the system will recombine them to form complex agents that will embrace the knowledge of several atomic agents. All agents are located on a superstructure of communication cellules created to record the hierarchy of the tasks. In order to provide a useful framework, we have developed a tool that fully implements all the stages of the methodology.

César Andrés, Carlos Molinero, Manuel Núñez

GRID Resource Searching on the GridSim Simulator

Nowadays, the Grid is the focus of much research. Our work is centered on resource management for Grids, an open and active research area. Decentralized, scalable and efficient resource search algorithms are one of the key issues for resource management in large Grid systems. Resource search is required in order to allocate applications and data efficiently and to maintain quality of service at runtime, to mention just some examples. In this work, we propose a scheme that presents essential characteristics for self-configuring search and is able to handle dynamic resources, such as memory capacity. Our approach consists of a hypercube topology connecting the nodes and a scalable, self-configuring search procedure. The algorithm improves the probability of reaching the alive nodes in the Grid even in the presence of non-alive ones (inaccessible, crashed or heavily loaded nodes). In this paper, after describing the theory, we present results obtained by running our search protocol on the GridSim simulator. We have evaluated six different metrics over several resource searches and report the arithmetic mean for each measure.

Antonia Gallardo, Luis Díaz de Cerio, Roque Messeguer, Andreu Pere Isern-Deyà, Kana Sanjeevan
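In a hypercube overlay like the one described above, each node ID is a bit string and neighbors differ in exactly one bit, so a node's neighborhood can be enumerated by flipping each bit in turn. A minimal sketch (illustrative, not the authors' GridSim code):

```python
def hypercube_neighbors(node, dim):
    """IDs adjacent to `node` in a dim-dimensional hypercube: flip each bit in turn."""
    return [node ^ (1 << i) for i in range(dim)]
```

A search wave started at one node can then reach any of the 2**dim nodes in at most dim hops, which is what makes hypercube-based resource search scalable and tolerant of individual dead nodes.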

Evaluation of Different BDD Libraries to Extract Concepts in FCA – Perspectives and Limitations

This paper presents an evaluation of different types of Binary Decision Diagrams (BDDs) applied to Formal Concept Analysis (FCA). The aim is to increase the capability of FCA to handle large formal contexts and to perform faster operations over different types of this data structure. The main idea is to represent the formal context using BDDs for later extraction of the set of all formal concepts from this implicit representation. A comparison of a concept extraction algorithm using contexts implemented as a table and as a BDD is presented. The BDD representation is evaluated over two different implementation libraries, BuDDy and CUDD. A Zero-Suppressed BDD (ZBDD) version of the concept extraction algorithm is also provided. The BDD approach has been evaluated on several types of randomly generated synthetic contexts with large numbers of objects. Contexts are evaluated according to the computational time required to build them and to extract the set of all concepts. This work shows that BDDs can be used to deal with large formal contexts, especially those with few attributes and many objects. To overcome the limitation to contexts with fewer attributes, one could consider vertical partitions of the context to be used with distributed FCA algorithms based on BDDs.

Andrei Rimsa, Luis E. Zárate, Mark A. J. Song
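For readers unfamiliar with FCA, a formal concept is a pair (extent, intent) closed under the two derivation operators. The table-based extraction that the BDD variants are compared against can be sketched naively as follows, by closing every subset of objects; the exponential cost of this enumeration is precisely why compact representations such as BDDs matter:

```python
from itertools import combinations

def intent(objs, context):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    if not objs:
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objs))

def extent(attrs, context):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects."""
    concepts = set()
    for r in range(len(context) + 1):
        for combo in combinations(context, r):
            i = intent(set(combo), context)
            concepts.add((frozenset(extent(i, context)), frozenset(i)))
    return concepts
```

For the toy context {o1: {a, b}, o2: {b, c}} this yields four concepts, including ({o1, o2}, {b}).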

Comparing Genetic Algorithms and Newton-Like Methods for the Solution of the History Matching Problem

In this work we present a comparison of different optimization methods for the automatic history matching problem of reservoir simulation. History matching is an inverse problem that searches for a set of parameters minimizing the difference between the model performance and the historical performance of the field. This model validation process is essential and gives credibility to the predictions of the reservoir model. Derivative-based methods are compared to a derivative-free algorithm. In particular, we compare the quasi-Newton method, nonlinear conjugate gradient, steepest descent and a genetic algorithm implementation. Several tests are performed and the preliminary results are presented and discussed.

Elisa Portes dos Santos, Carolina Ribeiro Xavier, Paulo Goldfeld, Flavio Dickstein, Rodrigo Weber dos Santos

Complex System Simulations with QosCosGrid

The aim of the QosCosGrid project is to bring supercomputer-like performance and structure to cross-cluster computations. To support parallel complex systems simulations, QosCosGrid provides six reusable templates that may be instantiated with simulation-specific code to help with developing parallel applications using the ProActive Java library. The templates include static and dynamic graphs, cellular automata and mobile agents. In this work, we show that little performance is lost when a ProActive cellular automata simulation is executed across two distant administrative domains. We describe the middleware developed in the QosCosGrid project, which provides advance reservation and resource co-allocation functionality as well as support for parallel applications based on OpenMPI (for C/C++ and Fortran) or ProActive for Java. In particular, we describe how we modified ProActive Java to enable inter-cluster communication through firewalls. The bulk of the QosCosGrid software is available as open source from the QosCosGrid project website.

Krzystof Kurowski, Walter de Back, Werner Dubitzky, Laszlo Gulyás, George Kampis, Mariusz Mamonski, Gabor Szemes, Martin Swain

Geostatistical Computing in PSInSAR Data Analysis

The presented paper describes the geostatistical analysis of PSInSAR data, preceded by a short description of the PSInSAR technique. The geostatistical computations shown in this article were performed with open-source software containing a geostatistics package. The analysis comprises variogram computation (directional variograms) and ordinary kriging interpolation. The computationally costly problems in the geostatistical analysis of PSInSAR data are discussed.

Andrzej Lesniak, Stanislawa Porzycka

Improving the Scalability of SimGrid Using Dynamic Routing

Research into large-scale distributed systems often relies on simulation frameworks in order to bypass the disadvantages of performing experiments on real testbeds. SimGrid is one such framework, widely used and mature. However, we have identified a scalability problem in SimGrid's network simulation layer that limits the number of hosts one can incorporate in a simulation. For modeling large-scale systems such as grids this is unfortunate, as the simulation of systems with tens of thousands of hosts is required. This paper describes how we have overcome this limitation through more efficient storage methods for network topology and routing information. It also describes our use of dynamic routing calculations as an alternative to the current SimGrid method, which relies on a static routing table. This reduces the memory footprint of the network simulation layer significantly, at the cost of a modest increase in the runtime of the simulation. We evaluate the effect of our approach quantitatively in a number of experiments.

Silas De Munck, Kurt Vanmechelen, Jan Broeckhove

Image Processing and Visualization


Semantic Visual Abstraction for Face Recognition

In contrast to the one-dimensional structure of natural language, images consist of two- or three-dimensional structures. This contrast in dimensionality causes the mapping between words and images to be a challenging, poorly understood and undertheorized task. In this paper, we present a general theoretical framework for semantic visual abstraction in massive image databases. Our framework applies specifically to facial identification and visual search for such recognition. It accommodates the by now commonplace observation that, through a graph-based visual abstraction, language allows humans to categorize objects and to provide verbal annotations to shapes. Our theoretical framework assumes a hidden layer between facial features and the referencing of expressive words. This hidden layer contains key points of correspondence that can be articulated mathematically, visually or verbally. A semantic visual abstraction network is designed for efficient facial recognition in massive visual datasets. In this paper, we demonstrate how a two-way mapping of words and facial shapes is feasible in facial information retrieval and reconstruction.

Yang Cai, David Kaufer, Emily Hart, Elizabeth Solomon

High Frequency Assessment from Multiresolution Analysis

We propose a method for the assessment and visualization of high frequency regions of a multiresolution image. We combine both orientation tensor and multiresolution analysis to give a scalar descriptor of high frequency regions. High values of this scalar space indicate regions having coincident detail vectors in multiple scales of a wavelet decomposition. This is useful for finding edges, textures, collinear structures and salient regions for computer vision methods. The image is decomposed into several scales using the Discrete Wavelet Transform (DWT). The resulting detail spaces form vectors indicating intensity variations which are combined using orientation tensors. A high frequency scalar descriptor is then obtained from the resulting tensor for each original image pixel. Our results show that this descriptor indicates areas having relevant intensity variation in multiple scales.

Tássio Knop de Castro, Eder de Almeida Perez, Virgínia Fernandes Mota, Alexandre Chapiro, Marcelo Bernardes Vieira, Wilhelm Passarella Freire
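The building block of the multiresolution analysis above is the wavelet decomposition into approximation and detail coefficients; large detail magnitudes mark high-frequency regions. A one-level 1-D Haar step illustrates the idea (the paper combines multi-scale 2-D detail vectors with orientation tensors, which this sketch omits):

```python
def haar_step(signal):
    """One level of the 1-D Haar DWT: pairwise averages (approximation)
    and pairwise half-differences (detail)."""
    approx = [(signal[2*i] + signal[2*i + 1]) / 2 for i in range(len(signal) // 2)]
    detail = [(signal[2*i] - signal[2*i + 1]) / 2 for i in range(len(signal) // 2)]
    return approx, detail
```

Repeating the step on the approximation yields the multi-scale detail spaces whose coincident directions the proposed descriptor accumulates through orientation tensors.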

Virtual Human Imaging

Given 3D scanned anthropological models and the physical parameters of a microwave imaging system, we develop a virtual human surface imagery system with a finite multi-physics surface model. Concealed object detection algorithms are developed based on wave intensity and surface characteristics. The virtual human imaging system can be integrated into a systematic design process, enabling multidisciplinary innovations in security, privacy, healthcare, digital human modeling, computer vision, information visualization, and computational aesthetics. This forward-thinking approach intends to transform the development of human imaging technologies from being device-specific and proprietary to being device-independent and open-source oriented. This study can help in designing privacy-aware imaging systems for airports and medical applications.

Yang Cai, Iryna Pavlyshak, Li Ye, Ryan Magargle, James Hoburg

Interactive Visualization of Network Anomalous Events

We present an interactive visualization and clustering algorithm that reveals real-time network anomalous events. In the model, glyphs are defined with multiple network attributes and clustered with a recursive optimization algorithm for dimensional reduction. The user’s visual latency time is incorporated into the recursive process so that it updates the display and the optimization model according to a human-based delay factor and maximizes the capacity of real-time computation. The interactive search interface is developed to enable the display of similar data points according to the degree of their similarity of attributes. Finally, typical network anomalous events are analyzed and visualized such as password guessing, etc. This technology is expected to have an impact on visual real-time data mining for network security, sensor networks and many other multivariable real-time monitoring systems.

Yang Cai, Rafael de M. Franco

Numerical Algorithms


Towards Low-Cost, High-Accuracy Classifiers for Linear Solver Selection

The time to solve linear systems depends to a large extent on the choice of the solution method and the properties of the coefficient matrix. Although there are several linear solution methods, in most cases it is impossible to predict a priori which linear solver would be best suited for a given linear system. Recent investigations on selecting linear solvers for a given system have explored the use of classification techniques based on linear system parameters. In this paper, we present a method to develop low-cost, high-accuracy classifiers. We show that the cost of constructing a classifier can be reduced significantly by focusing on the computational complexity of each feature. In particular, we filter out low-information linear system parameters and then order the remaining parameters to decrease the total computation cost for classification at a prescribed accuracy. Our results indicate that the speedup factor of the time to compute the feature set using our ordering can be as high as 262. The accuracy and computation time of the feature set generated using our method are comparable to a near-optimal one, demonstrating the effectiveness of our technique.

Sanjukta Bhowmick, Brice Toth, Padma Raghavan
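The filtering-and-ordering idea can be sketched abstractly: drop features below an information threshold, then compute the survivors cheapest-per-unit-of-information first. The feature names, costs, gains and threshold below are entirely hypothetical, chosen only to show the mechanics, not taken from the paper:

```python
def order_features(features, min_gain=0.05):
    """Filter out low-information features, then order the rest so that
    cheap, informative features are computed first.
    `features` maps name -> (computation_cost, information_gain)."""
    kept = {n: cg for n, cg in features.items() if cg[1] >= min_gain}
    return sorted(kept, key=lambda n: kept[n][0] / kept[n][1])

# Hypothetical linear-system parameters: (cost, information gain).
features = {
    'nnz':      (1.0, 0.5),
    'cond_est': (50.0, 0.6),  # expensive condition-number estimate
    'symmetry': (2.0, 0.4),
    'trace':    (0.5, 0.01),  # cheap but uninformative -> filtered out
}
```

The classifier can then stop computing features as soon as the prescribed accuracy is reached, which is where the reported speedups come from.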

Testing Line Search Techniques for Finite Element Discretizations for Unsaturated Flow

Unsaturated flow in porous media is often modeled using the finite element (FE) method. When employing an implicit version of the computational equations resulting from the FE discretization, a nonlinear system of equations is generated that must then be solved by techniques such as Newton’s method. This paper reveals results of the effectiveness of three line search techniques when employed within the framework of the approximate Newton method. The methods of a bisection line search, a quadratic variation of the norm of the residual line search, and a relaxation technique are considered, all in the context of a parallel computing environment.

Fred T. Tracy
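One of the three techniques compared, the bisection line search, simply halves the Newton step until the residual norm decreases. A one-dimensional sketch of the idea (the paper works with large finite element systems in a parallel environment; nothing here is the paper's code):

```python
def newton_bisection(f, df, x0, tol=1e-10, max_iter=50):
    """Approximate Newton iteration with a bisection line search:
    halve the step length until the residual magnitude decreases."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        step = -fx / df(x)                      # full Newton step
        lam = 1.0
        while abs(f(x + lam * step)) >= abs(fx) and lam > 1e-8:
            lam /= 2.0                          # bisect the step length
        x += lam * step
    return x
```

The damping guards against the overshooting that a full Newton step can produce far from the solution, which is exactly the situation line search techniques address in strongly nonlinear unsaturated-flow systems.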

Non-splitting Tridiagonalization of Complex Symmetric Matrices

A non-splitting method for tridiagonalizing complex symmetric matrices is developed and analyzed. The main objective is to exploit the purely structural symmetry in terms of runtime performance. Based on the analytical derivation of the method, Fortran implementations of a blocked variant are developed and extensively evaluated experimentally. In particular, it is illustrated that a straightforward implementation based on the analytical derivation exhibits deficiencies in terms of numerical properties. Nevertheless, it is also shown that the blocked non-splitting method shows very promising results in terms of runtime performance: on average, a speed-up of more than three is achieved over competing methods. Although more work is needed to improve the numerical properties of the non-splitting tridiagonalization method, the runtime performance achieved with this non-unitary tridiagonalization process is very encouraging and indicates important research directions for this class of eigenproblems.

W. N. Gansterer, A. R. Gruber, C. Pacher

Parallel MLEM on Multicore Architectures

The efficient use of multicore architectures for sparse matrix-vector multiplication (SpMV) is currently an open challenge. One algorithm which makes use of SpMV is the maximum likelihood expectation maximization (MLEM) algorithm. When using MLEM for positron emission tomography (PET) image reconstruction, one requires a particularly large matrix. We present a new storage scheme for this type of matrix which cuts the memory requirements by half, compared to the widely-used compressed sparse row format. For parallelization we combine the two partitioning techniques recursive bisection and striping. Our results show good load balancing and cache behavior. We also give speedup measurements on various modern multicore systems.

Tilman Küstner, Josef Weidendorfer, Jasmine Schirmer, Tobias Klug, Carsten Trinitis, Sybille Ziegler
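The MLEM update itself is a short formula: each iteration forward-projects the current image, compares with the measured data, and back-projects the ratios. A dense pure-Python sketch of that update (the paper's contribution is a halved-memory sparse storage scheme and its parallelization, which this tiny example does not attempt):

```python
def mlem(A, y, n_iter=100):
    """MLEM update: x_j <- (x_j / s_j) * sum_i A[i][j] * y[i] / (A x)_i,
    where s_j = sum_i A[i][j] is the sensitivity of image element j."""
    m, n = len(A), len(A[0])
    x = [1.0] * n                         # uniform initial image
    sens = [sum(A[i][j] for i in range(m)) for j in range(n)]
    for _ in range(n_iter):
        proj = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        ratio = [y[i] / proj[i] for i in range(m)]
        x = [x[j] / sens[j] * sum(A[i][j] * ratio[i] for i in range(m))
             for j in range(n)]
    return x
```

Each iteration is dominated by two matrix-vector products (the forward projection A x and the back projection with A transposed), which is why SpMV performance governs MLEM runtime on multicore systems.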

Experience with Approximations in the Trust-Region Parallel Direct Search Algorithm

Recent years have seen growth in the number of algorithms designed to solve challenging simulation-based nonlinear optimization problems. One such algorithm is the Trust-Region Parallel Direct Search (TRPDS) method developed by Hough and Meza. In this paper, we take advantage of the theoretical properties of TRPDS to make use of approximation models in order to reduce the computational cost of simulation-based optimization. We describe the extension and present the results of a case study for two earth penetrator design problems. In the case study, we conduct computational experiments with an array of approximations within the extended TRPDS algorithm and compare the numerical results to the original TRPDS algorithm and a trust-region method implemented using the speculative gradient approach described by Byrd, Schnabel, and Shultz. The results suggest new ways to improve the algorithm.

S. M. Shontz, V. E. Howle, P. D. Hough

A 3D Vector-Additive Iterative Solver for the Anisotropic Inhomogeneous Poisson Equation in the Forward EEG problem

We describe a novel 3D finite difference method for solving the anisotropic inhomogeneous Poisson equation based on a multi-component additive implicit method with a 13-point stencil. The serial performance is found to be comparable to the most efficient solvers from the family of preconditioned conjugate gradient (PCG) algorithms. The proposed multi-component additive algorithm is unconditionally stable in 3D and amenable for transparent domain decomposition parallelization up to one eighth of the total grid points in the initial computational domain. Some validation and numerical examples are given.

Vasily Volkov, Aleksei Zherdetsky, Sergei Turovets, Allen Malony

Hash Functions Based on Large Quasigroups

In this article we discuss a simple hash function based upon the properties of a well-known combinatorial design called a quasigroup. Quasigroups are equivalent to the more familiar Latin squares, and one of their most important properties is that every element of a quasigroup occurs with equal probability. Typical implementations are based on a look-up table representation of the quasigroup, which is unusable for large quasigroups. In contrast, the presented hash function can be implemented easily: it allows us to compute the hash without storing a large amount of data (the look-up table). The hash function computation is illustrated by experiments summarized in the last section of this paper.

Václav Snášel, Ajith Abraham, Jiří Dvorský, Pavel Krömer, Jan Platoš
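The key point of the abstract is that the quasigroup operation is computed analytically instead of being stored as a look-up table. A toy illustration with an affine quasigroup over the integers mod n (the constants and the chaining scheme are invented for illustration and are nothing like a cryptographically serious construction):

```python
def qg_op(a, b, n=257):
    """Analytically defined quasigroup operation on Z_n: (3a + 5b + 7) mod n.
    This is a Latin square whenever gcd(3, n) = gcd(5, n) = 1, so no table is stored."""
    return (3 * a + 5 * b + 7) % n

def qg_hash(data, n=257, seed=1):
    """Chain the message bytes through the quasigroup operation."""
    h = seed
    for byte in data:
        h = qg_op(h, byte % n, n)
    return h
```

The Latin square property guarantees that, for a fixed left operand, every output value is reachable exactly once, which is the equiprobability property the abstract highlights.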

Second Derivative Approximation for Origin-Based Algorithm

The origin-based algorithm (OBA) for the traffic assignment problem has been demonstrated to be preferable to the widely accepted Frank-Wolfe algorithm and to path-based algorithms. Note, however, that OBA cannot avoid path enumeration of the network concerned, which leads to two disadvantages: intensive memory requirements, and difficulties in manipulating and storing paths. In order to solve these problems, we first propose lower and upper bounds of the Hessian matrix that can be calculated without path enumeration. We then use the lower bound of the Hessian matrix to approximate the search direction of the origin-based algorithm. According to our computational results, the modified origin-based algorithm (MOBA) improves convergence performance greatly. The results indicate that MOBA delivers better and more reliable convergence than OBA and saves considerable CPU time, especially when large-scale networks are considered.

Feng Li

Evaluation of Hierarchical Mesh Reorderings

Irregular and sparse scientific computing programs frequently experience performance losses due to inefficient use of the memory system in most machines. Previous work has shown that, for a graph model, performing a partitioning and then reordering within each partition improves performance. More recent work has shown that reordering heuristics based on a hypergraph model result in better reorderings than those based on a graph model. This paper studies the effects of hierarchical reordering strategies within the hypergraph model. In our experiments, the reorderings are applied to the nodes and elements of tetrahedral meshes, which are inputs to a mesh optimization application. We show that cache performance degrades over time with consecutive packing, but not with breadth-first ordering, and that hierarchical reorderings involving hypergraph partitioning followed by consecutive packing or breadth-first orderings in each partition improve overall execution time.

Michelle Mills Strout, Nissa Osheim, Dave Rostron, Paul D. Hovland, Alex Pothen

Minkowski Functionals Study of Random Number Sequences

Random number sequences are used in a wide range of applications such as simulation, sampling, numerical analysis, cryptography, and recreation. The quality of random number sequences is critical to the correctness of these applications. Many statistical tests have been developed to test characteristics of random number generators such as randomness, independence and uniformity. Most of them operate on a single sequence. When multiple sequences are employed in an application, their potential correlations are also a concern. In this paper, we explore techniques using the Minkowski functionals and their extensions, the Minkowski valuations, to study the mathematical morphology of two-dimensional binary images generated by pair-wise random number sequences, and apply this method to describe and compare the properties of several well-known pseudo- and quasi-random number generators.

Xinyu Zhang, Seth Watts, Yaohang Li, Daniel Tortorelli
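In 2-D the three Minkowski functionals are, up to normalization, the area, perimeter and Euler characteristic of the binary image. Treating each on-pixel as a closed unit square, all three can be computed by elementary counting (a sketch, not the authors' implementation):

```python
def minkowski_2d(img):
    """Area, perimeter and Euler characteristic of a 0/1 image (list of rows),
    with each on-pixel treated as a closed unit square."""
    on = {(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v}
    area = len(on)
    # Perimeter: every pixel side not shared with another on-pixel.
    perim = sum(4 - sum((x + dx, y + dy) in on
                        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
                for x, y in on)
    # Euler characteristic of the cell complex: chi = V - E + F.
    verts = {(x + dx, y + dy) for x, y in on for dx in (0, 1) for dy in (0, 1)}
    edges = {((x + dx, y), (x + dx, y + 1)) for x, y in on for dx in (0, 1)} | \
            {((x, y + dy), (x + 1, y + dy)) for x, y in on for dy in (0, 1)}
    return area, perim, len(verts) - len(edges) + area
```

A good generator should produce images whose functionals match the analytic expectation for an uncorrelated random field; systematic deviations betray correlations between the paired sequences.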

Autonomous Leaves Graph Applied to the Boundary Layer Problem

In physics and fluid mechanics, the boundary layer is the fluid layer in the immediate vicinity of a bounding surface. It is important in many aerodynamic problems. This work presents a numerical simulation of the two-dimensional laminar boundary-layer problem, considering a steady incompressible flow with a no-slip condition on the surface, using the Autonomous Leaves Graph based on finite volume discretizations. In addition, a Modified Hilbert Curve numbers the control volumes. First, the numerical solution of the flat-plate problem is compared to its analytical solution, the Blasius solution. Second, simulations of the flow along a NACA airfoil shape are presented. Computer experiments show that adaptive mesh refinement using the Autonomous Leaves Graph with the Modified Hilbert Curve numbering is appropriate for an aerodynamic problem. Finally, the results illustrate that the method provides a good trade-off between speed and accuracy.

Sanderson L. Gonzaga de Oliveira, Mauricio Kischinhevsky
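A Modified Hilbert Curve is used above to number the control volumes. The classical index-to-coordinate conversion that such numberings build on is a short loop of bit tests and quadrant rotations; the sketch below is the standard Hilbert curve algorithm, not the paper's modified variant:

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve filling a 2**order x 2**order grid
    to (x, y) coordinates, undoing the quadrant rotations level by level."""
    x = y = 0
    s, t = 1, d
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate the sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y
```

Consecutive indices map to adjacent cells, which is why such numberings keep neighboring control volumes close in memory and improve cache behavior of the sweep.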

Finite-Element Non-conforming h-Adaptive Strategy Based on Autonomous Leaves Graph

Adaptive mesh refinement techniques are used in order to decrease the computational cost associated with the numerical solution of Partial Differential Equations. In this work, the refined mesh is represented by a graph data structure. More precisely, this scheme follows the Autonomous Leaves Graph concepts. The objective is to construct an adaptive mesh refinement with lower cost than tree-based schemes. Moreover, the Autonomous Leaves Graph was initially proposed with the Finite Volume Method, and a Modified Hilbert Curve was used for the total ordering of the control volumes. This work proposes to integrate the Autonomous Leaves Graph and the Finite Element Method, as well as to adapt the Modified Hilbert Curve to this scheme. Furthermore, a non-conforming h-adaptive strategy is implemented. This approach is applied to the solution of the Poisson equation and the experimental results are discussed.

Diego Brandão, Sanderson L. Gonzaga de Oliveira, Mauricio Kischinhevsky

Integer Powers of Certain Asymmetric Matrices

In this paper we derive recurrence formulae for arbitrary integer powers of certain asymmetric matrices of order five, which also cover constant tridiagonal matrices and some asymmetric (and symmetric) pentadiagonal matrices. Since we also find Binet-type formulae for the elements of the powers of these matrices, it is possible to operate with power series of these matrices. We note that this method is an alternative to the Jordan decomposition method.

Roman Wituła, Damian Słota

Short Papers


Numerical Simulation of the Dynamics of a Periodically Forced Spherical Particle in a Quiescent Newtonian Fluid at Low Reynolds Numbers

In this paper we present the results of a numerical simulation of the dynamics of a periodically forced spherical particle in a quiescent Newtonian fluid at low Reynolds number. We describe the simulation and the tests performed to validate it. The results obtained are physically reasonable, which gives us confidence in the simulation. We include the effects of both convective and unsteady inertia on the dynamics at low Reynolds numbers. The inclusion of inertia results in additional linear and nonlinear terms in the equations, representing a fading memory of the entire history of the motion. The nonlinearity, though small in the parametric regime of interest, gives rise to some interesting features in the solution of the problem.

Tumkur Ramaswamy Ramamohan, Inapura Siddagangaiah Shivakumara, Krishnamurthy Madhukar

Bending Virtual Spring-Damper: A Solution to Improve Local Platoon Control

This article presents a local control approach to linear vehicle platooning. Linear platoon systems are sets of vehicles that use local or global perception capabilities to form a train configuration without any physical coupling. Public transportation operators are beginning to take an interest in platoon systems as a technological base for new services. The main control problem for platoon systems is maintaining the inter-vehicle distance. In the literature, the platoon geometry control problem is treated according to two approaches: global or local vehicle control. This paper focuses on a local approach which does not require sophisticated sensors or costly road equipment. The local control approach aims to obtain very good global matching to arbitrary trajectories from local perception alone, which consists of measuring the vectorial distance between a given vehicle and its predecessor. The behavior of each platoon vehicle is determined from a physics-inspired multi-agent interaction model based on a virtual spring-damper. Furthermore, the stability and safety properties of the platoon are verified using physical and mathematical proofs. Finally, simulation is used to measure trajectory error and the evolution of inter-vehicle distance.

Jean-Michel Contet, Franck Gechter, Pablo Gruer, Abderrafiaa Koukam

Parallel Algorithms for Dandelion-Like Codes

We consider the class of Dandelion-like codes, i.e., a class of bijective codes for labeled trees that represent a tree by a string of node labels. In the literature one can find optimal sequential algorithms for codes belonging to this class but, to the best of our knowledge, no parallel algorithm has been reported. In this paper we present the first encoding and decoding parallel algorithms for Dandelion-like codes. Namely, we design a single encoding algorithm and a single decoding algorithm that, properly parametrized, can be used for all Dandelion-like codes. These algorithms are optimal in the sequential setting. The implementation of the encoding algorithm on an EREW PRAM is optimal, while the efficient implementation of the decoding algorithm requires concurrent reading.

Saverio Caminiti, Rossella Petreschi
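Dandelion-like codes belong to the same family as the classical Prüfer code: bijections between labeled trees on n nodes and strings of n-2 node labels. As a point of reference for that family (this is the Prüfer code itself, not one of the Dandelion-like codes treated in the paper):

```python
def prufer_encode(adj):
    """Prüfer sequence of a labeled tree given as {node: set_of_neighbors}:
    repeatedly strip the smallest leaf and record its neighbor."""
    adj = {u: set(vs) for u, vs in adj.items()}
    seq = []
    for _ in range(len(adj) - 2):
        leaf = min(u for u, vs in adj.items() if len(vs) == 1)
        neigh = next(iter(adj[leaf]))
        seq.append(neigh)
        adj[neigh].discard(leaf)
        del adj[leaf]
    return seq
```

The parallelization difficulty the paper addresses is visible here: each stripped leaf depends on the outcome of the previous step, so the loop is inherently sequential unless the dependence structure is reorganized.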

Deterministic Computation of Pseudorandomness in Sequences of Cryptographic Application

An easy method for checking the degree of balancedness, as well as for run quantification, in sequences obtained from LFSR-based keystream generators has been developed. The procedure is a deterministic alternative to the traditional application of statistical tests. The computation method allows one to check the deviation of balancedness and run distribution from standard values. The method developed here can be considered a first selection criterion for acceptance/rejection of this type of generator for cryptographic applications.

A. Fúster-Sabater, P. Caballero-Gil, O. Delgado-Mohatar
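The deterministic checks the abstract mentions, balancedness (the number of ones) and the run-length distribution, can be computed directly from a generated keystream. A small sketch with a toy 3-stage LFSR (the register layout and tap positions are illustrative, not taken from the paper):

```python
def lfsr_bits(taps, state, length):
    """Output bits of a Fibonacci LFSR; `state` is a bit list, newest bit first,
    and `taps` are the state positions XOR-ed into the feedback."""
    out = []
    for _ in range(length):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    return out

def run_profile(bits):
    """Number of ones and the run-length histogram of a bit sequence."""
    ones, runs, i = sum(bits), {}, 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs[j - i] = runs.get(j - i, 0) + 1
        i = j
    return ones, runs
```

For a maximal-length LFSR of n stages, each period of 2**n - 1 bits contains exactly 2**(n-1) ones; deviations from that count (and from the expected run histogram) are the rejection criteria.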

Parallel Simulated Annealing for the Job Shop Scheduling Problem

This paper describes two parallel simulated annealing algorithms for the job shop scheduling problem with the sum of job completion times criterion. Some properties of the problem associated with block theory are presented and discussed. These properties allow us to introduce an effective neighborhood based on adjacent-swap moves. We also propose an original method for the parallel calculation of the optimization criterion value for a set of solutions, recommended for use in metaheuristics with single and multiple search trajectories. Additionally, a vector calculation method that uses MMX multimedia instructions, supported by a suitable data organization, is presented. The properties of the parallel calculations are verified empirically on a PC with an Intel Core 2 Duo processor on Taillard's benchmarks.

Wojciech Bożejko, Jarosław Pempera, Czesław Smutnicki
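The adjacent-swap neighborhood and the annealing acceptance rule can be illustrated on a simplified single-machine version of the sum-of-completion-times objective (the paper treats the much harder job shop problem and parallelizes the criterion evaluation; this sequential sketch only shows the move and acceptance mechanics):

```python
import math
import random

def total_completion_time(times):
    """Sum of job completion times for a given processing order."""
    t = total = 0
    for p in times:
        t += p
        total += t
    return total

def anneal(times, seed=0, temp=10.0, cooling=0.995, steps=4000):
    """Simulated annealing over adjacent-swap moves."""
    rng = random.Random(seed)
    order, best = list(times), list(times)
    for _ in range(steps):
        i = rng.randrange(len(order) - 1)
        cand = order[:]
        cand[i], cand[i + 1] = cand[i + 1], cand[i]   # adjacent swap
        delta = total_completion_time(cand) - total_completion_time(order)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            order = cand
            if total_completion_time(order) < total_completion_time(best):
                best = list(order)
        temp *= cooling
    return best
```

On a single machine the shortest-processing-time order is provably optimal for this criterion, which gives a convenient sanity check for the annealer.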

Developing Scientific Applications with Loosely-Coupled Sub-tasks

The Simple API for Grid Applications (SAGA) can be used to develop a range of applications that are composed of multiple sub-tasks. In particular, SAGA is an effective tool for coordinating and orchestrating the many sub-tasks of such applications, whilst keeping the application agnostic to the details of the infrastructure used. Although developed primarily in the context of distributed applications, SAGA provides an equally valid approach for applications with many sub-tasks on single high-end supercomputers, such as emerging peta-scale computers. Specifically, in this paper we describe how SAGA has been used to develop two types of applications: the first with loosely-coupled homogeneous sub-tasks and the second with loosely-coupled heterogeneous sub-tasks. We also analyse and contrast the coupling and scheduling requirements of the sub-tasks of these two applications. We find that applications with multiple sub-tasks often have dynamic characteristics, and thus require support for both infrastructure-independent programming models and agile execution models. Hence attention must be paid to practical deployment challenges along with theoretical advances in the development of infrastructure-independent applications.

Shantenu Jha, Yaakoub El-Khamra, Joohyun Kim

Simulation of Multiphysics Multiscale Systems, 6th International Workshop


Simulation of Multiphysics Multiscale Systems, 6th International Workshop

Modeling and Simulation of Multiphysics Multiscale Systems (SMMS) poses a grand challenge to computational science. To adequately simulate numerous intertwined processes characterized by different spatial and temporal scales spanning many orders of magnitude, sophisticated models and advanced computational techniques are required. The aim of the SMMS workshop is to encourage and review the progress in this multidisciplinary research field. This short paper describes the scope of the workshop and gives pointers to the papers reflecting the latest developments in the field.

Valeria V. Krzhizhanovskaya

Two-Dimensional Micro-Hartmann Gas Flows

We analyze and simulate a near-continuum magnetogasdynamic (MGD) flow inside a two-dimensional microchannel under a low magnetic Reynolds number assumption. Complex physics such as rarefaction and electric and magnetic effects are considered in the asymptotic solutions. This work extends the classical Hartmann flow in a two-dimensional channel of infinite length to a microchannel of finite length. We obtain a non-dimensional equation that relates the pressure ratio, Reynolds number, Mach number, magnetic Reynolds number and magnetic force number. We also derive asymptotic solutions for compressible gas flow based on velocity-slip and temperature-jump wall boundary conditions while maintaining a consistent quasi-isothermal assumption. Numerical solutions of the same formulation are obtained for validation of the present analytical solutions.

Chunpei Cai, Khaleel R. A. Khasawneh

Practical Numerical Simulations of Two-Phase Flow and Heat Transfer Phenomena in a Thermosyphon for Design and Development

Two-phase closed thermosyphons are widely used in various energy-intensive industries, including the chemical and transportation industries, for their inherently high heat transfer efficiency, simple construction and reliable operational performance. However, the performance parameters of two-phase closed thermosyphons, including the distributions of internal pressure, steam- and liquid-phase mass fractions, velocity and wall temperature, are obtained primarily via experimental investigation. In this paper numerical methods are discussed and a two-fluid model is employed to describe the two-phase flow and heat transfer processes in a two-phase closed thermosyphon. The Inter-Phase Slip Algorithm (IPSA) is employed to solve the coupled interactions of the steam and liquid phases along the phase interface. Flow patterns and parameter distributions under different conditions are predicted, with numerical results agreeing well, for the most part, with experimental results. Thus, the numerical method and solution procedure are of practical utility and can be used to simulate flow and heat transfer phenomena in thermosyphons and other types of heat pipes.

Zheshu Ma, Ali Turan, Shengmin Guo

Evaluation of Micronozzle Performance through DSMC, Navier-Stokes and Coupled DSMC/Navier-Stokes Approaches

Both the particle-based Direct Simulation Monte Carlo (DSMC) method and a compressible Navier-Stokes based continuum method are used to investigate the flow inside micronozzles and to predict the performance of such devices. For the Navier-Stokes approach, both slip and no-slip boundary conditions are applied. Moreover, the two methods have been coupled in a hybrid particle-continuum approach: the continuum domain is investigated by solving the Navier-Stokes equations with a slip wall boundary condition, whereas the rarefied region is studied by DSMC. The location where the domain is split is shown to have a great influence on the predicted nozzle performance.

Federico La Torre, Sasa Kenjeres, Chris R. Kleijn, Jean-Luc P. A. Moerel

A Multilevel Multiscale Mimetic (M3) Method for an Anisotropic Infiltration Problem

Modeling of multiphase flow and transport in highly heterogeneous porous media must capture a broad range of coupled spatial and temporal scales. Recently, a hierarchical approach dubbed the Multilevel Multiscale Mimetic (M³) method was developed to simulate two-phase flow in porous media. The M³ method is locally mass conserving at all levels in its hierarchy, supports unstructured polygonal grids and full tensor permeabilities, and can achieve large coarsening factors. In this work we consider infiltration of water into a two-dimensional layered medium. The grid is aligned with the layers but not the coordinate axes. We demonstrate that, with an efficient temporal updating strategy for the coarsening parameters, fine-scale accuracy of prominent features in the flow is maintained by the M³ method.

Konstantin Lipnikov, David Moulton, Daniil Svyatskiy

Computational Upscaling of Inertia Effects from Porescale to Mesoscale

We propose algorithms for computational upscaling of flow from porescale (microscale) to lab scale (mesoscale). In particular, we solve Navier-Stokes equations in complex pore geometries and average their solutions to derive properties of flow relevant at lab scale such as permeability and inertia coefficients. We discuss two variants of traditional discretizations: a simple algorithm which works well in periodic isotropic media and can be used when coarse approximations are needed, and a more complex one which is well suited for nonisotropic geometries. Convergence of solutions and averaging techniques are major concerns but these can be relaxed if only mesoscopic parameters are needed. The project is a proof-of-concept computational laboratory for porous media which delivers data needed for mesoscale simulations by performing microscale computational simulations.

Małgorzata Peszyńska, Anna Trykozko, Kyle Augustson

Towards a Complex Automata Multiscale Model of In-Stent Restenosis

In-stent restenosis, the maladaptive response of a blood vessel to injury caused by the deployment of a stent, is a multiscale problem involving a large number of processes. We describe a Complex Automata Model for in-stent restenosis, coupling a bulk flow, drug diffusion, and smooth muscle cell model, all operating on different time scales. Details of the single scale models and of the coupling interfaces are described, together with first simulation results, obtained with a dedicated software environment for Complex Automata simulations. The results show that the model can reproduce growth trends observed in experimental studies.

Alfonso Caiazzo, David Evans, Jean-Luc Falcone, Jan Hegewald, Eric Lorenz, Bernd Stahl, Dinan Wang, Jörg Bernsdorf, Bastien Chopard, Julian Gunn, Rod Hose, Manfred Krafczyk, Patricia Lawford, Rod Smallwood, Dawn Walker, Alfons G. Hoekstra

A Mechano-Chemical Model of a Solid Tumor for Therapy Outcome Predictions

Experimental investigations of tumors often result in data reflecting very complex underlying mechanisms. Computer models of such phenomena enable their analysis and may lead to novel and more efficient therapy strategies. We present a generalized finite element mechano-chemical model of a solid tumor and assess its suitability for predicting therapy outcome. The model includes hosting tissue, tumor cells (vital and necrotic), nutrient, blood vessels and a growth inhibitor. At a certain time instant of the tumor development virtual therapies are performed and their outcomes are presented. The model is initiated with parameters either obtained directly from the available literature or estimated using multi-scale modeling. First results indicate the usefulness of multi-physics tumor models for predicting therapy response.

Sven Hirsch, Dominik Szczerba, Bryn Lloyd, Michael Bajka, Niels Kuster, Gábor Székely

Simulating Individual-Based Models of Epidemics in Hierarchical Networks

Current mathematical modeling methods for the spreading of infectious diseases are too simplified and do not scale well. We present the Simulator of Epidemic Evolution in Complex Networks (SEECN), an efficient simulator of detailed individual-based models that parameterizes separate dynamics operators, which are iteratively applied to the contact network. We reduce the network generator’s computational complexity, improve cache efficiency and parallelize the simulator. To evaluate its running time we experiment with an HIV epidemic model that incorporates up to one million homosexual men in a scale-free network, including hierarchical community structure, social dynamics and multi-stage intranode progression. We find that the running times are feasible, on the order of minutes, and argue that SEECN can be used to study realistic epidemics and their properties experimentally, in contrast to defining and solving ever more complicated mathematical models, as is current practice.

Rick Quax, David A. Bader, Peter M. A. Sloot

A Nonlinear Master Equation for a Degenerate Diffusion Model of Biofilm Growth

We present a continuous time/discrete space model of biofilm growth, starting from the semi-discrete master equation. The probabilities of biomass movement into neighboring sites depend on the local biomass density and on the biomass density in the target site such that spatial movement only takes place if (i) locally not enough space is available to accommodate newly produced biomass and (ii) the target site has capacity to accommodate new biomass. This mimics the rules employed by Cellular Automata models of biofilms. Grid refinement leads formally to a degenerate parabolic equation. We show that a set of transition rules can be found such that a previously studied ad hoc density-dependent diffusion-reaction model of biofilm formation is approximated well.

Hassan Khassehkhan, Thomas Hillen, Hermann J Eberl

Graphical Notation for Diagramming Coupled Systems

Multiphysics and multiscale (or coupled) systems share one fundamental requirement: construction of coupling mechanisms to implement complex data exchanges between a system’s constituent models. I have created a graphical schema for describing coupling workflows that is based on a theoretical framework for describing coupled systems. The schema combines an expanded set of traditional flowchart symbols with pictograms representing data states. The data pictograms include distributed mesh, field, and domain decomposition descriptors and spatiotemporal integration and accumulation registers. Communications pictograms include blocking and non-blocking point-to-point and parallel data transfer; parallel data transposes; and collective broadcast, scatter, gather, reduction and barrier operators. The transformation pictograms include intergrid interpolation; spatiotemporal integral operators for accumulation of state and flux data; and weighted merging of output data from multiple source models for input to a destination model. I apply the schema to simple problems illustrating real situations in coupler design and implementation.

J. Walter Larson

Photoabsorption and Carrier Transport Modeling in Thin Multilayer Photovoltaic Cell

The paper describes an efficient implementation of a photoabsorption model for modern thin photovoltaic (PV) cells. As the modelling of solar cells constitutes a multiphysics and multiscale problem, special attention was paid to speed while retaining reasonable accuracy. The model is restricted to normal incidence but accounts for the whole solar spectrum. The applied transfer-matrix method yields an accurate distribution of the light intensity in the semiconductor structure. The use of equivalent parameters makes it possible to simulate both plain semiconductor material and a quantum dot superlattice based material, which is used to enhance PV device performance.

František Čajko, Alexander I. Fedoseyev

Towards Multiscale Simulations of Carbon Nanotube Growth Process: A Density Functional Theory Study of Transition Metal Hydrides

Nanoelectronics and photonics applications of single-wall carbon nanotubes (SWNTs) are feasible only if the SWNTs have a specific chirality. Knowledge of the detailed mechanism of SWNT synthesis would allow one to optimize the chemical vapor deposition (CVD) process and may help to gain control over the selectivity of SWNT synthesis. While it is probably not feasible to study this mechanism experimentally, it can be analyzed using molecular simulations. Here we propose multiscale computer modeling of the CVD process. A high level of theory can be used for di- and tri-atomic fragments in order to generate parameters for a bond-order force field. In turn, force field simulations will be used to characterize the chemical origin and thermochemical properties of the intermediates and transition states. This will allow predicting the rate constants for the elementary steps, which are then used in kinetic Monte Carlo simulations to describe SWNT growth at realistic time scales.

Satyender Goel, Artëm E. Masunov

Darwin Approximation to Maxwell’s Equations

Maxwell’s equations are the foundation of electromagnetics, and their extensive applications have made computational electromagnetics a rapidly growing subject. However, since the electric and magnetic fields are coupled in the Maxwell system, the numerical solution of the full system of Maxwell’s equations is extremely expensive in terms of computer time. The Darwin model, which decouples the system, is a good approximation to Maxwell’s equations. In the three-dimensional case it neglects the transverse component of the displacement current, while in the two-dimensional case it is equivalent to the TE and TM models. For these reasons the Darwin model is worth investigating, and much work has been devoted to it. In this paper we review advances in the Darwin model and introduce some of our new results on exterior problems for the Darwin model and its numerical solution with infinite element methods.

Nengsheng Fang, Caixiu Liao, Lung-An Ying

Study of Parallel Linear Solvers for Three-Dimensional Subsurface Flow Problems

The performance of an iterative method for solving a system of linear equations depends on the structure of the system to be solved and on the choice of iterative solver in combination with a preconditioner. In this paper, the performance of a specified set of linear solvers and preconditioners provided by PETSc and Hypre is evaluated based on three data sets from subsurface finite element flow models. The results show that simple preconditioners are robust but do not yield convergence behavior that scales with problem size. They also show that it is important to choose an appropriate type of solver for different kinds of simulations.

Hung V. Nguyen, Jing-Ru C. Cheng, Robert S. Maier

A Domain Decomposition Based Parallel Inexact Newton’s Method with Subspace Correction for Incompressible Navier-Stokes Equations

There are two major types of approaches for solving the incompressible Navier-Stokes equations. One of them is the so-called projection method, in which the velocity field and the pressure field are solved separately. This method is very efficient, but is difficult to extend to other multi-physics problems when an appropriate splitting is not available. The other approach is the fully coupled method, in which the velocity and pressure fields stay together throughout the computation. The coupled approach can be easily extended to other multi-physics problems, but it requires the solution of some rather difficult linear and nonlinear algebraic systems of equations. This paper focuses on a fully coupled, domain decomposition based parallel inexact Newton’s method with subspace correction for the incompressible Navier-Stokes equations at high Reynolds numbers. The discussion is restricted to the velocity-vorticity formulation of the Navier-Stokes equations, but the idea can be generalized to other multi-physics problems.

Xiao-Chuan Cai, Xuefeng Li

Workshop on Bioinformatics’ Challenges to Computer Science


Bioinformatics’ Challenges to Computer Science: Bioinformatics Tools and Biomedical Modeling

The second edition of the workshop on Bioinformatics’ Challenges to Computer Science aimed to discuss the gap between bioinformatics tools and biomedical simulation and modeling. This short paper summarizes the papers accepted for the workshop, focusing on bioinformatics applications at the genomics and molecular level as well as simulation and management at the biomedical level, and gives a brief outlook on future developments.

Mario Cannataro, Rodrigo Weber dos Santos, Joakim Sundnes

Cartesio: A Software Tool for Pre-implant Stent Analyses

Quantitative angiography has become widely used in interventional cardiology thanks to positive outcomes in terms of patient survival and quality of life. The technique is based on positioning a medical stent in a minimally invasive intervention to treat coronary stenoses. Patients gain an immediate benefit from such a procedure by avoiding open-heart surgery. Stent positioning is guided by contrast liquid and angiographic X-ray images, which are used to determine stent dimensions. Physicians may have difficulty optimally estimating the stenosis size in order to choose the most appropriate stent, mainly because there is no absolute reference on the angiographer’s screens. In this paper we present an innovative software tool able to assist the physician’s pre-implant analysis and thus support an optimal stent choice.

Ciro Indolfi, Mario Cannataro, Pierangelo Veltri, Giuseppe Tradigo

Determination of Cardiac Ejection Fraction by Electrical Impedance Tomography - Numerical Experiments and Viability Analysis

Cardiac ejection fraction is a clinically relevant parameter that is highly correlated to the functional status of the heart. Today the non-invasive methods and technology that measure cardiac ejection fraction, such as MRI, CT and echocardiography do not offer a continuous way of monitoring this important parameter. In this work, we numerically evaluate a new method for the continuous estimation of cardiac ejection fraction based on Electrical Impedance Tomography. The proposed technique assumes the existence of recent Magnetic Resonance (MR) images of the heart to reduce the search space of the inverse problem. Simulations were performed on two-dimensional cardiac MRI images with electric potentials numerically obtained by the solution of the Poisson equation via the Boundary Element Method. Different protocols for current injection were evaluated. Preliminary results are presented and the potentialities and limitations of the proposed technique are discussed.

Franciane C. Peters, Luis Paulo S. Barra, Rodrigo Weber dos Santos

Analysis of Muscle and Metabolic Activity during Multiplanar-Cardiofitness Training

Electronic devices have been useful for evaluating muscle fatigue and activation, co-ordination and metabolic consumption across different sports activities. Since these important variables have not been investigated during fitness training, the present study aims to analyze the activity of the major muscles of the lower extremity during training on a cardiofitness apparatus (Cardio Wave, Technogym, Gambettola, Italy). This device stimulates the multiplanar movements of the lower limbs by combining various types of movement according to the physical principles of human motion. Working simultaneously on three axes through a sliding movement of the lower limbs should activate different muscle groups in four positions at different intensity levels. Muscle activity and training effectiveness were evaluated by monitoring the electromyography signal, metabolic data, oxygen uptake and heart rate. The goal of this research is to develop a system able to manage the different information coming from various electronic devices.

Arrigo Palumbo, Teresa Iona, Vera Gramigna, Antonio Ammendolia, Maurizio Iocco, Gionata Fragomeni

Gene Specific Co-regulation Discovery: An Improved Approach

Discovering gene co-regulatory relationships is a new but important research problem in DNA microarray data analysis. The problem of gene-specific co-regulation discovery is, for a particular gene of interest, called the target gene, to identify its strongly co-regulated genes and the condition subsets where such strong gene co-regulations are observed. The study of this problem can contribute to a better understanding and characterization of the target gene. The existing method, based on the genetic algorithm (GA), is slow due to its expensive fitness evaluation and long individual representation. In this paper, we propose an improved method for finding gene-specific co-regulations. Compared with the current method, our method features notably improved efficiency. We employ a kNN Search Table to substantially speed up fitness evaluation in the GA. We also propose a more compact representation scheme for encoding individuals in the GA, which contributes to faster crossover and mutation operations. Experimental results with a real-life gene microarray data set demonstrate the improved efficiency of our technique compared with the current method.

Ji Zhang, Qing Liu, Kai Xu

Experimental Evaluation of Protein Secondary Structure Predictors

Understanding a protein’s biological function, which is largely determined by its 3D shape, is a key issue in modern biology. Protein 3D shape, in turn, is functionally implied by the protein’s amino acid sequence. Since the direct inspection of such 3D structures is rather expensive and time consuming, a number of software techniques have been developed in the last few years that predict a spatial model, either of the secondary or of the tertiary form, for a given target protein starting from its amino acid sequence.

This paper offers a comparison of several available automatic secondary structure prediction tools. The comparison is of the experimental kind, where two relevant sets of proteins, a non-redundant one including 100 elements, and a 180-protein set taken from the CASP 6 contest, were used as test cases. Comparisons have been based on evaluating standard quality measures, such as the Q3 and SOV.

Luca Miceli, Luigi Palopoli, Simona E. Rombo, Giorgio Terracina, Giuseppe Tradigo, Pierangelo Veltri

Workshop on Using Emerging Parallel Architectures for Computational Science


Workshop on Using Emerging Parallel Architectures for Computational Science

The Workshop on Using Emerging Parallel Architectures for Computational Science, held in conjunction with ICCS 2009, provides a forum for exploring the capabilities of emerging parallel architectures such as GPUs, FPGAs, the Cell B.E., and multi-cores to accelerate computational science applications.

Bertil Schmidt, Douglas Maskell

Solving Sparse Linear Systems on NVIDIA Tesla GPUs

Current many-core GPUs have enormous processing power, and unlocking this power for general-purpose computing is very attractive due to their low cost and efficient power utilization. However, the fine-grained parallelism and the stream-programming model supported by these GPUs require a paradigm shift, especially for algorithm designers. In this paper we present the design of a GPU-based sparse linear solver using the Generalized Minimum RESidual (GMRES) algorithm in the CUDA programming environment. Our implementation achieved a speedup of over 20x on the Tesla T10P based GTX 280 GPU card for benchmarks with from a few thousand to a few million unknowns.
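The core operation of an iterative solver such as GMRES is the sparse matrix-vector product. A minimal sketch of that kernel in compressed sparse row (CSR) format is shown below in Python for illustration only; the paper's solver runs the equivalent kernel on the GPU in CUDA, with one thread per output row.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in compressed sparse row (CSR) format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):  # in a CUDA kernel, one thread handles each row i
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[j] * x[col_idx[j]]
        y[i] = acc
    return y

# A = [[4, 1, 0],
#      [1, 3, 0],
#      [0, 0, 2]]  stored as CSR:
values  = [4.0, 1.0, 1.0, 3.0, 2.0]
col_idx = [0, 1, 0, 1, 2]
row_ptr = [0, 2, 4, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [6.0, 7.0, 6.0]
```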

Mingliang Wang, Hector Klie, Manish Parashar, Hari Sudan

A Particle-Mesh Integrator for Galactic Dynamics Powered by GPGPUs

We present a particle-mesh N-body integrator running on the GPU using CUDA. Relying on a grid-based description of the gravitational potential, it can simulate the evolution of self-interacting ‘stars’ in order to model e.g. galaxies. All the steps of the application have been ported to the GPU, namely (1) a histogramming algorithm using CUDPP, (2) the solution of the Poisson equation by means of FFT with CUFFT and multi-grid relaxation, (3) an optimized finite-difference scheme to compute the accelerations of stars, and (4) an update procedure for positions and velocities. We present several tests at different resolutions, and reach speedups from 2 to 50 depending on the resolution and the test case.
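The update procedure for positions and velocities in particle-mesh codes is typically a leapfrog (kick-drift-kick) step. The sketch below is a generic illustration in Python, not taken from the paper; in the actual integrator the accelerations would come from the mesh-based potential computed in steps (2) and (3).

```python
def leapfrog(pos, vel, acc, dt):
    """One kick-drift-kick step; acc is a callable pos -> accelerations."""
    a0 = acc(pos)
    vh = [v + 0.5 * dt * a for v, a in zip(vel, a0)]    # half kick
    pn = [p + dt * v for p, v in zip(pos, vh)]          # drift
    a1 = acc(pn)
    vn = [v + 0.5 * dt * a for v, a in zip(vh, a1)]     # half kick
    return pn, vn

# Toy check: a harmonic potential, where leapfrog conserves energy well.
acc = lambda p: [-x for x in p]
pos, vel = [1.0], [0.0]
for _ in range(100):
    pos, vel = leapfrog(pos, vel, acc, 0.1)
energy = 0.5 * (pos[0] ** 2 + vel[0] ** 2)  # stays near the initial 0.5
```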

Dominique Aubert, Mehdi Amini, Romaric David

A Note on Auto-tuning GEMM for GPUs

The development of high performance dense linear algebra (DLA) depends critically on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is especially true for Graphics Processing Units (GPUs), as evidenced by recently published results on DLA for GPUs that rely on highly optimized GEMM. However, the current best GEMM performance, e.g. up to 375 GFlop/s in single precision and up to 75 GFlop/s in double precision arithmetic on NVIDIA’s GTX 280, is difficult to achieve. The development involves extensive GPU knowledge and even reverse engineering to understand undocumented details of the architecture that have been of key importance in the development. In this paper, we describe some GPU GEMM auto-tuning optimization techniques that allow us to keep up with changing hardware by rapidly reusing, rather than reinventing, existing ideas. Auto-tuning, as we show in this paper, is a very practical solution where, in addition to easy portability, we can often get substantial speedups even on current GPUs (e.g. up to 27% in certain cases for both single and double precision GEMMs on the GTX 280).

Yinan Li, Jack Dongarra, Stanimire Tomov

Fast Conjugate Gradients with Multiple GPUs

The limiting factor for the efficiency of sparse linear solvers is memory bandwidth. In this work, we describe a fast Conjugate Gradient solver for unstructured problems which runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs, using a mixed precision iterative refinement algorithm. To achieve high computation speed, we propose a fast sparse matrix-vector multiplication algorithm, which is the core operation of iterative solvers. The proposed multiplication algorithm efficiently utilizes GPU resources via caching, coalesced memory accesses and load balancing between running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card and that our CG implementation achieves up to 24.6 Gflops with four GPUs.
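The mixed precision iterative refinement idea is to do the expensive inner solve in low precision and recover full accuracy by correcting against a high-precision residual. The sketch below illustrates the scheme generically in Python/NumPy; a dense float32 solve stands in for the paper's single-precision GPU CG sweep, which is an assumption of this example, not the authors' code.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Iterative refinement: low-precision inner solve, float64 residual
    and correction, converging to double precision accuracy."""
    A32 = A.astype(np.float32)           # low-precision copy (the "GPU" side)
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(iters):
        r = b - A @ x                                    # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))   # cheap inner solve
        x += d.astype(np.float64)                        # high-precision update
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = mixed_precision_solve(A, b)
```

Each refinement pass shrinks the error by roughly the single-precision rounding level, so a handful of cheap low-precision solves reaches double-precision accuracy.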

Ali Cevahir, Akira Nukada, Satoshi Matsuoka

CUDA Solutions for the SSSP Problem

We present several algorithms that solve the single-source shortest-path problem using CUDA. We have run them on a database composed of hundreds of large graphs represented by adjacency lists and adjacency matrices, achieving high speedups relative to a CPU implementation based on Fibonacci heaps. Concerning correctness, we outline why our solutions work, and show that a previous approach [10] is incorrect.
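A common GPU-friendly scheme for SSSP is frontier-based edge relaxation, where every frontier vertex is relaxed in parallel each round. The Python sketch below illustrates that general pattern only; it is not claimed to be the specific algorithm of this paper.

```python
import math

def sssp_frontier(adj, src):
    """adj: {u: [(v, w), ...]}. Round-based relaxation over a frontier,
    the kind of data-parallel loop that maps one vertex per GPU thread."""
    dist = {u: math.inf for u in adj}
    dist[src] = 0.0
    frontier = {src}
    while frontier:
        nxt = set()
        for u in frontier:               # these relaxations run in parallel on a GPU
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    nxt.add(v)           # improved vertices form the next frontier
        frontier = nxt
    return dist

g = {0: [(1, 2.0), (2, 5.0)], 1: [(2, 1.0)], 2: []}
print(sssp_frontier(g, 0))  # {0: 0.0, 1: 2.0, 2: 3.0}
```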

Pedro J. Martín, Roberto Torres, Antonio Gavilanes

Power Consumption of GPUs from a Software Perspective

GPUs are now considered as serious challengers for high-performance computing solutions. They have power consumptions up to 300 W. This may lead to power supply and thermal dissipation problems in computing centers. In this article we investigate, using measurements, how and where modern GPUs are using energy during various computations in a CUDA environment.

Sylvain Collange, David Defour, Arnaud Tisserand

Experiences with Mapping Non-linear Memory Access Patterns into GPUs

Modern Graphics Processing Units (GPUs) are very powerful computational systems on a chip. For this reason there is growing interest in using these units as general-purpose hardware accelerators (GPGPU). To facilitate the programming of general-purpose applications, NVIDIA introduced the CUDA programming environment. CUDA provides a simplified abstraction of the underlying complex GPU architecture, so a number of critical optimizations must be applied to the code in order to get maximum performance. In this paper we discuss our experience in porting an application kernel to the GPU, and the classes of design decisions we adopted in order to obtain maximum performance.

Eladio Gutierrez, Sergio Romero, Maria A. Trenas, Oscar Plata

Accelerated Discovery of Discrete M-Clusters/Outliers on the Raster Plane Using Graphical Processing Units

This paper presents two discrete computational geometry algorithms designed for execution on Graphics Processing Units (GPUs). The algorithms are parallelized versions of sequential algorithms intended for application in geographical data mining. The first algorithm finds clusters of m points, called m-clusters, in the rasterized plane. The second algorithm complements the first by identifying outliers, those points which are not members of any m-cluster. The use of a raster representation of coordinates provides an ideal data stream environment for efficient GPU utilization. The parallel algorithms have low memory demands and require only a limited amount of inter-process communication. Initial performance analysis indicates the algorithms are scalable, both in problem size and in the number of seeds, and significantly outperform commercial implementations.

Christian Trefftz, Joseph Szakas, Igor Majdandzic, Gregory Wolffe

Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures

The Jaccard/Tanimoto coefficient is an important workload, used in a large variety of problems including drug design fingerprinting, clustering analysis, similarity web searching and image segmentation. This paper evaluates the Jaccard coefficient on three platforms: the Cell Broadband Engine processor, an Intel Xeon dual-core platform and an Nvidia 8800 GTX GPU. In our work, we have developed a novel parallel algorithm specially suited to the Cell/B.E. architecture for all-to-all Jaccard comparisons that minimizes DMA transfers and reuses data in the local store.

We show that our implementation on the Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20x with full accuracy, and by 10-50x in reduced accuracy mode, depending on the size of the data, and by more than 60x compared to the Nvidia 8800 GTX. In addition to performance, we also discuss in detail our efforts to optimize our workload on these architectures and explain how the avenues for optimization differ greatly from one architecture to another for our workload. Our work shows that the algorithms or kernels employed for the Jaccard coefficient calculation depend heavily on the traits of the target hardware.
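For two sets A and B, the Jaccard/Tanimoto coefficient is |A ∩ B| / |A ∪ B|; for binary fingerprints packed into machine words it reduces to bit counting on AND and OR, which is what makes it amenable to SIMD platforms like those compared here. A minimal Python illustration:

```python
def tanimoto(a, b):
    """Jaccard/Tanimoto similarity of two binary fingerprints packed
    into integers: popcount(a & b) / popcount(a | b)."""
    inter = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return inter / union if union else 1.0  # two empty fingerprints: similarity 1

fa = 0b101101   # fingerprint A: bits {0, 2, 3, 5}
fb = 0b100111   # fingerprint B: bits {0, 1, 2, 5}
print(tanimoto(fa, fb))  # 3 shared bits / 5 set bits overall = 0.6
```

On the platforms above, the bit counting and AND/OR would be done with vector instructions over long packed fingerprints rather than Python integers.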

Vipin Sachdeva, Douglas M. Freimuth, Chris Mueller

Pairwise Distance Matrix Computation for Multiple Sequence Alignment on the Cell Broadband Engine

Multiple sequence alignment is an important tool in bioinformatics. Although efficient heuristic algorithms exist for this problem, the exponential growth of biological data demands an even higher throughput. The recent emergence of accelerator technologies has made it possible to achieve a highly improved execution time for many bioinformatics applications compared to general-purpose platforms. In this paper, we demonstrate how the PlayStation®3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the distance matrix computation utilized in multiple sequence alignment algorithms.

Adrianto Wirawan, Bertil Schmidt, Chee Keong Kwoh

Evaluation of the SUN UltraSparc T2+ Processor for Computational Science

The Sun UltraSparc T2+ processor was designed for throughput computing and thread-level parallelism. In this paper we evaluate its suitability for computational science. A set of benchmarks representing typical building blocks of scientific applications and a real-world hybrid MPI/OpenMP code for ocean simulation are used for performance evaluation. Additionally, we apply micro-benchmarks to evaluate the performance of certain components (such as the memory subsystem). To gauge the capabilities of the T2+ processor we compare its performance with the IBM POWER6 processor. While the UltraSparc T2+ is targeted at server workloads with high throughput requirements via a low-frequency core design and massive chip multithreading capabilities, the ultra-high-frequency core design of the IBM POWER6, optimised for instruction-level parallelism, follows a contrary approach. The intention of this evaluation is to investigate whether the current generation of massive chip multithreading processors is capable of providing competitive performance for non-server workloads in scientific applications.

Martin Sandrieser, Sabri Pllana, Siegfried Benkner

Streamlining Offload Computing to High Performance Architectures

Many important scientific, engineering and financial applications can benefit from offloading computation to emerging parallel systems such as the Cell Broadband Engine (Cell/B.E.). However, traditional remote procedure call (RPC) mechanisms require a significant investment of time and effort to rewrite applications for a specific RPC system. As a result, offloading functions to remote systems is not viable for many applications. IBM Dynamic Application Virtualization (DAV) insulates the application developer by automatically generating stub libraries that allow remote procedures to be called directly, without modification of the application source code. In this paper, we describe how DAV automates the conversion of client applications to use remote procedure calls. DAV can generate stub libraries for a wide variety of client applications running on a variety of architectures, allowing simple and fast RPC enablement of applications with minimal programming effort.

Mark Purcell, Owen Callanan, David Gregg

Multi-walk Parallel Pattern Search Approach on a GPU Computing Platform

This paper studies the efficiency of Pattern Search (PS) on bound-constrained optimization functions on a Graphics Processing Unit (GPU) computing platform. Pattern Search is a direct search optimization technique for non-linear programming problems that does not require derivative information. It is ideally suited to a GPU computing environment due to its low memory requirements and the absence of inter-thread communication in a multi-walk setting. To adapt it to a GPU environment, traditional Pattern Search is modified to terminate on an iteration count instead of a tolerance. This research designed and implemented a multi-walk Pattern Search algorithm on a GPU computing platform. Computational results are promising, with a speed-up of more than 100× over a corresponding implementation on a single CPU.

Weihang Zhu, James Curry
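A minimal serial sketch of one iteration-bounded Pattern Search walk, plus an independent multi-walk driver, can make the GPU mapping concrete: on the GPU, each walk would run in its own thread with no communication. All names, the polling scheme and the parameter choices below are illustrative assumptions, not the authors' implementation.

```python
def pattern_search(f, x0, lower, upper, step=0.5, iters=100, shrink=0.5):
    """One Pattern Search walk. Termination is by iteration count rather
    than tolerance, so every GPU thread would do the same amount of work."""
    x, fx = list(x0), f(x0)
    n = len(x)
    for _ in range(iters):
        improved = False
        for i in range(n):                      # poll the 2n coordinate moves
            for sign in (+1.0, -1.0):
                y = list(x)
                y[i] = min(max(y[i] + sign * step, lower[i]), upper[i])
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= shrink                      # contract the pattern on failure
    return x, fx

def multi_walk(f, starts, lower, upper, **kw):
    """Independent walks from different start points; each walk would map
    to one GPU thread since the walks never communicate."""
    return min((pattern_search(f, s, lower, upper, **kw) for s in starts),
               key=lambda r: r[1])
```

The multi-walk loop is the part a GPU kernel replaces: every start point becomes a thread, and the final reduction picks the best result.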

A Massively Parallel Architecture for Bioinformatics

Today’s general-purpose computers fall short of the computing performance required by standard bioinformatics applications such as DNA sequence alignment, error correction for assembly, or transcription-factor binding site (TFBS) finding. The size of DNA sequence databases doubles twice a year, whereas computing performance per unit cost only doubles every two years. Ingenious approaches have therefore been developed to bridge this gap using special-purpose computing architectures such as ASICs, GPUs, multi-core CPUs or CPU clusters. These approaches suffer from being either too application-specific (ASICs and GPUs) or too general (CPU clusters and multi-core CPUs). An alternative is the FPGA, which outperforms the solutions mentioned above for bioinformatics applications with respect to cost and power efficiency, flexibility and communication bandwidth. To make maximal use of these advantages, a new massively parallel architecture consisting of low-cost FPGAs is presented.

Gerd Pfeiffer, Stefan Baumgart, Jan Schröder, Manfred Schimmler

GPU Accelerated RNA Folding Algorithm

Many bioinformatics studies require the analysis of RNA or DNA structures. More specifically, extensive work has gone into efficient algorithms for predicting the 2-D folding structures of RNA or DNA sequences. However, the high computational complexity of these algorithms, combined with the rapid growth of genomic data, creates a need for faster methods. Current approaches parallelize these algorithms on multiprocessor systems or clusters, yielding good performance but at a relatively high cost. Here, we explore the use of graphics hardware to speed up these algorithms, which can in principle provide both high performance and low cost. We use the CUDA programming language to harness the power of NVIDIA graphics cards for general-purpose computation in a C-like environment. On recent graphics cards we achieve a 17× speed-up.

Guillaume Rizk, Dominique Lavenier
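The dynamic programming at the heart of such folding algorithms can be illustrated with the simpler Nussinov maximum-base-pairing recurrence; this is only an illustrative stand-in, since the paper targets energy-minimization folding. The key property a GPU exploits is that all cells on the same anti-diagonal of the DP matrix are independent.

```python
def nussinov(seq, min_loop=3):
    """Nussinov DP: N[i][j] = maximum number of base pairs in seq[i..j],
    with at least `min_loop` unpaired bases inside any hairpin. Cells
    with the same span (anti-diagonal) can be computed in parallel,
    which is what a GPU kernel would exploit."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):                      # fill diagonal by diagonal
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])           # i or j unpaired
            if j - i > min_loop and (seq[i], seq[j]) in pairs:
                best = max(best, N[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                      # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]
```

A GPU version would launch one kernel per diagonal, with one thread per cell of that diagonal; the bifurcation loop is the O(n) inner work that dominates the O(n³) total.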

Parallel Calculating of the Goal Function in Metaheuristics Using GPU

We consider a metaheuristic optimization algorithm which uses a single process (thread) to guide the search through the solution space. The thread iteratively performs two main tasks: evaluating the goal function for a single solution or a set of solutions, and search management (solution filtering and selection, history collection, updating). Since the latter task statistically takes only 1–3% of the total iteration time, we do not attempt to accelerate it. The former task can be accelerated in parallel environments in various ways. We propose a fine-grained parallel calculation model that provides a cost-optimal method, and we carry out an experiment on a Graphics Processing Unit (GPU) to confirm our theoretical results.

Wojciech Bożejko, Czesław Smutnicki, Mariusz Uchroński
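The split described above — serial search management, parallel goal-function evaluation — can be sketched as follows. The flow-shop makespan objective and all names are assumptions made for illustration, and a thread pool stands in for the GPU.

```python
from concurrent.futures import ThreadPoolExecutor

def makespan(perm, times):
    """Goal function: makespan of a job permutation in a permutation
    flow shop (an assumed objective; times[job][machine] gives the
    processing times)."""
    m = len(times[0])
    c = [0.0] * m                       # completion time on each machine
    for job in perm:
        c[0] += times[job][0]
        for k in range(1, m):
            c[k] = max(c[k], c[k - 1]) + times[job][k]
    return c[-1]

def evaluate_neighbourhood(perms, times, workers=4):
    """Parallel evaluation of the goal function over a set of candidate
    solutions; the search-management step stays serial, mirroring the
    split in the abstract (evaluation dominates the iteration time)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: makespan(p, times), perms))
    best = min(range(len(perms)), key=scores.__getitem__)
    return perms[best], scores[best]
```

Only the map over candidate solutions is parallel; selecting the best one and updating the search state remain in the single guiding thread, matching the 1–3% management cost noted in the abstract.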
