Skip to main content

2011 | Buch

Algorithms and Architectures for Parallel Processing

11th International Conference, ICA300 2011, Melbourne, Australia, October 24-26, 2011, Proceedings, Part II

herausgegeben von: Yang Xiang, Alfredo Cuzzocrea, Michael Hobbs, Wanlei Zhou

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

insite
SUCHEN

Über dieses Buch

This two volume set LNCS 7016 and LNCS 7017 constitutes the refereed proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2011, held in Melbourne, Australia, in October 2011. The second volume includes 37 papers from one symposium and three workshops held together with ICA3PP 2011 main conference. These are 16 papers from the 2011 International Symposium on Advances of Distributed Computing and Networking (ADCN 2011), 10 papers of the 4th IEEE International Workshop on Internet and Distributed Computing Systems (IDCS 2011), 7 papers belonging to the III International Workshop on Multicore and Multithreaded Architectures and Algorithms (M2A2 2011), as well as 4 papers of the 1st IEEE International Workshop on Parallel Architectures for Bioinformatics Systems (HardBio 2011).

Inhaltsverzeichnis

Frontmatter

ADCN 2011 Papers

Lightweight Transactional Arrays for Read-Dominated Workloads

Many common workloads rely on arrays as a basic data structure on top of which they build more complex behavior. Others use them because they are a natural representation for their problem domains.

Software Transactional Memory (STM) has been proposed as a new concurrency control mechanism that simplifies concurrent programming. Yet, most STM implementations have no special representation for arrays. This results, on many STMs, in inefficient internal representations, where much overhead is added while tracking each array element individually, and on other STMs in false-sharing conflicts, because writes to different elements on the same array result in a conflict.

In this work we propose new designs for array implementations that are integrated with the STM, allowing for improved performance and reduced memory usage for read-dominated workloads, and present the results of our implementation of the new designs on top of the JVSTM, a Java library STM.

Ivo Anjo, João Cachopo
Massively Parallel Identification of Intersection Points for GPGPU Ray Tracing

The latest advancements in computer graphics architectures, as the replacement of some fixed stages of the pipeline for programmable stages (shaders), have been enabling the development of parallel general purpose applications on massively parallel graphics architectures (

Streaming Processors

). For years the graphics processing unit (GPU) is being optimized for increasingly high throughput of massively parallel floating-point computations. However, only the applications that exhibit Data Level parallelism can achieve substantial acceleration in such architectures. In this paper we present a parallel implementation of the GridRT architecture for GPGPU ray tracing. Such architecture can expose two levels of parallelism in ray tracing: parallel ray processing and parallel intersection tests, respectively. We also present a traditional parallel implementation of ray tracing in GPGPU, for comparison against the GridRT-GPGPU implementation.

Alexandre S. Nery, Nadia Nedjah, Felipe M. G. França, Lech Jozwiak
Cascading Multi-way Bounded Wait Timer Management for Moody and Autonomous Systems

Timer management is one of the central issues in addressing the ‘moody’ and autonomous characteristics of current Internet. In this paper we formalize the multi-way bounded wait principle for ‘moody’ and autonomous environment. We propose an optimum scheme and compare it with a set of generalized heuristic-based timer management schemes recommended for the harness-a distributed communication al and computational system for moody and autonomous environment.

Asrar Ul Haque, Javed I. Khan
World-Wide Distributed Multiple Replications in Parallel for Quantitative Sequential Simulation

With the recent deployment of global experimental networking facilities, dozens of computer networks with large numbers of computers have become available for scientific studies. Multiple Replications in Parallel (MRIP) is a distributed scenario of sequential quantitative stochastic simulation which offers significant speedup of simulation if it is executed on multiple computers of a local area network. We report results of running MRIP simulations on PlanetLab, a global overlay network which can currently access more than a thousand computers in forty different countries round the globe. Our simulations were run using Akaroa2, a universal controller of quantitative discrete event simulation designed for automatic launching of MRIP-based experiments. Our experimental results provide strong evidence that global experimental networks, such as PlanetLab, can efficiently be used for quantitative simulation, without compromising speed and efficiency.

Mofassir Haque, Krzysztof Pawlikowski, Don McNickle, Gregory Ewing
Comparison of Three Parallel Point-Multiplication Algorithms on Conic Curves

This paper makes a comparison of three parallel point-multiplication algorithms on conic curves over ring Zn. We propose one algorithm for paralleling point-multiplication by utilizing Chinese Remainder Theorem to divide point-multiplication over ring Zn into two different point-multiplications over finite field and to compute them respectively. Time complexity and speedup ratio of this parallel algorithm are computed on the basis of our previous research about the basic parallel algorithms in conic curves cryptosystem. A quantitative performance analysis is made to compare this algorithm with two other algorithms we designed before. The performance comparison demonstrates that the algorithm presented in this paper can reduce time complexity of point-multiplication on conic curves over ring Zn and it is more efficient than the preceding ones.

Yongnan Li, Limin Xiao, Guangjun Qin, Xiuqiao Li, Songsong Lei
Extending Synchronization Constructs in OpenMP to Exploit Pipeline Parallelism on Heterogeneous Multi-core

The ability of expressing multiple-levels of parallelism is one of the significant features in OpenMP parallel programming model. However, pipeline parallelism is not well supported in OpenMP. This paper proposes extensions to OpenMP directives, aiming at expressing pipeline parallelism effectively. The extended directives are divided into two groups. One can define the precedence at thread level while the other can define the precedence at iteration level. Through these directives, programmers can establish pipeline model more easily and exploit more parallelism to improve performance. To support these directives, a set of runtime interfaces for synchronization are implemented on the Cell heterogeneous multi-core architecture using signal block communications mechanism. Experimental results indicate that good performance can be obtained from the pipeline scheme proposed in this paper compared to the naive parallel applications.

Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, Yunfeng Peng
Generic Parallel Genetic Algorithm Framework for Protein Optimisation

Proteins are one of the most vital macromolecules on the cellular level. In order to understand the function of a protein, its structure needs to be determined. For this purpose, different computational approaches have been introduced. Genetic algorithms can be used to search the vast space of all possible conformations of a protein in order to find its native structure. A framework for design of such algorithms that is both generic, easy to use and performs fast on distributed systems may help further development of genetic algorithm based approaches. We propose such a framework based on a parallel master-slave model which is implemented in C++ and Message Passing Interface. We evaluated its performance on distributed systems with a different number of processors and achieved a linear acceleration in proportion to the number of processing units.

Lukas Folkman, Wayne Pullan, Bela Stantic
A Survey on Privacy Problems and Solutions for VANET Based on Network Model

For a long term of vehicle communication research, now VANET is in a stage of implementation. However, most of VANET researches focus on message transmission. Vehicle is extremely personal device; therefore personal information, so called privacy has to be protected. In this paper, we analyze identity and location privacy threatening factors, problems, and solutions based on network model. To analyze solution’s effectiveness, we define four attack models: External attack, Internal attack, Correlational attack, and Relational attack. According to our research, most of the solutions use pseudonym identity or address changing scheme to protect identity privacy. Also, solutions are weak to or do not consider the relational attack. We analyze this is due the meet the network model’s transparency design goal and protect vehicle’s real identity even revealing the vehicle’s location. The result of this paper could guide a way to design a privacy preserve solution and present a trend of existing solutions.

Hun-Jung Lim, Tai-Myoung Chung
Scheduling Tasks and Communications on a Hierarchical System with Message Contention

A Directed Acyclic Graph (DAG) of tasks with small communication delays has to be scheduled on the identical parallel processors of clusters connected by a hierarchical network. The number or processors and of clusters is not limited. Message contention has to be avoided. Task duplication is allowed. In this paper, we present a new polynomial algorithm that computes the earliest start dates of all tasks and spreads these tasks to use few processors per cluster, for a DAG with small communication delays. It also avoids message contention, and always delivers messages on time.

Jean-Yves Colin, Moustafa Nakechbandi
Spiking Neural P System Simulations on a High Performance GPU Platform

In this paper we present our results in adapting a Spiking Neural P system (SNP system) simulator to a high performance graphics processing unit (GPU) platform. In particular, we extend our simulations to larger and more complex SNP systems using an NVIDIA Tesla C1060 GPU. The C1060 is manufactured for high performance computing and massively parallel computations, matching the maximally parallel nature of SNP systems. Using our GPU accelerated simulations we present speedups of around 200× for some SNP systems, compared to CPU only simulations.

Francis George Cabarle, Henry Adorna, Miguel A. Martínez-del-Amor, Mario J. Pérez-Jiménez
SpotMPI: A Framework for Auction-Based HPC Computing Using Amazon Spot Instances

The economy of scale offers cloud computing virtually unlimited cost effective processing potentials. Theoretically, prices under fair market conditions should reflect the most reasonable costs of computations. The fairness is ensured by the mutual agreements between the sellers and the buyers. Resource use efficiency is automatically optimized in the process. While there is no lack of incentives for the cloud provider to offer auction-based computing platform, using these volatile platform for practical computing is a challenge for existing programming paradigms. This paper reports a methodology and a toolkit designed to tame the challenges for MPI applications.

Unlike existing MPI fault tolerance tools, we emphasize on dynamically adjusted optimal checkpoint-restart (CPR) intervals. We introduce a formal model, then a HPC application toolkit, named SpotMPI, to facilitate the practical execution of real MPI applications on volatile auction-based cloud platforms. Our models capture the intrinsic dependencies between critical time consuming elements by leveraging instrumented performance parameters and publicly available resource bidding histories. We study algorithms with different computing v.s. communication complexities. Our results show non-trivial insights into the optimal bidding and application scaling strategies.

Moussa Taifi, Justin Y. Shi, Abdallah Khreishah
Investigating the Scalability of OpenFOAM for the Solution of Transport Equations and Large Eddy Simulations

OpenFOAM is a mainstream open-source framework for flexible simulation in several areas of CFD and engineering whose syntax is a high level representation of the mathematical notation of physical models. We use the backward-facing step geometry with Large Eddy Simulations (LES) and semi-implicit methods to investigate the scalability and important MPI characteristics of OpenFOAM. We find that the master-slave strategy introduces an unexpected bottleneck in the communication of scalar values when more than a hundred MPI tasks are employed. An extensive analysis reveals that this anomaly is present only in a few MPI tasks but results in a severe overall performance reduction. The analysis work in this paper is performed with the tool IPM, a portable profiling and workload characterization tool for MPI programs.

Orlando Rivera, Karl Fürlinger, Dieter Kranzlmüller
Shibboleth and Community Authorization Services: Enabling Role-Based Grid Access

Classical authentication and authorization in grid environments can become a user management issue due to the flat nature of credentials based on X.509 certificates. While such credentials are able to identify user affiliations, such systems typically leave out a crucial aspect in user management and resource allocation: privilege levels. Shibboleth-based authentication mechanisms facilitate the secure communication of such user attributes within a trust federation. This paper describes a role-based access control framework that exploits Shibboleth attribute handling and CAS (Community Authorization Services) within a Grid environment. Users are able obtain appropriate access levels to resources outside of their domain on the basis of their native privileges and resource policies. This paper describes our framework and discusses issues of security and manageability.

Fan Gao, Jefferson Tan
A Secure Internet Voting Scheme

We describe information security requirements for a secure and functional Internet voting scheme. Then we present the voting scheme with multiple parties; this voting scheme satisfies all these security requirements. In this scheme, the voter gets a signed key from the registrar, where the registrar signs the key as blinded. The voter uses this signed key during the voting period. All other parties can verify this signature without knowing the identity of the voter, hence the scheme provides privacy of the voter. This voting scheme also satisfies voter verifiability and public verifiability. In addition, the scheme is receipt-free.

Md. Abdul Based, Stig Fr. Mjølsnes
A Hybrid Graphical Password Based System

In this age of electronic connectivity, where we all face viruses, hackers, eavesdropping and electronic fraud, there is indeed no time when security is not critical. Passwords provide security mechanism for authentication and protection services against unwanted access to resources. A graphical based password is one promising alternatives of textual passwords. According to human psychology, humans are able to remember pictures easily. In this paper, we have proposed a new hybrid graphical password based system, which is a combination of recognition and recall based techniques that offers many advantages over the existing systems and may be more convenient for the user. Our scheme is resistant to shoulder surfing attack and many other attacks on graphical passwords. This resistant scheme is proposed for small mobile devices (like smart phones i.e. ipod, iphone, PDAs etc) which are more handy and convenient to use than traditional desktop computer systems.

Wazir Zada Khan, Yang Xiang, Mohammed Y. Aalsalem, Quratulain Arshad
Privacy Threat Analysis of Social Network Data

Social network data has been increasingly made publicly available and analyzed in a wide spectrum of application domains. The practice of publishing social network data has brought privacy concerns to the front. Serious concerns on privacy protection in social networks have been raised in recent years. Realization of the promise of social networks data requires addressing these concerns. This paper considers the privacy disclosure in social network data publishing. In this paper, we present a systematic analysis of the various risks to privacy in publishing of social network data. We identify various attacks that can be used to reveal private information from social network data. This information is useful for developing practical countermeasures against the privacy attacks.

Mohd Izuan Hafez Ninggal, Jemal Abawajy

IDCS 2011 Papers

Distributed Mechanism for Protecting Resources in a Newly Emerged Digital Ecosystem Technology

Digital Ecosystem (DE) is characterized as an open and dynamic environment where the interaction and collaboration between its entities are highly promoted. A major requirement to promote such intensive interaction and collaboration in a DE environment is the ability to secure and uphold the confidentiality, integrity and non-repudiation of shared resources and information. However, current developments of such security mechanisms for protecting the shared resources are still in their infancy. Most of the proposed protection frameworks do not provide a scalable and effective mechanism for engaging multiple interacting entities to protect their resources. This is even a greater issue when multiple resources are exchanged and shared in an open and dynamic environment. Therefore, we proposes a distributed mechanism for individual enterprises to manage their own authorization processes and resource access permissions with an aim to provide a rigorous protection of entities’ resources.

Ilung Pranata, Geoff Skinner, Rukshan Athauda
Reservation-Based Charging Service for Electric Vehicles

This paper designs a telematics service capable of providing electric vehicles with a reservation-based charging mechanism, aiming at improving acceptance ratio. By the telematics network, each vehicle retrieves the current reservation status of charging stations of interest and then sends a reservation request specifying its requirement on charging amount and time constraint. Receiving the request, the charging station checks if it can meet the requirement of the new request without violating the constraints of already admitted requests. In this admission test, the charging scheduler, which may work in the charging station or a remote data center, implements a genetic algorithm to respond promptly to the fast moving vehicle. The performance measurement result, obtained from a prototype implementation, shows that the proposed scheme can significantly improve the acceptance ratio for all range of the number of tasks and permissible peak load, compared with a conventional scheduling strategy.

Junghoon Lee, Gyung-Leen Park, Hye-Jin Kim
Intelligent Ubiquitous Sensor Network for Agricultural and Livestock Farms

This paper designs and implements an intelligent ubiquitous sensor network architecture for agricultural and livestock farms which embrace a variety of sensors and create a great volume of sensor data records. For the sake of efficiently and accurately detecting the specific events out of the great amount of sensor data which may include not just erroneous terms but also correlative attributes, the middleware module embeds an empirical event patterns and knowledge description. For the filtered data, data mining module opens an interface to define the relationship between the environmental aspect and facility control equipments, set the control action trigger condition, and integrate new event detection logic. Finally, the remote user interface for monitoring and control is implemented by on Microsoft Windows, Web, and mobile device applications.

Junghoon Lee, Hye-Jin Kim, Gyung-Leen Park, Ho-Young Kwak, Cheol Min Kim
Queue-Based Adaptive Duty Cycle Control for Wireless Sensor Networks

This paper proposes a control-based approach to duty cycle adaptation for wireless sensor networks. The proposed method controls duty cycle through queue management in order to achieve high performance under variable traffic rates. To have energy efficiency while minimizing the delay, we design a feedback controller, which adapts sleeping interval time to traffic change dynamically by constraining the queue length at a predetermined value. Based on the control theory, we analyze the adaptation behavior of the proposed controller and demonstrate system stability. The simulation results show that the proposed method outperforms existing scheduling protocols by achieving more energy savings while minimizing the delay.

Heejung Byun, Jungmin So
Experimental Evaluation of a Failure Detection Service Based on a Gossip Strategy

Failure detectors were first proposed as an abstraction that makes it possible to solve consensus in asynchronous systems. A failure detector is a distributed oracle that provides information about the state of processes of a distributed system. This work presents a failure detection service based on a gossip strategy. The service was implemented on the JXTA platform. A simulator was also implemented so the detector could be evaluated for a larger number of processes. Experimental results show that increasing the frequency in which gossip messages are sent gives better results than increasing the fanout. Results are included for fault and recovery detection time and mistake rate of the detector.

Leandro P. de Sousa, Elias P. Duarte Jr.
On the Performance of MPI-OpenMP on a 12 Nodes Multi-core Cluster

With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with large memory capacity shared by multiple cores, new problems related to scalability arise. In this paper, we analyze the overall performance of a cluster built with nodes having a dual Quad-Core Processor on each node. Some benchmark results are presented and some observations are mentioned when handling such processors on a benchmark test. A Quad-Core-based cluster’s complexity arises from the fact that both local communication and network communications between the running processes need to be addressed. The potentials of an MPI-OpenMP approach are pinpointed because of its reduced communication overhead. At the end, we come to a conclusion that an MPI-OpenMP solution should be considered in such clusters since optimizing network communications between nodes is as important as optimizing local communications between processors in a multi-core cluster.

Abdelgadir Tageldin Abdelgadir, Al-Sakib Khan Pathan, Mohiuddin Ahmed
A Protocol for Discovering Content Adaptation Services

Service-oriented content adaptation scheme has emerged to address content adaptation problem. In this scheme, content adaptation functions are provided as services by multiple providers, located across wide area network. To benefit from these services, clients must be able to locate them in the network. This makes service discovery as an important component. In this paper, we propose a service discovery protocol that takes into account searching space, searching time, QoS and physical location of the potential providers. The performance of the proposed protocol is studied in term of discoverability under various conditions and shown to be substantially better than the keyword-based and QoS-based approaches.

Mohd Farhan Md Fudzee, Jemal Abawajy
Securing RFID Systems from SQLIA

While SQL injection attacks have been plaguing web applications for years the threat they pose to RFID systems have only identified recently. Because the architecture of web systems and RFID systems differ considerably the prevention and detection techniques proposed for web applications are not suitable for RFID systems. In this paper we propose a system to secure RFID systems against tag based SQLIA. Our system is optimized for the architecture of RFID systems and consists of a query structure matching technique and tag data cleaning technique. The novelty of the proposed system is that it’s specifically aimed at RFID systems and has the ability to detect and prevent second order injections which is a problem most current solutions haven’t addressed. The preliminary evaluation of our query matching technique is very promising showing very high detection rate with minimal false positives.

Harinda Fernando, Jemal Abawajy
Modeling QoS Parameters of VoIP Traffic with Multifractal and Markov Models

In this paper, we analyze the jitter and packet loss behavior of voice over Internet protocol (VoIP) traffic by means of networks measurements and simulations results. As result of these analyses, we provide a detailed characterization and accurate modeling of these Quality of Service (QoS) parameters. Our studies have revealed that VoIP jitter can be modeled by self-similar and multifractal models. We present a methodology for simulating packet loss. Besides, we found relationships between Hurst parameter (H) with packet loss rate (PLR).

Homero Toral-Cruz, Al-Sakib Khan Pathan, Julio C. Ramírez-Pacheco
Hybrid Feature Selection for Phishing Email Detection

Phishing emails are more active than ever before and putting the average computer user and organizations at risk of significant data, brand and financial loss. Through an analysis of a number of phishing and ham email collected, this paper focused on fundamental attacker behavior which could be extracted from email header. It also put forward a hybrid feature selection approach based on combination of content-based and behavior-based. The approach could mine the attacker behavior based on email header. On a publicly available test corpus, our hybrid features selections are able to achieve 96% accuracy rate. In addition, we successfully tested the quality of our proposed behavior-based feature using the information gain.

Isredza Rahmi A. Hamid, Jemal Abawajy

M2A2 2011 Papers

On the Use of Multiplanes on a 2D Mesh Network-on-Chip

Alike interconnection networks for parallel systems, Networks-on-chip (NoC) must provide high bandwidth and low latency, but they are further constrained by their on-chip power budget. Consequently, simple network topologies such as the 2D Mesh with shallow buffers and simple routing strategies such as dimensional order routing (DOR) have been widely used in order to achieve this goal. A low number of virtual channels could be used to eliminate head-of-line blocking and increase network throughput.

Due to the spare routing area in deep submicron technology, another possibility is to replicate the simple network once or more times. This work compares and combines the two approaches, by considering the distribution of a fixed number of virtual channels over one or more multiplanes. A thorough evaluation of the possible 2D mesh network configurations under a range of workloads will show that, provided there is spare area, replicating the 2D mesh with 2 virtual channels results on the best power/performance trade-off.

Cruz Izu
A Minimal Average Accessing Time Scheduler for Multicore Processors

In this paper, we study and analyze process scheduling for multicore processors. It is expected that hundreds of cores will be integrated on a single chip, known as a Chip Multiprocessor (CMP). However, operating system process scheduling, one of the most important design issue for CMP systems, has not been well addressed. We define a model for future CMPs, based on which a minimal average accessing time scheduling algorithm is proposed to reduce on-chip communication latencies and improve performance. The impact of memory access and inter process communication (IPC) in scheduling are analyzed. We explore six typical core allocation strategies. Results show that, a strategy with the minimal average accessing time of both core-core and core-memory outperforms other strategies, the overall performance for three applications (FFT, LU and H.264) has improved for 8.23%, 4.81% and 10.21% respectively comparing with other strategies.

Thomas Canhao Xu, Pasi Liljeberg, Hannu Tenhunen
Fast Software Implementation of AES-CCM on Multiprocessors

This paper presents a novel software implementation of AES-CCM (Advanced Encryption Standard-Counter mode with Cipher Block Chaining Message Authentication Code) for multiprocessors. The software includes AES key expansion for dual multiprocessors and cipher/inverse cipher for dual/quad multiprocessors. On the measurement of a Xilinx MicroBlaze multiprocessor based platform, the speedup of our AES key expansion, cipher and inverse cipher is up to 1.7, 2.6 and 2.6 times, respectively. Using the new software implementation of AES, AES-CCM for IEEE 802.11i is implemented on the octet MicroBlaze processors. The fast software implementation of the AESCCM for multi processors is up to 3.6 times faster than the implementation for the single processor.

Jung Ho Yoo
A TCM-Enabled Access Control Scheme

Trusted Cryptography Supporting Platform is a computer platform with high dependable and available software and hardware, within which security mechanism is reliable and robust because some encryption/decryption, authentication techniques are adopted upon the operating system based on the trusted platform module in a chip or ARM board. USB disk is a popular, flexible, removable storage device but it also brings some new information security risks at the same time. In this paper, a TCM (Trusted Cryptography Module)-enabled transparent file encryption/decryption strategy is proposed with which a Minifilter driver subroutine are programmed under Microsoft’s latest Minifilter framework and files of USB disk can be transparently encrypted or decrypted. With the TSM/SDK (TCM Service Module/ Software Development Kit) , the file encryption/decryption procedures are better kept in safety by invocating TCM’s hash component, random function component and encryption/decryption component. Hence, the removable storage’s data (files) are of high security because TCM is an individual hardware, the encryption/decryption operations are running within TCM and the key is stored in TCM.

Gongxuan Zhang, Zhaomeng Zhu, Pingli Wang, Bin Song
Binary Addition Chain on EREW PRAM

An addition chain for a natural number

x

of

n

bits is a sequence of numbers

a

0

,

a

1

, ... ,

a

l

, such that

a

0

 = 1,

a

l

 = 

x

, and

a

k

 = 

a

i

 + 

a

j

with 0 ≤ 

i

,

j

 < 

k

 ≤ 

l

. The addition chain problem is what is the minimal number of additions needed to compute

x

starting from 1? In this paper, we present a new parallel algorithm to generate a short addition chain for

x

. The algorithm has running time

O

(log

2

n

) using polynomial number processors under EREW PRAM (exclusive read exclusive write parallel random access machine). The algorithm is faster than previous algorithms and is based on binary method.

Khaled A. Fathy, Hazem M. Bahig, Hatem M. Bahig, A. A. Ragb
A Portable Infrastructure Supporting Global Scheduling of Embedded Real-Time Applications on Asymmetric MPSoCs

Multiprocessor systems-on-chip (MPSoCs) open notable perspectives in the design of highly-targeted embedded solutions for real-time multitasking applications. The heterogeneity of available platforms, however, often hinders plain applicability of efficient scheduling policies, particularly in the case of loosely coupled architectures which do not provide any support for inter-processor tasks migration. In this paper we present a portable software infrastructure addressing the global scheduling of periodic real-time tasks on such platforms. We show that global scheduling policies, under the restricted-migration model, are applicable also on asymmetric multiprocessing systems and experimentally evaluate the validity of the approach using different FPGA-based configurations that recall manifold architectures of commercial MPSoCs.

Eugenio Faldella, Primiano Tucci
Emotional Contribution Process Implementations on Parallel Processors

An emotional agent software architecture for real-time mobile robotic applications has been developed. In order to allow the agent to undertake more dynamically constrained application problem solving, the processor computation time should be reduced and the gained time is used for executing more complex processes. In this paper, the response time of the operating processes, in each attention cycle of the agent, is decreased by parallelizing the highly parallel processes of the architecture, namely, emotional contribution processes. The implementation of these processes has been evaluated in Field Programmable Gate Array (FPGA) and multicore processors.

Carlos Domínguez, Houcine Hassan, José Albaladejo, Maria Marco, Alfons Crespo
A Cluster Computer Performance Predictor for Memory Scheduling

Remote Memory Access (RMA) hardware allow a given motherboard in a cluster to directly access the memory installed in a remote motherboard of the same cluster. In recent works, this characteristic has been used to extend the addressable memory space of selected motherboards, which enable a better balance of main memory resources among cluster applications. This way is much more cost-effective than than implementing a full-fledged shared memory system.

In this context, the memory scheduler is in charge of finding a suitable distribution of local and remote memory that maximizes the performance and guarantees a minimum QoS among the applications. Note that since changing the memory distribution is a slow process involving several motherboards, the memory scheduler needs to make sure that the target distribution provides better performance than the current one.

In this paper, a performance predictor is designed in order to find the best memory distribution for a given set of applications executing in a cluster motherboard. The predictor uses simple hardware counters to estimate the expected impact on performance of the different memory distributions. The hardware counters provide the predictor with the information about the time spent in processor, memory access and network.

The performance model used by the predictor has been validated in a detailed microarchitectural simulator using real benchmarks. Results show that the prediction accuracy never deviates more than 5% compared to the real results, being less than 0.5% in most of the cases.

Mónica Serrano, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato

HardBio 2011 Papers

Reconfigurable Hardware Computing for Accelerating Protein Folding Simulations Using the Harmony Search Algorithm and the 3D-HP-Side Chain Model

Proteins are essentials to life and they have countless biological functions. They are synthesized in the ribosome of cells following a template given by the messenger RNA (mRNA). During the synthesis, the protein folds into an unique three-dimensional structure, known as native conformation. This process is called protein folding. Several diseases are believed to be result of the accumulation of ill-formed proteins.Therefore, understanding the folding process can lead to important medical advancements and development of new drugs.

César Manuel Vargas Benítez, Marlon Scalabrin, Heitor Silvério Lopes, Carlos R. Erig Lima
Clustering Nodes in Large-Scale Biological Networks Using External Memory Algorithms

Novel analytical techniques have dramatically enhanced our understanding of many application domains including biological networks inferred from gene expression studies. However, there are clear computational challenges associated to the large datasets generated from these studies. The algorithmic solution of some NP-hard combinatorial optimization problems that naturally arise on the analysis of large networks is difficult without specialized computer facilities (i.e. supercomputers). In this work, we address the data clustering problem of large-scale biological networks with a polynomial-time algorithm that uses reasonable computing resources and is limited by the available memory. We have adapted and improved the MST

k

NN graph partitioning algorithm and redesigned it to take advantage of external memory (EM) algorithms. We evaluate the scalability and performance of our proposed algorithm on a well-known breast cancer microarray study and its associated dataset.

Ahmed Shamsul Arefin, Mario Inostroza-Ponta, Luke Mathieson, Regina Berretta, Pablo Moscato
Reconfigurable Hardware to Radionuclide Identification Using Subtractive Clustering

Radioactivity is the spontaneous emission of energy from unstable atoms. Radioactive sources have radionuclides. Radionuclide undergoes radioactive decay and emits gamma rays and subatomic particles, constituting the ionizing radiation. The gamma ray energy of a radionuclide is used to determine the identity of gamma emitters present in the source. This paper describes the hardware implementation of subtractive clustering algorithm to perform radionuclide identification.

Marcos Santana Farias, Nadia Nedjah, Luiza de Macedo Mourelle
A Parallel Architecture for DNA Matching

DNA sequences can be often showed in fragments, little pieces, found at crime scene or in a hair sample for paternity exam. In order to compare that fragments with a subject or target sequence of a suspect, we need an efficient tool to analyze the DNA sequence alignment and matching. So DNA matching is a bioinformatics field that could find relationships functions between sequences, alignments and them try to understand it. Usually done by software through databases clusters analysis, DNA matching requires a lot of computational resources, what may increase the bioinformatics project budget. We propose the approach of a hardware parallel architecture, based on heuristic method, capable of reducing time spent on matching process.

Edgar J. Garcia Neto Segundo, Nadia Nedjah, Luiza de Macedo Mourelle
Backmatter
Metadaten
Titel
Algorithms and Architectures for Parallel Processing
herausgegeben von
Yang Xiang
Alfredo Cuzzocrea
Michael Hobbs
Wanlei Zhou
Copyright-Jahr
2011
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-24669-2
Print ISBN
978-3-642-24668-5
DOI
https://doi.org/10.1007/978-3-642-24669-2