
2007 | Book

Parallel and Distributed Processing and Applications

5th International Symposium, ISPA 2007 Niagara Falls, Canada, August 29-31, 2007 Proceedings

Edited by: Ivan Stojmenovic, Ruppa K. Thulasiram, Laurence T. Yang, Weijia Jia, Minyi Guo, Rodrigo Fernandes de Mello

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Keynote Speech

Self-stabilizing Distributed Algorithms for Networks

Many essential fundamental services for networked distributed systems (ad hoc, wireless or sensor) involve maintaining a global predicate over the entire network (defined by some invariance relation on the global state of the network) by using local knowledge at each of the participating nodes. The participating nodes can no longer keep track of even a small fraction of the knowledge about the global network due to limited storage. We need a new paradigm of localized distributed algorithms, where a node takes simple actions based on local knowledge of only its immediate neighbors and yet the system achieves a global objective. Self-stabilization is a relatively new paradigm for designing such localized distributed algorithms for networks; it is an optimistic way of looking at system fault tolerance and scalable coordination; it provides a cost effective built-in safeguard against transient failures that might corrupt data in a distributed system. We introduce self-stabilizing protocol design with the example of a total dominating set in a network graph and discuss some open problems.
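
To make the local-rule style of computation concrete, here is a minimal Python sketch of a classical self-stabilizing maximal-independent-set rule executed under a serial scheduler. It is illustrative only and is not the total-dominating-set protocol of the talk; the graph and the (deliberately corrupted) initial state are made up. Each node inspects only its neighbours' states and applies one of two moves until no node is enabled, regardless of the starting state.

    # Toy simulation of a self-stabilizing local rule (maximal independent set),
    # illustrating the "local action, global predicate" style; this is NOT the
    # total-dominating-set protocol described in the talk.

    def stabilize(adj, state):
        """adj: dict node -> set of neighbours; state: dict node -> 0/1 (may be corrupted)."""
        moved = True
        while moved:                      # serial (central) daemon: one move at a time
            moved = False
            for v in adj:
                in_neigh = any(state[u] == 1 for u in adj[v])
                if state[v] == 0 and not in_neigh:   # enter the set
                    state[v] = 1
                    moved = True
                    break
                if state[v] == 1 and in_neigh:       # leave the set
                    state[v] = 0
                    moved = True
                    break
        return state

    if __name__ == "__main__":
        adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}   # small example graph
        corrupted = {0: 1, 1: 1, 2: 1, 3: 0}                 # arbitrary initial state
        final = stabilize(adj, dict(corrupted))
        print(final)   # here node 2 ends up in the set and dominates all the others

Each node moves at most twice under this rule, so the system stabilizes; the resulting independent set is also a dominating set, which is the kind of global predicate the talk discusses (though a total dominating set requires a different protocol).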

Pradip K. Srimani
Feature Extraction and Coverage Problems in Distributed Sensor Networks

Distributed sensor networks form a classical area of multidisciplinary science, requiring special types of computing, communication and sensing. This talk presents some new results on the following topics: 1) an optimization framework based on mathematical programming for maximizing the coverage probability of a sensor field under an investment-limit constraint; 2) feature extraction using sensor networks.

Sitharama S. Iyengar
Peer-to-Peer Computing: From Applications to Platform

In the last several years, we have made great efforts on prototype development and deployment of real peer-to-peer computing systems. We have released many well-known systems in CERNET of China, including: 1) a live streaming system based on P2P computing, named AnySee, in 2004; 2) a P2P VoD system, named GridCast, and a P2P-based E-Learning system, named APPLE, in 2005; 3) a P2P-based gaming platform, named PKTown, and a P2P-based high performance computing platform, named P2HP, in 2006; 4) a P2P live streaming system for wireless environments, named MoSee, in 2007. All these deployed systems have attracted attention to innovative P2P applications. Over this period, more than 100,000 users have enjoyed the services provided by these applications, and we have collected many data sets about user behavior in the different applications. These logs from real systems point to one finding: a single platform that hosts many typical P2P applications is a promising system for users, developers and researchers. Users can enjoy different services in one piece of software rather than many dazzling applications. Developers can deploy new P2P services quickly based on the functions and support provided by the platform. Researchers can model the complex network and perform statistical analysis and study its physical evolution based on the traces provided by the platform. In the future, we will focus on a P2P-based platform, named Ripple, which is to construct a Reliable independent P2P layered engine with manageability to support services including scientific research, streaming services, game services, and so on.

Hai Jin

Algorithms and Applications

A Self-stabilizing Algorithm for 3-Edge-Connectivity

The adoption of self-stabilization as an approach to fault-tolerant behavior has received considerable research interest over the last decade. In this paper, we propose a self-stabilizing algorithm for 3-edge-connectivity of an asynchronous distributed model of computation. The self-stabilizing depth-first search algorithm of Collin and Dolev [4] is run concurrently to build a depth-first search spanning tree of the system. Once such a tree of height h is constructed, the detection of all 3-edge-connected components of the system requires O(h) rounds. The result of computation is kept in a distributed fashion in the sense that, upon stabilization of another phase of the algorithm, each processor knows all other processors that are 3-edge-connected to it.

Until now, this is the only algorithm to compute all the 3-edge-connected components in the context of self-stabilization.

Assuming that every processor requires m bits for the depth-first search algorithm, the space complexity of our algorithm is O(hm) bits per processor.

Abusayeed M. Saifullah, Yung H. Tsin
Number of Processors with Partitioning Strategy and EDF-Schedulability Test: Upper and Lower Bounds with Comparison

In this paper, we study the problem of scheduling a set of n periodic preemptive independent hard real-time tasks on the minimum number of processors. We assume that the partitioning strategy is used to allocate the tasks to the processors and the EDF method is used to schedule the tasks on each processor. It is known that this scenario is NP-hard; thus, it is unlikely to find a polynomial time algorithm to schedule the tasks on the minimum number of processors. In this work, we derive a lower and an upper bound for the number of processors required to satisfy the constraints of our problem. We also compare a number of heuristic algorithms with each other and with the bounds derived in this paper. Numerical results demonstrate that our lower bound is very tight and it is very close to the optimal solution.
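
As a concrete illustration of the quantities involved (a sketch under the standard implicit-deadline assumption, not the authors' specific bounds or heuristics, and with a hypothetical task set): a periodic task with execution time C and period T has utilization U = C / T, EDF schedules a task set on one processor exactly when the utilizations sum to at most 1, so ceil(sum of U) processors are always necessary, and a first-fit partitioning heuristic gives one possible upper bound.

    import math

    # Hypothetical task set: (execution time C, period T); utilization U = C / T.
    tasks = [(2, 5), (3, 10), (1, 4), (6, 20), (2, 8)]
    utils = [c / t for c, t in tasks]

    # Lower bound: EDF fills each processor to at most utilization 1 (implicit deadlines).
    lower_bound = math.ceil(sum(utils))

    # One upper bound: first-fit partitioning with the per-processor EDF test (sum of U <= 1).
    bins = []                       # utilization already packed on each processor
    for u in sorted(utils, reverse=True):
        for i, load in enumerate(bins):
            if load + u <= 1.0:
                bins[i] = load + u
                break
        else:
            bins.append(u)          # open a new processor

    print(f"total utilization = {sum(utils):.2f}")
    print(f"lower bound on processors = {lower_bound}")
    print(f"first-fit partitioning uses {len(bins)} processors")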

Arezou Mohammadi, Selim G. Akl
Architecture-Based Optimization for Mapping Scientific Applications to Imagine

Whether scientific applications are suitable for the Imagine architecture is a challenging issue. To address this problem, this paper presents a novel architecture-based optimization of the key techniques for mapping scientific applications to Imagine. Our specific contributions include achieving fine kernel granularity and choosing the necessary arrays to organize appropriate streams. In particular, we develop a new stream program generation algorithm based on the architecture-based optimization. We apply our algorithm to some representative scientific applications on the ISIM simulator of Imagine and compare them with the corresponding FORTRAN programs running on Itanium 2. The experimental results show that the optimized stream programs can efficiently improve computational intensiveness, enhance locality in the LRF and SRF, avoid index stream overhead and enable parallelism that utilizes the ALUs. The results indicate that Imagine is efficient for many scientific applications.

Jing Du, Xuejun Yang, Guibin Wang, Tao Tang, Kun Zeng
Implementation and Optimization of Sparse Matrix-Vector Multiplication on Imagine Stream Processor

Sparse matrix-vector multiplication (SpMV for short) dominates the performance of many scientific and engineering applications. However, it tends to run much more slowly than its dense counterpart because the algorithms have poor temporal and spatial locality and the memory access patterns are irregular. Its performance depends heavily both on the nonzero structure of the sparse matrix and on the machine architecture. In this paper, we address the problem of implementing and optimizing SpMV on the Imagine stream processor. We present three classes of implementation algorithms based on different key ideas; the first two highlight different aspects of the underlying stream architecture, and the third is inspired by the vector implementation of SpMV. We then discuss some critical optimizations. Experimental results over the same benchmarks show that we achieve up to a 67 percent average relative improvement over the published evaluation.
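
For reference, the baseline that stream implementations start from is the ordinary sequential SpMV over a compressed sparse row (CSR) matrix; a minimal Python version (illustrative only, not one of the three Imagine algorithms) is:

    # Sequential sparse matrix-vector multiply y = A @ x with A in CSR form.
    # CSR stores, row by row, the nonzero values and their column indices,
    # plus row_ptr marking where each row starts in those arrays.

    def spmv_csr(row_ptr, col_idx, values, x):
        n_rows = len(row_ptr) - 1
        y = [0.0] * n_rows
        for i in range(n_rows):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[col_idx[k]]   # irregular, indirect access to x
            y[i] = acc
        return y

    # 3x3 example:  [[4, 0, 1],
    #                [0, 2, 0],
    #                [3, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values  = [4.0, 1.0, 2.0, 3.0, 5.0]
    print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))   # [5.0, 2.0, 8.0]

The indirect access x[col_idx[k]] is exactly the irregular memory pattern the abstract refers to.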

Li Wang, Xue Jun Yang, Gui Bin Wang, Xiao Bo Yan, Yu Deng, Jing Du, Ying Zhang, Tao Tang, Kun Zeng
A Mutual Exclusion Algorithm for Mobile Agents-Based Applications

Mobile agent (MA) technology has been widely applied in distributed applications. However, since mobile agents are highly autonomous, coordinating their behaviours is hard. We concentrate on the mutual exclusion issue in this paper and propose an algorithm for achieving mutex among mobile agents. In contrast to existing algorithms for traditional distributed processes, the proposed algorithm does not require MAs to have prior knowledge of the other competitors or of their total number. MAs can join and leave the competition session freely. The algorithm's performance is also evaluated through simulations.

Chun Cao, Jiannong Cao, Xiaoxing Ma, Jian Lü
A Distributed Metaheuristic for Solving a Real-World Scheduling-Routing-Loading Problem

In this paper a real and complex transportation problem including routing, scheduling and loading tasks is presented. Most related works address only routing and scheduling, as a combination of up to five different types of VRPs (Rich VRP), and leave out the loading task, which is not enough to define more complex real-world cases. We propose a solution methodology for transportation instances that involve six types of VRPs, a new constraint that limits the number of vehicles that can be attended to simultaneously, and the loading tasks. They are solved using an Ant Colony System algorithm, which is a distributed metaheuristic. Results from a computational test using real-world instances show that the proposed approach outperforms the transportation plans produced by manual design. In addition, a well-known VRP benchmark was solved to validate the approach.

Laura Cruz Reyes, Juan Javier González Barbosa, David Romero Vargas, Hector Joaquin Fraire Huacuja, Nelson Rangel Valdez, Juan Arturo Herrera Ortiz, Bárbara Abigail Arrañaga Cruz, José Francisco Delgado Orta
Cellular ANTomata
(Extended Abstract)

Cellular automata can form the basis of a practical model for a broad range of tasks that require the coordination of many simple computing devices. We propose using “semi-synchronous” cellular automata as a platform for efficiently realizing ant-inspired algorithms that coordinate robots within a fixed, geographically constrained environment. We present an appropriate formalization of the resulting Cellular ANTomaton model, illustrated via “proof-of-concept” problems that have ant-robots move and aggregate in various ways.

Arnold L. Rosenberg
Key-Attributes Based Optimistic Data Consistency Maintenance Method

Peer-to-peer distributed storage systems usually replicate data objects on multiple nodes to improve performance and availability. However, because P2P systems are generally large-scale and strongly distributed, updates may be delayed, which degrades the performance of resource location on the Internet. To address this, an optimistic data consistency maintenance method based on key attributes is proposed. In this method, updates to key attributes are separated from the user request. Key updates are propagated by a latency-overlay update propagation model, that is, updates are always propagated to the nodes with maximum or minimal latency, and both assured and uncertain propagation paths of updates are taken into account. Based on a classification of key-update conflicts, a two-level reconciling mechanism, consisting of buffer preprocessing and update-log processing, is applied to detect and reconcile conflicts, which are then resolved by last-writer-wins and divide-and-rule policies. Lastly, techniques for managing and maintaining the update log are discussed, since the above mechanisms rely on the information stored in the update log. The optimistic method avoids delaying key-attribute updates and therefore does not degrade the efficiency of key-attribute-based resource location, so it adapts well to P2P systems on the Internet. The simulation results show that it is an effective optimistic consistency maintenance method, achieves good consistency overhead, resource location and access overhead, and has strong robustness.

Jing Zhou, Yijie Wang, Sikun Li
Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor

The Cell processor is a typical example of a heterogeneous multiprocessor-on-chip architecture that uses several levels of parallelism to deliver high performance. Closing the gap between peak performance and sustained performance is the challenge for software tool developers and application developers. Image processing and media applications are typical “mainstream” applications. In this paper, we use the Harris algorithm for detection of points of interest (PoI) in an image as a benchmark to compare the performance of several parallel schemes on a Cell processor. The impact of the DMA-controlled data transfers and of the synchronizations between SPEs explains the differences between the performance of the different parallelization schemes. These results will be used to design a tool for the efficient mapping of image processing applications on multi-core architectures.

Tarik Saidani, Lionel Lacassagne, Samir Bouaziz, Taj Muhammad Khan
RWA Algorithm for Scheduled Lightpath Demands in WDM Networks

We have proposed and evaluated a novel heuristic algorithm for routing and wavelength assignment (RWA) for scheduled lightpath demands (SLD). Our algorithm is shown to achieve a performance gain of up to 65.8% in terms of the number of wavelengths compared with the most recently introduced method, TDP_RWA, and works on both directed and undirected optical networks. We also show that the execution time of our proposed method is feasible.

Sooyeon Park, Jong S. Yang, Moonseong Kim, Young-Cheol Bang
Optimizing Distributed Data Access in Grid Environments by Using Artificial Intelligence Techniques

This work evaluates two artificial intelligence techniques for file distribution in Grid environments. These techniques are used to access data on independent servers in parallel, in order to improve performance and maximize the throughput rate. In this work, genetic algorithms and Hopfield neural networks are the techniques used to solve the problem. Both techniques are evaluated for efficiency and performance. Experiments were conducted in environments composed of 32, 256 and 1024 distributed nodes. The results confirm the decrease in file access time and show that the Hopfield neural network offered the best performance, making it suitable for application in Grid environments.

Rodrigo F. de Mello, Jose Augusto Andrade Filho, Evgueni Dodonov, Renato Porfirio Ishii, Laurence T. Yang
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors

Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretic parallel algorithms but no known efficient parallel implementations that achieve speedup without serious restricting assumptions about the inputs. We believe our techniques will be of practical impact in solving large-scale graph problems.

Guojing Cong, David A. Bader
Distributed Memorization for the k-Vertex Cover Problem

We present the first investigation of the well-known memorization technique for solving the k-Vertex Cover problem in a distributed setting. Memorization was introduced by Robson [15] in his paper on solving the Maximum Independent Set problem. The idea is to augment a recursive algorithm with the capability to store subproblem instances and solutions of bounded size in a table that can be quickly referenced, so that subproblems are guaranteed to be solved exactly once. This approach has recently been applied with success to improve the complexity of the fixed-parameter tractable algorithms for solving the k-Vertex Cover problem [12,5]. We present a general parallel approach for using memorization to solve the k-Vertex Cover problem where the subgraphs are precomputed [12]. In this case, the subgraphs and corresponding solutions are generated in a preprocessing step, rather than during the recursion. Our technique makes efficient use of the processors generating the lookup table, while at the same time requiring less space. We describe a distributed algorithm using this technique, well-suited to cluster or grid computing platforms.
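
The sequential idea being parallelized can be sketched in a few lines: a bounded search tree that branches on an uncovered edge, with a table keyed by the remaining subproblem so that repeated subproblems are solved only once. The sketch below is illustrative (in-recursion memoization on small instances), not the authors' precomputed-subgraph variant.

    # Branch on an edge (u, v): any vertex cover of size <= k must contain u or v.
    # The memo table maps a residual subproblem (remaining edges, budget) to the answer,
    # so identical subproblems generated on different branches are solved once.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def has_vertex_cover(edges, k):
        """edges: frozenset of frozenset({u, v}) pairs; k: remaining budget."""
        if not edges:
            return True
        if k == 0:
            return False
        u, v = tuple(next(iter(edges)))          # pick any uncovered edge
        for w in (u, v):                         # branch: put w into the cover
            remaining = frozenset(e for e in edges if w not in e)
            if has_vertex_cover(remaining, k - 1):
                return True
        return False

    def vc(edge_list, k):
        return has_vertex_cover(frozenset(frozenset(e) for e in edge_list), k)

    # A 5-cycle needs 3 vertices to cover all of its edges.
    cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(vc(cycle5, 2), vc(cycle5, 3))   # False True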

Peter J. Taillon
MADARP: A Distributed Agent-Based System for On-Line DARP

The present work describes the design of a distributed agent system devoted to the Dial-a-Ride Problem. This routing and scheduling problem consists of finding a set of routes and schedules for each vehicle that satisfies a set of trip requests coming from users. The agent system distributes an improved insertion heuristic for the scheduling of passengers’ trip requests over a fleet of vehicles. Agents use the contract-net protocol as the base coordination mechanism for the planning and scheduling of passenger trips.

Claudio Cubillos, Broderick Crawford, Nibaldo Rodríguez
An Incremental Distributed Algorithm for a Partial Grundy Coloring of Graphs

A coloring of a graph G = (V, E) is a partition {V1, V2, ..., Vk} of V into k independent sets called color classes. A vertex v ∈ Vi is called a Grundy vertex if it is adjacent to at least one vertex in color class Vj, for every j < i. In a partial Grundy coloring, every color class contains at least one Grundy vertex. Such a coloring gives a partitioning of the graph into clusters for which every cluster has a clusterhead (the Grundy vertex) adjacent to some other clusters. Such a decomposition is very interesting for large distributed systems and networks. In this paper, we propose a distributed algorithm to maintain a partial Grundy coloring of any graph G when an edge is added.
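
The definition translates directly into a checker; the following sketch (illustrative, not the maintenance algorithm of the paper) verifies that a given partition of the vertices is a partial Grundy coloring.

    # A coloring is given as a list of color classes V1, ..., Vk.
    # It is a partial Grundy coloring if every class Vi contains a Grundy vertex:
    # a vertex adjacent to at least one vertex of every lower class Vj, j < i.

    def is_partial_grundy(adj, classes):
        # each class must be an independent set
        for cls in classes:
            if any(u in adj[v] for v in cls for u in cls if u != v):
                return False
        # each class needs at least one Grundy vertex
        for i, cls in enumerate(classes):
            lower = classes[:i]
            has_grundy = any(
                all(adj[v] & set(vj) for vj in lower) for v in cls
            )
            if not has_grundy:
                return False
        return True

    # 4-cycle 0-1-2-3-0 with classes V1 = {0, 2}, V2 = {1, 3}
    adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(is_partial_grundy(adj, [[0, 2], [1, 3]]))   # True
    print(is_partial_grundy(adj, [[0], [2], [1, 3]])) # False: class {2} has no neighbour in class {0}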

Lyes Dekar, Brice Effantin, Hamamache Kheddouci
Efficient Multidimensional Data Redistribution for Resizable Parallel Computations

Traditional parallel schedulers running on cluster supercomputers support only static scheduling, where the number of processors allocated to an application remains fixed throughout the execution of the job. This results in under-utilization of idle system resources thereby decreasing overall system throughput. In our research, we have developed a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executing on distributed memory platforms. The resizing library in ReSHAPE includes support for releasing and acquiring processors and efficiently redistributing application state to a new set of processors. In this paper, we derive an algorithm for redistributing two-dimensional block-cyclic arrays from P to Q processors, organized as 2-D processor grids. The algorithm ensures a contention-free communication schedule for data redistribution if P_r ≤ Q_r and P_c ≤ Q_c. In other cases, the algorithm implements circular row and column shifts on the communication schedule to minimize node contention.
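
The flavour of the problem can be seen in one dimension: with a block-cyclic layout, block b lives on process b mod P before resizing and on process b mod Q afterwards, so a redistribution schedule is just the set of (source, destination) pairs with the blocks they carry. The sketch below is illustrative only (1-D, and without the contention-free ordering or circular shifts of the paper).

    from collections import defaultdict

    # 1-D block-cyclic redistribution: block b is owned by (b mod P) before resizing
    # and by (b mod Q) afterwards.  Group the blocks by (source, destination) pair to
    # obtain the messages that make up the redistribution schedule.

    def redistribution_schedule(num_blocks, P, Q):
        messages = defaultdict(list)            # (src, dst) -> list of block indices
        for b in range(num_blocks):
            src, dst = b % P, b % Q
            if src != dst:                      # blocks already in place need no message
                messages[(src, dst)].append(b)
        return dict(messages)

    if __name__ == "__main__":
        sched = redistribution_schedule(num_blocks=12, P=3, Q=4)
        for (src, dst), blocks in sorted(sched.items()):
            print(f"P{src} -> Q{dst}: blocks {blocks}")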

Rajesh Sudarsan, Calvin J. Ribbens
Distributed Local 2-Connectivity Test of Graphs and Applications

The vertex connectivity of a graph is the smallest number of vertices whose deletion separates the graph or makes it trivial. This work is devoted to the problem of vertex connectivity test of graphs in a distributed environment based on a constructive approach. The contribution of this paper is threefold. First, using a pre-constructed spanning tree of the considered graph, we present a protocol to test whether a given graph is 2-connected using only local knowledge. Second, we present an encoding of this protocol using graph relabeling systems. The last contribution is the implementation of this protocol in the message passing model. For a given graph G, where M is the number of its edges, N the number of its nodes and Δ is its degree, our algorithms need the following requirements: the first one uses O(Δ × N²) steps and O(Δ × log Δ) bits per node. The second one uses O(Δ × N²) messages, O(N²) time and O(Δ × log Δ) bits per node. Furthermore, the studied network is semi-anonymous: only the root of the pre-constructed spanning tree needs to be identified. Moreover, we investigate some applications that can use our protocol as a pre-processing task for initial configurations.

Brahim Hamid, Bertrand Le Saëc, Mohamed Mosbah

Architectures and Systems

Comparing Direct-to-Cache Transfer Policies to TCP/IP and M-VIA During Receive Operations in MPI Environments

The main contributors to message delivery latency in message passing environments are the copying operations needed to transfer and bind a received message to the consuming process/thread. To reduce this copying overhead, we introduce architectural extensions comprising a specialized network cache and instructions. In this work, we study the possible overhead and cache pollution introduced through the operating system and the communications stack as exemplified by Linux, TCP/IP and M-VIA. We introduce this overhead in our simulation environment and study its effects on our proposed extensions. Ultimately, we have been able to compare the performance achieved by an application running on a system incorporating our extensions with the performance of the same application running on a standard system. The results show that our proposed approach can improve the performance of MPI applications by 10% to 20%.

Farshad Khunjush, Nikitas J. Dimopoulos
Virtual Distro Dispatcher: A Costless Distributed Virtual Environment from Trashware

Obsolete hardware can be effectively reused through intelligent software optimization, which is possible only when source code is available. Virtual Distro Dispatcher (VDD) is a system that produces virtual machines on a central server and projects them onto a number of costless physical terminals. VDD is the result of an extreme software optimisation based on virtualization and terminal servers. VDD creates and projects Linux distros that are completely customizable and different from each other. They are virtual desktop machines that can be used for testing or development and are completely controllable directly from each terminal. Memory consumption has been strongly reduced without sacrificing performance. Test results are encouraging enough to proceed with the research towards clustering.

Flavio Bertini, D. Davide Lamanna, Roberto Baldoni
A Parallel Infrastructure on Dynamic EPIC SMT and Its Speculation Optimization

SMT (simultaneous multithreading) processors execute instructions from different threads in the same cycle, which gives them the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of the synergy between compiler and hardware. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. Control and data speculation are effective ways to improve instruction-level parallelism. In this paper, we present our efforts to design and implement a parallel environment, which includes an optimizing, portable parallel compiler, OpenUH, and the SMT architecture EDSMT based on IA-64. Its speculation support is also reexamined.

Qingying Deng, Minxuan Zhang, Jiang Jiang
An SRP Target Mode to Improve Read Performance of SRP-Based IB-SANs

SCSI RDMA Protocol (SRP) is used to build high performance Storage Area Networks (SANs) over InfiniBand, or SRP-based IB-SANs for short. I/O read performance is critical for many read-dominant applications, such as multimedia, remote sensing and data backup. However, if I/O accesses focus on a specific storage device of an IB-SAN, the local I/O performance of that single device can become the bottleneck, leaving the network performance under-utilized. In this paper, we propose an SRP target mode called the Target Disk Cache Assisted (tDCA) mode, which exploits the file read-ahead feature and page cache of Linux to tackle the performance gap between storage devices and the network. Experimental results show that this strategy significantly improves read throughput for both random reads with good locality and sequential reads.

Zhiying Jiang, Jin He, Jizhong Han, Xigui Wang, Yonghao Zhou, Xubin He
An FPGA Design to Achieve Fast and Accurate Results for Molecular Dynamics Simulations

A Molecular Dynamics (MD) system is defined by the position and momentum of particles and their interactions. The dynamics of a system can be evaluated by an N-body problem and the simulation is continued until the energy reaches equilibrium. Thus, solving the dynamics numerically and evaluating the interactions is computationally expensive even for a small number of particles in the system. We focus on long-ranged interactions, since the calculation time is O(N²) for an N-particle system. There are many existing algorithms aimed at reducing the calculation time of MD simulations. The multigrid (MG) method [1] reduces the O(N²) calculation time to O(N) while still achieving reasonable accuracy. Another route to much faster calculation is to run MD simulations on special-purpose processors and customized hardware with ASICs or FPGAs. In this paper, we design and implement an FPGA-based MD simulator with an efficient MG method.
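
The cost the paper attacks is that of the direct pairwise sum, shown below in plain Python for a toy Coulomb-style potential (an illustrative baseline only; the paper replaces it with a multigrid method mapped to an FPGA, and the particle data here is made up).

    import math

    # Direct evaluation of pairwise (long-range) interaction energy: the double loop
    # visits every pair once, hence O(N^2) work for N particles.

    def pairwise_energy(positions, charges):
        n = len(positions)
        energy = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                dx = positions[i][0] - positions[j][0]
                dy = positions[i][1] - positions[j][1]
                dz = positions[i][2] - positions[j][2]
                r = math.sqrt(dx * dx + dy * dy + dz * dz)
                energy += charges[i] * charges[j] / r     # Coulomb-style 1/r term
        return energy

    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
    charges = [1.0, -1.0, 1.0, -1.0]
    print(pairwise_energy(positions, charges))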

Eunjung Cho, Anu G. Bourgeois, Feng Tan
Performance and Complexity Analysis of Credit-Based End-to-End Flow Control in Network-on-Chip

Network-on-Chip is an alternative paradigm for improving communication bandwidth compared to bus-based communication, and its performance degrades if there is no effective flow control method. Heterogeneous networks with very slow processing elements (PEs) especially need a flow control mechanism at the transport layer to prevent excessive packet injection. In this paper, a credit-based end-to-end flow control (CB-EEFC) scheme is implemented to control the network latency at high traffic loads. Simulation in mesh networks shows improved latency and a 0.5% to 3% decrease in saturated throughput with the CB-EEFC method. RTL gate-level simulation shows that a network interface using CB-EEFC incurs about a 31.4% increase in complexity compared to a network interface without CB-EEFC.
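
The general shape of credit-based end-to-end flow control is simple, and the sketch below captures it at the transport level (a generic illustration, not the RTL design evaluated in the paper): the source may inject a packet only while it holds credits, and the destination returns a credit whenever its interface drains a packet from its buffer.

    from collections import deque

    # Generic credit-based end-to-end flow control between one source and one
    # destination network interface (not the paper's RTL implementation).

    class CreditedSource:
        def __init__(self, credits):
            self.credits = credits            # initial credits = receiver buffer slots

        def try_send(self, packet, dest):
            if self.credits == 0:
                return False                  # stall: no room at the destination NI
            self.credits -= 1
            dest.buffer.append(packet)
            return True

        def receive_credit(self):
            self.credits += 1                 # credit returned by the destination

    class Destination:
        def __init__(self):
            self.buffer = deque()

        def drain_one(self, source):
            if self.buffer:
                self.buffer.popleft()         # slow PE finally consumes a packet
                source.receive_credit()       # ... and a credit flows back

    src, dst = CreditedSource(credits=2), Destination()
    print([src.try_send(p, dst) for p in ("a", "b", "c")])   # [True, True, False]
    dst.drain_one(src)
    print(src.try_send("c", dst))                            # True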

Seongmin Noh, Daehyun Kim, Vu-Duc Ngo, Hae-Wook Choi
An QoS Aware Mapping of Cores Onto NoC Architectures

Network-on-chip (NoC) is being proposed as a scalable and reusable communication platform for future SoC applications. The NoC somewhat resembles a parallel computer network. However, NoC design requires that latency, power consumption and area constraints be satisfied. Network latency is closely related to throughput and power consumption. Moreover, the IPs and the network are heterogeneous. Hence, a given mapping of IPs onto a given architecture produces particular values of network latency and power consumption, and changing the mapping scheme changes these values significantly. If we maximize the system’s throughput, the network latency also increases; if we minimize the network latency, the trade-off is that the throughput decreases. In this paper, we present a mapping scheme that compromises between throughput maximization and latency minimization. This sub-optimal mapping is found using a spanning tree search algorithm. The experimental architecture used here is a mesh-based topology. We use NS2 to simulate and calculate the system throughput, and system power consumption is calculated using the Orion model.

Huy-Nam Nguyen, Vu-Duc Ngo, Younghwan Bae, Hanjin Cho, Hae-Wook Choi
Latency Optimization for NoC Design of H.264 Decoder Based on Self-similar Traffic Modeling

In this article, we present an analytical method to evaluate the latency of an NoC design of an H.264 decoder, based on self-similar traffic models of all 12 IPs. The traffic models are generated using the superposition of four 2-state Markov Modulated Poisson Processes (MMPP) and real traced data transactions between IPs. An optimization engine is utilized to automatically allocate IPs to the desired routers to achieve minimal latency.

Vu-Duc Ngo, June-Young Chang, Younghwan Bae, Hanjin Cho, Hae-Wook Choi
Hardware Implementation of Common Protocol Interface for a Network-Based Multiprocessor

Our research project “UMP-PJ” has proposed the UMP Network Architecture for the next-generation computing infrastructure, in which the network nodes coordinate with each other. We have conducted research on the basic architecture of the UMP Network and shown its usefulness in previous papers. We defined a Processing Element (PE) comprised of a PE Wrapper and a PE Core. The PE Wrapper is the common network interface of a network node for the UMP Network, and the PE Core is an application-specific function module. This paper evaluates the hardware implementation of a PE. In particular, the PE Wrapper is the key to satisfying the scalability and flexibility of the UMP Network Architecture. The experimental model processed a JPEG encoding application successfully with a PE implemented on an FPGA board attached to a PC, in conjunction with other software PEs. Experimental results demonstrate that the hardware-implemented PE Wrapper causes no system bottleneck or redundant processing. This implies that the UMP Network Architecture is suitable for hardware implementation.

Arata Shinozaki, Mitsunori Kubo, Takayuki Nakatomi, Baoliu Ye, Minyi Guo

Datamining and Databases

A Distributed Hebb Neural Network for Network Anomaly Detection

One of the most challenging problems in anomaly detection is to develop scalable algorithms capable of dealing with large amounts of audit data, network traffic data or alert data. In this paper a distributed neural network based on the Hebb rule is presented to improve the speed and scalability of inductive learning. The speed is improved by randomly splitting a large data set into disjoint subsets; each subset is presented to an independent neural network, and these networks can be trained in a distributed fashion, each one in parallel. The analysis of completeness and risk bounds of competitive Hebb learning proves that the distributed Hebb neural network avoids degrading accuracy compared with running a single algorithm on the entire data. The experiments are performed on the KDD’99 data set, which is a standard intrusion detection benchmark. Comparisons with other approaches on the same benchmark demonstrate the effectiveness and applicability of the proposed method.

Daxin Tian, Yanheng Liu, Bin Li
Processing Global XQuery Queries Based on Static Query Decomposition

The concept of XML views has been proposed to integrate heterogeneous data sources in XML, and a global XML view can represent the distributed heterogeneous data sources as a single XML document. XQuery is one of the standard query languages for searching XML data. How to efficiently process global XQuery queries, which are written against a global XML view, emerges as a new research topic. Popular techniques for distributed query processing are based on decomposition, and static decomposition is one of the processing techniques used for traditional query languages. However, XQuery has a complex syntax, such as FOR clauses that are not supported by other query languages, and its structural complexity raises several issues for the decomposition of global XQuery queries. In this paper, we propose a method to decompose a global XQuery query into local XQuery queries and to construct the result of the global XQuery query from the results of the local XQuery queries.

Jong-Hyun Park, Ji-Hoon Kang
Formal Verification and Performance Evaluation of User Query Pattern-Based Relational Schema-to-XML Schema Translation Algorithm

This paper describes formal verification and quantitative performance evaluation for validating the query pattern-based relational schema-to-XML Schema translation (QP-T) algorithm. Many translation algorithms, such as FT, NeT, CoT, ConvRel, and VP-T, have been introduced on the structural and/or semantic side for exact and effective translation. However, conventional algorithms consider only the explicit referential integrity specified by the relational schema, or have limitations in reflecting implicit referential integrity information. This causes several problems such as incorrect translation and abnormal relational model transition. The QP-T algorithm analyzes query patterns and extracts implicit referential integrities from the equi-join interrelationships in user queries. The QP-T algorithm can make up for the weak points of the VP-T algorithm and, used together with it, creates a more exact XML Schema.

Jinhyung Kim, Dongwon Jeong, Doo-Kwon Baik
Adaptive Processing for Continuous Query over Data Stream

Stream applications such as sensor data processing, financial tickers and Internet traffic analysis require information that naturally occurs as a stream of data values. Due to the late and out-of-order arrival of multiple infinite, unbounded input streams, processing continuous queries over them may produce incorrect answers or delay query execution. Hence, to minimize this waiting time, previous works have used a timeout technique without considering the frequency of timeouts. This decreases the accuracy of query execution results, since the more frequent the timeouts, the greater the loss of data. We propose the AP-STO method, which uses an StB that stores operator state, together with a window time-out method based on the waiting time for the next tuple, which resets the window size according to the frequency of timeouts. This reduces the data loss rate and increases the tuple output rate. We compare the AP-STO method with an existing method, using output rate and response time as criteria for performance evaluation. Our proposed method shows a substantial improvement in system performance in terms of the accuracy of query execution and the increase in tuple output rate per query, due to the reduced loss rate of data.

Misook Bae, Buhyun Hwang, Jiseung Nam
Parallel Computation of Closed Itemsets and Implication Rule Bases

Formal concept analysis has been successfully applied as a data mining framework whereby target patterns come in the form of intent families and implication bases. Since their extraction is a challenging task, especially for large datasets, parallel techniques should be helpful in reducing the computational effort and increasing the scalability of the approach. In this paper we describe a way to parallelize a recent divide-and-conquer method computing both the intents and the Duquenne-Guigues implication basis of a dataset. While intents admit a straightforward computation, adding the basis (whose definition is recursive) poses harder problems, in particular for parallel design. A first, and by no means final, solution relies on a partition of the basis that allows the crucial and inherently sequential step of redundancy removal to nevertheless be split into parallel subtasks. A prototype implementation of our method, called ParCIM, shows a nearly linear acceleration w.r.t. its sequential counterpart.

Jean François Djoufak Kengue, Petko Valtchev, Clémentin Tayou Djamegni
An Optimal Share Transfer Problem on Secret Sharing Storage Systems

We have been developing a secure and reliable distributed storage system which uses a secret sharing scheme. In order to store data in the system efficiently, this paper introduces an optimal share transfer problem and proves that it is, in general, NP-hard. It is also shown that the problem can be reduced to a Steiner tree problem. Finally, through computational experiments we compare heuristic algorithms for the Steiner tree problem.
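
For readers unfamiliar with the storage primitive: a (t, n) threshold secret sharing scheme in the Shamir style splits a secret into n shares so that any t of them reconstruct it, and each storage server holds one share, which is why moving shares efficiently matters. Below is a minimal sketch of that primitive only (it does not model the paper's share-transfer optimization, and the field modulus and parameters are arbitrary choices).

    import random

    # Minimal (t, n) Shamir secret sharing over a prime field: the secret is the
    # constant term of a random degree-(t-1) polynomial; any t shares reconstruct it
    # by Lagrange interpolation at x = 0.  Illustrative only.

    P = 2**61 - 1   # a Mersenne prime used as the field modulus

    def make_shares(secret, t, n):
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        def f(x):
            return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    shares = make_shares(secret=123456789, t=3, n=5)
    print(reconstruct(shares[:3]) == 123456789)   # any 3 of the 5 shares suffice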

Toshiyuki Miyamoto, Sadatoshi Kumagai
Deadline and Throughput-Aware Control for Request Processing Systems

Congestion Control is a necessary tool in Transaction Processing Systems (TPS), to avoid excessive degradation of response times. But it should have an autonomic behavior, adapting automatically to request characteristics to deliver the best possible service. Given that every request has either explicit or implicit deadlines (maximum acceptable response times), we analyze strategies that target request deadlines and throughput. These include maximum throughput seeker strategies, strategies using feedback control on miss rate and an additional proposal for preventive control of miss rates and throughput. Another goal of our proposal is for the control to be external and it should not rely on analytic models for control (because the predictions may be erroneous due to physical system issues). We analyze and compare alternative designs for the control including our own proposals and related ones. Our experiments consider both varied request inter-arrival and duration distributions and a real transaction processing benchmark.

Pedro Furtado, Ricardo Antunes
Cluster Recovery for Fault Tolerance of Spatial Database Cluster in Sensor Networks

In sensor networks, the huge amount of data collected by millions of sensors and small mobile devices needs to be processed fast, and the database that stores those data should always be able to respond to any request. A spatial database cluster provides high performance and high availability, which suits sensor networks because it can efficiently manage and process large amounts of data. The previous system, however, must write external logs for every transaction to provide high availability on all nodes, so all update transactions are slowed down by external logging. Also, the recovery time of a failed node is increased because the external logs for the whole database are written to a single storage. In this paper, we propose a cluster recovery method for spatial database clusters. The proposed method keeps cluster logs per table unit for consistency among nodes, and a cluster log is written only when some node has failed. Therefore, the proposed method processes updates faster, because update transactions do not always write cluster logs, and it provides fast recovery because each table can be recovered concurrently.

Byeong-Seob You, Gyung-Bae Kim, Hae-Young Bae

Fault Tolerance and Security

A Secure Energy-Efficient Routing Protocol for WSN

The intent of this paper is to propose an energy-efficient routing protocol with data transmission security for wireless sensor networks. We create an energy and distance aware sink-rooted tree in the network which is used for secure data transmissions from the source sensors to the base station. We mainly focus on ensuring authenticity and confidentiality of the sensor reports by adopting one-way hash chain and pre-loaded shared secret keys. To achieve data freshness, there is an optional key refreshment mechanism in our protocol. Along with the detailed description of our protocol, we present an analysis and performance evaluation of our proposal.

Al-Sakib Khan Pathan, Choong Seon Hong
Designing Scalable Self-healing Key Distribution Schemes with Revocation Capability

Self-healing key distribution is a potential candidate to establish session keys for secure communication to large and dynamic groups in highly mobile, volatile and hostile wireless networks, where frequent membership changes may be necessary and the ability to revoke users during certain exchanges is desirable. The main property of a self-healing key distribution scheme is that even if some broadcast messages are lost during a certain session due to network faults, the users are capable of recovering lost session keys on their own, without requesting additional transmission from the group manager. In this paper, we propose a scalable self-healing key distribution with t-revocation capability. Our proposed scheme improves on the storage overhead of previous approaches while keeping the same communication cost as the most optimal previous scheme. The scheme is supported by a proper security analysis in an appropriate security model. We prove that it is unconditionally secure and achieves both forward secrecy and backward secrecy. Our proposed self-healing key distribution is not restricted to m sessions in the Setup phase. Besides, we develop a construction for self-healing key distribution that enables key recovery from a single broadcast message.

Ratna Dutta, Sourav Mukhopadhyay
Key Predistribution Using Partially Balanced Designs in Wireless Sensor Networks

We propose two deterministic key predistribution schemes for a Wireless Sensor Network (WSN) in which sensor nodes are deployed randomly. Both schemes are based on Partially Balanced Incomplete Block Designs (PBIBD). An important feature of our schemes is that every pair of nodes can communicate directly, making communication faster and more efficient. The number of keys per node is of the order of √N, where N is the number of nodes in the network. Our second design has the added advantage that we can introduce new nodes into the network while keeping the key pool fixed. We study the resiliency of the network under node compromise and show that our designs give better results than existing ones.

Sushmita Ruj, Bimal Roy
An Efficient ID-Based Authenticated Key Agreement Protocol with Pairings

In 2003, Shim proposed an efficient ID-based authenticated key agreement protocol based on Weil pairings [1]. Sun et al. pointed out the potential for a man-in-the-middle attack in [2]. In 2004, Ryu et al. proposed an efficient ID-based authenticated key agreement protocol from pairings [3]. In 2005, however, Boyd et al. noted security problems in Ryu et al.’s protocol in [4]. In 2005, Yuan et al. also pointed out the same weakness [5] in Ryu et al.’s protocol; they then proposed a new protocol that combines Ryu et al.’s protocol with Shim’s protocol. In this paper, we demonstrate that Shim’s protocol does not provide KGC forward secrecy, and then we propose a more efficient and secure protocol that does provide such security. As a result, unlike Yuan et al.’s protocol, our protocol does not need an additional ECC point addition, and ours can generate two secure session keys to perform secure message transmission.

Jai-Boo Oh, Eun-Jun Yoon, Kee-Young Yoo
Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems

Self-managing systems require continuous monitoring to ensure correct operation. Detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected, and the monitoring level is increased if a problem is suspected. To enable such an approach, we must model the system, both at a minimal level to ensure correct operation, and at a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and identify problematic areas.

We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 different fault scenarios of three general types, and show that we are able to detect the presence of faults in 80% of cases, where all but one instance of non-detection is attributable to a single fault type. We are able to shortlist the faulty component in 65% of cases where anomalies are observed.

Mohammad Ahmad Munawar, Paul A. S. Ward
Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology

The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. The logical network topologies must also support a high degree of scalability in dynamic environments. This paper presents a scalable and fault-tolerant topology called the binomial graph (BMG). BMG provides desirable topological properties in terms of both scalability and fault-tolerance for high performance computing, such as reasonable degree, regularity, low diameter, symmetry, low cost factor, low message traffic density, optimal connectivity, low fault-diameter and strong resilience. Several fault-tolerant routing algorithms are provided on BMG for various message types. More importantly, BMG is able to deliver broadcast messages from any node within log2(n) steps.
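
As we understand the construction (an assumption on our part; the paper's exact definition may differ in details), node i of an n-node binomial graph is linked to the nodes i ± 1, i ± 2, i ± 4, ... (mod n), which is what yields logarithmic-depth broadcast. A small sketch that builds such adjacency lists and measures broadcast depth by BFS:

    import math
    from collections import deque

    # Binomial-graph-style adjacency (assumed here: links to nodes +/- 2^k away, mod n)
    # and a BFS measuring how many steps a broadcast from node 0 needs.

    def bmg_neighbors(i, n):
        nbrs = set()
        k = 1
        while k < n:
            nbrs.add((i + k) % n)
            nbrs.add((i - k) % n)
            k *= 2
        nbrs.discard(i)
        return nbrs

    def broadcast_steps(n, root=0):
        dist = {root: 0}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for u in bmg_neighbors(v, n):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        return max(dist.values())

    n = 40
    print(broadcast_steps(n), math.ceil(math.log2(n)))   # BFS depth vs. ceil(log2 n)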

Thara Angskun, George Bosilca, Jack Dongarra
Eventually Perfect Failure Detectors Using ADD Channels

We present a novel implementation of the eventually perfect failure detector (◇P) from the original hierarchy of Chandra-Toueg oracles. Previous implementations of ◇P have assumed models of partial synchrony where point-to-point message delay is bounded and/or communication is reliable. We show how to implement this important oracle under even weaker assumptions using Average Delayed/Dropped (ADD) channels. Briefly, all messages sent on an ADD channel are privileged or non-privileged. All non-privileged messages can be arbitrarily delayed or even dropped. For each run, however, there exists an unknown window size w and two unknown upper bounds d and r, where d bounds the average delay of the last w privileged messages, and r bounds the ratio of non-privileged messages to privileged messages per window.
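
For intuition, the classical way to implement ◇P under partial synchrony is a heartbeat scheme with an adaptive timeout: suspect a process when its heartbeat is late, and enlarge that process's timeout whenever a suspicion turns out to be false, so that eventually no correct process is suspected. The sketch below shows that standard pattern only; it is not the ADD-channel construction of the paper, and the process names and timing values are invented for the example.

    # Standard heartbeat-with-adaptive-timeout pattern behind eventually perfect
    # failure detectors (illustrative; NOT the ADD-channel algorithm of the paper).

    class EventuallyPerfectFD:
        def __init__(self, processes, initial_timeout=1.0):
            self.timeout = {p: initial_timeout for p in processes}
            self.deadline = {p: initial_timeout for p in processes}
            self.suspected = set()

        def on_heartbeat(self, p, now):
            if p in self.suspected:
                self.suspected.discard(p)      # false suspicion detected ...
                self.timeout[p] *= 2           # ... so be more patient with p from now on
            self.deadline[p] = now + self.timeout[p]

        def on_tick(self, now):
            for p, dl in self.deadline.items():
                if now > dl:
                    self.suspected.add(p)      # heartbeat is late: suspect p
            return set(self.suspected)

    fd = EventuallyPerfectFD(["p1", "p2"])
    fd.on_heartbeat("p1", now=0.0)
    fd.on_heartbeat("p2", now=0.0)
    print(fd.on_tick(now=1.5))                 # both heartbeats late: {'p1', 'p2'}
    fd.on_heartbeat("p1", now=1.6)             # p1 was alive after all: its timeout doubles
    print(fd.on_tick(now=2.0))                 # only 'p2' remains suspected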

Srikanth Sastry, Scott M. Pike
Stochastic Communication Delay Analysis of Adaptive Wormhole-Switched Routings in Tori with Faults

This paper proposes a novel analytical modeling approach to investigate the performance of five prominent adaptive routings in wormhole-switched 2-D tori fortified with an effective scheme suggested by Chalasani and Boppana [1], as an instance of a fault-tolerant method. This scheme has been widely used in the literature to achieve high adaptivity and support inter-processor communications in parallel computers due to its ability to preserve both communication performance and fault-tolerant demands in such networks. Analytical results of the model are confirmed by comparing with those obtained through simulation experiments.

Farshad Safaei, Mahmood Fathy, Ahmad Khonsari, Mohamed Ould-Khaoua
An Efficient Fault-Tolerant Routing Methodology for Fat-Tree Interconnection Networks

In large cluster-based machines, fault-tolerance in the interconnection network is an issue of growing importance, since their increasing size raises the probability of failure. The topology used in these machines is usually a fat-tree. This paper proposes a new distributed fault-tolerant routing methodology for fat-trees. It does not require additional network hardware. It is scalable, since the required memory, switch hardware and routing delay do not depend on the network size. The methodology is based on enhancing the Interval Routing scheme with exclusion intervals. Exclusion intervals are associated with each switch output port and represent the set of nodes that are unreachable from this port after a failure appears. We propose a mechanism to identify the exclusion intervals that must be updated after detecting a failure, and the values to write to them. Our methodology is able to support a relatively high number of network failures with low degradation in network performance.

Crispín Gómez, María E. Gómez, Pedro López, José Duato
On the Optimality of Rollback-Recovery Protocol Preserving Session Guarantees

The rVsAll rollback-recovery protocol ensures that consistency models, called session guarantees, are provided for mobile clients and unreliable servers. The protocol is optimized according to the required consistency model by taking into account properties of session guarantees and integrating their consistency management with known rollback-recovery techniques: message logging and checkpointing. We have proved the correctness of the rVsAll protocol in terms of safety and liveness properties. These properties assert that clients access object replicas maintained by servers according to the required session guarantee regardless of server failures, and state that each access operation issued by a client will eventually be performed (in a finite time), even in the presence of server failures.

In this paper, we show that the proposed protocol is also optimal, in the sense that the consistent global checkpoint taken in the rVsAll protocol contains the minimal number of operations indispensable to fulfill the required session guarantees after a server’s failure and recovery. The paper presents the proof of the optimality property of the proposed protocol.

Jerzy Brzeziński, Anna Kobusińska, Jacek Kobusiński

Middleware and Cooperative Computing

A Replication Software Architecture (RSA) for Supporting Irregular Applications on Wide-Area Distributed Computing Environments

In distributed computing environments, many large-scale scientific applications are irregular applications that perform their computation and I/O on an irregularly discretized mesh. However, most of the previous work in the area of irregular applications focuses mainly on local environments. In distributed computing environments, since many remotely located scientists must share data to produce useful results, providing a consistent data replication mechanism that minimizes remote data access time is a critical issue in achieving high-performance bandwidth. We have developed a replication software architecture (RSA) that enables geographically distributed scientists to easily replicate irregular computations with minimal overhead, while safely sharing large-scale data sets to produce useful results. Since RSA uses database support to store the data-related and computation-related metadata, it can easily be ported to any computing environment. In this paper, we describe the design and implementation of RSA for irregular applications and present performance results on Linux clusters.

Jaechun No, Chang Won Park, Sung Soon Park
Cooperative Grid Jobs Scheduling with Multi-objective Genetic Algorithm

Job scheduling on computational grids is a key problem in large-scale grid-based applications for solving complex problems. The aim is to obtain an efficient scheduler able to allocate dependent jobs originating from large-scale applications onto hierarchy-based grid computing platforms with heterogeneous resources. To satisfy multiple objectives at different levels, which is NP-hard in most formulations, a set of cooperative multi-objective genetic algorithms (MOGA) is presented. Using this scheme, the application level generates multiple local schedules based on local nodes and objectives into a schedule pool, from which the system level can assemble a set of global schedules according to global objectives. The MOGA scheduling scheme is shown to perform well on the experimental scenario, which demonstrates its flexibility and possible application to more complex job scheduling scenarios with multiple and diverse tasks and nodes.

Bin Zeng, Jun Wei, Wei Wang, Pu Wang
A Pro-middleware for Grids Computing

Several works on grid computing have been proposed during the last few years. However, most of them, including available software, cannot deal properly with some issues related to the abstraction and user-friendliness of the grid. A much wider variety of applications and a larger community of users could benefit once grid computing technologies become easier to use and more sophisticated. This work concentrates on presenting ‘Users-Grid’ (a pro-middleware), which sits between the grid middleware and user applications. It introduces a high-level abstraction layer and hides the intricacies of the grid middleware. It attempts to make the grid-related aspects transparent to the end users. Its main features are automatic DAG inference and seamless job submission. It enables end users to run their applications directly on the grid by themselves, without depending on the expertise of support teams.

Raihan Ur Rasool, Qingping Guo
On Formal MOM Modeling

Distributed applications are usually concurrent and nondeterministic. For this reason, formal verification of their design specifications is an essential technique for gaining confidence in the correctness of the behavioral aspects of a design before moving to the coding stage. Message-Oriented Middleware (MOM) is widely used to simplify the task of interprocess communication in distributed applications. To model MOM-based applications for verification purposes, the services provided by the MOM must also be integrated into the models. However, MOM modeling is non-trivial. While providing high-level program interfaces that shield programmers from the complexity of the underlying operating systems and networks, MOM may also conceal under such interfaces the concurrency and nondeterminism present in the underlying networks. This increases the possibility of misinterpreting the behavior of the applications, which in turn causes design errors. An over-abstracted MOM model based on its Application Programming Interface may bury such design errors, while an over-detailed model may consume too much resource and render the verification infeasible. As a guideline for MOM modeling, we present several formal models of various behavioral aspects of MOM in terms of Promela, the specification language used in the SPIN model checker. Based on our empirical study, we also discuss the impact of incorporating these formal models, in different settings, into MOM-based application models, in terms of the increased state space for model checking.

Hanmei Cui, Jessica Chen
Performability Analysis of Grid Architecture Via Queueing Networks

One of the major challenges for grid technologies is to create the scientific and technological base for sharing and collaboration in large-scale distributed systems. Theories and models of grid architectures are important to this endeavor, as well as to providing the foundations for constructing grid systems able to work effectively. It is important to model and analyze the grid architecture so that it can evolve guided by scientific principles. On the basis of a coarse-grain classification of grid applications, we present a novel grid architecture taxonomy: interaction-intensive and computation-intensive architectures. In this paper, we give some new grid performance metrics and model grid architectures mathematically via queueing networks. In addition, we obtain some scientific principles guiding grid architecture design.

Haijun Yang, Minqiang Li, Qinghua Zheng
An Effective Approach Based on Rough Set and Topic Cluster to Build Peer Communities

A peer community is composed of a number of peers who share files about the same topic in file sharing P2P applications. Building peer communities can benefit content location and retrieval in P2P systems. We propose an effective approach based on rough sets and topic clusters to build peer communities. First, we compute one of the best reduced sets of all files of the same type in a peer, such as its video files, using the files’ attributes. Second, the topic clusters of a peer, which represent its interests, are calculated. Finally, we build peer communities using the super-peer technique. Experiments performed on real data sets show that our approach is effective and that it works much better than previous approaches.

Quanqing Xu, Zhihuan Qiu, Yafei Dai, Xiaoming Li
Evaluation on the UbiMDR Framework

This paper describes a performance evaluation to validate the superiority of the UbiMDR framework. In ubiquitous applications, semantic interoperability is one of the most important issues for maximizing the usability of sensors in sensor fields. However, existing frameworks are not suitable for the ubiquitous computing environment because of data heterogeneity between data elements. The MDR-based framework in ubiquitous computing provides semantic interoperability among ubiquitous applications or sensor fields. In addition, the UbiMDR framework incurs low costs for mapping or adding new elements or operations compared to the conventional framework.

Jeong-Dong Kim, Dongwon Jeong, Jinhyung Kim, Doo-Kwon Baik
Distributing Fixed Time Slices in Heterogeneous Networks of Workstations (NOWs)

Heterogeneous Networks of Workstations (NOWs) offer a cost-effective solution for parallel processing. The completion time of a parallel task over a NOW depends on how the task is divided and distributed among the heterogeneous workstations. In this paper we present a distribution scheme which attempts to minimize the task's completion time over a heterogeneous NOW. The scheme is based on the idea of distributing fixed time slices of work as opposed to fixed work slices. Our simulations show that the proposed scheme outperforms both the fixed and the variable work distribution schemes commonly in use. The scheme is very simple and requires no active monitoring of the network. Furthermore, it is adaptive and copes very well with changes in background loads on workstations and with network interference.
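A minimal sketch of the fixed-time-slice idea (the speeds, slice length and dispatch step below are hypothetical; this is not the authors' simulator): each workstation is handed an amount of work proportional to its most recently observed speed, so that every slice takes roughly the same wall-clock time despite heterogeneity.

    # Illustrative sketch only: distribute work in fixed time slices, giving each
    # workstation work proportional to its most recently observed speed so that
    # all workstations finish a slice at roughly the same time.
    def next_slice_shares(observed_speeds, slice_seconds):
        """observed_speeds: work units per second measured in the previous slice."""
        return [speed * slice_seconds for speed in observed_speeds]

    remaining = 10_000.0           # hypothetical total work units
    speeds = [50.0, 120.0, 80.0]   # hypothetical per-workstation speeds
    while remaining > 0:
        shares = next_slice_shares(speeds, slice_seconds=2.0)
        issued = min(sum(shares), remaining)
        # ...dispatch the shares (scaled to 'issued'), then update 'speeds'
        # from the observed completion rates before the next slice...
        remaining -= issued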

Yassir Nawaz, Guang Gong
A Grid Resources Valuation Model Using Fuzzy Real Option

In this study, we model the pricing of grid/distributed computing resources as a problem of real option pricing. Grid resources are non-storable compute commodities (e.g., CPU cycles, memory). This non-storable characteristic prevents grid resources from fitting into a risk-adjusted spot price model for pricing financial options. Grid resource users pay upfront to acquire and use grid compute cycles in the future, for example six months ahead. The user expects a high and acceptable degree of satisfaction, expressed as a Quality of Service (QoS) assurance. This requirement further imposes service constraints on the grid, because it must provide a user-acceptable QoS guarantee to compensate for the upfront payment. This study integrates three threads of our research: pricing grid compute cycles as a problem of real option pricing, modeling the grid resource spot price using a discrete-time approach, and addressing uncertainty constraints in the provision of QoS using fuzzy logic. We demonstrate the feasibility of this model through experiments and present and discuss some of our pricing results.
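To make the discrete-time pricing thread concrete, the following sketch values an option on a hypothetical grid-resource spot price with a standard Cox-Ross-Rubinstein binomial lattice; all parameter values are illustrative, and the paper's fuzzy-logic treatment of QoS uncertainty is not reproduced here.

    import math

    # Standard Cox-Ross-Rubinstein binomial lattice (illustrative parameters):
    # backward induction over a recombining tree of spot-price moves.
    def binomial_call(spot, strike, rate, sigma, maturity, steps):
        dt = maturity / steps
        u = math.exp(sigma * math.sqrt(dt))       # up-move factor
        d = 1.0 / u                               # down-move factor
        p = (math.exp(rate * dt) - d) / (u - d)   # risk-neutral up probability
        values = [max(spot * u**j * d**(steps - j) - strike, 0.0)
                  for j in range(steps + 1)]      # terminal payoffs
        for _ in range(steps):
            values = [math.exp(-rate * dt) * (p * values[j + 1] + (1 - p) * values[j])
                      for j in range(len(values) - 1)]
        return values[0]

    print(binomial_call(spot=1.0, strike=1.0, rate=0.05, sigma=0.3,
                        maturity=0.5, steps=64))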

David Allenotor, Ruppa K. Thulasiram
Enhancing Data Replication with Greedy Pipeline-Based Aggressive Copy Protocol in Data Grids

Grid systems are a well-known solution for obtaining high-performance computing or storing large amounts of data with inexpensive devices. In most cases, grids can be categorized into two types: computational grids and data grids. Data grids are used for data-intensive applications. In data grids, replication is used to reduce access latency and bandwidth consumption; it can also improve data availability, load balancing and fault tolerance. When there are many replicas, coherence problems may arise while they are being updated. In this paper, based on the aggressive-copy method, we propose a novel Greedy Pipeline-based Aggressive Copy (GPAC) protocol. The pipelining of dataset blocks and the greedy sequencing in GPAC accelerate data replication compared with previous works. Both analytical and experimental results show promising performance enhancements.
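As a rough illustration of why pipelining blocks helps (a sketch under our own assumption of unit per-hop, per-block transfer time, not the GPAC protocol itself), pushing a dataset through a chain of replica sites block by block lets site i forward block k while it is still receiving block k+1, so the chain completes in (sites-1)+(blocks-1) steps instead of (sites-1)*blocks.

    # Hypothetical store-and-forward pipeline over a chain of replica sites:
    # with unit transfer times, block k reaches site i at step i + k.
    def pipeline_schedule(num_blocks, num_sites):
        return sorted((i + k, i, k)
                      for k in range(num_blocks)
                      for i in range(1, num_sites))

    for step, site, block in pipeline_schedule(num_blocks=4, num_sites=4):
        print(f"step {step}: site {site} receives block {block}")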

Reen-Cheng Wang, Su-Ling Wu, Ruay-Shiung Chang
A Performance Comparison of the Contiguous Allocation Strategies in 3D Mesh Connected Multicomputers

The performance of contiguous allocation strategies can be significantly affected by the distribution of job execution times. In this paper, the performance of existing contiguous allocation strategies for 3D mesh multicomputers is revisited in the context of heavy-tailed distributions (e.g., a Bounded Pareto distribution). The strategies are evaluated and compared using simulation experiments for both First-Come-First-Served (FCFS) and Shortest-Service-Demand (SSD) scheduling under a variety of system loads and system sizes. The results show that the performance of the allocation strategies degrades considerably when job execution times follow a heavy-tailed distribution. Moreover, SSD copes much better than the FCFS scheduling strategy in the presence of heavy-tailed job execution times. The results also show that the strategies that depend on a list of allocated sub-meshes for both allocation and deallocation have lower allocation overhead and deliver good system performance in terms of average turnaround time and mean system utilization.
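For reference, a Bounded Pareto distribution with shape \alpha on the interval [k, p] (notation ours, used here only to make the heavy-tailed workload model concrete) has density

    f(x) = \frac{\alpha k^{\alpha} x^{-\alpha - 1}}{1 - (k/p)^{\alpha}}, \qquad k \le x \le p,

so that small shape values put substantial probability mass on very long job execution times while still bounding the largest job.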

Saad Bani-Mohammad, Mohamed Ould-Khaoua, Ismail Ababneh, Lewis Mackenzie
An Enhanced Approach for PDA and Cellular Clients to Submit and Monitor Applications in the Mobile Grid

Some challenges in mobile grid environments are not handled by related works, for example adapting to the heterogeneous interfaces of different mobile devices (PDAs and cellular phones) for the submission and monitoring of applications in grid environments. This article presents an approach that employs the workflow concept to provide automated and adapted features for executing applications in mobile grid configurations. The approach, coined SuMMIT, consumes less PDA battery energy and submits and monitors applications with more agility than some related works. In addition, the SuMMIT environment provides an execution-flow adjustment in case of disconnection, matching the requirements of the submitted application and the options defined by the user. SuMMIT is also adapted and optimized for the limitations and problems found in different devices.

Vinicius C. M. Borges, Anubis G. M. Rossetto, Frank J. Knaesel, Mario A. R. Dantas
GiPS: A Grid Portal for Executing Java Applications on Globus-Based Grids

Grids are becoming the platform of choice for high performance computing. Although grids present a unified view of resources, they need evolved user interfaces in order to fully take advantage of their potential for real applications. Grid portals can deliver complex grid solutions to users without the need to download or install specialized software, or to worry about setting up networks, firewalls, and port policies. Due to the powerful, general-purpose nature of grid technology, and the nature of the grid resources they expose, the security of portals or points of access to such resources must be carefully considered. In this paper we present GiPS, the suma/g Grid Portal, a user-specific portal which allows Java applications to access grid resources for execution. We describe how the portal exploits standard, off-the-shelf commodity software together with existing grid infrastructures in order to facilitate security and data access. The main technologies used by GiPS are GSI, MyProxy, Java CoG Kit, GridSphere, and the suma/g middleware. In suma/g, Java classes and files are loaded on demand from the user's machines. We describe how the suma/g Grid Portal supports this execution model and how it allows users to access controlled external data servers (users' local file systems and file systems accessible from their local workstations) under a secure platform.

Yudith Cardinale, Carlos Figueira
Advanced Grid DataBase Management with the GRelC Data Access Service

Many data grid applications manage and process huge datasets distributed across multiple grid nodes and stored in heterogeneous databases; e-Science projects need to access widespread databases within a computational grid environment through a set of secure, interoperable and efficient data grid services. In the data grid management area, several projects aim at addressing these issues, providing different solutions and proposing sets of data access and integration/federation services.

In this paper we present GRelC Data Access, a WS-I based data grid access service developed by the SPACI Consortium and the University of Salento. Its architectural design is discussed, and experimental results from a European testbed are also reported.

Sandro Fiore, Alessandro Negro, Salvatore Vadacca, Massimo Cafaro, Maria Mirto, Giovanni Aloisio
A Generic Distributed Monitor Construct for Programming Process Synchronization in Distributed Systems

The monitor construct has been implemented in several concurrent and/or parallel programming languages for shared-memory environments. Extensions of the monitor to support process synchronization in distributed systems have also been proposed, but most existing work only provides the architectural design of a distributed monitor: there is no discussion of the algorithmic and implementation issues, and none of them consider how to implement condition variables. In this paper, we present the design and implementation of a distributed monitor construct, named DisMoniC, for programming process synchronization in distributed systems. DisMoniC is generic in the sense that it can be used with any distributed mutual exclusion (DME) algorithm to implement exclusive access to the monitor operations. Time-efficient algorithms are proposed to implement condition-based process synchronization in the distributed monitor. We also present a performance evaluation of the proposed construct.
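For readers unfamiliar with the construct, the sketch below shows the shared-memory monitor pattern (mutual exclusion plus condition variables) that DisMoniC generalizes; the bounded-buffer example and all names are ours, and in the distributed setting the lock would be replaced by a DME algorithm and message-based condition queues.

    import threading

    # A shared-memory monitor analogue (bounded buffer): one lock for mutual
    # exclusion plus condition variables for condition synchronization.
    class BoundedBuffer:
        def __init__(self, capacity):
            self.items, self.capacity = [], capacity
            self.lock = threading.Lock()                 # monitor entry
            self.not_full = threading.Condition(self.lock)
            self.not_empty = threading.Condition(self.lock)

        def put(self, item):
            with self.lock:
                while len(self.items) >= self.capacity:
                    self.not_full.wait()                 # wait on condition
                self.items.append(item)
                self.not_empty.notify()

        def get(self):
            with self.lock:
                while not self.items:
                    self.not_empty.wait()
                item = self.items.pop(0)
                self.not_full.notify()
                return item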

Jiannong Cao, Miaomiao Wang, Weigang Wu, Xianbing Wang, Stephen C. F. Chan

Networks

Low Latency Vertical Handover Using MIH L2-Trigger Algorithm in Mobile IP Networks

Recently, the information and communication market has been reorganized around digital convergence services that integrate communication, broadcasting and Internet media by connecting communication equipment, broadcasting equipment and computers via a network. Mobility support protocols that enable effective handover between heterogeneous networks have emerged as a key issue for these next-generation communication networks. In this paper, we propose the MIH vertical handover procedure required to implement MIH over MIPv4 low-latency vertical handover, by analyzing handover performance in the overlap area between wireless networks using the vertical handover function of the IETF MIPv4 Mobile IP technology and considering the expected problems. We also propose an MIH L2-trigger generation algorithm for deciding the optimal handoff. We evaluate MIPv4 low-latency vertical handover performance in the overlap area between wireless networks using NS-2, applying IEEE 802.21 MIH (Media Independent Handover) with the suggested technique and algorithm.

Jin-Man Kim, Jong-Wook Jang
SPACC: A Simple Positioning and Coverage Control Solution for Wireless Sensor Networks

One category of wireless sensor networks consists of many autonomous sensor nodes with limited power and a few base stations with theoretically unlimited power. A number of redundant nodes are usually deployed densely in these networks in order to provide redundancy for sensing and communication. The challenge, though, is to decide which nodes must be active and which must sleep without compromising coverage and network connectivity. To address this challenge, each node should somehow know the positions of its immediate neighbors. Previous research has impractically assumed the existence of a GPS module in each node, which is in direct contradiction with the main constraints of low cost and small size of sensor nodes. This paper proposes an energy-saving solution that does not require the nodes to possess any physical GPS. The goal is to minimize the number of active sensors subject to coverage and connectivity. Each node decides locally by itself whether to be active or not; there is no need for any global synchronization between nodes. Simulation results show that a higher density of nodes in our proposed solution leads to better coverage, higher energy savings and longer network lifetime.

Mohsen Sharifi, Ehsan Farzad
Research of Routing Algorithm in Hierarchy-Adaptive P2P Systems

Recently, superpeers have been introduced to improve the performance of P2P systems. A superpeer is a node in a P2P system that operates as a server for a set of clients. By exploiting heterogeneity, the superpeer paradigm allows P2P systems to run more efficiently. This paper proposes a hierarchy-adaptive P2P topology, DAHP2P, and a hierarchical routing algorithm, Hroute. Peers are grouped into clusters according to proximity, and superpeers form the upper-level overlay; the number of hierarchy levels adapts to the number of nodes in the system, and the hierarchical routing algorithm is designed to reduce routing hops. Simulation results show that Hroute can significantly reduce the expected number of hops and the latency of message routing, and that the loads on peers at different layers are relatively well balanced.

Xiao-Ming Zhang, Yi-Jie Wang, ZhouJun Li
Bandwidth Degradation Policy for Adaptive Multimedia Services in Mobile Cellular Networks

One of the key challenges in the optimal control of adaptive multimedia service transmission in mobile cellular networks is to balance user satisfaction and fairness among all users, while at the same time ensuring that the scarce bandwidth is utilized efficiently. We propose a novel Bandwidth Degradation Policy (BDP) for adaptive multimedia services which resolves the conflict between user satisfaction and fairness. The BDP employs a specific degradation policy to accommodate more users and a compensation mechanism to maintain fairness among all users, while increasing user satisfaction and overall network revenue. Specifically, we introduce four new QoS metrics to evaluate the performance of adaptive multimedia services in mobile cellular networks. Finally, we compare our BDP with other strategies in the literature through both analytical and simulation results. The presented extensive simulation results, which account for mobility, validate our analysis.

Yide Zhang, Lemin Li, Gang Feng
On the System Performance vs. User Movement with Systematic Simulation in Mobile Cellular Networks

We demonstrate in this paper that user movement has a great influence on the system performance of mobile cellular networks. We also develop a system model, focused on user movement simulation, to study the extent of this influence. We consider the parameters that determine the system performance of mobile cellular networks and show that these parameters vary as user movement changes, and that their effects interact. Since handoff calls are given higher priority than new calls and a GC mechanism is employed, our analysis shows that a mature mobile cellular network should exploit user movement characteristics and dynamically adjust its reservation threshold for handoff call requests. Extensive simulation results validate our analysis.

Yide Zhang, Lemin Li, Gang Feng
Channel Assignment and Spatial Reuse Scheduling to Improve Throughput and Enhance Fairness in Wireless Mesh Networks

In wireless mesh networks, equipping mesh routers with multiple radios tuned to orthogonal channels can alleviate the throughput problem. Efficient channel assignment and link scheduling are essential for throughput improvement, and effective channel assignment schemes can greatly relieve the interference between nearby transmissions. However, not only can links using different channels be activated at the same time: links on the same channel can also be activated concurrently if the SNIR (signal-to-noise and interference ratio) at their receiver endpoints is not below the threshold. In this paper, we investigate how to schedule a set of feasible transmissions under the physical interference model using a spatial TDMA access scheme together with channel assignment in wireless mesh networks. We also consider fairness enhancement, to prevent border nodes of the network from starvation. Using a minimum spanning tree constructed from the original network graph as a subgraph, we propose centralized algorithms for scheduling and channel assignment that maximize the aggregate throughput and provide fairness in the network. We evaluate the throughput improvement and fairness enhancement of our algorithms through extensive simulations, and the results show that our algorithms achieve significant aggregate throughput and fairness performance.
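In the physical interference model referred to above, a set of simultaneous transmissions on one channel is feasible when every scheduled link (u, v) satisfies (notation ours)

    \frac{P_u \, G_{uv}}{N_0 + \sum_{w \ne u} P_w \, G_{wv}} \ge \beta,

where P_u is the transmit power of sender u, G_{uv} the channel gain from u to receiver v, N_0 the noise power, the sum runs over the other concurrently scheduled senders on that channel, and \beta is the SNIR threshold.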

Nguyen H. Tran, Choong Seon Hong
Effects of Mobility on Membership Estimation and Routing Services in Ad Hoc Networks

The use of MANETs (Mobile Ad hoc NETworks) is becoming very popular. Power efficiency is a key issue in this type of network, as mobile devices usually rely on limited power supplies. One essential service, the routing protocol, employed to discover routes between nodes in the network, can greatly affect power consumption.

Furthermore, many distributed applications require an additional membership service to keep track of the nodes that make up the system at any moment. In general, this information is not provided by routing services with the exception of the Optimised Link State Routing protocol (OLSR).

The two services, routing and membership estimation, form a basic support for building other higher-level distributed services on ad hoc networks. To decrease the overall power consumption, these services should be optimized for the intended use of the network. In particular, the degree of mobility can have an impact on the power consumption and performance of different approaches to routing and membership estimation.

In this paper we present a study of two different approaches that combine a routing service with membership estimation. We compare the proactive OLSR with our own approach. Our approach consists of integrating a gossip-style failure detector with the reactive Dynamic Source Routing protocol (DSR).

We present an analysis of the effects of mobility on the global performance and power consumption of the two approaches. We identify scenarios for which each approach is best suited.

Juan Carlos García, Mari-Carmen Bañuls, Stefan Beyer, Pablo Galdámez
Hamiltonicity and Pancyclicity of Binary Recursive Networks

By analyzing and generalizing the hypercube and those of its variations that share the same topological properties and network parameters, this paper introduces a family of interconnection networks referred to as binary recursive networks. This kind of network not only provides a powerful way to investigate the hypercube and its variations as a whole, but also puts forth an effective tool for exploring new network structures. A constructive proof is presented to show that binary recursive networks are Hamiltonian, based on their recursive structures, and an approach to proving the 4-pancyclicity of a subfamily of binary recursive networks is outlined.

Yun Sun, Zhoujun Li, Deqiang Wang
Strategies for Traffic Grooming over Logical Topologies

In WDM mesh networks, low-speed data streams from individual users are combined, using the techniques of traffic grooming, for efficient utilization of the high bandwidth of a lightpath. The objective of traffic grooming is to minimize the cost of the network and/or to maximize the network throughput. Proposed solutions for this optimization problem are computationally intractable, even for networks of moderate size. In this paper, we present efficient Integer Linear Program (ILP) formulations for traffic grooming on mesh WDM networks, one minimizing congestion and the other maximizing network throughput, under the assumption that the logical topology is specified. Unlike previous formulations, ours can be used for practical-sized networks. We have simulated our formulations on networks of up to 30 nodes, with hundreds and even thousands of low-speed data streams, and have shown that the formulations generate optimal solutions within a reasonable amount of time.
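A congestion-minimization formulation of this general kind (a hedged sketch in our own notation, not the paper's exact ILP) routes each low-speed stream t, of rate b_t, over the given logical topology and minimizes the load on the most heavily used lightpath:

    \min \ \lambda_{\max} \quad \text{s.t.} \quad \sum_{t} b_t \, x_{t,\ell} \le \lambda_{\max} \ \ \forall \text{ lightpaths } \ell, \qquad x_{t,\ell} \in \{0,1\},

together with flow-conservation constraints that force the lightpaths chosen for each stream t to form a path from its source to its destination in the logical topology.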

Arunita Jaekel, Ataul Bari, Subir Bandyopadhyay
Implementing IPv4+4 Addressing Architecture with IPv4 LSRR Option for Seamless Peer-to-Peer (P2P) Communication

The IPv4 architecture is well entrenched with Network Address Translation (NAT) boxes, which cause well-known problems for Peer-to-Peer (P2P) applications. IPv6 would enable end-to-end connectivity when deployed, but the industry has been slow in transitioning to IPv6. IPv4+4 has been suggested as an alternative NAT-extended addressing architecture, where the idea is to assign 64-bit end-to-end globally unique addresses to nodes on private address realms by concatenating the 32-bit globally routable IPv4 address of the realm (border) gateway with the 32-bit private IPv4 addresses of the nodes. While the IPv4+4 addressing proposal is neat, existing IPv4+4 implementations require changes to all border gateways and end-hosts, which hinders its deployment. In this paper we show how the IPv4+4 addressing architecture can be implemented using a modified version of the standard IPv4 Loose Source Record Route (LSRR) option. Our proposal requires no changes to the existing IPv4 infrastructure (assuming all IPv4-compliant nodes implement LSRR as required by RFC 791), thus enabling seamless end-to-end communication for P2P applications. We demonstrate packet forwarding with the 64-bit IPv4+4 addresses, and illustrate how the widely used P2P voice-over-IP protocol, the Session Initiation Protocol, can make use of our proposal for seamless end-to-end communication.
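The 64-bit address formation described above can be sketched as follows (the addresses are examples and the helper is ours; the paper's LSRR-based forwarding is not shown): the gateway's global address occupies the high 32 bits and the node's private address the low 32 bits.

    import ipaddress

    # Concatenate the realm gateway's global IPv4 address (high 32 bits) with a
    # node's private IPv4 address (low 32 bits) into a 64-bit IPv4+4 address.
    def ipv4plus4(gateway_ip: str, private_ip: str) -> int:
        hi = int(ipaddress.IPv4Address(gateway_ip))
        lo = int(ipaddress.IPv4Address(private_ip))
        return (hi << 32) | lo

    addr = ipv4plus4("198.51.100.7", "192.168.1.20")   # example addresses
    print(hex(addr))  # high word routes to the border gateway, low word inside the realm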

Cihan Topal, Cuneyt Akinlar
Dynamic Handover Mechanism Using Mobile SCTP in Contention Based Wireless Network

Mobile SCTP (mSCTP) has been proposed to support transport-layer mobility management. In this paper, we propose a dynamic handover mechanism, named Contention-based mSCTP (C-mSCTP), for contention-based wireless networks. The handover strategy in C-mSCTP takes the wireless bandwidth and the contention probability into account to set the primary path before transmitting data. The simulation results show that C-mSCTP achieves much better performance than mSCTP in terms of transmission delay and throughput.

Lin-Huang Chang, Huan-Jie Lin, Ing-chau Chang
A Clustering-Based Channel Assignment Algorithm and Routing Metric for Multi-channel Wireless Mesh Networks

Multiple non-overlapping channels are available in IEEE 802.11 but are rarely used today in wireless multi-hop networks. A wireless mesh network is a special type of multi-hop ad hoc network and is envisioned to provide high capacity and large coverage. In this paper, we propose a 2-hop-clustering-based multi-interface, multi-channel network architecture and design a novel channel assignment algorithm and routing metric. Channel assignment is composed of inter-cluster static assignment and intra-cluster dynamic assignment. Since traditional routing metrics, such as hop count, may not perform well in multi-channel wireless networks, we propose the CDM routing metric, which combines hop count, channel diversity and channel switching capability. Simulation results show that our algorithms achieve up to 3.3 times higher end-to-end throughput.

Chao Liu, Zhongyi Liu, Yongqiang Liu, Huizhou Zhao, Tong Zhao, Wei Yan
A Hierarchical Care-of Prefix with BUT Scheme for Nested Mobile Networks

In this paper, we apply the hierarchical concept to the Care-of Prefix (CoP) scheme, yielding HCoP, and enhance HCoP with a novel Binding Update Tree (BUT) structure, yielding HCoP-B, for NEtwork MObility (NEMO) management of nested mobile networks. Compared with schemes such as Reverse Routing Header (RRH), Route Optimization using Tree Information Option (ROTIO) and HCoP in numerical performance evaluations, HCoP-B achieves the shortest handoff latency and significantly reduces the network bandwidth consumed by global binding update messages for route optimization (RO) of all correspondent nodes (CN) after the nested mobile network hands over to a new AR. Consequently, HCoP-B resolves the RO storm problem.

Ing-Chau Chang, Chia-Hao Chou, Lin-Huang Chang
Some Properties of WK-Recursive and Swapped Networks

The surface area, defined as the number of vertices at a given distance from a base vertex of a graph, is considered one of the most useful yet abstract combinatorial properties of a graph. The applicability of the surface area spans many problem spaces, such as those in parallel and distributed computing. These problems normally involve combinatorial analysis of the underlying graph structures (e.g., spanning tree construction, minimum broadcast algorithms, efficient VLSI layout, performance modeling). In this paper, we focus on the problem of finding the surface area of a class of popular graphs, namely the family of WK-recursive and swapped networks. These are attractive networks due to their useful recursive structures.
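In symbols (notation ours), the surface area of a graph G at distance i from a base vertex u is

    B_G(u, i) = \left| \{\, v \in V(G) : d_G(u, v) = i \,\} \right|,

i.e., the size of the sphere of radius i around u in the graph metric.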

Navid Imani, Hamid Sarbazi-Azad, Albert Y. Zomaya
Design and Analysis of Multicast Communication in Multidimensional Mesh Networks

This paper addresses the issue of multicast communication in scalable interconnection networks using path-based schemes. Most existing multicast algorithms either assume a fixed network size or low-dimensional networks, or consider latency only at the network level. As a consequence, most of these algorithms implement multicast in a sequential manner and cannot scale well with the network dimensions or the number of nodes involved. Furthermore, most of these algorithms handle multicast communication with low throughput. In this paper, we propose a multicast algorithm for multidimensional interconnection networks, built upon our Qualified Groups (QG) multicast scheme, which ensures efficient communication irrespective of the network size/dimension or the number of destination nodes. Unlike existing works, this study considers scalability and latency at both the network and node levels so as to achieve a high degree of parallelism. Our results show that the proposed algorithm considerably improves the multicast message delivery ratio, throughput and scalability.

Ahmed Al-Dubai, Mohamed Ould-Khaoua, Imed Romdhani
Zone Based Data Aggregation Scheduling Scheme for Maximizing Network Lifetime

A wireless sensor network consists of many micro-sensor nodes distributed throughout an area of interest. Each node has a limited energy supply and generates information that needs to be communicated to a sink node. The basic operation in such a network is the systematic gathering and transmission of sensed data to a base station for further processing. During data gathering, sensors have the ability to perform in-network aggregation (fusion) of data packets en route to the base station. The lifetime of such a sensor system can be defined as the time during which sensor information is gathered from all of the sensors and combined at the base station. Given the locations of the sensors and the base station and the available energy at each sensor, the main interest is to find an efficient manner in which data can be collected from the sensors and transmitted to the base station so as to maximize the system lifetime. We propose a zone-based data aggregation scheduling scheme for maximizing network lifetime. The experimental results demonstrate that the proposed protocol significantly outperforms other data aggregation protocols in terms of energy saving and system lifetime.

Sangbin Lee, Kyuho Han, Kyungsoo Lim, Jinwook Lee, Sunshin An
A Robust Scalable Cluster-Based Multi-hop Routing Protocol for Wireless Sensor Networks

Wireless sensor networks are widely deployed for a wide range of applications, such as gathering information about the environment and monitoring huge buildings. However, the limited energy of the sensor nodes requires efficient gathering of information so that the network lifetime is increased. It has been shown in the literature that this efficiency can be achieved by clustering the sensor nodes in the network. Previously, we proposed a single-hop genetic-based clustering protocol (GCA) for sensor networks. However, multi-hop routing techniques are known to be a practical approach to the routing problem in sensor networks. In this paper, we present a new robust clustering-based multi-hop routing protocol for wireless sensor networks. The proposed genetic clustering algorithm (M-GCA) employs evolutionary techniques to form an efficient virtual backbone, which is used to support routing messages to the base station. Simulation results show that the proposed multi-hop routing protocol outperforms GCA, a single-hop clustering protocol, in several scenarios.

Sudha Mudundi, Hesham Ali
QoS Provisioning in Mobile Networks Based on Aggregate Bandwidth Reservation

This paper proposes a novel resource management framework that integrates Call Admission Control (CAC) and aggregate reservation of bandwidth for mobile networks in a scalable fashion. Our proposal avoids per-user reservation signaling overhead and takes into account the bandwidth expected to be used by calls handed off from neighboring cells within a prediction interval, using the Trigg and Leach method (an adaptive exponential smoothing technique). Our scheme is compared through simulations with the ACR (Adaptive Channel Reservation) scheme, a dynamic reservation-based proposal that uses GPS to extrapolate users' movement and to trigger reservations in the next predicted cell. The simulation results show that our proposal provides the best performance in terms of handoff dropping probability and can achieve levels of call blocking probability similar to ACR. In addition, our proposal can guarantee an upper bound on the handoff dropping probability even under very high loads.
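For illustration, the Trigg and Leach technique adapts the smoothing constant to the recent forecast error; the sketch below (variable names, the constant phi and the sample series are ours, not the paper's) uses the usual tracking-signal form, where the smoothing constant at each step is the absolute ratio of the smoothed error to the smoothed absolute error.

    # Illustrative Trigg-and-Leach adaptive exponential smoothing sketch:
    # the smoothing constant alpha tracks |smoothed error / smoothed |error||.
    def trigg_leach(series, phi=0.2):
        forecast, err_s, abs_err_s = series[0], 0.0, 1e-9
        forecasts = []
        for x in series:
            forecasts.append(forecast)
            e = x - forecast
            err_s = phi * e + (1 - phi) * err_s
            abs_err_s = phi * abs(e) + (1 - phi) * abs_err_s
            alpha = min(1.0, abs(err_s / abs_err_s))   # adaptive smoothing constant
            forecast += alpha * e
        return forecasts

    demand = [4, 5, 7, 6, 9, 12, 11, 15]   # hypothetical handoff-bandwidth samples
    print(trigg_leach(demand))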

Kelvin L. Dias, Stenio F. L. Fernandes, Djamel F. H. Sadok
A Network Performance Sensitivity Metric for Parallel Applications

Excessive run-time variability of parallel application codes on commodity clusters is a significant challenge. To gain insight into this problem, our earlier work developed tools to emulate parallel applications (PACE), by simulating computation and using the cluster's interconnection network for communication, and to further study parallel application run-time effects (PARSE). This work expands our previous efforts by presenting a metric derived from PARSE test results conducted on several widely used parallel benchmarks and application code fragments. The metric suggests that a parallel application's sensitivity to network performance variation can be quantified relative to its behavior under optimal network performance conditions. Ideas on how this metric can be useful for parallel application development, cluster system performance management and system administration are also presented.

Jeffrey J. Evans, Cynthia S. Hood
The Influence of Interference Networks in QoS Parameters in a WLAN 802.11g Environment

This paper proposes a strategy to determine how much a given network can affect the QoS parameters of another through interference. To achieve this, a measurement campaign was carried out in two stages: first with a single AP and later with two APs separated by a distance of less than three meters and using the same channel. After the measurements, the results were analyzed and a set of inferences was made using Bayesian networks, whose inputs were the experimental data, i.e., QoS metrics such as throughput, jitter, packet loss and PMOS, and physical metrics such as power and distance.

Jasmine P. L. Araújo, Josiane C. Rodrigues, Simone G. C. Fraiha, Felipe M. Lamarão, Nandamudi L. Vijaykumar, Gervásio P. S. Cavalcante, Carlos R. L. Francês

Software and Languages

Instruction Selection for Subword Level Parallelism Optimizations for Application Specific Instruction Processors

Application Specific Instruction Processors (ASIPs) have the potential to meet the high-performance demands of multimedia applications such as image processing, audio and video encoding, speech processing, and digital signal processing. To achieve lower cost and energy-efficient high-performance embedded systems built with ASIPs, subword parallelism optimization is an important way to accelerate multimedia applications. One major problem, however, is how to exploit subword parallelism for ASIPs with limited resources. This paper shows that loop transformations such as loop unrolling and variable expansion can be utilized to create opportunities for subword parallelism, and presents a novel approach to recognize and extract subword parallelism based on a Cost Subgraph (CSG). The approach is evaluated on the Transport Triggered Architecture (TTA), a customizable processor architecture that is particularly suitable for tailoring hardware resources to the requirements of the application. In our experiments, 63.58% of loops, and 85.64% of the instructions in those loops, can exploit subword parallelism. The results indicate that significant available subword parallelism can be attained using our method.

Miao Wang, Guiming Wu, Zhiying Wang
High Performance 3D Convolution for Protein Docking on IBM Blue Gene

We have developed a high performance 3D convolution library for Protein Docking on IBM Blue Gene. The algorithm is designed to exploit slight locality of memory access in 3D-FFT by making full use of a cache memory structure. The 1D-FFT used in the 3D convolution is optimized for PowerPC 440 FP2 processors. The number of SIMOMD instructions is minimized by simultaneous computation of two 1D-FFTs. The high performance 3D convolution library achieves up to 2.16 Gflops (38.6% of peak) per node. The total performance of a shape complementarity search is estimated at 7 Tflops with the 4-rack Blue Gene system (4096 nodes).
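To make the role of the 3D convolution concrete, the following sketch computes a circular 3D convolution via the convolution theorem, which is the mathematical core of an FFT-based shape-complementarity search; the grid contents and sizes are hypothetical, and the paper's hand-tuned PowerPC 440 FP2 kernels are of course not reproduced.

    import numpy as np

    # Circular 3D convolution via the convolution theorem:
    # transform both grids, multiply pointwise, transform back.
    def convolve3d_fft(receptor_grid, ligand_grid):
        fa = np.fft.fftn(receptor_grid)
        fb = np.fft.fftn(ligand_grid)
        return np.fft.ifftn(fa * fb).real

    a = np.random.rand(64, 64, 64)   # hypothetical receptor score grid
    b = np.random.rand(64, 64, 64)   # hypothetical ligand score grid
    scores = convolve3d_fft(a, b)
    print(scores.shape, scores.max())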

Akira Nukada, Yuichiro Hourai, Akira Nishida, Yutaka Akiyama
KSEQ: A New Scalable Synchronous I/O Multiplexing Mechanism for Event-Driven Applications

The performance of event-driven network applications, such as Web servers and proxies, is influenced by the scalability and efficiency of the synchronous I/O multiplexing mechanism. Research shows that an event-based mechanism can ensure scalability, and that using kernel-user shared memory to avoid system calls can eliminate much of the system overhead, but no solution so far has combined these two features, because of synchronization problems. This paper designs an event notification mechanism for event-driven network applications which uses kernel-user shared event queues (KSEQ) to achieve both good scalability and low system overhead. KSEQ works much like a double buffer: both the application and the kernel can write the shared data structures without the help of synchronization system calls. Experiments show that the Squid proxy server using this mechanism achieves shorter response times than with other mechanisms.

Hongtao Xia, Weiping Sun, Jingli Zhou, Yunhua Huang, Jifeng Yu
A Synchronous Mode MPI Implementation on the Cell BE Architecture

The Cell Broadband Engine shows much promise in high performance computing applications. The Cell is a heterogeneous multi-core processor, with the bulk of the computational work load meant to be borne by eight co-processors called SPEs. Each SPE operates on a distinct 256 KB local store, and all the SPEs also have access to a shared 512 MB to 2 GB main memory through DMA. The unconventional architecture of the SPEs, and in particular their small local store, creates some programming challenges. We have provided an implementation of core features of MPI for the Cell to help deal with this. This implementation views each SPE as a node for an MPI process, with the local store used as if it were a cache. In this paper, we describe synchronous mode communication in our implementation, using the rendezvous protocol, which makes MPI communication for long messages efficient. We further present experimental results on the Cell hardware, where it demonstrates good performance, such as throughput up to 6.01 GB/s and latency as low as 0.65 μs on the pingpong test. This demonstrates that it is possible to efficiently implement MPI calls even on the simple SPE cores.
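A rendezvous exchange for long messages generally follows a request-to-send / clear-to-send / transfer handshake; the sketch below is a generic outline of that protocol under our own naming (queues standing in for the Cell's mailbox/DMA machinery), not the authors' SPE implementation.

    import queue, threading

    to_receiver = queue.Queue()   # stands in for control messages to the receiver
    to_sender = queue.Queue()     # stands in for control messages back to the sender

    def sender(payload):
        to_receiver.put(("RTS", len(payload)))   # announce a long message
        tag, dest = to_sender.get()              # block until clear-to-send
        assert tag == "CTS"
        dest.extend(payload)                     # stand-in for the bulk (DMA) transfer
        to_receiver.put(("DONE", None))

    def receiver(result):
        tag, length = to_receiver.get()          # matching receive has been posted
        assert tag == "RTS"
        buf = bytearray()
        to_sender.put(("CTS", buf))              # hand the sender a destination buffer
        tag, _ = to_receiver.get()
        assert tag == "DONE"
        result.append(bytes(buf))

    result = []
    t = threading.Thread(target=receiver, args=(result,))
    t.start()
    sender(b"x" * 1024)
    t.join()
    print(len(result[0]))                        # 1024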

Murali Krishna, Arun Kumar, Naresh Jayam, Ganapathy Senthilkumar, Pallav K. Baruah, Raghunath Sharma, Shakti Kapoor, Ashok Srinivasan
Backmatter
Metadata
Title: Parallel and Distributed Processing and Applications
Editors: Ivan Stojmenovic, Ruppa K. Thulasiram, Laurence T. Yang, Weijia Jia, Minyi Guo, Rodrigo Fernandes de Mello
Copyright Year: 2007
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-74742-0
Print ISBN: 978-3-540-74741-3
DOI: https://doi.org/10.1007/978-3-540-74742-0