
2005 | Book

Distributed and Parallel Computing

6th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP, Melbourne, Australia, October 2-3, 2005. Proceedings

Edited by: Michael Hobbs, Andrzej M. Goscinski, Wanlei Zhou

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

There are many applications that require parallel and distributed processing to allow complicated engineering, business and research problems to be solved in a reasonable time. Parallel and distributed processing is able to improve company profit; lower the costs of design, production, and deployment of new technologies; and create better business environments. The major lesson learned by car and aircraft engineers, drug manufacturers, genome researchers and other specialists is that a computer system is a very powerful tool that is able to help them solve ever more complicated problems. That has led computing specialists to develop new computer system architectures and to exploit parallel computers, clusters of clusters, and distributed systems in the form of grids. There are also institutions that do not have such complicated problems but would like to improve profit and lower the costs of design and production by using parallel and distributed processing on clusters. In general, to achieve these goals, parallel and distributed processing must become the computing mainstream. This implies a need for new architectures of parallel and distributed systems, new system management facilities, and new application algorithms. It also implies a need for a better understanding of grids and clusters, and in particular their operating systems, scheduling algorithms, load balancing, heterogeneity, transparency and application deployment, which is of the most critical importance for their development and their adoption by industry and business.

Table of contents

Frontmatter
Improving Concurrent Write Scheme in File Server Group

The explosive growth of Web content has led to increasing attention on the scalability and availability of file systems. Hence, ways to improve the reliability and availability of a system, to achieve the expected reduction in operational expenses, and to reduce the system management workload have become essential issues. A basic technique for improving the reliability of a file system is to mask the effects of failures through replication, with consistency control protocols implemented to ensure consistency among replicas. In this paper, we leverage the concept of an intermediate file handle to cover the heterogeneity of file systems and propose an efficient data consistency control scheme supporting dependence checking among writes and the management of out-of-order requests for a file server group. Finally, the results of experiments prove the efficiency of the proposed consistency control mechanism. Above all, ease of implementation is our main design consideration.
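
The handling of out-of-order write requests that the abstract mentions can be sketched as follows. This is a minimal illustration, not the paper's protocol; the class and its sequence-numbering scheme are invented for the example:

```python
# Hypothetical sketch: a replica applies write requests in sequence order,
# buffering out-of-order arrivals until all their predecessors have arrived.
class OrderedWriteQueue:
    def __init__(self):
        self.next_seq = 0   # next sequence number expected
        self.pending = {}   # out-of-order requests, keyed by sequence number
        self.applied = []   # writes applied so far, in order

    def receive(self, seq, data):
        self.pending[seq] = data
        # Drain every request whose predecessors have all been applied.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

q = OrderedWriteQueue()
for seq, data in [(2, "c"), (0, "a"), (1, "b"), (3, "d")]:
    q.receive(seq, data)
print(q.applied)  # → ['a', 'b', 'c', 'd']
```

A real file server group would additionally track dependences between writes to different files, but the buffering idea is the same.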

Fengjung Liu, Chu-sing Yang
A Comparative Performance Study of Distributed Mutual Exclusion Algorithms with a Class of Extended Petri Nets

A few algorithms of distributed mutual exclusion are discussed, their unified model in terms of a finite-population queuing system is proposed, and their simulation performance study is presented with the assumption that they use multicast communication if possible. To formally represent the algorithms for simulation, a class of extended Petri nets is used. The simulation was done in the simulation system Winsim based on this class of Petri nets.

Alexander Kostin, Ljudmila Ilushechkina, Erhan Basri
A Practical Comparison of Cluster Operating Systems Implementing Sequential and Transactional Consistency

Shared memory is an interesting communication paradigm for SMP machines and clusters. Weak consistency models have been proposed to improve the efficiency of shared memory applications. In a programming environment offering weak consistency, it is necessary to worry about individual load and store operations and about proper synchronization. In contrast to this explicit style of distributed programming, shared memory systems implementing strong consistency models are easy to program, and consistency is implicit. In this paper we compare two representatives, Kerrighed and Plurix, implementing sequential and transactional consistency respectively. Kerrighed is a single system image operating system (OS) based on Linux, whereas Plurix is a native OS for PC clusters designed for shared memory operation. The measurements presented in this paper show that strong consistency models implemented at the OS level are competitive.

Stefan Frenz, Renaud Lottiaux, Michael Schoettner, Christine Morin, Ralph Goeckelmann, Peter Schulthess
Clock Synchronization State Graphs Based on Clock Precision Difference

Consistent and stable global states of clock synchronization are very important in distributed and parallel systems. This paper presents an innovative strategy and method to obtain stable global clock synchronization state graphs in asynchronous Internet environments. Our model introduces the concept of clock precision difference as a means to evaluate the running states of all clocks in the system and to make the system well self-adaptive. Finally, we introduce the concept of clock precision difference into the global-state analysis of clock synchronization and construct clock synchronization state graphs in order to evaluate distributed clock synchronization states. We also present detailed simulations of the strategy and mathematical analysis applied to real Internet environments.
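
One simple way to quantify a difference in precision between two clocks is to estimate the relative drift rate from paired readings. This is an illustrative sketch only, not the paper's model; the function and its inputs are invented:

```python
# Illustrative sketch: estimate a remote clock's drift rate relative to a
# local reference from two paired readings. A rate of 0.0 means the clocks
# run at the same speed; a nonzero rate is one measure of their
# "precision difference".
def drift_rate(local_t0, remote_t0, local_t1, remote_t1):
    """Seconds of drift per local second."""
    return ((remote_t1 - remote_t0) - (local_t1 - local_t0)) / (local_t1 - local_t0)

# A remote clock that gains 0.1 s over 100 s of local time:
rate = drift_rate(0.0, 0.0, 100.0, 100.1)
print(rate)  # → 0.001
```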

Ying Zhao, Wanlei Zhou, Yingying Zhang, E. J. Lanham, Jiumei Huang
A Recursive-Adjustment Co-allocation Scheme in Data Grid Environments

The co-allocation architecture was developed in order to enable parallel downloads of datasets from multiple servers. Several co-allocation strategies have been coupled and used to exploit rate differences among various client-server links and to address dynamic rate fluctuations by dividing files into multiple blocks of equal size. However, a major obstacle, the idle time of faster servers waiting for the slowest server to deliver the final block, makes it important to reduce differences in finish time among replica servers. In this paper, we propose a dynamic co-allocation scheme, namely the Recursive-Adjustment Co-Allocation scheme, to improve the performance of data transfer in Data Grids. Our approach reduces the idle time spent waiting for the slowest server and decreases the data transfer completion time.
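
The general idea behind rate-aware co-allocation can be sketched as splitting the remaining bytes across replica servers in proportion to their measured transfer rates, so all links finish at roughly the same time. This is a hedged illustration of the principle, not the paper's scheme; server names and rates are made up, and a recursive-adjustment variant would re-run the split after each round as measured rates change:

```python
# Split remaining bytes across servers proportionally to measured rates
# (rates in arbitrary units, e.g. MB/s); integer division keeps shares whole.
def proportional_split(remaining_bytes, rates):
    total = sum(rates.values())
    return {srv: remaining_bytes * r // total for srv, r in rates.items()}

shares = proportional_split(90_000_000, {"s1": 40, "s2": 30, "s3": 20})
print(shares)  # → {'s1': 40000000, 's2': 30000000, 's3': 20000000}
```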

Chao-Tung Yang, I-Hsien Yang, Kuan-Ching Li, Ching-Hsien Hsu
Reducing the Bandwidth Requirements of P2P Keyword Indexing

This paper describes the design and evaluation of a federated, peer-to-peer indexing system, which can be used to integrate the resources of local systems into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of term-document indices using a combination of duplicate elimination, leaf-set forwarding and conventional techniques such as aggressive index pruning, index compression, and batching. Together these indexing strategies help to reduce the number of RPC operations required to locate the nodes responsible for a section of the index, as well as the bandwidth utilization and the latency of the indexing service. Using empirical observation, we evaluate the performance benefits of these cumulative optimizations and show that these design trade-offs can significantly improve indexing performance when using a distributed hash table.
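
The core mechanism, placing each term's postings on the DHT node its hash maps to, and batching postings per destination to cut RPC counts, can be sketched generically. This is not the paper's system; the node layout and modulo placement are simplifying assumptions:

```python
import hashlib

# Minimal sketch of distributing a term-document index over a DHT:
# each term's postings live on the node its hash maps to.
def responsible_node(term, num_nodes):
    digest = hashlib.sha1(term.encode()).hexdigest()
    return int(digest, 16) % num_nodes

# Batching: group postings by destination so each node needs only one RPC.
def batch_postings(postings, num_nodes):
    batches = {}
    for term, doc_id in postings:
        batches.setdefault(responsible_node(term, num_nodes), []).append((term, doc_id))
    return batches

batches = batch_postings([("grid", 1), ("cluster", 1), ("grid", 2)], 8)
print(sum(len(b) for b in batches.values()))  # → 3
```

Both "grid" postings necessarily land in the same batch, which is what makes per-node batching effective.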

John Casey, Wanlei Zhou
A Deadline and Budget Constrained Scheduling Algorithm for eScience Applications on Data Grids

In this paper, we present an algorithm for scheduling distributed data-intensive Bag-of-Tasks applications on Data Grids that have costs associated with requesting, transferring and processing datasets. The algorithm takes into account the explosion of choices that results when a job requires multiple datasets from multiple data sources. The algorithm builds a resource set for a job that minimizes cost or time depending on the user's preferences and deadline and budget constraints. We evaluate the algorithm on a Data Grid testbed and present the results.

Srikumar Venugopal, Rajkumar Buyya
A Survivability Model for Cluster System

Even in an intrusion-tolerant system, resources will be fatigued if an intrusion lasts long, compromising the system iteratively or incrementally. In due course, the system will not provide even the minimum critical functionality. Thus we propose a model to increase the survivability level of a cluster system by maintaining its essential functionality. In this paper, we present a cluster recovery model with a software rejuvenation methodology, which is applicable in the security field and is also less expensive. Firstly, we perform a steady-state analysis of a cluster system and then study the 4th Generation Security Mechanism: restoring the system with a cold-standby cluster. The basic idea is to investigate the consequences of the exact responses in the face of attacks and to rejuvenate and/or reconfigure the running software and services. We show that the system operates through intrusions, continues to provide the critical functions, and gracefully degrades non-critical system functionality in the face of intrusions.

Khin Mi Mi Aung, Kiejin Park, Jong Sou Park
Localization Techniques for Cluster-Based Data Grid

In this paper, we present an efficient method for optimizing the locality of data distribution when executing data-parallel applications. A data to logical grid nodes mapping technique is employed to enhance the performance of parallel programs on a cluster grid. A cluster grid is a typical computational grid environment consisting of several clusters located on multiple campuses distributed globally over the Internet. The objective of the proposed technique is to reduce inter-cluster communication overheads and to speed up the execution of data-parallel programs in the underlying distributed cluster grid. Theoretical analysis and experimental results show the improved communication costs and the scalability of the proposed techniques on different hierarchical cluster grids.

Ching-Hsien Hsu, Guan-Hao Lin, Kuan-Ching Li, Chao-Tung Yang
GridFTP and Parallel TCP Support in NaradaBrokering

Many key features of file transfer mechanisms, such as reliable and parallel transfers, are developed as part of each service, which makes it very hard to reuse the same code in different systems. We try to overcome this disadvantage by decoupling the useful features of file transfer mechanisms from the implementation of the service and protocol, and placing them instead into the messaging substrate. We may thus treat file transfer operations as a specific usage case of a more general messaging environment. This allows us to provide file transfer quality of service to other file transfer tools that do not have these features.

Sang Boem Lim, Geoffrey Fox, Ali Kaplan, Shrideep Pallickara, Marlon Pierce
2-Layered Metadata Service Model in Grid Environment

Data-intensive tasks are becoming one of the most important applications in grid environments. The scale of data sets has reached hundreds of terabytes and will soon reach petabytes. The primary problem we face is how to organize geographically distributed storage devices to support collaborative operations on the data in those resources. On the one hand, performance is critical to such applications; on the other hand, diverse network conditions prevent users from getting the same service quality. This paper focuses on resolving this problem and presents a 2-layered metadata service model for grid environments which exploits the locality of users' distribution and provides a platform for grid data management. We have implemented this 2-layered metadata service model in the ChinaGrid Supporting Platform (CGSP), the grid middleware for the ChinaGrid project.

Muzhou Xiong, Hai Jin, Song Wu
pKSS: An Efficient Keyword Search System in DHT Peer-to-Peer Network

The state-of-the-art keyword search systems for structured P2P networks are built on a distributed inverted index. However, partitioning the inverted index by keywords may incur significant bandwidth for executing more complicated search queries such as multiple-attribute queries. In order to reduce query overhead, KSS (Keyword Set Search) by Gnawali partitions the index by sets of keywords. However, a KSS index is considerably larger than a standard inverted index, since there are many more word sets than individual words, and the insert and storage overheads are clearly unacceptable for full-text search on a collection of documents. In this paper, we present pKSS, a P2P keyword search system that adopts a term-ranking approach such as TFIDF and exploits the relationship information between query keywords to improve the performance of P2P keyword search. Experimental results clearly demonstrate that the improved keyword search is more efficient than the KSS index in insert and storage overheads, and has much lower bandwidth costs per query than a standard inverted index.
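
The size blow-up of a keyword-set index relative to a plain inverted index is easy to see: indexing every keyword pair of a document creates C(n, 2) entries versus n entries for single keywords. A small illustration (the word list is invented):

```python
from itertools import combinations

# Entries a document contributes to an index keyed by keyword sets of the
# given size: size 1 is the ordinary inverted index, size 2 is KSS-style.
def index_entries(words, set_size):
    return list(combinations(sorted(set(words)), set_size))

words = ["p2p", "keyword", "search", "dht", "index"]
print(len(index_entries(words, 1)))  # → 5
print(len(index_entries(words, 2)))  # → 10
```

For realistic vocabularies the gap grows quadratically, which is why the abstract calls the KSS insert and storage overheads unacceptable for full-text search.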

Yin Li, Fanyuan Ma, Liang Zhang
A Comparative Study at the Logical Level of Centralised and Distributed Recovery in Clusters

Cluster systems are becoming more prevalent in today's computer society, and users are beginning to request that these systems be reliable. Currently, most clusters have been designed to provide high performance at the cost of providing little to no reliability. To combat this, this report looks at how a recovery facility, based on either a centralised or a distributed approach, could be implemented in a cluster that is supported by a checkpointing facility. This recovery facility can then recover failed user processes by using checkpoints of the processes taken during failure-free execution.

Andrew Maloney, Andrzej Goscinski
Toward Self Discovery for an Autonomic Cluster

Nondedicated clusters are currently at the forefront of the development of high performance computing systems. These clusters are relatively intolerant of hardware failures and cannot manage dynamic cluster membership efficiently. This report presents the logical design of an innovative self discovery service that provides for automated cluster management and resource discovery. The proposed service has an ability to share or recover unused computing resources, and to adapt to transient conditions autonomically, as well as the capability of providing dynamically scalable virtual computers on demand.

Eric Dines, Andrzej Goscinski
Mining Traces of Large Scale Systems

Large-scale distributed computing infrastructures involve large numbers of nodes, poor communication performance, and continuously varying resources that are not available at all times. In this paper, we focus on the different tools available for mining traces of the activities of such architectures. We propose new techniques for fast management of a parallel frequent itemset mining algorithm. The technique allows us to exhibit statistical results about the activity of more than one hundred PCs connected to the web.

Christophe Cérin, Michel Koskas
Setup Algorithm of Web Service Composition

A number of web services are now available, and it therefore seems natural to reuse existing web services to create composite web services. The pivotal problems of web service composition are how to model the input and output data dependencies of candidate web services, and how to satisfy those of a service request by composition efficiently. In this paper we present the concept of an “invocation layer” based on the data dependencies between web service invocations, and design algorithms to obtain the fewest invocation layers of candidate web services satisfying a given service request.
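
Layering services by data dependency can be sketched as forward chaining: each layer holds the services whose inputs are already producible, and their outputs enable the next layer. This is an illustrative sketch, not the paper's algorithm; the service names and their input/output sets are invented:

```python
# services maps name -> (set of required inputs, set of produced outputs).
# Returns the invocation layers needed to produce `goal` from `available`,
# or None if the goal is unreachable.
def invocation_layers(services, available, goal):
    layers, remaining = [], dict(services)
    while not goal <= available:
        layer = [s for s, (ins, outs) in remaining.items() if ins <= available]
        if not layer:
            return None  # no service can fire; goal unreachable
        for s in layer:
            available |= remaining.pop(s)[1]
        layers.append(sorted(layer))
    return layers

services = {
    "geocode": ({"address"}, {"coords"}),
    "weather": ({"coords"}, {"forecast"}),
}
layers = invocation_layers(services, {"address"}, {"forecast"})
print(layers)  # → [['geocode'], ['weather']]
```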

YanPing Yang, QingPing Tan, Yong Xiao
Self Healing and Self Configuration in a WSRF Grid Environment

The move towards web services in Grid computing requires mechanisms for services to maintain state. This is introduced by the Web Services Resource Framework, which provides a basis for web services to access stateful resources. While this allows access to stateful resources, the web services themselves are not stateful. Currently, Grids require a lot of direct involvement from application developers, who are, in general, not computing specialists. The principles of autonomic computing introduce characteristics aimed at the automatic improvement of computing systems and can be applied to the Grid. This paper addresses the principles of self healing and self configuration in a Grid environment and implements a service using the WSRF.NET framework to investigate the effect and applicability of the Web Services Resource Framework on these principles and to improve the WSRF specification.

Michael Messig, Andrzej Goscinski
Study on Life Cycle Model of Dynamic Composed Web Services

Modern requirements for dynamic Web Services rely increasingly on composing concurrent, distributed, mobile, re-configurable and heterogeneous services, and substantial progress has already been made towards composed Web Services. In this paper, we first propose a life cycle for composed Web services, then design a model named the Service-Cloud model, based on the process by which clouds form in nature. Finally, based on the Service-Cloud model, we design and implement a prototype.

Chen Yanping, Li Zengzhi, Jin Qinxue, Wang Chuang
Fault-Tolerant Dynamic Job Scheduling Policy

In this paper, we propose a scalable and fault-tolerant job scheduling framework for grid computing. The proposed framework loosely couples a dynamic job scheduling approach with a hybrid replication approach to schedule jobs efficiently while at the same time providing fault tolerance. The novelty of the proposed framework is that it uses a passive replication approach under high system load and an active replication approach under low system load. The switch between these two replication methods is done dynamically and transparently.

J. H. Abawajy
An Efficient Dynamic Load-Balancing Algorithm in a Large-Scale Cluster

Random stealing is a well-known dynamic load-balancing algorithm. However, for a large-scale cluster, the simple random stealing policy is no longer efficient because an idle node must randomly steal many times to obtain a task from another node. This not only increases the idle time of all nodes but also produces a heavy network communication overhead. In this paper, we propose a novel dynamic load-balancing algorithm, Transitive Random Stealing (TRS), which lets any idle node obtain a task from another node with far fewer steal attempts in a large-scale cluster. A probabilistic model is constructed to analyze the performance of TRS, random stealing and Shis, one of the load-balancing policies in the EARTH system. Finally, by the random baseline technique, an experiment designed to compare TRS with Shis and random stealing under five different load distributions in the Tsinghua EastSun cluster convinces us that TRS is a highly efficient dynamic load-balancing algorithm for a large-scale cluster.
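
The baseline the paper improves on, plain random stealing, is simple to sketch: an idle node probes victims uniformly at random until it finds one with surplus work. This sketch shows only that baseline (TRS additionally reuses knowledge from earlier steals to pick likelier victims); the queue layout is invented:

```python
import random

# Plain random stealing. Assumes at least one other node has surplus work,
# otherwise the probing loop would not terminate in this toy version.
def random_steal(queues, idle, rng):
    attempts = 0
    while True:
        attempts += 1
        victim = rng.randrange(len(queues))
        if victim != idle and queues[victim]:
            return queues[victim].pop(), attempts

rng = random.Random(7)
queues = [[], ["t1", "t2"], [], []]   # node 0 is idle; only node 1 has work
task, attempts = random_steal(queues, 0, rng)
print(task)  # → 't2'
```

With many nodes and few loaded ones, the expected number of probes grows, which is exactly the overhead the abstract attributes to simple random stealing.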

Bao-Yin Zhang, Ze-Yao Mo, Guang-Wen Yang, Wei-Min Zheng
Job Scheduling Policy for High Throughput Grid Computing

The growing computational power requirements of grand challenge applications have prompted the need to merge high throughput computing and grid computing principles to harness computational resources distributed across multiple organisations. This paper identifies the issues in resource management and scheduling in the emerging high throughput grid computing context. We also survey and study the performance of several space-sharing and time-sharing opportunistic scheduling policies that have been developed for high throughput computing.

J. H. Abawajy
High Performance Task Scheduling Algorithm for Heterogeneous Computing System

A key issue in obtaining high performance from a parallel program represented by a Directed Acyclic Graph (DAG) is to map it efficiently onto the target system. The problem is generally addressed in terms of task scheduling, where the tasks are the schedulable units of a program. The task scheduling problem has been shown to be NP-complete in general as well as in several restricted cases. In order to be of practical use for large applications, scheduling algorithms must guarantee high performance by minimizing the schedule length and the scheduling time. In this paper we propose a new task-scheduling algorithm, namely the High Performance task Scheduling (HPS) algorithm for heterogeneous computing systems, with complexity O((v + e)(p + log v)), which provides optimal results for applications represented by DAGs. The performance of the algorithm is illustrated by comparing the schedule length, speedup, efficiency and scheduling time with existing algorithms reported in this paper. The comparison study, based on both randomly generated graphs and graphs of some real applications, shows that the HPS algorithm substantially outperforms existing algorithms.

E. Ilavarasan, P. Thambidurai, R. Mahilmannan
Execution Environments and Benchmarks for the Study of Applications’ Scheduling on Clusters

In this paper, we have demonstrated how existing programming environments, tools and middleware could be used for the study of the execution performance of parallel and sequential applications on a non-dedicated cluster. A set of parallel and sequential benchmark applications selected for and used in the experiments was characterized, and the experiment requirements were shown.

Adam K. L. Wong, Andrzej M. Goscinski
Data Distribution Strategies for Domain Decomposition Applications in Grid Environments

In this paper, we evaluate message-passing applications in Grid environments using the domain decomposition technique. We compare two domain decomposition strategies: a balanced one and an unbalanced one. The balanced strategy is the strategy commonly used in homogeneous computing environments, but it presents some problems related to the larger communication latency in Grid environments. We propose an unbalanced domain decomposition strategy that overlaps communication latency with useful computation; the idea is to assign less workload to the processors responsible for sending updates outside the host. We compare the results obtained with the classical balanced strategy and show that the unbalanced distribution pattern improves the execution times of domain decomposition applications in Grid environments. We considered two kinds of meshes, which define the most typical cases, and show that the expected execution time can be reduced by up to 53%. We also analyze the influence of the communication patterns on execution times using the Dimemas simulator.

Beatriz Otero, José M. Cela, Rosa M. Badia, Jesús Labarta
Inter-round Scheduling for Divisible Workload Applications

Most common jobs in Grid computing are arbitrarily divisible. Divisible load theory (DLT) provides the mathematical machinery for time-optimal processing. With multiple-round load distributions, idle processor periods can be harnessed for useful computation, and optimized rounds can be planned in advance. The Grid, however, is dynamic in nature, and the above theory does not fully account for this; any realistic scheduling strategy based on DLT has to take this fact into account. Existing multiple-round algorithms do not model time-varying effects due to environmental changes, a situation that leads to processing delays such as disk I/O contention. The proposed inter-round scheduling algorithm takes this into consideration: it models time-varying resource performance degradation and achieves reasonable performance.

DongWoo Lee, R. S. Ramakrishna
Scheduling Divisible Workloads Using the Adaptive Time Factoring Algorithm

In the past years a vast amount of work has been done to improve the basic scheduling algorithms for master/slave computations. One of the main results is that the workload of the tasks may be adapted during the execution, using either a fixed increment or decrement (e.g. based on an arithmetic or geometric ratio) or a more sophisticated function to adapt the workload. Currently, the most efficient solutions are all based on some kind of evaluation of the slaves' capacities done exclusively by the master. In this paper we propose the Adaptive Time Factoring scheduling algorithm, which takes a different approach, distributing the scheduling between slaves and master. The master computes, using the Factoring algorithm, a time slice to be used by each slave for processing, and each slave predicts the workload size it should receive in order to meet this time slice. The prediction is based on a performance model located on each slave, which is refined during the execution of the application to provide better predictions. We evaluated the proposed algorithm on a synthetic testbed and compared the results with other scheduling algorithms.
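
The classic Factoring self-scheduling rule that the master builds on can be sketched directly: each round hands out roughly half of the remaining work, split evenly among the P slaves, so chunk sizes shrink geometrically. A minimal sketch of that rule (the paper's adaptive, time-based variant adds per-slave prediction on top):

```python
# Classic Factoring chunk schedule: every round of P chunks takes
# remaining // (2 * P) iterations each, so later chunks absorb imbalance.
def factoring_chunks(total, p):
    sizes, remaining = [], total
    while remaining > 0:
        chunk = max(1, remaining // (2 * p))
        for _ in range(p):
            if remaining == 0:
                break
            c = min(chunk, remaining)
            sizes.append(c)
            remaining -= c
    return sizes

sizes = factoring_chunks(100, 2)
print(sizes)  # → [25, 25, 12, 12, 6, 6, 3, 3, 2, 2, 1, 1, 1, 1]
```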

Tiago Ferreto, César De Rose
Adaptive Policy Triggering for Load Balancing

Load balancing is an attractive problem in storage systems. With the fast growth of high-speed network technology and novel storage architectures, smarter storage devices have become an effective way to address it. In this paper, we present an effective object migration and replication policy for our object storage system (OSS), called adaptive policy triggering. This policy migrates or replicates objects from congested object storage controllers (OSCs) to relatively uncongested ones according to information from three facets of the OSS: the metadata server, the object storage controller and the storage object itself. First, clients interact with the metadata server (MS), so global policies are triggered by the MS. Second, the OSC has better knowledge of its own load than the MS, so it is reasonable that local policies are triggered by the OSC. Third, the storage object is encapsulated with data, attributes and methods; its attributes can be set or read when objects are accessed, and object attribute values serve well as policy thresholds. Furthermore, object methods can also be triggered according to some policies.
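
A threshold-triggered migration decision of the kind the abstract describes can be sketched as: when the most-loaded controller exceeds a threshold (here a plain number standing in for an object attribute value), pick the least-loaded peer as the target. This is an invented illustration, not the paper's policy; all names and loads are made up:

```python
# Return (source, target) if some controller is congested, else None.
def pick_migration(loads, threshold):
    source = max(loads, key=loads.get)
    if loads[source] <= threshold:
        return None  # nobody is over the threshold; no migration needed
    target = min(loads, key=loads.get)
    return source, target

move = pick_migration({"osc1": 0.9, "osc2": 0.4, "osc3": 0.2}, 0.8)
print(move)  # → ('osc1', 'osc3')
```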

Dan Feng, Lingfang Zeng
Parallel Algorithms for Fault-Tolerant Mobile Agent Execution

Redundancy is a basic technique for achieving fault tolerance, but the overhead introduced by redundancy may degrade the system's performance. In this paper, we propose efficient replication-based algorithms for fault-tolerant mobile agent execution, which allow for parallel processing in the agent execution so as to reduce the overheads caused by redundancy. We also investigate the heartbeat-based failure detector approach and modify it for use in our proposed algorithms. A performance evaluation has been performed to compare the proposed algorithms with the existing algorithm. Both analytic and simulation results show that our new algorithms can significantly improve the system's performance.

Jin Yang, Jiannong Cao, Weigang Wu, Cheng-Zhong Xu
Design and Multithreading Implementation of the Wave-Front Algorithm for Constructing Voronoi Diagrams

The Voronoi diagram is one of the most fundamental data structures in computational geometry, which is concerned with the design and analysis of algorithms for geometrical problems. In this paper, a parallel algorithm for constructing the Voronoi diagram on the CREW (Concurrent Read, Exclusive Write) model is proposed. It improves on Preilowski and Mumbeck's algorithm, which applies the Neighbor-Point Theorem and presents a parallel approach to checking neighbor points. In this article, we propose an improved approach, the Wave-Front algorithm, which checks neighbor points in a quite different way. The algorithm is then implemented in both sequential and multithreaded models. Since the Wave-Front algorithm has inherently concurrent tasks that can be executed simultaneously, the multithreaded version was executed to observe its performance. Computational results indicate the effectiveness of the threaded model.

Grace J. Hwang, Joseph M. Arul, Eric Lin, Chung-Yun Hung
A Proposal of Parallel Strategy for Global Wavelet-Based Registration of Remote-Sensing Images

With the increasing importance of multiple multiplatform remote sensing missions, digital image registration has been applied in many fields and plays a particularly important role in remotely sensed data processing. First, a brief introduction to existing parallel methods for wavelet-based global registration is given, and then the communication optimization for the GP method is described. The optimized algorithm is named Group-Optimized-Parallel (GOP for short). To find the reason for the occasionally lower efficiency of GOP compared with other methods, a more careful analysis is presented in theory and confirmed by experiments. Moreover, we give a quantitative criterion, called Remainder Items, for choosing the best solution under different input conditions.

Haifang Zhou, Yu Tang, Xuejun Yang, Hengzhu Liu
Performance Analysis of a Parallel Sort Merge Join on Cluster Architectures

We developed a concise but comprehensive analytical model for the well-known sort-merge join algorithm on cost-effective cluster architectures.

We try to concentrate on a limited number of characteristic parameters to keep the analytical model clear and focused. We believe that a meaningful model can be built upon only three characteristic parameter sets, describing main memory size, the I/O bandwidth and the disk bandwidth. We justify our approach by a practical implementation and a comparison of the theoretical to real performance values.
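
For reference, the algorithm the model analyzes is, in its simplest sequential form: sort both inputs on the join key, then merge them with two cursors. A minimal sketch (the parallel cluster version partitions the inputs first, which this illustration omits):

```python
# Sequential sort-merge join: rows are tuples, joined on their first field.
def sort_merge_join(left, right, key=lambda r: r[0]):
    left, right = sorted(left, key=key), sorted(right, key=key)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Emit all pairings for this key (handles duplicate keys by
            # rewinding the right cursor for the next equal left row).
            j0 = j
            while j < len(right) and key(right[j]) == kl:
                out.append((left[i], right[j]))
                j += 1
            i += 1
            if i < len(left) and key(left[i]) == kl:
                j = j0
    return out

rows = sort_merge_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")])
print(rows)  # → [((2, 'b'), (2, 'x'))]
```

The model's three parameter sets map naturally onto this algorithm's phases: the sort is bounded by main memory, and the merge by I/O and disk bandwidth.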

Erich Schikuta
Parallel Clustering on the Star Graph

In this paper, a parallel algorithm for data clustering is presented on a multi-computer with star topology. This algorithm is fast and requires a small amount of memory per processing element, which makes it even suitable for SIMD implementation. The proposed parallel algorithm completes in O(K + S^2 - T^2) steps for a clustering problem of N data patterns with M features per pattern and K clusters, where N·M = S!, K·M = T!, and M = R!, on an S-star interconnection network.

M. Fazeli, H. Sarbazi-Azad, R. Farivar
Hierarchical Parallel Simulated Annealing and Its Applications

In this paper we propose a new parallelization scheme for Simulated Annealing: Hierarchical Parallel SA (HPSA). This new scheme features coarse-grained parallelization, directed at message-passing systems such as clusters. It combines heuristics such as adaptive clustering with SA to achieve more efficient local search. Through experiments with various optimization problems and comparison with some available schemes, we show that HPSA is a powerful general-purpose optimization method. It can also serve as a framework for meta-heuristics to gain broader application.
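
The sequential building block HPSA parallelizes is plain simulated annealing: accept worse moves with probability exp(-delta/T) and cool T each step. A minimal sketch with an invented toy objective (this shows only the base algorithm, not the hierarchical scheme):

```python
import math
import random

# Plain simulated annealing: geometric cooling, Metropolis acceptance.
def anneal(cost, neighbor, x0, t0=1.0, cooling=0.95, steps=500, seed=1):
    rng = random.Random(seed)
    x, t = x0, t0
    best = x
    for _ in range(steps):
        y = neighbor(x, rng)
        delta = cost(y) - cost(x)
        # Always accept improvements; accept worsening moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = y
            if cost(x) < cost(best):
                best = x
        t *= cooling
    return best

# Toy objective: minimize f(x) = (x - 3)^2 over the reals.
best = anneal(lambda x: (x - 3) ** 2, lambda x, r: x + r.uniform(-1, 1), 0.0)
print(abs(best - 3.0) < 0.5)  # → True
```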

Shiming Xu, Wenguang Chen, Weimin Zheng, Tao Wang, Yimin Zhang
Multi-color Difference Schemes of Helmholtz Equation and Its Parallel Fast Solver over 3-D Dodecahedron Partitions

In this paper, the problem of partitioning parallel dodecahedrons in 3D is examined. Two schemes are introduced and their convergence rate discussed. A parallel fast solver was implemented and tested experimentally, with the performance results presented.

Jiachang Sun
GridMD: Program Architecture for Distributed Molecular Simulation

In the present work we describe the architectural concepts of the distributed molecular simulation package GridMD. The main purpose of this work is to underline construction patterns which may help to generalize the design of an application for extensive atomistic simulations. Issues such as design-time parallel execution implications, flexibility and extensions, portability to Grid environments and maximal adaptation of existing third-party codes and resources are addressed. The library is currently under development, with a gradually growing number of available components and tools. The basic GridMD engine is free software and is distributed under the terms of the wxWidgets library license [1].

Ilya Valuev
Visuel: A Novel Performance Monitoring and Analysis Toolkit for Cluster and Grid Environments

The computing power provided by high-performance, low-cost PC-based cluster and Grid platforms is attractive, and such platforms are equal or superior to widely available supercomputers and mainframes. In this paper, we present the design rationale and implementation of Visuel, a toolkit for performance measurement and analysis of MPI parallel programs and real-time resource monitoring in cluster and grid computing environments. The toolkit provides a web-based interface that shows the performance activities of all computing nodes involved in the execution of an MPI parallel program, such as the CPU and memory usage levels of each node, and monitors all nodes of a computing platform by displaying real-time performance data. In addition, the toolkit can display comparative performance charts of multiple executions of the MPI parallel application under investigation, which facilitates "what-if" analysis. Our experience with the toolkit shows that it eases the investigation of parallel applications.

Kuan-Ching Li, Hsiang-Yao Cheng, Chao-Tung Yang, Ching-Hsien Hsu, Hsiao-Hsi Wang, Chia-Wen Hsu, Sheng-Shiang Hung, Chia-Fu Chang, Chun-Chieh Liu, Yu-Hwa Pan
Introduction to a New Tariff Mechanism for Charging for Computer Power in the Grid

In order to charge for computer power in the grid, we have worked towards a standard pricing unit. Much work is currently being done on grid economy models and architectures, but when it comes to charging, the metric popularly used is $ per CPU per hour, which is too simple. Our effort is to make this metric more meaningful to both the grid service provider and the client. We argue that on a particular grid host the metric should reflect the true load consumed by the clients and the delays caused by other loads. Eventually it should also reflect the network, memory, etc., consumed by the client.

Previously we studied prediction in the grid after introducing a division of the load average at the kernel level. This gave more meaning to historical load collection, as CPU historical load data was collected separately for each login user. This division-of-load strategy has since proved helpful in the development of a meaningful tariff mechanism, which is demonstrated in this paper.

Eventually this tariff mechanism can be used to predict computational costs, which would contribute to scheduling in the grid.

Sena Seneviratne, David Levy
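The abstract's core argument, that a flat $ per CPU per hour rate should be refined by the client's true load share and by delays caused by other loads, can be made concrete with a toy charging function. The formula, names, and the 50% rebate weight below are purely illustrative assumptions, not the paper's tariff.

```python
def tariff(rate_per_cpu_hour, wall_hours, user_load_share, delay_hours):
    """Hypothetical refinement of the flat $/CPU/hour metric: bill only the
    load share the client actually consumed, and rebate time lost to delays
    caused by other users' load. Weights are illustrative assumptions."""
    effective_hours = wall_hours * user_load_share   # true load consumed
    rebate = 0.5 * rate_per_cpu_hour * delay_hours   # assumed 50% delay rebate
    return rate_per_cpu_hour * effective_hours - rebate

# a client ran 10 wall-clock hours but held only 60% of the CPU,
# and lost 2 hours waiting behind competing load
charge = tariff(rate_per_cpu_hour=1.0, wall_hours=10,
                user_load_share=0.6, delay_hours=2)
```

Under the flat metric the same client would be billed for all 10 hours; tying the charge to load share and delay makes the price track what was actually delivered.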
Host Load Prediction for Grid Computing Using Free Load Profiles

In order to increase overall performance, we have studied methods for improving load prediction, which would in turn improve load balancing in the Grid. Current software designed to handle distributed applications does focus on the problem of forecasting a computer's future load: the UNIX five-second host load has been collected and used to predict the host load. However, the forecast can be further improved if CPU historical load data is collected separately for each login user. Another important aspect of historical data collection is that, before submission to the grid, the user separates his HPC program into sizable parallel programs and test-runs them on supposedly load-free computers. This means the user can obtain the load profile of the parallel program on a load-free computer, together with other important information. Once the free load profile is known, the load behaviour of a job under variable background load conditions can be predicted. Thus the forecast can be performed for each user before adding the weighted values towards the final prediction. In this paper we show that load prediction using free load profiles provides better results. In fact, once the user-based load data are collected, the forecasting is somewhat like that of the stock market.

Sena Seneviratne, David Levy
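The free-load-profile idea can be illustrated with a simple processor-sharing assumption: time a job spends demanding the CPU stretches in proportion to the competing background load, while its idle time does not. This is a hedged stand-in for the paper's prediction method; the sharing model and all names are assumptions.

```python
def predict_runtime(free_profile, load_forecast):
    """Predict a job's runtime under background load from its free-load
    profile. `free_profile` is per-second CPU demand (0..1) measured on a
    load-free machine; `load_forecast` is the predicted background load
    average over the same interval. Illustrative processor-sharing model:
    a second of full CPU demand stretches by (1 + background load)."""
    total = 0.0
    for demand, load in zip(free_profile, load_forecast):
        total += demand * (1 + load) + (1 - demand)  # CPU part stretches, idle part doesn't
    return total

# a fully CPU-bound 4-second phase, with one competing process
# present during the first two seconds
runtime = predict_runtime([1, 1, 1, 1], [1, 1, 0, 0])
```

Separating the job's own demand (the free profile) from the forecast background load is what lets the prediction be computed per user before the weighted values are combined.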
Active Link: Status Detection Mechanism for Distributed Service Based on Active Networks

For network services, the "PUSH" method is clearly more efficient than the "PULL" method in terms of bandwidth consumption. One problem with the "PUSH" method is that it is difficult for the client to keep track of the server's status: the traditional polling method consumes bandwidth and places a heavy burden on the server. Active Link is an active-network-based service that builds a tree structure between clients and the server. Different clients and services can share link information if they have common intermediate nodes in the link path. This mechanism reduces bandwidth consumption and the burden on the server.

Zhan Tao, Zhou Xingshe, Liao Zhigang, Chen Yan
Performance Monitoring for Distributed Service Oriented Grid Architecture

Computational Grids provide an emerging, highly distributed platform for scientific computing. Recently, service oriented architecture (SOA) has become a trend in implementing software systems, including Grid computing systems. SOA gives Grid users more flexibility at the service level. Since performance remains one of the major concerns in Grid environments, Grid service performance issues need to be extensively investigated and studied; however, many issues remain open, and little work has been done on them.

In this paper, we propose a Grid service monitoring architecture for flexible monitoring of various Grid services, and implement it for WSRF (Web Service Resource Framework) services. We show how the service oriented Grid monitor works with a simple example, a WSRF-compliant MathService. Moreover, the relationship between the monitor and the Grid super-scheduler is analyzed; in this way, the scheduler may produce service-performance-oriented policies that ensure optimal quality of service for Grid applications.

Liang Peng, Melvin Koh, Jie Song, Simon See
Distributed Defense Against Distributed Denial-of-Service Attacks

Distributed defense is a promising way to neutralize distributed Denial-of-Service attacks by detecting and responding to attacking sources spread across the Internet. Components of a distributed defense system cooperate with each other to combat the attacks. Compared with centralized defense systems, distributed defense systems can discover attacks earlier from both the source end and the victim end, fight the attacks with more resources, and take advantage of more flexible strategies. This paper investigates seven distributed defense systems that use various strategies, and various architectures, to mitigate DDoS attacks. We evaluate these systems in terms of deployment, detection, response, security, robustness and implementation. For each criterion, we recommend, based on our analysis, which technologies are best suited to a successful distributed defense system. Finally we propose our own design for an effective distributed defense system.

Wei Shi, Yang Xiang, Wanlei Zhou
Security and Safety Assurance Architecture: Model and Implementation (Supporting Multiple Levels of Criticality)

A combined architecture is described to protect a system against malicious attacks as well as unplanned system failures. We discuss its characteristics, structure, and safety assurance technologies. A safety kernel (shell) and an integrity policy for criticality are used to ensure system safety. Furthermore, to implement the rules of the integrity policy, reflective technology based on metaobjects is adopted, and how to apply it to implement these rules is analyzed in detail. Finally, an experiment illustrates the feasibility of the proposed architecture.

Li Zhongwen
Modeling and Analysis of Worm and Killer-Worm Propagation Using the Divide-and-Conquer Strategy

A new approach to fighting Internet worms through the use of a worm-killing worm has been presented. This paper attempts to model the interaction between the two worms using the divide-and-conquer strategy. We extend the idea of the killer-worm and divide it into three basic types: 1) patching type, which only installs patches on susceptible machines; 2) predator type, which only kills the worm (and may also patch infected machines); and 3) composite type, which does both jobs. The state transition diagram of the two worms and a mathematical model for each type are given, and simulation results are obtained using MATLAB.

Dan Wu, Dongyang Long, Changji Wang, Zhanpeng Guan
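The predator-type interaction the abstract classifies can be illustrated with a small SIR-style simulation: the worm spreads through susceptible hosts while the killer-worm spreads by cleaning infected ones. The rate equations and parameter values below are a hedged illustrative stand-in, not the paper's model, and plain Euler integration replaces the MATLAB simulation.

```python
def simulate(beta=0.3, gamma=0.4, s0=0.99, i0=0.01, k0=0.001,
             dt=0.01, t_end=60.0):
    """Euler integration of an illustrative predator-type interaction:
    the worm infects susceptibles at rate beta*S*I, and the killer-worm
    takes over infected hosts at rate gamma*I*K. S, I, K are fractions
    of the host population; their sum is conserved by construction."""
    s, i, k = s0, i0, k0
    peak = i
    for _ in range(int(t_end / dt)):
        ds = -beta * s * i      # susceptibles infected by the worm
        dk = gamma * i * k      # infected hosts captured by the killer-worm
        di = -ds - dk           # worm gains from S, loses to K
        s, i, k = s + ds * dt, i + di * dt, k + dk * dt
        peak = max(peak, i)
    return s, i, k, peak

s, i, k, peak = simulate()
```

Even with a tiny initial killer population, the killer-worm grows exponentially once the worm epidemic peaks, and eventually drives the infected fraction back down.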
An Efficient Reliable Architecture for Application Layer Anycast Service

Anycast is a new service in IPv6, and several issues about it remain open. In this paper, we focus on the efficiency and reliability of application layer anycast. We replace the previous period-based probing routing algorithm for anycast resolvers with a requirement-based probing routing algorithm, and we employ a twin-server model among the anycast servers to provide a reliable service in the Internet environment. Our theoretical analysis shows that the proposed architecture works well, offering better routing performance and fault tolerance.

Shui Yu, Wanlei Zhou
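The two ideas in the abstract, probing on demand rather than on a period, and falling back to a twin server, can be sketched together in a small resolver function. This is an illustrative reading of the abstract, not the paper's algorithm; the staleness rule, data shapes, and all names are assumptions.

```python
import math

def resolve(servers, probe, cache, threshold=3):
    """Requirement-based probing sketch: re-probe a server's latency only
    after `threshold` requests have been served from the cached measurement,
    instead of probing on a fixed period. `servers` maps each primary server
    to its twin, which answers if the primary is unreachable."""
    inf = math.inf
    order = sorted(servers, key=lambda n: cache.get(n, (inf, 0))[0])
    for name in order:
        rtt, uses = cache.get(name, (inf, threshold))
        if uses >= threshold:          # cached measurement considered stale
            rtt = probe(name)
            uses = 0
        if rtt < inf:                  # primary reachable: serve from it
            cache[name] = (rtt, uses + 1)
            return name
        twin = servers[name]           # primary down: fail over to the twin
        if twin is not None and probe(twin) < inf:
            return twin
    return None

# fake probe: s1 is down, its twin s1b and server s2 respond
LAT = {"s1": math.inf, "s1b": 5.0, "s2": 10.0}
chosen = resolve({"s1": "s1b", "s2": None}, LAT.get, cache={})
```

Deferring probes until the cache goes stale saves the steady probing traffic of the period-based scheme, while the twin mapping supplies the fault tolerance the abstract claims.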
A Distributed Approach to Estimate Link-Level Loss Rates

Network tomography aims to obtain link-level characteristics, such as the loss rate and average delay of a link, by end-to-end measurement. A number of methods have been proposed to estimate the loss rate of a link from end-to-end measurements; all of them, in principle, are based on parametric statistics to identify unknown parameters, and all use traditional centralized data processing to complete the estimation, which is time-consuming and unscalable. In this paper, we put forward a distributed method to tackle the scalability problem. Instead of estimating link-level characteristics directly, the proposed method first estimates path-level characteristics, which can be done in parallel by a distributed system; the path-level characteristics are then used to identify the link-level ones. The proposed method is proved to be a maximum likelihood estimator.

Weiping Zhu
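The link-from-path inference that loss tomography relies on can be shown on the smallest multicast tree. The estimator below is the textbook two-receiver formula, not the paper's distributed method, and the link success probabilities are assumed values for the illustration.

```python
import random

def estimate_shared_link(probes, a0=0.9, a1=0.8, a2=0.7, seed=0):
    """Monte-Carlo illustration on the smallest multicast tree:
    root --link0--> branch, branch --link1--> leaf1, branch --link2--> leaf2.
    a0, a1, a2 are the true per-link success probabilities. From end-to-end
    observations only, the classic estimator recovers the shared link as
    A0 = g1 * g2 / g12, where g_i is the fraction of probes reaching
    leaf i and g12 the fraction reaching both."""
    rng = random.Random(seed)
    r1 = r2 = r12 = 0
    for _ in range(probes):
        shared = rng.random() < a0           # probe survives the shared link
        got1 = shared and rng.random() < a1
        got2 = shared and rng.random() < a2
        r1 += got1
        r2 += got2
        r12 += got1 and got2
    g1, g2, g12 = r1 / probes, r2 / probes, r12 / probes
    return g1 * g2 / g12

est = estimate_shared_link(200_000)
```

The per-leaf counts g1, g2, g12 are exactly the path-level quantities that can be gathered in parallel at the receivers; only the final combination needs them together.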
Evaluation of Interconnection Network Performance Under Heavy Non-uniform Loads

Many simulation-based performance studies of interconnection networks are carried out using synthetic workloads under the assumption of independent traffic sources. We show that this assumption, although useful for some traffic patterns, can lead to deceptive performance results for loads beyond saturation: throughput varies so much among network nodes that average throughput no longer reflects the performance of the network as a whole. We propose the use of burst-synchronized traffic sources, which better reflect the coupled behavior of parallel applications at high loads. A performance study of a restrictive injection mechanism illustrates the different results obtained with independent and non-independent traffic sources.

C. Izu, J. Miguel-Alonso, J. A. Gregorio
Analytical Models of Probability Distributions for MPI Point-to-Point Communication Times on Distributed Memory Parallel Computers

Measurement and modelling of the distributions of data communication times are commonly done for telecommunication networks, but have not previously been done for message-passing communications on parallel computers. We used the MPIBench program to measure distributions of point-to-point MPI communication times on two different parallel computers, one with a low-end Ethernet network and one with a high-end Quadrics network. Here we present and discuss the results of fitting the measured distributions with standard probability distribution functions such as the exponential, lognormal, Erlang, gamma, Pearson 5 and Weibull distributions.

D. A. Grove, P. D. Coddington
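One of the candidate fits the abstract names, the lognormal, has a particularly simple estimation procedure: since ln T is normal, its parameters are just the mean and standard deviation of the log-samples. The synthetic "communication times" below are an assumption standing in for MPIBench measurements.

```python
import math
import random

def fit_lognormal(samples):
    """Moment-matching fit in log space: if T ~ LogNormal(mu, sigma),
    then ln T ~ Normal(mu, sigma), so the sample mean and standard
    deviation of ln(t) estimate the parameters directly."""
    logs = [math.log(t) for t in samples]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((x - mu) ** 2 for x in logs) / (n - 1)
    return mu, math.sqrt(var)

rng = random.Random(42)
# synthetic point-to-point times: lognormal around 100 microseconds
times = [rng.lognormvariate(math.log(100e-6), 0.25) for _ in range(50_000)]
mu, sigma = fit_lognormal(times)
```

The same log-moment trick gives starting values for the heavier-tailed candidates (gamma, Weibull), which need iterative fitting.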
Communication Data Multiplexing in Distributed Simulation

Complex and large-scale distributed systems are characterized by numerous interactive data communications among distributed components over a network. This paper proposes a communication data multiplexing approach as an efficient message traffic reduction scheme. The approach encodes the joint outputs of sender components into a single message and applies to distributed systems whose components move and interact in multi-dimensional space. For performance evaluation, the paper uses a projectile/missile case study with realistic multi-dimensional dynamics, investigating how system accuracy and network bandwidth requirements vary as the ratio of active components and the time granule are varied. Analytical and empirical data clearly indicate the advantages of multiplexing in saving communication resources in a large-scale distributed simulation. In addition, the paper discusses the effectiveness and limitations of the multiplexing approach, considering the tradeoff between system accuracy and performance.

Jong Sik Lee
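The bandwidth saving comes from paying the fixed per-message overhead once instead of once per component output. The sketch below makes that arithmetic concrete; the JSON encoding and the 40-byte header figure are illustrative assumptions, not the paper's protocol.

```python
import json

HEADER = 40  # assumed fixed per-message overhead in bytes (headers, framing)

def multiplex(outputs):
    """Encode the joint outputs of several sender components into a single
    message, rather than sending one message per component."""
    return json.dumps(outputs).encode()

def traffic(outputs, multiplexed):
    """Total bytes on the wire for one time granule's worth of outputs."""
    if multiplexed:
        return HEADER + len(multiplex(outputs))
    payload = sum(len(json.dumps(o).encode()) for o in outputs)
    return len(outputs) * HEADER + payload

# ten components each report a 3-D position update in the same time granule
updates = [{"id": n, "pos": [1.0, 2.0, 3.0]} for n in range(10)]
saved = traffic(updates, multiplexed=False) - traffic(updates, multiplexed=True)
```

The saving grows with the number of active components per time granule, which is why the paper studies accuracy and bandwidth as that ratio and the granule size are varied.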
Novel Adaptive Subcarrier Power and Bit Allocation Using Wavelet Packet Parallel Architecture

To meet the requirements of next-generation mobile communication, an adaptive power and bit allocation scheme is presented based on the discrete wavelet packet transform (DWPT). The subcarrier modulation and rate allocation method is used for an OFDM-DS/CDMA system. According to the downlink channel bit error rate (BER) fed back over the uplink channel, an optimal wavelet packet multicarrier modulation allocation is established: for the given BER and QoS, the transmitting power is minimized, and the system achieves high spectral efficiency. The results show that the adaptive wavelet packet algorithm not only converges quickly but also achieves minimal complexity, and that the proposed allocation outperforms a traditional OFDM system. Notably, the system can adaptively adjust power and bit rate to minimize total transmission power at high rate and efficiency.

Ren Ren, Shihua Zhu
A Low–Level Communication Library for Java HPC

Designing a simple but powerful low-level communication library for Java HPC environments is an important task. We introduce a new low-level communication library for Java HPC, called mpjdev. The mpjdev API is designed with the goal that it can be implemented portably on network platforms and efficiently on parallel hardware. Unlike MPI, which is intended for the application developer, mpjdev is meant for library developers; application-level communication may be implemented on top of mpjdev. The mpjdev API itself might be implemented on top of Java sockets in a portable network implementation, or, on HPC platforms, through a JNI (Java Native Interface) to a subset of MPI.

Sang Boem Lim, Bryan Carpenter, Geoffrey Fox, Han-Ku Lee
Object-Oriented Design and Implementations of 3G-324M Protocol Stack

This paper describes an object-oriented design and efficient implementation of the 3G-324M protocol stack for real-time multimedia transmission. In particular, we discuss the implementation of the 324M class hierarchy, which includes classes for the H.245 (control) and H.223 (multiplexing) protocols. Our implementation is efficient and has been tested in realistic 3G infrastructures in Hong Kong, as well as by some industries in China, for optimizing the processing and transmission of real-time video, audio and data.

Weijia Jia, Haohuan Fu, Ji Shen
Efficient Techniques and Hardware Analysis for Mesh-Connected Processors

This paper proposes efficient techniques to reconfigure a multi-processor array embedded in a 6-port switch lattice in the form of a rectangular grid. It is shown that the proposed architecture with 6-port switches eliminates gate delays and notably increases the harvest compared with one using 4-port switches. A new rerouting algorithm combines the latest techniques to maximize harvest without increasing reconfiguration time. Experimental results show that the new reconfiguration algorithm consistently outperforms the most efficient algorithm proposed in the literature.

Wu Jigang, Thambipillai Srikanthan, Schröder Heiko
Backmatter
Metadata
Title
Distributed and Parallel Computing
Edited by
Michael Hobbs
Andrzej M. Goscinski
Wanlei Zhou
Copyright year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32071-5
Print ISBN
978-3-540-29235-7
DOI
https://doi.org/10.1007/11564621