Skip to main content
Top

2005 | Book

Parallel and Distributed Processing and Applications

Third International Symposium, ISPA 2005, Nanjing, China, November 2-5, 2005. Proceedings

Editors: Yi Pan, Daoxu Chen, Minyi Guo, Jiannong Cao, Jack Dongarra

Publisher: Springer Berlin Heidelberg

Book Series : Lecture Notes in Computer Science

insite
SEARCH

About this book

Welcome to the proceedings of ISPA 2005 which was held in the city of Nanjing. Parallel computing has become a mainstream research area in computer science and the ISPA conference has become one of the premier forums for the presentation of new and exciting research on all aspects of parallel computing. We are pleased to present the proceedings for the 3rd International Symposium on Parallel and Distributed Processing and Applications (ISPA 2005), which comprises a collection of excellent technical papers, and keynote speeches. The papers accepted cover a wide range of exciting topics, including architectures, software, networking, and applications. The conference continues to grow and this year a record total of 968 manuscripts (including workshop submissions) were submitted for consideration by the Program Committee or workshops. From the 645 papers submitted to the main conference, the Program Committee selected only 90 long papers and 19 short papers in the program. Eight workshops complemented the outstanding paper sessions.

Table of Contents

Frontmatter

Keynote Speech

Data Structures and Algorithms for Packet Forwarding and Classification

We review the data structures that have been proposed for the forwarding and classification of Internet packets. Data structures for both one-dimensional and multidimensional classification as well as for static and dynamic rule tables are reviewed. Sample structures include multi-bit one- and two-dimensional tries, quad trees, binary trees on binary trees, and list of hash tables.

Sartaj Sahni
Using Speculative Multithreading for General-Purpose Applications

As multi-core technology is currently deployed in computer industry primarily for limiting power consumption and improving system throughput, continued performance improvement of a single application on such systems remains an important and challenging task. Using thread-level parallelism (TLP) to improve instruction-level parallelism (ILP), i.e. to improve the number of instructions executed per clock cycle, has shown to be effective for many general-purpose applications. However, because of the program characteristics of these applications, effective speculative schemes at both thread and instruction levels are crucial. In the past few years, we have seen significant progress being made in the architectures and the compiler techniques to support such thread-level speculative execution model. In this talk, we will discuss these architectural and compiler issues, in particular, the compiler techniques that could support speculative multithreading for general-purpose applications.

Pen-Chung Yew
Towards Peta-Bit Photonic Networks

With a tremendous growth in the Internet traffic, next generation network have been requiring a large increase in transmission capacity, switching-system high-throughput and high-performance optical networking. Wavelength Division Multiplexing (WDM) technology has been increased to the number of wavelengths per fiber hundreds or more with each wavelength operating at the rates of 10Gbps or higher. Thus, the use of all-optical (photonic) networks based on the WDM technology is considered promising to provide peta-bit bandwidth for next generation Internet. To enable the future peta-bit photonic networks, deliberate studies are deserved for some key techniques, such as the ultra-high speed all-optical switching, high performance routing and wavelength assignment (RWA), efficient restoration and protection, etc. This paper provides you with the knowledge about dense WDM networks, high-speed optical switching architectures, high performance routing and wavelength assignment, efficient restoration, as well as prospective vision of future photonic Internet.

Susumu Horiguchi

Tutorial

Technologies and Considerations for Developing Internet and Multiplayer Computer Games: A Tutorial

Games are universal and probably as old as humankind. Today the development of computer technology, especially the development of fast networks and the Internet, brings games a faster growth than ever before. Game design and development is now a fast-growing entertainment field, with a lot to offer professionally and creatively. In fact, from IT professional’s point of view, creating computer games provides us with all the usual technical challenges associated with software development, such as requirement analysis, architectural design, rapid prototyping, HCI, parallel and distributed processing, code reuse, programming, performance evaluation, testing and maintenance. It also provides challenges on other exciting aspects, such as storyboarding, screenplays, illustration, animation, sound effects, music, and social impact. By developing a computer game from start to finish, one would be able to acquire multi-disciplinary knowledge to become an IT professional for the modern era.

Wanlei Zhou
Routing in 2-D Meshes: A Tutorial

As we know, the performance of networks systems is dependent on the end-to-end cost of communication mechanisms. Routing is a process of finding a path from the source node to the destination node in a given network system. The ability to route message efficiently becomes increasingly important. Routing in mesh-connected networks, such as 2-D meshes, has been commonly discussed due to the structural regularity for easy construction and the high potential legibility for variety of algorithms.

Zhen Jiang

Session 1A: Cluster Systems and Applications

RDIM: A Self-adaptive and Balanced Distribution for Replicated Data in Scalable Storage Clusters

As storage systems scale from a few storage nodes to hundreds or thousands, data distribution and load balancing become increasingly important. We present a novel decentralized algorithm, RDIM (Replication Under Dynamic Interval Mapping), which maps replicated objects to a scalable collection of storage nodes. RDIM distributes objects to nodes evenly, redistributing as few objects as possible when new nodes are added or existing nodes are removed to preserve this balanced distribution. It supports weighted allocation and guarantees that replicas of a particular object are not placed on the same node. Its time complexity and storage requirements compare favorably with known methods.

Zhong Liu, Nong Xiao, Xing-Ming Zhou
Modeling and Analysis of a Parallel Nested Loop Join on Cluster Architectures

We develop a concise but comprehensive analytical model for the well-known Nested Loop Join algorithm on cost effective cluster architectures. We concentrate on a limited number of characteristic parameters to keep the analytical model clear and focused. We believe that a meaningful model can be built upon only three characteristic parameter sets, describing main memory size, the I/O bandwidth and the disk bandwidth. We justify our approach by a practical implementation and a comparison of the theoretical and real performance values.

Erich Schikuta
Scheduling Efficiently for Irregular Load Distributions in a Large-scale Cluster

Random stealing is a well-known dynamic scheduling algorithm. However, in a large-scale cluster, an idle node must randomly steal many times to obtain a task from another node, especially, this problem severely affects performance in systems where only a few nodes generate most of the system workload. In this paper, we present an efficient dynamic scheduling algorithm, Transitive Random Stealing (TRS) based on random stealing, which makes any idle node rapidly obtain a task from another node for irregular load distributions in a large-scale cluster. Then by the random baseline technique, we experimentally compare TRS with Shis, one of load balance policies in the EARTH system, and random stealing for different load distributions in the Tsinghua EastSun cluster and show that TRS is a highly efficient scheduling algorithm for irregular load distributions in a large-scale cluster. Finally, TRS is implemented in the Jcluster environment, a high performance Java parallel environment, and an experiment result is given in the HKU Gideon 300 cluster.

Bao-Yin Zhang, Ze-Yao Mo, Guang-Wen Yang, Wei-Min Zheng
A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems

A metadata service is one of the important factors to affect the performance of cluster file systems. We propose a content-based load balancing algorithm that dynamically distributes client requests to appropriate metadata servers based on the types of metadata operations. By replicating metadata and logging update messages in each server rather than moving metadata across servers, we significantly reduce the response time and evenly distribute client requests among metadata servers.

Junho Jang, Saeyoung Han, Sungyong Park, Jihoon Yang
Reducing the Overhead of Intra-Node Communication in Clusters of SMPs

This article presents the

C++

library vShark which reduces the intra-node communication overhead of parallel programs on clusters of SMPs. The library is built on top of message-passing libraries like MPI to provide thread-safe communication but most importantly, to improve the communication between threads within one SMP node. vShark uses a modular but transparent design which makes it independent of specific communication libraries. Thus, different subsystems such as MPI, CORBA, or PVM could also be used for low-level communication. We present an implementation of vShark based on MPI and the POSIX thread library, and show that the efficient intra-node communication of vShark improves the performance of parallel algorithms.

Sascha Hunold, Thomas Rauber

Session 1B: Performance Evaluation and Measurements

On Service-Oriented Network Measurement Architecture with Mobile Agent

Interoperability and adaptability are two major problems that embarrass network measurement practices today on how to finely integrate heterogeneous measurement systems and functionalities. This paper proposes a service-oriented approach based on Web Service for building integrated network measurement architecture that’s scalable and adaptable for change. Measurement functionalities are wrapped in Web Service that can be described in WSDL, discovered by UDDI and accessed through SOAP openly. Mobile Agent, as an autonomous entity, is employed to implement the global control of network measurement, which migrates from site to site calling these Web Services to perform the measurements and returns with the data collected from them. This approach de-couples network measurement control and supporting network measurement functionalities thus introduces flexibilities into the implementation of both sides. The architecture promised by this approach allows not only fast deployment of network measurement functionalities but also simple introduction of measurement control policies.

Zhi Wang, Bo Yu, Chuanshan Gao
Pathtrait: A Tool for Tight Link Location and End-to-End Available Bandwidth Measurement

Estimating the end-to-end available bandwidth along a network path is of great significance in congestion control, streaming applications, QoS verification, server selection. Knowing the exact locations of tight links, network operators can apply traffic engineering, routing policy optimization and fault diagnosis. In this paper we present

Pathtrait

, a tool that allows end users to accurately locate the tight link along a network path and efficiently estimate the end-to-end available bandwidth through the information of tight link location.

Pathtrait

is based on a novel probing technique that generates three different sorts of probing trains. We utilize a original probing structure to capture the input rate and output rate of a single probing train at certain link among the estimated network path, which can infer the tight link and estimate the available bandwidth of the tight link.

Dalu Zhang, Ye Wu, Jian Xu
Performance Evaluation of a Self-organized Hierarchical Topology for Update of Replicas in a Large Distributed System

In this paper we evaluate our own weak consistency algorithm, which is called the ”Fast Consistency Algorithm”, and whose main aim is optimizing the propagation of changes introducing a preference for nodes and zones of the network which have greatest demand. Weak consistency algorithms allow us to propagate changes in a large, arbitrary changing storage network in a self-organizing way. These algorithms generate very little traffic overhead; they have low latency and are scalable, in addition to being fault tolerant. The algorithm has been simulated over ns-2, and measured its performance for complex spatial distributions of demand, including Internet like self-similar fractal distributions of demand. The impulse response of the algorithm has been characterized. We conclude that considering application parameters such as demand in the event or change propagation mechanism to: 1) prioritize probabilistic interactions with neighbors with higher demand, and 2) including little changes on the logical topology (leader interconnection in hierarchical topology ), gives a surprising improvement in the speed of change propagation perceived by most users. In other words, it satisfies the greatest demand in the shortest amount of time.

Jesús Acosta-Elias, B. Pineda Reyes, E. Chavez Leos, Alejandro Ochoa-Cardiel, Mario Recio, Omar Gutierrez-Navarro
A Proposal of Reconfigurable MPI Collective Communication Functions

Message Passing Interface (MPI) Collective Communication Functions (MCCF) are usually implemented in programming libraries utilizing invariable algorithms. Not always do such algorithms yield the best performance with all kinds of applications and over all execution environments. In this paper, we present, simulate, analytically model, verify and analyze reconfigurable MCCF that present variable structures and behaviors, in order to provide optimized configurations, flexibility and performance. In this paper we propose and present a set of optimized reconfigurable MCCF, which add flexibility and high performance to collective communications. We simulate, analytically model, verify and analyze the proposed functions, and compare them with invariable implementations. Our results show that reconfiguration at the algorithm level really yields flexibility and performance gains in MCCF.

Luiz E. S. Ramos, Carlos A. P. S. Martins

Session 1C: Distributed Algorithms and Systems

Design and Evaluation of Network-Bandwidth-Based Parallel Replication Algorithm

Data replication can be used to reduce bandwidth consumption and access latency in the distributed system where users require remote access to large data objects. In this paper, according to the intrinsic characteristic of distributed storage system, the parallel replication algorithm NBPRA (Network-Bandwidth-based Parallel Replication Algorithm) is proposed. In the NBPRA, according to the network state, several replicas of a data object are selected, which are of the least access cost; then the different parts of the data object are transferred from these replicas, and they are used to make a new replica. The results of performance evaluation show that the NBPRA can utilize the network bandwidth efficiently, provide high data replication efficiency and substantially better access efficiency, and the improvement of system performance is related to the number of different data objects accessed by jobs.

Yijie Wang, Yongjin Qin
A Quorum Based Group k-Mutual Exclusion Algorithm for Open Distributed Environments

This paper presents a quorum-based group

k

-mutual exclusion algorithm for open distributed computing systems that can evolve their behavior based on membership changes in the environment. The algorithm consists of two main layers; the quorum-consensus and quorum-reconfiguration. The quorum consensus layer is used to handle requests from and to the application layer, and it directly adopts a proposed

k

-coterie based algorithm of the group

k

-mutual exclusion in the static environments without any change to its protocol. Thus, the message complexity and quorum availability are the same as in the static environments. The quorum reconfiguration reconstructs information structure of the

k

-coterie by simply implementing the properties of two quorum input operations called coterie-join and coterie-cross. The reconfiguration layer is simple to use and has a great ability to complete any operation during reconfiguration powerfully thus system does not enter the halt state.

Armin Lawi, Kentaro Oda, Takaichi Yoshida
An Efficient Implementation of the Backtesting of Trading Strategies

Some trading strategies are becoming more and more complicated and utilize a large amount of data, which makes the backtesting of these strategies very time consuming. This paper presents an efficient implementation of the backtesting of such a trading strategy using a parallel genetic algorithm (PGA) which is fine tuned based on thorough analysis of the trading strategy. The reuse of intermediate results is very important for such backtesting problems. Our implementation can perform the backtesting within a reasonable time range so that the tested trading strategy can be properly deployed in time.

Jiarui Ni, Chengqi Zhang
Reconfigurable Object Consistency Model for Distributed Shared Memory

The consistency models are responsible for managing the state of shared data for the applications of a distributed shared memory (DSM) systems. The already proposed consistency models are inflexible and cannot adapt to the workload and environments characteristics. So, they cannot achieve the best performance for the workloads and environments in all the cases. In this work, we propose, present and analyze a reconfigurable consistency model (ROCoM –Reconfigurable Object Consistency Model) for object based DSMs. ROCoM behavior was represented using a reconfigurable algorithm (RA) and its analysis was made using a simulation tool. Our results show that ROCoM, on average, had 34% (upper bound) better performance than other ones.

Christiane V. Pousa, Luís F. W. Góes, Carlos A. P. S. Martins

Session 1D: Fault Tolerance and Reliability

ER-TCP: An Efficient Fault-Tolerance Scheme for TCP Connections

This paper proposes a novel scheme, called ER-TCP, which transparently masks the failures on the server nodes in a cluster from clients at TCP connection level. Connections at the server side are actively and fully replicated to remain consistency. A log mechanism is designed to cooperate with the replication to achieve small sacrifice on the performance of communication and makes the scheme scale beyond a few nodes, even when they have different processing capacities. The scheme is justified by experiments conducted on prototype implementation.

Zhiyuan Shao, Hai Jin, Bin Cheng, Wenbin Jiang
Online Adaptive Fault-Tolerant Routing in 2D Torus

In this paper, we propose efficient routing algorithms for 2D torus with possible large number of faulty nodes. There is no presumption on the number and the distribution of faulty nodes. The proposed algorithms find a fault-free path between any two nonfaulty nodes with high probability in linear time by using only the local routing information of the network. The results of our empirical analysis through simulations show that the algorithms can find a fault-free path between any two nonfaulty nodes with high probability. For example, in a torus of size up to 128×128, where, the number of faulty nodes up to 15%, the heuristuc-square routing algorithm finds a fault-free path with a probability of 90% or higher. The experimental results are impressive for 2D torus with only four links per node.

Yamin Li, Shietung Peng, Wanming Chu
Replicating Multithreaded Web Services

Replication is a widely used technique for providing high-availability and fault-tolerance of critical services. Multithreaded implementation of services presents a challenge to the replication technique, since managing the execution order of the threads on different replication sites for consistency purpose is not a trivial task. This paper presents a middleware that transparently support reliable web services built on active replication. The middleware is responsible for maintaining the consistency of the replicas’ states. It also handles issues relating to multithreaded implementation of web services.

Xinfeng Ye, Yilin Shen
Design Schemes and Performance Analysis of Dynamic Rerouting Interconnection Networks for Tolerating Faults and Preventing Collisions

In fault-tolerant multistage interconnection design, the method of providing disjoint paths can tolerate faults, but it is complicated and hard to choose a collision-free path in disjoint paths networks. A disjoint paths network can concurrently send more identical packets from the source node to increase the arrival ratio, but the method might increase the collision ratio. In contrast, a dynamic rerouting method finds an alternative path that tolerates faults or prevents collisions. In this paper, we present methods of designing dynamic rerouting networks. This paper presents 1) three kinds of dynamic rerouting networks designed to tolerate faults and prevent collisions; 2) design schemes that enable a dynamic rerouting network to use destination tag routing to save hardware cost in switches for computing rerouting tags; and 3) simulation results of related dynamic rerouting networks to realize the factors which influence the arrival ratio including the fault tolerant capability and the number of rerouting hops. According to our proposed design schemes and according to our analysis and simulation results, a designer can choose an applicable dynamic rerouting network by using cost-efficient considerations.

Ching-Wen Chen, Chang-Jung Ku, Chih-Hung Chang
RRBS: A Fault Tolerance Model for Cluster/Grid Parallel File System

Parallel file systems stripe the data from a single file across multiple cluster/grid nodes so that the systems can access file in parallel. In such a system, if an I/O node or the storage device of that node doesn’t work, all the subfiles on the node can’t be accessed. In this paper, we introduce a special fault tolerance model for parallel file systems called Round-robin Redundant Backup of Subfile (RRBS). This model ensures the accessibility of the parallel files even when an I/O node is failure. In order to test the usability of RRBS, we also developed a prototype of parallel file system called WPFS on a PC/Windows cluster.

Yan-mei Huo, Jiu-bin Ju, Liang Hu

Session 2A: High-Performance Computing and Architecture I

Fast Parallel FFT on CTaiJi: A Coarse-Grained Reconfigurable Computation Platform

Traditional microprocessors are today getting more and more inefficient for a growing range of applications that are mainly about processing data-stream. These applications have two character characteristics: one is that lots of intensive computation tasks need to be processed, another is that the running time of these tasks occupy more than 90% of total time. Coarse grained reconfigurable computation is very fitful for these tasks and can achieve very high performance. This paper presents implementation of the task of fast parallel complex FFT on CTaiJi, the 16bits Reconfigurable computation platform, which is targeting on streamed applications such as multi-media and DSP (digital signal processing). The proposed mapping comprises fast store-address transformation and configuring the function of PEA (processing element array) to fit for FFT. More-over, the performance is scalable according to FFT sizes. Since there is no functionality specifically tailored to FFT, the results demonstrate the capability of CTaiJi architecture to extract parallelism from streamed applications. Further ration- ales are given based on the concepts of scalar operand networks.

LiGuo Song, YuXian Jiang
Integrating Local Job Scheduler – LSF TM with Gfarm TM

Applications that both access and generate large data sets increasingly draw our attention in high energy physics, astronomy, genomics and other disciplines. The Data Grids, like Gfarm, seek to harness geographically distributed resources for such large-scale data-intensive problems. However, scheduling is a challenging task in this context. In this paper, we discuss the integration of LSF with Gfarm. We will discuss how to enable LSF to support Gfarm applications requiring GSI authentication, the design and implementation of data aware scheduling and data management. The system is able to find data-affinity hosts for Gfarm jobs and to adjust the distribution of the data replicas dynamically according to the job load. Before job running, the system will setup the proper credential for it. Using the LSF scheduler plugin mechanism, we do not need to write a new scheduler from scratch or make a lot of changes to an existing scheduler.

Xiaohui Wei, Wilfred W. Li, Osamu Tatebe, Gaochao Xu, Liang Hu, Jiubin Ju
Cache Management for Discrete Processor Architectures

Many schemes had been used to reduce the performance (or speed) gap between processors and main memories; such as the cache memory is one of the most methods. In this paper, we issue the structure of shared cache, which is based on the multiprocessor architectures to reduce the memory latency time that is the one of major performance bottlenecks of modern processors. In this paper, we mix two schemes, sharing cache and multithreading, to implement this proposed multithreaded architecture with shared cache, to reduce the memory latency and, furthermore improve the processor performance. In this proposed multithreaded architecture, the shared cache is achieved in level-1 (L1) data cache. The L1 shared data cache is combination of cache clock in the single space address and a cache controller to solve the required data transmitting, data copies simultaneously, and reduce memory latency time.

Jih-Fu Tu
Enhancing DCache Warn Fetch Policy for SMT Processors

Simultaneous Multithreading (SMT) processors improve performance by allowing running instructions from several threads simultaneously at a single cycle. These threads executing simultaneously share the processor’s resources, but at the same time compete for them. A thread missing in L2 cache may allocate a large number of resources which other threads could be using to make forward progress. And as a result, the overall performance of SMT processors is degraded. To prevent this situation, many instruction fetch policies are proposed. DWarn is among the most efficient fetch policies to handle L2 cache misses. In this paper, we present an enhanced version of the DWarn policy called DWarn+. Results show that our policy significantly improves the original one in throughput and fairness when not more than four threads run. When the number of threads running is higher than 4, our policy enhances the original one mainly for memory bounded workloads, and the average improvement for all types of workloads is very limited.

Minxuan Zhang, Caixia Sun
Aggressive Loop Fusion for Improving Locality and Parallelism

Existing loop fusion algorithms fuse loop nests only when the dependences in the loop nests are not violated. This paper presents a new algorithm that is capable of fusing loop nests in the presence of fusion-preventing anti-dependences. We eliminate all these violated dependences by automatic array copying. In this work, such an aggressive loop fusion strategy is applied to a Jacobi program. The performance of such iterative methods is typically limited by the speed of the memory system. Fusing the two loop nests in the Jacobi program into one reduces data cache misses, and consequently, improves the performance results of both sequential and parallel versions of the Jacobi program, as validated by our experimental results on an HP AlphaServer SC45 supercomputer.

Jingling Xue

Session 2B: Parallel Algorithms and Systems I

An Upper Bound on Blocking Probability of Vertical Stacked Optical Benes Networks

D

irectional coupler (DC)-based optical switching networks can switch signals at the rate of several terabits per second. Benes networks are widely employed for their small depth and self-routing capability. Crosstalk between two optical signals passing through the same DC is an intrinsic drawback in DC-based optical networks. Vertical stacking of multiple copies of an optical Benes network has been intensively studied by researchers to build non-blocking optical networks. The resulting network is called vertically stacked optical Benes network (VSOBN). However, no rigorous analysis has been done to predict the behavior of VSOBN. In this paper, we study the deterministic conditions for strictly non-blocking VSOBN with and without worst case scenarios. We further analyze the blocking probabilities of VSOBN networks under a fixed load and develop their upper bound with respect to the number of planes in the networks. These performance measures can be used to predict the performance of VSOBN.

Jiling Zhong, Yi Pan
It’s Elementary, My Dear Watson: Time-Optimal Sorting Algorithms on a Completely Overlapping Network

Several parallel architectures exist in computer science literature. Motivated by the experimental overlapping connectivity network, we propose a new theoretical network called a completely overlapping network (CON). This network is an extension of the overlapping connectivity network with multiple buses. In this paper we investigate some properties of this network and demonstrate the use of CON and its usefulness by solving two toy problems: decimal number and one-digit binary number sortings.

Sanpawat Kantabutra, Wattana Jindaluang, Prapaporn Techa-angkoon
Lock-Free Parallel Garbage Collection

This paper presents a lock-free parallel algorithm for

garbage collection

in a realistic model using synchronization primitives offered by machine architectures.

Mutators

and

collectors

can simultaneously operate on the data structure. In particular no strict alternation between usage and cleaning up is necessary, contrary to what is common in most other garbage collection algorithms.

We first design and prove an algorithm with a coarse grain of atomicity and subsequently apply the reduction theorem developed in [11] to implement the higher-level atomic steps by means of the low-level primitives.

H. Gao, J. F. Groote, W. H. Hesselink
Adaptive Parallel Ant Colony Optimization

In this paper an adaptive parallel ant colony optimization is developed. We propose two different strategies for information exchange between the processors: selection based on sorting and on distance, which make each processor choose a partner to communicate and update the pheromone according to the partner’s pheromone. In order to increase the ability of search and avoid early convergence, we also propose a method of adjusting the time interval of information exchange adaptively according to the convergence factor of each processor. Experimental results based on traveling salesman problem on the massive parallel processors (MPP) Dawn 2000 demonstrate the proposed APACO are superior to the classical ant colony optimization.

Ling Chen, Chunfang Zhang
Collective Communications for Scalable Programming

HPJava is an environment for scientific and parallel programming using Java. It is based on an extended version of the Java language. One feature that HPJava adds to Java is a multi-dimensional array, or multiarray, with properties similar to the arrays of Fortran. We are using Adlib as our high-level collective communication library. Adlib was originally developed using C++ by the Parallel Compiler Runtime Consortium (PCRC). Many functionalities of this high-level communication library is following its predecessor. However, many design issues are reconsidered and re-implemented according to Java environment. Detailed functionalities and implementation issues of this collective library will be described.

Sang Boem Lim, Bryan Carpenter, Geoffrey Fox, Han-Ku Lee

Session 2C: Network Routing and Communication Algorithms I

A Fast and Scalable Conflict Detection Algorithm for Packet Classifiers

Packet filters are rules for classifying packets based on their header fields. A filter conflict occurs when two or more filters overlap, creating an ambiguity in packet classification. There has been prior works on conflict detection for multi-dimensional classifiers, but their efficiency and scalability are not good. A new algorithm is proposed, which uses hashing-based PATRICIA trie. The new algorithm can fast detect conflicts in classifiers and have high scalability. The technology of processing transport-level ports can bring more security than existed algorithms.

Xin Li, Zhenzhou Ji, Mingzeng Hu
Loss Rate Aware Preferential Treamtment Scheme at the Congested Router

One weakness of the RED algorithm typical of routers is that at any given time, it imposes the same loss probability on all flows, regardless of their QoS. In this paper, we propose an improved packet discard algorithm based on RED for real-time multimedia service, which does its endeavor to avoid dropping packets from the same flow continuously in terms of instantaneous loss rate and short-time average loss rate. It traces the concerned flow’s state whenever every packet’s arrival and then dynamically adjusts the packet drops to achieve a given loss rate. When network is congested, it drops all packets of certain invalid flows mandatorily to avoid a majority of applications being invalidated simultaneously because of packet loss. Our evaluation results indicate that the proposed algorithm provides better QoS for real-time multimedia service, whether network is congested or not.

Dongping Zhao, Deyun Zhang, Jiuxing Cao, Weibin Zheng, Zhiping An
A Heuristic Routing Algorithm for Degree-Constrained Minimum Overall Latency Application Layer Multicast

Application Layer Multicast (ALM) shifts multicast functionality from routers to end hosts and has the potential to address most problems associated with IP multicast. It has attracted wide attention in research community in recent years. However, as an end host based solution, the applicability of ALM to realtime applications such as streaming services is constrained by node bandwidth and transmission latency. How to guarantee QoS is still a challengeable problem. In this paper, we think overall latency is a more effective metric for evaluating the QoS perceived by most users and explore the optimization problem of Degree-Constrained Minimum Overall Latency Spanning Tree (DCMOLST). We divide the optimization process into initialization phase and dynamic adjustment phase. In the former stage, we propose a heuristic algorithm through giving a more consideration to both transmission delay and node bandwidth, so as to avoid QoS degradation caused by single metrics. In the later, we present a set of distributed iterative optimizing operations for further optimization. Experimental results show that our proposal can improve overall performance efficiently and is able to cope with network dynamics.

Baoliu Ye, Minyi Guo, Daoxu Chen, Sanglu Lu
DIRA: Distributed Insertion and Relocation Routing Algorithm for Overlay Multicast in Diffserv Domain

Traditional multicast routing algorithms require routers to be specially designed such that they can forward the same copy of packet onto different output links simultaneously. This idea conflicts severely with the core-stateless principle in most QoS network architectures, such as Diffserv. Meanwhile, overlay multicast emerges as an effective alternative to the traditional approaches. In this paper, we combine the overlay multicast technique with the Diffserv architecture seamlessly and design an overlay multicast routing algorithm, which generates a multicast tree at a pretty low cost incrementally. Owing to its distributed nature, the computation of the tree can be executed efficiently. Through a large amount of simulations, we have shown that our algorithm is quite competitive in terms of network resource utilization and other metrics.

Xiao Chen, Huagang Shao, Weinong Wang
Load Balancing Based on Similarity Multi-paths Routing

To load balance in Internet, we need more valid routing paths to share load in the case of no long-term routing loops to be introduced. It is acknowledged to adopt near or relaxed best routing to extend the number of available paths in multi-path routing. However, it is difficult to determine the degree of approximation or relaxed. A new distributed algorithm (which is called similarity multi-paths routing, SMR) for the dynamic computation of multiple paths from source to destination in a computer network is presented in this paper. SMR uses similarity principle to computes similarity coefficient between the shortest path and other paths, and then makes use of similarity coefficient to estimate the degree of approximation. Simulations show us it is robust for SMR to select near or relaxed best paths. Based on SMR, we also propose a traffic balancing algorithm. Its average performance is analyzed by simulation and compared against Equal Cost Multi-path (ECMP).

Wuping Xu, Puliu Yan, Delin Xia, Ming Wu

Session 2D: Security Algorithms and Systems I

Secure Real-Time Transaction Processing with Timeliness Guarantees in Mobile Distributed Real-Time Database Systems

Mobile distributed real-time databases are needed in security-critical applications, e.g., e-commerce, stock trading system, and military applications. In these applications, mobile distributed real-time database systems have to simultaneously satisfy two requirements in guaranteeing data security and minimizing the deadline miss ratio for admitted transactions. Multilevel secure database system based on mandatory access control can prevent direct unlawful information flows between transactions belonging to different clearance levels. However, it cannot prevent the covert communications between transactions belonging to different clearance levels. This paper presents a secure hybrid optimistic real-time concurrency control protocol (SHORTCC). The protocol not only considers carefully the inherent characteristics of mobile environment and the timing constraints of time-critical applications, but also achieves data security without sacrificing real-time performance significantly.

Yingyuan Xiao, Yunsheng Liu, Guoqiong Liao, Xiaofeng Liu
A New Data Fusion Model of Intrusion Detection-IDSFP

Based on the multi-sensor data fusion technology, a new Intrusion Detection Data Fusion Model-IDSFP is presented. This model is characterized by correlating and merging alerts of different types of IDSs, generating the measures of the security situation, and thus constituting the evidence. Current security situation of network is estimated by applying the D-S Evidence Theory, and some IDSs in the network are dynamically adjusted to strengthen the detection of the data that relate to the attack attempt. Consequently, the false positive rate and the false negative rate are effectively reduced, and the detection efficiency of IDS is accordingly improved.

Junfeng Tian, Weidong Zhao, Ruizhong Du, Zhe Zhang
The Application of Collaborative Filtering for Trust Management in P2P Communities

The open and anonymous nature of P2P services opens the door to malicious peers who cause the loss of trust by providing corrupted data or harmful services. The introduction of a trust management system is one of the possible ways to combat this problem. This paper presents some new ideas for the design of a P2P trust management system. Its main contributions include: a recommendation-aggregating model based on collaborative filtering (CF), a polling protocol for trust queries and responses, and the use of identity-based cryptosystem to secure recommendations. Simulations show that our CF-based trust model performs pretty well even when malicious peers make the majority.

Min Zuo, Kai Wang, Jianhua Li
Intelligent DDoS Packet Filtering in High-Speed Networks

Currently high-speed networks have been attacked by successive waves of Distributed Denial of Service (DDoS) attacks. There are two major challenges on DDoS defense in the high-speed networks. One is to sensitively and accurately detect attack traffic, and the other is to filter out the attack traffic quickly, which mainly depends on high-speed packet classification. Unfortunately most current defense approaches can not efficiently detect and quickly filter out attack traffic. Our approach is to find the network anomalies by using neural network, deploy the system at distributed routers, identify the attack packets, and then filter them quickly by a Bloom filter-based classifier. The evaluation results show that this approach can be used to defend against both intensive and subtle DDoS attacks, and can catch DDoS attacks’ characteristic of starting from multiple sources to a single victim. The simple complexity, high classification speed and low storage requirements make it especially suitable for DDoS defense in high-speed networks.

Yang Xiang, Wanlei Zhou

Session 3A: High-Performance Computing and Architecture II

2L-MuRR: A Compact Register Renaming Scheme for SMT Processors

In simultaneous multithreaded (SMT) processors, a larger multi-ported rename register file is indispensable for holding more intermediate results of in-flight instructions. However, larger rename register file incurs longer access delay and more power consumption, which are becoming a bottleneck in future SMT processors. To tackle these problems, we propose

2L-MuRR

, the abbreviation of

Multi-usable Rename Register with 2-Level renaming and allocating

, which focuses on more efficient utilization of a fewer number of rename registers. Based on the fact that the effective bit-width of most operands is narrower than the full-bit width of a register entry, 2L-MuRR partitions each rename register into several fields of different widths. Either single field or field combination can hold an operand, thus making each rename register

multi-usable

. The simulations show that 2L-MuRR improves the efficiency of the rename register file significantly, achieving higher performance with much fewer rename registers.

Hua Yang, Gang Cui, Xiao-zong Yang
Scheduling Convex Bipartite Communications Toward Efficient GEN_BLOCK Transformations

Irregular data redistribution is used to enhance data locality and algorithm performance on heterogeneous processor systems. In this paper, we present an efficient scheduling algorithm based on convex bipartite communications for irregular

GEN_BLOCK

transformations. The proposed technique consists of two phases:

degree reduction phase

, schedules communications involved in processors with degree greater than two; and

coloring phase

, schedules remaining communications of all processors with degree-2 and degree-1. To evaluate the performance of our algorithm, we have implemented the proposed technique along with three scheduling methods. The simulation results show improvement of total communication costs by the proposed algorithm.

Ching-Hsien Hsu, Shih-Chang Chen, Chao-Yang Lan, Chao-Tung Yang, Kuan-Ching Li
A Chronological History-Based Execution Time Estimation Model for Embarrassingly Parallel Applications on Grids

In order to identify and schedule jobs that are suitable for determined resources, an execution time estimation model is required. In this paper, it is described a Chronological history-based execution time estimation model to predict current execution time, according to the previous execution results. We built a heterogeneous computational Grid environment using Globus Toolkit, and our research is focused in Grid computing environments and to execute parallel jobs on multiple resources by measuring its accuracy. The experimental results shown that our model can accurately predict the execution time of embarrassingly parallel applications.

Chao-Tung Yang, Po-Chi Shih, Cheng-Fang Lin, Ching-Hsien Hsu, Kuan-Ching Li
Developing High-Performance Parallel Applications Using EPAS

In spite of the advent of high performance parallel computers and commodity clusters, complexity of parallel application development remains one of the major obstacles towards the mainstream adoption of parallel computing. Researchers are constantly investigating different approaches to reduce parallel application development time and increase productivity. As re-usable components, patterns have gained popularity in the sequential programming domain. Subsequently, several pattern-based parallel programming environments (PPEs) have been proposed to facilitate parallel application development procedure. Unfortunately, most of these PPEs lack the required flexibility in order to develop real-life parallel applications. In this paper, we describe the features of the EPAS (Extended Parallel Architectural Skeleton) PPE that enables development of complex parallel applications. We investigate and design the required patterns, and then use them to develop a parallel data cube computing application. Finally, we present the performance of the developed applications and discuss the results.

Mohammad Mursalin Akon, Ajit Singh, Xuemin Shen, Dhrubajyoti Goswami, Hon Fung Li
On Utilization of the Grid Computing Technology for Video Conversion and 3D Rendering

In this paper, we investigate the recent popular computing technique called Grid Computing, and use video conversion and 3D rendering applications to demonstrate this technology’s effectiveness and high performance. We also report on developing a resource broker called Phantom that runs on our grid computing testbed and whose main function is querying nodes in grid computing environments and showing their system information to aid in selecting the best nodes for job assignments to have the jobs executed in the least amount of time.

Chao-Tung Yang, Chuan-Lin Lai, Kuan-Ching Li, Ching-Hsien Hsu, William C. Chu

Session 3B: Parallel Algorithms and Systems II

Communication-Free Data Alignment for Arrays with Exponential References Using Elementary Linear Algebra

For array references with induction variables, after induction variable substitution for those induction variables is performed, those array references substituted are transformed as

nonlinear

expressions. The goal of data alignment is to intelligently map computations and data onto a set of virtual processors organized as a Cartesian grid with multi-dimensions (or a

template

in HPF term), and to provide data locality in a program so that the data access communication costs can be minimized. Most data alignment methods are mainly devised to align the arrays referenced using linear subscripts or quadratic subscripts with

n

loop index variables [Chang, 2004]. In this paper, we propose a new communication-free data alignment technique to align the arrays referenced using

exponential

subscripts with

n

loop index variables or other complex nonlinear expressions. The experimental results from our techniques on SPEC95FP Benchmarks point out that the techniques can be applied to improve the execution time of the subroutines in those benchmarks.

Weng-Long Chang, Minyi Guo, Michael Ho, Sien-Tang Tsai
Parallel Unstructured Quadrilateral Mesh Generation

In this paper we present our efforts to parallelize an unstructured quadrilateral mesh generator. Its serial version is based on the divider-and-conquer idea, and mainly includes two stages, i.e. geometry decomposition and mesh generation. Both stages are parallelized separately. A couple of parallel models are introduced and compared to parallelize the stage of geometry decomposition. A fine-grain level parallel algorithm proves preferable to that based on the task-dependency tree, with which the load imbalance brought by the improper utilization of the symmetry of the vertex pair matrix is removed nicely. Since the number of elements in sub-domains could be pre-computed before meshing, a simple static load balancing scheme is investigated, and the effect of granularity is also discussed briefly. Finally, experiments are designed to evaluate the performance of the parallel mesh generator in detail.

Jianjun Chen, Yao Zheng
Container Problem in Burnt Pancake Graphs

In this paper, we propose an algorithm that solves the container problem in n-burnt pancake graphs in polynomial order time of n. Its correctness is proved and estimates of time complexity and sum of paths lengths are given. We also report the results of computer experiment conducted to measure the average performance of our algorithm. burnt pancake graphs, container problem, internally disjoint paths, polynomial time algorithm.

N. Sawada, Y. Suzuki, K. Kaneko
A Cost Optimal Parallel Quicksorting and Its Implementation on a Shared Memory Parallel Computer

This paper discusses a parallel quicksort algorithm that is cost optimal, in average, using O(n/log(n)) processors. The cost optimality is mainly due to a cost optimal partitioning algorithm that utilizes all the processors when partitioning the array. A temporary array of the same size as the original array is needed during the partitioning process. The prefix sums are used to determine where a processor can copy its data.

We will prove that the algorithm has an average case complexity O(log

2

n), where n is the size of the data array. We will also discuss the implementation of our algorithm on a shared memory parallel computer and demonstrate that it outperforms other O(log

2

n) parallel sorting algorithms. In addition, it outperforms the sequential quicksort algorithm starting with two processors.

Jie Liu, Clinton Knowles, Adam Brian Davis

Session 3C: Network Routing and Communication Algorithms II

Near Optimal Routing in a Small-World Network with Augmented Local Awareness

In order to investigate the routing aspects of small-world networks, Kleinberg [13] proposes a network model based on a

d

-dimensional lattice with long-range links chosen at random according to the

d

-harmonic distribution. Kleinberg shows that the greedy routing algorithm by using only local information performs in

O

(log

2

n

) expected number of hops, where

n

denotes the number of nodes in the network. Martel and Nguyen [17] have found that the expected diameter of Kleinberg’s small-world networks is Θ(log

n

). Thus a question arises naturally: Can we improve the routing algorithms to match the diameter of the networks while keeping the amount of information stored on each node as small as possible?

Jianyang Zeng, Wen-Jing Hsu, Jiangdian Wang
Systolic Routing in an Optical Fat Tree

In this paper we present an all-optical network architecture and a systolic routing protocol for it. An

r

-dimensional optical fat tree network (

$\mathcal{OFT}$

) consists of 2

r

–1 routing nodes and

n

= 2

r

processing nodes deployed at the leaf nodes of the network. In our construction packets injected into the

$\mathcal{OFT}$

carry no routing information. Routing is based on the use of a cyclic control bit sequence and scheduling. The systolic routing protocol ensures that no electro-optical conversion is needed in the intermediate routing nodes and all the packets injected into the routing machinery will reach their target without collisions. A work-optimal routing of an

h

-relation is achieved with a reasonable size of

h

.

Risto T. Honkanen
Fast Total-Exchange Algorithm

Optical communication offers huge bandwidth and makes it possible to build communication networks of very high bandwidth and connectivity. We study routing of the

h

-relations in optical communication pararallel computer under so called

OCPC

or

1-collision

assumption. In an

h

-relation each processor is the origin and the destination of at most

h

-messages.

In this paper we study the case where

h

is much larger than the number of the processors. Our algorithm uses total-exchange primitive to route packets. Our algorithm routes random

h

-relations in a

p

-processor network using

$\frac{h}{p}(1+o(1))+O(\sqrt{\frac{h}{p}log p})$

total-exchange rounds with high probability. The algorithm attempts to balance the number of packets between origin-destination pairs. The experiments show that when

h

is large compared to the number of processors, the algorithm achieves simulation cost which is very close to 1. I.e. the

h

-relation is routed in the

ch

, where

c

is only little more than 1.

Anssi Kautonen
MFLWQ: A Fair and Adaptive Queue Management Algorithm for Scavenger Service

This paper presents a MFLWQ Algorithm for Scavenger Service, which is a non-elevated QoS technique proposed by the Internet2 project. The algorithm balances the bandwidth distribution among Scavenger Service flows by a modified Flow Random Early Detection (FRED) algorithm; estimates active Scavenger Service flow number by the variable

nactive

in FRED and accordingly tunes the bandwidth allocation between SS flows and BE flows in a logarithmical way to reach better performance and robustness. Simulations show that the algorithm provides more appropriate bottom bandwidth guarantee for Scavenger Service flows when protecting Best-Effort flows well and extends the applicability of Scavenger service to no-adaptive traffic like UDP.

Xiaofeng Chen, Lingdi Ping, Zheng Wan, Jian Chen

Session 3D: Security Algorithms and Systems II

CBTM: A Trust Model with Uncertainty Quantification and Reasoning for Pervasive Computing

This paper presents a novel trust model in which we model trust based on an exotic uncertainty theory, namely cloud model. We regard trust between entities as a cloud that is called as trust cloud. Based on such a quantification model of trust, we further propose the algorithms to compute propagated trust relationships and aggregated trust relationships, which are needed for trust reasoning in pervasive computing environments. Finally, we compare the proposed trust model with other three typical models in simulation experiments, and the results shows the cloud-based trust model performs better in a total sense.

Rui He, Jianwei Niu, Guangwei Zhang
An Authentication Protocol for Pervasive Computing

Authentication protocols are essential for security in many systems. However, authentication protocols are error-prone and difficult to design. In pervasive computing, the inherent characteristics such as mobility and restricted resources make it even harder to design suitable authentication protocols. In this paper we propose an authentication protocol to solve an open problem in pervasive computing, that is secure use of public information utilities without accessing a

trusted third party

(TTP). Our solution not only provides authentication, but also establishes a secure communication channel between the user and the service provider without the participation of TTP. The authentication protocol can be built with any secure symmetric and asymmetric cryptographic algorithm. We show the protocol can resist passive and active attacks. We also discuss how the protocol can be extended to an applicable scheme with payment support.

Shiqun Li, Jianying Zhou, Xiangxue Li, Kefei Chen
A Hybrid Neural Network Approach to the Classification of Novel Attacks for Intrusion Detection

Intrusion Detection is an essential and critical component of network security systems. The key ideas are to discover useful patterns or features that describe user behavior on a system, and use the set of relevant features to build classifiers that can recognize anomalies and known intrusions, hopefully in real time. In this paper, a hybrid neural network technique is proposed, which consists of the self-organizing map (SOM) and the radial basis function (RBF) network, aiming at optimizing the performance of the recognition and classification of novel attacks for intrusion detection. The optimal network architecture of the RBF network is determined automatically by the improved SOM algorithm. The intrusion feature vectors are extracted from a benchmark dataset (the KDD-99) designed by DARPA. The experimental results demonstrate that the proposed approach performance especially in terms of both efficient and accuracy.

Wei Pan, Weihua Li
Efficient and Beneficial Defense Against DDoS Direct Attack and Reflector Attack

Distributed Denial-of-Service (DDoS) attacks misuse network resource and bring serious threats to the internet. Detecting DDoS at the source-end has many advantages over defense at the victim-end and intermediate-network. However, one of the main problems for source-end methods is the performance degradation brought by these methods and no direct benefit for Internet Service Provider(ISP), which discourages ISPs to deploy the defense system. We propose an efficient detection approach, which only requires limited fixed-length memory and low computation overhead but provides satisfying detection results. Our method is also beneficial because the method can not only detect direct DDoS attack for other ISPs, but also protect the ISP itself from reflector DDoS attack. The efficient and beneficial defense is practical and expected to attract more ISPs to join the cooperation. The experiments results show our approach is efficient and feasible for defense at the source-end.

Yanxiang He, Wei Chen, Wenling Peng, Min Yang

Session 4A: Grid Applications and Systems

Study on Equipment Interoperation Chain Model in Grid Environment

Scientific collaboration has emerged as an important tool for getting forefront research results by interoperating geographically distributed scientific equipment to solve complicated scientific problems. Grid technologies offer effective strategies to achieve this ambitious goal. However, more capabilities are required in the context of Equipment Grid due to the complexity of collaboration between equipment resources. In this paper an interesting equipment interoperation chain model is proposed. Equipment resources are organized into equipment pools, based on which the interoperation chain of equipment is built. The performance is discussed by means of several theoretical tools like Petri Net and

π

-Calculus from the different viewpoint of users. The structure of interoperation chain is proven highly efficient and feasible. Finally we analyze the prospective direction and challenges in this field.

Yuexuan Wang, Cheng Wu
Grid Accounting Information Service with End-to-End User Identity

Grid computing virtualizes heterogeneous geographically disperse resources. Because of the characteristics of the grid environment, the concept of ‘user’ is different from that of traditional local computing environment. That mean, new resolution that providing the end-to-end user identity from grid user to local account is needed. In this paper, we design and implement an accounting information gathering and service (AIService) system. To resolve this problem, we designed a grid access control system, called PGAM. Usage Record of UR-WG in GGF is used as a common usage record. And we designed and implemented a grid service which provides the gathered accounting information.

Beob Kyun Kim, Haeng Jin Jang, Tingting Li, Dong Un An, Seung Jong Chung
Performance Analysis and Prediction on VEGA Grid

With the dramatic development of grid technologies, performance analysis and prediction of grid systems is increasingly significant to develop a variety of new grid technologies. The VEGA grid, a new grid infrastructure developed by Institute of Computing Technology, CAS, views a grid as a distributed computer system. In this paper, we propose some new metrics to evaluate the performance of it. Moreover, we apply queueing system models to model the VEGA grid and predict the performance of it in terms of the mean queue length and mean service time, especially, in the equilibrium state. Hence, a real application, the Air booking service, is deployed on the VEGA grid as a benchmark to measure the performance via latency and throughput. Finally, we point out this method can be used on other homogeneous grid systems.

Haijun Yang, Zhiwei Xu, Yuzhong Sun, Zheng Shen, Changshu Liu
An Accounting Services Model for ShanghaiGrid

The ShanghaiGrid, as a Grid Computing Environment, is an information Grid to serve the public in the city, and all resources are regarded as Grid Services in the Open Grid Services Architecture (OGSA). The primary goal of the ShanghaiGrid is to build a generally shared information grid platform. Charging and accounting are an important part of the grid computing system in the ShanghaiGrid. This paper discusses an accounting services model and accounting life cycle that will be used in the ShanghaiGrid. We will analyze the charging and accounting process in detail based on this model and cycle.

Jiadi Yu, Qi Qian, Minglu Li
An Agent-Based Grid Computing Infrastructure

The conventional computing grid has developed a service oriented computing architecture with a super-local, two commit scheduling strategy. This architecture is limited in modeling open systems with highly dynamic and autonomous computing resources due to its server-based computing model. The use of super-local scheduling strategy also limits the utilization of the computing resources. In this paper, we propose a multi-agent based grid computing infrastructure to tackle the above issues, while provide reasonable compatibility and interoperability with the conventional grid systems and clients. Compared with the existing grids, the new infrastructure is leveraged by the intelligent agents, and therefore is more efficient and flexible for open systems.

Jia Tang, Minjie Zhang

Session 4B: Database Applications and Data Mining

Dynamically Mining Frequent Patterns over Online Data Streams

Data streams are massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, it is challenge to find frequent items over data streams in a dynamic environment. In this paper, a new novel algorithm was proposed, which can capture frequent items with any length online continuously. Furthermore, several optimization techniques are devised to minimize processing time as well as main memory usage. Compared with related algorithm, it is more suitable for the mining of long frequent items. Finally, the proposed method is analyzed by a series of experiments and the results show that this algorithm owns significantly better performance than before.

Xuejun Liu, Hongbing Xu, Yisheng Dong, Yongli Wang, Jiangbo Qian
Clustering Mixed Type Attributes in Large Dataset

Clustering is a widely used technique in data mining, now there exists many clustering algorithms, but most existing clustering algorithms either are limited to handle the single attribute or can handle both data types but are not efficient when clustering large data sets. Few algorithms can do both well. In this paper, we propose a clustering algorithm CFIKP that can handle large datasets with mixed type of attributes. We first use

CF

*

-tree to pre-cluster datasets. After the dense regions are stored in leaf nodes, then we look every dense region as a single point and use an improved k-prototype to cluster such dense regions. Experiments show that the CFIKP algorithm is very efficient in clustering large datasets with mixed type of attributes.

Jian Yin, Zhifang Tan
Mining Association Rules from Multi-stream Time Series Data on Multiprocessor Systems

Mining association rules from multi-stream data has received a lot of attention to the data mining community. It is quite effective and useful to discover such rules. However, it is a very time consuming and expensive task to mine the rules from these kinds of time ordered real valued continuous data sets with high dimensionality when they are enormous in size. This strongly motivates the need of efficient parallel processing techniques and algorithms. In this paper, we use parallel processing to discover dependency from the large amount of time series multi-stream data. We apply two parallel programming techniques (OpenMP and MPI) to implement this. The experimental results conducted in multiprocessor systems show the effectiveness of MPI over OpenMp.

Biplab Kumer Sarker, Toshiya Hirata, Kuniaki Uehara, Virendra C. Bhavsar
Mining Frequent Closed Itemsets Without Candidate Generation

Mining frequent closed itemsets provides complete and non-redundant result for the analysis of frequent pattern. Most of the previous studies adopted the FP-tree based conditional FP-tree generation and candidate itemsets generation-and-test approaches. However, those techniques are still costly, especially when there exists prolific and/or long itemsets. This paper redesigns FP-tree structure and proposes a novel algorithm based on it. This algorithm not only avoids building conditional FP-tree but also can get frequent closed itemsets directly without candidate itemsets generation. The experimental results show the advantage and improvement of these strategies.

Kai Chen
Distribution Design in Distributed Databases Using Clustering to Solve Large Instances

In this paper we approach the solution of large instances of the distribution design problem. The traditional approaches do not consider that the size of the instances can significantly reduce the efficiency of the solution process, which only involves a model of the problem and a solution algorithm. We propose a new approach that incorporates multiple models and algorithms and mechanisms for instance compression, for increasing the scalability of the solution process. In order to validate the approach we tested it on a new model of the replicated version of the distribution design problem which incorporates generalized database objects, and a method for instance compression that uses clustering techniques. The experimental results, utilizing typical Internet usage loads, show that our approach permits to reduce at least 65% the computational resources needed for solving large instances, without significantly reducing the quality of its solution.

Joaquin Perez Ortega, Rodolfo A. Pazos Rangel, Jose A. Martinez Florez, J. Javier Gonzalez Barbosa, E. Alejandor Macias Diaz, J. David Teran Villanueva

Session 4C: Distributed Processing and Architecture

Modeling Real-Time Wormhole Networks by Queuing Theory

This paper discuses a new approach that models the single node of general real-time wormhole networks, and analyzes this new model with queuing theory. Since the wormhole network’s node is too complicated to analyze generally, a multi-vision solution is applied to analyzing the wormhole network’s node. The wormhole network single-node model is decomposed into three sub-models, and each of them can be analyzed with queuing theory separately. Lastly, a simulation model is made with this solution, and several simulation results are presented to illustrate the performance of real-time wormhole scheduling in single-node.

Lichen Zhang, Yuliang Zhang
A Discrete Event System Model for Simulating Mobile Agent

Simulation has been proved to be a practical approach for performance evaluation of mobile agents. However, the lack of a standard for the execution of mobile agents makes the semantics ambiguous. Thus, the simulation of mobile agents is not feasible or reasonable without an explicitly defined execution model of agents. In this paper, we propose an execution model of mobile agents called SMA. Based on the SMA model, the discrete event models describing the SMA agents and hosts, called SMA-DEVS, are presented using the modelling approach of DEVS and DSDE. We implement a simulation environment based on SMA-DEVS and test the environment with certain mobile agent-based algorithms.

Xuhui Li, Jiannong Cao, Yanxiang He, Jingyang Zhou
A Holistic Approach to Survivable Distributed Information System for Critical Applications

In this paper, a holistic approach to realize survivability of distributed information network systems for critical applications(DISCA) based on three basic states, processed, stored, and transmitted, of information (called a PST-based system model), is proposed and its evaluation method and some experiment results are given as an example of its application. A PST-based system model brings all three parts together and coordinates them through the services supported by them, in which whole system’s survivability is embodied by system services and their interdependency relations. With this model, a multi-layer survivability framework based on the information states is formed and the complexity of a DISCA system in implementation and evaluation can be conquered in the most prevalent approach—“divide and conquer” approach.

H. Q. Wang, D. X. Liu, D. Xu, Y. Y. Lan, X. Y. Li, Q. Zhao
A Personalized and Scalable Service Broker for the Global Computing Environment

Service broker is a software component that maps user’s request to proper services. By using the broker, users can access various services without any knowledge of the services. However, design of the service broker for GCE (Global Computing Environment) is quite difficult because a broker cannot cope with all the information of services which are spread over the world. Thus, we need to build new architectural foundations for global service brokering. In this paper, we propose a scalable broker system for GCE based on the federation of personalized brokers. In our broker system, every user can have a service broker. The broker maintains the information of services related to the user and exchanges the information with neighboring brokers if necessary. As a result, the broker can provide proper services to users faster and a burden of brokers can also be decreased. Experimental results show that the proposed service broker can reduce the average service discovery time and the average operation overhead of each broker comparing to other brokering architectures.

Kyung-Lang Park, Chang-Soon Kim, Oh-Young Kwon, Hyoung-Woo Park, Shin-Dug Kim
Distributed Network Computing on Transient Stability Analysis and Control

In this paper, we introduce graph theory into transient stability analysis in power system. In the weighted graph, Vertex weight represents node’s parallel computing workload and edge weight represents serial computing workload on the border of regions, which reflects the degree of parallelism of computing and improves speed-up ration of system. In order to reduce communication time wastage induced by CSMA protocol in TCP/IP based LAN, asynchronous message passing is used in our method. Simulation results show that it achieves better performance.

Chenrong Huang, Mingxue Chen

Session 4D: Sensor Networks and Protocols

A Distributed Power-Efficient Data Gathering and Aggregation Protocol for Wireless Sensor Networks

In this paper, we propose a distributed power-efficient data gathering and aggregation algorithm (DPEG), in which a node, according to its residual energy and the strength of signal received from its neighboring nodes, independently makes its decision to compete for becoming a cluster head. In addition, assume that the inter-cluster communication data is, in a multi-hop manner, sent to the designated node, which then sends the data gathered by the whole network to the base station. DPEG also proposes a simple approach to solve the cluster coverage problem. With the increase in node density, this approach lets sensor network lifetime be linear in the number of nodes. Our experimental results have proved that DPEG algorithm, in the best case, lets sensor network lifetime be respectively increased by 1800% and 300% as compared with another two data gathering and aggregation protocols— LEACH and PEGASIS.

Ming Liu, Jiannong Cao, Hai-gang Gong, Li-jun Chen, Xie Li
A Key Management Scheme for Cross-Layering Designs in Wireless Sensor Networks

The current wireless sensor network designs are largely based on a layered approach. The suboptimality and inflexibility of this paradigm result in poor performance, due to constraints of power, communication, and computational capabilities. Key management plays an important role in wireless sensor networks, because it not only takes charge of securing link-layer communications between nodes, but also has great effects on other protocol layers, e.g. routing and IDS (Intrusion Detection System). However, no existing key management protocols have attached enough importance to cross-layering designs. In this paper, we propose a cross-layering key management scheme, which can provide other protocol layers with a nice trust-level metric. The trust-level metric is generated during the pairwise key establishment phase, and it varies as system conditions change. This metric describes the security level between two neighboring nodes and helps other protocol layers to make decisions. We also present simulations and analysis to show the superior characteristics of our scheme against both passive attacks and active attacks.

Bo Yu, Haiguang Chen, Min Yang, Dilin Mao, Chuanshan Gao
A Clustering Mechanism with Various Cluster Sizes for the Sensor Network

One of the most important issues on the sensor network with resource-limited sensor nodes is prolonging the network lifetime by effectively utilizing the limited node energy. The most representative mechanism to achieve a long-lived sensor network is the clustering mechanism which can be further classified into the single-hop mode and the multi-hop mode. In the single-hop mode, all sensor nodes in a cluster communicate directly with the cluster head (CH) via single hop, so the contention-less MAC protocol is preferred. In the multi-hop mode, sensor nodes communicate with the CH with the help of other intermediate nodes and the contention-less MAC protocol is not required. One of the most critical factors that impact on the performance of the existing multi-hop clustering mechanism (in which the cluster size is fixed to some value, so we call this the fixed-size mechanism) is the cluster size and, without the assumption on the uniform node distribution, finding out the best cluster size is intractable. Since sensor nodes in a real sensor network are distributed non-uniformly, the fixed-size mechanism may not work best for real sensor networks. Therefore, in this paper, we propose a new dynamic-size multi-hop clustering mechanism in which the cluster size is adjusted according to the information on the load and the residual energy of a CH and that of other nodes near to the CH. We show that our proposed scheme outperforms other clustering mechanisms by carrying out simulations.

Yujin Lim, Sanghyun Ahn
Percentage Coverage Configuration in Wireless Sensor Networks

Recent researches on energy efficient coverage configuration in wireless sensor networks mainly address the goal of 100% or near 100% coverage preserving. However, we find that a small percentage of loss of coverage, which is acceptable in many applications, can result in dramatic increase in energy savings. Therefore, in this paper percentage coverage rather than complete coverage is selected as the design goal, and a location-based Percentage Coverage Configuration Protocol (PCCP) is developed to assure that the proportion of the sensing area after configuration to the original sensing area is no less than a desired percentage. Numerical testing results show that PCCP can not only guarantee the desired coverage percentage but also generate more energy efficient configuration in comparison with the existing schemes so that the system lifespan is extended significantly.

Hongxing Bai, Xi Chen, Yu-Chi Ho, Xiaohong Guan

Session 5A: Peer-to-Peer Algorithms and Systems I

A Fault-Tolerant Content Addressable Network

In this paper, we propose a new method to enhance the fault-tolerance of the Content Addressable Network (CAN), which is known as a typical pure P2P system based on the notion of Distributed Hash Table (DHT). The basic idea of the proposed method is to introduce a redundancy to the management of index information distributed over the nodes in the network, by allowing each index to be assigned to several nodes, which was restricted to be one in the original CAN system. To keep the consistency among several copies of indices, we propose an efficient synchronization scheme based on the notion of labels assigned to each copy in a distinct manner. The performance of the proposed scheme is evaluated by simulation. The result of simulations indicates that the proposed scheme really enhances the fault-tolerance of the CAN system.

Daisuke Takemoto, Shigeaki Tagashira, Satoshi Fujita
Effective Resource Allocation in a JXTA-Based Grid Computing Platform JXTPIA

To extend a Peer-to-Peer (P2P) network system with the mechanisms of distributed/Grid computing, we developed a flexible JXTA- based P2P network interface and architecture, JXTPIA. The JXTPIA system provides the basic functionalities for Grid computing, such as resources allocation and sharing, task scheduling and assignment, network structure constructing and maintenance, etc. One of the main challenges in developing the JXTPIA system is efficient allocation of resources. We developed and evaluated algorithms for resource allocation to improve the efficiency of the JXTPIA system. The experimental results show that the efficiency of the JXTPIA system differs depending on the adopted algorithms. It indicates that scheduling based on limited information about peers has effects on the performance of the entire system. Though the adopted algorithms are specialized for the JXTPIA system only, the principle of the algorithms can be widely used on similar systems.

Kenichi Sumitomo, Takato Izaiku, Yoshihiro Saitoh, Hui Wang, Minyi Guo, Jie Huang
A Generic Approach to Make Structured Peer-to-Peer Systems Topology-Aware

With the help of distributed hash tables, the structured peer-to-peer system has a short routing path and good extensibility. However, the mismatch between the overlay and physical network is the barrier to build an effective peer-to-peer system in the large-scale environment. In this paper, we propose a generic approach to solve this problem, which is quite different from other protocol-dependent methods. We reserve the structure of system and break the coupling between the node and its identifier by swap operations. We also propose several policies to reduce the traffic overhead. The policies include adaptive probing and shadow scheme. The experiment shows that our approach can greatly reduce the average latency of overlay networks and the overhead is controllable.

Tongqing Qiu, Fan Wu, Guihai Chen
A Workflow Management Mechanism for Peer-to-Peer Computing Platforms

This paper proposes a workflow management mechanism to address a neglected aspect of existing P2P computing platforms – the lack of support for various computational models. In the workflow management mechanism, a workflow description file is used to define the workflow diagram of the target application. We develop a prototype system, and evaluate it using a test program to demonstrate how the workflow management mechanism effectively works.

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

Session 5B: Internet Computing and Web Technologies I

DDSQP: A WSRF-Based Distributed Data Stream Query System

Today many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. Stream-oriented systems are inherently geographically distributed and because distribution offers scalable load management and higher availability, future stream processing systems will operate in a distributed fashion. Moreover, service-based approaches have gained considerable attention recently for supporting distributed application development in e-business and e-science. In this paper, we present our innovative work to build a large scale distributed query processing over streaming data, this system has been designed as a WSRF-compliant application built on top of standard Web services technologies. Our distributed data stream Queries are written and evaluated over distributed resources discovered and accessed using emerging the WS-Resource Framework specifications. The data stream query processor has been designed and implemented as a collection of cooperating services, using the facilities of the WSRF to dynamically discover, access and use computational resources to support query compilation and evaluation.

Jia-jin Le, Jian-wei Liu
Quantitative Analysis of Zipf’s Law on Web Cache

Many studies have shown that Zipf’s law governs many features of the WWW and can be used to describe the popularity of the Web objects. Based upon Zipf’s law, we analyze quantitatively the relationship between the hit ratio and the size of Web cache, present approximate formulae to calculate the size of Web cache when the hit ratio is given under the condition of basic Zipf’s law and Zipf-like law, determine the critical value n in the top-n prefetching algorithm by studying the effect of parameter

α

on the hot Web documents. Zipf’s law plays an important role in solving the Internet latency, and holds the promise of more effective design and use of Web cache resources.

Lei Shi, Zhimin Gu, Lin Wei, Yun Shi
An Adaptive Web Caching Method Based on the Heterogeneity of Web Object

Heterogeneity of Web objects is the important causes of the decrease the performance of web caching algorithms. To increase the throughput of web caching process and to improve service availability, we may consider heterogeneity of web object adaptively. In this study, we proposed the new web-caching algorithm. A heterogeneity variation of an object can be reduced as the proposed method dividedly managing. Web objects and a cache scope with heterogeneity, and it is adaptively reflecting a variation of object reference characteristics with the flowing of time. In the experiments, we verified that the performance of the proposed method was more improved than existing algorithms through the two experiment models, which considered heterogeneity of an object.

Yun Ji Na, Il Seok Ko, Gun Heui Han
Supporting Wireless Web Page Access in Mobile Environments Using Mobile Agents

Because of the limited bandwidth, wireless communication in mobile environments is much more expensive than the communication cost in wired network. Thus, to save the cost of wireless communication, during the process of wireless Web page access, mobile unit usually prefetches some Web pages into its local cache and then disconnect itself from the network, so that updates are performed in its local cache and propagated back to Web server upon reconnection again. In this paper, after discussing the design rationale of a new Web page access mechanism, the performance of this mechanism is analyzed theoretically and experimentally. The results show that the efficiency of update propagation is improved and the burden of Web server is decreased.

HaiYang Hu, JiDong Ge, Ping Lu, XianPing Tao, Jian Lu

Session 5C: Network Protocols and Switching I

TCP and ICMP in Network Measurement: An Experimental Evaluation

Both TCP and ICMP are applied in network measurement, while investigating differences between the measured results of them is important but has been less addressed. To compare the differences between TCP and ICMP when they are used in measuring host connectivity, RTT, and packet loss rate, we designed two groups of comparison programs, after careful evaluating of the program parameters, we executed a lot of experiments on the Internet. The experimental results shows, there are significant differences between the host connectivity measured using TCP or ICMP; in general, the accuracy of TCP is 20%-30% higher than that of ICMP. The case of RTT and packet loss rate is complicated, which are related to path loads and destination host loads. While commonly, the RTT and packet loss rate measured using TCP or ICMP are very close. We also give some advices on protocol selection for conducting accurate network measurements.

Wenwei Li, Dafang Zhang, Gaogang Xie, Jinmin Yang
Fuzzy Congestion Avoidance in Communication Networks

As an enhancement mechanism for the end-to-end congestion control, Active Queue Management (AQM) can keep smaller queuing delay and higher throughput by proposing fully dropping the packets at the intermediate nodes. comparing with RED algorithm, although PI controller for AQM designed by Hollot improves the stability, it seems other methods to design of robust controllers may lead to better results. Morover, the transient performance of PI controller is not perfect, such as the regulating time is so long. In order to overcome to this drawback, in this paper, a novel adaptive fuzzy logic based controller is designed for Active Queue Management (AQM) in TCP/AQM networks. From control point of view, it is rational to regard AQM as a typical regulation system. Recently many AQM algorithms have been proposed to address performance degradations of end-to-end congestion control. However, these AQM algorithms show weaknesses to detect and control congestion under dynamically changing network situations. A simulation study over a wide range of IP traffic conditions shows the effectiveness of the proposed controller in terms of the queue length dynamics, the packet loss rates, and the link utilization.

F. Habibipour, M. Khajepour, M. Galily
A New Method of Network Data Link Troubleshooting

On the basis of analyzing the evolution and drawbacks of current network fault diagnosis methods, a novel network data link troubleshooting system (NDTS) based on fuzzy neural network is proposed. NDTS tightly combines neural network and rough sets, so that it can be used to fit the smooth curves perfectly. Let the membership function as the base, an rule scavenging method is put forward in NDTS, which is the variable-precision modal, and the notion of variable-precision be founded on the measurement of dependent degree. Furthermore, NDTS is adopted to deal with the mapping relation, categorizing the network faults. The experiment system implemented by this method shows the proposed system is an open and efficient troubleshooting engine.

Qian-Mu Li, Yong Qi, Man-Wu Xu, Feng-Yu Liu
Ethernet as a Lossless Deadlock Free System Area Network

The way conventional Ethernet is used today differs in two aspects from how dedicated system area networks are used. Firstly, dedicated system area networks are lossless and only drop frames when bit errors occur, while conventional Ethernet drop frames whenever congestion occur. Secondly, these networks are either deadlock free or use mechanisms which avoids deadlock situations, while still using all available links. Ethernet avoids deadlocks by using a spanning tree protocol which turns any topology into a tree. A drawback of this approach is that we are left with a lot of unused links and thus wasting resources.

In this paper we describe how to obtain a lossless deadlock free network with the best possible performance, while adhering to the current Ethernet standard and using off-the-shelf Ethernet equipment. We achieve this by introducing flow control in all network nodes and by taking control over the routing algorithm. Also, we use TCP to illustrate the effect of flow control on higher layer protocols.

Through simulations we verify the following tree improvements. Firstly, the activation of flow control turns Ethernet into a lossless network. Secondly, taking control over the routing algorithm allows us to build any topology without the limitations of the spanning tree protocol. And thirdly, an overall improvement in throughput is achieved by combining these enhancements.

Sven-Arne Reinemo, Tor Skeie

Session 5D: Ad Hoc and Wireless Networks I

An Anycast-Based Geocasting Protocol for Mobile Ad Hoc Networks

The goal of geocasting protocols is to deliver data packets to a group of nodes that are within a specified geographical area, i.e., the geocast region. In an ad hoc environment, there are numerous scenarios which benefit from geocast communication. In this paper, the network is divided into grids, we propose a new routing protocol for geocasting, which combines anycast and flood. The proposed protocol utilizes the location information to route messages in grid-by-grid manner. The routing path by using proposed protocol is the shortest route between hosts in grids. The grid structure is successfully used to eliminate redundant transmission of geocasting messages. The time complexity of route discovery and the routing overhead are reduced.

Jipeng Zhou
A Mesh Based Anycast Routing Protocol for Ad Hoc Networks

Ad hoc networks became a hot topic recently, but the routing algorithm of anycast in the ad hoc networks has not yet been much explored. In this paper, we propose a mesh-based anycast routing algorithm (MARP) for ad hoc networks. The proposed routing model is robust and reliable, which can solve the unsteady topology problem in ad hoc networks. The future work is discussed at the end of this paper.

Shui Yu, Wanlei Zhou
Research of Power-Aware Dynamic Adaptive Replica Allocation Algorithm in Mobile Ad Hoc Networks

Power conservation is a critical issue in mobile ad hoc networks, as the nodes are powered by batteries only. In this paper, according to the mobility of nodes, the power-aware dynamic adaptive replica allocation algorithm is proposed. In the power-aware dynamic adaptive replica allocation algorithm, based on the locality of data access, the replica allocation scheme is adjusted regularly in order to reduce the power consumption, and thus extend the survival time of network. The relation between mobility models and efficiency of power-aware dynamic adaptive replica allocation algorithm is studied. The results of performance evaluation show that the power-aware dynamic adaptive replica allocation algorithm can reduce the total power consumption of network greatly.

Yijie Wang, Kan Yang
GCPM: A Model for Efficient Call Admission Control in Wireless Cellular Networks

Call Admission Control (CAC) is crucial for assuring the quality of service (QoS) of communication in wireless cellular networks. In this paper, we propose a model, called Guard Channel Prediction Model (GCPM), for efficient call admission control satisfying the QoS requirements. A predictive value of the appropriate number of guard channels can be calculated based on this model by using statistical properties of new and handoff call arrival rates and mean call residency time, as well as the total capacity of a specific cell. Simulation studies are carried out to evaluate the performance in comparison with an existing adaptive algorithm under variable traffic loads and mobility patterns. Simulation results show that our proposed GCPM, using the static and fractional Guard Channel policy to process both types of incoming calls based on the predictive values, has gained better QoS with less blocking probabilities of both types of calls and meanwhile, larger network utilizations.

Lanlan Cong, Beihong Jin, Donglei Cao, Jiannong Cao

Session 6A: Peer-to-Peer Algorithms and Systems II

Cross-Layer Flow Control Based on Path Capacity Prediction for Multi-hop Ad Hoc Network

In this paper, we first present a simple and effective path capacity predicting method to model the complex interaction between medium access mechanism and routing scheme. Based on the predicting model, an end-to-end cross-layer flow control architecture is proposed. The key issues about flowing control are discussed deeply. As we designed, whenever the path length changes, the applications adaptively modify their sending rate with the pre-computed optimal value. Simulation evaluation has demonstrated that this technique can greatly improve the performance of end-to-end transmission in ad hoc network: the throughput of the long path is improved by up to 40%.

Yongqiang Liu, Wei Yan, Yafei Dai
Constructing the Robust and Efficient Small World Overlay Network for P2P Systems

The current P2P application protocols are usually constructed over the application-level overlay network. However, because the users in the P2P systems always follow a very dynamic mode, the overlay network with poor performance will leads to the problem of connectivity—the departures of peers often break the network into plenty of small parts, and results in the resource islands. Although increasing the links between peers can enhance the performance of connectivity by information redundancy. But it will lead to the severe cost of maintenance. Then there is an urgent need to integrate the online peers as a “giant component” as large as possible, so that the resources online can be shared completely; at the same time to guarantee the cost of maintenance as little as possible. And due to the prevalence and significance of small world in reality and theory, in this paper we analyzed the correlation between the shortcuts density and the connectivity, as well as the impact of shortcuts density to robustness over the popular WS Small World model. At last, numerical simulation was done to confirm our analytic results.

Guofu Feng, Ying-chi Mao, Dao-xu Chen
Transparent Java Threads Migration Protocol over Peer2Peer

The Java Virtual Machine computing model implements a multi-threading paradigm but its computing model does not define and does not verify the distribution paradigm of the threads over set of JVM instances. Without a distribution paradigm the Java Virtual Machine computing model cannot get any advantage from the theory of parallel Turing Machines. This work formally specifies and verifies the JVM computing model distribution paradigm. An intrinsic transparent thread distribution mechanism over many JVMs relying on different communication technology such as Peer to Peer is an important outcome of the presented solution. Other consequences, such as distributed JVM run-time location, aggregation and reachability, are achieved. Moreover the creation of Virtual Farms of JVMs for Multi-threading applications computing is made possible.

Edgardo Ambrosi, Marco Bianchi, Carlo Gaibisso, Giorgio Gambosi, Flavio Lombardi
Analytic Performance Modeling of a Fully Adaptive Routing Algorithm in the Torus

Over the past decade, many fully adaptive routing algorithms have been proposed in the literature, of which Duato’s routing algorithm has gained considerable attention for analytical modeling. In this study we propose an analytical model to predict message latency in wormhole routed 2-dimensional torus networks in which fully adaptive routing, based on Linder-Harden’s methodology [10], is employed. This methodology presents a framework in which adaptive routing algorithms can be developed for the k-ary n-cube network. Simulation experiments reveal that the latency results predicted by the proposed analytical model are in good agreement with those provided by simulation experiments.

Mostafa Rezazad, Hamid Sarbazi-azad
Redundancy Schemes for High Availability in DHTs

High availability in peer-to-peer DHTs requires data redundancy. This paper takes user download behavior into account to evaluate redundancy schemes in data storage and share systems. Furthermore, we propose a hybrid redundancy scheme of replication and erasure coding. Experiment results show that replication scheme saves more bandwidth than erasure coding scheme, although it requires more storage space, when average node availability is higher than 48%. Our hybrid scheme saves more maintenance bandwidth with acceptable redundancy factor.

Fan Wu, Tongqing Qiu, Yuequan Chen, Guihai Chen
VIP: A P2P Communication Platform for NAT Traversal

Nowadays, the Internet architecture is complicated and IP addresses are limited in IPV4 context. Many users located behind different kinds of NATs or Firewalls can hardly get a public unique IP. So the hosts behind the NAT can not be accessed by the hosts behind the other NATs. Some P2P systems can partially solve such kind of problems, but unfortunately, these systems just focus the specific self-contained applications such as Skype and BitTorrent whose P2P architectures and NAT traversal mechanisms can not be re-used by other applications directly. In this paper we present a solution by setting up a Virtual Intranet Platform (VIP) which use the public DHT service -OpenDHT as the distributed address/port information rendezvous. Without changing the configuration of the NAT, all the network and distributed application service behind the NAT can make use of the VIP to communicate with the corresponding peer services outside the NAT. The performance of the bandwidth, data lost and delay problems are much better than the existing traditional C-S framework platforms, more general than specific P2P applications. The P2P Communication Platform for NAT Traversal-VIP, is robust and scalable because there are no single failure points in the platform, the structure is in distributed, and majority of the traffic data between two hosts behind the NAT can be transfer directly without relaying.

Xugang Wang, Qianni Deng

Session 6B: Internet Computing and Web Technologies II

Hyper-Erlang Based Model for Network Traffic Approximation

The long-tailed distribution characterizes many properties of Internet traffic. The property is often modeled by Lognormal distribution, Weibull or Pareto distribution theoretically. However, it hinders us in traffic analysis and evaluation studies directly from these models due to their complex representations and theoretical properties. This paper proposes a Hyper-Erlang Model (Mixed Erlang distribution) for such long-tailed network traffic approximation. It fits network traffic with long-tailed characteristic into a mixed Erlang distribution

directly

to facilitate our further analysis. Compared with the well-known hyperexponential based method, the mixed Erlang model is more accurate in fitting the tail behavior and also computationally efficient.

Junfeng Wang, Hongxia Zhou, Fanjiang Xu, Lei Li
Prediction-Based Multicast Mobility Management in Mobile Internet

Multicast mobility management poses a great challenge in mobile Internet. This paper proposes a novel multicast mobility management algorithm using our proposed RingNet hierarchy, which takes advantage of our designed four states for mobility management: Not-in-the-group, PassiveReservation, QuasiReservation, and ActiveReservation, and two kinds of ranges which are closely related to a Mobile Host (MH) ’s attached device called the Access Proxy (AP): TransmissionRange and ReservationRange. By judging the state of the AP and the distance between the AP and its attaching MH, operations of mobility management can be implemented. The introduction of prediction algorithm greatly decreases the blindness of resource reservation, and avoids the unnecessary waste of bandwidth. Furthermore, the introduction of resource reservation makes the smooth handoff highly probable.

Guojun Wang, Zhongshan Gao, Lifan Zhang, Jiannong Cao
A Rule-Based Workflow Approach for Service Composition

With the frequent changes in recent business and scientific environment, more efficient and effective workflow infrastructure is required. Besides, with increasing emphasis on Service-oriented architecture, service composition becomes a hot topic in workflow research. This paper proposes a novel approach of using ECA rules to realize the workflow modeling and implementation for service composition. First of all, the concept and formalization of ECA rule-based Workflow is presented. Second, an automatic event composition algorithm is developed to ensure the correctness and validness of service composition at design time. Finally, the proposed ECA rule-based approach for service composition is illustrated through a prototype system.

Lin Chen, Minglu Li, Jian Cao
Manage Distributed Ontologies on the Semantic Web

Managing distributed ontologies is a challenging issue in the Semantic Web area. Different to most current distributed ontologies management researches, which focus on ontologies maintenance, evolutions, and versioning, this paper proposes a new distributed ontologies management framework based on the function-oriented perspective, and its goal is to bring multiple distributed ontologies together to provide more powerful capabilities. Ontology mapping is the key factor for manage distributed ontologies. This management framework also proposes a novel approach to eliminate the redundancies and errors of mappings in distributed ontologies.

Peng Wang, Baowen Xu, Jianjiang Lu, Dazhou Kang, Yanhui Li

Session 6C: Network Protocols and Switching II

Next Generation Networks Architecture and Layered End-to-End QoS Control

Next-generation network (NGN) is a new concept and becoming more and more important for future telecommunication networks. This paper illustrates five function layers of NGN architecture and discusses some end-to-end QoS (quality of service) issues for NGN (called NGNQoS). The five function layers are: (1) Application Layer that supports SIP protocol; (2) Network Control Layer that aims at overcoming the bottleneck problems at edge nodes or servers for end-to-end admission control; (3) Adaptation Layer that supports different network configurations and network mobility; (4) Network Transmission Layer that provides end-to-end QoS control for real-time communications through integrating Differentiated Service (DiffServ) and Multi-Protocol Label Switching (MPLS) and (5) Management Layer that provides Web-based GUI browser for data presentation, monitoring, modification and decision making in NGN.

Weijia Jia, Bo Han, Ji Shen, Haohuan Fu
FairOM: Enforcing Proportional Contributions Among Peers in Internet-Scale Distributed Systems

The viability of overlay multicasting has been established by previous research. However, in order to apply overlay multicast to Internet-scale distributed systems, such as the Grid and Peer-to-Peer systems, the issue of effectively enforcing fairness among peers so as to optimize overall performance remains as a challenge. This paper argues that simply applying a multiple-tree scheme does not provide sufficient fairness, in terms of performance. Instead, we believe that a better way to define fairness, for performance’s sake, is to factor in peers’ proportional contributions as it provides the opportunity to support many simultaneous multicasting sessions. This paper then presents a protocol, called FairOM (Fair Overlay Multicast), to enforce proportional contributions among peers in Internet-scale distributed systems. By exploiting the notion of staged spare capacity group and deploying a two-phase multicast forest construction process, FairOM enforces proportional contributions among peers, which enables more simultaneous multicasting sessions and alleviates potential hot-spots. The simulation results of a large multicast group with 1000 members show that FairOM achieves the goal of enforcing proportional contributions among peers and does not overwhelm the peers, including the multicast source. FairOM also achieves low delay penalty for peers and high path diversity.

Yijun Lu, Hong Jiang, Dan Feng
An Efficient QoS Framework with Distributed Adaptive Resource Management in IPv6 Networks

In this paper, we proposed a new QoS framework with Distributed Adaptive Resource Management(DARM). DARM provides end-to-end QoS guarantees to individual flows with minimal overhead, while keeping the scalability characteristic of DiffServ. In DARM, per-flow admission control and resource reservation, in conjunction with a novel IPv6 flow label mechanism, can be processed instantaneously in a fully distributed and independent manner at edge of network without hop-by-hop signaling. In addition, DARM is capable of reconfiguring network resource adaptively according to dynamically changing of traffic load. Through extensive simulations, the results clearly exhibit that DARM has a better overall performance comparing to the IntServ and DiffServ.

Huagang Shao, Weinong Wang, Rui Xie, Xiao Chen
Scheduling Latency Insensitive Computer Vision Tasks

In recent times, there are increasing numbers of computer vision and pattern recognition (CVPR) technologies being applied to real time video processing using single processor PCs. However, these multiple computational expensive tasks are generating bottlenecks in real-time processing. We propose a scheme to achieve both high throughput and accommodation to user-specified scheduling rules. The scheduler is then distributing ‘slices’ of the latency insensitive tasks such as video object recognition and facial localization among the latency sensitive ones. We show our proposed work in detail, and illustrating its application in a real-time e-learning streaming system. We also provide discussions into the scheduling implementations, where a novel concept using interleaved SIMD execution is discussed. The experiments have indicated successful scheduling results on a high end consumer grade PC.

Richard Y. D. Xu, Jesse S. Jin

Session 6D: Ad Hoc and Wireless Networks II

Throughput Analysis for Fully-Connected Ad Hoc Network with Multiuser Detection

The importance of multiuser detection for CDMA-based ad hoc network is addressed in this paper. Conventionally, the terminal in CDMA-based ad hoc network uses matched filter to receive packets, so the performance (e.g., throughput) of the network suffers from multi-access interference (MAI). Different from above scheme, in this paper, each terminal of the ad hoc network is equipped with an adaptive blind linear multiuser detector, so the ability of MAI-resistance is gained. Based on fully-connected network model and Log-distance path loss radio propagation model, the throughput of ad hoc network with multiuser detection is studied. Simulation results show that multiuser detection can remarkably enlarge the throughput of ad hoc network.

Xiaocong Qian, Baoyu Zheng, Genjian Yu
Dynamic Traffic Grooming for Survivable Mobile Networks – Fairness Control

The Internet is replacing the traditional telephone network as the ubiquitous network infrastructure. Internet customers are increasing at an exponential rate and will continue to increase in the near future. With the proliferation of mobile communication technologies and wireless personal devices, the demand for mobile communications has grown exponentially over the last decade and is expected to grow even more in the near future. This paper proposes a new bandwidth allocation scheme that guarantees the time independent fairness and fault tolerance in the heterogeneous mobile communication services. It will hold some calls in the second buffer rather than directly discarding it when the residual bandwidth is insufficient. A multimedia call that satisfies all connection requirements has precedence over other calls.

Hyuncheol Kim, Sunghae Kim, Seongjin Ahn
An Efficient Cache Access Protocol in a Mobile Computing Environment

The use of periodic invalidation reports (IRs), has been shown to be a useful technique for conserving wireless bandwidth and battery power. However, IR-based schemes have some drawbacks, such as long query delay and low client caching availability, even if the clients have sufficient local cache capacity. In this paper, we propose an efficient cache access protocol to address these problems. Instead of passively waiting, the clients use the local cache actively. Using our protocol, we can remove the

”false alarm”

that causes unnecessary delay. Based on our threshold-based scheme, the proposed protocol can optimize response time with little loss of data currency. Our simulation results are carried out to evaluate the proposed methodology. Compared to previous IR-based schemes, our scheme can reduce the response time significantly with a very little loss of data currency.

Jae-Ho Choi, SangKeun Lee
Implementation and Performance Study of Route Caching Mechanisms in DSR and HER Routing Algorithms for MANET

Route caching strategy is an important on-demand routing protocol for mobile ad hoc networks. On-demand routing protocol for mobile ad hoc networks utilizes route caching in different forms to reduce overheads, peer-to-peer delay. This paper presents a variation in view compared to DSR and HER, to minimize cache staleness, partitions and enhance reliability of service. The variation is with respect to identification of route and cache validation of errors using different techniques namely Update Route Caching (URC), Temporal Cache Validation (TCV), Negative Cache Validation (NCV) and Combined Cache Validation (CCV). The proposed method refreshes cache more often than the DSR and HER thereby initiating route requests earlier, when a route still being used is broken, thus reducing peer to peer delay to transmit packets. The results of GloMoSim simulator validates higher cache hit percentage and reliable delivery of packets in the technique Combined Cache Validation (CCV) when in comparison to DSR and HER

K. Murugan, Sivasankar, Balaji, S. Shanmugavel
An Effective Cluster-Based Slot Allocation Mechanism in Ad Hoc Networks

This work studies the allocation of bandwidth resources in wireless ad hoc networks. The highest-density clustering algorithm is presented to promote reuse of the spatial channel and a new slot allocation algorithm is proposed to achieve conflict-free scheduling for transmissions. Since the location-dependent contention is an important characteristic of ad hoc networks, in this paper we consider this feature of ad hoc networks to present a new cluster formation algorithm, by increasing the number of simultaneous links to enhance spatial channel reuse. Furthermore, because each cluster has its own scheduler and schedulers operate independently of each other, the transmissions may conflict among the clusters. In this paper, we classify the flows by the locations of their endpoints to prevent this problem. Finally, the proposed mechanism is implemented by simulation and the results reveal that the conflicts can be efficiently avoided without global information and the network throughput is improved without violating fairness.

Tsung-Chuan Huang, Chin-Yi Yao
Backmatter
Metadata
Title
Parallel and Distributed Processing and Applications
Editors
Yi Pan
Daoxu Chen
Minyi Guo
Jiannong Cao
Jack Dongarra
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32100-2
Print ISBN
978-3-540-29769-7
DOI
https://doi.org/10.1007/11576235

Premium Partner