Skip to main content

2005 | Buch

Advanced Parallel Processing Technologies

6th International Workshop, APPT 2005, Hong Kong, China, October 27-28, 2005. Proceedings

herausgegeben von: Jiannong Cao, Wolfgang Nejdl, Ming Xu

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

insite
SUCHEN

Über dieses Buch

Welcome to the proceedings of APPT 2005: the 6th International Workshop on Advanced Parallel Processing Technologies. APPT is a biennial workshop on parallel and distributed processing. Its scope covers all aspects of parallel and distributed computing technologies, including architectures, software systems and tools, algorithms, and applications. APPT originated from collaborations by researchers from China and Germany and has evolved to be an international workshop. APPT 2005 was the sixth in the series. The past ?ve workshops were held in Beijing, Koblenz, Changsha, Ilmenau, and Xiamen, respectively. The Program Committee is pleased to present the proceedings for APPT 2005. This year, APPT 2005 received over 220 submissions from researchers all over the world. All the papers were peer reviewed by two to three Program Committee members on their relevance, originality, signi?cance, technical qu- ity, and presentation. Based on the review result, 55 high-quality papers were selected to be included in the proceedings. The papers in this volume represent the forefront of research on parallel processing and related ?elds by researchers from China, Germany, USA, Korea, India, and other countries. The papers - cepted cover a wide range of exciting topics, including architectures, software, networking, and applications.

Inhaltsverzeichnis

Frontmatter

Keynote Speech

Research Issues in Adapting Computing to Small Devices

Advances in pervasive and mobile technologies are making computing available to us at anytime anywhere. Availability however does not automatically mean it is in a form that implies ease of use. Usability in the mobile world amounts to a set of problems that are not so much precedented in the history of computing. Handheld mobile devices that are thin-lean-mean for instance present challenges that require fundamental changes in the way computation is carried out, its architecture, or its supporting environment. A practical goal is to minimize these changes, which calls for automatic or semi-automatic adaptation of existent computing to the small devices. We discuss the issues and research challenges of “X adapting to Y”, where X includes content, data, code, computation, GUI, and so on, and the changes in semantics and/or syntax due to the adaptation are to satisfy the constraints of Y. Some experiments we have carried out for content and code adaptation provide some useful illustration.

Francis C. M. Lau
Mobile Context-Aware Systems – Linking the Physical and Digital World

The rapid miniaturisation and decline in prices of computer, communication and sensor technology give rise to a number of interesting developments, such as multifunctional mobile devices, sensor platforms embedded into everyday things, and sensor nodes organised in a wireless network. Those systems can capture and process sensory data and communicate this information to other peers in their proximity or to an existing server infrastructure. The sensory data are fed into spatio-temporal models of the physical world, which build the basis for the promising class of context-aware applications. Based on these developments it can be anticipated that there will be billions of sensor systems in our physical environment near future. Consequently, we envision most of the future applications to be context-aware, sharing highly dynamic digital world models offered by a large number of content providers. Obviously, the realisation of this vision will cause a number of both technological and social challenges. Some of these challenges are subject to the research of Nexus, a Centre of Excellence established at University of Stuttgart in the year 2003. In this talk, we present the vision and objectives of Nexus. Moreover, we will discuss some aspects of scaleable context management.

Kurt Rothermel

Architecture

A Data Transformations Based Approach for Optimizing Memory and Cache Locality on Distributed Memory Multiprocessors

Data locality is one of the key factors in affecting the performance of parallel programs running on distributed memory multiprocessors. This paper presents an approach for optimizing memory locality and cache locality of perfect or non-perfect loop nests using linear data transformations on distributed memory multiprocessors. The approach optimizes memory locality with the data space fusion technique and cache locality with the projection-delamination technique, and combines the both techniques effectively to make the overheads of remote memory accesses and local memory accesses as low as possible. We conduct experiments with nine programs and the results show the approach is effective in optimizing memory locality and cache locality simultaneously.

Xia Jun, Xue-Jun Yang
A Fetch Policy Maximizing Throughput and Fairness for Two-Context SMT Processors

In Simultaneous Multithreading (SMT) processors, co-scheduled threads share the processor’s resources, but at the same time compete for them. A thread missing in L2 cache may hold a large number of resources which other threads could be using to make forward progress. And as a result, the overall performance of SMT processors is degraded. Currently, many instruction fetch policy focus on this problem. However, these policies are not perfect, and each has its own disadvantages. Especially, these policies are designed for processors implementing any ways simultaneous multithreading. The disadvantages of these policies may become more serious when they are used in two-context SMT processors.

In this paper, we propose a novel fetch policy called RG-FP (Resource Gating based on Fetch Priority), which is specially designed for two-context SMT processors. RG-FP combines reducing fetch priority with controlling shared resource allocation to prevent the negative effects caused by loads missing in L2 cache. Simulation results show that our RG-FP policy outperforms previously proposed fetch policies for all types of workloads in both throughput and fairness, especially for memory bounded workloads. Results also tell that our policy shows different degrees of improvement over other fetch policies. The increment over PDG is greatest, reaching 41.8% in throughput and 50.0% in Hmean on average.

Caixia Sun, Hongwei Tang, Minxuan Zhang
A Loop Transformation Using Two Parallel Region Partitioning Method

Loop parallelization is an important optimization issue in the execution of scientific programs. This paper proposes loop transformation techniques for finding parallel regions within nested loops with non-uniform dependences in order to improve parallelism. By parallelizing anti dependence region using variable renaming, there remains only flow dependence in the loop. We then divide the iteration space into FDT (Flow Dependence Tail set) and FDH (Flow Dependence Head set). By two given equations, we show how to determine whether the intersection of FDT and FDH is empty or not. So, we can find two parallel regions for doubly nested loops with non-uniform dependences. In the case that FDT does not overlap FDH, we will divide the iteration space into two parallel regions by a line.

Sam Jin Jeong, Jung Soo Han
Criticality Based Speculation Control for Speculative Multithreaded Architectures

Unending quest for performance improvement coupled with the advancements in integrated circuit technology have led to the development of new architectural paradigm. Speculative multithreaded architecture (SpMT) philosophy relies on aggressive speculative execution for improved performance. However, aggressive speculative execution comes with a mixed flavor of improving performance, when successful, and adversely affecting the performance (and energy consumption) because of useless computation in the event of mis-speculation. Dynamic instruction criticality information can be applied to control and guide such an aggressive speculative execution.

In this paper, we propose a model to determine the dynamic instruction criticality of SpMT execution. We have also developed two novel techniques, utilizing the criticality information, namely delaying the non-critical loads and the criticality based thread-prediction for reducing useless computations. Our experiments with criticality based speculation control show a significant reduction in useless computation with little reduction in speedup.

Rahul Nagpal, Anasua Bhowmik
Design and Implementation of Semantic Caching Coherency Control Scheme Toward Distributed Environment

Semantic caching is very attractive for use in distributed computing environments based on historical queries and their descriptions, one of whose important issues is how to best maintain semantic caching using a coherency control scheme. With the object of applying semantic caching into practice, the cache coherency problems including the data between the server and its caching as well as the cached data and their semantic descriptions are analyzed. This paper presents conflicts existing in semantic caching and their formal definitions, proposes the semantic caching model, and coherency control scheme, meanwhile derives update list optimization algorithm adopted in server and coherency control algorithm used in clients. Finally, the performance of the semantic caching coherency control scheme is examined and analyzed through a simulation study in detail.

Hai Wan, Lei Li
Energy Efficient United L2 Cache Design with Instruction/Data Filter Scheme

The on-chip caches usually consume a significant amount of energy in modern microprocessors. This paper presents an I/D filter scheme to reduce the energy consumption of united L2 caches shared by instructions and data. By adding an I/D indicator bit, the cache block is classified into I-block and D-block. For instruction and data accesses, only the corresponding blocks instead of all the blocks in the same set selected are accessed. By this method, we can easily filter the unnecessary way activities and save the energy consumption. This technique uses a small amount of additional hardware without increasing the cache access latency, and the area overhead is negligible. Simplescalar simulator and CACTI were used to evaluate the performance of our proposed architecture, the results shows that the I/D filter scheme is energy efficient for set-associative caches.

Zhiqiang Ma, Zhenzhou Ji, Mingzeng Hu, Yi Ji
Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading

Existing multithreaded network processors architecture with multiple processing engines (PEs), aims at taking advantage of blocked multithreading technique which executes instructions of different user-defined threads in the same PE pipeline, in explicit and interleave way. Multiple PEs, each of which is a multithreaded processor core, process several packets in parallel to hide long memory access latency. Most of them are optimized for throughputs mostly in data-plane. In future network workloads, the boundaries between data-plane and control-plane become blurred, so that PEs are demanded not only wire speed packet forwarding on data-plane, but also highly intelligent and increased complex packet processing function on control-plane. In this paper, we analyze SMT’s short latency tolerance potential when used in out-of-order and dynamic scheduling PE cores. We show in this paper that 2~4 issue SMT provides an excellent short memory and branch latency tolerance, which gain higher instructions throughout as well as much simpler structures.

Bo Liang, Hong An, Fang Lu, Rui Guo
RIMP: Runtime Implicit Predication

If-conversion and predicated execution are widely adopted to eliminate branch misprediction penalty. Previous predication execution depends on compiler to generate explicit predicated instructions. In this paper, a trace-based predicate mechanism named RIMP (Runtime IMplicit Predication) is discussed. The candidates of if-conversion will be identified during dynamic execution. Conventional trace cache has been modified to store RIMP traces, which include instructions both from fall-through and target block following the conditional branch. Hardware extension will add predication to RIMP trace automatically. With the help of RIMP, legacy applications can benefit from predication mechanism without recompiling source code. Simulation of RIMP implementation under diverse microarchitecture configurations is presented in the paper. Results have shown promising performance improvement. In general, RIMP with 64kB trace storage delivers an average 10.3% IPC improvement while actually speeding up the execution time by over 7%.

YuXing Tang, Kun Deng, XiaoDong Wang, Yong Dou, XingMing Zhou
Static Partitioning vs Dynamic Sharing of Resources in Simultaneous MultiThreading Microarchitectures

Simultaneous MultiThreading (SMT) achieves better system resource utilization and higher performance because it exploits Thread-Level Parallelism (TLP) in addition to “conventional” Instruction-Level Parallelism (ILP). Theoretically, system resources in every pipeline stage of an SMT microarchitecture can be dynamically shared. However, in commercial applications, all the major queues are statically partitioned. From an implementation point of view, static partitioning of resources is easier to implement and has a lower hardware overhead and power consumption. In this paper, we strive to quantitatively determine the tradeoff between static partitioning and dynamic sharing. We find that static partitioning of either the instruction fetch queue (IFQ) or the reorder buffer (ROB) is not sufficient if implemented alone (3% and 9% performance decrease respectively in the worst case comparing with dynamic sharing), while statically partitioning both the IFQ and the ROB could achieve an average performance gain of 9% at least, and even reach 148% when running with floating-point benchmarks, when compared with dynamic sharing. We varied the number of functional units in our efforts to isolate the reason for this performance improvement. We found that static partitioning both queues outperformed all the other partitioning mechanisms under the same system configuration. This demonstrates that the performance gain has been achieved by moving from dynamic sharing to static partitioning of the system resources.

Chen Liu, Jean-Luc Gaudiot

Algorithm and Theory

Autonomous-Centered Problem Allocation Oriented to Cooperation

By reasonably allocating a cooperative problem which need multiple solvers cope with together, the problem could be performed more effectively and efficiently. A problem could be divided into multiple sub-problems; each has certain ability requirement which is the hinge to relate problem and solver. According to ability requirement, the solver candidate set for each sub-problem could be established. To select suitable solver from candidate set so as to solve a cooperative problem in more autonomous and consistent way, a mathematical allocation model taking the minimization of interaction number as objective function is established. The model solving process is deployed by decreasing two kinds of interactions, i.e. intra-interaction and extra-interaction. Experiment shows this method obtains better performance than general allocation.

Xiping Liu, Wanchun Dou, Guihai Chen, Shijie Cai, Jiashan Tang
Contention-Free Communication Scheduling for Irregular Data Redistribution in Parallelizing Compilers

The data redistribution problems on multi-computers had been extensively studied. Irregular data redistribution has been paid attention recently since it can distribute different size of data segment of each processor to processors according to their own computation capability.

High Performance Fortran Version 2

(HPF-2) provides

GEN_BLOCK

data distribution method for generating irregular data distribution. In this paper, we develop an efficient scheduling algorithm, Smallest Conflict Points Algorithm (SCPA), to schedule HPF2 irregular array redistribution. SCPA is a near optimal scheduling algorithm, which satisfies the minimal number of steps and minimal total messages size of steps for irregular data redistribution.

Kun-Ming Yu, Chi-Hsiu Chen, Ching-Hsien Hsu, Chang Wu Yu, Chiu Kuo Liang
Experiments on Asynchronous Partial Gauss-Seidel Method

This paper presents design and experimental results of a parallel linear equation solver by asynchronous partial Gauss-Seidel method. The basic idea of this method is derived from the asynchronous iterative method; newly computed values of unknowns are broadcast to all other processors and are incorporated into computing the next value immediately after they are received. However, since the asynchronous iterative method requires frequent data passing, it is difficult to achieve high performance on practical cluster computing systems due to its enormous communication overhead. To avoid it, the asynchronous partial Gauss-Seidel method reduces frequency of broadcasting new values of unknowns by passing multiple values in a chunk. The experimental results show the advantage of the asynchronous partial Gauss-Seidel method.

Hiroshi Nishida, Hairong Kuang
Improved Program Dependence Graph and Algorithm for Static Slicing Concurrent Programs

Based on the comparison among existing slicing algorithms and analysis of the fact that Krinke’s algorithm [9] produces imprecise program slice for the program structure which has loops nested with one or more threads, a conclusion is drawn that the reason for the impreciseness is that Krinke’s data structure—threaded program dependence graph—had over coarse definitions of data dependence relations between threads, and the constraint put on the execution path in concurrent program is unduly loose. An improved threaded program dependence graph is proposed which adds a new dependence relation of loop-carried data dependence crossing thread boundaries. An improved slicing algorithm is also proposed which introduces a new concept of regioned execution witness to further constrain the execution path. The pseudo code of the algorithm adding loop-carried data dependence relations crossing thread boundaries is given. The pseudo code of the new slicing algorithm is also given whose complexity has been analyzed. Examples show that the improved slicing algorithm designed on the improved data structure can restrain the impreciseness of Krinke’s.

Jianyu Xiao, Deyun Zhang, Haiquan Chen, Hao Dong
Parallelisation of Sequential Programs by Invasive Composition and Aspect Weaving

We propose a new method of interactively parallelising programs that is based on aspect weaving and invasive software composition. This can be seen as an alternative to skeleton programming. We give motivating examples for how our method could be applied.

Mikhail Chalabine, Christoph Kessler
Revisiting the Election Problem in Asynchronous Distributed Systems

This paper is about the relationship between the Election Problem and Failure Detectors in asynchronous distributed systems. It is stated in [7] that a Perfect Failure Detector

P

is needed to solve the Election problem. But in contrast to the result, there is a failure detector that solves Election weaker than the Perfect Failure Detector. We introduce the Confirmatory failure detector

C

. We show that to solve Election,

C

is necessary while

P

is not, whereas C+

$\diamondsuit$

S is sufficient when a majority of the processes are correct.

SungUoon Bauk
Scheduling Scheme with Fairness and Adaptation in the Joint Allocation of Heterogeneous Resources

Generally speaking, packets transmitted from source to destination need not only bandwidth and buffer but also computing and processing capacity in each node, available wavelength in optical networks and power in wireless sensor networks. So, joint and fair allocation of heterogeneous resources (such as, bandwidth, buffer and processing) is rather pivotal and necessary for realizing differentiated end-to-end QoS guarantee. In this paper, we proposed the framework and algorithm, realized fair allocation of heterogeneous resources, and designed the Heterogeneous-Deficit Round Robin (H-DRR) algorithm, which was the improvement to DRR. The comparison of performance with and without unfriendly packets stated that H-DRR could realize the fair allocation of heterogeneous resources and provide better QoS guarantee, especially in the differentiated situation.

Yu Hua, Chanle Wu, Mengxiao Wu
Solving the Symmetric Tridiagonal Eigenproblem Using MPI/OpenMP Hybrid Parallelization

We present a hybrid MPI/OpenMP parallel implementation for the eigenvalues of symmetric tridiagonal matrices on cluster of SMP’s environments. The algorithm is based on a divide-and-conquer method which uses the split-merge technique and Laguerre’s iteration. We study two different implementations of the algorithm: one based on MPI and the other based on a hybrid parallel paradigm with MPI/OpenMP. We take a coarse grain OpenMP approach to parallel implementation for solving the eigenvalues of symmetric tridiagonal submatrices within a SMP node. And dynamic work sharing is used in Laguerre’s iterations. This has two effects: first, the amount of synchronization has been reduced; secondly, this could have an effect on the load balance. In addition, we analyze the communication overhead on two different implementations. An experimental analysis on the DeepComp 6800 shows the hybrid algorithm performs good scalability.

Yonghua Zhao, Jiang Chen, Xuebin Chi
Trust Management with Safe Privilege Propagation

Trust management uses delegation to enable decentralized authorization across administrative domains. Delegation passes one’s authority over resources to trusted entities and thus enables more flexible and scalable authorization. However, unrestricted delegation may result in privilege proliferation and breach the privacy of information systems. The delegation models of existing trust management systems do not provide effective control on delegation propagation, and the correctness of constraint enforcement mechanisms is not formally analyzed, which may lead to privilege proliferation. In this paper, we propose a role-based constrained delegation model (RCDM), which restricts the propagation scope of delegation trees by a novel delegation constraint mechanism named

spacial constraint

. This paper also introduces a rule-based language to specify the policies and the deduction algorithm for constrained delegation defined in RCDM. The soundness and completeness properties of the deduction algorithm ensure the safety and availability of our delegation model.

Gang Yin, Huai-min Wang, Tao Liu, Ming-feng Chen, Dian-xi Shi
Vector Space Based on Hierarchical Weighting: A Component Ranking Approach to Component Retrieval

In this paper, we present an approach to software component ranking intended for use in searching for such components on the internet. The method used introduces a novel method of weighting keywords that takes account of where within the structure of a component the keyword is found. This hierarchical weighting scheme is used in two ranking algorithms: one using summed weights, the other using a vector space model. Experimental comparisons with algorithms using TF-IDF weighting that ignore component structure are described. The results demonstrate consistent superiority of the hierarchical weighting approach.

Gui Gui, Paul D. Scott

System and Software

A High Availability Mechanism for Parallel File System

Parallel file systems achieve a high I/O throughput by dividing a file into multiple blocks and storing them on multiple I/O nodes. However, the reliability and availability of the parallel file systems are sacrificed for the stripping of file data over multi I/O nodes. A new mechanism named Logic Mirror Ring (

LMR

), has been developed to improve the reliability and availability of the parallel file systems in this study. A logic mirror ring is built over all I/O nodes to indicate the mirror relationship among the nodes, i.e., each node maintains not only its own data but also the mirror data of other nodes. The fault tolerant capability of the system is improved because the node maintaining the mirror data of the failed node will take over the requests to the failed node. The mirror depth can be adjusted to different levels based on the requirements of the reliability and availability. A model is developed to evaluate the reliability and availability of the parallel file systems. The effects of

LMR

on the reliability and availability of the parallel file system is studied. The results show that

LMR

can be used to improve the reliability and availability of the parallel file systems effectively.

Hu Zhang, Weiguo Wu, Xiaoshe Dong, Depei Qian
A User-Guided Semi-automatic Parallelization Method and Its Implementation

In this paper, we propose a user-guided semi-automatic parallelization method, which is based on code templates corresponding to parallel programming paradigms and the concept of meta-task independent with each other. As an implementation of this method, we develop the system

Metaparallel

, which is based on Java language and MPICH, and the framework of

Metaparallel

is discussed. At last, the parallelization flow is studied with a case. In addition, we test the usability of

Metaparallel

by the practical engineering problem.

Chuliang Weng, Zhongguo Chen, Xinda Lu, Minglu Li, Yong Yin
CAPU: Enhancing P2P File Sharing System with Capacity Aware Topology

Measurement works show that the unstructured P2P file sharing systems such as Gnutella face the problem of poor scalability and inefficiency search for unpopular items. In this paper, we propose new mechanisms that greatly enhance the performance of file sharing system. Our work exploits the prevalent heterogeneity of the nodes in existing unstructured networks in terms of capacity to construct a quasi-hierarchical topology-aware topology which achieves approximately optimal system throughput. Based on this overlay topology, we propose proactive file index propagation scheme to facilitate search. We also introduce a two-stage search algorithm integrate probabilistic biased random walk that search for popular items and low-redundant multicast (MPR) searching for rare items, achieving approximately O(1) search efficiency for popular items and receivable search latency for rare items respectively. We evaluate our design through simulations and the results show 3 to 5 orders of magnitude improvement in total system capacity compared to other Gnutella-like system.

Hongliang Yu, Weimin Zheng, Dongsheng Wang, Haitao Dong, Lu Li
Implementing Component Persistence in CCM Based on StarPSS

In distributed computing environment, we always need to store the objects’ state. CORBA Persistent State Service (PSS) provide a high-level approach to realize the persistence of CORBA object. In this paper, we present StarPSS, a design and implementation of PSS in C++ language, and based on the StarPSS, we propose a design of mechanism to implement component persistence in CORBA Component Model (CCM) environment.

Jingbin An, Yan Jia, Zhiying Wang
Load Balancing Design Issues on Prefetch-Based DSM Systems

In recent years, the cluster computing technology has become a cost-effective computing infrastructure, because it aggregates resources of computational power, communication and storage. It is also considered a very attractive platform for low-cost supercomputing. Software distributed shared memory (DSM) provides a convenient and effective solution for programming parallel applications. However, both page faults and communication are major sources of overheads in DSM systems. Prefetching strategy can overlap data transporting time with computation time, as also reducing page faults. Unfortunately, it conducts load imbalance during barrier synchronization. For solving such inconveniences, this research paper discusses the load balancing for barrier synchronization in DSM systems. We discuss that, leaving the loop when half of hosts have finished prefetching is the best method, and therefore, we modify the threshold of leaving loop. Experiments show that, by incorporating load balancing into DSM systems, the barrier synchronization has been improved.

Hsiao-Hsi Wang, Kuan-Ching Li, Kuo-Jen Wang, Ssu-Hsuan Lu, Chun-Chieh Yang
Task Assignment for Network Processor Pipelines Using GA

In several commercial network processors programming environments, programmer must manually assign many processing tasks to the processor pipelines which consist of many processing engines. Due to the large exploration space, this manual procedure is usually very tedious and inefficient and the optimal or even near-optimal assignment scheme may be difficult to obtain. This paper proposes an automated task-to-PE assignment algorithm based on genetic algorithm. Experimental results show that this method can quickly obtain near-optimal solutions from the large solution space and the algorithm execution time is decoupled with pipeline stages. These two features make this method very suitable to be used in a NP application development environment and provide a more efficient development experience for developers.

Shoumeng Yan, Xingshe Zhou, Lingmin Wang, Fan Zhang, Haipeng Wang
Test-Suite Reduction Using Genetic Algorithm

As the software is modified and new test cases are added to the test-suite, the size of the test-suite grows and the cost of regression testing increases. In order to decrease the cost of regression testing, researchers have researched on the use of test-suite reduction techniques, which identify a subset of test cases that provides the same coverage of the software, according to some criterion, as the original test-suite. This paper investigates the use of an evolutionary approach, called genetic algorithms, for test-suite reduction. The algorithm builds the initial population based on test history, calculates the fitness value using coverage and cost information, and then selectively breeds the successive generations using genetic operations. This generational process is repeated until a minimized test-suite is founded. The results of studies show that, genetic algorithms can significantly reduce the size of the test-suite and the cost of regression testing, and achieves good cost-effectiveness.

Xue-ying Ma, Bin-kui Sheng, Cheng-qing Ye

Grid Computing

A Constellation Model for Grid Resource Management

By analyzing the demand of the Grid resource management, this paper proposes a constellation model for dynamically organizing and managing Grid resources according to the integrated service capabilities of each node. The logical layer of the constellation model matches with the underlying physical organization. Some evaluation criterions for the integrated service capabilities are proposed in this paper, which can also be used as the criterion in selection of the standby fixed star node in resource management. By defining the minimum resource management unit, the model is more suitable to manage the resources dynamically and easier to realize uniform resource management. The services in the constellation model can be developed according to WSRF and the model conforms to OGSA.

Yinfeng Wang, Xiaoshe Dong, Xiuqiang He, Hua Guo, Fang Zheng, Zhongsheng Qin
An Effective Information Service Architecture in Grid Environment

The information service is a vital part of any Grid software platform, providing the fundamental mechanism for discovering and monitoring services and resources in a Grid. This paper presents an effective information service architecture developed for ChinaGrid Support Platform (CGSP). The architecture is based on domains in CGSP, which are autonomous grid systems. The key issues are to collect information of diverse resources within a domain dynamically, and to share these collected data across domains effectively and securely by providing a unique interface based on a delegation method and an index aggregation mechanism.

Huashan Yu, Yin Luo, Xingguo Zhu, Xiaoming Li
An Efficient Data Management System with High Scalability for ChinaGrid Support Platform

There are a great number of data intensive applications in ChinaGrid. They require an efficient and high performance data management. The data management in

ChinaGrid Support Platform

(CGSP) supplies a data access mechanism with location transparency, name transparency, and protocol transparency as while as ensuring the transfer efficiency. The data management system consists of five parts: the storage data server based on Global Distributed Storage System to guarantee the reliability and performance of data transfer; the storage resource agent to discover, publish and catalog the storage resources; the data logical domain management to enable applications to select specific storage resources; the metadata management to publish, query and access metadata; and the uniform data access entry to organize grid users’ data space. We present the design philosophy of the efficient data management system with high scalability for CGSP and also give preliminary performance results.

Hai Jin, Wenjun Gong, Song Wu, Muzhou Xiong, Li Qi, Chengwei Wang
CGSP: An Extensible and Reconfigurable Grid Framework

ChinaGrid Support Platform (CGSP) is proposed to provide grid toolkit for ChinaGrid application developers and specific grid constructors, in order to reduce their development cost as greatly as possible. CGSP extensible and reconfigurable framework, which satisfies the expansion and autonomy requirement of ChinaGrid, is mainly discussed in the paper. In the framework, domain is presented to denote one unit which could provide grid service for end users by itself. Layered structure of domains and corresponding interactive relationship are paid much more attention. CGSP design motivation and simple execution management mechanism are also described in this paper.

Yongwei Wu, Song Wu, Huashan Yu, Chunming Hu
Early Experience of Remote and Hot Service Deployment with Trustworthiness in CROWN Grid

CROWN Grid aims to empower in-depth integration of resources and cooperation of researchers nationwide and worldwide. In such a distributed environment, to facilitate adoption of services, remote and hot service deployment is highly desirable. Furthermore, when the deployer and the target container are from different domains, great security challenges arise when a service is deployed to the remote container. In this paper, we present ROST, an original scheme of Remote & hOt Service deployment with Trustworthiness. By dynamically updating runtime environment configurations, ROST avoids restarting the runtime system during deployment. Moreover, we adopt trust negotiation in ROST to assure the security of service deployment. We conduct experiments in a real grid environment, and evaluate ROST comprehensively.

Hailong Sun, Yanmin Zhu, Chunming Hu, Jinpeng Huai, Yunhao Liu, Jianxin Li
Grid Developing Environment in CGSP System

Grid computing is becoming a mainstream technology for multi-institutional distributed resources sharing and system integration. Normally, the programmer’s productivity in designing and implementing efficient parallel applications over grid remains a very time-consuming task, especially for the non-compute users. At the same time, the development of grid programming environments, which would enable programmers to efficiently exploit grid technologies, becomes an important and hot research issue too. In this paper, grid developing environment (GDE) based on ChinaGrid Support Platform (CGSP) is discussed. GDE supplies the portal building, job defining, programming interface, and administration tools for CGSP. The GDE motivations, architecture and corresponding implementation over CGSP are presented respectively.

Weimin Zheng, Lisen Mu, Qing Wang, Yongwei Wu
Grid Job Support System in CGSP

As Grid computing becomes more and more practical, User needs a steady computing environment and an effective way to assemble simple services into complicated service to meet their requirements. Based on BPEL4WS, this paper introduces a Grid Job Description Language (GJDL) to combine correlation services into composited service. With GJDL, we design and implement a job support system which has been already applied successfully in China Grid Support Platform (CGSP).

Jinpeng Huai, Yu Wan, Yong Wang, Haifeng Ou
JFreeSim: A Grid Simulation Tool Based on MTMSMR Model

Due to the non-repeatability of the grid environment, limitation comes out when conducting grid performance analysis in real environment. Therefore, grid simulation tool is used extensively as an important research tool. This paper proposes JFreeSim, a new grid simulation tool based on

multiple tasks, multiple schedulers and multiple resources

(MTMSMR) model. As a modular and extensible simulation tool, JFreeSim realizes many kinds of entity modeling and communication mechanism between all entities, and makes system simulation accord with the characteristics of the grid environment. Experiments indicate that JFreeSim can provide users with flexibility in configuring and meet requirements of different system architectures and applications, and the simulation results are as expected.

Hai Jin, Jin Huang, Xia Xie, Qin Zhang
OOML-Based Ontologies and Its Services for Information Retrieval in UDMGrid

In order to effectively integrate and share the enormous dispersed resources of various digital museums, University Digital Museum Grid (UDMGrid) has been developed to provide one-stop information services about kinds of digital specimens in the form of grid services. To eliminate the heterogeneity between the information resources, shared concepts for these digital museums are indispensable. This paper studies OOML-based ontologies and its services for information retrieval in UDMGrid, including the object oriented ontology construction and ontology mapping, in which a novel inheritance mechanism is proposed to eliminate logic confusion. On the basis of OOML-based ontologies, ontology services are developed to assist the information retrieval by transforming global concepts to local concepts.

Xixi Luo, Xiaowu Chen

Networking

A Hybrid Integrated QoS Multicast Routing Algorithm in IP/DWDM Optical Internet

An integrated QoS multicast routing algorithm in IP/DWDM optical Internet is proposed in this paper. Considering load balancing, given a multicast request and flexible QoS requirement, to find a QoS multicast routing tree is NP-hard. Thus, a hybrid algorithm based on simulated annealing and tabu search is introduced to construct the cost suboptimal QoS multicast routing tree, embedding the wavelength assignment procedure based on segment and wavelength graph ideas. Hence, the multicast routing and wavelength assignment is solved integratedly. Simulation results have shown that the proposed algorithm is both feasible and effective.

Xingwei Wang, Jia Li, Min Huang
An Efficient Distributed Broadcasting Algorithm for Ad Hoc Networks

In this paper, an efficient distributed heuristic-based algorithm is presented, which is based on joint distance-counter threshold scheme. It features a distributed manner by each node in the network needing no global information. Each node in an ad hoc network receives the message from its neighbors and decides whether to operate retransmitting or not according to the signal strength and times of the receiving messages. The algorithm has superiority such as reliability, rebroadcast saving, less communication overhead for broadcasting task, localized and parameter-less behaviors, so it is easy to operate and possesses a good performance in mobile ad hoc communication environments. A comparison with several other existing algorithms is conducted. It shows by simulation results that the new algorithm is more efficient than others.

Qiang Sun, Layuan Li
Chaos-Based Dynamic QoS Scheme and Simulating Analysis

This paper takes use of chaos related theories to analyze the real network traffic about its chaotic properties and prediction attributes. Owning to the good performance of chaos-based prediction in short term forecasting, the prediction-based DiffServ framework and the Dynamic QoS scheme are firstly given in the paper. The OPNET-based simulating result shows that the QoS performaces and the network’s throughputs in heavy-load environment are all improved remarkably, comparing with the traditional static QoS configuring and measure-based dynamic QoS setting methods.

Qigang Zhao, Qunzhan Li
Dynamic Delaunay Triangulation for Wireless Ad Hoc Network

Geometric routing protocols benefit from localized Delaunay triangulation, which can guarantee the delivery of packet and bound the length of route. In this paper we propose a localized algorithm to build Delaunay triangulation in wireless ad hoc network. The algorithm considers not only stationary situation but also dynamic situation in which nodes can dynamically join and leave the network. The communication cost of the algorithm is O(

nlogn

). Therefore, our algorithm is applicable in wireless sensor network, in which nodes dynamically join and leave network. We also prove the correctness of the algorithm.

Ming Li, XiCheng Lu, Wei Peng
Energy Efficient Multipath Routing in Large Scale Sensor Networks with Multiple Sink Nodes

Due to the battery resource constraint, it is a critical issue to save energy in wireless sensor networks, particularly in large sensor networks. One possible solution is to deploy multiple sink nodes simultaneously. In this paper, we propose a protocol called MRMS (Multipath Routing in large scale sensor networks with Multiple Sink nodes) which incorporates multiple sink nodes, a new path cost metric for improving path selection, dynamic cluster maintenance and path switching to improve energy efficiency. MRMS is shown to increase the lifetime of sensor nodes substantially compared to other algorithms based on a series of simulation experiments.

Yuequan Chen, Edward Chan, Song Han
FLC: A Novel Dynamic Buffer Tuner for Shortening Service Roundtrip Time over the Internet by Eliminating User-Level Buffer Overflow on the Fly

The proposed Fuzzy Logic Controller (FLC) is for dynamic buffer tuning at the user/server level. It eliminates buffer overflow on-line by ensuring that the buffer length always cover the queue length adaptively. The FLC contributes to prevent the AQM (active queue management) resources dished out at the system level from being wasted and to shorten the service roundtrip time (RTT) by reducing retransmission caused by user-level buffer overflow at the reception side. Combining fuzzy logic and the conventional PIDC model creates the FLC that operates with the {0, Δ}

2

objective function. The fuzzy logic maintains the given safety margin about the reference point, which is symbolically represented by “0” in {0, Δ}

2

. The short execution time of the FLC makes it suitable for supporting time-critical applications over the Internet.

Wilfred W. K. Lin, Allan K. Y. Wong, Tharam S. Dillon
Intelligent Congestion Avoidance in Differentiated Service Networks

Active Queue management (AQM) takes a trade-off between link utilization and delay experienced by data packets. From the viewpoint of control theory, it is rational to regard AQM as a typical regulation system. Although PI controller for AQM outperforms RED algorithm, the mismatches in simplified TCP flow model inevitably degrades the performance of controller designed with classic control theory. The Differentiated Service (Diff-Serv) architectures are proposed to deliver Quality of Service (QoS) in TCP/IP networks. The aim of this paper is to design an active queue management system to secure high utilization, bounded delay and loss, while the network complies with the demands each traffic class sets. To this end, predictive control strategy is used to design the congestion controller. This control strategy is suitable for plants with time delay, so the effects of round trip time delay can be reduced suing predictive control algorithm in comparison with the other exciting control algorithms. Simulation results of the proposed control action for the system with and without round trip time delay, demonstrate the effectiveness of the controller in providing queue management system.

Farzad Habibipour, Ahmad Faraahi, Mehdi Glily
Rule-Based Anomaly Detection of Inter-domain Routing System

Inter-domain routing (IDR) system is a critical part of the Internet infrastructure. However, anomalies exist in BGP routing behaviors because of BGP misconfigurations, router malfunctions or deliberate attacking. To help secure the IDR system, this paper presents a rule-based framework and a rich set of detection rules to identify the abnormal routing behaviors. The detection rules are categorized into General Anomaly-detection Rules (GADRs) and Special Anomaly-detection Rules (SADRs), and they work together with the Basic Models and the Generated Models of the Internet respectively. Under the proposed framework, a prototype system, ISP-Health, is implemented, which can find out various abnormal routes and complex hidden routing attacks.

Peidong Zhu, Xin Liu, Mingjun Yang, Ming Xu
Transaction of Web Services Based on Struts

There are many frameworks for web applications and Struts is one of them with a collection of Java code designed to help you build solid applications while saving time. It provides the basic skeleton and plumbing, and takes complex applications as a series of basic components: Views, Action classes, and Model components. Web services are new technology for next generation Internet. In this paper, Struts-enabled framework is first described and then a transaction scenario is discussed for web services applications.

Gong-Xuan Zhang, Ping-Li Wang, Wen Chen

Applied Technologies

A Method of Aggregate Query Matching in Semantic Cache for Massive Database Applications

Aggregate queries are frequent in massive database applications. Their execution tends to be time consuming and costly. Therefore efficiently executing aggregate queries is very important. Semantic cache is a novel method for aiding query evaluation that reuses results of previously answered queries. But little work has been done on semantic cache involving aggregate queries. This is a limiting factor in its applicability. To use semantic cache in massive database applications, it is necessary to extend semantic cache to process aggregate query. In this paper, query matching is identified as a foundation for answering aggregate query by semantic caches. Firstly a formal semantic cache model for aggregate query is proposed. Based on this model, we discuss aggregate query matching. Two algorithms are presented for aggregate query matching. These two algorithms have been implemented in a massive database application project. The practice shows the algorithms are efficient.

Jianyu Cai, Yan Jia, Shuqiang Yang, Peng Zou
A Parallel Modular Exponentiation Scheme for Transformed Exponents

This paper introduces an efficient method to compute modular exponentiation operations in parallel. For the parallel process of a modular exponentiation operation, the exponent is transformed into mixed radix digits first. Each digit is then an exponent for a partial result of the modular exponentiation operation. Because the computing processes for these partial results are highly independent, they can be carried out concurrently. The bases in these partial exponentiation operations can be pre-computed and used till the exponent moduli set changed. If the largest mixed radix digit is

k

-bits with respect to

m

exponent moduli, the time complexity for the proposed scheme is then

k

+

log

2

m

. The performing complexity is very efficient, compared with other methods. Since the comparison is based on the same modular multiplication hardware, the performance is better if the fewer operations required. In the scenario of two exponent moduli, the performance improvment is approximately 40%. Finally, the proposed scheme is presented with a parallel algorithm for which the computing architecture is also illustrated in the paper.

Chin-Chen Chang, Yeu-Pong Lai
Content Selection Model for Adaptive Content Delivery

In order to adapt content delivery to different client capabilities and preferences, we propose a content selection model to automatically classify HTML content based on its functionality, then map client descriptions on preferences and device capabilities into our classification scheme, and finally selectively deliver the content which users want and which devices can handle. The experiment shows that our content selection model could reduce HTML object size, object latency and page latency. Therefore, it is effective in saving network resources and improving clients’ access experiences.

Chen Ding, Shutao Zhang, Chi-Hung Chi
Dynamic Service Provisioning for Multiplayer Online Games

Multiplayer online games have become a popular class of distributed applications with an enormous amount of running Internet-based game sessions. The basic concept of how to provide game services for users has not changed for years: High-bandwidth, dedicated game servers are statically set up to continuously run game sessions, regardless of how many users actually play. This straightforward approach is inefficient, because it does not take the current user demand into account, thus wasting resources. In this paper, we present a novel system architecture for organizing dynamic, on-demand game services for single-server online games. Our system allows users to book game services for immediate play or some time in advance. The system takes the users’ demands into account and dynamically sets up the required server resources in an efficient way. In contrast to the usually offered flat-rate rental of servers on at least a monthly basis, our system allows to charge users depending on the actual services usage and to incorporate new pay-per-use business models.

Jens Müller, Rafael Schwerdt, Sergei Gorlatch
Principal Component Analysis for Distributed Data Sets with Updating

Identifying the patterns of large data sets is a key requirement in data mining. A powerful technique for this purpose is the principal component analysis (PCA). PCA-based clustering algorithms are effective when the data sets are found in the same location. In applications where the large data sets are physically far apart, moving huge amounts of data to a single location can become an impractical, or even impossible, task. A way around this problem was proposed in [10], where truncated singular value decompositions (SVDs) are computed locally and used to reduce the communication costs. Unfortunately, truncated SVDs introduce local approximation errors that could add up and would adversely affect the accuracy of the final PCA. In this paper, we introduce a new method to compute the PCA without incurring local approximation errors. In addition, we consider the situation of updating the PCA when new data arrive at the various locations.

Zheng-Jian Bai, Raymond H. Chan, Franklin T. Luk
Priority Conscious Transaction Routing in a Real-Time Shared Disks Cluster

A great deal of research indicates that the shared disks (SD) cluster is suitable to high performance transaction processing. However, the aggregation of SD cluster with real-time processing has not been investigated. By adopting cluster technology, the real-time services will be highly available and can exploit inter-node parallelism. In this paper, we propose a priority conscious transaction routing algorithm for a real-time SD cluster, which allocates real-time transactions to a node in the SD cluster. Unlike traditional routing algorithms that consider transaction affinity and load balancing only, our algorithm also considers transaction priorities inherent to real-time applications. We evaluate the performance of our algorithm under a wide variety of real-time workloads. The experiment results show that our algorithm outperforms both pure priority-based algorithms and pure affinity-based algorithms.

Kyungoh Ohn, Sangho Lee, Haengrae Cho
Probabilistic Continuous Update Scheme in Location Dependent Continuous Queries

It is difficult to maintain the exact location of mobile objects due to the limited resources in a mobile network. A consequence of this problem is that the update cost for a location-dependent continuous query for moving objects can be quite high using traditional methods. In this paper, we propose a probabilistic update method to maintain the fidelity of the query results without incurring significant update cost. Our scheme makes use of two types of updates, one to keep the uncertainty of the mobile object’s position within a specific confidence interval, and the other using probability that the moving object’s location uncertainty will affect the query result as the threshold to decide whether an update should be generated or not. The effectiveness of our approach is demonstrated using a series of simulation experiments.

Song Han, Edward Chan
SIP-Based Adaptive Multimedia Transmissions for Wired and Wireless Networks

SIP (Session Initiation Protocol) is a signaling protocol standardized by IETF, aiming to manage the multimedia transmission sessions among different parties. This paper illustrates an adaptive multimedia transmission system for wired and wireless networks based on SIP with protocol selection mechanism for a certain level of QoS guarantee. In our system, SIP is not only used for call setup signaling but also for carrying the information in the protocol selection. Using Agent Server, our system can select the most suitable protocol for adapting different situations intelligently during the connections and data buffering service is also provided for various media data flows between the end users with acceptable QoS level without any interruption and disconnection regardless of types of devices, platforms and protocols used.

Weijia Jia, Man-Ching Yuen
WM+: An Optimal Multi-pattern String Matching Algorithm Based on the WM Algorithm

The WM algorithm, designed by Sun Wu and Udi Manber, is considered the fastest multi-pattern string matching algorithm in practice except when the pattern number is very large or the alphabet size is small[2]. Theoretically, the scanning time of WM is average-optimal (i.e. O(

n

log

σ

(

rm

)/

m

)), but in the worst case, its scanning time can not be evaluated at all. The maximum shift of the original WM algorithm is m-B+1, where

m

is the minimum length of all patterns and B is the

q

-gram size. The tuned WM algorithm (abbreviated as WM+) can reach higher performance by improving the

shift

table building algorithm and combining the AC algorithm with the original WM algorithm. And the scanning time of the WM+ algorithm in the worst case is predictable. Experiments show that the scanning time of the WM+ algorithm is less or not great than that of the WM algorithm for varied size of

m

and number of patterns, especially in the worst case.

Xunxun Chen, Binxing Fang, Lei Li, Yu Jiang
Backmatter
Metadaten
Titel
Advanced Parallel Processing Technologies
herausgegeben von
Jiannong Cao
Wolfgang Nejdl
Ming Xu
Copyright-Jahr
2005
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32107-1
Print ISBN
978-3-540-29639-3
DOI
https://doi.org/10.1007/11573937