
2013 | Book

Network and Parallel Computing

10th IFIP International Conference, NPC 2013, Guiyang, China, September 19-21, 2013. Proceedings

Edited by: Ching-Hsien Hsu, Xiaoming Li, Xuanhua Shi, Ran Zheng

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this Book

This book constitutes the proceedings of the 10th IFIP International Conference on Network and Parallel Computing, NPC 2013, held in Guiyang, China, in September 2013. The 34 papers presented in this volume were carefully reviewed and selected from 109 submissions. They are organized in topical sections named: parallel programming and algorithms; cloud resource management; parallel architectures; multi-core computing and GPU; and miscellaneous.

Table of Contents

Frontmatter

Session 1: Parallel Programming and Algorithms

A Virtual Network Embedding Algorithm Based on Graph Theory

Network virtualization is a promising way of removing the inherent ossification of the Internet, and has been steadily attracting researchers' attention in recent years. The major challenges in this field are improving the efficiency of the virtual network embedding procedure and raising the rate at which virtual network requests are successfully accepted by the substrate network. This paper introduces a new virtual network embedding algorithm based on node similarity, that is, the similarity between virtual nodes and substrate nodes. Specifically, by calculating the degree of each node in both the virtual network and the substrate network (the number of links incident to it), the algorithm achieves better topological mapping results between the virtual network and the substrate network.
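The abstract does not spell out the mapping procedure; the idea of matching virtual and substrate nodes by degree similarity can be illustrated with a minimal, hypothetical greedy sketch (the node names, example graphs and the greedy strategy below are illustrative assumptions, not the authors' algorithm):

```python
# Illustrative sketch only: greedy node mapping by degree similarity.
# This is NOT the paper's embedding algorithm; it just shows the idea
# of comparing virtual-node and substrate-node degrees.

def degrees(links):
    """Count the number of links incident to each node."""
    deg = {}
    for u, v in links:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def embed_by_degree(virtual_links, substrate_links):
    vdeg = degrees(virtual_links)
    sdeg = degrees(substrate_links)
    used, mapping = set(), {}
    # Map high-degree virtual nodes first, each to the free substrate
    # node whose degree is closest (most "similar").
    for vn in sorted(vdeg, key=vdeg.get, reverse=True):
        best = min((sn for sn in sdeg if sn not in used),
                   key=lambda sn: abs(sdeg[sn] - vdeg[vn]))
        mapping[vn] = best
        used.add(best)
    return mapping

# Example: a 3-node virtual triangle onto a 5-node substrate graph.
print(embed_by_degree([("a", "b"), ("b", "c"), ("a", "c")],
                      [(1, 2), (2, 3), (3, 4), (4, 5), (2, 4), (1, 3)]))
```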

Zhenxi Sun, Yuebin Bai, Songyang Wang, Yang Cao, Shubin Xu
Access Annotation for Safe Program Parallelization

The safety of speculative parallelization depends on monitoring all program accesses to shared data. The problem is especially difficult in software-based solutions. To date, automatic techniques use either program instrumentation, which can be costly, or virtual memory protection, which incurs false sharing. In addition, not all accesses require monitoring. It is therefore worth considering a manual approach in which programmers insert access annotations to reduce the cost and increase the precision of program monitoring.

This paper presents an interface for access annotation and two techniques to check the correctness of user annotations, i.e., whether all parallel executions are properly monitored and guaranteed to produce the sequential result. It gives a quadratic-time algorithm to check the exponential number of parallel interleavings. The paper then uses the annotation interface to parallelize several programs with uncertain parallelism. It demonstrates the efficiency of program monitoring through a performance comparison with OpenMP, which does not monitor data accesses or guarantee safety.

Chen Ding, Lei Liu
Extracting Threaded Traces in Simulation Environments

Instruction traces play an important role in analyzing and understanding the behavior of target applications; however, existing tracing tools are built for specific platforms and rely heavily on compilers and operating systems. In this paper, we propose a precise thread-level instruction tracing approach for modern chip multi-processor simulators, which inserts instruction patterns into programs at the beginning of the main thread and slave threads. The target threads are identified and captured in a full-system simulator using the instruction patterns, without any modifications to the compiler or the operating system. We implemented our approach in the GEM5 simulator and evaluated its accuracy on x86-Linux using standard benchmarks. We compared our traces to those collected by a Pin tool. Experimental results show that traces extracted by our approach exhibit high similarity to the traces collected by the Pin tool. Our approach to extracting traces can easily be applied to other simulators with minor modifications to their instruction execution engines.

Weixing Ji, Yi Liu, Yuanhong Huo, Yizhuo Wang, Feng Shi
A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors

LU decomposition is a widely used method for solving dense linear algebra problems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has become a popular way to accelerate LU decomposition. However, pipeline parallelism and memory bandwidth utilization are low when LU decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism among all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance of the SIMD processor.
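For reference, the baseline computation being pipelined is ordinary right-looking LU decomposition; a minimal NumPy sketch (without pivoting, and without the paper's SIMD pipelining or memory-coalescing transformations) is:

```python
import numpy as np

def lu_right_looking(a):
    """Plain right-looking LU without pivoting: A = L @ U.
    Each column of L and each trailing update depends on the previous
    step; this is the dependence structure a fine-grained pipeline
    has to respect."""
    a = a.astype(float).copy()
    n = a.shape[0]
    for k in range(n):
        a[k+1:, k] /= a[k, k]                                # column of L
        a[k+1:, k+1:] -= np.outer(a[k+1:, k], a[k, k+1:])    # trailing update
    l = np.tril(a, -1) + np.eye(n)
    u = np.triu(a)
    return l, u

a = np.array([[4., 3.], [6., 3.]])
l, u = lu_right_looking(a)
print(np.allclose(l @ u, a))  # True
```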

Kai Zhang, ShuMing Chen, Wei Liu, Xi Ning
FRESA: A Frequency-Sensitive Sampling-Based Approach for Data Race Detection

Concurrent programs are difficult to debug due to their inherent concurrency and nondeterminism, and race conditions are one of the main problems. Previous work on dynamic race detection includes fast but imprecise methods that report false alarms, and slow but precise ones that never do. Some researchers have combined the two methods, but the overhead remains substantial. This paper exploits the insight that full recording in the detector is unnecessary in most cases, and that even prior sampling methods leave room to reduce overhead while preserving precision. In other words, we can use a frequency-sensitive sampling approach. With our sampling-dispatch model, we can eliminate most unnecessary detection overhead. Experimental results on the DaCapo benchmarks show that our heuristic sampling race detector is faster and incurs lower overhead than traditional race detectors, with no loss in precision and no false alarms.
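The abstract only names frequency-sensitive sampling; a hypothetical sketch of the general idea (sampling an access site less often the more frequently it executes; the thresholds and decay policy are invented for illustration, not the paper's dispatch model) might look like:

```python
import random
from collections import defaultdict

class FrequencySensitiveSampler:
    """Illustrative only: decide whether to hand a memory access to the
    precise race detector, based on how often its code site executes."""
    def __init__(self, base_rate=1.0, decay=0.5, floor=0.01):
        self.counts = defaultdict(int)
        self.base_rate, self.decay, self.floor = base_rate, decay, floor

    def should_monitor(self, site):
        self.counts[site] += 1
        # Hot sites get sampled exponentially less often, but never
        # below a floor, so rarely executed accesses are almost always checked.
        rate = max(self.floor,
                   self.base_rate * self.decay ** (self.counts[site] // 1000))
        return random.random() < rate

sampler = FrequencySensitiveSampler()
if sampler.should_monitor(site="Foo.java:42"):
    pass  # forward this access to the full (precise) detector
```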

Neng Huang, Zhiyuan Shao, Hai Jin
One-to-One Disjoint Path Covers in DCell

DCell has been proposed as a server-centric structure for data center networks, and is one of the most important such structures. DCell can support millions of servers with outstanding network capacity and provides good fault tolerance using only commodity switches. In this paper, we prove that there exist r vertex-disjoint paths {P_i | 1 ≤ i ≤ r} between any two distinct vertices u and v of DCell_k (k ≥ 0), where r = n + k − 1 and n is the number of vertices of DCell_0. The result is optimal because any vertex in DCell_k has exactly r = n + k − 1 neighbors.
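As a hedged numerical illustration of the stated bound (the values n = 4 and k = 2 are examples, not parameters from the paper):

```latex
% Worked instance of r = n + k - 1: if DCell_0 has n = 4 servers and
% we consider DCell_2 (k = 2), then
\[
  r = n + k - 1 = 4 + 2 - 1 = 5 ,
\]
% i.e., five vertex-disjoint paths exist between any two distinct vertices.
```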

Xi Wang, Jianxi Fan, Baolei Cheng, Wenjun Liu, Yan Wang

Session 2: Cloud Resource Management

A Network-Aware Virtual Machine Allocation in Cloud Datacenter

In a cloud computing environment, virtual machine allocation is an important task for providing infrastructure services. Generally, the datacenters on which a cloud computing platform runs are distributed over a wide area network, so communication cost should be taken into consideration when allocating VMs across servers in multiple datacenters. A network-aware VM allocation algorithm for the cloud is developed. It tries to minimize the communication cost and latency between servers, while satisfying users' requirements on the number of VMs, VM configurations and communication bandwidths. Specifically, a two-dimensional knapsack algorithm is applied to solve this problem. The algorithm is evaluated and compared with others through experiments, which show satisfactory results.
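The paper's exact formulation is not given in the abstract; as a hedged illustration of the two-dimensional knapsack ingredient, a standard 0/1 knapsack DP with two capacity dimensions (here, hypothetically, CPU cores and bandwidth on one server) can be sketched as:

```python
def knapsack_2d(vms, cpu_cap, bw_cap):
    """vms: list of (cpu_demand, bw_demand, value).
    Returns the maximum total value packable on one server.
    Classic 0/1 knapsack with two capacities, O(len(vms)*cpu_cap*bw_cap)."""
    best = [[0] * (bw_cap + 1) for _ in range(cpu_cap + 1)]
    for cpu, bw, value in vms:
        # iterate capacities downwards so each VM is used at most once
        for c in range(cpu_cap, cpu - 1, -1):
            for b in range(bw_cap, bw - 1, -1):
                best[c][b] = max(best[c][b], best[c - cpu][b - bw] + value)
    return best[cpu_cap][bw_cap]

# Hypothetical example: three VMs on a server with 8 cores and 10 bandwidth units.
print(knapsack_2d([(4, 6, 30), (3, 3, 14), (5, 4, 16)], cpu_cap=8, bw_cap=10))
```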

Yan Yao, Jian Cao, Minglu Li
Research on the RRB+ Tree for Resource Reservation

The performance of the underlying data structure has a significant impact on the overall performance of advance resource reservation in distributed computing. Because the query and update operations of the B+ tree are highly efficient, this paper proposes a B+ tree structure suitable for resource reservation, the RRB+ tree. We also design and implement the corresponding query, insertion and deletion algorithms. Unlike the B+ tree, which inserts or deletes one key at a time, the RRB+ tree inserts one reservation request and deletes one tree node at a time, and it offers a higher precision of expression. With a fixed reservation admission control algorithm and the same acceptance rate, the experimental results show that the RRB+ tree is easier to operate in complex and changing network environments and achieves higher utilization of storage space.

Libing Wu, Ping Dang, Lei Nei, Jianqun Cui, Bingyi Liu
Totoro: A Scalable and Fault-Tolerant Data Center Network by Using Backup Port

Scalability and fault tolerance have become fundamental challenges for data center network structures due to the explosive growth of data. Neither structures proposed in the area of parallel computing nor structures based on a tree hierarchy are able to satisfy these two demands. In this paper, we propose Totoro, a scalable and fault-tolerant network that handles these challenges by using backup built-in Ethernet ports. We connect a group of servers to an intra-switch to form a basic partition, then use half of the backup ports to connect basic partitions with inter-switches to build a larger partition. Totoro is hierarchically and recursively defined, and a high-level Totoro is constructed from many low-level Totoros. Totoro can scale to millions of nodes. We also design a fault-tolerant routing protocol whose capability is very close to the performance bound. Our experiments show that Totoro is a viable interconnection structure for data centers.

Junjie Xie, Yuhui Deng, Ke Zhou
A Cloud Resource Allocation Mechanism Based on Mean-Variance Optimization and Double Multi-Attribution Auction

As a new kind of commercial model, cloud computing can integrate various kinds of resources in the network. Resource providers offer these resources to users in the form of services and receive corresponding profits. To make more rational use of cloud resources, an effective allocation mechanism is necessary. In this paper, the price attribute and non-price attributes of both trading parties are analyzed. A support vector machine algorithm is used to predict the price and thereby determine the quote and bid. Then, a BP neural network algorithm is used to map the non-price attributes to a quality index. Finally, to maximize the total satisfaction of resource providers and resource consumers, a mean-variance optimization algorithm is adopted to obtain the optimized cloud resource allocation scheme. Simulation results show that the proposed mechanism is feasible and effective.

Chengxi Gao, Xingwei Wang, Min Huang
ITC-LM: A Smart Iteration-Termination Criterion Based Live Virtual Machine Migration

Live migration of virtual machines (VMs) plays an important role in grids, clouds and datacenters, and has become a cornerstone of resource management in virtualized systems. The efficiency of live migration depends on the downtime, the total migration time and the total transferred data. However, while migrating a memory-intensive VM, Xen/KVM often performs many useless iterations of memory copying in an attempt to reach an expected downtime that can never be achieved, leading to a great deal of useless data transfer and an unacceptably long total migration time, and consuming large amounts of network bandwidth and CPU resources while transferring memory from one node to another. Hence, a critical task is to determine the optimal time to terminate the copy iteration during live migration. In this paper, we propose a smart iteration-termination criterion based live migration, termed ITC-LM, which adaptively decides when to terminate the iteration. We have implemented ITC-LM in KVM/QEMU. The improvement is significant, especially when migrating a memory-intensive VM: the experimental results show that our approach decreases the total transferred data by 50.33% on average without impairing the migration downtime.
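The abstract does not disclose the criterion itself; a hypothetical sketch of a generic pre-copy termination check (the thresholds and the convergence test below are assumptions for illustration, not ITC-LM) could look like:

```python
def should_stop_precopy(dirty_pages, dirty_rate, bandwidth,
                        target_downtime, max_rounds, round_no,
                        page_size=4096):
    """Illustrative pre-copy termination check, not the paper's ITC-LM.
    dirty_pages: pages still dirty; dirty_rate: pages dirtied per second;
    bandwidth: bytes per second available for migration."""
    pages_per_sec = bandwidth / page_size
    est_downtime = dirty_pages * page_size / bandwidth
    # Stop if the remaining dirty set is small enough to meet the downtime
    # target, or if the workload dirties memory faster than we can send it
    # (further iterations are useless), or if too many rounds have passed.
    return (est_downtime <= target_downtime
            or dirty_rate >= pages_per_sec
            or round_no >= max_rounds)

# Hypothetical numbers: 20k dirty pages, 8k pages/s dirty rate, 1 Gb/s link.
print(should_stop_precopy(20000, 8000, 125_000_000, 0.3, 30, 5))
```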

Liangwei Zhu, Jianhai Chen, Qinming He, Dawei Huang, Shuang Wu
A Scheduling Method for Multiple Virtual Machines Migration in Cloud

Infrastructure as a Service (IaaS) is important in cloud computing, providing on-demand virtual machines (VMs) to users. Resource management plays an important role in IaaS clouds: it deploys and relocates virtual machines on available hosts for different objectives, such as load balancing, power saving and improving resource utilization. The virtual machine placement problem can be considered a bin packing problem, and many researchers use heuristic algorithms to solve it. However, these studies focus on finding an optimized solution to the bin packing problem of virtual machine placement and do not consider the scheduling of the multiple virtual machine migrations involved in the transfer from one V-P mapping to another. Because of the large overhead produced by virtual machine migration, optimizing the process of migrating multiple virtual machines can reduce the overhead of resource management in IaaS clouds and accelerate the migration process. In this paper, we analyze and formalize the multiple virtual machine migration problem and propose a scheduling method that reduces the number of VM migrations and accelerates the migration process. Experiments show that our method can decrease the number of VM migrations, reduce the traffic and accelerate the process of migrating multiple virtual machines.

Zhenzhong Zhang, Limin Xiao, Xianchu Chen, Junjie Peng

Session 3: Parallel Architectures

Speeding Up Galois Field Arithmetic on Intel MIC Architecture

Galois Field arithmetic is the basis of LRC, RS and many other erasure coding approaches. Traditional implementations of Galois Field arithmetic use multiplication tables or discrete logarithms, which limit its speed. The Intel Many Integrated Core (MIC) architecture provides 60 cores on a chip and very wide 512-bit SIMD instructions, making it attractive for data-intensive applications. This paper demonstrates how to leverage SIMD instructions and shared-memory multiprocessing on MIC to perform Galois Field arithmetic. The experiments show that the performance of the computation is significantly enhanced.
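For context, the "discrete logarithm" baseline the abstract refers to can be sketched for GF(2^8); the primitive polynomial 0x11d and generator 2 are common choices in erasure coding but are assumptions here, and this scalar table lookup is the kind of operation the paper replaces with vectorized SIMD code:

```python
# Baseline GF(2^8) multiplication via log/antilog tables (scalar code,
# not the paper's MIC/SIMD implementation). Assumes the primitive
# polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d) with generator 2.
PRIM = 0x11d
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:          # reduce modulo the primitive polynomial
        x ^= PRIM
for i in range(255, 512):  # duplicate so EXP[LOG[a] + LOG[b]] needs no mod
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements via discrete logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(hex(gf_mul(0x57, 0x13)))  # field product of two example bytes
```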

Kai Feng, Wentao Ma, Wei Huang, Qing Zhang, Yili Gong
Analyzing the Characteristics of Memory Subsystem on Two Different 8-Way NUMA Architectures

Two NUMA architectures with different memory subsystems are experimentally analyzed in this paper. Running a benchmark with various access patterns reveals quite different memory-system characteristics between the Xeon E5620, which uses a Global Queue, and the LS 3A, which uses a typical crossbar switch, although the two platforms also share some features. Due to their different contention locations and mechanisms, the memory access model for the E5620 does not fit the LS 3A. Through the comparison, we find that one advantage of the LS 3A is that it obtains steady bandwidth for both local and remote threads, and it is fairer to local and remote accesses under some circumstances. Another finding is that the LS 3A is not as sensitive to remote access as the E5620, so non-local memory access causes no obvious performance degradation on it.

Qiuming Luo, Yuanyuan Zhou, Chang Kong, Guoqiang Liu, Ye Cai, Xiao-Hui Lin
Software/Hardware Hybrid Network-on-Chip Simulation on FPGA

In this paper, a software-and-hardware hybrid simulation method for CMP (chip multiprocessor) systems is designed, together with its performance model. In detail, the NoC (network-on-chip) module is simulated entirely by FPGA resources, and a software/hardware interaction interface to this module is provided so that the simulation software running on the on-chip soft core can cooperate with the NoC to complete the whole simulation. In other words, the most time-consuming and relatively fixed part is implemented in hardware and the rest in software, which maintains simulation flexibility and high performance owing to the compact on-chip design. We implement this design on a Xilinx Virtex 5 155T chip running at 100 MHz. Compared with a typical software counterpart, the NoC simulation is more than 3000 times faster, and the advantage widens further as the injection rate increases. Moreover, compared with another hybrid method that executes the software part on the host CPU, it is still considerably faster even though the host performance is much higher than that of the on-chip core.

Youhui Zhang, Peng Qu, Ziqiang Qian, Hongwei Wang, Weimin Zheng
Total Exchange Routing on Hierarchical Dual-Nets

The hierarchical dual-net (HDN) is a newly proposed interconnection network for massively parallel computers. The HDN is constructed on top of a symmetric product graph (the base network). A k-level hierarchical dual-net, HDN(B, k, S), contains $n_k=(2n_0)^{2^k}/(2\prod_{i=1}^{k}s_i)$ nodes, where S = {G_1, G_2, …, G_k}, each G_i is a super-node, s_i = |G_i| is the number of nodes in a super-node at level i for 1 ≤ i ≤ k, and n_0 is the number of nodes in the base network B. S is used mainly for adjusting the scale of the system. The node degree of HDN(B, k, S) is d_0 + k, where d_0 is the node degree of the base network. The HDN is node and edge symmetric and can contain a huge number of nodes with small node degree and short diameter. Total exchange is one of the densest communication patterns and is at the heart of numerous applications and programming models in parallel computing. In this paper, we show that total exchange routing can be done efficiently on the HDN.
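As a hedged numerical illustration of the node-count formula above (the base-network size and super-node sizes are invented for the example, not taken from the paper):

```latex
% Worked instance of the node-count formula, assuming n_0 = 4,
% s_1 = 1 and s_2 = 2:
\[
  n_1 = \frac{(2n_0)^{2^1}}{2 s_1} = \frac{8^2}{2 \cdot 1} = 32 ,
  \qquad
  n_2 = \frac{(2n_0)^{2^2}}{2 s_1 s_2} = \frac{8^4}{2 \cdot 1 \cdot 2} = 1024 .
\]
```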

Yamin Li, Wanming Chu
Efficiency of Flexible Rerouting Scheme for Maximizing Logical Arrays

In a multiprocessor array, some processing elements (PEs) fail to function normally due to hardware defects or soft faults caused by overheating, overload or occupation by other running applications. The fault-tolerant reconfiguration considered in this paper reorganizes the fault-free PEs of a processor array with faults into a logical array with a regular mesh topology by changing the interconnections among PEs. This paper studies the efficiency of a flexible rerouting scheme that maximizes the usage of fault-free PEs, by developing an efficient reconfiguration algorithm without backtracking. The proposed algorithm constructs each logical column from left to right over candidate PE sets, and updates the candidate sets by excluding PEs that can no longer be used once a logical column is formed. We also prove that the proposed heuristic algorithm generates a maximum-size logical array in linear time. Experimental results show that 123 logical columns can be constructed on 256 × 256 host arrays with a fault density of 30%, an improvement of 51% over the previous algorithm, which produces only 82 logical columns. Furthermore, our algorithm generates target arrays with a harvest of over 56% on host arrays with a fault density of 50%, whereas the previous work cited in this paper fails to construct any target array in this case.

Guiyuan Jiang, Jigang Wu, Jizhou Sun
An Efficient Crosstalk-Free Routing Algorithm Based on Permutation Decomposition for Optical Multi-log2 N Switching Networks

Optical switching networks (OSNs) based on optical directional couplers (DCs) may be the most promising candidate for providing high switching rates as the speed mismatch between links (optical fibers) and switches becomes increasingly serious. Although such switches have many advantages, the DC suffers from an inherent crosstalk problem that can greatly degrade switching performance. Based on semi-permutations, a parallel decomposition algorithm called multi-decomposition is proposed in this paper for solving the optical crosstalk problem in optical multi-log_2 N switching networks. According to the number of planes in a multi-log_2 N network, the multi-decomposition is performed in parallel to partition a permutation into several sub-permutations, each of which is established without crosstalk within its plane. We demonstrate that our algorithm completely removes the crosstalk in optical multi-log_2 N networks when n is even, and that when n is odd crosstalk may be generated only in stage (n − 1)/2 (i.e., the middle stage), with probability at most $\frac{1}{2^{(n+1)/2}-1}$. In addition, our algorithm achieves low complexity for decomposing a permutation due to its parallelism, so that any permutation can be realized in multi-log_2 N networks under the constraint of avoiding crosstalk.

Xiaofeng Liu, Youjian Zhao, Yajuan Wu
Conditional Diagnosability of Complete Josephus Cubes

The growing size of multiprocessor systems increases their vulnerability to component failures. Fault diagnosis is the process of identifying faulty processors in a system through self-testing, and the diagnosability is an important parameter for measuring the reliability of an interconnection network. As a new measure of fault tolerance, conditional diagnosability can better evaluate the real diagnosability of interconnection networks. In this paper, we derive the conditional diagnosability of multiprocessor systems in terms of Complete Josephus Cubes CJC_n (n ≥ 8) under the comparison model.

Lishan Lu, Shuming Zhou
Circular Dimensional-Permutations and Reliable Broadcasting for Hypercubes and Möbius Cubes

Reliable broadcasting in interconnection networks can be achieved by constructing multiple independent spanning trees (ISTs) rooted at the same node. In this paper, we prove that there exist (n − 1)! sets of ISTs rooted at an arbitrary node of Q_n and M_n, based on circular dimensional-permutations of 0, 1, …, n − 1, for n ≥ 1. We also give a parallel algorithm, called BCIST, which extends the study of the IST problem for Q_n and M_n in the literature. Furthermore, simulation experiments on ISTs based on the JUNG framework, and different sets of disjoint paths between node 1 and any node v ∈ V(0-M_4)\{1} for 0-M_4, are also presented.

Baolei Cheng, Jianxi Fan, Jiwen Yang, Xi Wang

Session 4: Multi-core Computing and GPU

Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting

Frequent Itemset Mining (FIM) is one of the most investigated fields of data mining. Its goal is to find the most frequently occurring itemsets among the transactions within a database. Many methods have been proposed to solve this problem, and the Apriori algorithm is one of the best known for FIM in a transactional database. In this paper, a parallel FIM algorithm, Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting (APFMS), is presented. The algorithm utilizes new-generation graphics processing units (GPUs) to accelerate the mining process: the massive processing units of the GPU are used to speed up the frequent-itemset verification procedure on the OpenCL platform. The experimental results demonstrate that the proposed algorithm dramatically reduces computation time compared with previous methods.
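For orientation, the verification step the GPU accelerates is candidate support counting; a minimal serial Apriori-style sketch (pure Python with naive candidate generation, not the paper's OpenCL kernels) is:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Tiny serial Apriori sketch: returns frequent itemsets with their
    supports. The inner support count is the step a GPU can verify in
    parallel across candidates and transactions."""
    transactions = [set(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent, k, current = {}, 1, [frozenset([i]) for i in items]
    while current:
        # support counting ("verification") of all candidates of size k
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # generate size-(k+1) candidates from the surviving items
        items_k = sorted({i for c in survivors for i in c})
        current = [frozenset(c) for c in combinations(items_k, k + 1)]
        k += 1
    return frequent

print(apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], min_support=2))
```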

Yuan-Shao Huang, Kun-Ming Yu, Li-Wei Zhou, Ching-Hsien Hsu, Sheng-Hui Liu
Asymmetry-Aware Scheduling in Heterogeneous Multi-core Architectures

As threads of execution in a multi-programmed computing environment have different characteristics and hardware resource requirements, heterogeneous multi-core processors can achieve higher performance as well as better power efficiency than homogeneous multi-core processors. To fully tap this potential, OS schedulers need to be heterogeneity-aware so that they can match threads to cores according to the characteristics of both. We propose two heterogeneity-aware thread schedulers, PBS and LCSS. PBS makes scheduling decisions based on applications' sensitivity to large cores and assigns large cores to the applications that gain the most performance from them. LCSS balances the large-core resource among all applications. We have implemented both schedulers in Linux and evaluated their performance with the PARSEC benchmark on different heterogeneous architectures. Overall, PBS outperforms the Linux scheduler by 13.3% on average and up to 18%, while LCSS achieves a speedup of 5.3% on average and up to 6% over the Linux scheduler. PBS delivers good performance for both asymmetric and symmetric workloads, while LCSS is more suitable for scheduling symmetric workloads. In summary, PBS and LCSS provide repeatable performance measurements and better performance than the Linux OS scheduler.

Tao Zhang, Xiaohui Pan, Wei Shu, Min-You Wu
Scalable-Grain Pipeline Parallelization Method for Multi-core Systems

Parallelizing the great amount of legacy sequential programs is the most difficult challenge faced by multi-core designers. Existing compile-time parallelization methods are not suitable for exploiting the parallelism of streaming applications because of the obscured data dependences in C. In this paper, a software-pipelining method for multi-layer loops is proposed for streaming applications, to exploit the coarse-grained pipeline parallelism hidden in multi-layer loops. The proposed method consists of three major steps: 1) transform the task dependence graph of a streaming application to resolve intricate dependences, 2) schedule tasks to the multiprocessor system-on-chip with the objective of minimizing the maximum execution time over all pipeline stages, and 3) adjust the granularity of pipeline stages to balance the workload among stages. The efficiency of the method is validated by case studies of typical streaming applications on a multi-core embedded system.
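Steps 2 and 3 amount to splitting work into pipeline stages so that the slowest stage is as fast as possible; a hedged sketch of that sub-problem for a simple task chain (binary search over the bottleneck with a greedy feasibility check, a standard technique and not necessarily the authors' method):

```python
def balance_stages(task_times, num_stages):
    """Split an ordered chain of task execution times into num_stages
    contiguous stages, minimizing the maximum stage time (the pipeline
    bottleneck). Standard binary search plus greedy feasibility check."""
    def stages_needed(limit):
        count, current = 1, 0
        for t in task_times:
            if current + t > limit:
                count, current = count + 1, t
            else:
                current += t
        return count

    lo, hi = max(task_times), sum(task_times)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= num_stages:
            hi = mid
        else:
            lo = mid + 1
    return lo  # minimal achievable bottleneck time

# Hypothetical task chain (integer time units), split into 3 pipeline stages.
print(balance_stages([4, 2, 7, 1, 5, 3], num_stages=3))
```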

Peng Liu, Chunming Huang, Jun Guo, Yang Geng, Weidong Wang, Mei Yang
An Effective Approach for Vocal Melody Extraction from Polyphonic Music on GPU

Melody extraction from polyphonic music is a valuable but difficult problem in music information retrieval, and its large computational cost limits its application. Growing numbers of processing cores and increased bandwidth have made the GPU an ideal candidate for developing fine-grained parallel algorithms. In this paper, we present a parallel approach for salience-based melody extraction from polyphonic music using CUDA. For a 21-second polyphonic clip, the extraction time is cut from 3 seconds to 33 milliseconds on an NVIDIA GeForce GTX 480, which is up to 100 times faster. The increased performance allows melody extraction to be carried out in real-time applications, and also makes evaluation of the extraction on huge datasets possible. We give insight into how such significant speed gains are achieved and encourage the development and adoption of GPUs in the music information retrieval field.

Guangchao Yao, Yao Zheng, Limin Xiao, Li Ruan, Zhen Lin, Junjie Peng
Modified Incomplete Cholesky Preconditioned Conjugate Gradient Algorithm on GPU for the 3D Parabolic Equation

In this study, for solving the three-dimensional partial differential equation $u_t = u_{xx} + u_{yy} + u_{zz}$, an efficient parallel method based on the modified incomplete Cholesky preconditioned conjugate gradient algorithm (MICPCGA) on the GPU is presented. In our proposed method, we overcome the drawback that the MIC preconditioner is generally difficult to parallelize on the GPU due to the forward/backward substitutions, and present an efficient parallel implementation on the GPU. Moreover, we adopt a vector kernel for the sparse matrix-vector multiplication and optimize vector operations by grouping several of them into a single kernel. Numerical results show that our proposed forward/backward substitutions and MICPCGA on the GPU both achieve a significant speedup, and compared to an approximate inverse SSOR preconditioned conjugate gradient algorithm (SSORPCGA), our MICPCGA obtains a bigger speedup and outperforms it in solving the three-dimensional partial differential equation.
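As background, the outer iteration being accelerated is standard preconditioned conjugate gradient; a minimal NumPy sketch with a generic preconditioner solve is shown below (a simple Jacobi preconditioner stands in for the paper's modified incomplete Cholesky factorization):

```python
import numpy as np

def pcg(a, b, precond_solve, tol=1e-8, max_iter=1000):
    """Textbook preconditioned conjugate gradient for an SPD matrix a.
    precond_solve(r) approximately solves M z = r for the preconditioner M."""
    x = np.zeros_like(b)
    r = b - a @ x
    z = precond_solve(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        ap = a @ p
        alpha = rz / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        if np.linalg.norm(r) < tol:
            break
        z = precond_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD example with a Jacobi (diagonal) preconditioner standing in
# for the modified incomplete Cholesky factors used in the paper.
a = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
b = np.array([1., 2., 3.])
x = pcg(a, b, precond_solve=lambda r: r / np.diag(a))
print(np.allclose(a @ x, b))  # True
```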

Jiaquan Gao, Bo Li, Guixia He
Partition-Based Hardware Transactional Memory for Many-Core Processors

Transactional memory is an appealing technology that frees programmers from lock-based programming. However, most current hardware transactional memory systems are designed for multi-core processors and face challenges as the number of processor cores grows in many-core systems, such as inefficient utilization of transactional buffers and the unsolved problem of transactional buffer overflow. This paper proposes PM_TM, a hardware transactional memory for many-core processors. The system turns transactional buffers, which are traditionally private to processor cores, into shared resources by moving them from the L1 level to the L2 level, and uses a partition mechanism to provide logically independent and dynamically expandable transactional buffers to transactional threads. As a result, the solution utilizes transactional buffers more efficiently and mitigates the problem of transactional buffer overflow. The system is simulated and evaluated using the GEMS and Simics simulators with the STAMP benchmarks. Evaluation results show that the system achieves better performance and scalability than traditional solutions on many-core processors.

Yi Liu, Xinwei Zhang, Yonghui Wang, Depei Qian, Yali Chen, Jin Wu

Session 5: Miscellaneous

Roadside Infrastructure Placement for Information Dissemination in Urban ITS Based on a Probabilistic Model

Information dissemination is an important application in VANETs for traffic safety and efficiency. In urban areas, roadside infrastructure nodes can be deployed for information dissemination; however, it is inefficient and uneconomical to cover the whole urban area, so finding the optimal locations at which to place DPs is a research problem. Some work on this issue has to collect accurate trajectories of all vehicles, which is not practical in a real environment. In this paper, we propose a novel approach to DP placement in grid road networks that does not require knowing trajectories. Based on an analysis of the number of paths between two intersections, a probabilistic model is proposed to estimate vehicle trajectories. A theoretically optimal algorithm (OA) and two heuristic algorithms (KP-G and GA) are developed for the problem. Simulation results reveal that GA is scalable and has the highest coverage ratio on average.

Bo Xie, Geming Xia, Yingwen Chen, Ming Xu
Relay Hop Constrained Rendezvous Algorithm for Mobile Data Gathering in Wireless Sensor Networks

Recent research shows that significant energy savings can be achieved in wireless sensor networks (WSNs) by introducing a mobile collector (MC). One obvious bottleneck of such approaches is the large data collection latency caused by the low speed of the MC. In this paper, we propose an efficient rendezvous-based mobile data gathering protocol for WSNs, in which aggregated data is relayed to a Rendezvous Node (RN) within a bounded number of hops d. The algorithm design in the protocol jointly considers the MC tour and the data routing routes in aggregation trees. The effectiveness of the approach is validated through both theoretical analysis and extensive simulations.

Wenjun Liu, Jianxi Fan, Shukui Zhang, Xi Wang
Energy Efficient Task Scheduling in Mobile Cloud Computing

Cloud computing can enhance the computing capability of mobile systems by offloading. However, communication between the mobile device and the cloud is not free: transmitting large amounts of data to the cloud consumes much more energy than processing the data on the mobile device, especially under low-bandwidth conditions. Furthermore, some processing tasks can avoid transmitting large data between the mobile device and the server. Such tasks (e.g., encoding, rendering) act like compression algorithms, reducing the size of the raw data before it is sent to the server. In this paper, we present an energy efficient task scheduling strategy (EETS) to determine what kind of task, with what amount of data, should be offloaded under different environments. We have evaluated the scheduler using an Android smartphone. The results show that our strategy achieves 99% accuracy in choosing the right action to minimize system energy usage.
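The kind of decision EETS makes can be illustrated with a generic offloading energy model; the comparison below (local-computation energy versus transmission energy) is a common model from the offloading literature and is used here only as an assumption, not as the paper's actual scheduler:

```python
def should_offload(cycles, data_bytes, bandwidth_bps,
                   cpu_power_w=0.9, cpu_speed_hz=1.2e9, radio_power_w=1.3):
    """Hypothetical offload decision: offload only if sending the data
    costs less energy than computing the task locally.
    E_local = P_cpu * cycles / f;  E_tx = P_radio * 8*data / bandwidth."""
    e_local = cpu_power_w * cycles / cpu_speed_hz
    e_tx = radio_power_w * (8 * data_bytes) / bandwidth_bps
    return e_tx < e_local

# Example: a 5-gigacycle task over 2 MB of input data on a 5 Mb/s link.
print(should_offload(cycles=5e9, data_bytes=2e6, bandwidth_bps=5e6))
```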

Dezhong Yao, Chen Yu, Hai Jin, Jiehan Zhou
BotInfer: A Bot Inference Approach by Correlating Host and Network Information

Botnets are widely used in cyber-attacks and have become a serious threat to network security. Existing approaches can detect botnets effectively in certain environments, but problems remain when host-based or network-based detection approaches are used on their own, such as limited robustness of the detection tools, difficulty of global deployment and low precision. To address these problems, a novel detection approach called BotInfer is proposed. In BotInfer, host-based bot detection tools are deployed on some of the hosts, the network flows of all hosts are captured and analyzed, and the host detection results and flow information are correlated by a bot inference engine. Experiments show that BotInfer can effectively detect bot-infected hosts in the network: when the deployment rate of bot detection tools reaches 80%, the precision for hosts with detection tools is about 99%, and the precision for hosts without detection tools is about 86%.

Yukun He, Qiang Li, Yuede Ji, Dong Guo
On-Demand Proactive Defense against Memory Vulnerabilities

Memory vulnerabilities severely affect system security and availability. Although a number of solutions have been proposed to defend against memory vulnerabilities, most existing solutions protect the entire life cycle of the application or survive attacks only after detecting them. This paper presents OPSafe, a system that makes applications safely survive memory vulnerabilities for a period of time, from startup or at runtime, on users' demand. OPSafe can provide a hot-portable Green Zone of any size on demand, in which all subsequently allocated memory objects, including stack and heap objects, are reallocated and safely managed in a protected memory area. When users open the green zone, OPSafe uses comprehensive memory management in the protected area to adaptively allocate buffers at a multiple of their defined sizes and place them randomly. Combined with techniques that mask object frees, OPSafe can avoid objects overrunning each other, dangling pointer errors, and double free or invalid free errors. Once the green zone is closed, OPSafe clears away all objects in the protected area and then frees it. We have developed a Linux prototype and evaluated it using four applications containing a wide range of vulnerabilities. The experimental results show that OPSafe can conveniently create and destroy a hot-portable green zone in which a vulnerable application can survive crashes and eliminate erroneous execution.

Gang Chen, Hai Jin, Deqing Zou, Weiqi Dai
Mahasen: Distributed Storage Resource Broker

Modern systems are facing an avalanche of data and are being forced to handle more and more data-intensive use cases. These data come in many forms and shapes: sensors (RFID, Near Field Communication, weather sensors), transaction logs, the Web, social networks, etc. As an example, weather sensors across the world generate a large amount of data throughout the year. Handling these and similar data requires scalable, efficient, reliable and very large storage with support for efficient metadata-based searching. This paper presents Mahasen, a highly scalable storage for high-volume data-intensive applications built on top of a peer-to-peer layer. In addition to scalable storage, Mahasen also supports efficient searching, built on top of a Distributed Hash Table (DHT).

K. D. A. K. S. Perera, T. Kishanthan, H. A. S. Perera, D. T. H. V. Madola, Malaka Walpola, Srinath Perera
Probabilistic QoS Analysis of Web Services

In such a competitive world, quality assurance can make the difference between a successful business and bankruptcy. For Internet services, low-performance servers, high latency or overall poor service quality can translate into lost sales, user frustration and lost customers. In this paper, we propose a novel method for QoS metrication based on Hidden Markov Models. The techniques we show can be used to measure and predict the behavior of Web Services under several criteria, and can thus be used to rank services quantitatively rather than just qualitatively. We demonstrate the feasibility and usefulness of our methodology with experiments on real-world data. Our results show how the proposed methods can help the user automatically select the best available Web Service based on several metrics, among them system predictability and response-time variability.
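One building block of any HMM-based QoS model is evaluating how likely an observed sequence (e.g., discretized response-time levels) is under a trained model; a minimal forward-algorithm sketch with hypothetical parameters follows (the matrices are invented for illustration and are unrelated to the paper's trained models):

```python
import numpy as np

def hmm_likelihood(pi, trans, emit, observations):
    """Forward algorithm: probability of an observation sequence under an
    HMM with initial distribution pi, transition matrix trans and emission
    matrix emit (rows = hidden states, columns = observation symbols)."""
    alpha = pi * emit[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans) * emit[:, obs]
    return alpha.sum()

# Hypothetical 2-state model ("healthy"/"degraded") with 3 observable
# response-time levels (fast=0, medium=1, slow=2).
pi = np.array([0.8, 0.2])
trans = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
emit = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])
print(hmm_likelihood(pi, trans, emit, observations=[0, 0, 1, 2]))
```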

Waseem Ahmed, Yong Wei Wu
A Novel Search Engine to Uncover Potential Victims for APT Investigations

Advanced Persistent Threats (APTs) are sophisticated, target-oriented cyber attacks that often leverage customized malware and bot control techniques to control victims and remotely access valuable information. Because APT malware samples are specific and few, signature-based and learning-based approaches are weak at detecting them. In this paper, we take a more flexible strategy: developing a search engine that lets APT investigators quickly uncover potential victims based on the attributes of a known APT victim. We test our approach on a real APT case that occurred in a large enterprise network consisting of several thousand computers running a commercial antivirus system. In our evaluation, the search engine uncovered 33 other, previously unknown victims infected by the APT malware. The search engine is implemented on the Hadoop platform; for 440 GB of data, it can answer queries in 2 seconds.

Shun-Te Liu, Yi-Ming Chen, Shiou-Jing Lin
Backmatter
Metadata
Title
Network and Parallel Computing
Edited by
Ching-Hsien Hsu
Xiaoming Li
Xuanhua Shi
Ran Zheng
Copyright year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-40820-5
Print ISBN
978-3-642-40819-9
DOI
https://doi.org/10.1007/978-3-642-40820-5