
About this Book

The two-volume set LNCS 11944-11945 constitutes the proceedings of the 19th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2019, held in Melbourne, Australia, in December 2019.

The 73 full and 29 short papers presented were carefully reviewed and selected from 251 submissions. The papers are organized in topical sections on: Parallel and Distributed Architectures, Software Systems and Programming Models, Distributed and Parallel and Network-based Computing, Big Data and its Applications, Distributed and Parallel Algorithms, Applications of Distributed and Parallel Computing, Service Dependability and Security, IoT and CPS Computing, Performance Modelling and Evaluation.

Table of Contents

Frontmatter

Parallel and Distributed Architectures

Frontmatter

SPM: Modeling Spark Task Execution Time from the Sub-stage Perspective

Tasks are the basic unit of Spark application scheduling, and their execution is affected by the various configurations of the Spark cluster. Therefore, predicting task execution time is a challenging job. In this paper, we analyze the features of the task execution procedure at different stages and propose a method for predicting the execution time of each sub-stage. Moreover, the correlated time overheads of GC and shuffle spill are analyzed in detail. As a result, we propose SPM, a task-level execution time prediction model. SPM can be used to predict the task execution time of each stage according to the input data size and the configured parallelism. We further apply SPM to the Spark network emulation tool SNemu, which can effectively determine the start time of each shuffle procedure for emulation. Experimental results show that the prediction method achieves high accuracy on a variety of Spark benchmarks in HiBench.

Wei Li, Shengjie Hu, Di Wang, Tianba Chen, Yunchun Li

Improving the Parallelism of CESM on GPU

Community Earth System Model (CESM) is one of the most popular climatology research models. However, the computation of CESM is quite expensive and usually lasts for weeks even on high-performance clusters. In this paper, we propose several optimization strategies to improve the parallelism of three hotspots in CESM on GPU. Specifically, we analyze the performance bottlenecks of CESM and propose corresponding GPU accelerations. The experimental results show that after applying our GPU optimizations, the kernels of the physical model each achieve significant performance speedups.

Zehui Jin, Ming Dun, Xin You, Hailong Yang, Yunchun Li, Yingchun Lin, Zhongzhi Luan, Depei Qian

Parallel Approach to Sliding Window Sums

Sliding window sums are widely used for string indexing, hashing, time series analysis and machine learning. New vector algorithms that utilize the advanced vector extension (AVX) instructions available on modern processors, or the parallel compute units on GPUs and FPGAs, would provide a significant performance boost. We develop a generic vectorized sliding-sum algorithm whose speedup for window size w and number of processors P is O(P/w) for a generic sliding sum. For a sum with a commutative operator the speedup improves to O(P/log(w)). Implementing the algorithm for the bioinformatics application of minimizer-based k-mer table generation using AVX instructions, we obtain a speedup of over 5×.
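The abstract does not spell out the algorithm itself; as a rough illustration of why sliding sums vectorize well, the following NumPy sketch (a prefix-sum formulation, not the authors' AVX implementation) computes every window sum with a single vectorized subtraction:

```python
import numpy as np

def sliding_window_sums(x, w):
    """All sums of length-w windows of x, via an inclusive prefix sum.

    Each window sum is prefix[i+w] - prefix[i], so the whole result is one
    vectorized subtraction; the prefix sum is the part that a parallel scan
    (or AVX/GPU hardware) can accelerate to O(log w) depth.
    """
    prefix = np.concatenate(([0], np.cumsum(x)))   # prefix[i] = sum(x[:i])
    return prefix[w:] - prefix[:-w]                # len(x) - w + 1 window sums

# Example: sums of every 4-element window of 0..9
print(sliding_window_sums(np.arange(10), 4))       # [ 6 10 14 18 22 26 30]
```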

Roman Snytsar, Yatish Turakhia

Rise the Momentum: A Method for Reducing the Training Error on Multiple GPUs

Deep neural network training is a common task that has received increasing attention in recent years and is basically performed with Stochastic Gradient Descent or its variants. Distributed training increases training speed significantly but causes precision loss at the same time. Increasing the batch size can improve training parallelism in distributed training. However, if the batch size is too large, it makes the training process difficult and introduces more training error. In this paper, we consider controlling the total batch size and lowering the batch size on each GPU by increasing the number of GPUs in distributed training. We train ResNet-50 [4] on the CIFAR-10 dataset with different optimizers, such as SGD, Adam and NAG. The experimental results show that a large batch size speeds up convergence to some degree. However, if the batch size per GPU is too small, the training process fails to converge. A large number of GPUs, which means a small batch size on each GPU, degrades training performance in distributed training. We tried several ways to reduce the training error on multiple GPUs. According to our results, increasing the momentum is a well-behaved method for improving training performance in distributed training with many GPUs and a constant, large total batch size.

Yu Tang, Lujia Yin, Zhaoning Zhang, Dongsheng Li

Pimiento: A Vertex-Centric Graph-Processing Framework on a Single Machine

Here, we describe a method for handling large graphs with data sizes exceeding memory capacity using minimal hardware resources. This method (called Pimiento) is a vertex-centric graph-processing framework on a single machine and represents a semi-external graph-computing system, where all vertices are stored in memory and all edges are stored externally in the compressed sparse row (CSR) data-storage format. Pimiento uses a multi-core CPU, memory, and multi-threaded data preprocessing to optimize disk I/O and reduce random-access overhead during graph-algorithm execution. An on-the-fly update-accumulation mechanism was designed to reduce the time the graph algorithm spends accessing disks during execution. Our experiments compared this method with other graph-processing systems, including GraphChi, X-Stream, and FlashGraph, revealing that Pimiento achieved 7.5×, 4×, and 1.6× better performance, respectively, on large real-world and synthetic graphs in the same experimental environment.
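For readers unfamiliar with the layout the abstract refers to, the toy sketch below shows a semi-external CSR arrangement: per-vertex state stays in memory while the edge array is memory-mapped from disk. It only illustrates the general idea, not Pimiento's code, and all names are invented:

```python
import numpy as np

num_vertices = 5
# CSR: neighbors of vertex v live at neighbors[offsets[v]:offsets[v+1]]
offsets   = np.array([0, 2, 3, 5, 6, 6], dtype=np.int64)   # kept in memory
neighbors = np.array([1, 2, 2, 3, 4, 0], dtype=np.int32)   # the "external" edges

neighbors.tofile("edges.bin")                                # edges written to disk
edges_on_disk = np.memmap("edges.bin", dtype=np.int32, mode="r")

vertex_value = np.zeros(num_vertices)                        # in-memory vertex state
for v in range(num_vertices):                                # toy vertex-centric pass
    for u in edges_on_disk[offsets[v]:offsets[v + 1]]:
        vertex_value[u] += 1.0                               # e.g. accumulate in-degree
```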

Jianqiang Huang, Wei Qin, Xiaoying Wang, Wenguang Chen

Software Systems and Programming Models

Frontmatter

Parallel Software Testing Sequence Generation Method Target at Full Covering Tested Behaviors

Testing parallel software systems is very difficult because the number of states expands sharply due to parallel behaviors. In this paper, a testing sequence generation method is proposed, targeted at fully covering tested behaviors and related behaviors over all execution paths. Because the state spaces of parallel software systems are often large, this paper focuses on the sub-graph of the system model's state space that contains all firings of tested and related behaviors. Firstly, the software system is modeled with a Colored Petri Net (CPN), called the system model (SM), and every tested or related behavior is also modeled with CPN, called a Behavior Model Unit (BMU). Then, the method defines mapping and intersection operations, among others, to finally generate the test sequence. Practice shows that this method is efficient and can achieve full coverage.

Tao Sun, Xiaoyun Wan, Wenjie Zhong, Xin Guo, Ting Zhang

Accurate Network Flow Measurement with Deterministic Admission Policy

Network management tasks require real-time visibility of the current network status to perform the appropriate operations. However, the resource limitations of network devices and the real-time requirements make it difficult to provide accurate network measurement feedback. To reduce the errors and inefficiencies caused by random operations in existing algorithms, we propose an efficient measurement architecture with a Deterministic Admission Policy (DAP). DAP provides accurate large-flow detection and high network measurement precision by making full use of the information belonging to large and small flows, and by dynamically filtering small flows as the network status evolves. To make the algorithm easy to implement in hardware, we propose d-Length DAP, which replaces global optimality with local optimality. Experimental results show that our algorithm can reduce the measurement error by 3 to 25 times compared to other algorithms.

Hongchao Du, Rui Wang, Zhaoyan Shen, Zhiping Jia

A Comparison Study of VAE and GAN for Software Fault Prediction

Software faults are an unavoidable problem in software projects. How to predict software faults to enhance the safety and reliability of a system is worth studying. In recent years, deep learning has been widely used in the fields of image, text and voice; however, it is seldom applied to software fault prediction. Considering the ability of deep learning, we select the deep learning techniques of VAE and GAN for software fault prediction and compare their performance. Software fault data has one salient feature: the proportion of non-fault data is well above the proportion of fault data. Because of this imbalance, it is difficult to achieve high accuracy in predicting software faults. As is known, VAE and GAN are able to generate synthetic samples that obey the distribution of real data. We try to take advantage of their power to generate new fault samples in order to improve the accuracy of software fault prediction. The architectures of VAE and GAN are designed to fit the high-dimensional software fault data. New software fault samples are generated to balance the software fault datasets in order to obtain better performance for software fault prediction. The VAE and GAN models are trained on a GPU TITAN X. SMOTE is also adopted in order to compare its performance with VAE and GAN. The experimental results show that VAE and GAN are useful techniques for software fault prediction and that VAE performs better than GAN on this issue.

Yuanyuan Sun, Lele Xu, Lili Guo, Ye Li, Yongming Wang

A Framework for Designing Autonomous Parallel Data Warehouses

Parallel data platforms are recognized as a key solution for processing analytical queries running on extremely large data warehouses (DWs). Deploying a DW on such platforms requires efficient data partitioning and allocation techniques. Most of these techniques assume a priori knowledge of the workload. To deal with workload evolution, reactive strategies are mainly used. The BI 2.0 requirements have put large batch and ad-hoc user queries at the center. Consequently, reactive-based solutions for deploying a DW on parallel platforms are not sufficient. Autonomous computing has emerged as a paradigm that allows digital objects to manage themselves in accordance with high-level guidance by means of proactive approaches. Inspired by this paradigm, we propose in this paper a proactive approach based on a query clustering model for deploying a DW over a parallel platform. The query clustering triggers the partitioning and allocation processes by considering only evolved query groups. Intensive experiments were conducted to show the efficiency of our proposal.

Soumia Benkrid, Ladjel Bellatreche

Distributed and Parallel and Network-based Computing

Frontmatter

Stable Clustering Algorithm for Routing Establishment in Vehicular Ad-Hoc Networks

Considering the complex and varied urban scenes in Vehicular Ad-Hoc Networks (VANETs), such as fast-moving vehicle nodes, unstable links and frequent changes in network topology, this paper proposes a stable clustering algorithm to establish routing for VANET. In this algorithm, clusters are first formed according to the Similar Neighbor Node Table. Then, on the basis of the Highest Connectivity Algorithm, parameters such as the position and speed of a node are introduced to calculate the Selection Priority, which is used to produce the preferred and the alternative cluster head nodes. The introduction of alternative cluster head nodes improves the stability of clustering to a certain extent. Finally, the clustering is established and maintained for six special scenarios. Compared with traditional clustering algorithms in VANET, the proposed algorithm achieves lower end-to-end delay, a higher packet delivery rate and a lower change rate of preferred cluster head nodes, which greatly improves the communication quality of VANET.

Jieying Zhou, Pengfei He, Yinglin Liu, Weigang Wu

Utility-Based Location Distribution Reverse Auction Incentive Mechanism for Mobile Crowd Sensing Network

In mobile crowd sensing networks, existing research does not consider the completion quality of a task or individual differences in participants' abilities. The location distribution of participants affects the quality of the task and the timeliness of obtaining sensing task information. Participants in good positions can improve the completion rate of tasks, while participants with good reputation values can ensure the quality of the tasks. In this paper, the distance between the sensing point and the worker is used as one of the criteria for selecting the sensing task object. A utility-based location-distribution reverse auction incentive mechanism (ULDM) is proposed, which comprehensively considers budget constraints, worker reputation, and location characteristics in the sensing model, and defines distance correlation and time correlation to evaluate the utility of the data collected by the winner. Finally, using the successful package delivery rate, average delay and energy consumption as evaluation parameters, the experimental results show that ULDM improves the quality of task completion and suppresses the selfish behavior of workers, demonstrating that ULDM has a better incentive effect than the reputation-based incentive mechanism.

Chunxiao Liu, Huilin Wang, Yanfeng Wang, Dawei Sun

Safeguarding Against Active Routing Attack via Online Learning

The Border Gateway Protocol (BGP) is a vital protocol on the Internet. However, BGP is susceptible to prefix interception attacks, so how to find a secure route against such attacks is an important problem. For this reason, we propose a novel and effective router selection method for the Internet. First, we evaluate the resilience of autonomous systems against prefix interception attacks. Second, we evaluate the security risk of next-hop routers and obtain the historical performance of next-hop routers via online learning. Finally, we combine the two performance metrics of routers' resilience and historical performance to choose a secure route. Our method is verified by network simulations. The results show that the proposed method is more resilient against prefix interception attacks.

Meng Meng, Ruijuan Zheng, Mingchuan Zhang, Junlong Zhu, Qingtao Wu

Reliability Aware Cost Optimization for Memory Constrained Cloud Workflows

Due to the increasing number of constituent jobs and growing input data sizes, the execution of modern complex workflow-based applications on the cloud requires a large number of virtual machines (VMs), which makes cost a great concern. Under the constraints of VM processing and storage capabilities and communication bandwidths between VMs, quickly figuring out a cost-optimal resource provisioning and scheduling solution for a given cloud workflow is becoming a challenge. Things become even worse when taking infrastructure-related failures with transient characteristics into account. To address this problem, this paper proposes a soft-error-aware VM selection and task scheduling approach that can achieve a near-optimal cost. Under the reliability and completion time constraints specified by tenants, our approach can figure out a set of VMs with specific CPU and memory configurations and generate a cost-optimal schedule by allocating tasks to appropriate VMs. Comprehensive experimental results on well-known scientific workflow benchmarks show that, compared with state-of-the-art methods, our approach can achieve up to 66% cost reduction while satisfying both reliability and completion time constraints.

E Cao, Saira Musa, Jianning Zhang, Mingsong Chen, Tongquan Wei, Xin Fu, Meikang Qiu

Null Model and Community Structure in Heterogeneous Networks

Finding different types of communities has become a research hot spot in network science. Many real-world systems containing different types of objects and relationships can be naturally described as heterogeneous networks. However, most current research on community detection applies to homogeneous networks, and there is no effective function to quantify the quality of the community structure in heterogeneous networks. In this paper, we first propose a null model with the same heterogeneous node degree distribution as the original heterogeneous network. The probability of an edge existing between two nodes is derived to build the modularity function of heterogeneous networks. Based on our modularity function, a fast community detection algorithm is proposed for large-scale heterogeneous networks. We use the algorithm to detect communities in real-world Twitter event networks. The experimental results show that our method performs better than other existing algorithms and demonstrate that the modularity function for heterogeneous networks is an effective measure for quantifying the quality of community structure in heterogeneous networks.

Xuemeng Zhai, Wanlei Zhou, Gaolei Fei, Hangyu Hu, Youyang Qu, Guangmin Hu

Big Data and Its Applications

Frontmatter

An Asynchronous Algorithm to Reduce the Number of Data Exchanges

Communication or data movement cost is significantly higher than computation cost in existing large-scale clusters, which have long network latency. For high-frequency parallel iterative applications, the performance bottleneck is the long network latency caused by frequent data exchange. This paper presents an asynchronous algorithm capable of reducing the number of data exchanges among the processes of parallel iterative applications. The proposed algorithm has been tested on a stencil-based parallel computation and compared with a BSP implementation of the same application. The asynchronous algorithm can effectively reduce the number of data exchanges at the expense of higher computation overhead and larger message size; performance can be improved by up to 2.8x.

Zhuo Tian, Yifeng Chen, Lei Zhang

Two-Stage Clustering Hot Event Detection Model for Micro-blog on Spark

With the rapid development of micro-blogging, it has become one of the main platforms for publishing news and expressing opinions. Analyzing micro-blogs for hot event detection is widely studied by researchers. However, hot event detection is not easy because micro-blog posts are large in scale, short in text and irregular in grammar. In order to improve the performance of hot event detection, a two-stage clustering hot event detection model for micro-blogs is proposed. The model is designed for the Spark environment and divided into two parts. First, the K-Means method is improved with threshold setting and cosine similarity to cluster posts. Then, the result of post clustering is clustered again to detect hot events with the LDA (Latent Dirichlet Allocation) model. Extensive experiments carried out in the Spark environment show that the proposed model achieves higher accuracy and time efficiency for hot event detection.
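As a simplified illustration of the first-stage idea mentioned above (threshold-based assignment with cosine similarity), the toy Python sketch below assigns each post vector to the most similar existing centroid or opens a new cluster. It is not the paper's Spark implementation, and the threshold value is arbitrary:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def threshold_cosine_cluster(vectors, threshold=0.6):
    """Assign each post vector to the most similar existing centroid,
    or open a new cluster when no similarity exceeds the threshold."""
    centroids, members = [], []
    for v in vectors:
        sims = [cosine_sim(v, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            members[k].append(v)
            centroids[k] = np.mean(members[k], axis=0)   # refresh centroid
        else:
            centroids.append(np.array(v, dtype=float))
            members.append([v])
    return centroids, members
```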

Ying Xia, Hanyu Huang

Mobility-Aware Workflow Offloading and Scheduling Strategy for Mobile Edge Computing

Currently, Mobile Edge Computing (MEC) is widely used in different smart application scenarios such as smart health, smart traffic and smart home. However, smart end devices are usually constrained in battery and computing power, and hence how to optimize the energy consumption of end devices with intelligent task offloading and scheduling strategies under constraints such as deadlines is a critical yet challenging topic. Meanwhile, most existing studies do not consider the mobility of end devices during task execution, but in reality end devices may need to be constantly moving in a MEC environment. In this paper, motivated by a patient health monitoring scenario, we propose a Mobility-Aware Workflow Offloading and Scheduling Strategy (MAWOSS) for MEC which provides a holistic approach covering the workflow task offloading strategy, the workflow task scheduling algorithm and the workflow task migration strategy. Comprehensive experimental results show that, compared with other strategies, MAWOSS is able to achieve the optimal fitness with lower energy consumption and smaller workflow makespan under the deadlines.

Jia Xu, Xuejun Li, Xiao Liu, Chong Zhang, Lingmin Fan, Lina Gong, Juan Li

HSPP: Load-Balanced and Low-Latency File Partition and Placement Strategy on Distributed Heterogeneous Storage with Erasure Coding

To speed up access to massive amounts of data, heterogeneous architectures have been widely adopted in mainstream storage systems. In such systems, load imbalance and scheduler overhead are the primary factors that slow down I/O performance. In this paper, we propose an effective file scheduling strategy, HSPP, that includes statistics-based file classification, partitioning with erasure coding and adaptive data placement to optimize load balance and read latency on distributed heterogeneous storage systems. The experimental results show that HSPP is superior to existing strategies in terms of load balance, read latency, and scheduling overhead.

Jiazhao Sun, Yunchun Li, Hailong Yang

Adaptive Clustering for Outlier Identification in High-Dimensional Data

High-dimensional data brings new challenges and opportunities for domains such as clinical, scientific and industrial data. However, the curse of dimensionality that comes with the increased dimensions makes outlier identification extremely difficult because of the scattering of data points. Furthermore, clustering in high-dimensional data is challenging due to the intervention of irrelevant dimensions, where a dimension may be relevant for some clusters and irrelevant for others. To address the curse of dimensionality in outlier identification, this paper presents a novel technique that generates candidate subspaces from the high-dimensional space and refines the identification of potential outliers from each subspace using a novel iterative adaptive clustering approach. Our experimental results show that the technique is effective.

Srikanth Thudumu, Philip Branch, Jiong Jin, Jugdutt (Jack) Singh

Penguin Search Aware Proactive Application Placement

With the huge proliferation of IoT devices, new challenges have arisen. These IoT devices instantly generate a huge amount of data. In addition, they are time sensitive and geographically distributed, and require high bandwidth and location awareness. In order to cope with these challenges, recent studies have explored a new paradigm called fog computing, which extends cloud computing to the edge of the network. Fog computing is an intermediate layer that facilitates the deployment of IoT applications by offering new characteristics such as support for mobility, location awareness, and lower latency. However, its limited resources raise the problem of resource provisioning, which has an impact on application placement decisions. In this paper, we focus on the mobile application placement problem in a hybrid Cloud-Fog environment. We consider both delay-sensitive and delay-tolerant applications. Hence, we propose an exact solution as well as a new approach based on the penguin search metaheuristic, named PsAAP, to fulfill dynamic demands as well as the applications' QoS requirements. To evaluate the proposed approach, we introduce a mobile scenario including three different types of applications. Moreover, we compare the suggested policy with the exact solution, baseline algorithms, heuristic, and metaheuristic methods. Experiments have been conducted using CPLEX and the iFogSim simulator. The final results show the effectiveness of the proposed approach.

Amira Rayane Benamer, Hana Teyeb, Nejib Ben Hadj-Alouane

A Data Uploading Strategy in Vehicular Ad-hoc Networks Targeted on Dynamic Topology: Clustering and Cooperation

A Vehicular Ad-hoc Network (VANET) is a special network composed of driving vehicles with dynamic topology. Data uploading from the VANET to the computation server is a challenging issue due to the high mobility of vehicles. By introducing a Mobile Edge Computing (MEC) server deployed on the roadside, this paper proposes a stable clustering strategy based on adjacency screening and designs an Intra-Cluster Data Uploading (ICDU) algorithm to improve the efficiency of data uploading in a dynamic environment. The connection lifetime between vehicles is taken as a key indicator for our proposed clustering strategy to form stable clusters. After the formation of clusters, the ICDU algorithm plans a stable path for vehicles in a cluster to upload data in a cooperative manner. Extensive simulation results show that the proposed clustering strategy performs better in terms of clustering stability compared with the Vehicular Multi-hop algorithm for Stable Clustering (VMaSC) and the greedy clustering strategy. The results also show that our proposed ICDU algorithm outperforms the self-uploading algorithm and achieves a larger data uploading throughput in dense scenarios compared with the greedy-uploading algorithm.

Zhipeng Gao, Xinyue Zheng, Kaile Xiao, Qian Wang, Zijia Mo

Cloud Server Load Turning Point Prediction Based on Feature Enhanced Multi-task LSTM

A cloud workload turning point is either a local peak point, which indicates workload pressure, or a local valley point, which indicates resource waste; the local trend on both sides of such a point reverses. Predicting this kind of point is the premise for warning system managers to take precautionary measures. Compared with value-based workload prediction approaches, turning point prediction can provide information about the changing trend of future workload, i.e. downtrend or uptrend, so more elaborate resource management schemes can be adopted for these rising and falling trends. This paper is the first study of deep learning based server workload turning point prediction in the cloud environment. A well-designed deep learning based model named Feature Enhanced multi-task LSTM is introduced. Novel fluctuation features are proposed along with the multi-task and feature enhancement mechanisms. Experiments on the well-known Google cluster trace demonstrate the superiority of our model compared with five state-of-the-art models.

Li Ruan, Yu Bai, Limin Xiao

Distributed and Parallel Algorithms

Frontmatter

Neuron Fault Tolerance Capability Based Computation Reuse in DNNs

For speech and video applications, consecutive inputs exhibit a high degree of similarity; hence, some results of previous executions can be reused. Quantization can efficiently increase the similarity of consecutive inputs. However, when using quantization, the smaller the number of quantization bits, the higher the similarity, as the inputs are constrained to a smaller set of values, but the larger the accuracy loss, since input errors are increased. We observe that existing reuse schemes apply a single number of quantization bits across the entire network. If the number of quantization bits is too large, it directly reduces the similarity between inputs and thus reduces the reuse ratio. Hence, it is important to exploit the trade-off among the number of quantization bits, reuse rate, and accuracy. There is an opportunity to significantly improve the performance and efficiency of DNN execution by using multiple quantization bit-widths simultaneously according to neuron criticality analysis. To do so, we propose a novel reuse scheme called Mquans based on neuron criticality analysis without accuracy loss. Evaluation results show that our proposed design achieves a 2.7× speedup and 38% energy saving on average over the baseline.
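To make the reuse idea concrete, here is a toy sketch of quantization-based memoization for a single neuron: coarser quantization makes more inputs collide and therefore increases reuse, at the cost of larger input error. The per-neuron bit-width selection that Mquans derives from criticality analysis is not shown, and every name here is illustrative:

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantization of x (clipped to [-1, 1]) to 2**bits levels.
    Fewer bits -> more inputs collide -> more reuse, but larger error."""
    levels = 2 ** bits - 1
    return np.round((np.clip(x, -1, 1) + 1) / 2 * levels) / levels * 2 - 1

_cache = {}

def neuron_output(weights, x, bits=4):
    """Memoize a single neuron's activation, keyed on its quantized input."""
    q = quantize(x, bits)
    key = (bits, q.tobytes())
    if key not in _cache:
        _cache[key] = float(np.tanh(weights @ q))   # recompute only on a cache miss
    return _cache[key]
```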

Pengnian Qi, Jing Wang, Xiaoyan Zhu, Weigong Zhang

Reliability Enhancement of Neural Networks via Neuron-Level Vulnerability Quantization

Neural networks (NNs) are increasingly used in recognition, mining and autonomous driving. However, for safety-critical applications such as autonomous driving, the reliability of NNs is an important area that remains largely unexplored. Fortunately, an NN itself has fault-tolerance capability; in particular, different neurons have different fault-tolerance capabilities. Thus, applying a uniform error protection mechanism while ignoring this important feature leads to unnecessary energy and performance overheads. In this paper, we propose a neuron vulnerability factor (NVF) quantifying a neural network's vulnerability to soft errors, which can provide good guidance for error-tolerance techniques in NNs. Based on the NVF, we propose a computation scheduling scheme to reduce the lifetime of neurons with high NVF. The experimental results show that our proposed scheme can improve the accuracy of the neural network by 12% on average and greatly reduce the fault-tolerance overhead.

Keyao Li, Jing Wang, Xin Fu, Xiufeng Sui, Weigong Zhang

A Fault Detection Algorithm for Cloud Computing Using QPSO-Based Weighted One-Class Support Vector Machine

The complexity and diversity of cloud computing bring about cloud faults, which affect the quality of services. Existing fault detection methods suffer from problems such as low efficiency and low accuracy. In order to improve the reliability of the cloud data center, a fault detection algorithm based on a weighted one-class support vector machine (WOCSVM) is proposed to detect and identify host faults in the cloud data center. Specifically, first, we conduct correlation analysis among monitoring metrics and select key ones to reduce complexity. Second, for the imbalanced monitoring dataset, a one-class support vector machine is used to detect and identify host faults, and a weight allocation strategy is proposed to assign weights to the samples, describing the importance of different sample points in order to improve detection accuracy on potential faults. Finally, to increase the accuracy further, the parameters are set via a parameter optimization algorithm based on quantum-behaved particle swarm optimization (QPSO). Furthermore, experiments comparing our algorithm with similar algorithms demonstrate its superiority under different classification indicators.

Xiahao Zhang, Yi Zhuang

ParaMoC: A Parallel Model Checker for Pushdown Systems

Model checking on Pushdown Systems (PDSs) has been extensively used to deal with numerous practical problems. However, the existing model checkers for pushdown systems are executed on the central processing unit (CPU), and their performance is hampered by the computing power of the CPU. Compared with the CPU, the graphics processing unit (GPU) has more processing cores, which are suitable and efficient for parallel computing. Therefore, it is very attractive to accelerate model checking of PDSs on the GPU. In this paper, we present a new parallel model checker, named ParaMoC, to speed up model checking for PDSs. Moreover, we focus on how to use GPUs to accelerate the reachability verification and the LTL model checking of PDSs. ParaMoC running on a state-of-the-art GPU can be 100 times faster than the traditional PDS model checker.

Hansheng Wei, Xin Ye, Jianqi Shi, Yanhong Huang

FastDRC: Fast and Scalable Genome Compression Based on Distributed and Parallel Processing

With the advent of next-generation sequencing technology, sequencing costs have fallen sharply compared to previous sequencing technologies. Genomic big data has become a significant big data application. In the face of growing genomic data, its storage and migration face enormous challenges. Therefore, researchers have proposed a variety of genome compression algorithms, but these algorithms cannot meet the requirements of processing large amounts of biological data at high speed. This manuscript proposes a parallel and distributed referential genome compression algorithm, Fast Distributed Referential Compression (FastDRC). The algorithm compresses a large number of genomic sequences in parallel under the Apache Hadoop distributed computing framework. Experiments show that the compression efficiency of FastDRC is greatly improved when it compresses large quantities of genomic data. Moreover, FastDRC is, to our knowledge, the only distributed computing method in the field of genome compression. The source code for FastDRC can be obtained from this link: https://github.com/GhostCCCatHenry/FastDRC .

Yimu Ji, Houzhi Fang, Haichang Yao, Jing He, Shuai Chen, Kui Li, Shangdong Liu

A Parallel Approach to Advantage Actor Critic in Deep Reinforcement Learning

Deep reinforcement learning (DRL) algorithms still take a long time to train models in many applications. Parallelization has the potential to improve the efficiency of DRL algorithms. In this paper, we propose a parallel approach (ParaA2C) for the popular Actor-Critic (AC) algorithms in DRL to accelerate the training process. Our work considers the parallelization of the basic advantage actor-critic (Serial-A2C) among AC algorithms. Specifically, we use multiple actor-learners to mitigate the strong correlation of data and the instability of updating, and finally reduce the training time. Note that we assign each actor-learner MPI process to a CPU core in order to prevent resource contention between MPI processes and make our ParaA2C approach more scalable. We demonstrate the effectiveness of ParaA2C on the Arcade Learning Environment (ALE) platform. Notably, our ParaA2C approach takes less than 10 min to train some commonly used Atari games when using 512 CPU cores.
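A minimal sketch of the parallel pattern the abstract describes, written with mpi4py: one actor-learner per MPI rank, each computing a gradient from its own rollouts, with gradients averaged by an all-reduce so every rank applies the same synchronous update. The rollout/gradient function is a random stand-in, not the authors' A2C implementation:

```python
from mpi4py import MPI
import numpy as np

def rollout_gradient(params, seed):
    """Hypothetical stand-in for 'collect a rollout and compute the A2C gradient'."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(params.shape)

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

params = np.zeros(1024)                        # shared policy/value parameters
for step in range(100):
    local_grad = rollout_gradient(params, seed=rank * 1000 + step)
    global_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)   # sum gradients over ranks
    params -= 1e-3 * (global_grad / world)     # identical update on every rank
```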

Xing Zhu, Yunfei Du

Applications of Distributed and Parallel Computing

Frontmatter

Blockchain-PUF-Based Secure Authentication Protocol for Internet of Things

Devices constituting the Internet of Things (IoT) have become widely used, thereby generating a large amount of sensitive data. The communication of these data across IoT devices and over the public Internet makes them susceptible to several cyber attacks. In this paper, we propose an efficient blockchain approach based on the secret computational model of a physically unclonable function (PUF). The proposed framework aims to guarantee authentication of the devices and the miner with a faster verification process compared to existing blockchain techniques. Furthermore, the combination of the blockchain and PUF allows us to propose an efficient framework that guarantees data provenance and data integrity in IoT networks. The proposed framework employs PUFs, which provide unique hardware fingerprints for establishing data provenance. Smart contracts based on the blockchain provide a decentralized digital ledger that is able to resist data tampering attacks.

Akash Suresh Patil, Rafik Hamza, Hongyang Yan, Alzubair Hassan, Jin Li

Selective Velocity Distributed Indexing for Continuously Moving Objects Model

The widespread use of GPS-embedded devices has led to ubiquitous location-dependent services based on the generated real-time location data. This introduced the notion of continuous querying, and with the aid of advanced indexing techniques, several complex query types can be supported. However, the efficient querying and manipulation of such highly dynamic data is not trivial; processing factors of crucial importance, such as accuracy and scalability, should be carefully thought out. In this study we focus on Continuous KNN (CKNN) query processing, one of the most well-known spatio-temporal queries over large scales of continuously moving objects. In this paper we provide an overview of CKNN queries and related challenges, as well as an outline of works proposed in the literature and their limitations, before presenting our contribution. We propose a novel indexing model for CKNN querying, namely VS-TIMO. The proposed structure is based on a selective velocity partitioning method, since objects move at varying speeds. The base unit of our structure comprises a non-overlapping R-tree and a two-dimensional grid. In order to enhance performance, we design a compact multi-layer index structure in a distributed setting and propose a CKNN search algorithm that produces accurate results using a candidate cell identification process. We provide a comprehensive vision of our indexing model and the adopted querying technique.

Imene Bareche, Ying Xia

A New Bitcoin Address Association Method Using a Two-Level Learner Model

Users in the Bitcoin system adopt a pseudonym, the Bitcoin address, as the transaction account, making Bitcoin address correlation analysis a challenging task. Under this circumstance, this paper provides a new Bitcoin address association scheme which makes address tracing possible in Bitcoin systems. The proposed scheme can be used to warn relevant institutions to study more secure encryption algorithms to protect users' privacy. Specifically, the important features of a Bitcoin address are extracted. After that, to reduce the computational complexity, we transform the clustering problem of addresses into a binary classification problem in which we integrate the features of two Bitcoin addresses. A novel two-level learner model is then built to analyze whether the two Bitcoin addresses belong to the same user. Finally, we cluster the addresses belonging to the same user accordingly. Extensive experimental results show that the proposed method outperforms other address association schemes, which use deep learning models or heuristics, and can achieve an increase of 6%–20% in precision and 10% in recall.

Tengyu Liu, Jingguo Ge, Yulei Wu, Bowei Dai, Liangxiong Li, Zhongjiang Yao, Jifei Wen, Hongbin Shi

Fog Computing Based Traffic and Car Parking Intelligent System

The Internet of Things (IoT) has attracted the attention of researchers from both industry and academia. The smart city, as one of the IoT applications, includes several sub-applications, such as intelligent transportation systems (ITS), smart car parking and the smart grid. Focusing on traffic flow management and car parking systems because of their correlation, this paper aims to provide a framework solution to both systems using online detection and prediction based on fog computing. Online event detection plays a vital role in traffic flow management, as circumstances such as social events and congestion resulting from accidents and roadworks affect traffic flow and parking availability. We developed an online prediction model using an incremental decision tree and distributed the prediction process on fog nodes at each intersection traffic light responsible for a connecting road. This effectively reduces the load on the communication network, as the data is processed and the decision is made locally, with low storage requirements. The spatially correlated fog nodes can communicate if necessary to take action in an emergency. The experiments were conducted using the Melbourne city open data.

Walaa Alajali, Shang Gao, Abdulrahman D. Alhusaynat

Service Dependability and Security

Frontmatter

Semi-supervised Deep Learning for Network Anomaly Detection

Deep learning has advanced the fields of image processing, machine translation and natural language processing, among others, and it can also be used for network anomaly detection. In practice, it is not hard to obtain normal instances; however, it is always difficult to label anomalous instances. Semi-supervised learning can be utilized to resolve this problem. In this paper, we make a comprehensive study of semi-supervised deep learning techniques for network anomaly detection. Three kinds of deep learning techniques, GAN (Generative Adversarial Networks), auto-encoder and LSTM (Long Short-Term Memory), are studied on the recent CICIDS2017 network traffic dataset. Five deep architectures based on semi-supervised learning are designed, including BiGAN, regular GAN, WGAN, auto-encoder and LSTM. Seven schemes of semi-supervised deep learning for anomaly detection are proposed according to different anomaly score functions. Grid search is utilized to find the anomaly detection threshold. Two traditional machine learning schemes are also adopted to compare performance, giving altogether nine anomaly detection schemes for CICIDS2017. The experimental results show that the auto-encoder outperforms LSTM and the three kinds of GAN, and that BiGAN and LSTM are both better than WGAN and regular GAN. All seven schemes of semi-supervised deep learning for anomaly detection outperform the two traditional schemes. The work and results in this paper are meaningful for the application of semi-supervised deep learning to network anomaly detection.
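As a hedged illustration of how auto-encoder-based schemes of this kind typically score anomalies (reconstruction error plus a grid-searched threshold; the function names here are invented, and this is not the paper's exact scoring function):

```python
import numpy as np

def anomaly_scores(x, reconstruct):
    """Reconstruction-error anomaly score: mean squared error per sample.
    `reconstruct` stands in for any trained auto-encoder's forward pass."""
    x_hat = reconstruct(x)
    return np.mean((x - x_hat) ** 2, axis=1)

def best_threshold(scores, labels, candidates):
    """Grid-search a score threshold that maximizes F1 on labeled validation data."""
    def f1(th):
        pred = scores > th
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precision = tp / (tp + fp + 1e-12)
        recall = tp / (tp + fn + 1e-12)
        return 2 * precision * recall / (precision + recall + 1e-12)
    return max(candidates, key=f1)
```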

Yuanyuan Sun, Lili Guo, Ye Li, Lele Xu, Yongming Wang

A Vulnerability Assessment Method for Network System Based on Cooperative Game Theory

It is very important for administrators to understand the severity of vulnerabilities in network systems. Although many systems such as CVSS can evaluate individual vulnerabilities, they do not take the specific environment into account, so the results are of limited help. In our paper, we construct a vulnerability dependency graph by modeling the complex dependencies between vulnerabilities, and introduce the Shapley value from cooperative game theory. We consider an attack path as a cooperation between the vulnerability nodes, use Access Complexity as the attack cost of each node, and define the characteristic function of the cooperative game accordingly. Finally, according to the Shapley value of each node, all the vulnerabilities are ranked, and the administrator can patch the high-ranked vulnerabilities with the limited security resources. Our experimental results demonstrate that our method can more effectively assess the severity of vulnerabilities in specific environments.
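For reference, the Shapley value the ranking relies on is the standard one from cooperative game theory: each vulnerability node v receives its average marginal contribution over all coalitions S of the node set N, where the characteristic function c is built from the attack costs:

```latex
\phi_v(N, c) \;=\; \sum_{S \subseteq N \setminus \{v\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\,
  \bigl[\,c(S \cup \{v\}) - c(S)\,\bigr]
```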

Chenjian Duan, Zhen Wang, Hong Ding, Mengting Jiang, Yizhi Ren, Ting Wu

Enhancing Model Performance for Fraud Detection by Feature Engineering and Compact Unified Expressions

The performance of machine learning models can be improved in a variety of ways, including segmentation, treating missing and outlier values, feature engineering, feature selection, multiple algorithms, algorithm tuning/compactness and ensemble methods. Feature engineering and model compactness can have a significant impact on an algorithm's performance but usually require detailed domain knowledge. Accuracy and compactness of machine learning models are equally important for optimal memory and storage needs. The research in this paper focuses on feature engineering and the compactness of rulesets. Compactness of the ruleset can make the algorithm more efficient, and the derivation of new features makes the dataset high-dimensional, potentially resulting in higher accuracy. We have developed a technique to enhance a model's performance with feature engineering and compact unified expressions for a dataset of unknown domain using a profile models approach. Classification accuracy is compared using well-known classifiers (Decision Tree, Ripple Down Rule and RandomForest). This technique is applied to a fraud analysis bank dataset and multiple synthetic bank datasets. Empirical evaluation has shown that not only is the ruleset size of the training and prediction datasets reduced, but performance is also improved in other performance metrics including classification accuracy. In this paper, the transformed data is used for the experimental validation and development of a fraud detection technique, but it can be used in other domains as well, especially for scalable and distributed systems.

Ikram Ul Haq, Iqbal Gondal, Peter Vamplew

Network Intrusion Detection Framework Based on Embedded Tree Model

Network intrusion detection systems play a vital role in network security protection; they can be used to protect personal privacy and property security and thereby protect users from attackers. However, there are only a few samples of attack types, with various characteristics. To solve the class-imbalance problem in network security and correctly detect attacks, this paper proposes a network intrusion detection framework combining a random forest and a gradient boosting decision tree (RF-GBDT). The random forest model is used for feature transformation and the gradient boosting decision tree model is used for classification. RF-GBDT was applied to the UNSW-NB15 dataset, in which only 8 features were selected for training and a large number of irrelevant features were deleted. RF-GBDT not only reduced the training time but also improved the detection rate. The experimental results show that the RF-GBDT model has a higher detection rate and a lower false alarm rate compared with other related algorithms.

Jieying Zhou, Pengfei He, Rongfa Qiu, Weigang Wu

Generative Adversarial Nets Enhanced Continual Data Release Using Differential Privacy

In the era of big data, an increasingly massive volume of data is generated and published consecutively for both research and commercial purposes. The potential value of sensitive information also attracts interest from adversaries and thereby raises public concern. Current research mostly focuses on privacy-preserving data release in a static manner rather than taking the dynamics and correlation of context into consideration. Motivated by this, a novel idea is proposed by combining differential privacy and generative adversarial nets. Generative adversarial nets and their extensions are used to generate a synthetic data set with indistinguishable statistical features, while differential privacy guarantees a trade-off between privacy protection and data utility. Extensive simulation results on a real-world data set testify to the superiority of the proposed model in terms of privacy protection and improved data utility.

Stella Ho, Youyang Qu, Longxing Gao, Jianxin Li, Yong Xiang

Data Poisoning Attacks on Graph Convolutional Matrix Completion

Recommender systems have been widely adopted in many web services. As the performance of a recommender system directly affects the profitability of the business, bad merchants are driven to boost their revenue by conducting adversarial attacks that compromise the effectiveness of such systems. Several studies have shown that recommender systems are vulnerable to adversarial attacks, e.g. data poisoning attacks. Since different recommender systems adopt different algorithms, existing attacks are designed for specific systems. In recent years, with the development of graph deep learning, recommender systems have also started to use new methods, such as graph convolutional networks. More recently, graph convolutional networks have also been found to be affected by poisoning attacks. However, characteristics of data sources in recommender systems, such as the heterogeneity of nodes and edges, make the attack problem challenging. To overcome this challenge, in this paper we propose data poisoning attacks on the graph convolutional matrix completion (GCMC) recommender system by adding fake users. The key point of the method is to make fake users mimic the rating behavior of normal users, and then pass the information of their rating behavior towards the target item back to related normal users, attempting to interfere with the prediction of the recommender system. Furthermore, on two real-world datasets, ML-100K and Flixster, the results show that our method significantly outperforms three baseline methods: (i) random attack, (ii) popular-item-based attack, and (iii) mimicry with random scores based attack.

Qi Zhou, Yizhi Ren, Tianyu Xia, Lifeng Yuan, Linqiang Chen

Secure Data Deduplication with Resistance to Side-Channel Attacks via Fog Computing

Deduplication can greatly reduce the storage overhead of cloud servers by eliminating duplicated data and retaining one copy. In order to ensure data privacy, many researchers try to make deduplication feasible on ciphertext. A typical scheme is message-locked encryption (MLE), which takes the cryptographic hash value of a message as the encryption key. However, MLE is vulnerable to side-channel attacks. To our knowledge, the existing schemes that try to mitigate these attacks suffer from either security drawbacks or expensive overhead. In this paper, we propose two new techniques to address two typical side-channel attacks, named the probe attack and the key-cache attack, via fog computing with new security and efficiency trade-offs. Built on the new techniques, we propose a secure data deduplication system in the fog computing environment. Our evaluation shows that our system has better performance compared with previous works.

Fuyou Zhang, Saiyu Qi, Haoran Yuan, Meng Zhang

Practical IDS on In-vehicle Network Against Diversified Attack Models

A vehicle bus is a specialized internal communication network that interconnects components inside a vehicle. The Controller Area Network (CAN bus), a robust vehicle bus standard, allows microcontrollers and devices to communicate with each other. The community has seen many security breach examples that exploit CAN functionalities and other in-vehicle flaws. Intrusion detection systems (IDSs) on the in-vehicle network are advantageous for monitoring CAN traffic and suspicious activities. However, existing IDSs on in-vehicle networks only support one or two attack models, and identifying abnormal in-vehicle CAN traffic against diversified attack models with better performance is desirable, as it can then be implemented practically. In this paper, we propose an intrusion detection system that can detect many different attacks. The method analyzes the CAN traffic generated by the in-vehicle network in real time and identifies the abnormal state of the vehicle practically. Our proposal fuses the autoencoder trick into the SVM model. More precisely, we introduce to the system an autoencoder that learns to compress CAN traffic data into extracted features (which can be uncompressed to closely match the original data). Then, a support vector machine is trained on the features to detect abnormal traffic. We show detailed model parameter configuration by adopting several concrete attacks. Experimental results demonstrate better detection performance than existing proposals.

Junchao Xiao, Hao Wu, Xiangxue Li, Yuan Linghu

Ultragloves: Lowcost Finger-Level Interaction System for VR-Applications Based on Ultrasonic Movement Tracking

VR technology is undergoing a series of evolutions, including not only higher video quality but also more fluent human-computer interaction. Current solutions mainly use camera-based or specialized equipment-based hand gesture recognition, where the former suffers from low accuracy and limited interaction capacity while the latter suffers from expensive cost. In this paper, we propose Ultragloves, a highly accurate, finger-level and low-cost gesture interaction system for VR and smartphones. Specifically, it is enabled by gloves fitted with multiple microphones through which the ultrasonic signals are played. On the VR or mobile device, the hand gestures are rebuilt with the recorded FMCW signal and a hand motion model. Furthermore, to improve the accuracy and calculation speed, we propose a parallel processing algorithm for the signals, which greatly accelerates our solution to real-time operation even on COTS smartphones. Experiments based on a real implementation show that Ultragloves can capture gestures with an accuracy of 2 cm, while the parallel algorithm accelerates the solution by about 3 times.

Si Li, Yanchao Zhao, Chengyong Liu

Adaptive Detection Method for Packet-In Message Injection Attack in SDN

The Packet-In message injection attack is severe in Software Defined Networks (SDN): it can cause a single point of failure at the centralized controller and the crash of the entire network. Nowadays, there are many detection methods for it, including entropy-based detection. We propose an adaptive detection method to proactively defend against this attack. We establish a Poisson probability distribution detection model to find the attack and use a flow table filter to mitigate it. We also use the EWMA method to update the expectation value of the model to adapt to actual network conditions. Our method does not need to send additional packets to request switch information. The experimental results show a 92% true positive rate for attacks injecting packets with random destination IPs, and a 98.2% true positive rate for attacks injecting packets with random source IPs.
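A minimal sketch of the two ingredients named in the abstract, a Poisson tail test on the per-window Packet-In count and an EWMA update of its expectation; the parameter values and window semantics are illustrative, not the paper's:

```python
from scipy.stats import poisson

def ewma_update(mu, observed, alpha=0.2):
    """EWMA of the expected Packet-In count per window (alpha is illustrative)."""
    return alpha * observed + (1 - alpha) * mu

def is_attack(observed, mu, p_threshold=1e-3):
    """Flag a window as anomalous when the Poisson tail probability of seeing
    `observed` or more Packet-In messages under expectation `mu` is tiny."""
    return poisson.sf(observed - 1, mu) < p_threshold

# Usage sketch: update the expectation only on windows deemed benign.
mu = 50.0
for count in [48, 52, 55, 400, 51]:
    if is_attack(count, mu):
        print("possible Packet-In injection:", count)
    else:
        mu = ewma_update(mu, count)
```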

Xinyu Zhan, Mingsong Chen, Shui Yu, Yue Zhang

PMRS: A Privacy-Preserving Multi-keyword Ranked Search over Encrypted Cloud Data

In cloud computing, data owners outsource their data to clouds to save the cost of data storage and computation. However, while enjoying the benefits of cloud computing, users face the risk that sensitive outsourced data could be leaked. This paper proposes a privacy-preserving multi-keyword ranked search scheme over encrypted cloud data, which adopts a novel two-layer complete binary tree index structure. The upper-layer index is used to filter the candidate documents while the lower-layer index is used to prune unqualified documents, so the search result is determined efficiently. Security analysis indicates that the proposed scheme is capable of preserving the privacy of outsourced data. Experimental results show that the proposed scheme has good performance in terms of search time cost.

Jingjing Bao, Hua Dai, Maohu Yang, Xun Yi, Geng Yang, Liang Liu

Privacy-Preserving Fine-Grained Outsourcing PHR with Efficient Policy Updating

The Personal Health Record (PHR) is a novel way of managing individual health. With the help of cloud computing and Ciphertext-Policy Attribute-Based Encryption (CP-ABE), fine-grained access control as well as authenticated sharing can be realized. However, the preservation of policy privacy has emerged as a major security concern in the PHR cloud sharing scenario. Besides, cloud-assisted policy updating, as a practical functionality of a PHR system, should also be taken into consideration. In this paper, we present an integrated policy-preserving PHR scheme with efficient policy updating. The scheme is proved to be selectively secure under the q-BDHE assumption. The evaluation results indicate that our scheme reaches a balanced trade-off between privacy and efficiency.

Zuobin Ying, Wenjie Jiang, Ximeng Liu, Maode Ma

Lightweight Outsourced Privacy-Preserving Heart Failure Prediction Based on GRU

A medical service provider establishes a heart failure prediction model with deep learning technology to provide remote users with real-time and accurate heart failure prediction services. Remote users provide their health data to the health care provider for heart failure prediction through the network, thereby effectively helping avoid damage to vital organs or death of the patient due to the onset of acute heart failure. Obviously, sharing personal health data in an exposed data sharing environment would lead to serious privacy leakage. Therefore, in this paper, we propose a privacy-preserving heart failure prediction (PHFP) system based on Secure Multiparty Computation (SMC) and the Gated Recurrent Unit (GRU). To meet the real-time requirements of the PHFP system, we designed a series of data interaction protocols based on additive secret sharing to achieve lightweight outsourced computing. Through these protocols, we can protect the privacy of users' health data while ensuring the efficiency of the heart failure prediction model. At the same time, to provide high-quality heart failure prediction services, we also use a new mathematical fitting method to directly construct the secure activation function, which reduces the number of calls to the security protocol and optimizes the accuracy and efficiency of the system. Besides, we built a security model and analyzed the security of the system. The experimental results show that PHFP achieves security, accuracy, and efficiency in the application of heart failure prediction.

Zuobin Ying, Shuanglong Cao, Peng Zhou, Shun Zhang, Ximeng Liu

DAPS: A Decentralized Anonymous Payment Scheme with Supervision

With the emergence of blockchain-based multi-party trading scenarios, such as finance, government work, and supply chain management, information on the blockchain poses a serious threat to users' privacy, and anonymous transactions have become the most urgent need. At present, existing solutions for anonymous transactions can only achieve a certain degree of trader identity privacy and transaction content privacy, so we introduce zero-knowledge proofs to achieve complete privacy. At the same time, unconditional privacy provides conditions for cybercrime. Due to the great application potential of the blockchain in many fields, supporting privacy protection and supervision simultaneously in the blockchain is a bottleneck, and existing works cannot solve the problem of the coexistence of privacy protection and supervision. This paper takes the lead in studying privacy and supervision in multi-party anonymous transactions, and proposes a decentralized anonymous payment scheme with supervision (DAPS) based on zk-SNARKs, signatures, commitments and elliptic curve cryptography, which enables users to be anonymous under supervision in transactions. The advantages of DAPS are twofold: enhanced privacy and additional supervision. We formally discuss the security of the whole system framework provided by the zero-knowledge proof, and verify its feasibility and practicability in the open-source blockchain framework BCOS.

Zhaoyang Wang, Qingqi Pei, Xuefeng Liu, Lichuan Ma, Huizhong Li, Shui Yu

An Approach of Secure Two-Way-Pegged Multi-sidechain

As a tamper-resistant ledger, blockchain ensures the integrity of transaction information among trustless participants in a peer-to-peer network. Blockchain has thus attracted enormous research interest in the past decade due to its applications in cryptocurrency, financial auditing, supply chain management, etc. However, blockchain's scalability limitations have impeded the technology from large-scale commercial application. Since blockchain is low in throughput, it is incapable of handling large-scale asset transfer. To tackle this problem, in this paper, we propose a secure multi-sidechain system. The proposed approach can transfer assets simultaneously to increase throughput. In addition, the security of assets during transfer is ensured by implementing the firewall property. Detailed adversarial analysis shows that the proposed approach can (1) prevent double spending and transaction-ordering-dependence attacks during asset transfer, (2) protect the mainchain from a sidechain's catastrophic failure, and (3) support a multi-sidechain model with different functions in each sidechain.

Jinnan Guo, Keke Gai, Liehuang Zhu, Zijian Zhang

IoT and CPS Computing

Frontmatter

DCRRDT: A Method for Deployment and Control of RFID Sensors Under Digital Twin-Driven for Indoor Supervision

In the field of RFID-based indoor supervision, the quality of monitoring is affected by how many RFID sensors are deployed and where. Due to limitations of time and workforce, improving efficiency and reducing the complexity of deployment is a key problem. We propose a deployment and control scheme for RFID sensors based on digital-twin technology. The constructed digital-twin model can simulate the state, performance, and activity of physical entities. In this paper, we predict and analyze based on digital-twin technology to solve the problems of re-design and re-deployment, and we further achieve the goal of saving deployment time and workforce through the intuitiveness and virtual simulation of the digital twin. We use three problem scenarios to demonstrate that the proposed RFID sensor deployment and control scheme is highly efficient and resource-saving.

Siye Wang, Mengnan Cai, Qinxuan Wu, Yijia Jin, Xinling Shen, Yanfang Zhang

A Binary Code Sequence Based Tracking Algorithm in Wireless Sensor Networks

This paper proposes a binary-code-sequence-based tracking algorithm for wireless sensor networks. The proposed algorithm relieves the influence of raw sensed data on localization results by building a mapping between the target's occurrence region and a binary code sequence. To solve the ambiguity problem in occurrence region determination, the paper further gives a Voronoi-diagram-based location refinement algorithm. The simulation results show the tracking performance under different trajectories.

Yang Zhang, Qianqian Ren, Yu Pan, Jinbao Li
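
The abstract does not spell out how the binary code sequence is formed. As a rough illustration only, one common construction lets each sensor contribute a single bit (target sensed or not) and uses the resulting bit vector to index a pre-computed table of candidate occurrence regions. The sketch below assumes that construction; the region table and threshold are hypothetical and not taken from the paper.

```python
def region_code(readings, threshold):
    """Map raw sensor readings to a binary code: 1 if a sensor detects the target."""
    return tuple(1 if r >= threshold else 0 for r in readings)

# Pre-computed (offline) mapping from codes to candidate occurrence regions.
# The region geometry here is purely illustrative.
code_to_region = {
    (1, 1, 0, 0): "cell A",
    (0, 1, 1, 0): "cell B",
    (0, 0, 1, 1): "cell C",
}

readings = [0.9, 0.7, 0.1, 0.0]
print(code_to_region.get(region_code(readings, threshold=0.5), "ambiguous"))
```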

Sampling Based Katz Centrality Estimation for Large-Scale Social Networks

Katz centrality is a fundamental measure of the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in large-scale networks are impractical and computationally expensive. In this paper, we propose a novel method to estimate Katz centrality based on graph sampling techniques. Specifically, we develop an unbiased estimator for Katz centrality using a multi-round sampling approach. We further propose SAKE, a Sampling-based Algorithm for fast Katz centrality Estimation. We prove that the estimate calculated by SAKE is probabilistically guaranteed to be within an additive error of the exact value. The computational complexity of SAKE is much lower than that of the state of the art. Extensive evaluation experiments on four real-world networks show that the proposed algorithm achieves low mean relative error with a low sampling rate, and it works well in identifying high-influence vertices in social networks.

Mingkai Lin, Wenzhong Li, Cam-tu Nguyen, Xiaoliang Wang, Sanglu Lu
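
For context, Katz centrality has the standard definition (not specific to this paper) of a sum of attenuated walk counts over all walk lengths:

    C_{\mathrm{Katz}}(v) = \sum_{k=1}^{\infty} \sum_{u} \alpha^{k} \, (A^{k})_{uv}, \qquad 0 < \alpha < \frac{1}{\lambda_{\max}(A)},

where A is the adjacency matrix and \lambda_{\max}(A) its largest eigenvalue; the attenuation factor \alpha guarantees convergence. A sampling-based estimator such as SAKE approximates this sum from sampled walks instead of computing matrix powers; the paper's own estimator and error bound are not reproduced here.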

Location Prediction for Social Media Users Based on Information Fusion

The real locations of social media users have long been of great interest. However, owing to personal privacy and other factors, most locations provided by users are ambiguous, missing, or wrong. To obtain users' real locations, we collect various types of geographically related information about users in social networks and propose an information fusion network model to organize this information efficiently. We then apply an iterative information fusion method to process the geographically related information in the information fusion network, and the outputs are used as the users' geographical locations. Finally, the experimental results show that our method can greatly improve prediction accuracy and reduce the corresponding distance error.

Gaolei Fei, Yang Liu, Yong Cheng, Fucai Yu, Guangmin Hu

Performance Modelling and Evaluation

Frontmatter

Concurrent Software Fine-Coarse-Grained Automatic Modeling Method for Algorithm Error Detection

The state space of concurrent software explodes, which makes algorithm error detection difficult. This paper proposes a fine-coarse-grained automatic modeling method. From a concurrent Java program, we automatically generate an HCPN (Hierarchical Coloured Petri Net) fine-coarse-grained model that conforms to the behavior of the source program. The goal is to detect algorithm errors in the program through model checking. We model interactive, property-related, and specific structural statements with the fine-grained method and model the remaining statements with the coarse-grained method, which effectively avoids state space explosion while retaining the interaction behavior and the property-related execution paths. We verify the effect of the fine-coarse-grained automatic modeling method by comparing and analyzing experimental results.

Tao Sun, Jing Zhang, Wenjie Zhong

EC-ARR: Using Active Reconstruction to Optimize SSD Read Performance

Solid State Drives (SSDs) have become mainstream storage thanks to their high performance and affordability relative to their growing capacity. However, some inherent characteristics still limit their widespread application: (1) an SSD wears out as write/erase cycles accumulate, so SSDs are generally equipped with dedicated Erasure Coding (EC) modules for reliability; yet these EC modules are exercised only in the rare scenarios of data loss and are never exploited during regular accesses, where data is unharmed and intact. (2) Huge latency differences exist among the three basic operations of reading, writing, and erasing, which can degrade performance without proper I/O scheduling. (3) An SSD has excellent internal parallelism, which offers a strong opportunity to further boost I/O performance if exploited properly. Therefore, this paper proposes EC-ARR (Active-Reconstruction-Read), which exploits the EC module and channel-level parallelism in combination to achieve better read performance. It not only guards against data loss but also assists normal data reads where data is intact, by actively using the data reconstruction of the EC module. Additionally, to strengthen this active reconstruction method with respect to channel-level parallelism, a static stripe whose length is smaller than the number of channels and a channel-wear-aware data placement scheme are adopted. Simulation experiments based on SSDsim [1] show that, compared with a conventional channel-RAID5 SSD, an ARR-enabled SSD can increase read performance by up to 18.5% without significant write performance degradation or storage overhead.

Shuo Li, Mingzhu Deng, Fang Liu, Zhiguang Chen, Nong Xiao
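
The core idea, serving a read by reconstructing the requested chunk from the rest of its stripe instead of waiting on a busy channel, can be illustrated with a RAID5-style XOR parity stripe, which matches the channel-RAID5 baseline mentioned above. This is only a schematic sketch under assumed names; the paper's scheduling policy, stripe layout, and wear-aware placement are not reproduced.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

def read_chunk(stripe, idx, busy_channels):
    """Return chunk idx of a stripe; if its channel is busy, actively
    reconstruct it from the other data chunks plus parity instead of waiting."""
    data, parity, channels = stripe["data"], stripe["parity"], stripe["channels"]
    if channels[idx] not in busy_channels:
        return data[idx]                      # normal read path
    others = [c for i, c in enumerate(data) if i != idx]
    return xor_chunks(others + [parity])      # active-reconstruction read path

stripe = {
    "data": [b"\x01\x02", b"\x0f\x0f", b"\xaa\x55"],
    "parity": xor_chunks([b"\x01\x02", b"\x0f\x0f", b"\xaa\x55"]),
    "channels": [0, 1, 2],
}
# Channel 1 is busy, so its chunk is rebuilt from the other chunks and parity.
assert read_chunk(stripe, 1, busy_channels={1}) == b"\x0f\x0f"
```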

Research of Benchmarking and Selection for TSDB

With the increasing use of sensor and IoT technologies, sensor stream data is generated and consumed at an unprecedented scale. Traditional storage mechanisms, represented by relational database systems, find it more and more difficult to support the storage, querying, updating, and other operations on large-scale sensor stream data. This, in turn, has led to the emergence of a new kind of complementary non-relational data store subsumed under the term time series database (TSDB). However, the heterogeneity and diversity of the numerous TSDBs impede well-informed comparison and selection for a given application context. A thorough survey shows that current benchmarks for TSDBs are few and still need improvement in workload implementation based on real business requirements, data generation based on real-world data, and fine-grained performance metrics. How to implement a benchmarking tool for TSDBs according to different trade-offs in IoT scenarios thus becomes a key challenge, which is addressed in this paper. First, we propose a benchmarking platform, TS_Store_Test, which integrates five well-known TSDBs using a micro-services mechanism. Meanwhile, we integrate and extend Prometheus to capture performance metrics in a fine-grained manner. Based on TS_Store_Test, the execution efficiency of several workloads, from both technical and business perspectives, is tested using real hydrological sensor data. Experimental results demonstrate the usability and scalability of TS_Store_Test and also show the performance differences of different TSDBs for sensor stream data. Finally, TS_Store_Test is compared with other NoSQL benchmarking suites.

Feng Ye, Zihao Liu, Songjie Zhu, Peng Zhang, Yong Chen

HDF5-Based I/O Optimization for Extragalactic HI Data Pipeline of FAST

The Five-hundred-meter Aperture Spherical Radio Telescope (FAST), the largest single-dish radio telescope in the world, has been producing a very large data volume at high speed, so it requires a high-performance data pipeline to convert the huge volume of raw observed data into science data products. However, the existing pipeline solutions widely used in radio data processing cannot handle this situation efficiently. This paper proposes a pipeline architecture for FAST based on the HDF5 format together with several I/O optimization strategies. First, we design a workflow engine that drives the various tasks in the pipeline efficiently; second, we design a common radio data storage specification on top of the HDF5 format and develop a fast converter to map the original FITS format to the new HDF5 format; third, we apply several concrete strategies to optimize the I/O operations, including chunked storage, parallel reading/writing, on-demand dumping, and stream processing. In an experiment processing 700 GB of FAST data, the results show that the HDF5-based data structure without further optimizations was already 1.7 times faster than the original FITS format. With chunked storage and parallel I/O optimization applied, the overall performance reaches 4.5 times that of the original. Moreover, owing to its good extensibility and flexibility, our FAST pipeline solution can be adapted to other radio telescopes.

Yiming Ji, Ce Yu, Jian Xiao, Shanjiang Tang, Hao Wang, Bo Zhang
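
The chunked-storage optimization mentioned above maps directly onto HDF5's native chunking. The following minimal h5py sketch (file name, dataset name, and shapes are illustrative assumptions, not taken from the FAST pipeline) shows a spectral dataset created with explicit chunking so that writes and later reads can proceed one chunk-aligned block at a time; parallel reading/writing would additionally require an MPI-enabled HDF5 build, which is not shown here.

```python
import numpy as np
import h5py

n_spectra, n_channels = 10_240, 4_096        # illustrative sizes only
with h5py.File("fast_demo.h5", "w") as f:
    dset = f.create_dataset(
        "spectra",
        shape=(n_spectra, n_channels),
        dtype="float32",
        chunks=(256, n_channels),             # one chunk = 256 spectra
        compression="gzip",
    )
    # Write in chunk-aligned blocks so each write touches whole chunks.
    for start in range(0, n_spectra, 256):
        dset[start:start + 256, :] = np.random.rand(256, n_channels).astype("float32")
```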

Understanding the Resource Demand Differences of Deep Neural Network Training

More and more deep neural networks (DNNs) are being deployed in the real world, but their heavy computing demand has become an obstacle. In this paper, we analyze the resource demand differences of DNN training to help understand its performance characteristics. In detail, we study both shared-memory and message-passing behavior in distributed DNN training from layer-level and model-level perspectives. From the layer-level perspective, we evaluate and compare the resource demands of basic layers. From the model-level perspective, we measure parallel training of representative models and then explain the causes of performance differences based on their structures. Experimental results reveal that different models vary in resource demand, and even a single model can have very different resource demands with different input sizes. Further, we offer some observations and recommendations on improving the performance of on-chip training and parallel training.

Jiangsu Du, Xin Zhu, Nan Hu, Yunfei Du

Twitter Event Detection Under Spatio-Temporal Constraints

Billions of tweets spread on Twitter every day, carrying a great deal of information, and it is valuable to mine the useful parts. The purpose of Twitter event detection is to detect what has happened in real life from these unstructured data. We introduce the spatio-temporal information of tweets into event detection. In this paper, event detection is divided into three steps. First, we use the spatial difference between event words and noise words, together with the relationships between words, to build a model that separates event words from noise words. Then we define the similarity between event tweets from three different aspects, which makes up for the shortcomings of existing methods. Finally, we construct a graph based on the similarity between tweets, and the graph can be partitioned into different event clusters to complete the event detection. Our method achieves good results and can be applied to event detection in real life.

Gaolei Fei, Yong Cheng, Yang Liu, Zhuo Liu, Guangmin Hu
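
The final step, partitioning a tweet-similarity graph into event clusters, can be sketched with a generic graph library. This is only an illustration of that step; the paper's actual similarity function and clustering criterion are not reproduced, and the threshold rule plus connected components used below, as well as the toy Jaccard similarity, are stand-in assumptions.

```python
import itertools
import networkx as nx

def cluster_tweets(tweets, similarity, threshold=0.5):
    """Build a similarity graph over tweets and return connected components
    as candidate event clusters."""
    g = nx.Graph()
    g.add_nodes_from(range(len(tweets)))
    for i, j in itertools.combinations(range(len(tweets)), 2):
        if similarity(tweets[i], tweets[j]) >= threshold:
            g.add_edge(i, j)
    return [sorted(component) for component in nx.connected_components(g)]

def jaccard(a, b):
    """Toy word-overlap similarity, standing in for the paper's three-aspect similarity."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

tweets = ["flood in downtown", "downtown flood warning", "concert tonight"]
print(cluster_tweets(tweets, jaccard, threshold=0.3))   # [[0, 1], [2]]
```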

Backmatter
