About this Book

The two-volume set LNCS 11944-11945 constitutes the proceedings of the 19th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2019, held in Melbourne, Australia, in December 2019.

The 73 full and 29 short papers presented were carefully reviewed and selected from 251 submissions. The papers are organized in topical sections on: Parallel and Distributed Architectures, Software Systems and Programming Models, Distributed and Parallel and Network-based Computing, Big Data and its Applications, Distributed and Parallel Algorithms, Applications of Distributed and Parallel Computing, Service Dependability and Security, IoT and CPS Computing, Performance Modelling and Evaluation.



Parallel and Distributed Architectures


PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction

Interconnect networks increasingly bottleneck the performance of datacenters and HPC systems due to ever-increasing communication overhead. High-radix switches are widely deployed in interconnection networks to achieve higher throughput and lower latency. However, network latency can deteriorate greatly under bursty and micro-burst traffic. In this paper, we propose a Prefetch and prediction based Switch (PPS) that effectively reduces packet delay and eliminates the effect of traffic bursts. By using a dynamically allocated multi-queue (DAMQ) buffer with data prefetch, PPS implements concurrent write and read with zero delay, thus fully pipelining packet scheduling. We further propose a simple but efficient arbitration scheme that completes a packet arbitration within one clock cycle while maintaining high throughput. Moreover, by predicting arbitration results and filtering potentially failed requests in the next round, our scheduling algorithm demonstrates performance indistinguishable from iSLIP, but with nearly half of iSLIP's area and 36.37% fewer logic units (LUTs). Owing to the combination of DAMQ with control-data prefetch and two-level scheduling with arbitration prediction, PPS achieves low latency and high throughput. PPS also extends easily to higher radices, since its hardware complexity grows linearly with the number of ports.

Yi Dai, Ke Wu, Mingche Lai, Qiong Li, Dezun Dong

SWR: Using Windowed Reordering to Achieve Fast and Balanced Heuristic for Streaming Vertex-Cut Graph Partitioning

Graph partitioning plays a fundamental role in distributed graph computing (DGC) frameworks, because it determines the communication cost and workload balance among computing nodes. Existing solutions are mainly heuristic-based but unfortunately cannot achieve partitioning quality, load balance, and speed at the same time. In this paper, we propose Sliding-Window Reordering (SWR), a streaming vertex-cut graph partitioning algorithm that introduces a pre-partitioning window to reorder incoming edges, making it much easier for a greedy strategy to maintain balance while optimizing edge assignment at minimal computational cost. We evaluate SWR analytically and experimentally on several real-world and synthetic graphs and show that it achieves the best overall performance. Compared with HDRF, the current state of the art, SWR increases partitioning speed by 3–20 times and improves partitioning quality by 15% to 30% on average while keeping the load balanced across all nodes.

Jie Wang, Dagang Li
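The window-plus-greedy idea described in the abstract above can be sketched in a few lines. The scoring rule below is a simplified HDRF-style heuristic, and every name in it (`windowed_greedy_partition`, the `eps` balance term, the window-selection rule) is an illustrative assumption, not SWR's actual formulation:

```python
from collections import defaultdict

def windowed_greedy_partition(edges, k, window=64, eps=1.0):
    """Assign each edge to one of k partitions (vertex-cut streaming).

    A pre-partitioning window buffers incoming edges; at each step the
    buffered edge whose endpoints are already best covered is assigned
    first, which makes the greedy balance/locality trade-off easier.
    """
    replicas = defaultdict(set)      # vertex -> partitions holding a replica
    load = [0] * k                   # edges per partition
    assignment = {}
    buf = []

    def score(u, v, p):
        # reward partitions already holding the endpoints (less replication)
        locality = (p in replicas[u]) + (p in replicas[v])
        # reward lightly loaded partitions (balance)
        max_l, min_l = max(load), min(load)
        balance = (max_l - load[p]) / (eps + max_l - min_l)
        return locality + balance

    def assign(u, v):
        p = max(range(k), key=lambda p: score(u, v, p))
        replicas[u].add(p)
        replicas[v].add(p)
        load[p] += 1
        assignment[(u, v)] = p

    for e in edges:
        buf.append(e)
        if len(buf) >= window:
            # emit the buffered edge with the most already-placed endpoints
            best = max(buf, key=lambda e: len(replicas[e[0]]) + len(replicas[e[1]]))
            buf.remove(best)
            assign(*best)
    for e in buf:                    # flush the remaining window
        assign(*e)
    return assignment, load
```

The window lets the greedy rule see edges whose endpoints are already placed before it must commit to edges about fresh vertices, which is the reordering effect SWR exploits.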

Flexible Data Flow Architecture for Embedded Hardware Accelerators

In order to enable control units to run future algorithms, such as advanced control theory, advanced signal processing, data-based modeling, and physical modeling, the control units require a substantial step-up in computational power. In the case of an automotive Engine Control Unit (ECU), safety requirements and cost constraints are just as important. Existing solutions for increasing the performance of a microcontroller are either unsuitable for a subset of the expected algorithms or too expensive in terms of area. Hence, we introduce the novel Data Flow Architecture (DFA) for embedded hardware accelerators. The DFA is flexible from the concept level down to the individual functional units, achieving a high performance-per-area ratio for a wide variety of data-intensive algorithms. Compared to hardwired implementations, the area can be as little as 1.4 times higher at the same performance.

Jens Froemmer, Nico Bannow, Axel Aue, Christoph Grimm, Klaus Schneider

HBL-Sketch: A New Three-Tier Sketch for Accurate Network Measurement

Network measurement is critical for many network functions such as detecting network anomalies, accounting, detecting elephant flows, and congestion control. Recently, sketch-based solutions have been widely used for network measurement because of two benefits: high computational efficiency and an acceptable error rate. However, there is usually a tradeoff between accuracy and memory cost. To make a reasonable tradeoff, we propose a novel sketch, the HBL (Heavy-Buffer-Light) sketch. The HBL sketch has a three-tier architecture consisting of a heavy part, a buffer layer, and a light part, and can be viewed as an improved version of the Elastic sketch, the state of the art in network measurement. Compared to the Elastic sketch and other typical work, the HBL sketch reduces the average relative error rate by 55%–93% under the same memory capacity limitations.

Keyan Zhao, Junxiao Wang, Heng Qi, Xin Xie, Xiaobo Zhou, Keqiu Li

Accelerating Large Integer Multiplication Using Intel AVX-512IFMA

In this study, we implemented large integer multiplication with Single Instruction Multiple Data (SIMD) instructions. We evaluated the implementation on a processor with Cannon Lake microarchitecture, containing Intel AVX-512IFMA (Integer Fused Multiply-Add) instructions. AVX-512IFMA can compute multiple 52-bit integer multiplication and addition operations through one instruction and it has the potential to process large integer multiplications faster than its conventional AVX-512 counterpart. Furthermore, the AVX-512IFMA instructions take three 52-bit integers of 64-bit spaces as operands, and we can use the remaining 12 bits effectively to accumulate carries (reduced-radix representation). For multiplication in the context of larger integers, we applied the Karatsuba and Basecase methods to our program. The former is known to be a faster algorithm than the latter. For evaluation purposes, we compared execution times against extant alternatives and the GNU Multiple Precision Arithmetic Library (GMP). This comparison showed that we were able to achieve a substantive improvement in performance. Specifically, our proposed approach was up to approximately 3.07 times faster than AVX-512F (Foundation) and approximately 2.97 times faster than GMP.

Takuya Edamatsu, Daisuke Takahashi
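The reduced-radix representation described above can be illustrated in pure Python: 52-bit limbs sit in 64-bit slots, and the spare 12 bits absorb carries so that partial products can be accumulated without per-step carry propagation. The helper names are illustrative sketches; the real implementation uses the `vpmadd52luq`/`vpmadd52huq` intrinsics rather than Python integers:

```python
RADIX_BITS = 52
RADIX = 1 << RADIX_BITS
MASK = RADIX - 1

def to_limbs(x, n):
    """Split x into n base-2^52 limbs, least significant first."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    """Recombine base-2^52 limbs into an integer."""
    return sum(l << (RADIX_BITS * i) for i, l in enumerate(limbs))

def mul_basecase(a, b):
    """Schoolbook product of two limb vectors.

    Partial products are accumulated without propagating carries at
    each step: with 52-bit limbs in 64-bit slots the spare 12 bits
    absorb the overflow, mirroring what AVX-512IFMA's low/high 52-bit
    multiply-add instructions permit.  Carries are resolved once at
    the end in a single sweep.
    """
    acc = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p = ai * bj
            acc[i + j] += p & MASK              # low 52 bits of the product
            acc[i + j + 1] += p >> RADIX_BITS   # high bits of the product
    carry = 0
    for i in range(len(acc)):                   # single final carry sweep
        t = acc[i] + carry
        acc[i] = t & MASK
        carry = t >> RADIX_BITS
    return acc
```

Deferring carry resolution is the point of the reduced-radix trick: inner-loop iterations stay independent, which is what makes the SIMD lanes usable.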

A Communication-Avoiding Algorithm for Molecular Dynamics Simulation

Molecular dynamics and many similar time-dependent computing tasks are defined as simple state updates over multiple time steps. In recent years, modern supercomputing clusters have enjoyed fast-growing compute capability and moderately growing memory bandwidth, but the improvement of their network bandwidth and latency is limited. In this paper, we propose a new communication-avoiding algorithmic model based on asynchronous communication which, unlike BSP, records and handles multiple iterative states together. The basic idea is to let computation run in small regular time steps while communication occurs over longer, dynamically chosen time steps. Computation keeps checking inaccuracies so that the intervals between communications are short in volatile scenarios but longer when the dynamics are smooth. This reduces the number of data exchanges over the network and hence improves overall performance when communication is the bottleneck. We test MD simulation of condensed covalent materials on the Sunway TaihuLight. For best time-to-solution, the general-purpose supercomputer Sunway TaihuLight performs 11.8 K steps/s for a system with 2.1 million silicon atoms and 5.1 K steps/s for 50.4 million silicon atoms. This time-to-solution performance is close to that of state-of-the-art hardware solutions. A software solution using general-purpose supercomputers makes the technology more accessible to general scientific users.

Bei Wang, Yifeng Chen, Chaofeng Hou
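The compute-in-small-steps, communicate-over-longer-intervals idea can be sketched as follows. The scalar state and the accumulated-inaccuracy estimate are deliberately toy stand-ins, not the paper's MD error criterion:

```python
def run_with_adaptive_comm(state, steps, drift, tol=0.5):
    """Advance `state` in small fixed time steps; communicate only when
    an accumulated local-inaccuracy estimate crosses `tol`.

    Intervals between communications stretch automatically when the
    dynamics are smooth (small drift) and shrink when they are volatile,
    which is the communication-avoiding behavior described above.
    """
    comms = 0
    err = 0.0
    for t in range(steps):
        d = drift(t)
        state += d          # local update every small time step
        err += abs(d)       # cheap local estimate of accumulated inaccuracy
        if err > tol:
            comms += 1      # exchange boundary data with neighbors here
            err = 0.0       # estimate is reset after synchronization
    return state, comms
```

With a smooth drift the simulated rank communicates rarely; with a volatile drift it falls back toward per-step exchanges, as the abstract describes.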

Out-of-Core GPU-Accelerated Causal Structure Learning

Learning the causal structures in high-dimensional datasets enables deriving advanced insights from observational data. For example, the construction of gene regulatory networks inferred from gene expression data supports solving biological and biomedical problems, such as in drug design or diagnostics. With the adoption of Graphics Processing Units (GPUs), the runtime of constraint-based causal structure learning algorithms on multivariate normally distributed data is significantly reduced. For extremely high-dimensional datasets, e.g., those provided by The Cancer Genome Atlas (TCGA), state-of-the-art GPU-accelerated algorithms hit the device memory limit of a single GPU and, consequently, execution fails. To overcome this limitation, we propose an out-of-core algorithm for GPU-accelerated constraint-based causal structure learning on multivariate normally distributed data. We experimentally validate the scalability of our algorithm beyond GPU device memory capacities and compare our implementation to a baseline using Unified Memory (UM). In recent GPU generations, UM overcomes the device memory limit by utilizing the GPU page migration engine. On a real-world gene expression dataset from the TCGA, our approach outperforms the baseline by a factor of 95 and is faster than a parallel Central Processing Unit (CPU)-based version by a factor of 236.

Christopher Schmidt, Johannes Huegle, Siegfried Horschig, Matthias Uflacker

Software Systems and Programming Models


Accelerating Lattice Boltzmann Method by Fully Exposing Vectorizable Loops

Lattice Boltzmann Method (LBM) plays an important role in CFD applications, and accelerating LBM computation lowers simulation costs for many industries. However, loop-carried dependencies in LBM kernels prevent the vectorization of loops, so general compilers miss many vectorization opportunities. This paper proposes a SIMD-aware loop transformation algorithm that fully exposes vectorizable loops in LBM kernels. The proposed algorithm identifies most potentially vectorizable loops according to a defined dependence table. It then performs appropriate loop transformations and array-copying techniques to legalize loop-carried dependencies, allowing the identified loops to be automatically vectorized by the compiler. Experiments carried out on an Intel Xeon Gold 6140 server show that the proposed algorithm significantly raises the fraction of vectorized loops in LBM kernels. Our algorithm also outperforms the Intel C++ compiler and a polyhedral optimizer, accelerating LBM computation by 147% and 120% in average lattice update speed, respectively.

Bin Qu, Song Liu, Hailong Huang, Jiajun Yuan, Qian Wang, Weiguo Wu
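The array-copying technique mentioned above can be illustrated with a one-dimensional streaming shift (Python standing in for the C kernels). Reading from an unmodified copy removes the loop-carried dependence through the array, so the loop collapses to a single vector move. This is a minimal sketch, not the paper's transformation algorithm:

```python
def stream_in_place(f):
    """Streaming shift with aliased reads and writes: iteration i reads
    f[i-1] from the same array it writes, so the iteration order is
    constrained (it must run backward here to be correct) and the C
    analogue of this loop cannot be vectorized as written."""
    for i in range(len(f) - 1, 0, -1):
        f[i] = f[i - 1]
    return f

def stream_with_copy(f):
    """Array-copying transformation: reads come from an untouched copy,
    so every iteration is independent and the whole loop becomes one
    vector move (here, a slice assignment)."""
    src = f[:]              # the extra copy legalizes the dependence
    f[1:] = src[:-1]
    return f
```

The copy costs extra memory traffic, which is why the paper's algorithm applies it selectively, guided by its dependence table, rather than to every loop.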

A Solution for High Availability Memory Access

Nowadays, in-memory computing has many applications, such as artificial intelligence, databases, and machine learning. These applications usually involve frequent memory accesses. On the other hand, memory components typically become error-prone over time due to increases in density and capacity. It is therefore urgent to develop solutions for high-availability memory access. Yet existing solutions either lack flexibility or are consistently more expensive than native memory. To this end, this paper presents SC2M, a software-controlled, high-availability memory mirroring solution. Our solution can flexibly set the granularity of the mirrored memory areas at various levels, and it can duplicate user-defined data structures in a high-availability version. The systematic instruction-level granularity of memory duplication reduces backup overhead and lowers the probability of data loss. Experimental results demonstrate the feasibility and advantages of our solution.

Chunjing Gan, Bin Wang, Zhi-Jie Wang, Huazhong Liu, Dingyu Yang, Jian Yin, Shiyou Qian, Song Guo

Verification of Microservices Using Metamorphic Testing

Microservices architecture has been drawing more and more attention recently. By dividing a monolithic application into different services, microservices-based applications are more flexible, scalable, and portable than traditional applications. However, the unique characteristics of microservices architecture have also brought significant challenges for software verification. One major challenge is the oracle problem: in the testing of microservices, it is often very difficult to verify the test result for a given test input, due to wide distribution, heterogeneity, frequent changes, and numerous runtime behaviors. To tackle this challenge, we investigate how to apply metamorphic testing, a simple yet effective approach to the oracle problem, to the verification of microservices-based applications. Empirical studies are conducted to evaluate the performance of metamorphic testing on real-world microservice applications, against a baseline random testing technique with a complete oracle. The results show that, in the absence of oracles, metamorphic testing can deliver relatively high failure-detection effectiveness. Our work demonstrates that metamorphic testing is both applicable and effective in addressing the oracle problem for the verification of microservices, as in many other application domains.

Gang Luo, Xi Zheng, Huai Liu, Rongbin Xu, Dinesh Nagumothu, Ranjith Janapareddi, Er Zhuang, Xiao Liu
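A metamorphic relation sidesteps the oracle problem by comparing two executions instead of checking one result against a known correct answer. The sketch below, built around a hypothetical search endpoint (all names are illustrative, not from the paper's test suite), shows the idea:

```python
def search_service(items, query):
    """Hypothetical stand-in for a microservice endpoint under test:
    returns the items whose text contains the query string."""
    return [it for it in items if query in it]

def check_subset_relation(items, query, refinement):
    """Metamorphic relation: appending characters to the query can only
    shrink the result set.  The relation compares two executions of the
    service, so no oracle for the 'correct' result list is required."""
    wide = search_service(items, query)
    narrow = search_service(items, query + refinement)
    return set(narrow) <= set(wide)
```

If the relation is violated for any input pair, the service has a fault, even though the tester never stated what the correct output of either call should be.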

A New Robust and Reversible Watermarking Technique Based on Erasure Code

With the growth of Information and Communication Technology, data sharing plays a pivotal role in all parts of the Internet. A major issue that needs to be tackled urgently is protecting the ownership of data during the sharing process. Digital watermarking technology is a major solution to ownership protection. Many reversible watermarking approaches have been proposed to protect ownership, but most can only resist malicious attacks to a certain extent and cannot regenerate the complete watermarks once part of them is destroyed. In this paper, a robust and reversible watermarking technique (RRWEC) for relational databases is proposed, which utilizes erasure codes and watermarking in groups. The embedded watermarks are derived from meaningful characters of the database owner's identification information. A clustering-based grouping method that does not depend on primary keys is used to group the data, which resists attacks on the data structure. The erasure-code technique regenerates the complete watermarking information when some sub-watermarks embedded in the data are maliciously modified or removed. Experimental results verify the effectiveness and robustness of RRWEC against malicious attacks, such as data structure attacks, subset addition attacks, and subset alteration and deletion attacks, and show that even when half of the watermarks are destroyed, the watermark regeneration algorithm can still recover the complete watermarking information.

Heyan Chai, Shuqiang Yang, Zoe L. Jiang, Xuan Wang, Yiqun Chen, Hengyu Luo

Exit-Less Hypercall: Asynchronous System Calls in Virtualized Processes

Many projects of virtualized processes are emerging, offering less overhead than traditional virtual machines and more isolation than containers. A virtualized process uses hardware virtualization to provide a process abstraction. Virtualized processes are deemed inefficient compared with native processes using system calls, since the hypercalls they use cause high-overhead context switches. However, the current performance of system calls is severely damaged by Kernel Page Table Isolation (KPTI), while hypercalls are unaffected. Unexpectedly, that gives virtualized processes hope of reaching competitive performance against native processes. In this paper, we propose and implement Exit-Less Hypercall, a new execution framework for virtualized processes, by introducing asynchrony, new thread models, and adaptive migration. We evaluate the prototype and analyze in detail the impact of context switches on native and virtualized processes with KPTI. Moreover, the experiments also show that Exit-Less Hypercall achieves a performance improvement of up to 121% over virtualized processes using legacy hypercalls and even outperforms native processes using legacy system calls with KPTI by 81%.

Guoxi Li, Wenhai Lin, Wenzhi Chen

Automatic Optimization of Python Skeletal Parallel Programs

Skeletal parallelism is a model of parallelism where parallel constructs are provided to the programmer as usual patterns of parallel algorithms. High-level skeleton libraries often offer a global view of programs instead of the Single Program Multiple Data view common in parallel programming: a program is written as a sequential program but operates on parallel data structures. Most of the time, skeletons on a parallel data structure have counterparts on a sequential data structure. For example, the map function that applies a given function to all the elements of a sequential collection (e.g., a list) has a map skeleton counterpart that applies a sequential function to all the elements of a distributed collection. Two of the challenges a programmer faces when using a skeleton library that provides a wide variety of skeletons are: which skeletons to use, and how to compose them? These design decisions may have a large impact on the performance of the parallel programs. However, skeletons, especially when they do not mutate the data structure they operate on but are implemented as pure functions, possess algebraic properties that allow compositions of skeletons to be transformed into more efficient ones. In this paper, we present such an automatic transformation framework for the Python skeleton library PySke and evaluate it on several example applications.

Frédéric Loulergue, Jolan Philippe
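A classic algebraic property of the kind the paper exploits is map fusion: `map f . map g == map (g then f)`, which merges two traversals of the data into one. The sketch below applies this rewrite to a simple pipeline representation; the names and the pipeline encoding are illustrative and do not reflect PySke's actual API:

```python
from functools import reduce

def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))

def optimize(pipeline):
    """Fuse adjacent ('map', fn) stages in a list-of-stages pipeline.

    map f . map g == map (f . g): one traversal replaces two, which is
    the kind of purity-based algebraic rewrite an automatic skeleton
    optimizer can apply safely.
    """
    out = []
    for op, fn in pipeline:
        if op == 'map' and out and out[-1][0] == 'map':
            prev_fn = out[-1][1]
            out[-1] = ('map', compose(fn, prev_fn))  # later stage outside
        else:
            out.append((op, fn))
    return out

def run(pipeline, data):
    """Interpret a pipeline of 'map' and 'reduce' stages over a list."""
    for op, fn in pipeline:
        if op == 'map':
            data = [fn(x) for x in data]
        elif op == 'reduce':
            data = reduce(fn, data)
    return data
```

Because the fused and unfused pipelines are semantically equal, the optimizer can rewrite freely and choose the composition with the fewest parallel traversals.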

Distributed and Parallel and Network-Based Computing


Impromptu Rendezvous Based Multi-threaded Algorithm for Shortest Lagrangian Path Problem on Road Networks

Input to the shortest Lagrangian path (SLP) problem consists of the following: (a) a road network dataset (modeled as a time-varying graph to capture temporal variation in traffic), (b) a source-destination pair, and (c) a departure time t_dep. Given the input, the goal of the SLP problem is to determine a fastest path between the source and destination for the departure time t_dep (at the source). The SLP problem has value-addition potential in the domain of urban navigation and has been studied extensively in the research literature. However, almost all of the proposed algorithms are essentially serial in nature and thus fail to take full advantage of increasingly available multi-core (and multi-processor) systems. Developing parallel algorithms for the SLP problem is non-trivial, because the SLP problem requires us to follow the Lagrangian reference frame while evaluating the cost of a candidate path. In other words, we need to relax an edge (whose cost varies with time) only for the time at which the candidate path (from the source) arrives at the head node of the edge; otherwise, we would generate meaningless labels for nodes. This constraint precludes the use of label-correcting approaches (e.g., parallel versions of Delta-Stepping and its variants), as they do not relax edges along candidate paths. The Lagrangian reference frame can be implemented in label-setting techniques; however, these are hard to parallelize. In this paper, we propose a novel multi-threaded label-setting algorithm called IMRESS which follows the Lagrangian reference frame. We evaluate IMRESS both analytically and experimentally, and we experimentally compare IMRESS against related work to show its superior performance.

Kartik Vishwakarma, Venkata M. V. Gunturi

FANG: Fast and Efficient Successor-State Generation for Heuristic Optimization on GPUs

Many optimization problems (especially nonsmooth ones) are typically solved by genetic, evolutionary, or metaheuristic algorithms. However, these approaches typically assume the existence of a neighborhood or successor-state function N(x), where x is a candidate state. The implementation of such a function can become arbitrarily complex in the field of combinatorial optimization. Many N(x) functions for a huge variety of domain-specific problems have been developed in the past. However, it has always been a great challenge to port or realize these functions on a massively parallel architecture like a Graphics Processing Unit (GPU). We present a GPU-based method called FANG that implements a generic and reusable N(x) for arbitrary domains in the field of combinatorial optimization. It can be customized to satisfy domain-specific requirements and, by construction, leverages the underlying hardware in a fast and efficient way. Moreover, our method scales well with respect to both the number of input states and the complexity of a single state. Measurements show significant performance improvements over traditional CPU-based exploration approaches in our evaluation scenarios.

Marcel Köster, Julian Groß, Antonio Krüger

DETER: Streaming Graph Partitioning via Combined Degree and Cluster Information

Efficient graph partitioning plays an important role in distributed graph processing systems as the scale of graph data grows rapidly, and the quality of partitioning greatly affects system performance. However, most existing vertex-cut graph partitioning algorithms focus only on degree information and ignore the cluster information of an incoming edge when assigning edges. It is beneficial to assign an edge to the partition with more of its neighbors, because keeping a dense subgraph in one partition reduces communication cost. In this paper, we propose DETER, an efficient vertex-cut streaming graph partitioning algorithm that takes both degree and cluster information into account when assigning an edge to a partition. Our evaluations suggest that DETER efficiently partitions large graphs and reduces communication cost significantly compared with state-of-the-art graph partitioning algorithms.

Cong Hu, Jiang Zhong, Qi Li, Qing Li

Which Node Properties Identify the Propagation Source in Networks?

Malignant propagation events in networks, such as the large-scale diffusion of computer viruses, rumors, and failures, have caused massive damage to our society. Thus, it is critical to study how to identify the propagation source. However, existing source identification algorithms quantify the impact mechanisms of only some of the factors that affect the Maximum Likelihood Estimator (MLE) of the propagation source, which reduces source identification accuracy. In this paper, by constructing a mathematical model of the propagation process, we derive two node properties, Average Eccentricity and Infection Force, which quantify the impact mechanisms of all the factors that affect the MLE of the propagation source. We then design the AEIF source identification algorithm based on these two node properties, which gives AEIF higher accuracy and lower time complexity than existing algorithms. Finally, extensive simulations on various synthetic and real-world networks demonstrate that AEIF outperforms existing algorithms, and based on the experimental results, suggestions for setting the parameters of AEIF are given.

Zhong Li, Chunhe Xia, Tianbo Wang, Xiaochen Liu
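A simplified version of the Average Eccentricity idea can be computed with plain BFS: for each infected candidate, average its hop distance to all infected nodes, and pick the minimizer as the source estimate. This sketch omits the Infection Force term and the paper's exact estimator, so it only illustrates the general shape of such a property:

```python
from collections import deque

def average_eccentricity(adj, node, infected):
    """Average BFS hop distance from `node` to the infected nodes.

    `adj` maps each node to a list of neighbors.  Intuitively, the true
    source tends to sit centrally within the infected subgraph, so a
    lower average distance makes a node a better source candidate.
    """
    dist = {node: 0}
    q = deque([node])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    ds = [dist[v] for v in infected if v in dist]
    return sum(ds) / len(ds)

def estimate_source(adj, infected):
    """Pick the infected node minimizing average eccentricity."""
    return min(infected, key=lambda n: average_eccentricity(adj, n, infected))
```

On a path graph with every node infected, the estimator picks the middle node, matching the intuition that the source is central to the observed infection.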

t/t-Diagnosability of BCube Network

The BCube network is one of the most classical structures among server-centric data center networks, and its diagnosability is very important for network reliability. However, there are few complete theoretical studies in this area. In this paper, the t/t-diagnosis strategy is adopted for the BCube network system under the Preparata, Metze, and Chien (PMC) model, and the corresponding diagnosis algorithm is designed for scenarios with more faulty nodes. Finally, based on theoretical analysis and simulation experiments, we prove that the t/t-diagnosability of BCube_{k,n} is (2k + 1)(n - 1) - 2 for n ≥ 2 and k ≥ 3.

Yuhao Chen, Haiping Huang, Xiping Liu, Hua Dai, Zhijie Han

Big Data and Its Applications


Strark-H: A Strategy for Spatial Data Storage to Improve Query Efficiency Based on Spark

In this paper, we propose Strark-H, a storage and query strategy for large-scale spatial data based on Spark, which improves the response speed of spatial queries by considering the spatial location and category keywords of spatial objects. First, we define a custom InputFormat class so that Spark natively understands the content of Shapefile, a common file format for spatial data. Then, we put forward a partitioning and indexing method for spatial storage, in which spatial data is partitioned unevenly according to spatial position; this ensures that the size of each partition does not exceed the HDFS block size and preserves the spatial proximity of spatial objects in the cluster. Moreover, a secondary index is generated, comprising a global index over all partitions based on spatial position and a local index based on the category of spatial objects. Finally, we design a new data loading and query scheme based on Strark-H for spatial queries, including range queries, k-NN queries, and spatial join queries. Extensive experiments on OSM show that Strark-H enables Spark to natively support spatial query and storage with efficiency and scalability.

Weitao Zou, Weipeng Jing, Guangsheng Chen, Yang Lu

Multitask Assignment Algorithm Based on Decision Tree in Spatial Crowdsourcing Environment

To improve the resource utilization rate of a platform and increase worker profit, and to address the limited suitable range of single-task assignment in a spatial crowdsourcing environment, this paper provides a single-worker multitask assignment strategy. A candidate worker-selection algorithm based on minimum location entropy priority is proposed: candidate tasks are selected by calculating their location entropy within a selected area, and a candidate worker is obtained based on the Manhattan distance between the candidate task and the worker, completing the single-task assignment to a single worker. Then a multitask assignment algorithm based on a decision tree is designed, which builds a multitask screening decision tree and calculates the candidate tasks' time difference, travel cost ratio, route coincidence rate, and worker income growth rate. We filter out the most appropriate task and assign it to a worker to complete the multitask assignment. Experimental results show that the proposed algorithm can effectively reduce the average travel cost, reduce the idle rate of workers, and improve their income, demonstrating its effectiveness and feasibility.

Dunhui Yu, Xiaoxiao Zhang, Xingsheng Zhang, Lingli Zhang

TIMOM: A Novel Time Influence Multi-objective Optimization Cloud Data Storage Model for Business Process Management

In recent years, many BPM (business process management) data storage strategies have been proposed to utilize the computation and storage resources of the cloud. However, most of them focus mainly on cost-effective methods, and little effort has been devoted to the time influence of different datasets in the BPM. Storing datasets with high time influence in the cloud helps reduce the total cost (storage cost plus computation cost), which is also important for better BPM performance. Aiming at this problem, this paper proposes TIMOM, a novel time-influence multi-objective optimization cloud data storage model for BPM. In the TIMOM model, the time influence model of the BPM is first constructed to calculate a time influence value for each dataset. Based on this value, a new strategy, SHTD, for storing datasets in the cloud is designed. Then, taking response time and total cost as two objectives, a multi-objective optimization method is designed to optimize the performance of the SHTD strategy. This method conducts non-dominated sorting and calculates the crowding distance for all datasets; in this way, important datasets are selected and stored in the cloud for repeated use. Experimental results demonstrate that the proposed TIMOM model can effectively improve the performance of cloud BPM systems.

Erzhou Zhu, Meng Li, Jia Xu, Xuejun Li, Feng Liu, Futian Wang

RTEF-PP: A Robust Trust Evaluation Framework with Privacy Protection for Cloud Services Providers

The trust problem of Cloud Service Providers (CSPs) has become one of the most challenging issues in cloud computing. To build trust between Cloud Clients (CCs) and CSPs, a large number of trust evaluation frameworks have been proposed. Most of these frameworks collect and process evidence data such as feedback and preferences from CCs. However, evidence data may reveal the CCs' privacy, and so far very few trust frameworks study the privacy protection of CCs. In addition, when the amount of malicious CC feedback increases, the accuracy of existing frameworks is greatly reduced. This paper proposes a robust trust evaluation framework, RTEF-PP, which uses differential privacy to protect CCs' privacy. Furthermore, RTEF-PP uses the Euclidean distances between the monitored QoS values and CCs' feedback to detect malicious feedback ratings, and it is not affected by the number of malicious feedback ratings. Experimental results show that RTEF-PP is reliable regardless of the number of malicious CC feedback ratings.

Hong Zhong, JianZhong Zou, Jie Cui, Yan Xu
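The Euclidean-distance check for malicious feedback described above can be sketched directly. The threshold value and QoS dimensions below are illustrative assumptions, not the paper's calibration:

```python
import math

def flag_malicious_feedback(monitored_qos, feedbacks, threshold):
    """Flag clients whose feedback vector lies farther (in Euclidean
    distance) from the monitored QoS values than `threshold`.

    `monitored_qos` is a vector of QoS values observed by the framework;
    `feedbacks` maps a client id to its reported QoS vector of the same
    length.  Honest feedback should track the monitored values closely,
    so large distances indicate likely malicious ratings.
    """
    return [cc for cc, fb in feedbacks.items()
            if math.dist(monitored_qos, fb) > threshold]
```

Because the check is per-client against independently monitored values, adding more malicious raters does not dilute it, which is the robustness property the abstract claims.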

A Privacy-Preserving Access Control Scheme with Verifiable and Outsourcing Capabilities in Fog-Cloud Computing

Fog computing is a distributed system architecture which uses edge devices to provide computation, storage, and sharing at the edge of the network as an extension of the cloud computing architecture, resolving potential network traffic jams. However, the untrustworthy edge devices that contribute the computing resources may raise data security and privacy-preserving issues. To address these security issues and achieve fine-grained access control that protects users' privacy, the ciphertext-policy attribute-based encryption (CP-ABE) mechanism has been well explored, where data owners obtain flexible access policies to share data among users. However, the major drawback of a CP-ABE system is its heavy computational cost due to complicated cryptographic operations. To tackle this problem, we propose a privacy-preserving access control (PPAC) scheme with tri-fold contributions: (1) we introduce outsourcing capability in the fog-cloud computing (FCC) environment; (2) an outsourcing verification mechanism is considered to guarantee that the third party executes the algorithm correctly; (3) we design a partially hidden method to protect the privacy information embedded in the access structures. The experimental results show that our proposed PPAC is efficient, economical, and suitable for mobile devices with limited resources.

Zhen Cheng, Jiale Zhang, Hongyan Qian, Mingrong Xiang, Di Wu

Utility-Aware Edge Server Deployment in Mobile Edge Computing

Traditional Mobile Cloud Computing (MCC) has gradually shifted toward Mobile Edge Computing (MEC) to meet the needs of low-latency scenarios. However, due to the unpredictability of user behavior, it is not easy to place edge servers in suitable locations and rationally allocate computing resources. Besides, the workload between servers may be unbalanced, which can shrink system utility and waste energy. We therefore analyze the workloads in a large MEC system, using one day as a workload cycle. Combining the idea of differential workload changes with a local greedy method, we propose a new Gradient algorithm under the constraint of limited computing capacity. We conduct extensive simulations and compare it with a Weight algorithm based on the average workload and a Greedy algorithm, which show that the Gradient algorithm achieves the maximum utility among the three methods.

Jianjun Qiu, Xin Li, Xiaolin Qin, Haiyan Wang, Yongbo Cheng

Predicting Hard Drive Failures for Cloud Storage Systems

To improve reactive hard-drive fault-tolerance techniques, many statistical and machine learning methods have been proposed for failure prediction based on SMART attributes. However, disparate datasets and metrics have been used to experimentally evaluate these models, so a direct comparison between them cannot readily be made. In this paper, we provide an improvement to the Recurrent Neural Network model, which experimentally achieves a 98.06% migration rate and a 0.0% mismigration rate, outperforming the state-of-the-art Gradient-Boosted Regression Tree model, and achieves a 100.0% failure detection rate at a 0.02% false alarm rate, outperforming the unmodified Recurrent Neural Network model in terms of prediction accuracy. We also experimentally compare five families of prediction models (nine models in total) and simulate their practical use.
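The failure detection rate and false alarm rate quoted above are standard metrics for SMART-based predictors and can be computed as follows (a generic sketch with made-up labels, not the authors’ evaluation code):

```python
def detection_metrics(labels, predictions):
    """Failure detection rate (FDR) and false alarm rate (FAR).
    labels/predictions: 1 = drive fails (or is predicted to), 0 = healthy."""
    failed = [p for l, p in zip(labels, predictions) if l == 1]
    healthy = [p for l, p in zip(labels, predictions) if l == 0]
    fdr = sum(failed) / len(failed)    # fraction of failures correctly flagged
    far = sum(healthy) / len(healthy)  # fraction of healthy drives falsely flagged
    return fdr, far

labels      = [1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 1, 0, 0, 1]
print(detection_metrics(labels, predictions))  # → (1.0, 0.25)
```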

Dongshi Liu, Bo Wang, Peng Li, Rebecca J. Stones, Trent G. Marbach, Gang Wang, Xiaoguang Liu, Zhongwei Li

Distributed and Parallel Algorithms


Efficient Pattern Matching on CPU-GPU Heterogeneous Systems

Pattern matching algorithms are used in several areas such as network security, bioinformatics and text mining, where the volume of data is growing rapidly. In order to provide real-time response for large inputs, high-performance computing should be considered. In this paper, we present a novel hybrid pattern matching algorithm that efficiently exploits the computing power of a heterogeneous system composed of multicore processors and multiple graphics processing units (GPUs). We evaluate the performance of our algorithm on a machine with 36 CPU cores and 2 GPUs and study its behaviour as the data size and the number of processing resources increase. Finally, we compare the performance of our proposal with that of two other algorithms that use only the CPU cores and only the GPUs of the system respectively. The results reveal that our proposal outperforms the other approaches for data sets of considerable size.

Victoria Sanz, Adrián Pousa, Marcelo Naiouf, Armando De Giusti

Improving Performance of Batch Point-to-Point Communications by Active Contention Reduction Through Congestion-Avoiding Message Scheduling

Communication performance plays a crucial role in both the scalability and the time-to-solution of parallel applications. The sharing of links in modern high-performance computer networks inevitably introduces contention for communications involving multiple point-to-point messages, thus hindering their performance. Passive contention reduction, such as the networks’ congestion control, can mitigate contention but incurs extra protocol cost, while application-level active contention reduction, such as topology mapping, can only reduce contention for applications with static communication patterns. In this paper, we explore a different approach that actively reduces network contention through a congestion-avoiding message scheduling algorithm, namely CAMS. CAMS determines how to inject messages in groups to reduce contention just in time before injecting them into the network, making it useful for applications with dynamic communication patterns. Experiments with a 2D halo-exchange benchmark on the Tianhe-2A supercomputer show that it can improve communication performance by up to 27% for large messages. The proposed approach can be used in conjunction with topology mapping to further improve communication performance.
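The grouping idea behind such scheduling can be illustrated with a toy sketch (this is not the CAMS algorithm itself; the messages’ link sets below are hypothetical): greedily pack messages into injection rounds so that no two messages in the same round traverse a shared link.

```python
def schedule_groups(messages):
    """Greedily partition messages into injection rounds such that no two
    messages in the same round share a network link (no link contention).

    messages: list of sets, each the set of links a message traverses.
    Returns a list of rounds, each a list of message indices.
    """
    rounds = []  # each round: (occupied_links, [message indices])
    for idx, links in enumerate(messages):
        for occupied, members in rounds:
            if occupied.isdisjoint(links):  # no shared link → no contention
                occupied |= links
                members.append(idx)
                break
        else:
            rounds.append((set(links), [idx]))
    return [members for _, members in rounds]

# Messages 0 and 1 share link 'b', so they cannot be injected together.
msgs = [{"a", "b"}, {"b", "c"}, {"d"}]
print(schedule_groups(msgs))  # → [[0, 2], [1]]
```

Because the grouping is computed just before injection, it can react to communication patterns that are only known at run time.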

Jintao Peng, Zhang Yang, Qingkai Liu

Applications of Distributed and Parallel Computing


An Open Identity Authentication Scheme Based on Blockchain

With the development of Public Key Infrastructure (PKI), many identity management systems have been deployed in enterprises, hospitals, government departments, etc. These PKI-based systems are typically centralized: each has its own certificate authority (CA) as trust anchor and is designed according to its own understanding, forming many trust domains isolated from each other, with no unified business standard for delivering trust from one identity system to another. This causes considerable inconvenience to users with cross-domain requirements; for example, they must repeatedly register the same physical identity in different domains, and it is hard to prove the validity of an attestation issued by one domain to another. Present PKI systems adopt solutions such as trust lists, bridge CAs or cross-certification of CAs to break trust isolation, but practice shows that they all have obvious defects under the existing PKI structure. We propose an open identity authentication structure based on blockchain and design three protocols: a physical identity registration protocol, a virtual identity binding protocol and an attribute attestation protocol. The tests and security analysis show that the scheme has better practical value than traditional ones.

Yuxiang Chen, Guishan Dong, Yao Hao, Zhaolei Zhang, Haiyang Peng, Shui Yu

RBAC-GL: A Role-Based Access Control Gasless Architecture of Consortium Blockchain

Blockchain-based decentralized applications (DApps) have been used in various industrial areas, and more and more companies are willing to participate in blockchain technology. Notheisen [18] proposed a framework based on Ethereum [3] to trade lemons on the blockchain platform. In this work, we discuss an application that aims to digitize valued assets in the commercial area, such as real estate and watches. This article first introduces a general DApp framework and some standard attack methods. Our proposed novel role-based access control (RBAC) model improves the system’s permission control, and a gasless mechanism enhances security and reliability. The system allows users to publish their assets and use cryptocurrency to trade online. Meanwhile, with these improvements, this work can prevent several categories of attacks, such as gas-related attacks and malicious API invocations, while the efficiency of the system remains the same as before.

Zhiyu Xu, Tengyun Jiao, Lin Yang, Donghai Liu, Sheng Wen, Yang Xiang

Developing Patrol Strategies for the Cooperative Opportunistic Criminals

The Stackelberg security game (SSG) has been widely used in counter-terrorism, but SSG is not suitable for modeling opportunistic crime because criminals in opportunistic crime focus on real-time information. Hence, the opportunistic security game (OSG) model was proposed and has been applied to crime diffusion in recent years. However, previous OSG models do not consider that a criminal can cooperate with other criminals, a situation that is very common in real life. Criminals can agree to attack multiple selected targets simultaneously and share the utility. The police may be unable to decide which target to protect when multiple targets are attacked at the same time, so criminals can gain more utility through cooperation and interfere with police decisions. To overcome this limitation of previous OSG models, this paper makes the following contributions. Firstly, we propose a new security game framework, COSG (Cooperative Opportunistic Security Game), which can capture the bounded rationality of the adversaries in cooperative opportunistic crime. Secondly, we use a compact form to solve the problem of crime diffusion in cooperative opportunistic crime. Finally, we conduct extensive experiments to demonstrate the scalability and feasibility of our proposed approach.

Yanan Zhao, Mingchu Li, Cheng Guo

Deep Learning vs. Traditional Probabilistic Models: Case Study on Short Inputs for Password Guessing

The paper focuses on a comparative analysis of deep learning algorithms and traditional probabilistic models on strings of short lengths (typically, passwords). The password is one of the dominant methods of user authentication. Compared to the traditional brute-force attack and dictionary attack, password guessing models use leaked password datasets to generate password guesses, aiming to cover as many accounts as possible while minimizing the number of guesses. In this paper, we analyze the password patterns of leaked datasets and present a comparative study of two dominant probabilistic models (i.e., the Markov-based model and the Probabilistic Context-Free Grammar (PCFG) based model) and the PassGAN model (a representative deep-learning-based method). We use Laplace smoothing for the Markov model and introduce particular semantic patterns to the PCFG model. Our results show that the Markov-based models can cover the vast majority of the passwords in the test set, while PassGAN surprisingly performs worst. Nevertheless, considering the threat that an attacker may adjust the training set, the PCFG model is better than the Markov model. Using Passcode with high-frequency passwords can increase the coverage while reducing the number of guesses. Brute-force attacks can also work better when used in conjunction with probabilistic models: for the same billion guesses, a brute-force attack can crack pure digit passwords of lengths 4 to 8, and original PCFG and modified PCFG could increase coverage by 11.16% and 8.69%, respectively.
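A first-order character Markov model with Laplace smoothing, as used above, can be sketched in a few lines (a toy illustration trained on a tiny hypothetical leak, not the authors’ implementation):

```python
from collections import Counter

def train_markov(passwords, alphabet, k=1.0):
    """First-order character Markov model with Laplace (add-k) smoothing.
    Returns P(next_char | prev_char) as a nested dict; '^' marks the start."""
    counts, totals = Counter(), Counter()
    for pw in passwords:
        prev = "^"
        for ch in pw:
            counts[(prev, ch)] += 1
            totals[prev] += 1
            prev = ch
    V = len(alphabet)
    return {p: {c: (counts[(p, c)] + k) / (totals[p] + k * V)
                for c in alphabet}
            for p in ["^"] + list(alphabet)}

def guess_probability(model, pw):
    """Probability the model assigns to a full password string."""
    prob, prev = 1.0, "^"
    for ch in pw:
        prob *= model[prev][ch]
        prev = ch
    return prob

model = train_markov(["123456", "12345", "111111"], alphabet="0123456789")
# Patterns frequent in the training leak score higher than unseen ones,
# yet smoothing keeps every string's probability strictly positive.
assert guess_probability(model, "123") > guess_probability(model, "987") > 0
```

Guesses are then emitted in descending probability order, which is what lets such models cover many accounts with few attempts.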

Yuan Linghu, Xiangxue Li, Zhenlong Zhang

A Fully Anonymous Authentication Scheme Based on Medical Environment

RFID (Radio Frequency IDentification) technology has been widely applied in the medical environment. Meanwhile, the demand for protecting patients’ identity information and private medical data is becoming more and more urgent. In addition, many existing schemes only preserve privacy against external attackers, without considering threats from servers and readers. In this paper, we propose a fully anonymous medical mutual authentication scheme that allows identifiers to authenticate to servers without revealing their identity. The proposed scheme applies anonymous credentials and ECC (Elliptic Curve Cryptography) to achieve anonymity and mutual authentication. Moreover, the security analysis shows that, compared with typical ECC-based schemes, our scheme preserves privacy against both outsider and insider attackers during authentication. The experimental results indicate that the cost of our novel scheme is affordable.

Jing Lv, Ning Xi, Xue Rao

Service Dependability and Security


RaNetMalDozer: A Novel NN-Based Model for Android Malware Detection Over Task Kernel Structures

A traditional neural network can maintain the performance of identifying Android malware by learning Android characteristics, but owing to the rapid growth of Android information, it cannot efficiently handle the massive data of millions of Android applications. In this paper, we propose a novel NN-based model, RaNetMalDozer (RNMD), for Android malware detection that improves both classification accuracy and training speed. RNMD dynamically selects hidden centers by a heuristic approach and expeditiously gathers a large-scale dataset of 12,750 Android applications. To systematically analyze Android kernel behaviors, we investigate 112 Android kernel features of task_struct and offer forensic analysis of key kernel features. Furthermore, compared to a traditional neural network, the EBPN method, which achieves an accuracy rate of 81%, our RNMD model achieves an accuracy rate of 94% with half the training and evaluation time. Our experiments also demonstrate that the RNMD model can serve as a better technique for Android malware detection.

Xinning Wang, Chong Li

Moving Target Defense Against Injection Attacks

With the development of network technology, web services have become more convenient and popular. However, web services also face serious security threats, especially SQL injection attacks (SQLIAs). Due to the diversity of attack techniques and the static nature of defense configurations, it is difficult for existing passive defense methods to effectively defend against all SQLIAs. To reduce the risk of successful SQLIAs and increase the difficulty for the attacker, this article presents an effective defense technique based on moving target defense (MTD), called dynamic defense to SQLIA (DTSA). DTSA dynamically diversifies the types of databases and implementation languages, turning the web server into an untraceable and unpredictable moving target and slowing down SQLIAs. Moreover, the mutation period is determined by dynamic programming so as to reduce the damage caused by SQLIAs while minimizing the impact on normal users as much as possible. Finally, the experimental results show that the proposed defense method can effectively defend against injection attacks on relational databases.

Huan Zhang, Kangfeng Zheng, Xiaodan Yan, Shoushan Luo, Bin Wu

Tiger Tally: Cross-Domain Scheme for Different Authentication Mechanism

As the most effective way to improve the efficiency of government work, e-government has been built at all levels in China, accompanied by the construction of hundreds of authentication centers. This has caused serious isolation of different systems, waste of resources, and inconvenience for users who have business requirements across departments and districts. Currently, users need to repeatedly register and manage multiple different accounts, or even multiple different authentication methods. In the context of population migration, cross-departmental and cross-regional business operations are growing, so finding trust transfer methods for different government applications is of great significance. To overcome this issue, in this paper we first explicitly put forward a trust delivery model named “Tiger Tally” that, by using blockchain, replaces the traditional centralized structure with consensus among all participants. We then design a cross-domain authentication protocol that is compatible with different authentication mechanisms. As our main contribution, our scheme resolves the trust delivery issues and is strictly designed from the perspectives of security, low cost and unified regulatory requirements. In particular, by integrating HMAC, traditionally the purview of message security, with a token standard, our scheme realizes the idea of the ancient “tiger tally”. It not only overcomes the long-standing trust delivery obstacles in e-government, but also achieves traceability of responsibility and security guarantees beyond the isolated systems’ security bounds.

Guishan Dong, Yuxiang Chen, Yao Hao, Zhaolei Zhang, Peng Zhang, Shui Yu

Simultaneously Advising via Differential Privacy in Cloud Servers Environment

Due to the rapid development of the cloud computing environment, it is widely accepted that cloud servers are important for users to improve work efficiency. Users need to know servers’ capabilities and make optimal decisions when selecting the best available servers for their tasks. We model the process by which users learn servers’ capabilities as a multi-agent reinforcement learning process. The learning speed and efficiency in reinforcement learning can be improved by transferring learning experience among learning agents, which is known as advising. However, existing advising frameworks are limited by a requirement during experience transfer: all learning agents in a reinforcement learning environment must have exactly the same set of available choices, also called actions. To remove this limit, this paper proposes a novel differentially private agent advising approach in reinforcement learning. Our proposed approach significantly broadens the applicability of conventional advising frameworks when agents’ choices are not completely the same. The approach can also speed up reinforcement learning by increasing the possibility of experience transfer among agents with different available choices.

Sheng Shen, Tianqing Zhu, Dayong Ye, Mengmeng Yang, Tingting Liao, Wanlei Zhou

Feature Generation: A Novel Intrusion Detection Model Based on Prototypical Network

Intrusion detection is becoming more and more essential to ensuring cyberspace security. In essence, detection is a process of classifying traffic data. However, attacks usually try to disguise themselves as normal traffic to avoid being detected. This causes a high degree of overlap among different classes in the input data and hurts the detection rate. In this paper, we propose a feature generation based prototypical network (FGPNetwork) model to solve the overlapping data classification problem in intrusion detection. By analyzing the characteristics of data transmission in the network, we select the basic packet characteristics and roughly divide them into several parts. Then, a contribution rate is used to calculate the specific contribution of basic features to classification. We order the features by descending rate in each part and generate new features with Convolutional Neural Networks (CNNs) with different kernels. The new features capture the intrinsic connections of the original features and add more nonlinearity to the model. Finally, the combination of new and original features is input into the prototypical network. In the prototypical network, data is mapped to a high-dimensional space and separated by narrowing the distance between data points and their respective cluster centers. Because of the uneven class distribution of the intrusion detection dataset, we use an undersampling method in each batch. The experimental results on the NSL-KDD test dataset show that our model outperforms other deep learning intrusion detection methods.

Shizhao Wang, Chunhe Xia, Tianbo Wang

Topic Reconstruction: A Novel Method Based on LDA Oriented to Intrusion Detection

Traditional intrusion detection methods face the problem of distinguishing different types of intrusion with high similarity. These methods use a single value to characterize each attribute and mine the relationships of each attribute at the feature extraction stage. However, this granularity of feature extraction is not sufficient to distinguish different intrusions whose network flow characteristics are similar. Facing this problem, we establish an intrusion detection model based on Latent Dirichlet Allocation (ID-LDA) and propose a novel topic reconstruction method to extract distinctive features. We mine the value distribution of each attribute and the association of multiple attributes to extract more implicit semantic features, which are more useful for identifying slight differences between different kinds of intrusions. However, current LDA models have difficulty determining the optimal number of topics, and recent methods ignore multiple-topic selection. These problems make it difficult to generate a good Document-Topic Distribution (DTD) and lower detection accuracy. We therefore propose a topic overlap degree and a dispersion degree to quantitatively assess the quality of the DTD, with which we determine the optimal topic number and select the best topics. Experiments on the public NSL-KDD dataset have verified the validity of ID-LDA, whose results outperform many state-of-the-art intrusion detection methods in terms of accuracy.

Shengwei Lei, Chunhe Xia, Tianbo Wang, Shizhao Wang

PDGAN: A Novel Poisoning Defense Method in Federated Learning Using Generative Adversarial Network

Federated learning can complete an enormous training task efficiently by inviting participants to train a deep learning model collaboratively, and user privacy is well preserved because users only upload model parameters to the centralized server. However, attackers can mount poisoning attacks by uploading malicious updates in federated learning, which significantly degrades the accuracy of the global model. To address this vulnerability, we propose a novel poisoning defense generative adversarial network (PDGAN) to defend against poisoning attacks. PDGAN can reconstruct training data from model updates and audit the accuracy of each participant’s model using the generated data. Precisely, a participant whose accuracy is lower than a predefined threshold is identified as an attacker, and the attacker’s model parameters are removed from the training procedure in that iteration. Experiments conducted on the MNIST and Fashion-MNIST datasets demonstrate that our approach can indeed defend against poisoning attacks in federated learning.
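The threshold-based filtering step can be sketched as follows (assuming audited accuracies are already available; this illustrates only the filtering and aggregation idea, not PDGAN’s GAN-based data reconstruction; the client names and numbers are hypothetical):

```python
def filter_and_average(updates, accuracies, threshold):
    """Drop model updates from participants whose audited accuracy falls
    below the threshold, then average the rest (plain FedAvg over the
    surviving updates).

    updates:    {participant_id: flat list of parameter values}
    accuracies: {participant_id: accuracy audited on generated data}
    """
    survivors = [pid for pid, acc in accuracies.items() if acc >= threshold]
    if not survivors:
        raise ValueError("every participant was flagged as an attacker")
    n = len(survivors)
    dim = len(next(iter(updates.values())))
    return [sum(updates[pid][j] for pid in survivors) / n for j in range(dim)]

updates = {"c1": [1.0, 2.0],
           "c2": [100.0, -100.0],   # poisoned update
           "c3": [3.0, 4.0]}
accuracies = {"c1": 0.9, "c2": 0.3, "c3": 0.8}
print(filter_and_average(updates, accuracies, threshold=0.6))  # → [2.0, 3.0]
```

Removing the flagged update before averaging keeps the poisoned parameters out of that iteration’s global model.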

Ying Zhao, Junjun Chen, Jiale Zhang, Di Wu, Jian Teng, Shui Yu

A Geo-indistinguishable Location Privacy Preservation Scheme for Location-Based Services in Vehicular Networks

In vehicular networks, location-based services (LBSs) are very popular and essential for most vehicular applications. However, large-scale sharing of location information may cause location privacy leakage for in-vehicle users. Since the existing privacy protection mechanisms ignore trajectory information, neither the location privacy of in-vehicle users nor the trade-off between privacy and quality of service (QoS) can be effectively handled. To provide satisfactory privacy and QoS for in-vehicle users, in this paper we propose an improved geo-indistinguishable location privacy protection scheme (GLPPS). Specifically, we first select an area of service retrieval (ASR), instead of the user’s real location, to send to the LBS, which protects location privacy while avoiding trajectory leakage. Secondly, we establish a Stackelberg-based income model between in-vehicle users and attackers, and design an IM-ASR algorithm based on the iterative method and the maximin theorem to achieve an optimal trade-off between privacy and QoS. Finally, we prove that GLPPS satisfies α-differential privacy. Moreover, the simulation results demonstrate that GLPPS can reduce the loss of QoS while improving location privacy compared to other methods.
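Geo-indistinguishability is commonly achieved with the planar Laplace mechanism, which can be sketched as follows (a generic illustration of that standard mechanism, not the paper’s GLPPS scheme, which reports an ASR region rather than a single perturbed point; the radial distance of the noise follows a Gamma(2, 1/ε) distribution and its angle is uniform):

```python
import math
import random

def planar_laplace(x, y, epsilon):
    """Perturb a 2D location with planar Laplace noise: uniform angle,
    Gamma(2, 1/epsilon)-distributed radius. Larger epsilon → less noise."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = random.gammavariate(2.0, 1.0 / epsilon)
    return x + r * math.cos(theta), y + r * math.sin(theta)

random.seed(7)
real = (48.8566, 2.3522)  # true position, treated as planar coordinates
reported = planar_laplace(*real, epsilon=1.0)
# The reported point is a noisy version of the true one; the expected
# displacement is 2/epsilon in the distance unit of the coordinates.
```

The expected displacement of 2/ε is what lets a scheme tune the privacy/QoS trade-off by choosing ε.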

Li Luo, Zhenzhen Han, Chuan Xu, Guofeng Zhao

A Behavior-Based Method for Distinguishing the Type of C&C Channel

The botnet is one of the most dangerous threats to the Internet. The C&C channel is an essential characteristic of the botnet and has evolved from the client-server (C/S) type to the P2P type. Distinguishing the type of C&C channel correctly and in a timely manner, and then taking appropriate countermeasures, is of great importance for eliminating botnet threats. In this paper, we present a behavior-based method to classify the type of C&C channel. In our method, we put forward a series of features relevant to C&C behavior, apply a feature selection approach to choose the most significant features, use the Random Forest algorithm to build an inference model, and make the final type determination based on the time-slot results and their temporal relationship. The experimental results show not only that our method can distinguish the type of C&C channel with an accuracy of 100%, but also that our feature selection approach can effectively reduce the number of required features and the model training time while ensuring the highest accuracy, bringing significant improvements in efficiency. Moreover, a comparison with another method further demonstrates the advantages of our work.

Jianguo Jiang, Qilei Yin, Zhixin Shi, Guokun Xu, Xiaoyu Kang

IoT and CPS Computing


Sparse Representation for Device-Free Human Detection and Localization with COTS RFID

Passive human detection and localization is the basis for a broad range of intelligent scenarios including unmanned supermarkets, health monitoring, etc. Although existing computer vision or wearable sensor based methods can obtain high precision, they still face inherent defects such as privacy issues and battery power limitations. Based on the backscattered signal changes induced by human movement, we propose a device-free human detection and localization system built on radio-frequency identification (RFID) devices. In the first stage, the system extracts environment-independent features from both RSSI and phase for dynamic monitoring; the target is then located if a moving human is detected. In particular, an overcomplete dictionary is learned when creating the fingerprint library, which makes the representation of the location more compact and computationally simple. Moreover, a PCA-based dimensionality reduction method is adopted to acquire valid features for determining the final position. Extensive experiments conducted in a real-life office and bedroom demonstrate that the proposed system provides high accuracy for human detection and achieves an average distance error of less than 1 m.

Weiqing Huang, Shaoyi Zhu, Siye Wang, Jinxing Xie, Yanfang Zhang

A Novel Approach to Cost-Efficient Scheduling of Multi-workflows in the Edge Computing Environment with the Proximity Constraint

The edge computing paradigm is featured by the ability to offload computing tasks from mobile devices to edge clouds and to provide cost-efficient computing resources, storage and network services closer to the edge. A key question for workflow scheduling in the edge computing environment is how to reduce the monetary cost while fulfilling Service-Level Agreements in terms of performance and quality-of-service requirements. However, it is still a challenge to guarantee the user-perceived quality of service of applications deployed upon edge infrastructures, because such applications are constantly subject to negative impacts, e.g., network congestion, unexpectedly long message delays, and shrinking coverage of edge servers due to battery depletion. In this paper, we study the multi-workflow scheduling problem and propose a novel approach to cost-efficient scheduling of multi-workflows in the edge computing environment with the proximity constraint. The proposed approach aims at minimizing edge computing costs while meeting user-specified workflow completion deadlines, and leverages a discrete firefly algorithm to yield the scheduling plan. We conduct experimental case studies based on multiple well-known scientific workflow templates and a real-world dataset of edge resource locations. Experimental results clearly suggest that our proposed approach outperforms traditional ones in terms of cost and makespan.

Yuyin Ma, Junyang Zhang, Shu Wang, Yunni Xia, Peng Chen, Lei Wu, Wanbo Zheng

Differential Privacy Preservation for Smart Meter Systems

With the rapid development of IoT and smart homes, smart meters have received extensive attention. Third-party applications, such as smart home control, dynamic demand-response, and power monitoring, can provide services to users based on household electricity consumption data collected from smart meters. With the emergence of non-intrusive load monitoring, privacy issues concerning smart meter data become more and more severe. Differential privacy is a recognized concept that has become an important standard of privacy preservation for data with personal information. However, the existing differential-privacy-based protection methods for smart meter data sacrifice the actual energy consumption to protect user privacy, thus affecting the billing of power suppliers. To solve this problem, we propose a group-based noise-adding method that ensures correct electricity billing. Experiments with two real-world datasets demonstrate that our approach not only provides a strict privacy guarantee but also improves performance significantly.
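The sum-preserving intuition behind group-based noise adding can be sketched as follows (an illustration only, not the authors’ method: here Laplace noise is added to each reading and then recentred within the group so the group total used for billing is exact; note that recentring correlates the noise terms, so the formal privacy guarantee differs from plain per-reading Laplace noise):

```python
import math
import random

def laplace(scale):
    """Draw one sample from a zero-mean Laplace distribution (inverse CDF)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def group_perturb(readings, epsilon):
    """Add Laplace noise to each reading, then subtract the group's mean
    noise so the group total is exactly preserved for billing."""
    noise = [laplace(1.0 / epsilon) for _ in readings]
    mean = sum(noise) / len(noise)
    return [r + n - mean for r, n in zip(readings, noise)]

readings = [1.2, 0.7, 2.4, 1.1]            # kWh over one interval (hypothetical)
noisy = group_perturb(readings, epsilon=0.5)
assert abs(sum(noisy) - sum(readings)) < 1e-9  # billing total preserved
```

Individual readings are obscured while the supplier can still charge the group correctly from the unchanged total.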

Junfang Wu, Weizhong Qiang, Tianqing Zhu, Hai Jin, Peng Xu, Sheng Shen

Performance Modelling and Evaluation


Secure Multi-receiver Communications: Models, Proofs, and Implementation

With the demand for message authentication and confidentiality as well as receiver anonymity in applications such as multicast communication, digital content distribution systems, and pay-per-view channels, many anonymous multi-receiver signcryption mechanisms have been put forward to offer these functions efficiently, with lower computational cost and communication overhead than signature-then-encryption approaches. However, most certificateless schemes focus either on providing receiver anonymity or on improving signcryption efficiency. In addition, most certificateless schemes rely on bilinear pairing operations, which are more time-consuming than modular exponentiation and scalar multiplication in finite fields. In this paper, we propose a practical anonymous multi-receiver certificateless signcryption (AMCLS) scheme that satisfies message confidentiality, source authentication, and anonymity simultaneously and efficiently. In the proposed scheme, the sender’s signcryption cost increases linearly with the number of designated receivers, while the unsigncryption cost per receiver is constant. The adoption of elliptic curve scalar multiplication instead of bilinear pairing operations improves the efficiency of the scheme. Both the sender’s and the receivers’ identities are encrypted to offer anonymity. Through security analysis, our proposal is proved to achieve chosen-ciphertext attack (CCA) security in encryption indistinguishability and receiver anonymity under strong, commonly accepted attack models. Theoretical analyses and experimental results demonstrate that our scheme enjoys better efficiency than other certificateless schemes.

Maomao Fu, Xiaozhuo Gu, Wenhao Dai, Jingqiang Lin, Han Wang
